Mar 12, 2026

How PostgreSQL Optimizes Storage with Page-Level Pruning

PostgreSQL manages table storage using heap pages. When rows are updated multiple times, PostgreSQL does not overwrite the existing tuple immediately. Instead, it creates new tuple versions and marks older ones as dead or obsolete. Over time, this leads to page fragmentation, and thereby it also induces slow performance issues in the database.

Many developers think that only VACUUM is responsible for cleanup. However, PostgreSQL also performs an internal lightweight cleanup called the page-level pruning mechanism, which happens automatically during normal query execution. Also, PostgreSQL has the inbuilt autovacuum demon that regularly checks the tables and cleans up the dead tuples, but the manual execution of vacuum full is not supported to use in production databases because of heavy locking.

Before understanding this, let’s look at the difference between VACUUM and VACUUM FULL in PostgreSQL.

VACUUM:

Removes dead tuples from tables and marks the space as reusable for future inserts or updates. The disk space is not returned to the operating system; it is only reused internally by PostgreSQL.

VACUUM FULL:

Rewrites the entire table by creating a new compact version, removing all dead tuples and fragmentation. This operation releases unused space back to the operating system but requires an exclusive lock on the table.

Now, let’s dive into the internal working of page level pruning

Test Setup

First, we create a table with specific storage parameters.

create table test_table (
    id int,
    module char(2000)
) with (
    fillfactor = 75,
);

Why these parameters?

fillfactor = 75

fillfactor = 75 means that only 75% of each data page is filled during insert operations, while the remaining 25% is intentionally left free.

This reserved free space allows future updates to store the modified tuple on the same page instead of moving it to a new page. As a result, it improves the chances of HOT (Heap-Only Tuple) updates and helps PostgreSQL perform efficient pruning.

This setting is especially useful for tables that are updated frequently. Because extra space already exists on the page, updates can reuse that space rather than creating new pages, which reduces page splits and improves performance.

By default, PostgreSQL tables use a fill factor of 100%, meaning pages are filled as much as possible during inserts. When a page becomes full, later updates may need to place new tuple versions on different pages, which can reduce HOT update opportunities.

Generating Multiple Tuple Versions

Insert and update the same row several times.

insert into test_table values(1,'Sales');

You can check the size of this page by this command

show block_size ;

Result :

block_size 
------------
 8192
(1 row)

Now update the same row three times and fill this page size

update test_table set module = 'Service' where id = 1;
update test_table set module = 'Crm' where id = 1;
update test_table set module = 'Purchase' where id = 1;

What happens internally?

PostgreSQL creates a new tuple version for each update:

The old tuple becomes dead or obsolete.
The new tuple points to the latest version using t_ctid.
A HOT chain may be formed if space allows.

Inspecting Heap Page Content

Enable the extension:

create extension pageinspect;

We created the extension named pageinspect for using the function get_raw_page().

Purpose :

Read the binary content of the database page from disk.

Inspect the page:

select lp, t_xmin, t_xmax, t_ctid,
       t_infomask2, cast(t_infomask2 as bit(16)),
       t_infomask,  cast(t_infomask as bit(16))
from heap_page_items(get_raw_page('test_table',0));

First Output

 lp | t_xmin | t_xmax | t_ctid | t_infomask2 |   t_infomask2    | t_infomask |    t_infomask    
----+--------+--------+--------+-------------+------------------+------------+------------------
  1 |  30885 |  30886 | (0,2)  |       16386 | 0100000000000010 |       1282 | 0000010100000010
  2 |  30886 |  30887 | (0,3)  |       49154 | 1100000000000010 |       9474 | 0010010100000010
  3 |  30887 |  30888 | (0,4)  |       49154 | 1100000000000010 |       8450 | 0010000100000010
  4 |  30888 |      0 | (0,4)  |       32770 | 1000000000000010 |      10242 | 0010100000000010

This shows a HOT update chain:

Tuple 1 - Tuple 2 - Tuple 3 - Tuple 4 (latest version)
Earlier tuples are now dead.

Purpose of each column :

lp - The line pointer number showing the slot position of the tuple inside the page.
t_xmin - The transaction ID that created (inserted) this tuple version.
t_xmax - The transaction ID that deleted or updated this tuple; 0 means the tuple is still alive.
t_ctid - A pointer to the current or next tuple version, used to form update/HOT chains.
t_infomask2 - Internal flag bits storing extra tuple metadata such as attribute count and HOT-related information.
t_infomask - Visibility and status flags indicating whether the tuple is committed, updated, locked, or dead.

Page-Level Pruning in Action

In order to work the page level pruning, we need to make a sequential scan on this table to trigger this function

select * from test_table;

Then check again

After some time, running the same inspection again:

lp | t_xmin | t_xmax | t_ctid | t_infomask2 |   t_infomask2    | t_infomask |    t_infomask    
----+--------+--------+--------+-------------+------------------+------------+------------------
  1 |        |        |        |             |                  |            | 
  2 |        |        |        |             |                  |            | 
  3 |        |        |        |             |                  |            | 
  4 |  30888 |      0 | (0,4)  |       32770 | 1000000000000010 |      10498 | 0010100100000010
(4 rows)

Older tuples disappeared.

No manual VACUUM was executed.

This means PostgreSQL performed page-level pruning automatically by the function named heap_page_prune_opt from postgres source code.

Where Does This Happen in Source Code?

The responsible function is:

src/backend/access/heap/pruneheap.c

Function:

void heap_page_prune_opt(Relation relation, Buffer buffer)

This page-level pruning mechanism is a lightweight process, and it has minimal locking compared to vacuum full, and it is safe for frequent execution.

Adding Logging Inside PostgreSQL Source Code

To observe when pruning happens, add a log inside the function:

void
heap_page_prune_opt(Relation relation, Buffer buffer)
{
    ereport(LOG,
        errmsg("heap_page_prune_opt: relation %s, block %u",
            RelationGetRelationName(relation),
            BufferGetBlockNumber(buffer)));
}

You can also check the path of the logfile by this command

pg_lsclusters

Result

postgres@cybrosys:/home/cybrosys$ pg_lsclusters
Ver Cluster Port Status Owner    Data directory              Log file
17  main    5432 online postgres /var/lib/postgresql/17/main /var/log/postgresql/postgresql-17-main.log

After rebuilding the PostgreSQL source code and running queries, you can see this log statement from the PostgreSQL log file.

LOG: heap_page_prune_opt: relation test_table, block 0

This confirms that PostgreSQL triggered pruning automatically during normal scan operations.

How Page-Level Pruning Works Internally

Page-level pruning performs the following:

Removes DEAD tuples that are no longer visible.
Repairs HOT chains.
Compacts line pointers.
Frees space for future updates.

Important characteristics:

Happens during SELECT, UPDATE, or INSERT.
Does not scan the entire table.
Operates on a single page.
Very low overhead compared to VACUUM.

Page-Level Pruning vs VACUUM vs VACUUM FULL

Page-Level Pruning

Triggers during normal scan on dead tuples containing tables
Lightweight
Per-page cleanup
Does not shrink table files

Use case:

Useful for High-update workloads like Odoo transactional tables.

VACUUM

Background or manual process
Marks dead tuples reusable
Updates visibility maps
Prevents transaction ID wraparound
Does not reduce file size

VACUUM scans many pages and performs global maintenance.

VACUUM FULL

Rewrites entire table
Removes all bloat
Requires exclusive lock
Shrinks disk size
Heavy operation

VACUUM FULL is more like rebuilding the table.

In postgresql, there is a live progress tracking for the vacuum process named pg_stat_progress_vacuum

select * from pg_stat_progress_vacuum ;

Why Fillfactor Matters for Pruning

In our example:

fillfactor = 75

This left free space inside each page.

Because space existed:

Updates stayed on the same page.
HOT updates were possible.
Page pruning could remove dead tuples immediately.

If fillfactor were 100:

Updates might move to another page.
HOT chains break.
More index updates occur.
Pruning effectiveness reduces.

For write-heavy applications such as ERP systems, lower fillfactor improves performance.

Why This Benefits High-Update Systems

Applications like Odoo frequently update rows:

Status changes
Workflow transitions
Computed fields

Without pruning:

Pages become fragmented.
Dead tuples accumulate.
Performance decreases.

With pruning:

Space reused quickly.
Less need for aggressive VACUUM.
Better cache locality.

PostgreSQL performs cleanup even without VACUUM, but that happens at the page level .heap_page_prune_opt function is a critical internal function in postgresql that is responsible for the page level pruning mechanism in PostgreSQL.

Logging inside source code is a powerful way to learn internal behavior. Fillfactor strongly influences HOT updates and pruning efficiency, and also setting fill factor for frequently updated tables is a good way to improve the performance. The Page pruning mechanism in PostgreSQL is complementary to VACUUM, not a replacement.

Understanding PostgreSQL internals requires observing real page behavior. By combining:

repeated updates
pageinspect inspection
source code logging

We can see how PostgreSQL intelligently maintains storage using page-level pruning.

This mechanism allows PostgreSQL to maintain performance without constantly running heavy maintenance tasks. When tuned properly with fillfactor and update patterns, pruning reduces bloat and keeps frequently updated tables efficient.

For developers working directly with PostgreSQL source code or optimizing databases for high-update workloads, exploring pruning behavior provides deep insight into how PostgreSQL manages data at the lowest level.