Introduction
When you run a DELETE query in PostgreSQL, you might expect the row to be instantly removed from the database. But PostgreSQL follows a more sophisticated approach. It uses a mechanism called Multi-Version Concurrency Control (MVCC) to handle concurrent access and ensure data consistency.
Instead of immediately removing the row, PostgreSQL marks the row as deleted and cleans it up later through a process called VACUUM.
In this blog, you’ll learn exactly what happens when a tuple (row) is deleted in PostgreSQL, how dead tuples affect your database, and how to monitor and clean them effectively using built-in tools and extensions like pgstattuple.
What Happens When You Delete a Tuple?
Suppose you run this query:
DELETE FROM users WHERE id = 10;
Behind the Scenes:
* PostgreSQL locates the tuple (row) with id = 10.
* Instead of deleting it immediately, PostgreSQL sets a marker on the tuple to indicate it is deleted by setting the xmax (transaction ID of the deleting transaction).
* The tuple remains on disk and is invisible to future transactions.
* This ensures other concurrent transactions that started earlier still see a consistent view of the data.
The Role of MVCC
PostgreSQL uses MVCC to allow multiple transactions to occur simultaneously without interfering with each other.
* Each tuple contains metadata columns such as:
xmin: The ID of the transaction that inserted the tuple.
xmax: The ID of the transaction that deleted the tuple (if any).
* When a tuple is deleted, xmax is set.
* Other transactions will verify these IDs to determine whether the tuple should be visible or hidden.
Internal Process: Marking a Tuple as Deleted
The actual deletion is handled internally by functions like:
heap_delete()
* This marks the tuple’s xmax with the current transaction ID.
* The change is written to the Write-Ahead Log (WAL) for durability.
* The tuple is flagged as "dead" but physically remains on disk.
Why Isn’t the Tuple Removed Immediately?
Since other transactions may still need to see the tuple (due to MVCC snapshots), immediate physical removal would cause inconsistencies.
PostgreSQL delays physical removal to maintain:
* Data consistency
* Transactional isolation
Cleaning Up Dead Tuples: VACUUM and VACUUM FULL
Dead tuples occupy disk space and can degrade performance if left unchecked.
VACUUM
* Reclaims space from dead tuples.
* Frees up space for re-use inside the database.
* Does not return space to the operating system.
* Runs automatically in the background as autovacuum.
VACUUM my_table;
VACUUM FULL
* Performs a more aggressive cleanup.
* Physically rewrites the entire table to compact it.
* Returns unused space to the OS.
* Requires an exclusive lock (slower).
VACUUM FULL my_table;
How to Check the Dead Tuples Count?
You can monitor the number of dead tuples in your tables using the system view:
SELECT relname, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
This helps identify tables with high bloat due to dead tuples.
Using the pgstattuple Extension for Detailed Tuple Statistics
PostgreSQL offers an extension called pgstattuple that provides more detailed tuple-level statistics.
How to Enable pgstattuple?
Run:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
Using pgstattuple:
Get detailed stats for a table:
SELECT * FROM pgstattuple('my_table');
This will return several useful columns:
* table_len – The total size of the table on disk, in bytes.
This includes all live tuples, dead tuples, and free space.
* tuple_count – The number of live (active) rows currently present in the table.
These are the rows visible to current transactions.
* tuple_len – The total space occupied by live tuples, in bytes.
This shows how much of the table is actively used by live data.
* tuple_percent – The percentage of the table taken up by live tuples.
Indicates how efficiently the table is being used.
* dead_tuple_count – The number of dead (deleted or outdated) rows.
These rows are invisible to transactions and await cleanup by VACUUM.
* dead_tuple_len – The total space used by dead tuples, in bytes.
This is wasted space that hasn't been reclaimed yet.
* dead_tuple_percent – The percentage of table space occupied by dead tuples.
High values indicate table bloat and poor space utilization.
* free_space – The total amount of free space inside the table, in bytes.
This is leftover space not currently occupied by any tuple.
* free_percent – The percentage of the table occupied by free space.
Can be reused for new rows without increasing the table size.
This is very useful to understand the real bloat situation in your table.
What Information Does PostgreSQL Store in a Tuple?
Each tuple stores hidden metadata columns:
* xmin – The transaction ID that inserted the tuple into the table.
* xmax – The transaction ID that deleted the tuple, if it has been marked for deletion.
* ctid – The physical location of the tuple on disk, represented as (block, offset).
Other system flags – Indicates the tuple's current status, such as whether it's frozen, locked, or updated.
When deleting a tuple:
xmax is set to mark it as deleted.
The tuple’s visibility changes for transactions started after the deletion.
The physical tuple remains until vacuumed.
Conclusion
Understanding how tuple deletion works in PostgreSQL reveals the power and complexity behind its MVCC (Multi-Version Concurrency Control) model. While this model ensures high concurrency and data integrity, it also introduces the need for regular maintenance. Deleted tuples don’t vanish immediately—they linger on disk until VACUUM or VACUUM FULL reclaims the space.
This is why tools like pgstattuple and processes like autovacuum play such a vital role in keeping your database healthy and performant. By learning how to monitor dead tuples, manage bloat, and optimize cleanup operations, you can ensure that your PostgreSQL database remains fast, efficient, and reliable over time. Learn advanced PostgreSQL techniques with our guide on "How to Master PostgreSQL Table Inheritance for Better Data Organization." We explore how this feature allows for logical data partitioning and simplifies complex database structures.