Aug 12, 2025

How PostgreSQL Stores and Retrieves Data

PostgreSQL Storage Architecture Overview

PostgreSQL stores your data in fixed-size 8KB pages on disk. Each tuple (row version) lives inside a page with metadata pointers. By understanding how data is laid out on disk pages, line pointers, CTIDs, and visibility, you can optimize performance, debug corruption, and even perform low-level recovery.

Table of Contents

1. Introduction to PostgreSQL Storage Architecture

2. What Is a Page in PostgreSQL?

3. Tuple Layout Inside a Page

4. Understanding CTID: The Tuple Address

5. Inspecting Pages with pageinspect

6. Use Cases for Page-Level Knowledge

7. Performance Implications

8. Tips for Page-Aware Optimization

9. Conclusion

Introduction to PostgreSQL Storage Architecture

Every PostgreSQL table is a file on disk, divided into fixed-size 8KB blocks called pages. These pages are the atomic unit of I/O and memory management. Reads and writes happen at the page level. All row versions (tuples) live in pages.
Understanding this layout is essential for optimizing storage, performance, and recovery.

What Is a Page in PostgreSQL?

A PostgreSQL heap page contains:
[ Page Header ] [ Line Pointers ] [ Tuples ] [ Free Space ]

Break Down

* Page Header: 24 bytes minimum, holds LSN, checksum, etc.
* Line Pointers: An array pointing to tuple positions inside the page.
* Tuples: The actual row data, including MVCC metadata.
* Free Space: Reclaimed space from updates/deletes.

What is LSN?

LSN stands for Log Sequence Number. It's a byte offset in the write-ahead log (WAL) and tracks changes to ensure durability and crash recovery. Each page stores the LSN of the last WAL record that modified it. This ensures consistency between the data file and WAL.

Tuple Layout Inside a Page

A tuple is more than just your columns— it contains metadata as well:

Field	Description
t_xmin	Transaction ID that created the row
t_xmax	Transaction ID that deleted or updated it
t_ctid	Address of the row (page, offset)
Flags	Visibility, HOT chain info
Column Data	Your actual fields

This layout powers MVCC and allows PostgreSQL to keep multiple versions of a row.

Understanding CTID: The Tuple Address

ctid = (page_number, tuple_index)SELECT ctid, * FROM orders WHERE id = 1001;-- Might return: (42, 5)

Why it Matters

* ctid uniquely identifies a row version, not the logical row.

* An UPDATE creates a new ctid; old rows still linger until vacuumed.

* You can use CTIDs for low-level forensics and recovery.

Inspecting Pages with pageinspect

To inspect tuples directly in a page:

CREATE EXTENSION IF NOT EXISTS pageinspect;SELECT *FROM heap_page_items(get_raw_page('orders', 42));

Output:

* lp: Line pointer index
* t_xmin, t_xmax: Transaction visibility
* t_ctid: Tuple’s CTID (same as (42, lp))

To inspect page metadata:

SELECT *FROM page_header(get_raw_page('orders', 42));

Gives LSN, tuple count, flags, and more.

Use Cases for Page-Level Knowledge

Forensics: Find who updated/deleted a record (t_xmin, t_xmax)
Recovery: Recover deleted rows using CTID and raw page reads
Debugging: Check if HOT updates are happening
Performance: Understand bloat and tuple churn
Auditing: Cross-check CTIDs across replicas or old backups

Performance Implications

Situation	Page-Level Insight
HOT updates fail	No room in page ? use fillfactor wisely
Index bloat	Old tuple versions still indexed
TOAST overhead	Very wide rows stored out-of-page
Sequential scan slowdown	Too many pages = bloat from dead tuples
MVCC latency	Long-lived tuples invisible but still present

Efficient tuple management = lean pages = faster queries.

Tips for Page-Aware Optimization

Tip	Impact
Lower fillfactor (e.g. 80%)	Leaves space for HOT updates
Use pgstattuple	Estimates dead/live space in a table
Tune autovacuum	Ensures dead tuples are cleaned
Avoid wide rows	Prevents TOAST overhead, fits more in-page
Use pg_repack	Rewrites bloated tables/indexes efficiently

Conclusion

PostgreSQL’s internal page format is a hidden treasure for anyone building performant or resilient systems. When you truly understand pages, tuples, and CTIDs:

* You can optimize update-heavy tables with HOT awareness.

* You can recover lost data in forensic scenarios.

* You can tune your database like an engineer, not a guesser.

Take time to inspect your pages—you might discover what your query planner and logs aren’t telling you.