How PostgreSQL Stores and Retrieves Data

PostgreSQL Storage Architecture Overview

PostgreSQL stores your data in fixed-size 8KB pages on disk. Each tuple (row version) lives inside a page with metadata pointers. By understanding how data is laid out on disk pages, line pointers, CTIDs, and visibility, you can optimize performance, debug corruption, and even perform low-level recovery.

Table of Contents

1. Introduction to PostgreSQL Storage Architecture

2. What Is a Page in PostgreSQL?

3. Tuple Layout Inside a Page

4. Understanding CTID: The Tuple Address

5. Inspecting Pages with pageinspect

6. Use Cases for Page-Level Knowledge

7. Performance Implications

8. Tips for Page-Aware Optimization

9. Conclusion

Introduction to PostgreSQL Storage Architecture

Every PostgreSQL table is a file on disk, divided into fixed-size 8KB blocks called pages. These pages are the atomic unit of I/O and memory management. Reads and writes happen at the page level. All row versions (tuples) live in pages.
Understanding this layout is essential for optimizing storage, performance, and recovery.

What Is a Page in PostgreSQL?

A PostgreSQL heap page contains:
[ Page Header ] [ Line Pointers ] [ Tuples ] [ Free Space ]

Break Down

  • Page Header: 24 bytes minimum, holds LSN, checksum, etc.
  • Line Pointers: An array pointing to tuple positions inside the page.
  • Tuples: The actual row data, including MVCC metadata.
  • Free Space: Reclaimed space from updates/deletes.

What is LSN?

LSN stands for Log Sequence Number. It's a byte offset in the write-ahead log (WAL) and tracks changes to ensure durability and crash recovery. Each page stores the LSN of the last WAL record that modified it. This ensures consistency between the data file and WAL.

Tuple Layout Inside a Page

A tuple is more than just your columns— it contains metadata as well:

Field Description
t_xminTransaction ID that created the row
t_xmaxTransaction ID that deleted or updated it
t_ctidAddress of the row (page, offset)
FlagsVisibility, HOT chain info
Column DataYour actual fields

This layout powers MVCC and allows PostgreSQL to keep multiple versions of a row.

Visual: PostgreSQL Page Layout
+----------------------------------------------------+
| Page Header |
+-----------------+----------------------------------+
| Line Pointer 1 | --> Tuple 1 |
| Line Pointer 2 | --> Tuple 2 |
| Line Pointer 3 | --> Tuple 3 |
+-----------------+----------------------------------+
| Tuple Data |
| [xmin, xmax, ctid, columns...] |
+----------------------------------------------------+
| Free Space Area |
+----------------------------------------------------+

Understanding CTID: The Tuple Address

ctid = (page_number, tuple_index)SELECT ctid, * FROM orders WHERE id = 1001;-- Might return: (42, 5)  

  • Why it Matters

  • ctid uniquely identifies a row version, not the logical row.

  • An UPDATE creates a new ctid; old rows still linger until vacuumed.

  • You can use CTIDs for low-level forensics and recovery.

Inspecting Pages with pageinspect

To inspect tuples directly in a page:

CREATE EXTENSION IF NOT EXISTS pageinspect;SELECT *FROM heap_page_items(get_raw_page('orders', 42));    

Output:

  • * lp: Line pointer index
  • t_xmin, t_xmax: Transaction visibility
  • t_ctid: Tuple’s CTID (same as (42, lp))
To inspect page metadata:
SELECT *FROM page_header(get_raw_page('orders', 42));    

Gives LSN, tuple count, flags, and more.

Use Cases for Page-Level Knowledge

  • Forensics: Find who updated/deleted a record (t_xmin, t_xmax)
  • Recovery: Recover deleted rows using CTID and raw page reads
  • Debugging: Check if HOT updates are happening
  • Performance: Understand bloat and tuple churn
  • Auditing: Cross-check CTIDs across replicas or old backups

Performance Implications

Situation Page-Level Insight
HOT updates fail No room in page ? use fillfactor wisely
Index bloat Old tuple versions still indexed
TOAST overhead Very wide rows stored out-of-page
Sequential scan slowdown Too many pages = bloat from dead tuples
MVCC latency Long-lived tuples invisible but still present

Efficient tuple management = lean pages = faster queries.

Tips for Page-Aware Optimization

Tip Impact
Lower fillfactor (e.g. 80%) Leaves space for HOT updates
Use pgstattuple Estimates dead/live space in a table
Tune autovacuum Ensures dead tuples are cleaned
Avoid wide rows Prevents TOAST overhead, fits more in-page
Use pg_repack Rewrites bloated tables/indexes efficiently

Conclusion

PostgreSQL’s internal page format is a hidden treasure for anyone building performant or resilient systems. When you truly understand pages, tuples, and CTIDs:

* You can optimize update-heavy tables with HOT awareness.

* You can recover lost data in forensic scenarios.

* You can tune your database like an engineer, not a guesser.

Take time to inspect your pages—you might discover what your query planner and logs aren’t telling you. 

whatsapp_icon
location

Calicut

Cybrosys Technologies Pvt. Ltd.
Neospace, Kinfra Techno Park
Kakkancherry, Calicut
Kerala, India - 673635

location

Kochi

Cybrosys Technologies Pvt. Ltd.
1st Floor, Thapasya Building,
Infopark, Kakkanad,
Kochi, India - 682030.

location

Bangalore

Cybrosys Techno Solutions
The Estate, 8th Floor,
Dickenson Road,
Bangalore, India - 560042

Send Us A Message