How to Understand PostgreSQL's Data Directory

When you initialize a PostgreSQL cluster using the initdb command, PostgreSQL creates a directory filled with folders and configuration files. These may look mysterious at first, but each of them plays a specific and important role in PostgreSQL’s internal operations.

In this blog, we’ll walk through each folder and file inside the PostgreSQL data directory and explain their purpose in a simple and easy-to-understand way — perfect for beginners who want to get closer to PostgreSQL internals.

Where Do These Folders Come From?

When you run:

initdb -D /your/path/to/data/directory

PostgreSQL sets up a new cluster and creates the required folders and files under the specified directory.

Here’s an example output of a typical PostgreSQL cluster directory:

base    global   pg_commit_ts  pg_dynshmem  pg_logical   pg_multixact  pg_notify  pg_serial  pg_snapshots pg_stat pg_stat_tmp  pg_subtrans  pg_tblspc  pg_twophase  pg_wal  pg_xact  log  postgresql.conf 

Let’s explore what each of these means.

Folder-by-Folder Breakdown

1. base/

* Purpose: Stores the actual data of user-created databases.

* Details: Inside this folder, each subdirectory is named after a database's OID (object identifier). These subdirectories contain the data files that store table and index data for each database. Each file within the subdirectories represents a relation (table or index).

* Use Case: When a query runs, PostgreSQL reads from these files to retrieve the actual table or index contents.

SELECT oid, datname FROM pg_database;

Compare this oid to the subfolder's name inside the base directory

2. global/

* Purpose: Stores data that is global across all databases.

* Details: This includes critical system catalogs that are shared across all databases, such as pg_authid (user roles), pg_database (list of databases), and other system-wide settings.

* Use Case: Modifications to user roles or cluster-wide settings reflect here.

Global data includes:

* Users and their privileges

* Roles and memberships

* Cluster-wide settings

* List of all databases

* Shared object descriptions

They are "global" because they affect every database in the PostgreSQL instance.

Some queries to show global data among all databases

# List all databases (from pg_database)
SELECT * FROM pg_database;
List all roles (from pg_roles ? view of pg_authid)
SELECT * FROM pg_roles;
# See descriptions on global objects (from pg_shdescription)

SELECT * FROM pg_shdescription;

3. pg_wal/ (Previously pg_xlog)

* Purpose: Stores Write-Ahead Log (WAL) files.

* Details: WAL is a core feature of PostgreSQL, ensuring durability. Every change made to the data files is first logged in WAL. These logs are essential for crash recovery and replication.

* Use Case: WAL files help PostgreSQL restore consistency in case of a crash and are used during streaming replication.

#To get the current WAL write location (LSN - Log Sequence Number):

SELECT pg_current_wal_lsn();

4. pg_xact/ (Previously pg_clog)

Purpose: Tracks transaction status information.

* Details: Contains status (committed, aborted, etc.) for each transaction. PostgreSQL checks these statuses during query execution and recovery.

* Use Case: Crucial for MVCC (Multi-Version Concurrency Control) to determine tuple visibility.

Some sql queries based on this 

# Current Transaction ID
SELECT txid_current();
# Snapshot Information
SELECT txid_current_snapshot();
# It returns something like:
# 458:458: ? meaning current XID and visible transactions.

5. pg_commit_ts/

* Purpose: Stores commit timestamps for transactions.

* Details: This optional feature can be enabled to track exactly when each transaction is committed.

* Use Case: Helps in auditing, logical replication, and temporal queries.

Committed/Aborted Transaction Status

# You can check whether a transaction ID is still active or known:
SELECT txid_status(12345);  -- Replace with a real txid
# It will return: 'in progress', 'committed', 'aborted', 'unknown'

6. pg_multixact/

* Purpose: Handles multi-transaction locks.

* Details: Enables more than one transaction to share a lock on a single tuple.

* Use Case: Required when multiple transactions hold a share lock, e.g., in concurrent SELECT FOR SHARE scenarios.

-- View all current row-level locks (including shared ones)
SELECT * FROM pg_locks WHERE mode LIKE '%Share%';

7. pg_subtrans/

* Purpose: Tracks subtransaction information.

* Details: PostgreSQL supports nested transactions using savepoints. This directory helps in tracking parent-child transaction relationships.

* Use Case: Important for correct rollback and commit behavior in complex transaction logic.

PostgreSQL does not expose subtransactions directly in system views, but you can observe savepoint behavior in error handling or logs.

Example to simulate nested transactions:
BEGIN;
SAVEPOINT sp1;
-- some operations
ROLLBACK TO SAVEPOINT sp1;
COMMIT;

8. pg_logical/

* Purpose: Supports logical replication infrastructure.

* Details: Stores information needed to replicate data logically (row-by-row changes instead of binary WAL).

* Use Case: Enables use of replication slots and tracking changes for logical decoding.

-- List logical replication slots
SELECT * FROM pg_replication_slots;
-- Show current WAL sender processes (used in replication)
SELECT * FROM pg_stat_replication;

9. pg_stat/

* Purpose: Holds statistical information about the databases.

* Details: Used by the query planner to optimize queries by analyzing table sizes, index usage, etc.

* Use Case: Regular ANALYZE operations update these stats to improve performance.

-- General table statistics
SELECT * FROM pg_stat_user_tables;
-- Index usage stats
SELECT * FROM pg_stat_user_indexes;
-- Database-wide stats
SELECT * FROM pg_stat_database;

10. pg_stat_tmp/

* Purpose: Temporary storage for runtime statistics.

* Details: During server operation, PostgreSQL writes temporary stats here and flushes them to pg_stat/ at shutdown.

* Use Case: Useful for runtime monitoring without permanent changes.

-- View stats collected during runtime
SELECT * FROM pg_stat_activity;

11. pg_notify/

* Purpose: Handles the LISTEN/NOTIFY mechanism.

* Details: Enables one backend process to send messages to others through the NOTIFY command.

* Use Case: Widely used in real-time apps or pub-sub patterns.

-- Listening to a channel
LISTEN my_channel;
-- From another session
NOTIFY my_channel, 'Hello from another backend!';
 You can also observe active listeners:
SELECT * FROM pg_listening_channels();

12. pg_snapshots/

* Purpose: Stores exported snapshots.

* Details: Used in long-running read-only transactions to provide a consistent view of data.

* Use Case: Common in logical decoding and parallel query setups.

-- Export a snapshot
SELECT pg_export_snapshot();
-- Output: a snapshot ID usable by other transactions

13. pg_serial/

* Purpose: Tracks dependencies for serializable transactions.

* Details: Ensures serializability by keeping track of read/write dependencies.

* Use Case: Helps maintain consistency under the SERIALIZABLE isolation level.

-- Current transaction isolation level
SHOW transaction_isolation;

14. pg_tblspc/

* Purpose: Manages tablespaces (external storage locations).

* Details: Contains symbolic links to directories on different file systems.

* Use Case: Used to store large tables on separate disks for performance or storage scaling.

-- List tablespaces
SELECT * FROM pg_tablespace;

15. pg_twophase/

* Purpose: Stores prepared transactions for two-phase commits.

* Details: Part of distributed transaction management.

* Use Case: Required for systems needing atomic commits across multiple databases.

BEGIN;
-- some operations
PREPARE TRANSACTION 'txn_id';
-- Later
COMMIT PREPARED 'txn_id';
-- or ROLLBACK PREPARED 'txn_id';
Check prepared transactions:
SELECT * FROM pg_prepared_xacts;

16. pg_replslot/

* Purpose: Stores replication slot information.

* Details: Ensures WAL files are not recycled until confirmed as received by replicas.

* Use Case: Prevents data loss in replication scenarios.

SELECT * FROM pg_replication_slots;

17. pg_dynshmem/

* Purpose: Manages dynamic shared memory segments.

* Details: Used for backend processes like parallel workers.

* Use Case: Essential for modern parallel query execution.

SELECT * FROM pg_stat_activity WHERE backend_type LIKE '%worker%';

Important Configuration and Control Files

1. postgresql.conf

* Purpose: Main server configuration file.

* Details: Controls memory usage, connection limits, logging, query planning behavior, and more.

* Use Case: Tuning database performance and behavior.

2. pg_hba.conf

* Purpose: Manages host-based authentication.

* Details: Defines which users can connect from which IPs using which authentication methods.

* Use Case: Secures database access from the network.

3. pg_ident.conf

* Purpose: Maps system users to PostgreSQL users.

* Details: Only relevant when IDENT authentication is in use.

* Use Case: Helps integrate PostgreSQL with OS-level user management.

4. postmaster.pid

* Purpose: Indicates the running state of the server.

* Details: Contains the PID of the postmaster process, startup time, port, etc.

* Use Case: Prevents multiple PostgreSQL instances from running on the same directory.

5. PG_VERSION

* Purpose: Records the version used during initialization.

* Details: Single-line file showing the PostgreSQL version (e.g., 16).

* Use Case: Ensures version compatibility during upgrades or server startup.

Conclusion

Understanding the file structure inside the PostgreSQL data directory unlocks a deeper level of database administration and performance tuning. What may appear as just folders and files are actually the building blocks of PostgreSQL's robust architecture.

Whether you're a student exploring database systems, a developer aiming to debug low-level issues, or a DBA optimizing replication and performance, this knowledge is essential. By demystifying these directories, you gain confidence in managing PostgreSQL clusters and can better appreciate the engineering that powers your data. Learn how to use advanced PostgreSQL security features to protect your database from unauthorized access and data breaches. This article covers essential techniques like robust authentication methods, fine-grained access control with roles and row-level security, data encryption, and secure connection configurations, helping you build a more resilient PostgreSQL environment.

whatsapp_icon
location

Calicut

Cybrosys Technologies Pvt. Ltd.
Neospace, Kinfra Techno Park
Kakkancherry, Calicut
Kerala, India - 673635

location

Kochi

Cybrosys Technologies Pvt. Ltd.
1st Floor, Thapasya Building,
Infopark, Kakkanad,
Kochi, India - 682030.

location

Bangalore

Cybrosys Techno Solutions
The Estate, 8th Floor,
Dickenson Road,
Bangalore, India - 560042

Send Us A Message