Jan 19, 2026

How PostgreSQL Works Internally When You Initialize a New Cluster

In PostgreSQL, a cluster is a single PostgreSQL server instance that manages multiple databases using one data directory (PGDATA).

Initializing a PostgreSQL cluster looks simple on the surface. You run one command:

initdb -D mydata

And PostgreSQL replies with a few friendly messages like:

creating directory ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok

But internally, PostgreSQL performs many carefully ordered steps involving filesystem setup, locale handling, a special bootstrap backend, and system catalog creation.

This post explains what happens internally when PostgreSQL initializes a new cluster, in a way that beginners can follow while staying accurate to the PostgreSQL source code.

What Does “Initialize a Cluster” Mean in PostgreSQL?

In PostgreSQL, a cluster is:

A data directory
A set of system catalogs
A shared WAL (write-ahead log) and control file (pg_control)
A collection of databases (template0, template1, postgres, and user DBs)

Running initdb does not leave a server running.

It creates the physical and logical foundation required for PostgreSQL to run.

How initdb Works Internally

Internally, initdb works in two worlds:

Frontend program (initdb.c)
Backend bootstrap server (postgres --boot)

Think of it like this:

initdb (frontend)
   |
   |-- prepares filesystem, config, scripts
   |
   |-- starts postgres in bootstrap mode
   |        |
   |        |-- creates system catalogs
   |
   |-- runs post-bootstrap SQL
   |
   |-- creates template databases
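The flow above can be read as an ordered list of steps, each reported as "label ... ok". The following is a minimal sketch of that pattern, not initdb's actual code; the function name run_stages and the no-op actions are hypothetical.

```python
import sys

def run_stages(stages, out=sys.stdout):
    """Run (label, action) pairs in order, printing 'label ... ok' for each."""
    for label, action in stages:
        out.write(f"{label} ... ")
        out.flush()
        action()          # an exception here aborts all remaining stages
        out.write("ok\n")

# Hypothetical stand-ins for the real work done at each step.
EXAMPLE_STAGES = [
    ("running bootstrap script", lambda: None),
    ("performing post-bootstrap initialization", lambda: None),
]
```

Calling run_stages(EXAMPLE_STAGES) prints one line per stage. Because a failing action raises before "ok" is written, the last printed label shows where initialization stopped.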

The 8 Internal Stages of Initializing a New Cluster in PostgreSQL

Even though initdb.c is over 3000 lines long, the entire process can be understood in 8 clear stages.

1. Locale and Environment Setup

Before PostgreSQL writes anything to disk, it must ensure that:

Locale names are valid
Encoding matches locale
ICU or libc collation is consistent

Relevant source code functions (from initdb.c):

save_global_locale()
restore_global_locale()
check_locale_name()
check_locale_encoding()
check_icu_locale_encoding()
setlocales()
set_info_version()

What happens here?

Reads environment variables like LC_ALL, LANG
Validates encoding compatibility (UTF-8, etc.)
Prepares collation rules
Stores version information for Information Schema

At this stage:

No directories exist
No files are created
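To make the "encoding matches locale" check concrete, here is a rough Python sketch of the idea. It is an illustration only: the helper names are hypothetical, and the real checks in initdb.c rely on the C library and PostgreSQL's own encoding tables rather than on Python's codec registry.

```python
import codecs

def locale_codeset(locale_name):
    """Return the codeset part of a POSIX locale name, or None if absent."""
    # "de_DE.ISO-8859-1@euro" -> "ISO-8859-1"; "C" -> None
    base = locale_name.split("@", 1)[0]
    if "." not in base:
        return None
    return base.split(".", 1)[1]

def encoding_matches_locale(encoding, locale_name):
    """True if the locale names no codeset, or names the same encoding."""
    codeset = locale_codeset(locale_name)
    if codeset is None:
        return True  # assumption: locales like "C" accept any encoding
    try:
        # Normalize aliases such as "UTF8" and "UTF-8" to one canonical name.
        return codecs.lookup(codeset).name == codecs.lookup(encoding).name
    except LookupError:
        return False  # unknown encoding or codeset name
```

With this sketch, a UTF8 database encoding is accepted for en_US.UTF-8 but rejected for a locale whose codeset is ISO-8859-1.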

2. Creating the Data Directory

Now PostgreSQL creates the cluster directory structure.

Key functions:

create_data_directory()
initialize_data_directory()
setup_data_file_paths()

What is created?

$PGDATA/
 +-- base/
 +-- global/
 +-- pg_xact/
 +-- pg_multixact/
 +-- pg_commit_ts/
 +-- pg_subtrans/
 +-- pg_tblspc/
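As an illustration, the directory skeleton listed above could be created like this. This is a simplified sketch: the function name create_skeleton is hypothetical, only the subdirectories named in this post are included, and the real initdb creates more.

```python
import os

# Only the subdirectories listed in this post; the real list is longer.
SUBDIRS = ["base", "global", "pg_xact", "pg_multixact",
           "pg_commit_ts", "pg_subtrans", "pg_tblspc"]

def create_skeleton(pgdata):
    """Create pgdata (owner-only permissions) and its subdirectories."""
    os.makedirs(pgdata, mode=0o700, exist_ok=True)
    if os.listdir(pgdata):
        # Refuse to touch a directory that already has content.
        raise FileExistsError(f"{pgdata} exists but is not empty")
    for name in SUBDIRS:
        os.mkdir(os.path.join(pgdata, name), 0o700)
    return sorted(os.listdir(pgdata))
```

Refusing a non-empty directory mirrors the behaviour you see when you point initdb at a directory that already contains files.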

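And the most critical file:

global/pg_control

pg_control stores:

System identifier
WAL state
Next transaction ID
Checkpoint information

The global/ directory is created in this stage; pg_control itself is written during the bootstrap stage (stage 5), when the backend sets up the initial WAL state.

PostgreSQL cannot start without this file.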
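To show what "stores the system identifier" means at the byte level, here is a small sketch that decodes the first few fields of a pg_control-style buffer. The layout used (a 64-bit system identifier followed by two 32-bit version numbers, in native byte order) is an assumption based on the ControlFileData struct and can differ between versions; for real clusters, use the pg_controldata tool instead.

```python
import struct

# Assumed leading fields of ControlFileData (native byte order, no padding):
#   uint64 system_identifier, uint32 pg_control_version, uint32 catalog_version_no
HEADER = struct.Struct("=QII")

def parse_control_header(raw):
    """Decode the assumed first 16 bytes of a pg_control image."""
    if len(raw) < HEADER.size:
        raise ValueError("buffer too short for control file header")
    sysid, control_version, catalog_version = HEADER.unpack_from(raw, 0)
    return {
        "system_identifier": sysid,
        "pg_control_version": control_version,
        "catalog_version_no": catalog_version,
    }
```

The function only reads a buffer you pass in, so it can be tried on synthetic bytes built with HEADER.pack(...) without touching a real data directory.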

3. WAL Directory Initialization

PostgreSQL now prepares the Write-Ahead Log (WAL).

Source code function:

create_xlog_or_symlink()

Result:

pg_wal/

(or a symbolic link, if configured)

This stage makes sure the WAL location exists before the bootstrap backend starts writing catalogs.
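The "directory or symbolic link" choice can be sketched as follows. This is an illustration under assumptions: the function name create_wal_dir is hypothetical, and the real code performs additional checks (for example on the state of an existing WAL directory).

```python
import os

def create_wal_dir(pgdata, waldir=None):
    """Create pgdata/pg_wal, either as a directory or as a symlink to waldir."""
    link_path = os.path.join(pgdata, "pg_wal")
    if waldir is None:
        # Default case: WAL lives inside the data directory.
        os.mkdir(link_path, 0o700)
    else:
        # Separate WAL location: create it, then point pg_wal at it.
        os.makedirs(waldir, mode=0o700, exist_ok=True)
        os.symlink(os.path.abspath(waldir), link_path)
    return link_path
```

Placing WAL on a separate location (the --waldir option of initdb) is typically done to put it on different storage than the table data.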

4. Configuration File Creation

Now PostgreSQL creates minimal configuration files so the backend can start.

Source code functions:

choose_dsm_implementation()
setup_config()

Files created:

postgresql.conf
pg_hba.conf
pg_ident.conf

This matches the output:

creating configuration files ... ok

At this point:

PostgreSQL still has no system catalogs
But it is ready to start a backend process
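Conceptually, the configuration files are produced by taking sample files and substituting the values chosen for this cluster. The sketch below shows that idea for a single setting; the function name set_guc is hypothetical, and the real replacement logic in initdb.c is more careful (for example about preserving trailing comments).

```python
import re

def set_guc(conf_text, name, value):
    """Replace the first '#name = ...' or 'name = ...' line with 'name = value'."""
    pattern = re.compile(rf"^#?[ \t]*{re.escape(name)}[ \t]*=.*$", re.MULTILINE)
    # A callable replacement avoids backslash interpretation in the value.
    new_text, count = pattern.subn(lambda m: f"{name} = {value}", conf_text, count=1)
    if count == 0:
        raise KeyError(f"setting {name!r} not found in sample text")
    return new_text
```

For example, applying set_guc to a sample containing the line "#shared_buffers = 128MB" with value "64MB" yields an uncommented "shared_buffers = 64MB" line and leaves the other lines untouched.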

5. Bootstrap Backend – System Catalog Creation

This is the most important stage.

PostgreSQL has a problem:

System catalogs are tables, but tables cannot exist until catalogs exist.

The solution is bootstrap mode.

Frontend function (in initdb.c):

bootstrap_template1()

What this function does:

Prepares a BKI script (postgres.bki)
Starts the backend in bootstrap mode (postgres --boot)
Sends the catalog definitions to the backend through its standard input

Backend execution (server source code)

File:

src/backend/bootstrap/bootstrap.c

Main function:

BootstrapModeMain()

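This backend:

Runs as a single process
Does not write WAL records for ordinary catalog changes
Still sets up shared memory and buffers, but only for itself
Reads BKI commands instead of SQL, so the normal SQL parser and executor are not involved
Has no background workers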
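The frontend/backend split in this stage boils down to "start a child process and write a script to its standard input". Here is a minimal sketch of that mechanism. It does not launch postgres: the child is a stand-in Python one-liner, and the names pipe_script and COUNT_LINES_CHILD are hypothetical.

```python
import subprocess
import sys

# Stand-in child process: counts the lines it receives on stdin.
COUNT_LINES_CHILD = [sys.executable, "-c",
                     "import sys; print(len(sys.stdin.readlines()))"]

def pipe_script(cmd, lines):
    """Run cmd, feed it the given lines on stdin, and return its stdout."""
    script = "".join(line + "\n" for line in lines)
    proc = subprocess.run(cmd, input=script, text=True, capture_output=True)
    if proc.returncode != 0:
        # A non-zero exit status means the whole step failed.
        raise RuntimeError(f"child exited with status {proc.returncode}")
    return proc.stdout
```

In the real flow the command would be the postgres binary in bootstrap mode and the lines would come from postgres.bki; the error handling has the same shape, since a failing backend aborts initialization.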

What catalogs are created here?

Examples:

pg_class
pg_attribute
pg_type
pg_proc
pg_namespace
pg_database
pg_authid

These definitions come from:

src/include/catalog/*.h

Example:

CATALOG(pg_class, 1259, RelationRelationId)

Physical storage is created using:

heap_create_with_catalog()

At the end of this stage:

PostgreSQL finally has working system catalogs
A real database (template1) now exists
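The chicken-and-egg problem is easier to see in a toy model. In the sketch below, "pg_class" is just a Python dict, and the normal table-creation path refuses to work until pg_class contains a row describing itself. The OIDs for the four core catalogs are the fixed values from the catalog headers; everything else (function names, row shape) is invented for illustration.

```python
# Fixed OIDs of the core catalogs, as declared in the catalog headers.
CORE_CATALOGS = {"pg_type": 1247, "pg_attribute": 1249,
                 "pg_proc": 1255, "pg_class": 1259}

def bootstrap():
    """Bootstrap path: write catalog rows directly, with hard-coded OIDs."""
    pg_class = {}
    for name, oid in CORE_CATALOGS.items():
        pg_class[oid] = {"relname": name}
    return pg_class

def create_table(pg_class, name, oid):
    """Normal path: only works once pg_class already describes itself."""
    if CORE_CATALOGS["pg_class"] not in pg_class:
        raise RuntimeError("pg_class does not exist yet; bootstrap first")
    pg_class[oid] = {"relname": name}
    return oid
```

This also hints at why catalog OIDs are fixed: the bootstrap path cannot look anything up, so every identifier it needs has to be known in advance.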

6. Post-Bootstrap Catalog Population

Now PostgreSQL can execute normal SQL.

initdb starts a standalone backend (postgres --single) connected to template1, generates SQL, and sends it to that backend.

Source code functions:

setup_depend()
setup_description()
setup_collation()
setup_privileges()
setup_schema()
setup_run_file()

What is loaded?

pg_depend
pg_description
pg_collation
Initial privileges
Information Schema
System views
System functions

From files like:

information_schema.sql
system_views.sql
system_functions.sql

7. Creating Default Databases

PostgreSQL now creates the standard databases.

Source code functions:

make_template0()
make_postgres()

Actual order:

template1 – created during bootstrap
template0 – copied from template1 and frozen
postgres – copied from template1

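During initdb, these databases are created by copying template1's files, not by SQL row-by-row inserts.

This is why:

template0 is an exact copy of the freshly built template1 and is then kept unmodified (connections to it are disallowed)
New databases are created quickly, since no catalog rows have to be rebuilt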
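The file-copy idea can be sketched with a plain directory copy. This is a simplification: the function name is hypothetical, the per-database directory under base/ is assumed to be named after the database OID, and a real CREATE DATABASE also records the new database in pg_database and takes care of durability.

```python
import os
import shutil

def copy_database_dir(pgdata, src_oid, dst_oid):
    """Copy base/<src_oid> to base/<dst_oid>, as a template copy would."""
    src = os.path.join(pgdata, "base", str(src_oid))
    dst = os.path.join(pgdata, "base", str(dst_oid))
    # copytree raises FileExistsError if the destination already exists.
    shutil.copytree(src, dst)
    return dst
```

The cost of this operation depends on the size of the template's files, which is why copying a small template is fast.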

8. Final Sync and Completion

The final step ensures durability.

Source code function:

fsync_pgdata() (sync_pgdata() in newer releases)

This:

Flushes all data to disk
Makes the cluster crash-safe

Output:

syncing data to disk ... ok

At this point, the PostgreSQL cluster is ready.
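What "flushes all data to disk" involves can be sketched as a walk over the data directory that calls fsync on every file and every directory. This is a POSIX-only illustration with a hypothetical function name; the real implementation also handles symbolic links, such as pg_wal and tablespaces, and other details.

```python
import os

def fsync_tree(root):
    """fsync every file and directory under root; return how many were synced."""
    synced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            fd = os.open(os.path.join(dirpath, name), os.O_RDONLY)
            try:
                os.fsync(fd)      # flush file contents
            finally:
                os.close(fd)
            synced += 1
        # Directories are synced too, so that the file names themselves persist.
        fd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        synced += 1
    return synced
```

Syncing the directories as well as the files matters because a file whose directory entry is lost after a crash is as good as missing.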

Final Execution Flow (Simplified)

initdb
 +-- locale checks
 +-- data directory creation
 +-- WAL setup
 +-- config file creation
 +-- bootstrap backend
 |    +-- system catalogs created
 +-- post-bootstrap SQL
 +-- template database creation
 +-- disk sync

Why Understanding initdb Matters

Understanding how initdb works helps you:

Modify PostgreSQL system catalogs safely
Debug cluster initialization failures
Understand why catalog OIDs are fixed
Work confidently with PostgreSQL source code
Build strong fundamentals in database internals

Learning how PostgreSQL initializes a cluster is a solid starting point for developers who want to explore the PostgreSQL source code. Everything PostgreSQL stores lives in the data directory, so understanding how the system catalogs and the WAL directory are created is also valuable for database administrators.

A clear idea of the cluster initialization process helps in understanding how PostgreSQL manages metadata, storage, and recovery from the very beginning. This knowledge also makes it easier to debug low-level issues and confidently work with PostgreSQL internals.
