When you install PostgreSQL and connect to it, you may notice some default databases like postgres, template1, and template0. But have you ever wondered what actually happens internally when you create a new database, and how PostgreSQL stores metadata about these databases?
In this blog, we will explore:
* What happens during PostgreSQL installation
* The role of template databases
* The internal mechanics of CREATE DATABASE
* The detailed structure and purpose of the system catalog tables pg_database and pg_stat_database
* Useful SQL queries for inspecting database metadata
What Happens When PostgreSQL is Installed?
When PostgreSQL is initialized using the initdb process, it creates a database cluster, which is essentially a collection of databases managed by a single PostgreSQL server instance. This initialization process:
* Creates the default directories and configuration files required for operation.
* Initializes system catalogs, which store essential metadata.
* Creates three default databases automatically:
1. postgres: a general-purpose default database for users and applications to connect to.
2. template1: The default template database used as a source when creating new databases.
3. template0: a clean, unmodified template database reserved for special use cases.
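You can see these three databases immediately after initialization. The following query lists them along with the catalog flags that distinguish templates from ordinary databases:

```sql
-- List the databases created by initdb, with their template
-- and connection flags (columns come from pg_database).
SELECT datname, datistemplate, datallowconn
FROM pg_database
ORDER BY datname;
```

On a fresh cluster, template0 and template1 show datistemplate as true, and template0 additionally has datallowconn set to false.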
Template Databases: template1 and template0 — A Closer Look
Template databases in PostgreSQL play a fundamental role in how new databases are created and customized. When you create a new database, PostgreSQL essentially clones an existing template database, copying its data files to set up the new database quickly and efficiently. Here's a more detailed breakdown:
template1 — The Default and Customizable Template
* Default Template for New Databases:
By default, whenever you run CREATE DATABASE, PostgreSQL uses template1 as the source template. This means the new database starts as an exact physical copy of template1.
* Customizable:
You can modify template1 to include any objects, configurations, or extensions you want all new databases to inherit. For example, if you want every new database to have certain extensions installed (like uuid-ossp or pgcrypto), or if you want standard tables, functions, or data loaded by default, you would install or create them in template1.
* Flexible:
This makes it very convenient to standardize database environments without repeating setup tasks each time.
* Potential Caution:
Since template1 is a live database, changes to it affect all future databases created using it. If template1 becomes corrupt or misconfigured, it can impact the creation of new databases.
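As a sketch of this customization workflow, assuming the pgcrypto extension is available on your server, you could preinstall it in template1 so that every future database inherits it:

```sql
-- Connect to template1 first (e.g. \c template1 in psql), then:
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Any database created afterwards contains pgcrypto automatically,
-- because it starts as a copy of template1:
CREATE DATABASE reporting;
```

Keep such changes minimal and deliberate, for exactly the reason noted above: every object you add to template1 will appear in all future databases.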
template0 — The Pristine, Untouched Template
* Immutable Backup:
template0 is a pristine copy of the cluster's initial contents and is meant to remain untouched. Connections to it are disallowed (its datallowconn flag is false), which is how PostgreSQL keeps it unmodified after initialization.
* Use Cases:
It is typically used when you want to create a new database with specific locale or encoding settings that differ from template1, or without inherited customizations. For instance, if template1 has certain extensions installed or locale settings applied, but you want a new database without those, you can explicitly specify template0 as the template during creation. In fact, PostgreSQL requires template0 when the new database's encoding or locale differs from the template's.
Example Command:
You can create a new database using template0 like this:
CREATE DATABASE mydb TEMPLATE template0 ENCODING 'UTF8' LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8';
Internal Steps in CREATE DATABASE
When you execute the command:
CREATE DATABASE mydb;
PostgreSQL does the following internally:
* It verifies that no other sessions are connected to the template database (PostgreSQL refuses to copy a template with active connections) and takes the locks needed to avoid conflicts.
* It copies the physical files from the template1 database directory (or any specified template) to create the new database’s files.
* It updates system catalog tables to register the new database within the cluster.
* It forces the changes to disk to ensure durability and consistency.
This process is efficient and reliable because, by default, it copies the template's data at the file-system level rather than executing a series of SQL commands. (PostgreSQL 15 and later default to a WAL_LOG strategy instead, which copies the template block by block through the write-ahead log; the older file-copy behavior remains available via STRATEGY FILE_COPY.)
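One way to observe the file-level copy is to look up the new database's OID, which names its on-disk directory. The paths shown here are typical defaults and may differ on your system:

```sql
-- Find the OID of the new database; its files live in
-- <data_directory>/base/<oid> on disk.
SELECT oid, datname
FROM pg_database
WHERE datname = 'mydb';

-- Locate the cluster's data directory:
SHOW data_directory;
```

Listing the base/<oid> directory on the server's file system shows the files that were copied from the template.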
System Catalogs: pg_database and pg_stat_database
PostgreSQL stores metadata and statistics about databases in system catalogs, which are special tables stored under the pg_catalog schema.
pg_database
This catalog table holds essential metadata about each database in the cluster. Below are its columns and their purposes, explained line by line:
* oid: This is the internal object identifier, a unique ID assigned to each database.
* datname: The name of the database.
* datdba: The owner of the database, represented by the Object Identifier (OID) of the role that owns it.
* encoding: The character encoding used by the database, stored as an integer code corresponding to a specific encoding.
* datlocprovider: The locale provider used by the database, stored as a single character: c for libc or i for ICU (newer versions also support a builtin provider, b). The provider controls collation and character classification.
* datistemplate: A boolean value indicating whether the database can be used as a template for creating new databases.
* datallowconn: A boolean that indicates if connections to the database are currently allowed.
* dathasloginevt: Indicates whether any login event triggers are defined for this database (triggers fired when a user logs in).
* datconnlimit: The maximum number of concurrent connections allowed to this database. A value of -1 means no limit.
* datfrozenxid: The transaction ID (XID) before which all transaction IDs in this database are guaranteed to have been frozen. PostgreSQL tracks it to know when the database must be vacuumed to prevent transaction ID wraparound.
* datminmxid: The minimum multixact ID still required by this database.
* dattablespace: The Object Identifier of the default tablespace where the database objects are stored.
* datcollate: The LC_COLLATE setting that controls the string sort order for the database.
* datctype: The LC_CTYPE setting controls character classification and case conversion.
* datlocale: The locale name for this database when the ICU (or, in newer versions, builtin) locale provider is used; NULL when the provider is libc.
* daticurules: ICU collation customization rules applied to this database, if any; NULL otherwise.
* datcollversion: The collation version, especially relevant for ICU collations.
* datacl: The access control list (ACL), which stores permissions for the database.
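Several of these columns are stored in internal form. The query below, a small sketch using standard catalog helper functions, decodes the encoding code into a readable name and shows how far each database is from transaction ID wraparound:

```sql
-- Decode encoding codes and check wraparound age per database.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding_name,
       pg_get_userbyid(datdba)       AS owner,
       age(datfrozenxid)             AS xid_age  -- transactions since datfrozenxid
FROM pg_database
ORDER BY xid_age DESC;
```

A steadily growing xid_age that approaches the wraparound threshold is a sign that autovacuum is not keeping up for that database.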
pg_stat_database
This is a system view that provides runtime statistics for each database, helping database administrators monitor usage and performance. Below are its columns and their purposes, line by line:
* datid: The Object Identifier (OID) of the database.
* datname: The name of the database.
* numbackends: The number of active sessions currently connected to the database.
* xact_commit: The total number of transactions committed in the database.
* xact_rollback: The total number of transactions rolled back.
* blks_read: The number of disk blocks read from the file system (i.e., physical reads).
* blks_hit: The number of disk blocks found in the shared buffer cache (i.e., cache hits).
* tup_returned: The number of live rows fetched by sequential scans plus index entries returned by index scans in this database.
* tup_fetched: The number of live rows fetched by index scans.
* tup_inserted: The total number of rows inserted into tables.
* tup_updated: The total number of rows updated.
* tup_deleted: The total number of rows deleted.
* conflicts: The number of queries canceled due to conflicts with recovery. This counter only increments on standby servers.
* temp_files: The number of temporary files created by queries in this database.
* temp_bytes: The total size in bytes of temporary files created.
* deadlocks: The number of deadlocks detected in this database.
* checksum_failures: The number of data checksum failures detected (only meaningful when data checksums are enabled for the cluster).
* checksum_last_failure: The timestamp of the last checksum failure.
* blk_read_time: The total time spent reading blocks from disk, in milliseconds. This is nonzero only when the track_io_timing setting is enabled.
* blk_write_time: The total time spent writing blocks to disk, in milliseconds (also requires track_io_timing).
* session_time: The total elapsed time of all sessions in this database, in milliseconds.
* active_time: The total time sessions have been actively executing queries, in milliseconds.
* idle_in_transaction_time: The total time sessions have been idle inside transactions, in milliseconds.
* sessions: The total number of sessions started in this database.
* sessions_abandoned: The number of sessions abandoned by clients without properly disconnecting.
* sessions_fatal: The number of sessions terminated due to fatal errors.
* sessions_killed: The number of sessions terminated by an administrator.
* stats_reset: The timestamp when statistics for this database were last reset.
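These counters are most useful in combination. For example, blks_hit and blks_read together give the buffer cache hit ratio, a common health metric (what ratio you consider acceptable is a judgment call for your workload):

```sql
-- Buffer cache hit ratio per database; values close to 1.0 mean
-- most reads are served from shared_buffers rather than disk.
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / NULLIF(blks_hit + blks_read, 0), 4)
         AS cache_hit_ratio
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY cache_hit_ratio;
```

NULLIF guards against division by zero for databases that have not been read yet, such as freshly created ones.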
Useful Queries for Database Metadata and Stats
Here are some handy SQL queries to inspect database metadata and runtime statistics:
-- List all databases with their owners, encodings, and connection limits
SELECT
datname,
pg_catalog.pg_get_userbyid(datdba) AS owner,
encoding,
datcollate,
datctype,
datconnlimit,
datallowconn
FROM pg_database
ORDER BY datname;
-- Show runtime statistics for each database
SELECT
datname,
numbackends,
xact_commit,
xact_rollback,
blks_read,
blks_hit,
tup_returned,
tup_fetched,
tup_inserted,
tup_updated,
tup_deleted,
deadlocks
FROM pg_stat_database
ORDER BY datname;
-- Check active sessions and currently running queries
SELECT pid, usename, datname, state, query, backend_start
FROM pg_stat_activity
WHERE datname IS NOT NULL;
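Because the counters in pg_stat_database accumulate since the last reset (recorded in stats_reset), it can be useful to zero them before a measurement window. Note that pg_stat_reset() affects only the database you are currently connected to:

```sql
-- Reset cumulative statistics for the current database only.
SELECT pg_stat_reset();

-- Confirm: stats_reset now shows the time of the reset.
SELECT datname, stats_reset
FROM pg_stat_database
WHERE datname = current_database();
```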
Conclusion
PostgreSQL’s database creation and management process is both elegant and efficient. Using template databases, such as template1 and template0, ensures the rapid, consistent, and customizable creation of new databases. Understanding the differences and purposes of these templates helps you better control how your databases are initialized and configured.
Metadata about these databases is carefully tracked in system catalogs like pg_database, while real-time statistics are accessible through views like pg_stat_database.
By learning to query these catalogs and leveraging PostgreSQL's powerful extensions, you can effectively monitor, optimize, and maintain your PostgreSQL environment with confidence. For a deeper look at inspecting database structure, the guide "How to Use PostgreSQL's Information Schema to Explore Your Database Structure in Depth" shows how to query system views to understand your tables, columns, constraints, and more.