Managing high availability and disaster recovery in PostgreSQL requires understanding the built-in functions that monitor and control the recovery process. Whether you're managing streaming replication or performing point-in-time recovery (PITR), these functions provide crucial insights into your database's replication health and recovery status.
This guide covers the essential recovery monitoring and control functions available in PostgreSQL 10 and later versions.
Recovery Status and Monitoring Functions
Checking Recovery Mode Status
pg_is_in_recovery()
This fundamental function determines whether your PostgreSQL instance is currently operating in recovery mode.
SELECT pg_is_in_recovery();
Returns:
- true - Instance is a standby server in recovery mode
- false - Instance is a primary server or has completed recovery
Practical Use: Essential for automated scripts and monitoring systems to identify server roles in a replication setup.
Monitoring WAL Reception
pg_last_wal_receive_lsn()
Tracks the most recent WAL (Write-Ahead Log) location that has been received and written to disk on a standby server.
SELECT pg_last_wal_receive_lsn();
Returns: LSN (Log Sequence Number) of the last received WAL record, or NULL if no WAL has been received.
Use Case: Monitor network-level replication lag by comparing this value with the primary server's current WAL position.
Tracking WAL Replay Progress
pg_last_wal_replay_lsn()
Shows the LSN of the last WAL record that has been successfully applied during recovery.
SELECT pg_last_wal_replay_lsn();
Returns: LSN of the last replayed WAL record.
Critical for: Determining how current your standby data is and assessing readiness for failover scenarios.
Measuring Replication Delay
pg_last_xact_replay_timestamp()
Provides the timestamp of the most recently replayed transaction, offering a time-based view of replication lag.
SELECT pg_last_xact_replay_timestamp();
Returns: Timestamp of the last replayed transaction, or NULL if no transactions have been replayed.
Why it matters: Unlike LSN-based measurements, this gives you replication delay in human-readable time units.
Recovery Control Functions
Checking Replay Status
pg_is_wal_replay_paused()
Determines whether WAL replay is currently paused on the standby server.
SELECT pg_is_wal_replay_paused();
Returns: true if replay is paused, false if replay is active.
Controlling WAL Replay
Pausing Replay:
SELECT pg_wal_replay_pause();
Resuming Replay:
SELECT pg_wal_replay_resume();
When to pause replay:
- Investigating data consistency issues
- Creating consistent snapshots for testing
- Troubleshooting query conflicts on hot standby
- Performing maintenance that requires a stable data state
Practical Examples
Monitoring Replication Health
-- Comprehensive replication status check
SELECT
pg_is_in_recovery() AS is_standby,
pg_last_wal_receive_lsn() AS last_received,
pg_last_wal_replay_lsn() AS last_replayed,
pg_last_xact_replay_timestamp() AS last_replay_time,
now() - pg_last_xact_replay_timestamp() AS replay_delay;
Calculating Replication Lag
-- Time-based lag measurement
SELECT
CASE
WHEN pg_last_xact_replay_timestamp() IS NULL THEN 'No transactions replayed'
ELSE (now() - pg_last_xact_replay_timestamp())::text
END AS replication_lag;
Conclusion
These PostgreSQL recovery functions provide database administrators with powerful tools for monitoring and controlling replication environments. By understanding when and how to use each function, you can maintain robust high-availability systems, minimize downtime during maintenance, and ensure data consistency across your PostgreSQL infrastructure.
Regular use of these monitoring functions, combined with proper alerting and automation, forms the foundation of a reliable PostgreSQL replication strategy.