PostgreSQL relies on the underlying TCP network layer to manage communication between the database server and connected clients. In long-running applications, connections may remain open for extended periods, and network disruptions can sometimes leave these connections in an unusable state without either side immediately realizing it.
To address this situation, PostgreSQL provides several configuration parameters that control how TCP connections are monitored and when they should be terminated if they become unresponsive. The most important of these parameters are tcp_keepalives_idle, tcp_keepalives_interval, tcp_keepalives_count, and tcp_user_timeout.
These settings allow administrators to fine-tune how quickly PostgreSQL detects broken or stalled connections, helping maintain stable communication between applications and the database server.
Why TCP Connection Monitoring Matters
In many real-world deployments, connections between applications and PostgreSQL can pass through multiple layers such as load balancers, connection poolers, firewalls, or VPN tunnels. If any of these components fail or drop packets silently, a connection may appear active while it is actually unusable.
Without proper monitoring, the server might continue to hold resources for a connection that will never send another request. Over time, this can lead to wasted backend processes, stalled transactions, and connection pool exhaustion.
TCP keepalive settings help PostgreSQL identify these dead connections and clean them up.
1. tcp_keepalives_idle
tcp_keepalives_idle defines how long a connection must remain inactive before PostgreSQL asks the operating system to send a TCP keepalive probe.
A keepalive probe is a small packet sent to verify that the remote client is still reachable. If the client responds, the connection is considered healthy. If it does not, the operating system retries according to the parameters described below.
For example:
tcp_keepalives_idle = 60
This configuration means that if a connection remains idle for 60 seconds, the first keepalive probe will be sent.
Lower values make PostgreSQL detect broken connections more quickly, while higher values reduce the number of network probes. A value of 0 (the default) uses the operating system's own default.
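As a sketch of how this setting can be applied (the parameter is user-settable per session; the ALTER SYSTEM form requires superuser privileges):

```sql
-- Set for the current session only:
SET tcp_keepalives_idle = 60;

-- Or persist the change cluster-wide, then reload the configuration:
ALTER SYSTEM SET tcp_keepalives_idle = 60;
SELECT pg_reload_conf();

-- Inspect the effective value:
SHOW tcp_keepalives_idle;
```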
2. tcp_keepalives_interval
tcp_keepalives_interval determines how long PostgreSQL waits between successive keepalive probes when the remote system does not respond.
Example configuration:
tcp_keepalives_interval = 10
If the first probe receives no response, PostgreSQL waits 10 seconds before sending the next probe.
This interval continues until either the client responds or the maximum number of attempts is reached.
3. tcp_keepalives_count
tcp_keepalives_count controls how many failed keepalive probes are allowed before PostgreSQL concludes that the connection is dead.
Example:
tcp_keepalives_count = 5
In this case, PostgreSQL will send up to five keepalive probes. If none receive a response, the connection is closed.
How These Three Parameters Work Together
These parameters combine to determine how long PostgreSQL waits before dropping an unresponsive connection.
Example configuration:
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 5
Timeline:
- 60 seconds of inactivity before the first probe
- 10 seconds between each retry
- 5 retries allowed
Total time before the connection is closed:
60 + (10 × 5) = 110 seconds
This means PostgreSQL will detect a dead connection in roughly two minutes.
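The arithmetic above can be sketched as a small helper. This is illustrative only; the exact moment the kernel drops the connection can vary slightly by TCP implementation:

```python
def keepalive_detection_time(idle, interval, count):
    """Approximate worst-case seconds before an unresponsive idle
    connection is dropped: the idle period before the first probe,
    plus one interval after each of the failed probes."""
    return idle + interval * count

# Using the example configuration from above:
print(keepalive_detection_time(idle=60, interval=10, count=5))  # 110
```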
4. tcp_user_timeout
While the previous parameters focus on idle connections, tcp_user_timeout addresses a different scenario: data that has been sent but never acknowledged.
tcp_user_timeout specifies how long transmitted data may remain unacknowledged before the connection is forcibly closed.
Unlike the keepalive parameters, this value is measured in milliseconds.
Example:
tcp_user_timeout = 30000
This configuration means that if PostgreSQL sends data and the remote system does not acknowledge it within 30 seconds, the TCP connection will be terminated.
This parameter is particularly useful in environments with unstable networks or packet loss, where data transmission can stall without triggering standard keepalive checks.
Key Differences Between Keepalive and User Timeout
Although these parameters work together, they address different types of network issues.
| Parameter | Purpose |
| --- | --- |
| tcp_keepalives_idle | Time before starting health checks on idle connections |
| tcp_keepalives_interval | Delay between keepalive probes |
| tcp_keepalives_count | Maximum number of failed probes |
| tcp_user_timeout | Maximum time waiting for acknowledgment of transmitted data |
In simple terms:
- Keepalive probes check whether a connection is still alive.
- User timeout ensures that transmitted data is acknowledged within a reasonable time.
When Should These Settings Be Tuned?
Default values provided by the operating system may work well for many deployments, but tuning these parameters becomes important in certain environments, including:
- High availability database clusters
- Systems behind load balancers or proxies
- Applications using persistent connection pools
- Cloud infrastructure with aggressive network timeouts
For example, in a high availability setup with failover mechanisms, faster detection of broken connections can help applications reconnect to the new primary node more quickly.
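Clients can tune the same behavior for their own side of the connection: libpq exposes matching connection parameters (keepalives, keepalives_idle, keepalives_interval, keepalives_count, and tcp_user_timeout), so an application can request an aggressive probe schedule without any server changes. A sketch connection string (the host and database names are placeholders):

```
host=db.example.com dbname=app keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=5 tcp_user_timeout=30000
```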
Implementation Inside PostgreSQL
PostgreSQL itself does not implement TCP probing logic. Instead, it configures the underlying operating system’s TCP stack using socket options.
Internally, PostgreSQL applies these settings when establishing client connections by calling the operating system’s setsockopt() function with options such as:
- TCP_KEEPIDLE
- TCP_KEEPINTVL
- TCP_KEEPCNT
- TCP_USER_TIMEOUT
Because these features rely on kernel networking capabilities, their behavior can vary slightly across operating systems.
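To illustrate the mechanism, the following minimal Python sketch sets the same Linux socket options on a plain TCP socket. This is not PostgreSQL's actual C code, and the option names are Linux-specific; other platforms use different names or may lack TCP_USER_TIMEOUT entirely:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# SO_KEEPALIVE must be enabled before the probe schedule takes effect.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # tcp_keepalives_idle
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # tcp_keepalives_interval
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # tcp_keepalives_count

# TCP_USER_TIMEOUT is Linux-specific and takes milliseconds.
if hasattr(socket, "TCP_USER_TIMEOUT"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 30000)

# Read one value back to confirm the kernel accepted it.
idle = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE)
print(f"TCP_KEEPIDLE = {idle} seconds")
sock.close()
```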
Managing network reliability is an essential part of operating a PostgreSQL system in production. The parameters tcp_keepalives_idle, tcp_keepalives_interval, tcp_keepalives_count, and tcp_user_timeout provide a mechanism to detect broken connections and prevent the database from holding resources for clients that are no longer reachable.
Understanding how these parameters interact with the TCP stack allows database administrators to tune connection behavior appropriately for their infrastructure, ensuring both reliability and efficient resource usage.