The process that is working on pretty much every PostgreSQL database right now, one that most developers who are using it don’t even think about, is one that doesn’t respond to queries, isn’t accepting any connections, and can never be seen in application logs. But it’s doing something really important, although perhaps not glamorous, something that is absolutely crucial, and when it’s not able to do its job, the database itself begins to get sick in very peculiar ways.
This process is called autovacuum. And once you know what it is really doing there, many things related to PostgreSQL suddenly become very clear.
The Mess MVCC Makes
PostgreSQL’s concurrency system is beautiful. When you change a row, it doesn’t modify the old version; it writes the whole new version next to it and flags the old one as being no longer current. When you delete a row, it doesn’t delete it from disk; it just adds the stamp that says “this row is deleted by this transaction” and goes off to do its other work. This is how readers and writers can work together without interfering.
Unfortunately, this leads to the accumulation of mess inside your tables. Old row versions which shouldn’t be there anymore. Deleted rows which are still there, occupying disk space. Without any cleaning up, tables will grow without bounds and each query will have to look at more and more garbage to find something useful.
That is where vacuum comes in. Autovacuum is vacuuming without someone having to schedule it explicitly.
Two Processes for the Same Job
It should be noted that autovacuum is not one process but rather two types of processes working together to achieve a task.
The first process is the launcher. It is a separate process that starts once the database launches and runs continuously. One may consider it a supervisor of sorts patrolling the aisles of a warehouse and checking which aisles require any work to be done. However, it does not perform the job on its own; it only observes. Every sixty seconds, it scans all the tables of all the databases, evaluates the number of dead rows in each table, and determines which tables have reached the critical level of contamination. After which, it requests the postmaster to launch workers.
The worker process is temporary in nature. It connects to a particular database and vacuums the tables in need of that. By default, there could be up to three workers launched simultaneously. In case twelve tables are in need of vacuuming at the same time, three workers launch, and nine are left waiting. Once the workers finish their task and exit, they inform the launcher process, which then launches the next one.
The Formula for The Threshold That Determines Everything
The launcher doesn't get into a state of panic just because one row was deleted at any one time. It would produce unnecessary overhead instead of saving resources. What it actually does is wait for the number of dead rows on a particular table to reach a threshold, depending on the size of the table.
The formula written into the source code is easy enough: the threshold equals to fifty rows and then twenty percent of the total number of rows. For example, on a table with one million rows, the launcher will wait until there are about two hundred thousand dead rows before taking any action. In case the table contains ten million rows, the threshold equals two million. The point is to make sure that every vacuum operation cleans some significant number of rows, not just wakes up a worker process to clean three rows and goes to sleep.
Similarly, ANALYZE operates on a different but related formula. The threshold at which the query planner stats are updated is fifty rows and ten percent of the total number of rows – a lower one compared to vacuum since old stats may mislead the planner into selecting a poor query plan.
The Polite Worker
Unlike other database processes, autovacuum is a polite worker from its very conception. After all, it is meant to do some background cleanup tasks, and it must not prevent legitimate user queries.
To make sure that it remains polite, autovacuum calculates how much I/O work it does with the help of cost accounting method. The cost of reading a page from the buffer pool is low, but the cost of reading a page from disk is higher. The cost of writing a dirty page is the highest of all. When total cost reaches certain threshold (default value is two hundred units), autovacuum pauses for two milliseconds.
This would cause throttling to become the bottleneck when the database writes faster than the autovacuum process can clean up after itself. The solution lies in increasing the cost threshold and/or decreasing the wait time between bursts of activity. This is because the defaults have been set conservatively; they are made with a generic system in mind.
The One Situation Where Politeness Comes to an End
There is one strict constraint that the design of PostgreSQL imposes on autovacuum, and it takes priority over everything else: transaction ID wraparound.
PostgreSQL assigns 32-bit IDs to all transactions it executes. Such ID can take 2.1 billion different values before wrapping around and starting from the same point once again. But the problem does not lie in the wrapping because the very conception of the PostgreSQL architecture implies this. The problem lies in the fact that the transaction ID that creates the row is stored within the row. In case of the wrapping, a row that was created two billion transactions ago will be treated as created in the future.
This is where the PostgreSQL will freeze the outdated rows – change the real transaction ID to something special that says "the row is permanently visible, do not care about the counter". The function called autovacuum performs all this freezing process. Autovacuum is aware of the age of the oldest unfrozen transaction ID in each table. Once this number gets above two hundred million transactions, autovacuum will become no more voluntary. No matter what, it will vacuum the table, no matter what, even if you've turned autovacuum off, and will work without any cost throttling.
This is an emergency vacuum – and it should not be viewed as a problem because it does its job. But it is noisy. If you monitor the I/O and notice some activity from vacuum on tables that did not expect any activity before, check the age of the transactions on those tables – a very old relfrozenxid in pg_class will show you this.
What Falling Behind Actually Looks Like
When the autovacuum falls behind, it does not declare itself openly. Instead, you see it through a gradual build-up of symptoms over several days or even weeks.
Your tables get bigger than the amount of live data contained within them. Your queries slow down, as they have to go through more pages to retrieve the same number of rows. You start seeing huge numbers in the dead tuple columns of your pg_stat_user_tables. The time since the last vacuum keeps on increasing and increasing. And then, when the time comes close to wraparound, PostgreSQL makes an all-out effort to vacuum all your tables at once – when in reality, this problem could have been avoided through maintenance from day one.
The reasons why it happens? You simply have too few autovacuum workers for your number of active tables, your cost throttling policy is overly cautious for your writing speed, or your large tables simply suck all your available resources away.
There's no such thing as autovacuum being plug-and-play; it has to be adapted to the actual load. The same database which was okay with the defaults at ten thousand rows per table isn't okay with those at ten million rows per table. Fortunately, keeping an eye on the dead tuples count, how long it's been since vacuum was run, and relfrozenxid ages is easy, and better done before the problem occurs than afterward.