Why PostgreSQL Needs Incremental Materialized View Refresh More

Consider a whiteboard in your office. Every morning, you spend two hours writing out a summary of all the activities that have taken place in your company over the last 24 hours — sales per product type, inventory levels per warehouse, customer invoices, and more. This takes two hours because there is a lot of data, but once it’s written on the board, any visitor to the office can get immediate access to it by just looking at the whiteboard.

At 10:00 AM, only one sale is reported—a single transaction represented by a single row. But you are told that to maintain the accuracy of the whiteboard, you have to erase it all and spend two hours doing everything again from scratch.

That is precisely how PostgreSQL handles the task of refreshing a materialized view. For a while, no one pays much attention to this process, as the whiteboard in their case is small enough that the two hours turn into two seconds. But later on, when the database has grown enough to make those two seconds into two minutes, the business cannot afford to wait two minutes each time something changes.

How a Full Refresh Really Works for Your Database?

The thing about refreshing a materialized view in PostgreSQL is that it doesn't care about anything that was modified. It doesn't care about anything that was added. It starts from scratch, runs the entire defining SQL query from start to finish on all involved tables, stores all resulting rows in a completely new table, and then replaces the old one.

Everything is read. All joins are done. All filtering, grouping, and aggregation happen all over again. If you have a view that summarizes two million transactions in accounting and only five of them were added, then PostgreSQL will read all two million records anyway. Five records that were added and one million nine hundred and ninety-five thousand that didn't change – everything goes through the same procedure.

It is like working on a ten-thousand-piece puzzle, finishing it, swapping out five pieces, and getting instructions to dismantle the entire puzzle.

The Invisible Decay

What makes this issue hard is that it goes unnoticed for years. In the initial stage of development, materialized views seem to be magic. A query that took eight seconds suddenly returns in forty milliseconds. People create more views. Reports become much faster. Life is good.

Until the data size changes. And it always changes. After six months, a refresh that used to run for two seconds now takes twelve seconds. After a year, it is forty-five seconds. The interval in which the refresh runs once a minute is changed to five minutes in order not to interfere with other refreshes. Then fifteen minutes. Users begin to notice that numbers on the dashboard are always a bit outdated. Finally, someone analyses the execution plan of the refresh and notices that it is reading thirty million rows – and realizes that no matter what kind of speed your queries have, you cannot refresh thirty million rows each minute without a database constantly refreshing itself.

It is not bad luck or a bad design decision. It is the result of using a refresh strategy, which is doing the same amount of work no matter how many things have really changed.

Real Numbers

Let's consider what actually changes in an enterprise environment during each database refresh cycle. For example, in five minutes, a busy ERP system posts 500 new journal lines of its 2 million total — that's 0.025 percent of data. Warehouse system records 2000 new stock movements of its 1 million — that's 0.2 percent. Support system gets 300 new messages from its 3 million — 0.01 percent.

In most practical workloads, the overwhelming portion of the data remains absolutely unchanged between refresh cycles. There was nothing new in them. There was no new insight to gain from them. Reading them will produce the same old numbers again. But the whole table still needs to be refreshed because there is no way to determine what actually changed and what didn't.

A refresh that only processed the rows that actually changed would do a fraction of a percent of the current work. The same query that takes forty-five seconds would take under a second. The refresh you can only afford to run every fifteen minutes could run every thirty seconds. The dashboard numbers your users see would be practically real-time.

Why Nobody Has Simply Built This Already?

If incremental refresh is so obviously better, why does PostgreSQL not have it? This is a fair question, and the answer is genuinely interesting rather than embarrassing.

It is not difficult to determine the changes made. It is easy to determine which rows have been inserted, updated, or deleted because that is how logical replication works in PostgreSQL. The difficulty lies in interpreting what the changes mean for the pre-computed result.

Let us take an example of a view that returns the sum of account balances per customer based on all of their transactions. There are five new transactions that are made by three different customers. Easy — just insert the new values into the view for those three customers.

Make it a little bit more complicated. This time, the view is created by joining the transaction table with the product category table, filtering out cancelled transactions, and grouping by category and month. Again, there is a new transaction. On which rows does the new transaction change the view? It changes the corresponding category and month where the transaction is made, provided that it is not cancelled. In other words, PostgreSQL has to calculate the join and the filter conditions and find out to which particular pre-computed group of rows the new row belongs.

Now add a subquery, a window function, and handle a delete instead of an insert, which means finding the group that previously included this row, subtracting its contribution, and checking if the group should now disappear entirely. Every query shape has different rules for how changes propagate. A general solution that handles all of them correctly, under concurrent writes, with proper transactional guarantees, is one of the genuinely hard problems in database engineering.

A full refresh avoids this complexity entirely—it does not need to determine how changes propagate through the system. It just recomputes everything, and the answer is always correct. That simplicity is exactly why it has survived as the only option.

The Gap This Leaves

Right now, teams using PostgreSQL with large datasets face a choice that feels like it should not exist in a modern database. Either you use materialized views and accept that their maintenance cost grows until it becomes a problem, or you do not use them and accept that your reporting queries are slow. There is no third option built into the engine.

This is the gap. Not a theoretical gap that researchers talk about. A practical, daily gap that teams hit when their data crosses a certain size, and the refresh that used to be instant starts taking long enough to matter. The infrastructure to build something better is already inside PostgreSQL. The pieces exist. They just have not been assembled into a working incremental refresh mechanism that ships as part of the database.

That is the thing that would change what materialized views can be used for — and at what scale they remain worth using.

whatsapp_icon
location

Calicut

Cybrosys Technologies Pvt. Ltd.
Neospace, Kinfra Techno Park
Kakkancherry, Calicut
Kerala, India - 673635

location

Kochi

Cybrosys Technologies Pvt. Ltd.
1st Floor, Thapasya Building,
Infopark, Kakkanad,
Kochi, India - 682030.

Send Us A Message