Jun 09, 2026

The Hidden Cost of Refreshing a Materialized View That Did Not Need It

Materialized views solve a real problem. Pre-computing expensive aggregations and storing the result as a physical table turns a five-second reporting query into a five-millisecond index scan. Once a team discovers this, they wire up a scheduled job — refresh every five minutes, refresh every hour — and declare the performance problem solved. And it is solved, until it quietly becomes a different problem.

The new problem is this: the refresh runs whether the underlying data has changed or not. At two in the morning, when no transactions are running, and the accounting tables have been untouched for six hours, the scheduled job fires anyway. It reads two million rows, computes the full aggregation, writes the result to disk, rebuilds the indexes, and goes back to sleep. Every cycle of this costs real resources. On a busy database server shared between multiple processes, those resources are not free even when the business is closed.

What a Refresh Actually Does

To understand why unnecessary refreshes hurt, it helps to know what happens inside PostgreSQL when you run one. The default form of REFRESH MATERIALIZED VIEW acquires the strongest possible lock on the view — an Access Exclusive lock. While this lock is held, every query that tries to read from the materialized view blocks and waits. The refresh creates a new heap table in the background, runs the full underlying SELECT query from scratch against the base tables, loads the result into the new heap, and then atomically swaps out the old physical files for the new ones. All indexes are dropped and rebuilt from the ground up. Only after all of this is complete does the lock release, and waiting queries proceed.

For a materialized view that summarizes two million journal entries into a few thousand grouped rows, this process involves reading every row of the source tables, evaluating every filter and join, writing the aggregated result, and rebuilding whatever indexes exist on the view. If the source tables had zero new transactions since the last refresh, every byte of that work was redundant.

The concurrent form of refresh is gentler on readers — it holds a weaker lock that allows SELECT queries to continue — but it does more total work, not less. It builds the new result in a temporary location, then performs a full outer join between the old materialized data and the new result to identify which rows changed. On a view with a million summary rows, that join reads two million rows just to discover that nothing changed. The temporary tables involved can spill to disk. The CPU cost of the diff is real. None of it was necessary if the source data was unchanged.

The Problem Is Especially Sharp in ERP Workloads

In an Odoo installation, the data distribution across a day is rarely uniform. Most accounting entries are posted in bursts — during morning processing, after a batch import, or at month-end close. For significant stretches of the day, and through entire nights and weekends, the core accounting tables do not change at all. A refresh job running every ten minutes over a twelve-hour quiet period executes seventy-two full refreshes for no reason. Each one locks the materialized view, reads the base tables, rebuilds the result, and releases. Users who happen to query during a refresh window wait for a lock that does not need to exist.

This cost scales with the size of the data. A materialized view on a small table barely notices. A materialized view that aggregates ten years of journal entries from a busy ERP system notices very much. The larger and more complex the underlying query, the more expensive each unnecessary refresh becomes, and the more each unnecessary lock affects the users who trigger it.

There Is No Built-In Answer

What makes this genuinely interesting from a database perspective is that PostgreSQL currently has no mechanism to detect whether a refresh is necessary. There is no equivalent of "refresh only if something changed." The refresh command does not check. It does not compare a hash of the old and new results. It does not inspect whether rows were recently inserted or updated into the source tables. It simply executes.

Workarounds exist and are used in practice. Some teams track the last modification timestamp on source tables and skip the refresh call if nothing was written since the previous run. Others use application-level flags that are set whenever a relevant transaction commits. These approaches work but they are manual, fragile, and require extra infrastructure that lives outside the database itself. They are workarounds for a gap, not solutions to it.

The gap matters because materialized views are increasingly central to how modern PostgreSQL applications handle reporting at scale. As data volumes grow and refresh costs rise, the question of whether to refresh at all becomes as important as how fast the refresh runs. A database that could answer that question natively — that could look at a materialized view, examine what changed in its source tables since the last refresh, and decide whether the work is worth doing — would make one of PostgreSQL's best features significantly more practical to use in production.

That capability does not exist yet in PostgreSQL. The cost of its absence is paid, quietly, by every scheduled job that fires at three in the morning to refresh a view that was already correct.

The Hidden Cost of Refreshing a Materialized View That Did Not Need It

What a Refresh Actually Does

The Problem Is Especially Sharp in ERP Workloads

There Is No Built-In Answer

Category

Related Post