How PostgreSQL’s Query Planner Chooses (or Rejects) Your Index

I created the index, but PostgreSQL still does a sequential scan.” 

If you’ve worked with PostgreSQL, you’ve almost certainly faced this situation.

This behavior is not a bug. It’s a deliberate decision made by PostgreSQL’s cost-based query planner.

In this article, we’ll explore how PostgreSQL decides whether to use an index, why sequential scans are often chosen even when indexes exist, and how to correctly analyze and fix these situations.

How the PostgreSQL Query Planner Works

PostgreSQL uses a cost-based optimizer.

This means:

  • It generates multiple possible execution plans
  • Estimates the cost of each plan
  • Chooses the plan with the lowest estimated cost

PostgreSQL does not choose the fastest plan in reality—it chooses the cheapest plan based on available statistics.

Cost-Based Optimization: The Core Idea

Each execution plan is evaluated using a simplified cost model:

total_cost = startup_cost + (cpu_cost + io_cost) × estimated_rows

Some important cost parameters include:

ParameterPurpose
seq_page_costCost of sequential page reads
random_page_costCost of random page reads
cpu_tuple_costCPU cost per row
cpu_operator_costCost of evaluating expressions
effective_cache_sizePlanner’s assumption about cache size

By default:

seq_page_cost    = 1.0
random_page_cost = 4.0

This means PostgreSQL assumes random I/O is 4× more expensive than sequential I/O.

Sequential Scan vs Index Scan

Let’s look at a simple query:

SELECT * FROM orders WHERE status = 'completed';

PostgreSQL evaluates two main options.

Option 1: Sequential Scan

  • Read table pages sequentially
  • Apply filter on each row
  • Very cache-friendly

Option 2: Index Scan

  • Traverse B-tree index
  • Follow index pointers to heap rows
  • Causes random heap access

If PostgreSQL estimates that many rows match the condition, a sequential scan is cheaper — even if an index exists.

The Role of Statistics (ANALYZE)

Planner decisions depend heavily on statistics collected by ANALYZE.

PostgreSQL tracks:

  • Total row count
  • Number of distinct values
  • Most Common Values (MCV)
  • Histograms
  • Column correlation

If statistics are outdated or inaccurate, the planner will make poor decisions.

ANALYZE orders;

For skewed data distributions, you may need higher statistics targets:

ALTER TABLE orders
ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

Correlation and Physical Data Order

Indexes store logical order, not physical row order.

PostgreSQL measures how well heap rows are ordered relative to an index using correlation.

SELECT correlation
FROM pg_stats
WHERE tablename = 'orders'
  AND attname = 'id';
~1.0
Heap is ordered like the index

~0.0
Heap is randomly ordered

~-1.0
Reverse order

Low correlation means index scans cause random heap reads, which are expensive.

This is why even ORDER BY id LIMIT 50 can result in a sequential scan.

Common Myths About Index Usage

If an index exists, PostgreSQL must use it

Indexes are optional access paths, not instructions.

Sequential scans are always bad

Sequential scans are:

  • Fast
  • Cache-efficient
  • Often optimal for large result sets

Disable sequential scans to force index usage

SET enable_seqscan = off;

This breaks planner logic and causes unpredictable performance.

Never use this in production.

Real EXPLAIN ANALYZE Examples

Example 1: Highly Selective Predicate

EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'user@example.com';
Index Scan using users_email_idx
Actual Rows: 1

Index scan is optimal.

Example 2: Low Selectivity Predicate

EXPLAIN ANALYZE
SELECT * FROM users WHERE is_active = true;
Seq Scan on users
Actual Rows: 950000

Sequential scan is cheaper.

When PostgreSQL Should Use Your Index (But Doesn’t)

This usually indicates bad statistics or poor index design.

Possible fixes:

  • Run ANALYZE
  • Increase statistics target
  • Use partial indexes

Example:

CREATE INDEX idx_orders_completed
ON orders(id)
WHERE status = 'completed';

Partial indexes reduce index size and improve selectivity.

Best Practices for Reliable Index Usage

  • Always use EXPLAIN ANALYZE, not just EXPLAIN
  • Compare estimated rows vs actual rows
  • Monitor correlation and data layout
  • Prefer index-only scans when possible
  • Design indexes for access patterns, not assumptions
  • Avoid forcing planner decisions

Final Takeaways

PostgreSQL does not “ignore” your index.

It simply decides:

This index is more expensive than a sequential scan.

Indexes alone do not guarantee performance. 

Correct statistics, good data locality, and realistic cost assumptions do.

Indexes don’t make queries fast — accurate estimates do

whatsapp_icon
location

Calicut

Cybrosys Technologies Pvt. Ltd.
Neospace, Kinfra Techno Park
Kakkancherry, Calicut
Kerala, India - 673635

location

Kochi

Cybrosys Technologies Pvt. Ltd.
1st Floor, Thapasya Building,
Infopark, Kakkanad,
Kochi, India - 682030.

location

Bangalore

Cybrosys Techno Solutions
The Estate, 8th Floor,
Dickenson Road,
Bangalore, India - 560042

Send Us A Message