“I created the index, but PostgreSQL still does a sequential scan.”
If you’ve worked with PostgreSQL, you’ve almost certainly faced this situation.
This behavior is not a bug. It’s a deliberate decision made by PostgreSQL’s cost-based query planner.
In this article, we’ll explore how PostgreSQL decides whether to use an index, why sequential scans are often chosen even when indexes exist, and how to correctly analyze and fix these situations.
How the PostgreSQL Query Planner Works
PostgreSQL uses a cost-based optimizer.
This means:
- It generates multiple possible execution plans
- Estimates the cost of each plan
- Chooses the plan with the lowest estimated cost
PostgreSQL does not choose the fastest plan in reality—it chooses the cheapest plan based on available statistics.
Cost-Based Optimization: The Core Idea
Each execution plan is evaluated using a simplified cost model:
total_cost = startup_cost + (cpu_cost + io_cost) × estimated_rows
Some important cost parameters include:
| Parameter | Purpose |
| seq_page_cost | Cost of sequential page reads |
| random_page_cost | Cost of random page reads |
| cpu_tuple_cost | CPU cost per row |
| cpu_operator_cost | Cost of evaluating expressions |
| effective_cache_size | Planner’s assumption about cache size |
By default:
seq_page_cost = 1.0
random_page_cost = 4.0
This means PostgreSQL assumes random I/O is 4× more expensive than sequential I/O.
Sequential Scan vs Index Scan
Let’s look at a simple query:
SELECT * FROM orders WHERE status = 'completed';
PostgreSQL evaluates two main options.
Option 1: Sequential Scan
- Read table pages sequentially
- Apply filter on each row
- Very cache-friendly
Option 2: Index Scan
- Traverse B-tree index
- Follow index pointers to heap rows
- Causes random heap access
If PostgreSQL estimates that many rows match the condition, a sequential scan is cheaper — even if an index exists.
The Role of Statistics (ANALYZE)
Planner decisions depend heavily on statistics collected by ANALYZE.
PostgreSQL tracks:
- Total row count
- Number of distinct values
- Most Common Values (MCV)
- Histograms
- Column correlation
If statistics are outdated or inaccurate, the planner will make poor decisions.
ANALYZE orders;
For skewed data distributions, you may need higher statistics targets:
ALTER TABLE orders
ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;
Correlation and Physical Data Order
Indexes store logical order, not physical row order.
PostgreSQL measures how well heap rows are ordered relative to an index using correlation.
SELECT correlation
FROM pg_stats
WHERE tablename = 'orders'
AND attname = 'id';
~1.0
Heap is ordered like the index
~0.0
Heap is randomly ordered
~-1.0
Reverse order
Low correlation means index scans cause random heap reads, which are expensive.
This is why even ORDER BY id LIMIT 50 can result in a sequential scan.
Common Myths About Index Usage
“If an index exists, PostgreSQL must use it”
Indexes are optional access paths, not instructions.
“Sequential scans are always bad”
Sequential scans are:
- Fast
- Cache-efficient
- Often optimal for large result sets
“Disable sequential scans to force index usage”
SET enable_seqscan = off;
This breaks planner logic and causes unpredictable performance.
Never use this in production.
Real EXPLAIN ANALYZE Examples
Example 1: Highly Selective Predicate
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'user@example.com';
Index Scan using users_email_idx
Actual Rows: 1
Index scan is optimal.
Example 2: Low Selectivity Predicate
EXPLAIN ANALYZE
SELECT * FROM users WHERE is_active = true;
Seq Scan on users
Actual Rows: 950000
Sequential scan is cheaper.
When PostgreSQL Should Use Your Index (But Doesn’t)
This usually indicates bad statistics or poor index design.
Possible fixes:
- Run ANALYZE
- Increase statistics target
- Use partial indexes
Example:
CREATE INDEX idx_orders_completed
ON orders(id)
WHERE status = 'completed';
Partial indexes reduce index size and improve selectivity.
Best Practices for Reliable Index Usage
- Always use EXPLAIN ANALYZE, not just EXPLAIN
- Compare estimated rows vs actual rows
- Monitor correlation and data layout
- Prefer index-only scans when possible
- Design indexes for access patterns, not assumptions
- Avoid forcing planner decisions
Final Takeaways
PostgreSQL does not “ignore” your index.
It simply decides:
“This index is more expensive than a sequential scan.”
Indexes alone do not guarantee performance.
Correct statistics, good data locality, and realistic cost assumptions do.
Indexes don’t make queries fast — accurate estimates do