Index Strategies¶

Indexes accelerate reads but slow writes. Choosing which indexes to create, their column order, and type is critical for production performance.

Index Scan Types (PostgreSQL)¶

Scan Type	Description	When Used
Sequential Scan	Read entire heap page by page	No usable index, large result fraction, small table
Index Scan	Traverse B+Tree, fetch heap pages for matches	Selective query, few matching rows
Index-Only Scan	Return data from index without heap access	All columns in index (covering), visibility map clean
Bitmap Index Scan	Build bitmap of matching pages, read sequentially	Moderate selectivity, multiple indexes combinable

Key Facts¶

Composite index on (a, b, c) usable for WHERE a=1, WHERE a=1 AND b=2, but NOT for WHERE b=2 alone
Covering index includes all columns query needs - eliminates heap access entirely
Partial/filtered index: index only rows matching a condition (smaller, faster to maintain)
SARGable: WHERE col = 5 uses index; WHERE YEAR(col) = 2024 does NOT (function on column)
Each index adds overhead to INSERT/UPDATE/DELETE - avoid over-indexing in OLTP

Patterns¶

Composite Index¶

-- Column order matters: equality columns first, then range
CREATE INDEX idx_orders ON orders (status, created_at);
-- Usable for: WHERE status = 'active' AND created_at > '2024-01-01'
-- Usable for: WHERE status = 'active'
-- NOT usable for: WHERE created_at > '2024-01-01' (skips leftmost column)

Covering Index (Index-Only Scan)¶

-- PostgreSQL: INCLUDE for non-key columns in leaf pages
CREATE INDEX idx_orders_covering ON orders (user_id) INCLUDE (amount, status);
-- Query can be answered entirely from index:
SELECT amount, status FROM orders WHERE user_id = 42;

Partial/Filtered Index¶

-- Index only active users (much smaller than full index)
CREATE INDEX idx_active_users ON users (email) WHERE status = 'active';
-- Only used when query includes WHERE status = 'active'

CREATE INDEX CONCURRENTLY¶

-- PostgreSQL: build index without blocking DML
CREATE INDEX CONCURRENTLY idx_name ON table (column);
-- Takes longer, two table scans, can't run in transaction, may fail (cleanup required)

-- MySQL equivalent: online DDL
ALTER TABLE t ADD INDEX idx_name (column), ALGORITHM=INPLACE, LOCK=NONE;

Generating Test Data¶

-- PostgreSQL: generate millions of rows for index testing
INSERT INTO grades (g) SELECT floor(random() * 100) FROM generate_series(1, 10000000);

Optimizer Index Selection¶

When multiple indexes exist, optimizer estimates cost using: - Table size and statistics from pg_stats - Index selectivity (what fraction of rows match) - Correlation (physical ordering vs index ordering) - Cost parameters (seq_page_cost, random_page_cost)

Hint mechanisms: pg_hint_plan extension (PostgreSQL), USE INDEX/FORCE INDEX (MySQL). Generally trust the optimizer unless EXPLAIN ANALYZE shows it's wrong.

SQL Server Specifics¶

Columnstore indexes: Column-oriented storage for OLAP. Rows grouped into rowgroups (~1M rows), batch mode processing (~900 rows at once). Combinable with rowstore indexes.
Hash indexes: Memory-optimized tables only. O(1) point lookups, not for ranges.
Included columns: Non-key columns in leaf for covering without affecting sort order.

Gotchas¶

Standard CREATE INDEX acquires ShareLock (PostgreSQL) - blocks INSERT/UPDATE/DELETE
CONCURRENTLY can fail and leave invalid index - must DROP and retry
Index-only scan in PostgreSQL requires visibility map to be clean - run VACUUM first
Too many indexes on write-heavy tables dramatically slow INSERT/UPDATE/DELETE
Bitmap scans combine multiple indexes (AND/OR) - sometimes two mediocre indexes beat one perfect one
MySQL InnoDB secondary indexes always do double lookup through clustered index