30 SQL Performance Tips Every Developer Should Know in 2026
A slow SQL query is not just a user experience problem — it is a business problem. In 2026, the average SaaS application makes between 20 and 80 database calls per API request. When those queries are unoptimized, the effects cascade: API latency climbs from 80ms to 4 seconds, cloud database bills double without any increase in traffic, dashboards timeout at peak hours, and AI pipelines that depend on fast data retrieval stall completely.
The good news: SQL performance optimization is one of the highest-leverage activities a backend developer can do. A single well-placed index can take a query from 8 seconds to 2 milliseconds. Switching from OFFSET to cursor-based pagination can make a paginated API endpoint handle 100x the load. Fixing one N+1 query pattern can eliminate 99 unnecessary database round trips per API call.
This guide covers all 30 most impactful SQL optimization techniques — from query-level rewrites you can apply in 5 minutes to architectural patterns that scale your database to millions of rows. Every tip includes a real production scenario, before/after SQL examples, and a clear explanation of why the change works. Updated specifically for 2026, covering PostgreSQL, MySQL, SQL Server, and SQLite.
How to Write SQL Queries That Don't Waste Resources
The fastest optimization wins come from the queries themselves — before you touch indexes, schemas, or infrastructure. These six patterns apply to every SQL database and can be implemented immediately.
Avoid SELECT * — Fetch Only the Columns You Need ⚡ High Impact
SELECT * is the most common SQL performance mistake. It forces the database to read every column from disk — even if your application only uses 3 of 40 columns. It also prevents the query optimizer from using covering indexes (index-only scans), which are the fastest possible access path.
Bad — fetches all 35 columns for every row:
-- Reading all 35 columns from 500,000 rows
SELECT * FROM orders WHERE status = 'pending';
-- Even if you only display: id, user_id, total, created_atOptimized — fetch only what you need:
-- 4 columns instead of 35: ~90% less data read from disk
SELECT id, user_id, total, created_at
FROM orders
WHERE status = 'pending';Optimize WHERE Clause Selectivity ⚡ High Impact
The WHERE clause is the most powerful tool for reducing the rows a query must process. High-selectivity conditions (filtering to a small percentage of rows) let indexes work most efficiently. Low-selectivity conditions (returning 40%+ of rows) often lead the optimizer to prefer a full table scan.
-- Low selectivity: 'active' may be 70% of users — full scan likely
SELECT id, email FROM users WHERE status = 'active';
-- High selectivity: a specific email matches ~1 row — index always used
SELECT id, email FROM users WHERE email = 'alice@example.com';
-- Combined: push the high-selectivity filter first
SELECT id, email, status
FROM users
WHERE email LIKE '%@acme.com' -- narrows to one company
AND status = 'active'; -- then refines furtherNever Apply Functions to Indexed Columns in WHERE ⚡ High Impact
Applying any function to an indexed column in a WHERE clause destroys the ability to use that index. The database cannot look up YEAR(created_at) = 2024 in a B-tree index on created_at — it must compute the function for every row and scan the whole table.
-- BAD: wrapping indexed column in a function = full table scan
SELECT * FROM orders WHERE YEAR(order_date) = 2024;
SELECT * FROM users WHERE UPPER(email) = 'ALICE@EXAMPLE.COM';
SELECT * FROM logs WHERE DATE(created_at) = '2024-06-15';
-- GOOD: rewrite to a range that the index can satisfy directly
SELECT * FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';
SELECT * FROM users WHERE email = 'alice@example.com'; -- store normalized
SELECT * FROM logs
WHERE created_at >= '2024-06-15' AND created_at < '2024-06-16';CREATE INDEX ON users (LOWER(email)). Then WHERE LOWER(email) = ... can use it. But the simplest fix is to normalize data at write time.Always LIMIT Exploratory Queries ⚡ Medium Impact
During development and data exploration, always add LIMIT to prevent accidentally full-scanning multi-million row tables. A query without LIMIT on a 50M row table can run for minutes and exhaust memory. LIMIT also lets the database return the first matching rows without finding all of them.
-- Development: always limit exploration queries
SELECT * FROM events ORDER BY created_at DESC LIMIT 100;
-- Understand data distribution without full scan
SELECT country, COUNT(*) FROM users GROUP BY country LIMIT 20;
-- Find examples of a specific pattern (stops after first 5 matches)
SELECT id, payload FROM webhook_logs
WHERE status = 'failed' LIMIT 5;Optimize ORDER BY — Sort on Indexed Columns ⚡ Medium Impact
Sorting is expensive when done in-memory (the optimizer creates a temporary sort buffer). When ORDER BY columns are covered by an index in the same sort direction, the database reads rows already in order — no sort buffer needed. This is a significant performance difference on large result sets.
-- Expensive: created_at not indexed — filesort required
SELECT id, total FROM orders ORDER BY created_at DESC LIMIT 50;
-- With index on created_at: reads rows in index order — no sort buffer
CREATE INDEX idx_orders_created ON orders (created_at DESC);
SELECT id, total FROM orders ORDER BY created_at DESC LIMIT 50;
-- Combined sort + filter: composite index (status, created_at)
SELECT id, total FROM orders
WHERE status = 'completed'
ORDER BY created_at DESC
LIMIT 20;
-- Index: CREATE INDEX ON orders (status, created_at DESC);Push Filters Down — Filter Before You JOIN ⚡ High Impact
JOIN operations multiply row counts. Filtering data before or immediately during a JOIN drastically reduces the number of rows the join must process. Many databases handle this automatically, but writing explicit early filters guarantees efficient execution regardless of optimizer version.
-- Inefficient: joins all orders, then filters
SELECT u.first_name, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'completed'
AND o.created_at >= '2025-01-01';
-- Same result, but the filter intent is clear and forces efficient execution:
-- Use a subquery or CTE to pre-filter before joining
SELECT u.first_name, recent.total
FROM users u
JOIN (
SELECT user_id, total
FROM orders
WHERE status = 'completed'
AND created_at >= '2025-01-01'
) recent ON recent.user_id = u.id;
-- Most optimizers will handle the first form equally well
-- but the subquery approach guarantees pre-filtering regardlessSQL Indexing Best Practices for Large Databases
Indexes are the single most impactful tool in SQL performance optimization. But a poorly chosen index can slow writes without helping reads. These five tips cover when to index, how to structure composite indexes, how to build covering indexes, and how to avoid over-indexing.
Index Columns Used in WHERE, JOIN ON, and ORDER BY ⚡ High Impact
The rule of thumb for indexing: create indexes on columns that appear in WHERE conditions, JOIN ON clauses, and ORDER BY clauses with high cardinality (many distinct values). Indexes on low-cardinality columns (like a boolean is_active) are rarely useful because the optimizer often prefers a full scan when 50% of rows match.
-- Columns that warrant indexes:
CREATE INDEX idx_orders_user_id ON orders (user_id); -- JOIN target
CREATE INDEX idx_orders_status ON orders (status); -- WHERE filter
CREATE INDEX idx_orders_created_at ON orders (created_at); -- ORDER BY / range
-- Columns NOT worth indexing alone:
-- is_deleted BOOLEAN (only 2 values)
-- gender VARCHAR (very few distinct values)
-- These are candidates for partial indexes instead:
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';
-- Covers only the subset you care about — much smaller, much fasterMaster Composite Indexes — Column Order Is Everything ⚡ High Impact
A composite index on (A, B, C) can serve queries filtering on A, A+B, or A+B+C — but NOT on B alone or C alone. This is the left-prefix rule. Beyond that, column order within the prefix matters: put equality predicates before range predicates to maximize index usage.
| Query | Index (user_id, status, created_at) | Can Use Index? |
|---|---|---|
| WHERE user_id = 42 | Left prefix match | ✓ Yes — full index prefix |
| WHERE user_id = 42 AND status = 'active' | Left prefix match | ✓ Yes — both equality columns |
| WHERE user_id = 42 AND status = 'active' AND created_at > '2024-01-01' | Full index | ✓ Yes — equality first, then range |
| WHERE status = 'active' | No left prefix | ✗ No — skips first column |
| WHERE created_at > '2024-01-01' | No left prefix | ✗ No — skips first two columns |
-- Optimal: equality predicates first, range predicate last
CREATE INDEX idx_orders_composite ON orders (user_id, status, created_at);
-- Efficient query that uses all three columns of the index:
SELECT id, total FROM orders
WHERE user_id = 42
AND status = 'completed'
AND created_at >= '2025-01-01';
-- Interview question: why is (created_at, user_id) a bad composite for this query?
-- Answer: created_at is a range predicate — putting it first means the index
-- is only used for the created_at range, not the user_id equality filter.Build Covering Indexes to Enable Index-Only Scans ⚡ High Impact
A covering index includes all columns needed to satisfy a query — the WHERE predicates, the JOIN keys, AND the SELECT output columns. When a covering index exists, the database never needs to touch the main table (the heap) at all. This is called an index-only scan, and it is the fastest possible access path.
-- Query: get email + name for users in a specific country
SELECT email, first_name, last_name
FROM users
WHERE country = 'US';
-- Basic index: covers the WHERE, but SELECT columns require heap fetch
CREATE INDEX idx_users_country ON users (country);
-- Covering index: covers WHERE + all SELECT columns — pure index read
-- PostgreSQL / SQL Server:
CREATE INDEX idx_users_country_covering ON users (country)
INCLUDE (email, first_name, last_name);
-- MySQL: all SELECT columns must be in the index key itself
CREATE INDEX idx_users_country_covering ON users (country, email, first_name, last_name);
-- With covering index, EXPLAIN shows "Index Only Scan" in PostgreSQL
-- or "Using index" in MySQL — no table access neededAvoid Over-Indexing — Every Index Has a Write Penalty ⚡ Medium Impact
Each index on a table must be updated on every INSERT, UPDATE, and DELETE. A table with 12 indexes on a high-write workload can have writes that are 5–8x slower than necessary. Additionally, the query optimizer must evaluate more index candidates per query, sometimes making worse choices. For OLTP systems, aim for no more than 5–7 indexes per table.
-- Audit your indexes for actual usage (PostgreSQL):
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Indexes with idx_scan = 0 after months of traffic are unused.
-- Candidates for removal. Always test DROP INDEX CONCURRENTLY in staging first.Monitor and Rebuild Fragmented Indexes ⚡ Medium Impact
B-tree indexes become fragmented as rows are inserted, updated, and deleted. Fragmented indexes waste space and require more disk reads to traverse. On heavily-written tables, periodic index maintenance restores performance.
-- PostgreSQL: rebuild a bloated index without locking the table
REINDEX INDEX CONCURRENTLY idx_orders_created_at;
-- PostgreSQL: check index bloat
SELECT
indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- MySQL: rebuild index and reclaim space
OPTIMIZE TABLE orders;
-- SQL Server: rebuild fragmented indexes
ALTER INDEX idx_orders_created_at ON orders REBUILD;
-- Or reorganize (lighter, online operation):
ALTER INDEX idx_orders_created_at ON orders REORGANIZE;SQL JOIN Optimization Techniques for Production Queries
JOINs are where many performance problems hide. A JOIN between two large, unindexed tables can scan billions of row combinations. These tips ensure your JOINs stay fast as data grows. For a deep dive on JOIN syntax, see the SQL JOINs Explained guide.
Always JOIN on Indexed Columns ⚡ High Impact
The JOIN ON column on the inner table must be indexed for the optimizer to use an efficient join strategy (Nested Loop Index Scan or Hash Join). Without an index on the join column, the database resorts to a full scan of the inner table for every row in the outer table — O(n×m) complexity.
-- Ensure both sides of the JOIN ON have indexes:
-- orders.user_id must have an index for this JOIN to be efficient
SELECT u.first_name, COUNT(o.id) AS order_count
FROM users u
JOIN orders o ON o.user_id = u.id -- o.user_id must be indexed
GROUP BY u.id, u.first_name
ORDER BY order_count DESC
LIMIT 10;
-- Check the join key has an index:
CREATE INDEX idx_orders_user_id ON orders (user_id);
-- EXPLAIN (ANALYZE) will show:
-- WITHOUT index: Hash Join + Seq Scan on orders
-- WITH index: Nested Loop + Index Scan on ordersEXISTS vs IN — Know When Each Is Faster ⚡ Medium Impact
Both EXISTS and IN check for membership in a subquery result, but they evaluate differently. EXISTS short-circuits on the first match. IN evaluates the complete subquery result first, then checks membership. The performance difference matters significantly when the subquery result set is large.
-- IN: evaluates the full subquery, builds a hash table of all IDs
SELECT id, first_name FROM users
WHERE id IN (
SELECT DISTINCT user_id FROM orders WHERE status = 'completed'
);
-- EXISTS: stops at first matching row — short-circuits
SELECT id, first_name FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.user_id = u.id AND o.status = 'completed'
);
-- NOT EXISTS vs NOT IN — important difference:
-- NOT IN returns no rows if the subquery contains ANY NULL value!
-- NOT EXISTS handles NULLs correctly:
-- DANGEROUS: returns zero rows if any order has NULL user_id
SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM orders);
-- SAFE: handles NULLs correctly
SELECT id FROM users u WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);| Pattern | Subquery Size | Performance | NULL Safe? |
|---|---|---|---|
| IN | Small (<1000 rows) | Good | No — returns 0 rows if NULL present |
| IN | Large (10000+ rows) | Poor | No |
| EXISTS | Any size | Good | Yes |
| NOT IN | Any size | Varies | No — dangerous with NULLs |
| NOT EXISTS | Any size | Good | Yes — always preferred |
Avoid Accidental Cartesian Products ⚡ High Impact
A Cartesian product (cross join) multiplies every row from one table with every row from another. On two tables of 10,000 rows each, this produces 100 million row combinations. Missing or wrong JOIN conditions silently create cross joins that can crash a production database.
-- DANGEROUS: missing ON clause = silent Cartesian product
-- On users(50K) x orders(500K) = 25 BILLION row combinations
SELECT u.first_name, o.total
FROM users u, orders o; -- old comma syntax — always avoid
-- Also dangerous: ON clause doesn't actually filter the join
SELECT u.first_name, o.total
FROM users u
JOIN orders o ON 1 = 1; -- explicit cross join!
-- Correct: always include a meaningful join condition
SELECT u.first_name, o.total
FROM users u
JOIN orders o ON o.user_id = u.id;
-- Detection: EXPLAIN showing very high "rows" estimate = suspect a Cartesian productUnderstanding SQL Query Execution Plans
The execution plan is the database's blueprint for how it will execute a query. Reading plans is the most direct way to diagnose slow queries — more reliable than guessing, faster than blind trial and error.
How to Read EXPLAIN and EXPLAIN ANALYZE ⚡ High Impact
EXPLAIN shows what the optimizer plans to do. EXPLAIN ANALYZE actually executes the query and shows real timings. Use EXPLAIN first (safe, no data modification, no execution cost). Use EXPLAIN ANALYZE when you need actual row counts and timings — but be aware it runs the query, so never use it on a slow write query without wrapping it in a transaction you roll back.
-- PostgreSQL: plan only (safe, no execution)
EXPLAIN SELECT id, total FROM orders WHERE user_id = 42;
-- PostgreSQL: full analysis with actual timings and buffer stats
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id, total FROM orders WHERE user_id = 42;
-- Key nodes to understand:
-- Seq Scan → full table scan (bad on large tables)
-- Index Scan → uses index to find rows, then fetches from heap
-- Index Only Scan → reads only the index (fastest — no heap access)
-- Nested Loop → efficient for small inner sets, slow for large
-- Hash Join → efficient for large unsorted join inputs
-- Sort → in-memory sort (check if an index can eliminate it)
-- Limit → good — stops after N rows
-- MySQL: equivalent command
EXPLAIN FORMAT=JSON
SELECT id, total FROM orders WHERE user_id = 42;
-- Look for "type": "ALL" (full scan) vs "type": "ref" (index use)Enable and Monitor Slow Query Logs ⚡ High Impact
EXPLAIN tells you about specific queries you already know are slow. Slow query logs catch queries you haven't noticed yet — the background jobs, the ORM queries generated at runtime, the reporting queries that run at 2 AM. Enable them in production with a low threshold to catch regressions before users do.
-- PostgreSQL: in postgresql.conf
log_min_duration_statement = 200 -- log queries slower than 200ms
log_statement = 'none' -- don't log all statements (too noisy)
log_line_prefix = '%t [%p]: [%l-1] '
-- Or per-session (no restart required):
SET log_min_duration_statement = 500;
-- MySQL: in my.cnf
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 0.5 -- 500ms threshold
log_queries_not_using_indexes = 1 -- also log full table scans
-- Analyze MySQL slow query log with pt-query-digest (Percona Toolkit):
pt-query-digest /var/log/mysql/slow.log | head -100SQL Pagination Optimization for Scalable Applications
Pagination is one of the most frequently written SQL patterns and one of the most frequently mis-optimized. The default OFFSET/LIMIT approach works for page 1 but becomes catastrophically slow by page 1,000.
Why OFFSET/LIMIT Pagination Fails at Scale ⚡ High Impact
OFFSET is not a "skip" — it is a "read and discard." The database must find, read, and count every row from position 1 to position N before returning your N+1 through N+20. On page 1, this is fast. On page 10,000 (OFFSET 200,000), the database reads and discards 200,000 rows every time a user clicks "next."
-- Page 1: reads 20 rows + fetches 20 — fast
SELECT id, title FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 0;
-- Page 100: reads and discards 2,000 rows + fetches 20 — slow
SELECT id, title FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 2000;
-- Page 10,000: reads and discards 200,000 rows + fetches 20 — very slow
SELECT id, title FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 200000;
-- EXPLAIN shows: always a Seq Scan regardless of the ORDER BY index
-- because the optimizer can't skip pages with OFFSETUse Keyset (Cursor) Pagination for Constant-Time Pagination ⚡ High Impact
Keyset pagination (also called cursor pagination) uses the last-seen value from the previous page as a WHERE filter instead of an OFFSET. The database uses an index to find the starting position in O(log n) time — the same cost whether you're on page 1 or page 100,000.
-- Keyset pagination: uses the last seen id from previous page
-- Page 1 (first load):
SELECT id, title, created_at
FROM posts
ORDER BY id DESC
LIMIT 20;
-- Returns posts with ids: 5000, 4999, 4998 ... 4981. Last id = 4981.
-- Page 2 (cursor = last id from page 1):
SELECT id, title, created_at
FROM posts
WHERE id < 4981 -- index seek to start position
ORDER BY id DESC
LIMIT 20;
-- Returns next 20 posts. Same performance as page 1.
-- Page 10,000: identical performance to page 1
SELECT id, title, created_at
FROM posts
WHERE id < :last_cursor -- parameterized cursor value
ORDER BY id DESC
LIMIT 20;| Approach | Page 1 Speed | Page 1,000 Speed | Arbitrary Jump | Data Freshness |
|---|---|---|---|---|
| OFFSET/LIMIT | Fast | Very slow (linear) | Yes | Consistent |
| Keyset/Cursor | Fast | Fast (constant) | No | Stable cursor — skips inserted rows |
Diagnosing and Fixing the N+1 Query Problem
The N+1 query problem is the most common cause of API endpoints that seem fast in development but collapse under production load. Understanding it is essential for every developer using an ORM.
Identify and Eliminate N+1 Query Patterns ⚡ High Impact
N+1 occurs when application code fetches a list of N records, then makes an additional database query for each record to load related data. In development with 10 records, this means 11 queries — barely noticeable. In production with 1,000 records per page, it means 1,001 database round trips for one API response.
-- The N+1 pattern (pseudocode of what ORMs do implicitly):
posts = db.query("SELECT id, title FROM posts LIMIT 100")
-- 1 query above, then N more:
for post in posts:
post.comments = db.query(
"SELECT * FROM comments WHERE post_id = ?", post.id
)
-- Total: 1 + 100 = 101 queries. For 1000 posts: 1001 queries.
-- THE FIX: one JOIN query replaces all N+1 queries
SELECT
p.id AS post_id,
p.title,
c.id AS comment_id,
c.body AS comment_body,
c.created_at AS comment_date
FROM posts p
LEFT JOIN comments c ON c.post_id = p.id
ORDER BY p.id, c.created_at
LIMIT 100;
-- 1 query. 1 round trip. All data returned together.include: { comments: true }. Sequelize: include: [Comment]. Rails: .includes(:comments). Django: .prefetch_related('comments'). The ORM generates a single JOIN (or two queries) instead of N individual queries.Batch Queries with IN() Instead of Looping ⚡ High Impact
Even when you can't use a JOIN (different data sources, multiple microservices), you can almost always batch ID lookups using IN() instead of making N individual queries. One query fetching 100 IDs beats 100 individual round trips by an order of magnitude.
-- BAD: N queries, one per user (100 users = 100 round trips)
for user_id in user_ids:
user = db.query("SELECT id, email FROM users WHERE id = ?", user_id)
-- GOOD: one query for all users (1 round trip regardless of N)
SELECT id, email, first_name
FROM users
WHERE id IN (1, 2, 3, 4, 5, ..., 100);
-- In application code (Python/Node example):
user_ids = [1, 2, 3, ..., 100]
placeholders = ','.join(['?'] * len(user_ids))
users = db.query(f"SELECT id, email FROM users WHERE id IN ({placeholders})", user_ids)
-- Pagination of large IN() lists: batch in chunks of 500-1000
-- IN() with 10,000 IDs can hit query size limits and plan cache missesAdvanced SQL Optimization Techniques for Production Systems
These patterns go beyond individual query tuning — they address architectural and schema-level optimization that delivers sustained performance gains as data scales. For deeper exploration of CTEs, see the CTE guide.
Query Caching — Application-Level vs Database-Level ⚡ Medium Impact
Database-level query caches (MySQL 5.7's deprecated query cache, PostgreSQL's lack of one) are insufficient for production systems. Application-level caching with Redis is the standard approach: cache the results of expensive, frequently-read, rarely-changing queries with an appropriate TTL.
-- Example: cache a heavy dashboard aggregation in Redis
-- Instead of running this on every dashboard load:
SELECT
DATE_TRUNC('day', created_at) AS day,
COUNT(*) AS orders,
ROUND(SUM(total), 2) AS revenue
FROM orders
WHERE status = 'completed'
AND created_at >= NOW() - INTERVAL '30 days'
GROUP BY day
ORDER BY day;
-- Cache pattern (pseudocode):
cache_key = "dashboard:revenue:last_30_days"
cached = redis.get(cache_key)
if cached:
return json.parse(cached)
result = db.query(above_sql)
redis.setex(cache_key, 300, json.stringify(result)) -- cache for 5 minutes
return result
-- Invalidation: set cache TTL based on how stale data can acceptably be.
-- For real-time dashboards: 30–60s TTL
-- For daily reports: 1 hour TTL
-- For reference data: 24 hours TTLMaterialized Views for Expensive Pre-computed Aggregations ⚡ High Impact
A materialized view stores the result of a query as a physical table. Querying the materialized view reads the pre-computed result rather than running the expensive aggregation every time. This trades query speed for refresh delay — the view shows data as of the last refresh.
-- Create a materialized view for expensive revenue aggregation
-- PostgreSQL:
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT
DATE_TRUNC('day', o.created_at) AS day,
p.category,
COUNT(DISTINCT o.id) AS orders,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.status = 'completed'
GROUP BY 1, 2;
-- Add an index to the materialized view for fast lookups:
CREATE INDEX ON daily_revenue (day, category);
-- Querying the view is now instant (reads pre-stored data):
SELECT day, revenue FROM daily_revenue
WHERE category = 'Electronics'
AND day >= NOW() - INTERVAL '7 days';
-- Refresh options:
REFRESH MATERIALIZED VIEW daily_revenue; -- locks view
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue; -- no lock (PostgreSQL 9.4+)
-- Schedule via pg_cron, a cron job, or application logicTable Partitioning for Very Large Tables ⚡ High Impact
Partitioning splits a large table into smaller physical sub-tables (partitions) while maintaining a unified logical interface. When a query includes a filter on the partition key, the optimizer reads only the relevant partition(s) — partition pruning — dramatically reducing I/O on tables with hundreds of millions of rows.
-- PostgreSQL: range partition a large events table by month
CREATE TABLE events (
id BIGSERIAL,
user_id INT NOT NULL,
event_type VARCHAR(50),
created_at TIMESTAMPTZ NOT NULL,
payload JSONB
) PARTITION BY RANGE (created_at);
-- Create monthly partitions:
CREATE TABLE events_2025_01 PARTITION OF events
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE events_2025_02 PARTITION OF events
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- ... add monthly partitions via automation
-- Query automatically prunes to relevant partition:
SELECT COUNT(*) FROM events
WHERE created_at >= '2025-06-01' AND created_at < '2025-07-01';
-- Reads ONLY the June partition — ignores all others
-- Also useful: drop old partitions without expensive DELETE:
DROP TABLE events_2023_01; -- instant, no table lockStrategic Denormalization for Read-Heavy Workloads ⚡ Medium Impact
Database normalization eliminates redundancy and ensures data integrity — correct for write-heavy OLTP systems. But for read-heavy reporting, analytics, and public-facing APIs, strategic denormalization stores pre-computed or redundant data to avoid expensive runtime JOINs.
-- Normalized (3NF): requires JOIN to get username with a post
SELECT p.title, u.username FROM posts p JOIN users u ON u.id = p.user_id;
-- Denormalized: store username directly in posts
-- Trade: redundant data, but zero JOIN needed
ALTER TABLE posts ADD COLUMN author_username VARCHAR(50);
UPDATE posts p SET author_username = (
SELECT username FROM users WHERE id = p.user_id
);
SELECT title, author_username FROM posts; -- no JOIN needed
-- Common denormalization patterns:
-- Counter caches: posts.comment_count (updated via trigger or app logic)
-- Aggregated totals: orders.item_count, users.total_spend
-- Embedded foreign key names: order_items.product_name (snapshot at time of order)
-- When to denormalize:
-- 1. The JOIN is expensive and the data changes rarely
-- 2. Read volume is much higher than write volume (10:1 or more)
-- 3. The denormalized field is a snapshot (historical accuracy trumps live accuracy)UNION ALL vs UNION — Always Prefer UNION ALL When Duplicates Are Acceptable ⚡ Medium Impact
UNION deduplicates the result by running an implicit DISTINCT — which requires sorting or hashing all results. UNION ALL skips deduplication entirely and is always faster. Use UNION ALL unless you specifically need to eliminate duplicates across the combined result sets.
-- UNION: sorts + deduplicates all results (expensive for large sets)
SELECT user_id FROM orders WHERE status = 'completed'
UNION
SELECT user_id FROM payments WHERE status = 'success';
-- UNION ALL: no sort, no dedup — significantly faster
SELECT user_id FROM orders WHERE status = 'completed'
UNION ALL
SELECT user_id FROM payments WHERE status = 'success';
-- If you need unique IDs from UNION ALL, use a CTE + DISTINCT:
WITH combined AS (
SELECT user_id FROM orders WHERE status = 'completed'
UNION ALL
SELECT user_id FROM payments WHERE status = 'success'
)
SELECT DISTINCT user_id FROM combined;
-- Often faster than UNION because you control where the dedup happensCOUNT(*) Optimization — When Approximate Is Fine ⚡ Medium Impact
COUNT(*) on a large table without a WHERE clause requires a full table scan (in PostgreSQL). For UI elements showing "approximately 3.2 million products," an exact count is unnecessary and expensive. Use table statistics for approximate counts.
-- Exact COUNT(*): full table scan on large tables (PostgreSQL)
SELECT COUNT(*) FROM products; -- might take seconds on 50M rows
-- Fast approximate count from PostgreSQL statistics (milliseconds):
SELECT reltuples::BIGINT AS approx_count
FROM pg_class
WHERE relname = 'products';
-- MySQL: approximate count from information_schema
SELECT TABLE_ROWS FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'products';
-- For exact counts with WHERE: use partial indexes
-- "How many pending orders?" — with a partial index, this is instant:
CREATE INDEX idx_orders_pending ON orders (id) WHERE status = 'pending';
SELECT COUNT(*) FROM orders WHERE status = 'pending';
-- PostgreSQL uses the partial index — reads only the small index, not the full table
-- COUNT(*) vs COUNT(column) vs COUNT(DISTINCT column):
-- COUNT(*) — counts all rows (including NULL) — use this
-- COUNT(column) — excludes NULL rows for that column
-- COUNT(DISTINCT x) — unique non-NULL values — expensive, sorts all valuesData Type Optimization for Smaller Indexes and Faster Queries
Choosing the right data type is a schema-level decision that affects index size, sort performance, comparison speed, and storage cost. These choices compound at scale: a column change on a 500 million row table is a multi-hour migration.
Choose the Right Data Types to Minimize Storage and Index Size ⚡ Medium Impact
Smaller data types produce smaller indexes. Smaller indexes fit more entries per disk page, require fewer I/O reads to traverse, and consume less memory in the buffer cache. Make type choices carefully at schema design time — changing them in production requires full table rewrites.
| Scenario | Wrong Type | Right Type | Savings |
|---|---|---|---|
| Small integer (0–65535) | INT (4 bytes) | SMALLINT (2 bytes) | 50% per value, significant in high-cardinality indexes |
| Row count fits in 2B | BIGINT (8 bytes) | INT (4 bytes) | 50% per value — use BIGINT only when you need >2B rows |
| Phone number | INT | VARCHAR(20) | Avoids silent truncation of leading zeros and + prefix |
| Price / monetary value | FLOAT/DOUBLE | DECIMAL(10,2) | Prevents floating point rounding errors ($9.99 → $9.989999...) |
| UUIDs as primary key | UUID (16 bytes) | BIGSERIAL (8 bytes) | UUIDs fragment B-tree indexes; sequential IDs fill pages cleanly |
| Status with few values | VARCHAR(20) | ENUM or SMALLINT | ENUM: constraints enforced at DB level; INT: smaller, faster compare |
Use Partial Indexes for Subset Queries ⚡ High Impact
A partial index includes only rows that match a condition. For a table of 10 million orders where 95% are "completed" and 5% are "pending," a partial index on pending orders is tiny (500K entries) compared to a full index (10M entries). Queries filtering on pending orders use the smaller, faster partial index.
-- Full index: 10 million entries, most of which are "completed" orders
CREATE INDEX idx_orders_status ON orders (status);
-- Partial index: only ~500K entries (5% of table)
-- Queries filtering on pending orders use this tiny, fast index
CREATE INDEX idx_orders_pending_created ON orders (created_at)
WHERE status = 'pending';
-- This query uses the partial index:
SELECT id, user_id, total
FROM orders
WHERE status = 'pending'
AND created_at >= NOW() - INTERVAL '7 days';
-- Soft deletes pattern: index only non-deleted rows
CREATE INDEX idx_users_active ON users (email)
WHERE deleted_at IS NULL;
-- Query: instant lookup on active users, ignores deleted rows' index overhead
SELECT id FROM users WHERE email = 'alice@example.com' AND deleted_at IS NULL;Scalable Database Architecture for High-Traffic Applications
Individual query optimization has limits. At scale, architectural patterns — connection pooling and read replicas — become the most impactful levers for database performance.
Database Connection Pooling Is Non-Negotiable at Scale ⚡ High Impact
Opening a database connection is expensive — it involves TCP handshake, authentication, and session initialization, taking 10–50ms per connection. A SaaS application that opens a new connection per API request at 1,000 req/s would attempt to open 1,000 connections per second to the database. Connection poolers maintain a pool of live connections and reuse them across application requests.
-- PostgreSQL max_connections is a hard limit (default: 100)
-- Each connection consumes ~5–10MB RAM in the database process
-- At 500 connections, PostgreSQL uses 2.5–5GB RAM just for connections
-- Solution: PgBouncer in transaction pooling mode
-- 10,000 app connections → 50 actual database connections
-- Connection overhead drops from 50ms to <1ms (pool reuse)
-- PgBouncer configuration (pgbouncer.ini):
[databases]
mydb = host=localhost dbname=mydb
[pgbouncer]
listen_port = 6432
pool_mode = transaction -- connection released after each transaction
max_client_conn = 10000 -- app connections (up to 10K)
default_pool_size = 50 -- actual DB connections (50 max)
min_pool_size = 10
server_idle_timeout = 600
-- Application connects to PgBouncer on port 6432, not PostgreSQL on 5432
-- AWS RDS Proxy provides a managed equivalent for RDS/Aurora
-- Optimal pool size formula (rule of thumb):
-- pool_size = (num_cores * 2) + effective_spindle_count
-- For a 4-core database server: pool_size ≈ 9–12Read Replicas — Scale Read Traffic Without Scaling the Primary ⚡ High Impact
A read replica is a continuously-synchronized copy of the primary database that handles SELECT queries. By routing reporting, analytics, search, and dashboard queries to replicas, you free the primary to handle writes and latency-sensitive reads. Most managed databases (AWS RDS, Google Cloud SQL, PlanetScale, Neon) support read replicas out of the box.
-- Connection pattern (Node.js example with two pools):
const primaryPool = new Pool({ host: 'primary.db.internal', /* writes */ });
const replicaPool = new Pool({ host: 'replica.db.internal', /* reads */ });
// Route reads to replica, writes to primary:
async function getOrders(userId) {
return replicaPool.query(
'SELECT * FROM orders WHERE user_id = $1', [userId]
);
}
async function createOrder(data) {
return primaryPool.query(
'INSERT INTO orders (user_id, total) VALUES ($1, $2)', [data.userId, data.total]
);
}
-- Route reporting queries explicitly to replica:
-- Slow aggregation report: runs on replica, doesn't impact primary OLTP latency
SELECT category, SUM(quantity * unit_price) AS revenue
FROM order_items oi
JOIN products p ON p.id = oi.product_id
JOIN orders o ON o.id = oi.order_id
WHERE o.status = 'completed'
GROUP BY category;PostgreSQL, MySQL, SQL Server, and SQLite Performance Tips
Core SQL optimization principles apply universally, but each database engine has specific behaviors, tools, and configuration options worth knowing.
PostgreSQL Performance Tuning
PostgreSQL is the most feature-rich open-source database and provides the best tooling for query analysis. Key PostgreSQL-specific optimizations:
-- 1. VACUUM ANALYZE after bulk operations (refreshes statistics, reclaims bloat)
VACUUM ANALYZE orders;
-- 2. Tune work_mem for complex sort and hash operations (per-query, per-operation)
SET work_mem = '64MB'; -- allows sorts/hashes to stay in memory
-- 3. pg_stat_statements: find the slowest queries by total execution time
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 20;
-- 4. Parallel query execution (PostgreSQL 10+)
SET max_parallel_workers_per_gather = 4; -- use 4 workers for large seq scans
-- 5. BRIN indexes for append-only large tables (log tables, time-series)
-- Much smaller than B-tree, works well when data is physically ordered by insert time
CREATE INDEX idx_logs_created_brin ON logs USING BRIN (created_at);
-- 6. GIN indexes for JSONB, full-text search, and array containment
CREATE INDEX idx_events_payload ON events USING GIN (payload);
-- Enables fast: WHERE payload @> '{"type": "click"}'MySQL / MariaDB Performance Tuning
MySQL uses the InnoDB storage engine with clustered primary key indexes — a key architectural difference from PostgreSQL that affects optimization strategies:
-- 1. InnoDB clusters data by primary key — sequential inserts are critical
-- UUID primary keys cause random index insertion and severe fragmentation
-- Use BIGINT AUTO_INCREMENT for high-insert tables:
ALTER TABLE events MODIFY id BIGINT UNSIGNED AUTO_INCREMENT;
-- 2. Check query execution type with EXPLAIN:
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- type: "ALL" = full table scan
-- type: "ref" = index lookup (good)
-- type: "eq_ref" = unique index (best)
-- 3. Enable the performance_schema for query analysis:
SELECT digest_text, count_star, avg_timer_wait/1e12 AS avg_sec
FROM performance_schema.events_statements_summary_by_digest
ORDER BY avg_timer_wait DESC LIMIT 10;
-- 4. innodb_buffer_pool_size: most important MySQL config (set to 70-80% of RAM)
-- In my.cnf:
-- innodb_buffer_pool_size = 12G (for 16GB RAM server)
-- 5. MySQL covering index: all selected columns must be in the index key
-- (MySQL doesn't support INCLUDE columns like PostgreSQL/SQL Server)
CREATE INDEX idx_covering ON orders (user_id, status, total, created_at);SQL Server Performance Tuning
SQL Server has rich built-in tooling for performance analysis, including the Query Store introduced in SQL Server 2016:
-- 1. Query Store: tracks query performance over time, catches plan regressions
ALTER DATABASE mydb SET QUERY_STORE = ON;
-- View top resource consumers:
SELECT TOP 20
qt.query_sql_text,
rs.avg_duration,
rs.avg_logical_io_reads
FROM sys.query_store_query_text qt
JOIN sys.query_store_query q ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id
ORDER BY rs.avg_duration DESC;
-- 2. Covering index with INCLUDE (SQL Server syntax):
CREATE INDEX idx_orders_covering
ON orders (user_id, status)
INCLUDE (total, created_at, order_date); -- included columns in leaf pages
-- 3. Columnstore indexes for analytical workloads (OLAP)
-- Compresses data column-by-column; aggregate queries 10-100x faster
CREATE COLUMNSTORE INDEX idx_orders_cs ON orders (status, total, created_at);
-- 4. Read Committed Snapshot Isolation (RCSI): prevents reader-writer blocking
ALTER DATABASE mydb SET READ_COMMITTED_SNAPSHOT ON;SQLite Performance Tuning
SQLite is the most widely deployed database engine in the world (every Android and iOS device). It excels at embedded use cases but needs proper configuration for performance:
-- 1. Enable WAL mode: allows concurrent reads during writes
PRAGMA journal_mode = WAL;
-- 2. Set appropriate cache size (default is very small)
PRAGMA cache_size = -65536; -- 64MB cache (negative = kilobytes)
-- 3. Synchronous setting: OFF for max speed (data loss risk), NORMAL for balance
PRAGMA synchronous = NORMAL;
-- 4. Wrap bulk inserts in a transaction (100x faster than auto-commit inserts)
BEGIN TRANSACTION;
INSERT INTO logs VALUES (...);
INSERT INTO logs VALUES (...);
-- ... thousands of inserts
COMMIT;
-- 5. ANALYZE to update statistics (important after bulk data loads)
ANALYZE;
-- 6. SQLite doesn't support EXPLAIN ANALYZE, but EXPLAIN QUERY PLAN is available:
EXPLAIN QUERY PLAN
SELECT * FROM users WHERE email = 'alice@example.com';
-- Shows: SEARCH users USING INDEX idx_users_email (email=?) ← index used
-- Or: SCAN users ← full table scanTest Your SQL Performance Knowledge
6 multiple-choice questions. Click an option to reveal the answer and explanation.
What is the primary performance benefit of avoiding SELECT *?
For composite index (user_id, created_at, status), which WHERE clause can efficiently use it?
What does 'Seq Scan' in a PostgreSQL EXPLAIN output indicate?
Your ORM fetches 100 blog posts, then runs a separate query per post to get its comments. This is called:
Why does OFFSET 100000 LIMIT 20 perform poorly on large tables?
When is EXISTS significantly faster than IN with a correlated subquery?
Frequently Asked Questions
Ready to Apply These Optimizations?
Practice EXPLAIN, write optimized JOINs, and test your indexes in the SQLab Hub free SQL playground — real queries, real data, no signup needed.