SQL window functions are the most powerful analytical tool in the SQL language — and the feature that separates junior analysts from senior data engineers. They let you compute rankings, running totals, period-over-period comparisons, and moving averages without losing a single row from your result set. No complex self-joins. No correlated subqueries running once per row. No temporary tables. This guide covers every SQL window function from first principles to advanced technique. You will find the complete OVER() and PARTITION BY syntax explained from scratch, detailed examples of ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, FIRST_VALUE, and LAST_VALUE, real-world analytics patterns for e-commerce and HR, performance optimization advice with indexing strategy, database-specific notes for MySQL 8, PostgreSQL, and SQL Server, 6 SQL window function interview questions with full answers, a common-errors reference, and a complete FAQ section with schema markup. Every example uses the SQLab Hub schema — users, orders, products, employees — so you can run every query immediately in the free SQL playground with zero setup.
SQL window functions perform calculations across a set of rows related to the current row — called a "window" — without collapsing those rows into a single output row. This is the defining difference from GROUP BY: window functions return one row for every input row, with the computed value attached as an additional column.
Think of it this way: GROUP BY asks "give me one summary row per group." A window function asks "for each row, also tell me something about its group." The result set stays intact; the computed value rides alongside the original data.
Every window function uses the same syntax pattern:
function_name() OVER (
[PARTITION BY column(s)]
[ORDER BY column(s)]
[ROWS|RANGE frame_specification]
)
The OVER() clause is what makes a function a window function. Without OVER(), SUM() is a regular aggregate that collapses rows. With OVER(), SUM() becomes a running total or group sum that appears on every row.Before window functions, computing "each employee's salary compared to their department average" required a correlated subquery — a subquery that re-executes once per outer row:
-- Old approach: correlated subquery (runs once per employee row)
SELECT e.first_name, e.salary,
(SELECT AVG(salary) FROM employees e2
WHERE e2.department = e.department) AS dept_avg
FROM employees e;
-- Modern approach: window function (single pass)
SELECT first_name, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
-- On a 50,000-row employees table:
-- Correlated subquery: 50,000 inner query executions
-- Window function: one scan + one sort
-- Window function is typically 10-100x faster at scale
-- The window version also composes — add more window functions
-- without additional scans:
SELECT first_name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg,
MAX(salary) OVER (PARTITION BY department) AS dept_max,
salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;-- GROUP BY: 4 departments → 4 rows in the result
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
-- dept avg_salary
-- Engineering 87500
-- Marketing 68000
-- Sales 61000
-- Finance 79000
-- Window function: 20 employees → 20 rows in the result
-- Each employee row includes the department average alongside their own data
SELECT first_name, department, salary,
ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS dept_avg,
salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees
ORDER BY department, salary DESC;
-- first_name department salary dept_avg diff_from_avg
-- Alice Engineering 95000 87500 +7500
-- Frank Engineering 88000 87500 +500
-- Bob Engineering 80000 87500 -7500
-- Carol Marketing 72000 68000 +4000
-- Dave Marketing 64000 68000 -4000
-- You cannot produce this output with GROUP BY alone.
-- Individual salary disappears the moment you aggregate.
-- Window functions preserve all rows while adding group context.| Use Case | Function(s) | Industry |
|---|---|---|
| Rank employees by salary within department | RANK(), DENSE_RANK() | HR / Payroll |
| Deduplicate — keep the latest record per user | ROW_NUMBER() | Data Engineering |
| Month-over-month revenue change | LAG() | Finance / BI |
| 7-day rolling average of daily signups | AVG() OVER (ROWS 6 PRECEDING) | Product Analytics |
| Cumulative revenue running total | SUM() OVER (ORDER BY date) | E-commerce |
| Each product's revenue as % of category total | SUM() OVER (PARTITION BY category) | Retail |
| Top 3 customers per region | ROW_NUMBER() + CTE filter | Sales |
| First and last purchase date per customer | FIRST_VALUE(), LAST_VALUE() | CRM |
| Divide users into spend quartiles | NTILE(4) | Marketing |
| Days between consecutive orders per customer | LAG() on order_date | Logistics |
An empty OVER() means the window is the entire result set. Every row sees the same aggregate value — useful for computing percentages of a grand total:
-- What percentage of total payroll is each employee's salary?
SELECT first_name, salary,
SUM(salary) OVER () AS total_payroll,
ROUND(salary * 100.0 / SUM(salary) OVER (), 1) AS pct_of_total
FROM employees;
-- first_name salary total_payroll pct_of_total
-- Alice 95000 435000 21.8
-- Frank 88000 435000 20.2
-- Bob 80000 435000 18.4
-- Carol 72000 435000 16.6
-- Dave 64000 435000 14.7
-- Eve 36000 435000 8.3
-- SUM(salary) OVER () computes once and attaches the same value to every row.
-- No GROUP BY needed. Individual rows are preserved.PARTITION BY divides the result set into independent groups. The window function computes separately within each partition, as if each group were a separate table:
-- Department-level statistics alongside each employee row
SELECT first_name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg,
COUNT(*) OVER (PARTITION BY department) AS dept_headcount,
MAX(salary) OVER (PARTITION BY department) AS dept_max
FROM employees
ORDER BY department, salary DESC;
-- The window resets completely for each department.
-- Partitioning by multiple columns is supported:
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY department, job_level
ORDER BY salary DESC
) AS rank_in_dept_level
FROM employees;
-- This creates separate windows per (department, job_level) combination.
-- A row in Engineering/Senior is ranked independently from Engineering/Junior.ORDER BY inside OVER() controls the order rows are processed within the window — it is completely independent of the query-level ORDER BY:
-- hire_sequence: 1 for the earliest hire, 2 for the second, etc.
-- cumulative_payroll: grows as each employee is added chronologically
SELECT first_name, hire_date, salary,
ROW_NUMBER() OVER (ORDER BY hire_date) AS hire_sequence,
SUM(salary) OVER (ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_payroll
FROM employees
ORDER BY hire_date; -- this query-level ORDER BY controls display order only
-- KEY BEHAVIOR: when ORDER BY is present inside OVER(),
-- the default frame changes from "entire partition" to
-- "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
-- This is what makes SUM() a running total instead of a grand total.
-- To get a grand total on every row when ORDER BY is present,
-- you must explicitly override the frame:
SELECT salary,
SUM(salary) OVER (ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS always_grand_total
FROM employees;The frame clause defines which rows within the partition are included relative to the current row. Mastering this is what separates intermediate from advanced SQL:
-- Frame clause syntax:
-- ROWS|RANGE BETWEEN start_boundary AND end_boundary
-- Boundary options:
-- UNBOUNDED PRECEDING — from the first row of the partition
-- N PRECEDING — N rows (ROWS) or N value-units (RANGE) before current
-- CURRENT ROW — the current row itself
-- N FOLLOWING — N rows/units after current
-- UNBOUNDED FOLLOWING — to the last row of the partition
-- 3-row moving average (current row + 2 rows before it)
SELECT order_date, daily_revenue,
ROUND(AVG(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2) AS three_day_avg
FROM daily_sales;
-- ROWS vs RANGE distinction:
-- ROWS: counts physical rows — predictable, explicit
-- RANGE: includes all rows with the same ORDER BY value as ties
-- If three rows share the same date, RANGE treats them as "current row"
-- and includes all three in calculations — often unexpected behavior
-- For time series work, ROWS BETWEEN N PRECEDING AND CURRENT ROW
-- is almost always the correct choice.ROW_NUMBER() assigns a unique sequential integer to each row within its partition, starting at 1. Even if two rows are completely identical across all columns, they receive different row numbers. This guaranteed uniqueness distinguishes ROW_NUMBER() from RANK() and DENSE_RANK(), which assign the same number to tied rows. Use ROW_NUMBER() when you need exactly one row per group — for deduplication, pagination, or selecting the single top record. Use RANK() or DENSE_RANK() when tied rows should receive equal standing.
-- Syntax
ROW_NUMBER() OVER ([PARTITION BY col] [ORDER BY col])
-- Number every employee company-wide by salary (highest earns #1)
SELECT first_name, department, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS company_rank
FROM employees;
-- first_name department salary company_rank
-- Alice Engineering 95000 1
-- Frank Engineering 88000 2
-- Bob Engineering 80000 3
-- Carol Marketing 72000 4
-- Dave Marketing 64000 5
-- Eve Sales 61000 6
-- Even if Frank and Bob had the same salary they still receive
-- different row numbers. ROW_NUMBER() never produces ties.-- Rank within each department independently
SELECT first_name, department, salary,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS rank_in_dept
FROM employees
ORDER BY department, rank_in_dept;
-- first_name department salary rank_in_dept
-- Alice Engineering 95000 1
-- Frank Engineering 88000 2
-- Bob Engineering 80000 3
-- Carol Marketing 72000 1 ← resets to 1 for Marketing
-- Dave Marketing 64000 2
-- Eve Sales 61000 1 ← resets to 1 for Sales
-- PARTITION BY department causes the row number sequence to restart
-- independently for each department.-- A real ETL problem: users table has duplicate emails from a bad import
-- Keep only the most recently created record per email address
WITH deduped AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at DESC, id DESC -- tie-break on id
) AS rn
FROM users
)
SELECT id, first_name, email, created_at
FROM deduped
WHERE rn = 1;
-- How it works:
-- 1. PARTITION BY email groups all rows with the same email
-- 2. ORDER BY created_at DESC puts the newest record first (rn = 1)
-- 3. WHERE rn = 1 keeps only that newest record
-- 4. All duplicates (rn > 1) are filtered out
-- This is one of the most common production SQL patterns.
-- You will see it in ETL pipelines, data quality scripts,
-- and warehouse load jobs constantly.-- Find the top 2 highest-paid employees in each department
WITH ranked AS (
SELECT first_name, department, salary,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS rn
FROM employees
)
SELECT first_name, department, salary
FROM ranked
WHERE rn <= 2
ORDER BY department, salary DESC;
-- This is the "top N per group" pattern.
-- A simple ORDER BY + LIMIT cannot do this — LIMIT applies to the entire
-- query result, not per department.
-- ROW_NUMBER() inside a CTE + WHERE on the outer query is the solution.
-- For pagination (page 2 of 10 results per page):
WHERE rn BETWEEN 11 AND 20
-- If ties should ALL be included at the boundary, use DENSE_RANK instead:
-- WHERE dr <= 2 includes all employees tied for 2nd place-- WRONG: window functions are evaluated AFTER WHERE, so this fails SELECT * FROM employees WHERE ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) = 1; -- SQL Error: window functions are not allowed in WHERE clause -- CORRECT pattern 1: CTE (recommended for readability) WITH r AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn FROM employees ) SELECT * FROM r WHERE rn = 1; -- CORRECT pattern 2: subquery (equivalent, less readable) SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn FROM employees ) ranked WHERE rn = 1; -- The same rule applies to HAVING — window functions cannot appear there either.
Given employees with salaries 95000, 88000, 88000, 72000 — the same salary values produce different outputs depending on which function you choose:
| Function | Output for 95k, 88k, 88k, 72k | Ties | Gaps after ties |
|---|---|---|---|
| ROW_NUMBER() | 1, 2, 3, 4 | Never — always unique | No |
| RANK() | 1, 2, 2, 4 | Same rank for ties | Yes — skips 3 |
| DENSE_RANK() | 1, 2, 2, 3 | Same rank for ties | No — always consecutive |
Decision guide: Use ROW_NUMBER() for deduplication and top-N when exactly one row per group is needed. Use RANK() for leaderboards where a rank of 4 means "three people beat you." Use DENSE_RANK() for prize tiers, salary bands, and the classic "find the Nth highest salary" interview question.
SELECT name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees
ORDER BY salary DESC;
-- name salary row_num rnk dense_rnk
-- Alice 95000 1 1 1
-- Frank 88000 2 2 2
-- Bob 88000 3 2 2 ← same salary, same rank
-- Carol 72000 4 4 3 ← RANK gaps to 4; DENSE stays at 3
-- Dave 64000 5 5 4
-- RANK() rationale: Frank and Bob occupy positions 2 AND 3.
-- The 4th person (Carol) is truly 4th — two people beat her.
-- DENSE_RANK() rationale: there is no "3rd salary tier" — just 1st, 2nd, 3rd.-- Interview question: "Return all employees with a top-3 salary. -- Include everyone tied for 3rd place." -- Wrong approach: ROW_NUMBER arbitrarily cuts off one tied row WITH r AS ( SELECT *, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn FROM employees ) SELECT * FROM r WHERE rn <= 3; -- If two employees tie for 2nd at 88000, one is rn=2, one is rn=3. -- The employee at rn=4 (also 88000) is excluded — wrong. -- Correct approach: DENSE_RANK includes all tied rows WITH r AS ( SELECT *, DENSE_RANK() OVER (ORDER BY salary DESC) AS dr FROM employees ) SELECT * FROM r WHERE dr <= 3; -- All employees at the 3rd salary tier are included. -- The choice between ROW_NUMBER and DENSE_RANK is a classic -- interview follow-up question. Know when each is appropriate.
-- NTILE(n) assigns each row to one of n roughly equal buckets.
-- Bucket 1 contains the "top" rows by the ORDER BY column.
-- Segment customers into 4 spend quartiles
WITH customer_spend AS (
SELECT user_id, ROUND(SUM(total), 2) AS total_spend
FROM orders GROUP BY user_id
)
SELECT user_id, total_spend,
NTILE(4) OVER (ORDER BY total_spend DESC) AS spend_quartile
FROM customer_spend
ORDER BY total_spend DESC;
-- quartile 1 = top 25% spenders (highest value customers)
-- quartile 4 = bottom 25% spenders
-- Business use cases:
-- Email campaigns: different messaging per quartile
-- Loyalty programs: tier assignment (Gold/Silver/Bronze/Standard)
-- A/B test holdout groups: NTILE(10) to create 10% control group
-- Note: NTILE(100) creates percentiles; NTILE(10) creates deciles.
-- If rows don't divide evenly, earlier buckets receive one extra row.-- PERCENT_RANK(): relative rank as a fraction from 0.0 to 1.0
-- Formula: (rank - 1) / (total_rows - 1)
-- First row is always 0.0, last row is always 1.0
SELECT name, salary,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 3) AS pct_rank
FROM employees;
-- A pct_rank of 0.75 means this employee earns more than 75% of colleagues
-- CUME_DIST(): cumulative distribution — fraction of rows <= current
-- Formula: rank / total_rows
-- Always > 0; always reaches 1.0 for the highest value
SELECT name, salary,
ROUND(CUME_DIST() OVER (ORDER BY salary), 3) AS cume_dist
FROM employees;
-- A cume_dist of 0.8 means 80% of employees earn at or below this salary
-- Use PERCENT_RANK for relative positioning; use CUME_DIST for
-- "what fraction of the data falls at or below this value"LAG() and LEAD() are offset functions that access column values from other rows in the window without a self-join. LAG() looks backward (previous rows); LEAD() looks forward (following rows).
Full syntax:
LAG(column, offset, default_value) OVER (PARTITION BY ... ORDER BY ...)
LEAD(column, offset, default_value) OVER (PARTITION BY ... ORDER BY ...)
column the column or expression to read from the offset row
offset how many rows back (LAG) or forward (LEAD) to look (default: 1)
default_value what to return when no row exists at the offset
(the first row has no previous row; without a default, NULL is returned)
Both functions always require ORDER BY inside OVER() — "previous" and "next" are meaningless without defined ordering.WITH monthly AS (
SELECT
STRFTIME('%Y-%m', order_date) AS month,
ROUND(SUM(total), 2) AS revenue
FROM orders
GROUP BY month
)
SELECT
month,
revenue,
LAG(revenue, 1, 0) OVER (ORDER BY month) AS prev_month_revenue,
revenue - LAG(revenue, 1, 0) OVER (ORDER BY month) AS abs_change,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month))
* 100.0 / LAG(revenue) OVER (ORDER BY month),
1) AS pct_change
FROM monthly
ORDER BY month;
-- month revenue prev_revenue abs_change pct_change
-- 2025-01 42000 0 42000 NULL
-- 2025-02 51000 42000 +9000 +21.4%
-- 2025-03 48000 51000 -3000 -5.9%
-- 2025-04 63000 48000 +15000 +31.3%
-- LAG(revenue, 1, 0) returns 0 for the first month (no prior month).
-- LAG(revenue) alone returns NULL for the first month.
-- pct_change is NULL for January because there is nothing to compare against.-- For each order, show the customer's next order date and gap in days
SELECT
user_id,
order_date,
total,
LEAD(order_date) OVER (
PARTITION BY user_id
ORDER BY order_date
) AS next_order_date,
CAST(
JULIANDAY(LEAD(order_date) OVER (PARTITION BY user_id ORDER BY order_date))
- JULIANDAY(order_date)
AS INTEGER) AS days_to_next_order
FROM orders
ORDER BY user_id, order_date;
-- user_id order_date total next_order_date days_to_next_order
-- 1 2025-01-05 120 2025-01-28 23
-- 1 2025-01-28 85 2025-03-10 41
-- 1 2025-03-10 200 NULL NULL ← last order
-- 2 2025-02-14 55 2025-04-01 46
-- PARTITION BY user_id ensures LEAD only looks at the next order
-- from the SAME customer, not the next order in the entire table.
-- Without PARTITION BY, customer 1's last order would incorrectly
-- look at customer 2's first order as "next."-- Compare each month's revenue to the same month one year prior
-- LAG(revenue, 12) looks back 12 months in the ordered month list
WITH monthly AS (
SELECT STRFTIME('%Y-%m', order_date) AS month, SUM(total) AS revenue
FROM orders GROUP BY month
)
SELECT
month,
ROUND(revenue, 2) AS revenue,
ROUND(LAG(revenue, 12) OVER (ORDER BY month), 2) AS same_month_last_year,
ROUND(
(revenue - LAG(revenue, 12) OVER (ORDER BY month))
* 100.0 / LAG(revenue, 12) OVER (ORDER BY month),
1) AS yoy_pct
FROM monthly
ORDER BY month;
-- The second argument to LAG/LEAD is the offset.
-- LAG(col, 1) = previous row, LAG(col, 12) = 12 rows back.
-- On monthly data ordered by month, 12 rows back = same calendar month, prior year.
-- This is the standard pattern for seasonality-adjusted YoY reporting.-- TRAP: forgetting PARTITION BY on per-customer data
-- Without PARTITION BY, LAG crosses customer boundaries:
-- WRONG: no PARTITION BY
SELECT user_id, order_date, total,
LAG(total) OVER (ORDER BY order_date) AS prev_total
FROM orders;
-- The "previous" order for customer 2's first order is
-- customer 1's last order — completely wrong for customer-level analysis
-- CORRECT: partition by customer
SELECT user_id, order_date, total,
LAG(total) OVER (
PARTITION BY user_id
ORDER BY order_date
) AS prev_order_total
FROM orders;
-- Now "previous" always means previous order FOR THAT CUSTOMER
-- Second trap: LAG/LEAD without ORDER BY inside OVER()
-- Result is non-deterministic — always specify ORDER BY for these functions-- Compare each product's price to the cheapest in its category SELECT name, category, price, FIRST_VALUE(name) OVER (PARTITION BY category ORDER BY price ASC) AS cheapest_product, FIRST_VALUE(price) OVER (PARTITION BY category ORDER BY price ASC) AS min_price, price - FIRST_VALUE(price) OVER (PARTITION BY category ORDER BY price ASC) AS premium FROM products ORDER BY category, price; -- category name price cheapest_product min_price premium -- Laptops Entry Model 499 Entry Model 499 0 -- Laptops Mid Range 899 Entry Model 499 400 -- Laptops Pro Model 1499 Entry Model 499 1000 -- Phones Budget 299 Budget 299 0 -- Phones Flagship 999 Budget 299 700 -- FIRST_VALUE always works correctly with the default frame because -- UNBOUNDED PRECEDING always reaches back to the partition's first row.
-- LAST_VALUE() has a notorious gotcha.
-- Default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- "Last row in frame" = CURRENT ROW — not the partition's last row!
-- WRONG: returns each row's OWN value, not the partition's last value
SELECT name, category, price,
LAST_VALUE(name) OVER (PARTITION BY category ORDER BY price) AS most_expensive
FROM products;
-- Every row returns its own name — LAST_VALUE is broken without frame fix
-- CORRECT: extend the frame to cover the entire partition
SELECT name, category, price,
LAST_VALUE(name) OVER (
PARTITION BY category
ORDER BY price
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS most_expensive_in_category
FROM products;
-- Now "last row" = actual last row of the partition = highest price product
-- SIMPLEST ALTERNATIVE: avoid LAST_VALUE entirely.
-- Use FIRST_VALUE with DESC ordering — same result, no frame complexity:
SELECT name, category, price,
FIRST_VALUE(name) OVER (PARTITION BY category ORDER BY price DESC) AS most_expensive
FROM products;-- NTH_VALUE(column, n) returns the value from the nth row of the window
-- Useful for accessing 2nd place, 3rd place, etc.
-- Show each employee alongside the 2nd highest salary in their department
SELECT
first_name, department, salary,
NTH_VALUE(salary, 1) OVER (
PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS dept_1st_salary,
NTH_VALUE(salary, 2) OVER (
PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS dept_2nd_salary
FROM employees;
-- Like LAST_VALUE, NTH_VALUE requires UNBOUNDED FOLLOWING in the frame
-- to correctly access rows that come after the current row.
-- Supported in PostgreSQL, MySQL 8.0.2+, SQL Server, SQLite.-- Cumulative revenue over time
SELECT
order_date,
ROUND(daily_revenue, 2) AS daily_revenue,
ROUND(SUM(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
), 2) AS cumulative_revenue
FROM (
SELECT order_date, SUM(total) AS daily_revenue
FROM orders GROUP BY order_date
) daily
ORDER BY order_date;
-- order_date daily_revenue cumulative_revenue
-- 2025-01-01 4200 4200
-- 2025-01-02 6800 11000
-- 2025-01-03 3100 14100
-- 2025-01-04 8900 23000
-- The frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- is the default when ORDER BY is present, so this shorthand also works:
SUM(daily_revenue) OVER (ORDER BY order_date)
-- Both produce identical running totals.-- 7-day rolling average of daily revenue (smooths daily noise)
SELECT
order_date,
ROUND(daily_revenue, 2) AS daily_revenue,
ROUND(AVG(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS seven_day_avg
FROM (
SELECT order_date, SUM(total) AS daily_revenue
FROM orders GROUP BY order_date
) daily
ORDER BY order_date;
-- For the first 6 days, fewer than 7 rows are available;
-- AVG uses only the existing rows (not padded with zeros).
-- This is correct behavior — a "partial window" at the start.
-- Real-world applications:
-- Finance: 50-day and 200-day moving average of stock prices
-- Product: 7-day rolling active users (WAU)
-- E-commerce: 30-day rolling revenue for trend lines
-- Operations: hourly rolling error rate monitoring-- Each month's revenue as % of annual total, plus cumulative %
WITH monthly AS (
SELECT STRFTIME('%Y-%m', order_date) AS month,
ROUND(SUM(total), 2) AS revenue
FROM orders
WHERE STRFTIME('%Y', order_date) = '2025'
GROUP BY month
)
SELECT
month,
revenue,
SUM(revenue) OVER () AS annual_total,
ROUND(revenue * 100.0 / SUM(revenue) OVER (), 1) AS pct_of_year,
ROUND(SUM(revenue) OVER (ORDER BY month)
* 100.0 / SUM(revenue) OVER (), 1) AS cumulative_pct
FROM monthly
ORDER BY month;
-- month revenue annual_total pct_of_year cumulative_pct
-- 2025-01 42000 525000 8.0% 8.0%
-- 2025-02 51000 525000 9.7% 17.7%
-- 2025-03 48000 525000 9.1% 26.9%
-- 2025-12 68000 525000 13.0% 100.0%
-- SUM() OVER () = grand total on every row
-- SUM() OVER (ORDER BY month) = running total that grows each month-- Find the Nth highest distinct salary (N = 3 in this example)
WITH ranked AS (
SELECT DISTINCT salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
FROM employees
)
SELECT salary AS third_highest_salary
FROM ranked
WHERE dr = 3;
-- Why DENSE_RANK and not LIMIT/OFFSET?
-- DENSE_RANK handles ties correctly:
-- If two employees share the top salary, they're both rank 1.
-- The 3rd highest is then the row with dr = 3, not necessarily the 3rd row.
-- Full department breakdown with rankings:
SELECT
first_name, department, salary,
ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS dept_avg,
MAX(salary) OVER (PARTITION BY department) AS dept_max,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees
ORDER BY department, dept_rank;SQL processes clauses in this logical order: 1. FROM + JOIN → build the working dataset 2. WHERE → filter rows 3. GROUP BY → group rows 4. HAVING → filter groups 5. SELECT → compute expressions (window functions run HERE) 6. ORDER BY → sort the result 7. LIMIT → cut to N rows Window functions execute at step 5 — after WHERE and HAVING. This means: • You cannot filter on window function results in WHERE or HAVING (the window function has not been computed yet at those stages) • Wrap in a CTE to filter on window results -- ALWAYS filter early to reduce the rows window functions must process: WITH recent AS ( SELECT * FROM orders WHERE order_date >= '2025-01-01' -- filter BEFORE window ) SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS rn FROM recent;
Each distinct OVER() specification requires a sort of the data. Without an index, this is a full table sort — O(n log n). With a matching index, the sort is eliminated entirely. -- General rule: create an index on (PARTITION BY cols, ORDER BY cols) -- Frequent query: SELECT *, RANK() OVER (PARTITION BY department ORDER BY salary DESC) FROM employees; -- Optimal index: CREATE INDEX idx_emp_dept_salary ON employees (department, salary DESC); -- Per-user running totals ordered by date: SELECT *, SUM(total) OVER (PARTITION BY user_id ORDER BY order_date) FROM orders; -- Optimal index: CREATE INDEX idx_orders_user_date ON orders (user_id, order_date); -- Verify with EXPLAIN: look for "Index Scan" instead of "Sort" -- PostgreSQL: EXPLAIN (ANALYZE, BUFFERS) SELECT ... -- MySQL: EXPLAIN SELECT ... -- SQL Server: include actual execution plan
-- Correlated subquery: executes once PER OUTER ROW — O(n²)
SELECT e.first_name, e.salary,
(SELECT AVG(salary) FROM employees e2
WHERE e2.department = e.department) AS dept_avg
FROM employees e;
-- 1,000 employees → 1,000 inner subquery executions
-- 100,000 employees → 100,000 inner executions (very slow)
-- Window function: single pass — O(n log n) at worst
SELECT first_name, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
-- 1,000 employees → one scan + one sort
-- 100,000 employees → one scan + one sort (still fast)
-- Benchmark guideline:
-- < 1,000 rows: negligible difference
-- 10k - 100k rows: window function typically 5-50x faster
-- 1M+ rows: correlated subquery can take minutes;
-- window function completes in seconds
-- Window functions vs CTEs:
-- CTEs improve readability, not raw performance.
-- A window function inside a CTE runs the same as in a main query.
-- The optimizer materializes or inlines CTEs depending on the database.-- Each distinct OVER() spec triggers a separate sort.
-- When multiple functions share the same window, use a named WINDOW:
-- Without WINDOW clause: may trigger multiple sorts
SELECT first_name, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr,
LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS prev_sal
FROM employees;
-- With WINDOW clause: one sort reused by all three functions
SELECT first_name, salary,
RANK() OVER w AS rnk,
DENSE_RANK() OVER w AS dr,
LAG(salary) OVER w AS prev_sal
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);
-- Supported in: PostgreSQL, MySQL 8.0, SQL Server 2022, SQLite
-- Not supported in: SQL Server 2019 and earlierMySQL added full window function support in MySQL 8.0 (released April 2018). MySQL 5.7 and earlier do NOT support window functions — any OVER() clause on MySQL 5.x throws a syntax error. MySQL 8.0 supports all standard window functions: • Ranking: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE() • Offset: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), NTH_VALUE() • Aggregate: SUM(), AVG(), MIN(), MAX(), COUNT() with OVER() • Distribution: PERCENT_RANK(), CUME_DIST() • Named WINDOW clause for reusable window definitions To check your version: SELECT VERSION(); If the result is 8.0.x or higher, all window functions are available. MySQL 8.0 also added CTEs (WITH clause) and JSON_TABLE() in the same release — a significant upgrade from 5.7 overall.
-- MySQL 8.0: rank + dense rank + row number in one query
SELECT
first_name, department, salary,
ROW_NUMBER() OVER dept_window AS row_num,
RANK() OVER dept_window AS rnk,
DENSE_RANK() OVER dept_window AS dense_rnk
FROM employees
WINDOW dept_window AS (PARTITION BY department ORDER BY salary DESC);
-- Month-over-month in MySQL 8.0 (uses DATE_FORMAT instead of STRFTIME)
WITH monthly AS (
SELECT DATE_FORMAT(order_date, '%Y-%m') AS month,
SUM(total) AS revenue
FROM orders GROUP BY month
)
SELECT month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS change
FROM monthly ORDER BY month;1. RANGE with numeric/interval offsets: MySQL 8.0 does NOT support RANGE BETWEEN N PRECEDING AND CURRENT ROW with numeric offsets. Use ROWS BETWEEN instead. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ← works in MySQL 8 RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW ← NOT MySQL 2. IGNORE NULLS in LAG/LEAD: LAG(col IGNORE NULLS) is not supported in MySQL. Workarounds require CTEs or subqueries. 3. NTH_VALUE() offset: Available from MySQL 8.0.2+. Ensure you are on 8.0.2 or later. 4. Window functions in UPDATE/DELETE: Not supported. Use a CTE or subquery to pre-compute, then JOIN. 5. QUALIFY clause: MySQL does not support QUALIFY (a Snowflake/DuckDB extension). Always use CTE + WHERE: WITH r AS (SELECT *, ROW_NUMBER() OVER ... AS rn FROM t) SELECT * FROM r WHERE rn = 1;
PostgreSQL has supported window functions since version 8.4 (2009) and offers the most complete implementation of the SQL standard among open-source databases. Key advantages: • FILTER clause: apply a condition within a window aggregate SUM(total) FILTER (WHERE status = 'completed') OVER (ORDER BY order_date) • GROUPS frame mode (PostgreSQL 11+): a third frame type alongside ROWS and RANGE • RANGE with interval offsets: AVG(total) OVER (ORDER BY order_date RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW) This includes all rows within the last 6 calendar days — handles date gaps correctly (ROWS BETWEEN 6 PRECEDING ignores gaps; RANGE with interval does not) • IGNORE NULLS in LAG/LEAD (PostgreSQL 14+): LAG(salary IGNORE NULLS) OVER (ORDER BY hire_date) Returns the most recent non-NULL value, skipping NULL rows • Named WINDOW clause with inheritance (extend a window definition)
-- FILTER clause: running total of completed orders only
SELECT order_date, status, total,
SUM(total) FILTER (WHERE status = 'completed')
OVER (ORDER BY order_date) AS running_completed_revenue
FROM orders;
-- RANGE with interval: correct 7-day rolling average (handles date gaps)
SELECT order_date, total,
ROUND(AVG(total) OVER (
ORDER BY order_date::date
RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
), 2) AS seven_day_avg
FROM orders;
-- If January 3rd and January 10th exist but Jan 4-9 do not,
-- RANGE INTERVAL correctly includes only Jan 3 in Jan 10's window.
-- ROWS BETWEEN 1 PRECEDING would include Jan 3 as the "1 row before" Jan 10.
-- Named WINDOW with multiple functions
SELECT first_name, salary,
RANK() OVER w,
LAG(salary) OVER w,
LEAD(salary) OVER w
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);SQL Server introduced ROW_NUMBER() in SQL Server 2005 and added the full set (RANK, DENSE_RANK, LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTILE, CUME_DIST, PERCENT_RANK) in SQL Server 2012. All modern versions (2016, 2017, 2019, 2022) have complete window function support.
The named WINDOW clause was added in SQL Server 2022.
-- T-SQL equivalent of the month-over-month example
WITH monthly AS (
SELECT FORMAT(order_date, 'yyyy-MM') AS month, SUM(total) AS revenue
FROM orders GROUP BY FORMAT(order_date, 'yyyy-MM')
)
SELECT
month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS change,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month))
* 100.0 / LAG(revenue) OVER (ORDER BY month),
1) AS pct_change
FROM monthly
ORDER BY month;
-- SQL Server uses FORMAT() for date strings instead of STRFTIME().
-- Otherwise window function syntax is identical to PostgreSQL and MySQL 8.| Feature | PostgreSQL | MySQL 8+ | SQL Server | SQLite 3.25+ |
|---|---|---|---|---|
| ROW_NUMBER, RANK, DENSE_RANK | ✓ | ✓ | ✓ | ✓ |
| LAG, LEAD | ✓ | ✓ | ✓ | ✓ |
| FIRST_VALUE, LAST_VALUE, NTH_VALUE | ✓ | 8.0.2+ | ✓ | ✓ |
| NTILE, PERCENT_RANK, CUME_DIST | ✓ | ✓ | ✓ | ✓ |
| Named WINDOW clause | ✓ | ✓ | 2022+ | ✓ |
| RANGE with interval offsets | ✓ | ✗ | ✓ | ✗ |
| IGNORE NULLS in LAG/LEAD | 14+ | ✗ | ✓ | ✗ |
| FILTER clause in window aggregates | ✓ | ✗ | ✗ | ✓ |
-- ERROR
SELECT * FROM employees
WHERE ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) = 1;
-- SQL Error [42P20]: window functions are not allowed in WHERE
-- FIX: wrap in a CTE, then filter in the outer query
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees
)
SELECT * FROM ranked WHERE rn = 1;
-- The same pattern fixes window functions incorrectly placed in HAVING:
-- HAVING RANK() OVER (...) = 1 → CTE + WHERE rn = 1-- SYMPTOM: LAST_VALUE always returns the current row's value
-- CAUSE: default frame RANGE UNBOUNDED PRECEDING TO CURRENT ROW
-- "Last row of frame" = current row when using the default
-- WRONG (common mistake)
SELECT name, salary,
LAST_VALUE(salary) OVER (PARTITION BY dept ORDER BY salary) AS highest
FROM employees;
-- Returns each employee's own salary — useless
-- FIX: extend frame to partition end
SELECT name, salary,
LAST_VALUE(salary) OVER (
PARTITION BY dept ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS highest
FROM employees;
-- BETTER FIX: use FIRST_VALUE with DESC — no frame modification needed
SELECT name, salary,
FIRST_VALUE(salary) OVER (PARTITION BY dept ORDER BY salary DESC) AS highest
FROM employees;-- WRONG: ROW_NUMBER() without ORDER BY — numbers are arbitrary SELECT *, ROW_NUMBER() OVER () AS rn FROM employees; -- rn values change between query runs — unpredictable -- WRONG: LAG without ORDER BY — "previous row" is undefined SELECT *, LAG(salary) OVER () AS prev FROM employees; -- Result is meaningless — which row is "previous"? -- FIX: always specify ORDER BY for ranking and offset functions SELECT *, ROW_NUMBER() OVER (ORDER BY hire_date, id) AS rn FROM employees; -- id as tie-breaker ensures stable, deterministic row numbers -- Note: aggregate window functions (SUM, AVG) without ORDER BY -- are fine — they compute grand totals, and order is irrelevant: SELECT *, SUM(salary) OVER () AS total FROM employees; -- always correct
-- SYMPTOM: LAG returns a value from a different customer's orders
-- WRONG: no PARTITION BY — LAG crosses all orders
SELECT user_id, order_date, total,
LAG(total) OVER (ORDER BY order_date) AS prev_total
FROM orders;
-- Row 1 (user 2's first order) gets prev_total from user 1's last order
-- FIX: add PARTITION BY user_id
SELECT user_id, order_date, total,
LAG(total) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_total
FROM orders;
-- Now LAG only looks backward within the same user's orders
-- The first order per user returns NULL (no previous order for that user)-- Use DENSE_RANK to include all tied rows at each rank level
WITH ranked AS (
SELECT
first_name, department, salary,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS dr
FROM employees
)
SELECT first_name, department, salary
FROM ranked
WHERE dr <= 2
ORDER BY department, salary DESC;
-- Key interview follow-up: "What if you want exactly 2 rows per department
-- with no ties?" → use ROW_NUMBER() instead of DENSE_RANK()
-- "What if a department has fewer than 2 employees?" → query still works;
-- it returns whatever exists up to rank 2-- users table has duplicate emails; keep only the newest per email
WITH deduped AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at DESC, id DESC
) AS rn
FROM users
)
SELECT id, first_name, email, created_at
FROM deduped
WHERE rn = 1;
-- Interview insight: why ROW_NUMBER and not RANK?
-- RANK would assign rn=1 to ALL rows with the same created_at
-- (if two duplicates were created at the exact same timestamp),
-- meaning WHERE rn = 1 could still return multiple rows.
-- ROW_NUMBER guarantees exactly one row per email group.WITH monthly AS (
SELECT STRFTIME('%Y-%m', order_date) AS month,
ROUND(SUM(total), 2) AS revenue
FROM orders GROUP BY month
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
CASE
WHEN LAG(revenue) OVER (ORDER BY month) IS NULL THEN NULL
ELSE ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month))
* 100.0 / LAG(revenue) OVER (ORDER BY month),
2)
END AS mom_growth_pct
FROM monthly
ORDER BY month;
-- Interviewers look for:
-- 1. Correct use of LAG (not a self-join)
-- 2. Handling NULL for the first month (CASE or NULLIF)
-- 3. Correct % formula: (new - old) / old * 100-- Use this live demo to explain: SELECT salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn, RANK() OVER (ORDER BY salary DESC) AS rnk, DENSE_RANK() OVER (ORDER BY salary DESC) AS dr FROM (VALUES (100),(90),(90),(80)) t(salary); -- salary rn rnk dr -- 100 1 1 1 -- 90 2 2 2 -- 90 3 2 2 ← tied rows get same rank -- 80 4 4 3 ← RANK gaps to 4; DENSE stays at 3 -- Verbal explanation: -- ROW_NUMBER: always unique 1,2,3,4 — no ties, arbitrary at ties -- RANK: same number for ties, then skips — 1,2,2,4 means "4th person = truly 4th" -- DENSE_RANK: same number for ties, no gaps — 1,2,2,3 means "3rd tier"
SELECT
u.first_name,
o.order_date,
ROUND(o.total, 2) AS order_total,
ROUND(SUM(o.total) OVER (
PARTITION BY o.user_id
ORDER BY o.order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
), 2) AS cumulative_spend
FROM orders o
JOIN users u ON u.id = o.user_id
ORDER BY o.user_id, o.order_date;
-- Concepts tested:
-- PARTITION BY to reset running total per customer
-- ORDER BY inside OVER for chronological accumulation
-- ROWS frame for running total semantics (vs grand total)
-- JOIN to bring in customer name alongside order dataWITH user_months AS (
SELECT DISTINCT user_id,
STRFTIME('%Y-%m', order_date) AS order_month
FROM orders
),
with_next AS (
SELECT user_id, order_month,
LEAD(order_month) OVER (
PARTITION BY user_id ORDER BY order_month
) AS next_month
FROM user_months
)
SELECT DISTINCT user_id
FROM with_next
WHERE next_month = STRFTIME('%Y-%m',
DATE(order_month || '-01', '+1 month'));
-- Approach: get distinct (user, month) pairs, then LEAD to get
-- the user's next active month. If next_month is exactly one
-- calendar month later, the user was active in consecutive months.
-- The DATE arithmetic adds one month to the current month's first day.ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), and LEAD() all require ORDER BY inside OVER() to produce meaningful, deterministic results. Without ORDER BY, the database assigns values in arbitrary storage order — which can differ between query runs. For tie-breaking, add a unique column (usually the primary key) as the final sort key: ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, id) This ensures every row always receives the same row number, even when salaries are equal.
Window functions execute after WHERE and HAVING. The only way to filter on a window function result is to materialize it first in a CTE:
WITH ranked AS (
SELECT *, DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dr
FROM employees
)
SELECT * FROM ranked WHERE dr <= 3;
For multi-step analytics, chain CTEs:
WITH monthly AS (
SELECT STRFTIME('%Y-%m', order_date) AS month, SUM(total) AS revenue
FROM orders GROUP BY month
),
with_growth AS (
SELECT *, LAG(revenue) OVER (ORDER BY month) AS prev_revenue FROM monthly
)
SELECT * FROM with_growth WHERE prev_revenue IS NOT NULL AND revenue > prev_revenue;LAST_VALUE() with the default frame returns the current row's value, not the partition's last row. Rather than adding the verbose frame clause every time, flip the ORDER BY and use FIRST_VALUE(): -- Verbose: LAST_VALUE with explicit frame LAST_VALUE(salary) OVER ( PARTITION BY dept ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) -- Simpler: FIRST_VALUE with reversed ORDER BY FIRST_VALUE(salary) OVER (PARTITION BY dept ORDER BY salary DESC) Both return the same result. The FIRST_VALUE approach is cleaner and immune to the frame gotcha.
ROWS BETWEEN N PRECEDING AND CURRENT ROW counts N physical rows before the current one — predictable and explicit. RANGE BETWEEN uses value-based offsets and includes all rows that tie on the ORDER BY column. For daily data with no gaps, both give the same result. When gaps exist or ties are possible, ROWS is almost always what you want: -- 7-day window: the 7 most recent physical rows AVG(revenue) OVER (ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) -- Use RANGE with interval only when you need calendar-based windows -- and your database supports it (PostgreSQL, SQL Server): AVG(revenue) OVER (ORDER BY order_date RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW)
Window functions connect to several other advanced SQL topics. Study these alongside window functions to build complete analytical SQL fluency:
Run every query from this guide in the SQLab Hub interactive SQL playground — no signup, no download, instant results.