What is the OVER() clause in SQL?

The OVER() clause defines the window — the set of rows a window function operates on. It can include PARTITION BY to split rows into independent groups, ORDER BY to establish row ordering within each partition, and a frame clause (ROWS/RANGE) to define which surrounding rows are included. An empty OVER() applies the function across the entire result set.

What is PARTITION BY in SQL?

PARTITION BY divides the result set into partitions (groups) and applies the window function independently within each partition. It behaves like GROUP BY but does not collapse rows — every original row is retained in the output. For example, PARTITION BY department computes rankings separately within each department, resetting to 1 for each new group.

What is the difference between RANK() and DENSE_RANK()?

RANK() assigns rankings with gaps after ties — if two rows tie at position 2, the next rank is 4 (skipping 3). DENSE_RANK() assigns rankings without gaps — if two rows tie at position 2, the next rank is 3. Use DENSE_RANK() when you need consecutive rankings even after ties; use RANK() when you want to show how many competitors are ahead.

What is ROW_NUMBER() used for in SQL?

ROW_NUMBER() assigns a unique sequential integer to each row within its partition, starting at 1. Even if two rows are identical, they receive different row numbers. It is commonly used to deduplicate records (keeping only the first or latest per group), implement pagination, and identify the top-N rows per group using a CTE + WHERE rn <= N pattern.

What is the difference between LAG() and LEAD()?

LAG() accesses the value of a column from a previous row within the window. LEAD() accesses the value from a following row. Both require ORDER BY inside OVER() to define which row is "previous" or "next." They are primarily used for period-over-period comparisons — comparing this month's revenue to last month's with LAG(), or computing days to the next order with LEAD().

Are SQL window functions slow?

Window functions are generally efficient. They avoid the correlated subquery anti-pattern where a query runs once per row. The main cost is the sort required by ORDER BY inside OVER(). Adding an index on PARTITION BY + ORDER BY columns eliminates that sort. For large tables, window functions are typically 5–50x faster than equivalent correlated subqueries.

Which databases support SQL window functions?

Window functions are supported in PostgreSQL (since 8.4), SQL Server (since 2005/2012), Oracle (since 8i), SQLite (since 3.25.0), and MySQL (since 8.0). MySQL 5.x does not support window functions. BigQuery, Snowflake, Redshift, and DuckDB all support window functions with standard SQL syntax.

Can SQL window functions replace GROUP BY?

Window functions complement but do not replace GROUP BY. GROUP BY aggregates rows into one row per group — useful when you need only the summary. Window functions keep all rows while adding computed values — useful when you need both row-level detail and aggregate context. For pure summaries, use GROUP BY; for row-level analytics that reference group metrics, use window functions.

Can you use window functions in a WHERE clause?

No. Window functions cannot be used directly in WHERE or HAVING because they are evaluated after those clauses. To filter on a window function result, wrap the query in a CTE: WITH ranked AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn FROM employees) SELECT * FROM ranked WHERE rn = 1. This is the standard pattern for top-N-per-group queries.

Home/Blog/SQL Window Functions Explained with Practical Examples (2026 Guide)

Advanced SQLWindow FunctionsData AnalyticsSQL Tutorial

SQL Window Functions Explained with Practical Examples (2026 Guide)

Q: What are SQL window functions?

SQL window functions perform calculations across a set of rows related to the current row without collapsing the result set. Unlike GROUP BY — which reduces multiple rows to one summary row — window functions return a value for every input row while still computing aggregates, rankings, or comparisons across related rows defined by an OVER() clause.

Q: What is the difference between LAG() and LEAD()?

LAG() accesses the value of a column from a previous row within the window. LEAD() accesses the value from a following row. Both require ORDER BY inside OVER() to define which row is "previous" or "next." They are primarily used for period-over-period comparisons — comparing this month's revenue to last month's with LAG(), or computing days to the next order with LEAD().

Q: Are SQL window functions slow?

Window functions are generally efficient. They avoid the correlated subquery anti-pattern where a query runs once per row. The main cost is the sort required by ORDER BY inside OVER(). Adding an index on PARTITION BY + ORDER BY columns eliminates that sort. For large tables, window functions are typically 5–50x faster than equivalent correlated subqueries.

Q: Which databases support SQL window functions?

Window functions are supported in PostgreSQL (since 8.4), SQL Server (since 2005/2012), Oracle (since 8i), SQLite (since 3.25.0), and MySQL (since 8.0). MySQL 5.x does not support window functions. BigQuery, Snowflake, Redshift, and DuckDB all support window functions with standard SQL syntax.

Q: Can SQL window functions replace GROUP BY?

Window functions complement but do not replace GROUP BY. GROUP BY aggregates rows into one row per group — useful when you need only the summary. Window functions keep all rows while adding computed values — useful when you need both row-level detail and aggregate context. For pure summaries, use GROUP BY; for row-level analytics that reference group metrics, use window functions.

Q: Can you use window functions in a WHERE clause?

No. Window functions cannot be used directly in WHERE or HAVING because they are evaluated after those clauses. To filter on a window function result, wrap the query in a CTE: WITH ranked AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn FROM employees) SELECT * FROM ranked WHERE rn = 1. This is the standard pattern for top-N-per-group queries.

SQLab Hub Team·2026-05-14·28 min read

SQL window functions are the most powerful analytical tool in the SQL language — and the feature that separates junior analysts from senior data engineers. They let you compute rankings, running totals, period-over-period comparisons, and moving averages without losing a single row from your result set. No complex self-joins. No correlated subqueries running once per row. No temporary tables. This guide covers every SQL window function from first principles to advanced technique. You will find the complete OVER() and PARTITION BY syntax explained from scratch, detailed examples of ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, FIRST_VALUE, and LAST_VALUE, real-world analytics patterns for e-commerce and HR, performance optimization advice with indexing strategy, database-specific notes for MySQL 8, PostgreSQL, and SQL Server, 6 SQL window function interview questions with full answers, a common-errors reference, and a complete FAQ section with schema markup. Every example uses the SQLab Hub schema — users, orders, products, employees — so you can run every query immediately in the free SQL playground with zero setup.

Practice these queries in SQLab Hub →

Real SQLite engine · 8 tables · No signup needed

Open Playground

What Are SQL Window Functions?

The core concept

SQL window functions perform calculations across a set of rows related to the current row — called a "window" — without collapsing those rows into a single output row. This is the defining difference from GROUP BY: window functions return one row for every input row, with the computed value attached as an additional column.

Think of it this way: GROUP BY asks "give me one summary row per group." A window function asks "for each row, also tell me something about its group." The result set stays intact; the computed value rides alongside the original data.

Every window function uses the same syntax pattern:

  function_name() OVER (
    [PARTITION BY column(s)]
    [ORDER BY column(s)]
    [ROWS|RANGE frame_specification]
  )

The OVER() clause is what makes a function a window function. Without OVER(), SUM() is a regular aggregate that collapses rows. With OVER(), SUM() becomes a running total or group sum that appears on every row.

Why window functions replaced correlated subqueries

Before window functions, computing "each employee's salary compared to their department average" required a correlated subquery — a subquery that re-executes once per outer row:

-- Old approach: correlated subquery (runs once per employee row)
SELECT e.first_name, e.salary,
       (SELECT AVG(salary) FROM employees e2
        WHERE e2.department = e.department) AS dept_avg
FROM employees e;

-- Modern approach: window function (single pass)
SELECT first_name, salary,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

-- On a 50,000-row employees table:
-- Correlated subquery: 50,000 inner query executions
-- Window function: one scan + one sort
-- Window function is typically 10-100x faster at scale

-- The window version also composes — add more window functions
-- without additional scans:
SELECT first_name, department, salary,
       AVG(salary)    OVER (PARTITION BY department) AS dept_avg,
       MAX(salary)    OVER (PARTITION BY department) AS dept_max,
       salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg,
       RANK()         OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

GROUP BY vs window functions — the most important distinction

-- GROUP BY: 4 departments → 4 rows in the result
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
-- dept        avg_salary
-- Engineering 87500
-- Marketing   68000
-- Sales       61000
-- Finance     79000

-- Window function: 20 employees → 20 rows in the result
-- Each employee row includes the department average alongside their own data
SELECT first_name, department, salary,
       ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS dept_avg,
       salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees
ORDER BY department, salary DESC;

-- first_name  department   salary  dept_avg  diff_from_avg
-- Alice       Engineering  95000   87500     +7500
-- Frank       Engineering  88000   87500     +500
-- Bob         Engineering  80000   87500     -7500
-- Carol       Marketing    72000   68000     +4000
-- Dave        Marketing    64000   68000     -4000

-- You cannot produce this output with GROUP BY alone.
-- Individual salary disappears the moment you aggregate.
-- Window functions preserve all rows while adding group context.

Real-world use cases

Use Case	Function(s)	Industry
Rank employees by salary within department	RANK(), DENSE_RANK()	HR / Payroll
Deduplicate — keep the latest record per user	ROW_NUMBER()	Data Engineering
Month-over-month revenue change	LAG()	Finance / BI
7-day rolling average of daily signups	AVG() OVER (ROWS 6 PRECEDING)	Product Analytics
Cumulative revenue running total	SUM() OVER (ORDER BY date)	E-commerce
Each product's revenue as % of category total	SUM() OVER (PARTITION BY category)	Retail
Top 3 customers per region	ROW_NUMBER() + CTE filter	Sales
First and last purchase date per customer	FIRST_VALUE(), LAST_VALUE()	CRM
Divide users into spend quartiles	NTILE(4)	Marketing
Days between consecutive orders per customer	LAG() on order_date	Logistics

SQL OVER() Clause Explained

OVER() with no arguments — entire result set as window

An empty OVER() means the window is the entire result set. Every row sees the same aggregate value — useful for computing percentages of a grand total:

-- What percentage of total payroll is each employee's salary?
SELECT first_name, salary,
       SUM(salary) OVER () AS total_payroll,
       ROUND(salary * 100.0 / SUM(salary) OVER (), 1) AS pct_of_total
FROM employees;

-- first_name  salary  total_payroll  pct_of_total
-- Alice       95000   435000         21.8
-- Frank       88000   435000         20.2
-- Bob         80000   435000         18.4
-- Carol       72000   435000         16.6
-- Dave        64000   435000         14.7
-- Eve         36000   435000         8.3

-- SUM(salary) OVER () computes once and attaches the same value to every row.
-- No GROUP BY needed. Individual rows are preserved.

PARTITION BY — split into independent groups

PARTITION BY divides the result set into independent groups. The window function computes separately within each partition, as if each group were a separate table:

-- Department-level statistics alongside each employee row
SELECT first_name, department, salary,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg,
       COUNT(*)    OVER (PARTITION BY department) AS dept_headcount,
       MAX(salary) OVER (PARTITION BY department) AS dept_max
FROM employees
ORDER BY department, salary DESC;

-- The window resets completely for each department.
-- Partitioning by multiple columns is supported:
SELECT *,
       ROW_NUMBER() OVER (
         PARTITION BY department, job_level
         ORDER BY salary DESC
       ) AS rank_in_dept_level
FROM employees;

-- This creates separate windows per (department, job_level) combination.
-- A row in Engineering/Senior is ranked independently from Engineering/Junior.

ORDER BY inside OVER() — affects the computation, not the output

ORDER BY inside OVER() controls the order rows are processed within the window — it is completely independent of the query-level ORDER BY:

-- hire_sequence: 1 for the earliest hire, 2 for the second, etc.
-- cumulative_payroll: grows as each employee is added chronologically
SELECT first_name, hire_date, salary,
       ROW_NUMBER() OVER (ORDER BY hire_date)        AS hire_sequence,
       SUM(salary)  OVER (ORDER BY hire_date
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_payroll
FROM employees
ORDER BY hire_date;  -- this query-level ORDER BY controls display order only

-- KEY BEHAVIOR: when ORDER BY is present inside OVER(),
-- the default frame changes from "entire partition" to
-- "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
-- This is what makes SUM() a running total instead of a grand total.
-- To get a grand total on every row when ORDER BY is present,
-- you must explicitly override the frame:
SELECT salary,
       SUM(salary) OVER (ORDER BY salary
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS always_grand_total
FROM employees;

Window frame clause — ROWS vs RANGE

The frame clause defines which rows within the partition are included relative to the current row. Mastering this is what separates intermediate from advanced SQL:

-- Frame clause syntax:
-- ROWS|RANGE BETWEEN start_boundary AND end_boundary

-- Boundary options:
-- UNBOUNDED PRECEDING  — from the first row of the partition
-- N PRECEDING          — N rows (ROWS) or N value-units (RANGE) before current
-- CURRENT ROW          — the current row itself
-- N FOLLOWING          — N rows/units after current
-- UNBOUNDED FOLLOWING  — to the last row of the partition

-- 3-row moving average (current row + 2 rows before it)
SELECT order_date, daily_revenue,
       ROUND(AVG(daily_revenue) OVER (
         ORDER BY order_date
         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ), 2) AS three_day_avg
FROM daily_sales;

-- ROWS vs RANGE distinction:
-- ROWS: counts physical rows — predictable, explicit
-- RANGE: includes all rows with the same ORDER BY value as ties
-- If three rows share the same date, RANGE treats them as "current row"
-- and includes all three in calculations — often unexpected behavior
-- For time series work, ROWS BETWEEN N PRECEDING AND CURRENT ROW
-- is almost always the correct choice.

ROW_NUMBER() Function in SQL

What ROW_NUMBER() does and when to use it

ROW_NUMBER() assigns a unique sequential integer to each row within its partition, starting at 1. Even if two rows are completely identical across all columns, they receive different row numbers. This guaranteed uniqueness distinguishes ROW_NUMBER() from RANK() and DENSE_RANK(), which assign the same number to tied rows.

Use ROW_NUMBER() when you need exactly one row per group — for deduplication, pagination, or selecting the single top record. Use RANK() or DENSE_RANK() when tied rows should receive equal standing.

Basic syntax and example

-- Syntax
ROW_NUMBER() OVER ([PARTITION BY col] [ORDER BY col])

-- Number every employee company-wide by salary (highest earns #1)
SELECT first_name, department, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS company_rank
FROM employees;

-- first_name  department   salary  company_rank
-- Alice       Engineering  95000   1
-- Frank       Engineering  88000   2
-- Bob         Engineering  80000   3
-- Carol       Marketing    72000   4
-- Dave        Marketing    64000   5
-- Eve         Sales        61000   6

-- Even if Frank and Bob had the same salary they still receive
-- different row numbers. ROW_NUMBER() never produces ties.

ROW_NUMBER() with PARTITION BY — rank within each group

-- Rank within each department independently
SELECT first_name, department, salary,
       ROW_NUMBER() OVER (
         PARTITION BY department
         ORDER BY salary DESC
       ) AS rank_in_dept
FROM employees
ORDER BY department, rank_in_dept;

-- first_name  department   salary  rank_in_dept
-- Alice       Engineering  95000   1
-- Frank       Engineering  88000   2
-- Bob         Engineering  80000   3
-- Carol       Marketing    72000   1   ← resets to 1 for Marketing
-- Dave        Marketing    64000   2
-- Eve         Sales        61000   1   ← resets to 1 for Sales

-- PARTITION BY department causes the row number sequence to restart
-- independently for each department.

Deduplication — keep only the latest record per group

-- A real ETL problem: users table has duplicate emails from a bad import
-- Keep only the most recently created record per email address

WITH deduped AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY email
           ORDER BY created_at DESC, id DESC  -- tie-break on id
         ) AS rn
  FROM users
)
SELECT id, first_name, email, created_at
FROM deduped
WHERE rn = 1;

-- How it works:
-- 1. PARTITION BY email groups all rows with the same email
-- 2. ORDER BY created_at DESC puts the newest record first (rn = 1)
-- 3. WHERE rn = 1 keeps only that newest record
-- 4. All duplicates (rn > 1) are filtered out

-- This is one of the most common production SQL patterns.
-- You will see it in ETL pipelines, data quality scripts,
-- and warehouse load jobs constantly.

Top N per group — a pattern every analyst must know

-- Find the top 2 highest-paid employees in each department

WITH ranked AS (
  SELECT first_name, department, salary,
         ROW_NUMBER() OVER (
           PARTITION BY department
           ORDER BY salary DESC
         ) AS rn
  FROM employees
)
SELECT first_name, department, salary
FROM ranked
WHERE rn <= 2
ORDER BY department, salary DESC;

-- This is the "top N per group" pattern.
-- A simple ORDER BY + LIMIT cannot do this — LIMIT applies to the entire
-- query result, not per department.
-- ROW_NUMBER() inside a CTE + WHERE on the outer query is the solution.

-- For pagination (page 2 of 10 results per page):
WHERE rn BETWEEN 11 AND 20

-- If ties should ALL be included at the boundary, use DENSE_RANK instead:
-- WHERE dr <= 2 includes all employees tied for 2nd place

Common mistake: using ROW_NUMBER() directly in WHERE

-- WRONG: window functions are evaluated AFTER WHERE, so this fails
SELECT * FROM employees
WHERE ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) = 1;
-- SQL Error: window functions are not allowed in WHERE clause

-- CORRECT pattern 1: CTE (recommended for readability)
WITH r AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
  FROM employees
)
SELECT * FROM r WHERE rn = 1;

-- CORRECT pattern 2: subquery (equivalent, less readable)
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
  FROM employees
) ranked
WHERE rn = 1;

-- The same rule applies to HAVING — window functions cannot appear there either.

RANK() vs DENSE_RANK() vs NTILE()

How the three ranking functions differ

Given employees with salaries 95000, 88000, 88000, 72000 — the same salary values produce different outputs depending on which function you choose:

Function	Output for 95k, 88k, 88k, 72k	Ties	Gaps after ties
ROW_NUMBER()	1, 2, 3, 4	Never — always unique	No
RANK()	1, 2, 2, 4	Same rank for ties	Yes — skips 3
DENSE_RANK()	1, 2, 2, 3	Same rank for ties	No — always consecutive

Decision guide: Use ROW_NUMBER() for deduplication and top-N when exactly one row per group is needed. Use RANK() for leaderboards where a rank of 4 means "three people beat you." Use DENSE_RANK() for prize tiers, salary bands, and the classic "find the Nth highest salary" interview question.

All three side by side in one query

SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
       RANK()       OVER (ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees
ORDER BY salary DESC;

-- name    salary  row_num  rnk  dense_rnk
-- Alice   95000   1        1    1
-- Frank   88000   2        2    2
-- Bob     88000   3        2    2     ← same salary, same rank
-- Carol   72000   4        4    3     ← RANK gaps to 4; DENSE stays at 3
-- Dave    64000   5        5    4

-- RANK() rationale: Frank and Bob occupy positions 2 AND 3.
-- The 4th person (Carol) is truly 4th — two people beat her.
-- DENSE_RANK() rationale: there is no "3rd salary tier" — just 1st, 2nd, 3rd.

Interview scenario: top N with ties (DENSE_RANK vs ROW_NUMBER)

-- Interview question: "Return all employees with a top-3 salary.
-- Include everyone tied for 3rd place."

-- Wrong approach: ROW_NUMBER arbitrarily cuts off one tied row
WITH r AS (
  SELECT *, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn FROM employees
)
SELECT * FROM r WHERE rn <= 3;
-- If two employees tie for 2nd at 88000, one is rn=2, one is rn=3.
-- The employee at rn=4 (also 88000) is excluded — wrong.

-- Correct approach: DENSE_RANK includes all tied rows
WITH r AS (
  SELECT *, DENSE_RANK() OVER (ORDER BY salary DESC) AS dr FROM employees
)
SELECT * FROM r WHERE dr <= 3;
-- All employees at the 3rd salary tier are included.

-- The choice between ROW_NUMBER and DENSE_RANK is a classic
-- interview follow-up question. Know when each is appropriate.

NTILE() — divide rows into equal buckets

-- NTILE(n) assigns each row to one of n roughly equal buckets.
-- Bucket 1 contains the "top" rows by the ORDER BY column.

-- Segment customers into 4 spend quartiles
WITH customer_spend AS (
  SELECT user_id, ROUND(SUM(total), 2) AS total_spend
  FROM orders GROUP BY user_id
)
SELECT user_id, total_spend,
       NTILE(4) OVER (ORDER BY total_spend DESC) AS spend_quartile
FROM customer_spend
ORDER BY total_spend DESC;

-- quartile 1 = top 25% spenders (highest value customers)
-- quartile 4 = bottom 25% spenders

-- Business use cases:
-- Email campaigns: different messaging per quartile
-- Loyalty programs: tier assignment (Gold/Silver/Bronze/Standard)
-- A/B test holdout groups: NTILE(10) to create 10% control group

-- Note: NTILE(100) creates percentiles; NTILE(10) creates deciles.
-- If rows don't divide evenly, earlier buckets receive one extra row.

PERCENT_RANK() and CUME_DIST()

-- PERCENT_RANK(): relative rank as a fraction from 0.0 to 1.0
-- Formula: (rank - 1) / (total_rows - 1)
-- First row is always 0.0, last row is always 1.0
SELECT name, salary,
       ROUND(PERCENT_RANK() OVER (ORDER BY salary), 3) AS pct_rank
FROM employees;
-- A pct_rank of 0.75 means this employee earns more than 75% of colleagues

-- CUME_DIST(): cumulative distribution — fraction of rows <= current
-- Formula: rank / total_rows
-- Always > 0; always reaches 1.0 for the highest value
SELECT name, salary,
       ROUND(CUME_DIST() OVER (ORDER BY salary), 3) AS cume_dist
FROM employees;
-- A cume_dist of 0.8 means 80% of employees earn at or below this salary

-- Use PERCENT_RANK for relative positioning; use CUME_DIST for
-- "what fraction of the data falls at or below this value"

LAG() and LEAD() Functions

What LAG() and LEAD() do

LAG() and LEAD() are offset functions that access column values from other rows in the window without a self-join. LAG() looks backward (previous rows); LEAD() looks forward (following rows).

Full syntax:
  LAG(column, offset, default_value) OVER (PARTITION BY ... ORDER BY ...)
  LEAD(column, offset, default_value) OVER (PARTITION BY ... ORDER BY ...)

  column        the column or expression to read from the offset row
  offset        how many rows back (LAG) or forward (LEAD) to look (default: 1)
  default_value what to return when no row exists at the offset
                (the first row has no previous row; without a default, NULL is returned)

Both functions always require ORDER BY inside OVER() — "previous" and "next" are meaningless without defined ordering.

Month-over-month revenue with LAG()

WITH monthly AS (
  SELECT
    STRFTIME('%Y-%m', order_date) AS month,
    ROUND(SUM(total), 2)          AS revenue
  FROM orders
  GROUP BY month
)
SELECT
  month,
  revenue,
  LAG(revenue, 1, 0) OVER (ORDER BY month)      AS prev_month_revenue,
  revenue - LAG(revenue, 1, 0) OVER (ORDER BY month) AS abs_change,
  ROUND(
    (revenue - LAG(revenue) OVER (ORDER BY month))
    * 100.0 / LAG(revenue) OVER (ORDER BY month),
  1) AS pct_change
FROM monthly
ORDER BY month;

-- month    revenue  prev_revenue  abs_change  pct_change
-- 2025-01  42000    0             42000       NULL
-- 2025-02  51000    42000         +9000       +21.4%
-- 2025-03  48000    51000         -3000       -5.9%
-- 2025-04  63000    48000         +15000      +31.3%

-- LAG(revenue, 1, 0) returns 0 for the first month (no prior month).
-- LAG(revenue) alone returns NULL for the first month.
-- pct_change is NULL for January because there is nothing to compare against.

Days between consecutive orders with LEAD()

-- For each order, show the customer's next order date and gap in days
SELECT
  user_id,
  order_date,
  total,
  LEAD(order_date) OVER (
    PARTITION BY user_id
    ORDER BY order_date
  ) AS next_order_date,
  CAST(
    JULIANDAY(LEAD(order_date) OVER (PARTITION BY user_id ORDER BY order_date))
    - JULIANDAY(order_date)
  AS INTEGER) AS days_to_next_order
FROM orders
ORDER BY user_id, order_date;

-- user_id  order_date  total  next_order_date  days_to_next_order
-- 1        2025-01-05  120    2025-01-28       23
-- 1        2025-01-28  85     2025-03-10       41
-- 1        2025-03-10  200    NULL             NULL  ← last order
-- 2        2025-02-14  55     2025-04-01       46

-- PARTITION BY user_id ensures LEAD only looks at the next order
-- from the SAME customer, not the next order in the entire table.
-- Without PARTITION BY, customer 1's last order would incorrectly
-- look at customer 2's first order as "next."

Year-over-year comparison with LAG(12)

-- Compare each month's revenue to the same month one year prior
-- LAG(revenue, 12) looks back 12 months in the ordered month list
WITH monthly AS (
  SELECT STRFTIME('%Y-%m', order_date) AS month, SUM(total) AS revenue
  FROM orders GROUP BY month
)
SELECT
  month,
  ROUND(revenue, 2)                                          AS revenue,
  ROUND(LAG(revenue, 12) OVER (ORDER BY month), 2)          AS same_month_last_year,
  ROUND(
    (revenue - LAG(revenue, 12) OVER (ORDER BY month))
    * 100.0 / LAG(revenue, 12) OVER (ORDER BY month),
  1)                                                         AS yoy_pct
FROM monthly
ORDER BY month;

-- The second argument to LAG/LEAD is the offset.
-- LAG(col, 1) = previous row, LAG(col, 12) = 12 rows back.
-- On monthly data ordered by month, 12 rows back = same calendar month, prior year.
-- This is the standard pattern for seasonality-adjusted YoY reporting.

Common LAG/LEAD interview trap: missing PARTITION BY

-- TRAP: forgetting PARTITION BY on per-customer data
-- Without PARTITION BY, LAG crosses customer boundaries:

-- WRONG: no PARTITION BY
SELECT user_id, order_date, total,
       LAG(total) OVER (ORDER BY order_date) AS prev_total
FROM orders;
-- The "previous" order for customer 2's first order is
-- customer 1's last order — completely wrong for customer-level analysis

-- CORRECT: partition by customer
SELECT user_id, order_date, total,
       LAG(total) OVER (
         PARTITION BY user_id
         ORDER BY order_date
       ) AS prev_order_total
FROM orders;
-- Now "previous" always means previous order FOR THAT CUSTOMER

-- Second trap: LAG/LEAD without ORDER BY inside OVER()
-- Result is non-deterministic — always specify ORDER BY for these functions

FIRST_VALUE() and LAST_VALUE()

FIRST_VALUE() — reference the first row in the window

-- Compare each product's price to the cheapest in its category
SELECT
  name,
  category,
  price,
  FIRST_VALUE(name)  OVER (PARTITION BY category ORDER BY price ASC) AS cheapest_product,
  FIRST_VALUE(price) OVER (PARTITION BY category ORDER BY price ASC) AS min_price,
  price - FIRST_VALUE(price) OVER (PARTITION BY category ORDER BY price ASC) AS premium
FROM products
ORDER BY category, price;

-- category  name          price  cheapest_product  min_price  premium
-- Laptops   Entry Model   499    Entry Model       499        0
-- Laptops   Mid Range     899    Entry Model       499        400
-- Laptops   Pro Model     1499   Entry Model       499        1000
-- Phones    Budget        299    Budget            299        0
-- Phones    Flagship      999    Budget            299        700

-- FIRST_VALUE always works correctly with the default frame because
-- UNBOUNDED PRECEDING always reaches back to the partition's first row.

LAST_VALUE() — the frame trap that catches everyone

-- LAST_VALUE() has a notorious gotcha.
-- Default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- "Last row in frame" = CURRENT ROW — not the partition's last row!

-- WRONG: returns each row's OWN value, not the partition's last value
SELECT name, category, price,
       LAST_VALUE(name) OVER (PARTITION BY category ORDER BY price) AS most_expensive
FROM products;
-- Every row returns its own name — LAST_VALUE is broken without frame fix

-- CORRECT: extend the frame to cover the entire partition
SELECT name, category, price,
       LAST_VALUE(name) OVER (
         PARTITION BY category
         ORDER BY price
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS most_expensive_in_category
FROM products;
-- Now "last row" = actual last row of the partition = highest price product

-- SIMPLEST ALTERNATIVE: avoid LAST_VALUE entirely.
-- Use FIRST_VALUE with DESC ordering — same result, no frame complexity:
SELECT name, category, price,
       FIRST_VALUE(name) OVER (PARTITION BY category ORDER BY price DESC) AS most_expensive
FROM products;

NTH_VALUE() — access any specific row in the window

-- NTH_VALUE(column, n) returns the value from the nth row of the window
-- Useful for accessing 2nd place, 3rd place, etc.

-- Show each employee alongside the 2nd highest salary in their department
SELECT
  first_name, department, salary,
  NTH_VALUE(salary, 1) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS dept_1st_salary,
  NTH_VALUE(salary, 2) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS dept_2nd_salary
FROM employees;

-- Like LAST_VALUE, NTH_VALUE requires UNBOUNDED FOLLOWING in the frame
-- to correctly access rows that come after the current row.
-- Supported in PostgreSQL, MySQL 8.0.2+, SQL Server, SQLite.

Running Totals, Moving Averages, and Cumulative Analytics

Running total (cumulative sum)

-- Cumulative revenue over time
SELECT
  order_date,
  ROUND(daily_revenue, 2) AS daily_revenue,
  ROUND(SUM(daily_revenue) OVER (
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ), 2) AS cumulative_revenue
FROM (
  SELECT order_date, SUM(total) AS daily_revenue
  FROM orders GROUP BY order_date
) daily
ORDER BY order_date;

-- order_date  daily_revenue  cumulative_revenue
-- 2025-01-01  4200           4200
-- 2025-01-02  6800           11000
-- 2025-01-03  3100           14100
-- 2025-01-04  8900           23000

-- The frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- is the default when ORDER BY is present, so this shorthand also works:
SUM(daily_revenue) OVER (ORDER BY order_date)
-- Both produce identical running totals.

7-day moving average

-- 7-day rolling average of daily revenue (smooths daily noise)
SELECT
  order_date,
  ROUND(daily_revenue, 2) AS daily_revenue,
  ROUND(AVG(daily_revenue) OVER (
    ORDER BY order_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ), 2) AS seven_day_avg
FROM (
  SELECT order_date, SUM(total) AS daily_revenue
  FROM orders GROUP BY order_date
) daily
ORDER BY order_date;

-- For the first 6 days, fewer than 7 rows are available;
-- AVG uses only the existing rows (not padded with zeros).
-- This is correct behavior — a "partial window" at the start.

-- Real-world applications:
-- Finance: 50-day and 200-day moving average of stock prices
-- Product: 7-day rolling active users (WAU)
-- E-commerce: 30-day rolling revenue for trend lines
-- Operations: hourly rolling error rate monitoring

Percentage of total and cumulative percentage

-- Each month's revenue as % of annual total, plus cumulative %
WITH monthly AS (
  SELECT STRFTIME('%Y-%m', order_date) AS month,
         ROUND(SUM(total), 2) AS revenue
  FROM orders
  WHERE STRFTIME('%Y', order_date) = '2025'
  GROUP BY month
)
SELECT
  month,
  revenue,
  SUM(revenue) OVER ()                                               AS annual_total,
  ROUND(revenue * 100.0 / SUM(revenue) OVER (), 1)                  AS pct_of_year,
  ROUND(SUM(revenue) OVER (ORDER BY month)
    * 100.0 / SUM(revenue) OVER (), 1)                             AS cumulative_pct
FROM monthly
ORDER BY month;

-- month    revenue  annual_total  pct_of_year  cumulative_pct
-- 2025-01  42000    525000        8.0%         8.0%
-- 2025-02  51000    525000        9.7%         17.7%
-- 2025-03  48000    525000        9.1%         26.9%
-- 2025-12  68000    525000        13.0%        100.0%

-- SUM() OVER () = grand total on every row
-- SUM() OVER (ORDER BY month) = running total that grows each month

Nth highest salary — the classic interview query

-- Find the Nth highest distinct salary (N = 3 in this example)
WITH ranked AS (
  SELECT DISTINCT salary,
         DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
  FROM employees
)
SELECT salary AS third_highest_salary
FROM ranked
WHERE dr = 3;

-- Why DENSE_RANK and not LIMIT/OFFSET?
-- DENSE_RANK handles ties correctly:
-- If two employees share the top salary, they're both rank 1.
-- The 3rd highest is then the row with dr = 3, not necessarily the 3rd row.

-- Full department breakdown with rankings:
SELECT
  first_name, department, salary,
  ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS dept_avg,
  MAX(salary)    OVER (PARTITION BY department)        AS dept_max,
  DENSE_RANK()   OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees
ORDER BY department, dept_rank;

SQL Window Functions Performance Tips

Execution order — why you cannot use window functions in WHERE

SQL processes clauses in this logical order:

  1. FROM + JOIN    → build the working dataset
  2. WHERE          → filter rows
  3. GROUP BY       → group rows
  4. HAVING         → filter groups
  5. SELECT         → compute expressions (window functions run HERE)
  6. ORDER BY       → sort the result
  7. LIMIT          → cut to N rows

Window functions execute at step 5 — after WHERE and HAVING. This means:
• You cannot filter on window function results in WHERE or HAVING
  (the window function has not been computed yet at those stages)
• Wrap in a CTE to filter on window results

-- ALWAYS filter early to reduce the rows window functions must process:
WITH recent AS (
  SELECT * FROM orders WHERE order_date >= '2025-01-01'  -- filter BEFORE window
)
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS rn
FROM recent;

Indexing strategy for window functions

Each distinct OVER() specification requires a sort of the data. Without an index, this is a full table sort — O(n log n). With a matching index, the sort is eliminated entirely.

-- General rule: create an index on (PARTITION BY cols, ORDER BY cols)

-- Frequent query:
SELECT *, RANK() OVER (PARTITION BY department ORDER BY salary DESC) FROM employees;
-- Optimal index:
CREATE INDEX idx_emp_dept_salary ON employees (department, salary DESC);

-- Per-user running totals ordered by date:
SELECT *, SUM(total) OVER (PARTITION BY user_id ORDER BY order_date) FROM orders;
-- Optimal index:
CREATE INDEX idx_orders_user_date ON orders (user_id, order_date);

-- Verify with EXPLAIN: look for "Index Scan" instead of "Sort"
-- PostgreSQL: EXPLAIN (ANALYZE, BUFFERS) SELECT ...
-- MySQL: EXPLAIN SELECT ...
-- SQL Server: include actual execution plan

Window functions vs correlated subqueries — performance comparison

-- Correlated subquery: executes once PER OUTER ROW — O(n²)
SELECT e.first_name, e.salary,
       (SELECT AVG(salary) FROM employees e2
        WHERE e2.department = e.department) AS dept_avg
FROM employees e;
-- 1,000 employees → 1,000 inner subquery executions
-- 100,000 employees → 100,000 inner executions (very slow)

-- Window function: single pass — O(n log n) at worst
SELECT first_name, salary,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
-- 1,000 employees → one scan + one sort
-- 100,000 employees → one scan + one sort (still fast)

-- Benchmark guideline:
-- < 1,000 rows:   negligible difference
-- 10k - 100k rows: window function typically 5-50x faster
-- 1M+ rows:        correlated subquery can take minutes;
--                  window function completes in seconds

-- Window functions vs CTEs:
-- CTEs improve readability, not raw performance.
-- A window function inside a CTE runs the same as in a main query.
-- The optimizer materializes or inlines CTEs depending on the database.

Named WINDOW clause — avoid redundant sorts

-- Each distinct OVER() spec triggers a separate sort.
-- When multiple functions share the same window, use a named WINDOW:

-- Without WINDOW clause: may trigger multiple sorts
SELECT first_name, salary,
       RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr,
       LAG(salary)  OVER (PARTITION BY department ORDER BY salary DESC) AS prev_sal
FROM employees;

-- With WINDOW clause: one sort reused by all three functions
SELECT first_name, salary,
       RANK()       OVER w AS rnk,
       DENSE_RANK() OVER w AS dr,
       LAG(salary)  OVER w AS prev_sal
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);

-- Supported in: PostgreSQL, MySQL 8.0, SQL Server 2022, SQLite
-- Not supported in: SQL Server 2019 and earlier

SQL Window Functions in MySQL

MySQL 8.0 window function support

MySQL added full window function support in MySQL 8.0 (released April 2018). MySQL 5.7 and earlier do NOT support window functions — any OVER() clause on MySQL 5.x throws a syntax error.

MySQL 8.0 supports all standard window functions:
• Ranking: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()
• Offset: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), NTH_VALUE()
• Aggregate: SUM(), AVG(), MIN(), MAX(), COUNT() with OVER()
• Distribution: PERCENT_RANK(), CUME_DIST()
• Named WINDOW clause for reusable window definitions

To check your version: SELECT VERSION();
If the result is 8.0.x or higher, all window functions are available.

MySQL 8.0 also added CTEs (WITH clause) and JSON_TABLE() in the same release — a significant upgrade from 5.7 overall.

MySQL window function example with named WINDOW

-- MySQL 8.0: rank + dense rank + row number in one query
SELECT
  first_name, department, salary,
  ROW_NUMBER() OVER dept_window AS row_num,
  RANK()       OVER dept_window AS rnk,
  DENSE_RANK() OVER dept_window AS dense_rnk
FROM employees
WINDOW dept_window AS (PARTITION BY department ORDER BY salary DESC);

-- Month-over-month in MySQL 8.0 (uses DATE_FORMAT instead of STRFTIME)
WITH monthly AS (
  SELECT DATE_FORMAT(order_date, '%Y-%m') AS month,
         SUM(total) AS revenue
  FROM orders GROUP BY month
)
SELECT month, revenue,
       LAG(revenue) OVER (ORDER BY month)           AS prev_revenue,
       revenue - LAG(revenue) OVER (ORDER BY month) AS change
FROM monthly ORDER BY month;

MySQL-specific limitations to know

1. RANGE with numeric/interval offsets:
   MySQL 8.0 does NOT support RANGE BETWEEN N PRECEDING AND CURRENT ROW
   with numeric offsets. Use ROWS BETWEEN instead.
   ROWS BETWEEN 6 PRECEDING AND CURRENT ROW  ← works in MySQL 8
   RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW  ← NOT MySQL

2. IGNORE NULLS in LAG/LEAD:
   LAG(col IGNORE NULLS) is not supported in MySQL.
   Workarounds require CTEs or subqueries.

3. NTH_VALUE() offset:
   Available from MySQL 8.0.2+. Ensure you are on 8.0.2 or later.

4. Window functions in UPDATE/DELETE:
   Not supported. Use a CTE or subquery to pre-compute, then JOIN.

5. QUALIFY clause:
   MySQL does not support QUALIFY (a Snowflake/DuckDB extension).
   Always use CTE + WHERE:
   WITH r AS (SELECT *, ROW_NUMBER() OVER ... AS rn FROM t)
   SELECT * FROM r WHERE rn = 1;

SQL Window Functions in PostgreSQL

PostgreSQL — the most complete window function implementation

PostgreSQL has supported window functions since version 8.4 (2009) and offers the most complete implementation of the SQL standard among open-source databases. Key advantages:

• FILTER clause: apply a condition within a window aggregate
  SUM(total) FILTER (WHERE status = 'completed') OVER (ORDER BY order_date)

• GROUPS frame mode (PostgreSQL 11+): a third frame type alongside ROWS and RANGE

• RANGE with interval offsets:
  AVG(total) OVER (ORDER BY order_date RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW)
  This includes all rows within the last 6 calendar days — handles date gaps correctly
  (ROWS BETWEEN 6 PRECEDING ignores gaps; RANGE with interval does not)

• IGNORE NULLS in LAG/LEAD (PostgreSQL 14+):
  LAG(salary IGNORE NULLS) OVER (ORDER BY hire_date)
  Returns the most recent non-NULL value, skipping NULL rows

• Named WINDOW clause with inheritance (extend a window definition)

PostgreSQL-specific examples

-- FILTER clause: running total of completed orders only
SELECT order_date, status, total,
       SUM(total) FILTER (WHERE status = 'completed')
         OVER (ORDER BY order_date) AS running_completed_revenue
FROM orders;

-- RANGE with interval: correct 7-day rolling average (handles date gaps)
SELECT order_date, total,
       ROUND(AVG(total) OVER (
         ORDER BY order_date::date
         RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
       ), 2) AS seven_day_avg
FROM orders;
-- If January 3rd and January 10th exist but Jan 4-9 do not,
-- RANGE INTERVAL correctly includes only Jan 3 in Jan 10's window.
-- ROWS BETWEEN 1 PRECEDING would include Jan 3 as the "1 row before" Jan 10.

-- Named WINDOW with multiple functions
SELECT first_name, salary,
       RANK()       OVER w,
       LAG(salary)  OVER w,
       LEAD(salary) OVER w
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);

SQL Window Functions in SQL Server

SQL Server window function history and syntax

SQL Server introduced ROW_NUMBER() in SQL Server 2005 and added the full set (RANK, DENSE_RANK, LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTILE, CUME_DIST, PERCENT_RANK) in SQL Server 2012. All modern versions (2016, 2017, 2019, 2022) have complete window function support.

The named WINDOW clause was added in SQL Server 2022.

-- T-SQL equivalent of the month-over-month example
WITH monthly AS (
  SELECT FORMAT(order_date, 'yyyy-MM') AS month, SUM(total) AS revenue
  FROM orders GROUP BY FORMAT(order_date, 'yyyy-MM')
)
SELECT
  month, revenue,
  LAG(revenue)  OVER (ORDER BY month) AS prev_revenue,
  revenue - LAG(revenue) OVER (ORDER BY month) AS change,
  ROUND(
    (revenue - LAG(revenue) OVER (ORDER BY month))
    * 100.0 / LAG(revenue) OVER (ORDER BY month),
  1) AS pct_change
FROM monthly
ORDER BY month;

-- SQL Server uses FORMAT() for date strings instead of STRFTIME().
-- Otherwise window function syntax is identical to PostgreSQL and MySQL 8.

Database support comparison

Feature	PostgreSQL	MySQL 8+	SQL Server	SQLite 3.25+
ROW_NUMBER, RANK, DENSE_RANK	✓	✓	✓	✓
LAG, LEAD	✓	✓	✓	✓
FIRST_VALUE, LAST_VALUE, NTH_VALUE	✓	8.0.2+	✓	✓
NTILE, PERCENT_RANK, CUME_DIST	✓	✓	✓	✓
Named WINDOW clause	✓	✓	2022+	✓
RANGE with interval offsets	✓	✗	✓	✗
IGNORE NULLS in LAG/LEAD	14+	✗	✓	✗
FILTER clause in window aggregates	✓	✗	✗	✓

Common SQL Window Function Errors and Fixes

Error: window function not allowed in WHERE

-- ERROR
SELECT * FROM employees
WHERE ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) = 1;
-- SQL Error [42P20]: window functions are not allowed in WHERE

-- FIX: wrap in a CTE, then filter in the outer query
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
  FROM employees
)
SELECT * FROM ranked WHERE rn = 1;

-- The same pattern fixes window functions incorrectly placed in HAVING:
-- HAVING RANK() OVER (...) = 1  →  CTE + WHERE rn = 1

Error: LAST_VALUE() returning current row instead of partition last

-- SYMPTOM: LAST_VALUE always returns the current row's value

-- CAUSE: default frame RANGE UNBOUNDED PRECEDING TO CURRENT ROW
-- "Last row of frame" = current row when using the default

-- WRONG (common mistake)
SELECT name, salary,
       LAST_VALUE(salary) OVER (PARTITION BY dept ORDER BY salary) AS highest
FROM employees;
-- Returns each employee's own salary — useless

-- FIX: extend frame to partition end
SELECT name, salary,
       LAST_VALUE(salary) OVER (
         PARTITION BY dept ORDER BY salary
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS highest
FROM employees;

-- BETTER FIX: use FIRST_VALUE with DESC — no frame modification needed
SELECT name, salary,
       FIRST_VALUE(salary) OVER (PARTITION BY dept ORDER BY salary DESC) AS highest
FROM employees;

Error: non-deterministic ROW_NUMBER() without ORDER BY

-- WRONG: ROW_NUMBER() without ORDER BY — numbers are arbitrary
SELECT *, ROW_NUMBER() OVER () AS rn FROM employees;
-- rn values change between query runs — unpredictable

-- WRONG: LAG without ORDER BY — "previous row" is undefined
SELECT *, LAG(salary) OVER () AS prev FROM employees;
-- Result is meaningless — which row is "previous"?

-- FIX: always specify ORDER BY for ranking and offset functions
SELECT *, ROW_NUMBER() OVER (ORDER BY hire_date, id) AS rn FROM employees;
-- id as tie-breaker ensures stable, deterministic row numbers

-- Note: aggregate window functions (SUM, AVG) without ORDER BY
-- are fine — they compute grand totals, and order is irrelevant:
SELECT *, SUM(salary) OVER () AS total FROM employees;  -- always correct

Error: LAG crossing partition boundaries

-- SYMPTOM: LAG returns a value from a different customer's orders

-- WRONG: no PARTITION BY — LAG crosses all orders
SELECT user_id, order_date, total,
       LAG(total) OVER (ORDER BY order_date) AS prev_total
FROM orders;
-- Row 1 (user 2's first order) gets prev_total from user 1's last order

-- FIX: add PARTITION BY user_id
SELECT user_id, order_date, total,
       LAG(total) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_total
FROM orders;
-- Now LAG only looks backward within the same user's orders
-- The first order per user returns NULL (no previous order for that user)

SQL Window Functions Interview Questions

Q1: Write a query to find the top 2 salaries per department (include ties)

-- Use DENSE_RANK to include all tied rows at each rank level
WITH ranked AS (
  SELECT
    first_name, department, salary,
    DENSE_RANK() OVER (
      PARTITION BY department
      ORDER BY salary DESC
    ) AS dr
  FROM employees
)
SELECT first_name, department, salary
FROM ranked
WHERE dr <= 2
ORDER BY department, salary DESC;

-- Key interview follow-up: "What if you want exactly 2 rows per department
-- with no ties?" → use ROW_NUMBER() instead of DENSE_RANK()
-- "What if a department has fewer than 2 employees?" → query still works;
-- it returns whatever exists up to rank 2

Q2: Deduplicate a table — keep the most recent record per email

-- users table has duplicate emails; keep only the newest per email
WITH deduped AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY email
           ORDER BY created_at DESC, id DESC
         ) AS rn
  FROM users
)
SELECT id, first_name, email, created_at
FROM deduped
WHERE rn = 1;

-- Interview insight: why ROW_NUMBER and not RANK?
-- RANK would assign rn=1 to ALL rows with the same created_at
-- (if two duplicates were created at the exact same timestamp),
-- meaning WHERE rn = 1 could still return multiple rows.
-- ROW_NUMBER guarantees exactly one row per email group.

Q3: Calculate month-over-month revenue growth rate

WITH monthly AS (
  SELECT STRFTIME('%Y-%m', order_date) AS month,
         ROUND(SUM(total), 2) AS revenue
  FROM orders GROUP BY month
)
SELECT
  month,
  revenue,
  LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
  CASE
    WHEN LAG(revenue) OVER (ORDER BY month) IS NULL THEN NULL
    ELSE ROUND(
      (revenue - LAG(revenue) OVER (ORDER BY month))
      * 100.0 / LAG(revenue) OVER (ORDER BY month),
    2)
  END AS mom_growth_pct
FROM monthly
ORDER BY month;

-- Interviewers look for:
-- 1. Correct use of LAG (not a self-join)
-- 2. Handling NULL for the first month (CASE or NULLIF)
-- 3. Correct % formula: (new - old) / old * 100

Q4: Explain the difference between ROW_NUMBER, RANK, and DENSE_RANK

-- Use this live demo to explain:
SELECT salary,
  ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,
  RANK()       OVER (ORDER BY salary DESC) AS rnk,
  DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
FROM (VALUES (100),(90),(90),(80)) t(salary);

-- salary  rn  rnk  dr
-- 100     1   1    1
-- 90      2   2    2
-- 90      3   2    2   ← tied rows get same rank
-- 80      4   4    3   ← RANK gaps to 4; DENSE stays at 3

-- Verbal explanation:
-- ROW_NUMBER: always unique 1,2,3,4 — no ties, arbitrary at ties
-- RANK: same number for ties, then skips — 1,2,2,4 means "4th person = truly 4th"
-- DENSE_RANK: same number for ties, no gaps — 1,2,2,3 means "3rd tier"

Q5: Cumulative spend per customer, resetting for each customer

SELECT
  u.first_name,
  o.order_date,
  ROUND(o.total, 2) AS order_total,
  ROUND(SUM(o.total) OVER (
    PARTITION BY o.user_id
    ORDER BY o.order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ), 2) AS cumulative_spend
FROM orders o
JOIN users u ON u.id = o.user_id
ORDER BY o.user_id, o.order_date;

-- Concepts tested:
-- PARTITION BY to reset running total per customer
-- ORDER BY inside OVER for chronological accumulation
-- ROWS frame for running total semantics (vs grand total)
-- JOIN to bring in customer name alongside order data

Q6: Find users who placed orders in two consecutive months

WITH user_months AS (
  SELECT DISTINCT user_id,
         STRFTIME('%Y-%m', order_date) AS order_month
  FROM orders
),
with_next AS (
  SELECT user_id, order_month,
         LEAD(order_month) OVER (
           PARTITION BY user_id ORDER BY order_month
         ) AS next_month
  FROM user_months
)
SELECT DISTINCT user_id
FROM with_next
WHERE next_month = STRFTIME('%Y-%m',
  DATE(order_month || '-01', '+1 month'));

-- Approach: get distinct (user, month) pairs, then LEAD to get
-- the user's next active month. If next_month is exactly one
-- calendar month later, the user was active in consecutive months.
-- The DATE arithmetic adds one month to the current month's first day.

SQL Window Functions Best Practices

Always add ORDER BY for ranking and offset functions

ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), and LEAD() all require ORDER BY inside OVER() to produce meaningful, deterministic results. Without ORDER BY, the database assigns values in arbitrary storage order — which can differ between query runs.

For tie-breaking, add a unique column (usually the primary key) as the final sort key:
  ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, id)

This ensures every row always receives the same row number, even when salaries are equal.

Use CTEs to filter on window function results

Window functions execute after WHERE and HAVING. The only way to filter on a window function result is to materialize it first in a CTE:

WITH ranked AS (
  SELECT *, DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dr
  FROM employees
)
SELECT * FROM ranked WHERE dr <= 3;

For multi-step analytics, chain CTEs:
WITH monthly AS (
  SELECT STRFTIME('%Y-%m', order_date) AS month, SUM(total) AS revenue
  FROM orders GROUP BY month
),
with_growth AS (
  SELECT *, LAG(revenue) OVER (ORDER BY month) AS prev_revenue FROM monthly
)
SELECT * FROM with_growth WHERE prev_revenue IS NOT NULL AND revenue > prev_revenue;

Prefer FIRST_VALUE(DESC) over LAST_VALUE() to avoid frame issues

LAST_VALUE() with the default frame returns the current row's value, not the partition's last row. Rather than adding the verbose frame clause every time, flip the ORDER BY and use FIRST_VALUE():

-- Verbose: LAST_VALUE with explicit frame
LAST_VALUE(salary) OVER (
  PARTITION BY dept ORDER BY salary
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

-- Simpler: FIRST_VALUE with reversed ORDER BY
FIRST_VALUE(salary) OVER (PARTITION BY dept ORDER BY salary DESC)

Both return the same result. The FIRST_VALUE approach is cleaner and immune to the frame gotcha.

Use ROWS not RANGE for time series moving windows

ROWS BETWEEN N PRECEDING AND CURRENT ROW counts N physical rows before the current one — predictable and explicit.

RANGE BETWEEN uses value-based offsets and includes all rows that tie on the ORDER BY column. For daily data with no gaps, both give the same result. When gaps exist or ties are possible, ROWS is almost always what you want:

-- 7-day window: the 7 most recent physical rows
AVG(revenue) OVER (ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

-- Use RANGE with interval only when you need calendar-based windows
-- and your database supports it (PostgreSQL, SQL Server):
AVG(revenue) OVER (ORDER BY order_date RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW)

Internal links — related SQL topics to study next

Window functions connect to several other advanced SQL topics. Study these alongside window functions to build complete analytical SQL fluency:

SQL JOINs Explained — window functions often replace complex self-joins; understanding JOINs first clarifies why window functions are simpler
SQL Subqueries — CTEs and subqueries are required to filter on window function results; these two topics are deeply linked
SQL Interview Questions — window functions appear in virtually every advanced SQL interview; review the full question set

Start practicing SQL for free

Run every query from this guide in the SQLab Hub interactive SQL playground — no signup, no download, instant results.

Open SQL Playground →View SQL Exercises →

Top 50 SQL Interview Questions & Answers (2026 Guide)25 min read SQL JOINs Explained: INNER, LEFT, RIGHT, FULL OUTER & Real Examples24 min read GROUP BY vs HAVING: What's the Difference?10 min read