SQL CTEs โ written using the WITH clause โ are one of those features that feel like cheating once you start using them. You name a temporary result set before the main query runs, reference that name just like a table, and suddenly your eight-level nested subquery becomes a sequence of clearly labeled steps. CTEs do not just make SQL prettier; they change the way you think about breaking problems down. This guide covers everything: the WITH clause syntax from first principles, 15+ practical examples from basic aggregations to running totals and recursive hierarchy traversal, an honest comparison of CTEs versus subqueries and temp tables, the real story on CTE performance, 20+ interview questions with answers, and practice problems at three difficulty levels. Every query runs against the SQLab Hub schema โ users, orders, order_items, products, employees โ so you can test each one immediately in the free SQL playground without any setup.
A Common Table Expression โ CTE โ is a named, temporary result set that you define before a query and then reference by name inside that same query. It exists only for the duration of the statement that defines it. Nothing is saved to disk. Nothing needs to be dropped when you finish. The WITH keyword introduces it, which is why CTEs are also called the SQL WITH clause.
-- Define the CTE, then query it like a table
WITH recent_orders AS (
SELECT id, user_id, total, created_at
FROM orders
WHERE created_at >= DATE('now', '-30 days')
)
SELECT user_id, COUNT(*) AS orders_last_30_days, SUM(total) AS total_spent
FROM recent_orders
GROUP BY user_id
ORDER BY total_spent DESC;WITH starts the block. recent_orders is the name you give the CTE โ choose something that describes what it contains. The parentheses hold the SELECT statement that defines the CTE. After the closing parenthesis, the main query begins. Inside the main query, recent_orders is used exactly like a real table name. The database runs the CTE query first, holds the result in memory, then runs the main query against it.
The "common" part means the result can be referenced in multiple places within the same query. "Table expression" reflects that it behaves as a virtual table โ it has columns and rows, and can be filtered, joined, and aggregated just like a physical table. In practice, most engineers simply call them CTEs or the WITH clause.
The WITH clause is part of the SQL:1999 standard. Support arrived in major databases at different times: SQL Server 2005, PostgreSQL around the same era, MySQL only in version 8.0 (2018). SQLite supports CTEs since version 3.8.3 (2014), which covers the SQLab Hub playground. MySQL 5.x does not support CTEs โ use derived tables or temp tables there instead.
The single biggest reason to use CTEs is readability. Complex analytical queries often require multiple preparation steps โ filter, aggregate, rank, then filter again. Without CTEs, this logic ends up nested three or four levels deep. With CTEs, you name each step. Anyone reading the query can scan the CTE names before reading a single line of SQL inside them and already understand the logic flow.
-- Without CTE: reader must parse nesting before understanding intent
SELECT u.email
FROM users u
JOIN (
SELECT DISTINCT user_id FROM orders
WHERE strftime('%m', created_at) = '01'
) jan ON jan.user_id = u.id
JOIN (
SELECT DISTINCT user_id FROM orders
WHERE strftime('%m', created_at) = '02'
) feb ON feb.user_id = u.id;
-- With CTEs: intent is obvious before the JOIN
WITH jan_buyers AS (
SELECT DISTINCT user_id FROM orders
WHERE strftime('%m', created_at) = '01'
),
feb_buyers AS (
SELECT DISTINCT user_id FROM orders
WHERE strftime('%m', created_at) = '02'
)
SELECT u.email
FROM users u
JOIN jan_buyers j ON j.user_id = u.id
JOIN feb_buyers f ON f.user_id = u.id;CTEs are inherently debuggable. If the final result looks wrong, you can run each CTE in isolation by temporarily ending the query after that CTE and adding SELECT * FROM cte_name LIMIT 20. This verifies each intermediate result independently. With nested subqueries, isolating a step means manually extracting and re-running the inner query โ an error-prone process, especially three nesting levels in.
Within a single query, a CTE can be referenced multiple times by name. If you calculate a base metric once in a CTE and then join it with two different conditions, the CTE name appears twice but the underlying query is written once. With subqueries, reusing the same logic means repeating the entire subquery โ creating a maintenance problem every time the logic needs to change.
Complex analytical queries often need staged transformations: clean the data, aggregate it, rank it, then filter on the rank. With CTEs, each stage gets its own name. The final SELECT becomes a clean summary of the logic flow. This composability is why data engineers who write dbt models use CTEs for virtually every transformation โ the resulting SQL is predictable, reviewable, and testable step by step.
WITH cte_name AS ( -- Any valid SELECT statement SELECT column1, column2 FROM table_name WHERE condition ) -- Main query follows immediately โ no separator SELECT column1, column2 FROM cte_name WHERE other_condition;
WITH
first_cte AS (
SELECT id, total FROM orders WHERE status = 'completed'
),
second_cte AS (
-- Later CTEs can reference earlier ones
SELECT user_id, SUM(total) AS revenue
FROM first_cte
GROUP BY user_id
),
third_cte AS (
SELECT * FROM second_cte WHERE revenue > 500
)
SELECT * FROM third_cte ORDER BY revenue DESC;WITH appears once, before all CTE definitions. Multiple CTEs are separated by commas. There is NO comma between the last CTE and the main SELECT. The main query must immediately follow โ no other statements between the WITH block and the SELECT. Column aliases work inside CTE queries exactly as in normal queries. You can explicitly declare output columns in the CTE header: WITH cte_name (col1, col2) AS (SELECT a, b FROM ...).
-- Declaring output column names in the CTE header WITH order_summary (user_id, order_count, revenue) AS ( SELECT user_id, COUNT(*), SUM(total) FROM orders GROUP BY user_id ) SELECT u.email, os.order_count, os.revenue FROM users u JOIN order_summary os ON os.user_id = u.id ORDER BY os.revenue DESC;
-- Recursive CTE uses UNION ALL to reference itself WITH RECURSIVE cte_name AS ( -- Anchor member: base case, runs exactly once SELECT id, parent_id, name, 0 AS depth FROM categories WHERE parent_id IS NULL UNION ALL -- Recursive member: references cte_name for next level SELECT c.id, c.parent_id, c.name, r.depth + 1 FROM categories c JOIN cte_name r ON r.id = c.parent_id WHERE r.depth < 10 -- always include a termination guard ) SELECT * FROM cte_name ORDER BY depth, name; -- Note: PostgreSQL and SQLite require RECURSIVE keyword -- SQL Server supports self-referencing CTEs without RECURSIVE
-- CTE isolates the aggregation; main query handles the JOIN WITH user_spend AS ( SELECT user_id, SUM(total) AS total_spent FROM orders WHERE status = 'completed' GROUP BY user_id ) SELECT u.first_name, u.email, us.total_spent FROM users u JOIN user_spend us ON us.user_id = u.id WHERE us.total_spent > 500 ORDER BY us.total_spent DESC;
-- CTE handles the date formatting; main query aggregates cleanly
WITH monthly_orders AS (
SELECT
strftime('%Y-%m', created_at) AS month,
total
FROM orders
WHERE status != 'cancelled'
)
SELECT
month,
COUNT(*) AS order_count,
ROUND(SUM(total), 2) AS revenue,
ROUND(AVG(total), 2) AS avg_order_value
FROM monthly_orders
GROUP BY month
ORDER BY month;-- Two CTEs: first aggregates by day, second adds the window function
WITH daily_revenue AS (
SELECT
DATE(created_at) AS order_date,
SUM(total) AS day_revenue
FROM orders
WHERE status = 'completed'
GROUP BY DATE(created_at)
),
running AS (
SELECT
order_date,
day_revenue,
SUM(day_revenue) OVER (ORDER BY order_date) AS cumulative_revenue
FROM daily_revenue
)
SELECT * FROM running ORDER BY order_date;-- Rank products by revenue within each category, return top 3 per category
WITH product_revenue AS (
SELECT
p.id, p.name, p.category,
SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM products p
JOIN order_items oi ON oi.product_id = p.id
JOIN orders o ON o.id = oi.order_id
WHERE o.status = 'completed'
GROUP BY p.id, p.name, p.category
),
ranked AS (
SELECT
*,
RANK() OVER (PARTITION BY category ORDER BY total_revenue DESC) AS rank_in_cat
FROM product_revenue
)
SELECT * FROM ranked WHERE rank_in_cat <= 3 ORDER BY category, rank_in_cat;-- Four-stage pipeline: filter โ aggregate โ qualify โ rank
WITH completed AS (
SELECT * FROM orders WHERE status = 'completed'
),
user_metrics AS (
SELECT
user_id,
COUNT(*) AS orders,
SUM(total) AS revenue,
AVG(total) AS avg_order
FROM completed
GROUP BY user_id
),
qualified AS (
SELECT * FROM user_metrics WHERE revenue > 200
)
SELECT
u.email, u.city,
q.orders, q.revenue,
RANK() OVER (ORDER BY q.revenue DESC) AS revenue_rank
FROM users u
JOIN qualified q ON q.user_id = u.id
ORDER BY q.revenue DESC;-- Keep only the most recently created record per email address
WITH ranked AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at DESC
) AS rn
FROM users
)
SELECT id, email, first_name, last_name, created_at
FROM ranked
WHERE rn = 1;WITH order_counts AS (
SELECT u.id, u.email, COUNT(o.id) AS cnt
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.email
)
SELECT email FROM order_counts
WHERE cnt = 0
ORDER BY email;-- How many days was each customer active (time between first and last order)? WITH first_ord AS ( SELECT user_id, MIN(DATE(created_at)) AS first_date FROM orders GROUP BY user_id ), last_ord AS ( SELECT user_id, MAX(DATE(created_at)) AS last_date FROM orders GROUP BY user_id ) SELECT u.email, fo.first_date, lo.last_date, CAST(julianday(lo.last_date) - julianday(fo.first_date) AS INTEGER) AS days_active FROM users u JOIN first_ord fo ON fo.user_id = u.id JOIN last_ord lo ON lo.user_id = u.id ORDER BY days_active DESC;
A recursive CTE has two parts connected by UNION ALL. The anchor member runs exactly once and produces the starting rows. The recursive member runs repeatedly, each iteration reading the rows from the previous iteration and using them to generate the next level. Recursion stops when the recursive member produces zero rows. Without a termination condition, the query runs until the database hits its recursion depth limit and throws an error.
WITH RECURSIVE tree AS ( -- Anchor: run once, start at root nodes (no parent) SELECT id, name, parent_id, 0 AS depth FROM categories WHERE parent_id IS NULL UNION ALL -- Recursive: join one level deeper each iteration SELECT c.id, c.name, c.parent_id, t.depth + 1 FROM categories c JOIN tree t ON t.id = c.parent_id WHERE t.depth < 10 -- termination guard: never recurse beyond depth 10 ) SELECT depth, id, name FROM tree ORDER BY depth, name;
-- Navigate an employees table: id, name, manager_id
-- Returns every employee with their depth and full reporting path
WITH RECURSIVE org_chart AS (
-- Anchor: executives with no manager
SELECT id, name, manager_id, name AS path, 0 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: one level deeper each pass
SELECT e.id, e.name, e.manager_id,
oc.path || ' > ' || e.name AS path,
oc.level + 1
FROM employees e
JOIN org_chart oc ON oc.id = e.manager_id
WHERE oc.level < 20
)
SELECT level, name, path FROM org_chart ORDER BY path;-- Generate integers 1โ100 without a numbers table -- Useful for date-range filling, histogram buckets, test data WITH RECURSIVE nums AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM nums WHERE n < 100 ) SELECT n FROM nums;
-- Find every subcategory under category id = 5 (any depth) WITH RECURSIVE descendants AS ( SELECT id, name, parent_id, 0 AS depth FROM categories WHERE id = 5 -- starting node UNION ALL SELECT c.id, c.name, c.parent_id, d.depth + 1 FROM categories c JOIN descendants d ON d.id = c.parent_id ) SELECT depth, id, name FROM descendants ORDER BY depth, name;
Always include a depth guard (WHERE depth < N). Verify your data has no cycles โ if a parent_id creates a circular reference, recursion will never terminate naturally. In PostgreSQL, use CYCLE detection syntax if circular data is possible. In SQL Server, the MAXRECURSION query hint limits recursion depth per query. The safe pattern: depth counter + WHERE depth < max_possible_depth + 1.
Feature | CTE | Subquery ---------------------------|-------------------------------|------------------------------ Syntax position | Before the main query (WITH) | Inline in FROM/WHERE/SELECT Reusable within one query | Yes โ reference by name | No โ must repeat the code Supports recursion | Yes (WITH RECURSIVE) | No Readability | High โ named, sequential | Low when deeply nested Debuggability | Easy โ run in isolation | Hard โ must extract manually Performance | Usually identical | Usually identical Correlated queries | No (CTEs are independent) | Yes (correlated subqueries) Best for | Complex multi-step logic | Simple one-off inline filters
-- Subquery: intent buried inside the JOIN SELECT u.email, sub.total_spent FROM users u JOIN ( SELECT user_id, SUM(total) AS total_spent FROM orders WHERE status = 'completed' GROUP BY user_id HAVING SUM(total) > 500 ) sub ON sub.user_id = u.id; -- CTE: same result, readable from top to bottom WITH big_spenders AS ( SELECT user_id, SUM(total) AS total_spent FROM orders WHERE status = 'completed' GROUP BY user_id HAVING SUM(total) > 500 ) SELECT u.email, bs.total_spent FROM users u JOIN big_spenders bs ON bs.user_id = u.id;
Subqueries are not bad. For a simple EXISTS check or a single WHERE IN filter, a subquery is concise and perfectly readable. WHERE user_id IN (SELECT id FROM users WHERE is_active = 1) does not need a CTE. The rule of thumb: if the logic is one level deep and used once, a subquery is appropriate. If it is multi-step, reused, or nested more than two levels, use a CTE.
Feature | CTE | Temp Table ---------------------------|-------------------------------|------------------------------ Persistence | Single statement only | Session or transaction Indexable | No | Yes Re-queryable across stmts | No | Yes Setup / cleanup required | None | CREATE + INSERT / DROP Large intermediate results | May re-evaluate | Stored physically Best for | Single-query complex logic | Multi-step ETL, large data
If the same intermediate result is needed across multiple separate SQL statements, a temp table is the only option โ CTEs do not persist beyond the statement that defines them. Additionally, if an intermediate result contains millions of rows that need to be queried multiple times with different filters, materializing it into a temp table with an index will dramatically outperform a CTE that the optimizer evaluates repeatedly.
-- Pattern for multi-step processing where temp tables are appropriate CREATE TEMP TABLE active_customers AS SELECT id, email, city FROM users WHERE is_active = 1; CREATE INDEX idx_ac_city ON active_customers (city); -- Multiple different queries reuse the same materialised result SELECT city, COUNT(*) FROM active_customers GROUP BY city; SELECT * FROM active_customers WHERE city = 'New York'; DROP TABLE active_customers; -- cleanup (or it drops at session end)
Use a CTE for temporary named results within a single query. Use a temp table when you need intermediate results across multiple queries in the same session, when intermediate results are large and need indexes, or when the same data will be scanned many times in different ways. For anything that fits in one SQL statement at reasonable data volumes, prefer the CTE โ zero setup, zero cleanup, zero state.
CTEs are not inherently faster or slower than subqueries. For most queries, the optimizer produces an identical execution plan regardless of whether you write a CTE or an equivalent subquery. Performance differences, when they exist, are driven by materialization behavior โ whether the optimizer evaluates the CTE once and caches the result (materializes), or inlines it and potentially evaluates it multiple times.
-- PostgreSQL < 12: CTEs were ALWAYS materialized -- (evaluated once, cached โ predicates NOT pushed through) -- This could be faster OR slower depending on the query shape -- PostgreSQL 12+: optimizer can inline non-recursive CTEs unless: -- 1. The CTE is referenced more than once -- 2. The CTE contains volatile functions (RANDOM(), NOW(), etc.) -- 3. You explicitly use AS MATERIALIZED -- Force inlining in PostgreSQL 12+: WITH my_cte AS NOT MATERIALIZED ( SELECT * FROM large_table WHERE status = 'active' ) SELECT * FROM my_cte WHERE created_at > '2025-01-01'; -- Force materialization (useful when CTE is reused and costly to re-run): WITH my_cte AS MATERIALIZED ( SELECT user_id, COUNT(*) AS cnt FROM orders GROUP BY user_id ) SELECT * FROM my_cte WHERE cnt > 10 UNION ALL SELECT * FROM my_cte WHERE cnt = 1;
-- EXPLAIN ANALYZE shows whether the CTE is materialized or inlined EXPLAIN ANALYZE WITH big_spenders AS ( SELECT user_id, SUM(total) AS total_spent FROM orders GROUP BY user_id ) SELECT u.email, bs.total_spent FROM users u JOIN big_spenders bs ON bs.user_id = u.id WHERE bs.total_spent > 500; -- Look for "CTE Scan" in the plan = materialized (evaluated once) -- No "CTE Scan" = inlined (predicate pushed into base table scan)
Filter early: apply WHERE conditions inside the CTE, not just in the outer query โ this reduces the rows the CTE must produce. Avoid referencing large CTEs multiple times if the database materializes them. For recursive CTEs, always include a depth limit. Index the underlying base tables, not the CTE itself โ CTEs have no indexes. If a CTE is referenced three or more times in a complex query and profiling shows degraded performance, consider a temp table with an index instead.
A Common Table Expression is a named temporary result set defined with the WITH clause, scoped to a single query. You use CTEs to improve readability by breaking complex multi-step queries into named stages, to reuse an intermediate result set within one query without repeating logic, and to enable recursive queries that process hierarchical data.
A CTE is defined before the main query with WITH and referenced by name anywhere in that query. A subquery is embedded inline inside FROM, WHERE, or SELECT. CTEs can be referenced multiple times within the same query; subqueries must be repeated. CTEs support recursion; subqueries do not. Both often produce identical execution plans, so the choice is primarily about readability and maintainability.
A recursive CTE calls itself using UNION ALL inside the WITH block. It has an anchor member (base case, runs once) and a recursive member that references the CTE name and runs until it returns no rows. Common use cases: employee org chart traversal to find all reports under a manager, category tree walking in e-commerce, bill-of-materials hierarchies, and generating date or number sequences.
Yes. Once defined in the WITH block, a CTE can be referenced anywhere in the main query and multiple times. You can JOIN the same CTE to itself under two aliases, reference it in two separate JOINs, or use it in both a subquery and the main SELECT. In databases that materialize CTEs, the underlying query runs once regardless of how many times the name is referenced.
Run each CTE in isolation. Temporarily end the query after the CTE definition and replace the main query with SELECT * FROM cte_name LIMIT 10. This verifies each intermediate result before the final logic. This is the biggest practical debugging advantage CTEs have over nested subqueries, where isolating a step requires manually extracting and re-running the inner query.
A CTE exists only for the duration of the statement that defines it โ no persistence, no schema object, no permissions to manage. A view is a permanent database object stored in the schema, reusable by any user with access. Use a CTE for one-off complex queries. Use a view when you want to make reusable query logic available across an application or to multiple users without repeating the SQL.
WITH ranked AS (
SELECT
id, name, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
)
SELECT name, salary
FROM ranked
WHERE rnk = 2;WITH email_counts AS ( SELECT email, COUNT(*) AS cnt FROM users GROUP BY email ) SELECT email, cnt FROM email_counts WHERE cnt > 1 ORDER BY cnt DESC;
WITH daily_sales AS ( SELECT DATE(created_at) AS sale_date, SUM(total) AS revenue FROM orders WHERE status = 'completed' GROUP BY DATE(created_at) ) SELECT sale_date, revenue, SUM(revenue) OVER (ORDER BY sale_date) AS running_total FROM daily_sales ORDER BY sale_date;
Without a termination condition, the recursive member continues producing rows until the database hits its maximum recursion depth. PostgreSQL defaults to 100 iterations; SQL Server defaults to 100 and accepts a MAXRECURSION hint. When the limit is hit, the query fails with an error. Always include a depth counter and a WHERE depth < N guard in every recursive CTE you write in production.
-- Yes, in PostgreSQL and SQL Server. PostgreSQL example: WITH old_cancelled AS ( SELECT id FROM orders WHERE status = 'cancelled' AND created_at < '2024-01-01' ) DELETE FROM orders WHERE id IN (SELECT id FROM old_cancelled); -- SQL Server allows updating CTE rows directly in some patterns. -- Useful for complex filtered mutations where inline logic is awkward.
In PostgreSQL 12+, you control whether a CTE is materialized (evaluated once, result cached) or inlined (treated like a subquery). WITH cte AS MATERIALIZED (...) forces caching. WITH cte AS NOT MATERIALIZED (...) forces inlining and allows predicate pushdown. Non-recursive CTEs with no side effects are inlined by default in PostgreSQL 12+. Volatile functions (RANDOM(), clock_timestamp()) always force materialization.
WITH RECURSIVE counter AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM counter WHERE n < 50 ) SELECT n FROM counter;
A derived table is a subquery in the FROM clause given an alias: SELECT ... FROM (SELECT ...) AS alias. A CTE is defined before the query with WITH. Both produce a named temporary result, but CTEs can be referenced multiple times within the same query, support recursion, and are easier to read in complex scenarios. Derived tables cannot reference themselves and must be repeated if the same result is needed in more than one place.
WITH ranked AS (
SELECT
id, name, category,
SUM(units_sold) AS total_sold,
RANK() OVER (
PARTITION BY category
ORDER BY SUM(units_sold) DESC
) AS rnk
FROM products
GROUP BY id, name, category
)
SELECT * FROM ranked WHERE rnk <= 5 ORDER BY category, rnk;The anchor member is the first SELECT in a recursive CTE, before the UNION ALL. It runs exactly once and establishes the starting rows. It cannot reference the CTE itself. The anchor defines the base case โ top-level categories with no parent, the CEO with no manager, or the integer 1 in a sequence generator. The recursive member then builds on these anchor rows one level at a time.
Syntactically yes in most databases, but the ordering is not guaranteed to propagate to the main query. Ordering is only reliable in the outermost SELECT. SQL Server will error on ORDER BY inside a CTE without TOP or FETCH NEXT. As a rule: apply ORDER BY only in the final SELECT, not inside CTE definitions โ unless you are deliberately using TOP/LIMIT within the CTE itself.
CTEs have no special NULL handling โ they inherit standard SQL NULL semantics. NULL comparisons inside CTEs use IS NULL, IS NOT NULL, and COALESCE exactly as in regular queries. If the CTE produces rows with NULLs in join columns, those rows behave identically to a real table: dropped by INNER JOIN, preserved with NULL fill by LEFT JOIN.
WITH introduces a standard non-recursive CTE. WITH RECURSIVE is required in PostgreSQL and SQLite when the CTE references itself. SQL Server does not use RECURSIVE โ the same WITH syntax handles both cases, and self-referencing is detected automatically. Using a self-referencing CTE without RECURSIVE in PostgreSQL produces a "relation does not exist" error because the CTE name is not yet in scope.
WITH user_months AS (
SELECT DISTINCT
user_id,
strftime('%Y-%m', created_at) AS order_month
FROM orders
),
with_prev AS (
SELECT
user_id,
order_month,
LAG(order_month, 1) OVER (PARTITION BY user_id ORDER BY order_month) AS prev1,
LAG(order_month, 2) OVER (PARTITION BY user_id ORDER BY order_month) AS prev2
FROM user_months
)
SELECT DISTINCT user_id
FROM with_prev
WHERE prev1 IS NOT NULL
AND prev2 IS NOT NULL;
-- A stricter version would verify months are exactly 1 month apart
-- using date arithmetic on the year-month strings-- Pivot order status counts per user using conditional aggregation
WITH status_pivot AS (
SELECT
user_id,
SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed,
SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) AS pending,
SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END) AS cancelled
FROM orders
GROUP BY user_id
)
SELECT u.email, sp.completed, sp.pending, sp.cancelled
FROM users u
JOIN status_pivot sp ON sp.user_id = u.id
ORDER BY sp.completed DESC;-- Write a CTE that calculates total revenue per product.
-- Return the top 5 by revenue.
WITH product_revenue AS (
SELECT
p.name,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM products p
JOIN order_items oi ON oi.product_id = p.id
GROUP BY p.name
)
SELECT name, ROUND(revenue, 2) AS revenue
FROM product_revenue
ORDER BY revenue DESC
LIMIT 5;-- Use two CTEs: one for the global average, one for per-user average. -- Return users whose average order value exceeds the global average. WITH global_avg AS ( SELECT AVG(total) AS avg_all FROM orders ), user_avg AS ( SELECT user_id, AVG(total) AS avg_user FROM orders GROUP BY user_id ) SELECT u.email, ROUND(ua.avg_user, 2) AS avg_order FROM users u JOIN user_avg ua ON ua.user_id = u.id JOIN global_avg ga ON ua.avg_user > ga.avg_all ORDER BY avg_order DESC;
-- Calculate the percentage change in revenue between consecutive months.
WITH monthly AS (
SELECT
strftime('%Y-%m', created_at) AS month,
SUM(total) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY strftime('%Y-%m', created_at)
),
with_prev AS (
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue
FROM monthly
)
SELECT
month,
ROUND(revenue, 2) AS revenue,
ROUND(100.0 * (revenue - prev_revenue) / prev_revenue, 1) AS pct_change
FROM with_prev
WHERE prev_revenue IS NOT NULL
ORDER BY month;-- Find customers who ordered historically but not in the last 90 days.
WITH last_order AS (
SELECT user_id, MAX(DATE(created_at)) AS last_date
FROM orders
GROUP BY user_id
)
SELECT
u.email,
lo.last_date,
CAST(julianday('now') - julianday(lo.last_date) AS INTEGER) AS days_since
FROM users u
JOIN last_order lo ON lo.user_id = u.id
WHERE lo.last_date < DATE('now', '-90 days')
ORDER BY days_since DESC;-- Given employees(id, name, manager_id),
-- return each employee with depth and full path from root.
WITH RECURSIVE hierarchy AS (
SELECT id, name, manager_id,
name AS path, 0 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id,
h.path || ' > ' || e.name,
h.depth + 1
FROM employees e
JOIN hierarchy h ON h.id = e.manager_id
WHERE h.depth < 15
)
SELECT depth, name, path FROM hierarchy ORDER BY path;-- Three stages: revenue per user โ rank within city โ filter top-1
WITH user_revenue AS (
SELECT user_id, SUM(total) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY user_id
),
city_ranked AS (
SELECT
u.city, u.email, ur.revenue,
RANK() OVER (
PARTITION BY u.city
ORDER BY ur.revenue DESC
) AS rnk
FROM users u
JOIN user_revenue ur ON ur.user_id = u.id
)
SELECT city, email, ROUND(revenue, 2) AS revenue
FROM city_ranked
WHERE rnk = 1
ORDER BY revenue DESC;-- Wrong โ extra comma before the main SELECT causes a syntax error WITH cte_a AS (SELECT id FROM orders WHERE status = 'completed'), cte_b AS (SELECT user_id FROM cte_a GROUP BY user_id), -- โ bad comma SELECT * FROM cte_b; -- Correct โ no comma after the last CTE definition WITH cte_a AS (SELECT id FROM orders WHERE status = 'completed'), cte_b AS (SELECT user_id FROM cte_a GROUP BY user_id) SELECT * FROM cte_b;
A CTE exists only within the single statement containing the WITH clause. Running a CTE definition as one query and then referencing its name in the next query will fail โ the name is not in scope. If you need the same logic available across multiple statements, use a view or a temp table instead. Each statement that needs the CTE must define it.
-- Dangerous: no WHERE clause, runs until the database error limit WITH RECURSIVE bad AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM bad -- never terminates ) SELECT * FROM bad; -- fails at recursion limit -- Safe: always guard with a condition WITH RECURSIVE safe AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM safe WHERE n < 1000 ) SELECT * FROM safe;
-- Inefficient: CTE scans the full table, outer query filters after WITH all_orders AS (SELECT * FROM orders) SELECT * FROM all_orders WHERE status = 'completed'; -- Better: filter inside the CTE to reduce rows before any join WITH completed AS (SELECT * FROM orders WHERE status = 'completed') SELECT * FROM completed;
A CTE produces a result set with no guaranteed row order unless ORDER BY is specified in the outermost SELECT. Even if the CTE query returns rows in a particular order internally, the optimizer can process them in any sequence. Always apply ORDER BY in the final SELECT when output ordering matters โ never rely on ordering inherited from inside a CTE.
Not every query needs a CTE. WHERE user_id IN (SELECT id FROM users WHERE is_active = 1) is clear as a subquery. Wrapping it in a CTE adds two extra lines for no readability gain. CTEs add value when queries are complex, multi-step, or need to reuse the same intermediate result. For single, simple filters, a plain subquery or a JOIN is the right tool.
high_value_customers, monthly_revenue, and ranked_products are good CTE names. subquery_1, temp_result, and filtered_data are not. The CTE name is documentation. A reader scanning the WITH block should understand the query's logic before reading a single line of SQL inside the CTEs.
One CTE should do one thing: filter, aggregate, rank, or join. If a CTE is simultaneously cleaning data, aggregating it, and ranking the result, split it into three named stages. The benefit of multiple CTEs is composability โ each step can be read, understood, and tested independently. A CTE that does everything cannot be debugged step by step.
Apply WHERE conditions inside the CTE that generates the data, not in the outer query after the fact. Most optimizers push predicates through anyway, but making the intent explicit in the CTE ensures correct behavior across different databases and different optimizer versions โ and makes the query's filtering logic immediately visible to the reader.
Add WHERE depth < N in the recursive member. Set N to slightly more than the maximum real depth in your data. The guard protects against unexpected circular references that would otherwise cause runtime failures. If your org chart has at most 8 levels, guard at 15. For unknown depth, 100 is a safe general-purpose limit.
While building a multi-CTE query, stop after each CTE and run SELECT * FROM cte_name LIMIT 20 to verify the intermediate result. Fix each stage before adding the next. This incremental approach catches logic errors before they compound โ a wrong filter in the first CTE will silently corrupt every downstream CTE if not caught early.
WHERE user_id IN (SELECT id FROM users WHERE is_active = 1) is readable as-is. The intent is clear and the nesting is one level. Converting this to a CTE adds two lines without adding clarity. Use CTEs when the complexity justifies the added structure โ not as a universal replacement for every subquery.
If a CTE produces millions of rows and you reference it three times in one query, and the optimizer inlines it rather than materializing it, you could be scanning a large dataset three times. In this scenario, a temp table with an index will significantly outperform the CTE approach. Always profile with EXPLAIN ANALYZE at realistic data volumes before deploying CTE-heavy queries to production.
CTEs disappear when the statement ends. For multi-step ETL pipelines where step 3 depends on data computed in step 1 from a previous query, you need a temp table or a permanent staging table. This is not a CTE limitation to work around โ it is an intentional design boundary that separates single-query named results from multi-statement intermediate storage.
A Common Table Expression (CTE) is a named temporary result set defined with the WITH clause, placed before a SELECT, INSERT, UPDATE, or DELETE statement. It exists only for the duration of that query, requires no setup or cleanup, and makes complex SQL readable by letting you name and reuse intermediate results instead of nesting subqueries.
Not automatically. CTEs and equivalent subqueries usually produce identical execution plans in modern optimizers. The primary benefit of CTEs is readability and maintainability. In PostgreSQL, a CTE can act as an optimization fence that prevents predicate pushdown โ which can make it slower than an equivalent subquery in some cases. Verify with EXPLAIN ANALYZE when performance matters.
A recursive CTE references itself inside the WITH clause. It has an anchor member (base case, runs once) and a recursive member (references the CTE, runs until no rows are produced). Use cases include employee org charts, category trees, bill-of-materials hierarchies, and generating number or date sequences.
A CTE exists only within a single query statement and requires no cleanup. A temp table persists for the session or transaction, can be indexed, and supports queries across multiple statements. For single-query usage, CTEs are simpler. For multi-step processing or large intermediate results that need indexes, temp tables are more appropriate.
Yes. Multiple CTEs are separated by commas inside a single WITH block. Later CTEs can reference earlier ones. This lets you build complex multi-step logic as a clean, sequential pipeline without any nested subqueries.
A CTE is temporary โ it exists only for the query that defines it. A view is a permanent database object stored in the schema, queryable by any user with appropriate permissions at any time. Use CTEs for one-off queries; use views for reusable logic shared across multiple queries or multiple users.
Yes, in PostgreSQL and SQL Server. You can reference a CTE in UPDATE and DELETE statements to perform complex filtered mutations. This is useful for deleting duplicates or updating records based on multi-step logic that is awkward to express inline.
Yes. SQLite supports both standard and recursive CTEs since version 3.8.3 (2014). The SQLab Hub SQL Playground runs on SQLite, so every example in this guide is directly executable in the playground โ no database setup required.
RECURSIVE enables a CTE to reference itself, allowing iterative processing. PostgreSQL and SQLite require the keyword explicitly. SQL Server does not use RECURSIVE โ its WITH clause supports self-referencing automatically. In databases that require it, omitting RECURSIVE on a self-referencing CTE produces a "relation does not exist" error at runtime.
PostgreSQL, MySQL 8.0+, SQL Server 2005+, Oracle 9i+, SQLite 3.8.3+, MariaDB 10.2+, Snowflake, BigQuery, Amazon Redshift, DuckDB, and all modern SQL dialects. The WITH clause is part of the SQL:1999 standard. MySQL 5.x does not support CTEs.
Both create a named temporary result set, but a derived table is a subquery in the FROM clause with an alias. A CTE is defined before the main query with WITH. CTEs are reusable within the same query, support recursion, and read sequentially. Derived tables cannot reference themselves and must be repeated if needed in more than one place.
Run every query from this guide in the SQLab Hub interactive SQL playground โ no signup, no download, instant results.