SQL subqueries — queries nested inside other queries — are one of the most powerful and frequently tested concepts in SQL. Every experienced data professional reaches for them daily: to filter results based on aggregates, to compute derived values inline, to check for existence without a JOIN, and to build complex analytics that would be impossible as a flat query. This guide covers everything from how a subquery executes and the difference between correlated and non-correlated types, through scalar subqueries, inline views, EXISTS vs IN performance tradeoffs, subquery vs JOIN comparisons, and over 20 real examples drawn from e-commerce, HR, finance, and SaaS scenarios. Whether you are writing your first nested query, preparing for a SQL interview at a tech company, or debugging a slow correlated subquery in production — this is the complete reference. Every example uses the SQLab Hub schema (users, orders, products, employees) so you can run them immediately in the free SQL playground.
A subquery — also called a nested query or inner query — is a SELECT statement written inside another SQL statement. The outer statement is the parent query or outer query. The subquery produces a result set that the outer query uses to filter rows, compute values, or define a temporary table. Think of it as a question inside a question: "Show me employees (outer query) whose salary is above the average salary (inner query)."
-- Inner query computes the average first -- Outer query uses that result to filter rows SELECT first_name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees); -- Execution order: -- Step 1: (SELECT AVG(salary) FROM employees) runs → returns e.g. 72000 -- Step 2: Outer WHERE becomes: WHERE salary > 72000 -- The inner query runs ONCE. Its result is substituted into the outer query.
For non-correlated subqueries — the most common kind — SQL evaluates the inner query first, then passes its result to the outer query. For correlated subqueries, where the inner query references a column from the outer query, the inner query re-executes once for each row processed by the outer query. This distinction drives the most important performance tradeoff covered later in this guide.
-- 1. In WHERE — filter rows based on a subquery result SELECT name FROM products WHERE category_id IN (SELECT id FROM categories WHERE name = 'Electronics'); -- 2. In FROM — create a derived table (inline view) SELECT dept, avg_sal FROM ( SELECT department AS dept, AVG(salary) AS avg_sal FROM employees GROUP BY department ) dept_averages WHERE avg_sal > 70000; -- 3. In SELECT — compute a scalar value per row SELECT first_name, salary, (SELECT AVG(salary) FROM employees) AS company_avg FROM employees; -- 4. In HAVING — filter groups based on a subquery result SELECT department, COUNT(*) AS headcount FROM employees GROUP BY department HAVING COUNT(*) > (SELECT AVG(cnt) FROM ( SELECT COUNT(*) AS cnt FROM employees GROUP BY department ) dept_counts);
-- Single value comparison (scalar subquery in WHERE) SELECT first_name, salary, department FROM employees WHERE salary > (SELECT AVG(salary) FROM employees); -- Multi-row subquery with IN SELECT first_name, department FROM employees WHERE department IN ( SELECT department_name FROM departments WHERE location = 'New York' ); -- Existence check with EXISTS SELECT first_name, email FROM users u WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.status = 'completed' );
-- Subquery in FROM acts as a temporary table
-- The alias (dept_stats) is REQUIRED in most databases
SELECT dept_stats.department, dept_stats.avg_salary
FROM (
SELECT
department,
COUNT(*) AS headcount,
ROUND(AVG(salary), 0) AS avg_salary
FROM employees
GROUP BY department
) dept_stats
WHERE dept_stats.headcount >= 5
ORDER BY dept_stats.avg_salary DESC;-- Returns one value per row — must return exactly 1 row × 1 column
SELECT
e.first_name,
e.salary,
(SELECT AVG(salary) FROM employees) AS company_avg,
(SELECT AVG(salary) FROM employees
WHERE department = e.department) AS dept_avg,
ROUND(e.salary - (SELECT AVG(salary) FROM employees
WHERE department = e.department), 0) AS vs_dept_avg
FROM employees e
ORDER BY vs_dept_avg DESC;-- Filter groups based on a subquery result SELECT department, ROUND(AVG(salary), 0) AS avg_sal FROM employees GROUP BY department HAVING AVG(salary) > ( SELECT AVG(salary) FROM employees -- overall company average ) ORDER BY avg_sal DESC; -- Returns only departments whose average salary -- exceeds the company-wide average.
A single row subquery returns exactly one row. Used with comparison operators: =, >, <, >=, <=, <>. If the subquery unexpectedly returns more than one row, the query throws a runtime error. Always use single row subqueries when the result is guaranteed to be unique — such as MAX(), MIN(), or a lookup by primary key.
-- Find the employee with the highest salary SELECT first_name, salary, department FROM employees WHERE salary = (SELECT MAX(salary) FROM employees); -- Find all employees in the same department as employee #42 SELECT first_name, department FROM employees WHERE department = (SELECT department FROM employees WHERE id = 42) AND id != 42;
A multiple row subquery returns more than one row. You must use a multi-row comparison operator: IN, NOT IN, ANY, or ALL. Using = with a multi-row subquery causes a "subquery returns more than one row" error. IN is the most common operator; ANY and ALL handle threshold comparisons against a set.
-- IN: employees in departments located in 'New York' or 'London'
SELECT first_name, department
FROM employees
WHERE department IN (
SELECT name FROM departments WHERE location IN ('New York', 'London')
);
-- ANY: employees who earn more than at least one contractor
SELECT first_name, salary FROM employees
WHERE salary > ANY (SELECT rate FROM contractors);
-- ALL: employees who earn more than EVERY contractor
SELECT first_name, salary FROM employees
WHERE salary > ALL (SELECT rate FROM contractors);-- Compare against multiple columns simultaneously -- Find orders where (user_id, product_id) matches a wishlisted pair SELECT o.id, o.total, o.order_date FROM orders o JOIN order_items oi ON oi.order_id = o.id WHERE (oi.product_id, o.user_id) IN ( SELECT product_id, user_id FROM wishlist_items ); -- Multi-column IN is supported in PostgreSQL, MySQL, SQLite. -- SQL Server requires EXISTS with a correlated WHERE instead.
A correlated subquery references one or more columns from the outer query. Because of this dependency it cannot run independently — it must re-execute once for every row the outer query processes. This makes it flexible (it can compute per-row context) but potentially slow on large tables. You can identify a correlated subquery by the outer table alias used inside the inner SELECT.
-- Find employees who earn above their own department average -- The inner query references e.department from the outer query SELECT e.first_name, e.salary, e.department FROM employees e WHERE e.salary > ( SELECT AVG(salary) FROM employees WHERE department = e.department -- ← references outer row ); -- For each employee row, the inner query runs once using that -- employee's department. With 500 employees, that is 500 inner executions. -- Modern optimizers often rewrite this as a single-pass join automatically.
A non-correlated subquery has no reference to the outer query. It executes once, returns its result, and the outer query uses that fixed result for all rows it processes. The database evaluates the inner query only once regardless of how many outer rows are examined — making it the most efficient subquery type.
-- The inner query runs ONCE and returns a single number (e.g. 72000) -- The outer query compares every employee row against that same number SELECT first_name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees); -- Not correlated because AVG(salary) covers ALL employees, -- not a department-specific subset based on the outer row. -- There is no reference to the outer table alias inside the subquery.
A scalar subquery returns exactly one row and one column — a single scalar value. It can be used anywhere a single value is valid: as a column in SELECT, in a WHERE condition with =, or in any expression. If a scalar subquery ever returns more than one row or more than one column at runtime, the database throws an error.
An inline view is a subquery in the FROM clause. It creates a temporary named result set that the outer query treats exactly like a real table. The alias is required. Inline views are the foundation of derived table patterns and are especially useful for applying WHERE or ORDER BY conditions to aggregated results.
SELECT first_name, department, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees) ORDER BY salary DESC; -- Business use: identify above-average earners for compensation review. -- Type: non-correlated, single-row, WHERE clause.
-- Method 1: subquery approach — works in all SQL dialects SELECT MAX(salary) AS second_highest FROM employees WHERE salary < (SELECT MAX(salary) FROM employees); -- Method 2: OFFSET (PostgreSQL, MySQL, SQLite) SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1; -- Method 3: generalised to Nth highest with DENSE_RANK SELECT salary FROM ( SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees ) ranked WHERE rnk = 2 LIMIT 1;
-- NOT EXISTS is NULL-safe — preferred over NOT IN here SELECT p.name, p.category, p.price FROM products p WHERE NOT EXISTS ( SELECT 1 FROM order_items oi WHERE oi.product_id = p.id ); -- Business use: identify dead inventory for removal or promotion. -- Type: correlated NOT EXISTS.
-- EXISTS short-circuits at the first matching row SELECT u.first_name, u.email, u.country FROM users u WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id ) ORDER BY u.first_name; -- Equivalent with IN (similar plan in modern optimizers): SELECT first_name, email FROM users WHERE id IN (SELECT DISTINCT user_id FROM orders);
SELECT u.first_name, u.email, u.created_at FROM users u WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id ) ORDER BY u.created_at DESC; -- Business use: identify unactivated users for onboarding campaigns. -- NOT EXISTS is safer than NOT IN when NULLs may be present.
SELECT id, user_id, total, order_date FROM orders WHERE total > ( SELECT AVG(total) FROM orders WHERE status = 'completed' ) AND status = 'completed' ORDER BY total DESC LIMIT 20; -- Business use: surface high-value orders for priority analysis.
SELECT
e.first_name,
e.department,
e.salary,
ROUND((SELECT AVG(salary) FROM employees
WHERE department = e.department), 0) AS dept_avg,
ROUND(e.salary - (SELECT AVG(salary) FROM employees
WHERE department = e.department), 0) AS diff
FROM employees e
ORDER BY e.department, diff DESC;
-- Business use: compensation equity analysis within departments.
-- Type: correlated scalar subquery in SELECT clause.SELECT product_name, total_revenue, units_sold
FROM (
SELECT
p.name AS product_name,
SUM(oi.quantity * oi.unit_price) AS total_revenue,
SUM(oi.quantity) AS units_sold
FROM order_items oi
JOIN products p ON p.id = oi.product_id
JOIN orders o ON o.id = oi.order_id
WHERE o.status = 'completed'
GROUP BY p.id, p.name
) product_stats
WHERE total_revenue > 5000
ORDER BY total_revenue DESC
LIMIT 20;
-- The inline view aggregates. The outer WHERE filters.
-- Without the inline view you cannot filter on the computed total_revenue.SELECT department, headcount
FROM (
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
) dept_counts
WHERE headcount > (
SELECT AVG(headcount) FROM (
SELECT COUNT(*) AS headcount
FROM employees
GROUP BY department
) all_counts
)
ORDER BY headcount DESC;
-- Nested inline views: aggregate, then aggregate the aggregation.
-- Common in analytics when you need to compare groups to a group average.SELECT u.first_name, u.email, (SELECT MAX(order_date) FROM orders WHERE user_id = u.id) AS last_order_date, (SELECT COUNT(*) FROM orders WHERE user_id = u.id) AS total_orders FROM users u WHERE EXISTS (SELECT 1 FROM orders WHERE user_id = u.id) ORDER BY last_order_date DESC; -- Business use: user activity summary for CRM and re-engagement. -- Type: correlated scalar subqueries in SELECT clause.
SELECT c.name, COUNT(p.id) AS product_count FROM categories c JOIN products p ON p.category_id = c.id GROUP BY c.id, c.name HAVING COUNT(p.id) > ( SELECT AVG(cnt) FROM (SELECT COUNT(*) AS cnt FROM products GROUP BY category_id) cat_counts ) ORDER BY product_count DESC; -- Subquery in HAVING filters groups using a computed average.
-- ALL: salary must exceed every single value in the Sales salary list SELECT first_name, salary, department FROM employees WHERE salary > ALL ( SELECT salary FROM employees WHERE department = 'Sales' ) AND department != 'Sales' ORDER BY salary DESC; -- If Sales salaries are [50000, 60000, 70000], -- this returns non-Sales employees earning above 70000 (the max).
-- Pattern: GROUP BY + HAVING COUNT > 1 — memorise this for interviews SELECT email, COUNT(*) AS occurrences FROM users GROUP BY email HAVING COUNT(*) > 1 ORDER BY occurrences DESC; -- To retrieve the full duplicate rows (not just the count): SELECT * FROM users WHERE email IN ( SELECT email FROM users GROUP BY email HAVING COUNT(*) > 1 ) ORDER BY email;
SELECT u.first_name, u.email,
MAX(o.order_date) AS last_order_date
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.first_name, u.email
HAVING MAX(o.order_date) < DATE('now', '-90 days')
ORDER BY last_order_date ASC;
-- Alternative using a correlated subquery in WHERE:
SELECT first_name, email FROM users u
WHERE (SELECT MAX(order_date) FROM orders WHERE user_id = u.id)
< DATE('now', '-90 days');SELECT month, revenue,
ROUND((SELECT AVG(rev)
FROM (SELECT STRFTIME('%Y-%m', order_date) AS m,
SUM(total) AS rev
FROM orders WHERE status = 'completed'
GROUP BY m) all_months), 2) AS avg_monthly_revenue
FROM (
SELECT STRFTIME('%Y-%m', order_date) AS month,
ROUND(SUM(total), 2) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY month
) monthly_revenue
ORDER BY month;
-- Each month's revenue shown next to the overall monthly average.SELECT category, name, revenue,
RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rank_in_category
FROM (
SELECT
p.category,
p.name,
ROUND(SUM(oi.quantity * oi.unit_price), 2) AS revenue
FROM order_items oi
JOIN products p ON p.id = oi.product_id
JOIN orders o ON o.id = oi.order_id
WHERE o.status = 'completed'
GROUP BY p.id, p.name, p.category
) product_revenue
ORDER BY category, rank_in_category;
-- Combining inline views with window functions is standard in BI reporting.
-- The inline view pre-aggregates; the window function ranks within groups.A scalar subquery returns exactly one column and one row — a single value. You can use it anywhere a single value is valid: as a derived column in SELECT, in a WHERE comparison with =, inside CASE WHEN expressions, or in arithmetic. The name comes from linear algebra where a scalar is a single number, as opposed to a vector or matrix.
-- Compare each employee's salary to two benchmarks SELECT e.first_name, e.department, e.salary, (SELECT ROUND(AVG(salary), 0) FROM employees) AS company_avg, (SELECT ROUND(AVG(salary), 0) FROM employees WHERE department = e.department) AS dept_avg FROM employees e ORDER BY e.salary DESC; -- First scalar subquery: non-correlated — runs once, value reused per row. -- Second scalar subquery: correlated — runs once per employee row.
-- Find the most recent order in the system SELECT id, user_id, total, order_date FROM orders WHERE order_date = (SELECT MAX(order_date) FROM orders); -- Find products priced above the most expensive Electronics item SELECT name, price, category FROM products WHERE price > ( SELECT MAX(price) FROM products WHERE category = 'Electronics' ) ORDER BY price DESC;
-- Each order as a percentage of total completed revenue
SELECT
id,
total,
ROUND(total * 100.0 / (SELECT SUM(total) FROM orders
WHERE status = 'completed'), 2) AS pct_of_total
FROM orders
WHERE status = 'completed'
ORDER BY pct_of_total DESC
LIMIT 20;
-- Salary delta from company average as a percentage
SELECT
first_name,
salary,
ROUND((salary - (SELECT AVG(salary) FROM employees)) * 100.0
/ (SELECT AVG(salary) FROM employees), 1) AS pct_above_avg
FROM employees
ORDER BY pct_above_avg DESC;-- ERROR: returns more than one row when multiple rows match -- Do not use = when the subquery could return multiple rows SELECT first_name FROM employees WHERE salary = (SELECT salary FROM employees WHERE department = 'Sales'); -- "subquery returns more than one row" -- SAFE: guarantee uniqueness with aggregate or primary key SELECT first_name FROM employees WHERE salary = (SELECT MAX(salary) FROM employees WHERE department = 'Sales'); -- ERROR: returns more than one column -- SELECT (SELECT first_name, salary FROM employees WHERE id = 1); -- Rule: scalar subquery must return exactly 1 row × 1 column.
An inline view is a subquery placed in the FROM clause. The outer query treats it exactly like a real table — you can SELECT any of its columns, JOIN it to other tables, filter it with WHERE, group it, and sort it. Unlike a CTE, it has no name visible to other parts of the query. The alias you give it is required in most databases (PostgreSQL, MySQL, SQL Server throw an error without it) and is scoped only to the immediately surrounding query.
-- Problem: you cannot use WHERE on COUNT(*) — but HAVING has limitations.
-- Inline views let you filter any computed column with regular WHERE.
SELECT dept, headcount, avg_sal
FROM (
SELECT
department AS dept,
COUNT(*) AS headcount,
ROUND(AVG(salary), 0) AS avg_sal
FROM employees
GROUP BY department
) dept_summary
WHERE headcount BETWEEN 3 AND 15 -- filter computed columns freely
ORDER BY avg_sal DESC;
-- Without the inline view you could not filter on headcount with WHERE.
-- HAVING can only filter on aggregate functions, not computed aliases.-- Pre-aggregate order stats per user, then join user details
SELECT
u.first_name,
u.country,
order_stats.total_orders,
order_stats.lifetime_value
FROM users u
JOIN (
SELECT
user_id,
COUNT(*) AS total_orders,
ROUND(SUM(total), 2) AS lifetime_value
FROM orders
WHERE status = 'completed'
GROUP BY user_id
) order_stats ON order_stats.user_id = u.id
WHERE order_stats.lifetime_value > 500
ORDER BY order_stats.lifetime_value DESC;
-- The inline view computes order stats once.
-- The JOIN enriches each result row with the user's profile fields.-- Average department headcount across the company SELECT ROUND(AVG(headcount), 1) AS avg_dept_headcount FROM ( SELECT COUNT(*) AS headcount FROM employees GROUP BY department ) dept_sizes; -- Each department vs that average SELECT dept, headcount FROM ( SELECT department AS dept, COUNT(*) AS headcount FROM employees GROUP BY department ) dept_sizes WHERE headcount > ( SELECT AVG(cnt) FROM (SELECT COUNT(*) AS cnt FROM employees GROUP BY department) x ) ORDER BY headcount DESC;
Inline views and CTEs (WITH clause) are semantically equivalent in most databases — both define a named intermediate result set the outer query can reference. Use a CTE when you need to reference the same derived data more than once in a single query, when the query has multiple logical steps that benefit from named intermediate stages, or when you need recursive queries. Use an inline view when the derived table is used only once and the query is short enough that adding a WITH clause would add noise rather than clarity.
When filtering rows based on whether a related row exists in another table, you have three main tools: EXISTS, IN, and JOIN. Each has different semantics, performance characteristics, and readability tradeoffs. Knowing which to use — and why — is a very common SQL interview question and a critical production skill.
Feature | EXISTS | IN | JOIN ---------------------|-------------------------------|-----------------------------|----------------------- Returns from subquery| TRUE / FALSE only | Set of values | Combined row set Short-circuits | Yes — stops at first match | No — evaluates full list | No NULL handling | Always safe | NOT IN breaks with NULLs | Depends on JOIN type Deduplication | Automatic (1 row per outer) | Handled by IN semantics | Often needs DISTINCT Optimizer rewrites | Often → semi-join | Often → semi-join | Already a join Best for | Existence checks, large sets | Small known value lists | Retrieving joined columns
-- EXISTS returns TRUE as soon as the subquery produces any row -- The SELECT list inside EXISTS is irrelevant — use SELECT 1 by convention -- Users who have at least one completed order SELECT u.first_name, u.email FROM users u WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.status = 'completed' ); -- Products that appear in at least one order item SELECT p.name, p.price FROM products p WHERE EXISTS ( SELECT 1 FROM order_items oi WHERE oi.product_id = p.id );
-- Products that have NEVER been ordered -- NOT EXISTS is NULL-safe — use this instead of NOT IN whenever subquery -- columns might contain NULLs (which breaks NOT IN silently) SELECT p.name, p.category, p.price FROM products p WHERE NOT EXISTS ( SELECT 1 FROM order_items oi WHERE oi.product_id = p.id ); -- Why NOT IN can silently return zero rows: -- If ANY row in the subquery returns NULL, the condition NOT IN (NULL, ...) -- evaluates to UNKNOWN for every outer row — killing all results. -- NOT EXISTS does not have this problem.
-- IN with a small literal list — no subquery needed
SELECT name, department FROM employees
WHERE department IN ('Engineering', 'Product', 'Design');
-- IN with a subquery — readable for small result sets
SELECT first_name FROM users
WHERE id IN (
SELECT DISTINCT user_id FROM orders WHERE status = 'completed'
);
-- Performance: PostgreSQL 9+, MySQL 8+ rewrite IN (subquery) as a semi-join
-- producing the same plan as EXISTS. On older engines or very large subquery
-- result sets, EXISTS may be faster. Test with EXPLAIN.-- KEY RULE: use JOIN when you need to SELECT columns from the related table. -- EXISTS and IN only filter — they cannot expose the inner table's columns. -- Need order data alongside user info? Must use JOIN. SELECT u.first_name, o.id AS order_id, o.total, o.order_date FROM users u JOIN orders o ON o.user_id = u.id WHERE o.status = 'completed' ORDER BY o.total DESC; -- Only need to check if user ever ordered? Use EXISTS — simpler. SELECT u.first_name FROM users u WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
Scenario | Recommended
-------------------------------------------------|--------------------
Filter based on related row existence | EXISTS
Filter where related row does NOT exist | NOT EXISTS (not NOT IN)
Match against a small static list of values | IN ('a', 'b', 'c')
Subquery may return NULLs | EXISTS (not IN)
Need columns from the related / matched table | JOIN
Deduplicate results automatically | EXISTS or LEFT JOIN + IS NULL
Aggregate data from a related table | JOIN + GROUP BYSubqueries and JOINs can often express the same logic, but they are not always interchangeable — and choosing the wrong one affects readability, maintainability, and sometimes performance. The core rule: if you only need to filter based on a relationship, EXISTS or IN subqueries are usually cleaner. If you need columns from the related table, you need a JOIN. If you need aggregated data from a related table, an inline view (subquery in FROM) often leads to cleaner SQL than a GROUP BY in a JOIN.
Dimension | Subquery | JOIN ---------------------|------------------------------------|--------------------------------- Primary purpose | Filter, compute, or derive data | Combine rows from multiple tables Exposes related cols | No (unless inline view in FROM) | Yes — SELECT from any joined table Deduplication | Built in for WHERE subqueries | Often needs DISTINCT Readability | Cleaner for existence checks | Cleaner for multi-column retrieval Performance | Comparable after optimizer rewrite | Usually similar; JOIN wins on aggregates Scalability | Correlated subqueries can be slow | Better scaling with proper indexes
-- GOAL: find users who have placed at least one order -- Option A: IN subquery SELECT first_name, email FROM users WHERE id IN (SELECT DISTINCT user_id FROM orders); -- Option B: EXISTS (semantically clearest for "existence check") SELECT first_name, email FROM users u WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id); -- Option C: INNER JOIN (needs DISTINCT to avoid duplicating users) SELECT DISTINCT u.first_name, u.email FROM users u INNER JOIN orders o ON o.user_id = u.id; -- All three return the same result. Modern optimizers produce near-identical -- plans for A and B. Option C (JOIN) is the most familiar but needs DISTINCT.
-- 1. Existence checks without needing joined columns SELECT name FROM products p WHERE NOT EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = p.id); -- 2. Filtering on an aggregated result (inline view) SELECT department, avg_salary FROM ( SELECT department, ROUND(AVG(salary), 0) AS avg_salary FROM employees GROUP BY department ) dept_stats WHERE avg_salary > 75000; -- 3. Per-row scalar comparisons in SELECT SELECT first_name, salary, salary - (SELECT AVG(salary) FROM employees) AS above_avg FROM employees;
-- When you need columns from the related table — a subquery in WHERE
-- cannot expose inner table columns to the outer SELECT.
SELECT u.first_name, u.country,
o.id AS order_id, o.total, o.order_date,
p.name AS product_name, oi.quantity
FROM users u
JOIN orders o ON o.user_id = u.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.status = 'completed'
ORDER BY o.order_date DESC
LIMIT 50;
-- No subquery alone can produce this multi-table, multi-column result.-- Run EXPLAIN before assuming a subquery is slow EXPLAIN SELECT e.first_name, e.salary FROM employees e WHERE e.salary > ( SELECT AVG(salary) FROM employees WHERE department = e.department ); -- In PostgreSQL, use EXPLAIN ANALYZE for actual runtimes: EXPLAIN ANALYZE SELECT ... -- Look for: -- "Index Scan" vs "Seq Scan" — index scans are faster -- "Nested Loop" on large outer — indicates unoptimized correlated query -- "Hash Join" or "Merge Join" — optimizer rewrote to join (good) -- Row estimate vs actual — huge gaps mean stale statistics (run ANALYZE)
-- SLOW: correlated subquery on a large table SELECT e.first_name, e.salary FROM employees e WHERE e.salary > ( SELECT AVG(salary) FROM employees WHERE department = e.department ); -- FAST: compute dept averages once with GROUP BY, then JOIN WITH dept_avgs AS ( SELECT department, AVG(salary) AS dept_avg FROM employees GROUP BY department ) SELECT e.first_name, e.salary FROM employees e JOIN dept_avgs d ON d.department = e.department WHERE e.salary > d.dept_avg; -- The CTE version: one GROUP BY pass + one JOIN scan. -- The correlated version: N inner executions for N outer rows.
-- IN materializes the full subquery result before scanning SELECT first_name FROM users WHERE id IN (SELECT user_id FROM orders WHERE status = 'completed'); -- EXISTS short-circuits at the first matching row SELECT first_name FROM users u WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.status = 'completed' ); -- On PostgreSQL 9+ and MySQL 8.0+, both often produce the same plan. -- EXISTS is always safer for NOT IN scenarios (NULL contamination risk). -- Always benchmark on your actual data volumes — rules of thumb can mislead.
-- Slow correlated subquery without index: SELECT first_name FROM users u WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id); -- Sequential scan of orders for every user row -- Add an index on the JOIN/correlation column: -- CREATE INDEX idx_orders_user_id ON orders(user_id); -- Now the inner subquery does an index lookup instead of a full scan. -- For composite conditions inside the subquery: -- CREATE INDEX idx_orders_user_status ON orders(user_id, status); -- Covers both the correlation column and the filter column.
-- LESS EFFICIENT: all rows aggregated, then outer WHERE filters SELECT dept, headcount FROM ( SELECT department AS dept, COUNT(*) AS headcount FROM employees GROUP BY department ) x WHERE dept = 'Engineering'; -- MORE EFFICIENT: filter before aggregation — fewer rows to GROUP BY SELECT dept, headcount FROM ( SELECT department AS dept, COUNT(*) AS headcount FROM employees WHERE department = 'Engineering' -- filter INSIDE the subquery GROUP BY department ) x; -- This matters on tables with millions of rows — moving the filter -- inside dramatically reduces the GROUP BY input size.
-- AVOID: fetches unnecessary columns even for existence checks WHERE id IN (SELECT * FROM orders WHERE status = 'completed') -- BETTER: fetch only the column you need WHERE id IN (SELECT user_id FROM orders WHERE status = 'completed') -- BEST for existence: SELECT 1 communicates intent clearly WHERE EXISTS (SELECT 1 FROM orders WHERE user_id = u.id) -- Some optimizers strip SELECT * in subqueries automatically, -- but being explicit is faster, more portable, and clearer to readers.
-- Hard to read, optimize, and maintain beyond two levels of nesting
SELECT * FROM (
SELECT dept, rev FROM (
SELECT p.category AS dept, SUM(oi.quantity * oi.unit_price) AS rev
FROM order_items oi JOIN products p ON p.id = oi.product_id
GROUP BY p.category
) cat_rev WHERE rev > 1000
) top_cats WHERE dept LIKE 'E%';
-- CTEs are cleaner and PostgreSQL 12+ can materialize them (optimization fence)
WITH cat_rev AS (
SELECT p.category AS dept, SUM(oi.quantity * oi.unit_price) AS rev
FROM order_items oi JOIN products p ON p.id = oi.product_id
GROUP BY p.category
),
top_cats AS (SELECT dept, rev FROM cat_rev WHERE rev > 1000)
SELECT * FROM top_cats WHERE dept LIKE 'E%';-- ERROR: if subquery returns more than one row, this fails at runtime SELECT first_name FROM employees WHERE department = (SELECT department FROM departments); -- "subquery returns more than one row" -- FIX: use IN when multiple rows are valid SELECT first_name FROM employees WHERE department IN (SELECT name FROM departments WHERE active = 1); -- FIX: use = only when result is guaranteed scalar (MAX, MIN, primary key) WHERE department = (SELECT name FROM departments WHERE id = 5);
-- SILENT BUG: if product_id contains even one NULL, NOT IN returns NOTHING SELECT name FROM products WHERE id NOT IN (SELECT product_id FROM order_items); -- Why: NULL != anything evaluates to UNKNOWN, not FALSE. -- NOT IN (1, 2, NULL) = NOT (id = 1 OR id = 2 OR id = NULL) -- = NOT (... OR UNKNOWN) = UNKNOWN for every row → zero results. -- FIX: use NOT EXISTS (always NULL-safe) SELECT name FROM products p WHERE NOT EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = p.id); -- FIX: explicitly exclude NULLs from the subquery WHERE id NOT IN (SELECT product_id FROM order_items WHERE product_id IS NOT NULL);
-- ERROR in PostgreSQL, MySQL, SQL Server: -- "Every derived table must have its own alias" SELECT dept, avg_sal FROM ( SELECT department AS dept, AVG(salary) AS avg_sal FROM employees GROUP BY department ); -- ← missing alias here -- FIX: always add an alias after the closing parenthesis SELECT dept, avg_sal FROM ( SELECT department AS dept, AVG(salary) AS avg_sal FROM employees GROUP BY department ) dept_stats; -- ← required alias
-- ERROR if more than one user has the first name 'Alice' SELECT * FROM orders WHERE user_id = (SELECT id FROM users WHERE first_name = 'Alice'); -- FIX: filter on a unique column (email, primary key) WHERE user_id = (SELECT id FROM users WHERE email = 'alice@example.com'); -- FIX: use IN if multiple matches are intentional WHERE user_id IN (SELECT id FROM users WHERE first_name = 'Alice');
-- Three levels of nesting is the point of no return for readability
SELECT * FROM (
SELECT dept, rev FROM (
SELECT p.category AS dept, SUM(oi.quantity * oi.unit_price) AS rev
FROM order_items oi JOIN products p ON p.id = oi.product_id
GROUP BY p.category
) a WHERE rev > 1000
) b WHERE dept LIKE 'E%';
-- FIX: convert to CTEs the moment nesting exceeds one level
WITH cat_rev AS (
SELECT p.category AS dept, SUM(oi.quantity * oi.unit_price) AS rev
FROM order_items oi JOIN products p ON p.id = oi.product_id
GROUP BY p.category
)
SELECT dept, rev FROM cat_rev WHERE rev > 1000 AND dept LIKE 'E%';Modern query optimizers — PostgreSQL 9+, MySQL 8.0+, SQL Server 2016+ — routinely transform correlated subqueries into efficient semi-join or hash-join plans. Do not prematurely rewrite correct, readable code based on a general performance rule. Run EXPLAIN first. If the plan shows a nested loop on a large table without an index lookup, then rewrite. If it shows a hash semi-join, the optimizer already handled it.
-- All of these are identical in behavior — EXISTS only checks for row existence WHERE EXISTS (SELECT * FROM orders WHERE user_id = u.id) WHERE EXISTS (SELECT 1 FROM orders WHERE user_id = u.id) WHERE EXISTS (SELECT NULL FROM orders WHERE user_id = u.id) WHERE EXISTS (SELECT id FROM orders WHERE user_id = u.id) -- The columns in the EXISTS subquery SELECT list are ignored entirely. -- Convention: SELECT 1 — it communicates intent and avoids column resolution.
-- Customers who ordered 60–90 days ago and have not ordered since
-- Used for targeted re-engagement email campaigns
SELECT u.first_name, u.email,
MAX(o.order_date) AS last_order
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.first_name, u.email
HAVING MAX(o.order_date) BETWEEN DATE('now', '-90 days') AND DATE('now', '-60 days')
AND NOT EXISTS (
SELECT 1 FROM orders o2
WHERE o2.user_id = u.id
AND o2.order_date > DATE('now', '-60 days')
);-- Employees earning above the 75th percentile in their department
-- with at least 2 years tenure
SELECT e.first_name, e.department, e.salary, e.hire_date
FROM employees e
WHERE e.salary > (
SELECT s.salary FROM (
SELECT salary,
NTILE(4) OVER (PARTITION BY department ORDER BY salary) AS quartile
FROM employees
WHERE department = e.department
) s WHERE s.quartile >= 3
ORDER BY s.salary DESC LIMIT 1
)
AND e.hire_date <= DATE('now', '-2 years')
ORDER BY e.department, e.salary DESC;-- Orders more than 3× the user's own historical average — potential fraud
SELECT o.id, o.user_id, o.total, o.order_date
FROM orders o
WHERE o.total > 3 * (
SELECT AVG(total)
FROM orders
WHERE user_id = o.user_id
AND id != o.id -- exclude current order from the average
)
AND o.status = 'pending'
ORDER BY o.total DESC;
-- Business use: route flagged orders to manual review queue.-- Percentage of total completed revenue per product category
SELECT
category,
category_revenue,
ROUND(category_revenue * 100.0 / total_revenue, 2) AS revenue_share_pct
FROM (
SELECT p.category,
SUM(oi.quantity * oi.unit_price) AS category_revenue
FROM order_items oi
JOIN products p ON p.id = oi.product_id
JOIN orders o ON o.id = oi.order_id
WHERE o.status = 'completed'
GROUP BY p.category
) cat_rev
CROSS JOIN (
SELECT SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM order_items oi
JOIN orders o ON o.id = oi.order_id
WHERE o.status = 'completed'
) total
ORDER BY revenue_share_pct DESC;-- Users who have engaged with more than 3 distinct features in 30 days
SELECT u.id, u.email,
(SELECT COUNT(DISTINCT feature_name)
FROM feature_events
WHERE user_id = u.id
AND event_date >= DATE('now', '-30 days')) AS features_used
FROM users u
WHERE (
SELECT COUNT(DISTINCT feature_name)
FROM feature_events
WHERE user_id = u.id
AND event_date >= DATE('now', '-30 days')
) > 3
ORDER BY features_used DESC;
-- Business use: segment power users for beta programs or case studies.A subquery is a SELECT statement nested inside another SQL statement. It can appear in WHERE (to filter rows), FROM (as an inline view or derived table), SELECT (as a scalar subquery for per-row values), and HAVING (to filter aggregated groups). Subqueries can also be used in INSERT, UPDATE, and DELETE statements to determine which rows to act on.
A non-correlated subquery runs once independently — it has no reference to the outer query.
SELECT first_name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Inner query has no outer reference. Runs once. Result reused for all rows.
A correlated subquery references a column from the outer query.
It re-executes once per outer row.
SELECT first_name, department FROM employees e
WHERE salary > (SELECT AVG(salary) FROM employees
WHERE department = e.department);
-- e.department is the outer reference. Runs N times for N outer rows.-- Correlated subquery approach (readable, correct) SELECT first_name, department, salary FROM employees e WHERE salary > ( SELECT AVG(salary) FROM employees WHERE department = e.department ) ORDER BY department, salary DESC; -- JOIN + inline view approach (more performant on large tables) SELECT e.first_name, e.department, e.salary FROM employees e JOIN ( SELECT department, AVG(salary) AS dept_avg FROM employees GROUP BY department ) d ON d.department = e.department WHERE e.salary > d.dept_avg;
-- NOT EXISTS (safest — NULL-proof) SELECT name, category, price FROM products p WHERE NOT EXISTS ( SELECT 1 FROM order_items oi WHERE oi.product_id = p.id ); -- LEFT JOIN + IS NULL (classic equivalent) SELECT p.name, p.category, p.price FROM products p LEFT JOIN order_items oi ON oi.product_id = p.id WHERE oi.id IS NULL; -- NOT IN (only safe if NULLs are explicitly excluded) SELECT name FROM products WHERE id NOT IN ( SELECT product_id FROM order_items WHERE product_id IS NOT NULL );
EXISTS evaluates whether the subquery returns any rows — it short-circuits at the first match and returns TRUE or FALSE. It is always correlated. Use EXISTS for large subqueries and always use NOT EXISTS (never NOT IN) when the subquery might contain NULLs. IN compares a value against a complete result set. Use IN for small fixed value lists. In modern databases (PostgreSQL 9+, MySQL 8+), IN with a subquery is often rewritten to the same semi-join plan as EXISTS. The practical difference is that NOT IN is dangerous with NULLs while NOT EXISTS is always correct.
A scalar subquery returns exactly one row and one column. Use it anywhere a single value is expected: SELECT list, WHERE =, or expressions. Throws a runtime error when: • Used with = but returns multiple rows at runtime • Used in SELECT but returns multiple rows • Returns more than one column Safe pattern — guarantees scalar with MAX(): SELECT name, (SELECT MAX(price) FROM products) AS max_price FROM categories; Unsafe pattern — multiple employees may share 'John': WHERE id = (SELECT id FROM employees WHERE first_name = 'John')
-- Subquery approach — works in all SQL dialects SELECT MAX(salary) AS second_highest_salary FROM employees WHERE salary < (SELECT MAX(salary) FROM employees); -- DENSE_RANK approach — generalises to Nth highest SELECT salary FROM ( SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees ) ranked WHERE rnk = 2 LIMIT 1;
An inline view is a subquery in the FROM clause, scoped to the single query it appears in. A CTE (WITH clause) defines a named temporary result set scoped to the entire statement that can be referenced multiple times. CTEs are preferred when derived data is needed more than once or when readability benefits from named logical steps. Inline views are preferred for one-off derived tables in short queries. Both are semantically equivalent in most databases.
NOT IN returns zero rows if the subquery produces any NULL value. Reason: any comparison with NULL returns UNKNOWN, not FALSE. NOT IN (1, 2, NULL) for any outer value evaluates to UNKNOWN — never TRUE. UNKNOWN rows are excluded — so the entire result is empty. This is a silent bug — no error, just wrong (empty) results. ALWAYS use NOT EXISTS instead: SELECT name FROM products p WHERE NOT EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = p.id); Or filter NULLs explicitly: WHERE id NOT IN (SELECT product_id FROM order_items WHERE product_id IS NOT NULL);
Original correlated form (re-runs once per outer row): SELECT e.first_name, e.salary FROM employees e WHERE e.salary > ( SELECT AVG(salary) FROM employees WHERE department = e.department ); Rewritten as JOIN + CTE (computes dept averages once): WITH dept_avgs AS ( SELECT department, AVG(salary) AS dept_avg FROM employees GROUP BY department ) SELECT e.first_name, e.salary FROM employees e JOIN dept_avgs d ON d.department = e.department WHERE e.salary > d.dept_avg; Always verify with EXPLAIN first — modern optimizers may already produce the same plan for both versions, making the rewrite unnecessary.
Yes. HAVING filters groups after GROUP BY, and a subquery can supply the comparison value — exactly like in WHERE. Find departments whose average salary exceeds the company-wide average: SELECT department, ROUND(AVG(salary), 0) AS avg_sal, COUNT(*) AS headcount FROM employees GROUP BY department HAVING AVG(salary) > (SELECT AVG(salary) FROM employees) ORDER BY avg_sal DESC;
SELECT user_id, COUNT(DISTINCT STRFTIME('%m', order_date)) AS months_active
FROM orders
WHERE order_date BETWEEN '2025-01-01' AND '2025-12-31'
GROUP BY user_id
HAVING COUNT(DISTINCT STRFTIME('%m', order_date)) = 12;
-- With user name (wrapping the above as a subquery):
SELECT u.first_name, u.email
FROM users u
WHERE u.id IN (
SELECT user_id FROM orders
WHERE order_date BETWEEN '2025-01-01' AND '2025-12-31'
GROUP BY user_id
HAVING COUNT(DISTINCT STRFTIME('%m', order_date)) = 12
);A subquery in WHERE is a filter condition — it determines which rows the outer query returns but cannot expose its own columns to the outer SELECT. A subquery in FROM creates an inline view — a temporary derived table whose columns are fully available to the outer SELECT, JOIN, WHERE, and ORDER BY. FROM subqueries expose their columns; WHERE subqueries do not.
-- Correlated subquery: count how many people in the same dept earn more
SELECT e1.first_name, e1.department, e1.salary
FROM employees e1
WHERE (
SELECT COUNT(*) FROM employees e2
WHERE e2.department = e1.department
AND e2.salary > e1.salary
) < 2 -- fewer than 2 people in same dept earn more than this employee
ORDER BY e1.department, e1.salary DESC;
-- Modern alternative: ROW_NUMBER() window function (cleaner)
SELECT first_name, department, salary
FROM (
SELECT first_name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
) ranked
WHERE rn <= 2;Non-correlated subqueries and inline views generally perform comparably to equivalent JOINs — modern optimizers rewrite them into the same physical plan. Correlated subqueries that execute once per outer row are the main concern: on a 500,000-row table this means 500,000 inner executions unless the optimizer converts the query to a join. Always run EXPLAIN before optimizing. Index the correlation columns — they convert sequential inner scans into fast index lookups. When benchmarking, test both the subquery and JOIN versions on representative data volumes: the optimizer's decision changes with data distribution and table sizes.
A subquery is a SELECT statement nested inside another SQL query. The outer query uses the result produced by the inner query to filter rows, compute a value, or define a temporary table. Subqueries can appear in WHERE, FROM, SELECT, and HAVING clauses.
A non-correlated subquery executes once and independently — it has no reference to the outer query. A correlated subquery references one or more columns from the outer query, forcing it to re-execute once per row the outer query processes. Non-correlated subqueries are generally more efficient; correlated subqueries are more flexible for per-row comparisons.
Yes — when used with IN, NOT IN, ANY, or ALL. A subquery used with = or in the SELECT clause must return exactly one value. Using = with a multi-row subquery throws a runtime error in all major SQL databases.
Non-correlated subqueries are rarely a performance problem. Correlated subqueries can be slow if they execute once per outer row on large tables, but modern optimizers (PostgreSQL 9+, MySQL 8+, SQL Server 2016+) often rewrite them into efficient join plans automatically. Always check the execution plan before assuming a subquery is causing a slowdown.
A nested query is another term for a subquery — a SELECT statement embedded inside another SQL statement. The inner SELECT is the nested query or inner query; the outer statement is the parent query. The terms are used interchangeably.
For existence checks and filtering, yes — EXISTS and IN subqueries are often cleaner than JOIN + DISTINCT. For retrieving columns from multiple tables, no — subqueries in WHERE cannot expose inner table columns to the outer SELECT. Use the pattern that matches your intent.
EXISTS checks whether the subquery returns any rows, short-circuits at the first match, and returns TRUE or FALSE. IN compares a value against a complete result set. The critical practical difference: NOT IN is unsafe when the subquery can return NULLs and silently returns zero rows. NOT EXISTS is always NULL-safe.
A scalar subquery returns exactly one column and one row — a single value. It can be used anywhere a single value is expected: in SELECT to compute a derived column, in WHERE with =, or in expressions. It throws a runtime error if it returns more than one row.
An inline view is a subquery in the FROM clause that creates a temporary named derived table. The outer query treats it exactly like a real table. An alias is required. Inline views are especially useful for applying WHERE conditions or ORDER BY to aggregated results.
Subquery syntax is standard across MySQL, PostgreSQL, Oracle, and SQL Server. The key differences are in optimizer sophistication: PostgreSQL and SQL Server are generally more aggressive at converting correlated subqueries into efficient semi-join plans. MySQL improved significantly in version 8.0. In all databases, use EXPLAIN to confirm the actual execution plan used — it removes the guesswork.
Run these in the SQLab Hub SQL playground against the pre-loaded tables
(users, orders, products, employees, departments).
1. Find all products priced above the average product price.
Hint: WHERE price > (SELECT AVG(price) FROM products)
2. List all employees from the department with the highest headcount.
Hint: WHERE department = (SELECT department FROM employees
GROUP BY department ORDER BY COUNT(*) DESC LIMIT 1)
3. Find all orders with a total above the average completed order total.
Hint: WHERE total > (SELECT AVG(total) FROM orders WHERE status = 'completed')
4. List all users who have placed at least one order.
Hint: WHERE EXISTS (SELECT 1 FROM orders WHERE user_id = u.id)
5. Find users who have NEVER placed an order.
Hint: WHERE NOT EXISTS (SELECT 1 FROM orders WHERE user_id = u.id)6. Find employees who earn exactly the maximum salary in their department.
Hint: correlated subquery WHERE salary = (SELECT MAX(salary) WHERE dept = e.dept)
7. Find the top 3 users by lifetime order value using only a subquery (no window functions).
Hint: inline view with SUM(total) GROUP BY user_id, outer LIMIT 3.
8. List products ordered in December 2025 but NOT in January 2026.
Hint: IN (December order_items product_ids) AND NOT IN (January product_ids)
9. Write a query returning each department alongside whether its average salary
is above or below the company average (use a CASE WHEN + scalar subquery).
10. Find all categories that contain more products than the average category size.
Hint: HAVING COUNT(p.id) > (SELECT AVG(cnt) FROM (...) x)11. Find the top 2 highest-paid employees per department using a correlated
subquery (no window functions).
Hint: WHERE (SELECT COUNT(*) FROM employees e2 WHERE dept = e1.dept
AND salary > e1.salary) < 2
12. Compute the running average order total per user using only subqueries
(no window functions, no CTEs).
Hint: correlated subquery AVG over all prior orders.
13. Find users who placed orders in at least 3 consecutive months in 2025.
Hint: join orders to itself offset by 1 and 2 months, or use
a subquery checking for the existence of month+1 and month+2 orders.
14. Build a complete customer summary: each user's total orders, lifetime value,
first order date, last order date, and percentile rank by lifetime value —
using correlated subqueries only (no window functions).
15. Write a query to detect products whose average review score is below
the average review score of their category, using both a correlated
subquery and an inline view approach. Compare the resulting SQL.Type | Best For ------------------------|-------------------------------------------------- Non-correlated WHERE | Filter rows based on a global aggregate or lookup Correlated WHERE | Per-row comparisons requiring outer query context EXISTS / NOT EXISTS | Existence checks, NULL-safe anti-join patterns IN / NOT IN | Matching small sets (always exclude NULLs for NOT IN) Scalar (SELECT clause) | Inline computed columns, per-row derived values Inline view (FROM) | Aggregating aggregates, pre-filtering GROUP BY output HAVING subquery | Filter groups based on an external computed value
Non-correlated subqueries and inline views are efficient — the optimizer evaluates them once or rewrites them into optimal join plans. Correlated subqueries can be slow on large tables but are often automatically rewritten by modern optimizers. Always run EXPLAIN before manually rewriting. Index the correlation columns to turn sequential inner scans into fast index lookups. Use NOT EXISTS instead of NOT IN whenever NULLs are possible. Prefer CTEs over deeply nested subqueries for anything beyond two levels of nesting.
Run every query from this guide in the SQLab Hub interactive SQL playground — no signup, no download, instant results.