Home/Blog/SQL Subqueries Explained: Correlated, Scalar, Inline Views & Real Examples
Advanced SQLSubqueriesInterview PrepPerformance

SQL Subqueries Explained: Correlated, Scalar, Inline Views & Real Examples

SQLab Hub Team·2026-05-14·22 min read

SQL subqueries — queries nested inside other queries — are one of the most powerful and frequently tested concepts in SQL. Every experienced data professional reaches for them daily: to filter results based on aggregates, to compute derived values inline, to check for existence without a JOIN, and to build complex analytics that would be impossible as a flat query. This guide covers everything from how a subquery executes and the difference between correlated and non-correlated types, through scalar subqueries, inline views, EXISTS vs IN performance tradeoffs, subquery vs JOIN comparisons, and over 20 real examples drawn from e-commerce, HR, finance, and SaaS scenarios. Whether you are writing your first nested query, preparing for a SQL interview at a tech company, or debugging a slow correlated subquery in production — this is the complete reference. Every example uses the SQLab Hub schema (users, orders, products, employees) so you can run them immediately in the free SQL playground.

Practice these queries in SQLab Hub →
Real SQLite engine · 8 tables · No signup needed
Open Playground

What Is a SQL Subquery?

The core concept

A subquery — also called a nested query or inner query — is a SELECT statement written inside another SQL statement. The outer statement is the parent query or outer query. The subquery produces a result set that the outer query uses to filter rows, compute values, or define a temporary table. Think of it as a question inside a question: "Show me employees (outer query) whose salary is above the average salary (inner query)."

Simple subquery: employees above average salary

-- Inner query computes the average first
-- Outer query uses that result to filter rows
SELECT first_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Execution order:
-- Step 1: (SELECT AVG(salary) FROM employees) runs → returns e.g. 72000
-- Step 2: Outer WHERE becomes: WHERE salary > 72000
-- The inner query runs ONCE. Its result is substituted into the outer query.

How SQL executes a subquery

For non-correlated subqueries — the most common kind — SQL evaluates the inner query first, then passes its result to the outer query. For correlated subqueries, where the inner query references a column from the outer query, the inner query re-executes once for each row processed by the outer query. This distinction drives the most important performance tradeoff covered later in this guide.

Where subqueries can appear in a SQL statement

-- 1. In WHERE — filter rows based on a subquery result
SELECT name FROM products
WHERE category_id IN (SELECT id FROM categories WHERE name = 'Electronics');

-- 2. In FROM — create a derived table (inline view)
SELECT dept, avg_sal FROM (
  SELECT department AS dept, AVG(salary) AS avg_sal
  FROM employees GROUP BY department
) dept_averages WHERE avg_sal > 70000;

-- 3. In SELECT — compute a scalar value per row
SELECT first_name, salary,
  (SELECT AVG(salary) FROM employees) AS company_avg
FROM employees;

-- 4. In HAVING — filter groups based on a subquery result
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > (SELECT AVG(cnt) FROM (
  SELECT COUNT(*) AS cnt FROM employees GROUP BY department
) dept_counts);

SQL Subquery Syntax

WHERE clause subquery — the most common pattern

-- Single value comparison (scalar subquery in WHERE)
SELECT first_name, salary, department
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Multi-row subquery with IN
SELECT first_name, department
FROM employees
WHERE department IN (
  SELECT department_name FROM departments WHERE location = 'New York'
);

-- Existence check with EXISTS
SELECT first_name, email
FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.status = 'completed'
);

FROM clause subquery — inline views

-- Subquery in FROM acts as a temporary table
-- The alias (dept_stats) is REQUIRED in most databases
SELECT dept_stats.department, dept_stats.avg_salary
FROM (
  SELECT
    department,
    COUNT(*)               AS headcount,
    ROUND(AVG(salary), 0)  AS avg_salary
  FROM employees
  GROUP BY department
) dept_stats
WHERE dept_stats.headcount >= 5
ORDER BY dept_stats.avg_salary DESC;

SELECT clause subquery — scalar subqueries

-- Returns one value per row — must return exactly 1 row × 1 column
SELECT
  e.first_name,
  e.salary,
  (SELECT AVG(salary) FROM employees)               AS company_avg,
  (SELECT AVG(salary) FROM employees
   WHERE department = e.department)                 AS dept_avg,
  ROUND(e.salary - (SELECT AVG(salary) FROM employees
                    WHERE department = e.department), 0) AS vs_dept_avg
FROM employees e
ORDER BY vs_dept_avg DESC;

HAVING clause subquery

-- Filter groups based on a subquery result
SELECT department, ROUND(AVG(salary), 0) AS avg_sal
FROM employees
GROUP BY department
HAVING AVG(salary) > (
  SELECT AVG(salary) FROM employees   -- overall company average
)
ORDER BY avg_sal DESC;

-- Returns only departments whose average salary
-- exceeds the company-wide average.

Types of SQL Subqueries

Single Row Subquery

A single row subquery returns exactly one row. Used with comparison operators: =, >, <, >=, <=, <>. If the subquery unexpectedly returns more than one row, the query throws a runtime error. Always use single row subqueries when the result is guaranteed to be unique — such as MAX(), MIN(), or a lookup by primary key.

Single row subquery examples

-- Find the employee with the highest salary
SELECT first_name, salary, department
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);

-- Find all employees in the same department as employee #42
SELECT first_name, department
FROM employees
WHERE department = (SELECT department FROM employees WHERE id = 42)
  AND id != 42;

Multiple Row Subquery

A multiple row subquery returns more than one row. You must use a multi-row comparison operator: IN, NOT IN, ANY, or ALL. Using = with a multi-row subquery causes a "subquery returns more than one row" error. IN is the most common operator; ANY and ALL handle threshold comparisons against a set.

Multiple row subquery examples

-- IN: employees in departments located in 'New York' or 'London'
SELECT first_name, department
FROM employees
WHERE department IN (
  SELECT name FROM departments WHERE location IN ('New York', 'London')
);

-- ANY: employees who earn more than at least one contractor
SELECT first_name, salary FROM employees
WHERE salary > ANY (SELECT rate FROM contractors);

-- ALL: employees who earn more than EVERY contractor
SELECT first_name, salary FROM employees
WHERE salary > ALL (SELECT rate FROM contractors);

Multiple Column Subquery

-- Compare against multiple columns simultaneously
-- Find orders where (user_id, product_id) matches a wishlisted pair
SELECT o.id, o.total, o.order_date
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE (oi.product_id, o.user_id) IN (
  SELECT product_id, user_id FROM wishlist_items
);
-- Multi-column IN is supported in PostgreSQL, MySQL, SQLite.
-- SQL Server requires EXISTS with a correlated WHERE instead.

Correlated Subquery

A correlated subquery references one or more columns from the outer query. Because of this dependency it cannot run independently — it must re-execute once for every row the outer query processes. This makes it flexible (it can compute per-row context) but potentially slow on large tables. You can identify a correlated subquery by the outer table alias used inside the inner SELECT.

Correlated subquery example

-- Find employees who earn above their own department average
-- The inner query references e.department from the outer query
SELECT e.first_name, e.salary, e.department
FROM employees e
WHERE e.salary > (
  SELECT AVG(salary)
  FROM employees
  WHERE department = e.department   -- ← references outer row
);

-- For each employee row, the inner query runs once using that
-- employee's department. With 500 employees, that is 500 inner executions.
-- Modern optimizers often rewrite this as a single-pass join automatically.

Non-Correlated Subquery

A non-correlated subquery has no reference to the outer query. It executes once, returns its result, and the outer query uses that fixed result for all rows it processes. The database evaluates the inner query only once regardless of how many outer rows are examined — making it the most efficient subquery type.

Non-correlated subquery example

-- The inner query runs ONCE and returns a single number (e.g. 72000)
-- The outer query compares every employee row against that same number
SELECT first_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Not correlated because AVG(salary) covers ALL employees,
-- not a department-specific subset based on the outer row.
-- There is no reference to the outer table alias inside the subquery.

Scalar Subquery

A scalar subquery returns exactly one row and one column — a single scalar value. It can be used anywhere a single value is valid: as a column in SELECT, in a WHERE condition with =, or in any expression. If a scalar subquery ever returns more than one row or more than one column at runtime, the database throws an error.

Inline View

An inline view is a subquery in the FROM clause. It creates a temporary named result set that the outer query treats exactly like a real table. The alias is required. Inline views are the foundation of derived table patterns and are especially useful for applying WHERE or ORDER BY conditions to aggregated results.

SQL Subquery Examples

Example 1 — Employees above company average salary

SELECT first_name, department, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary DESC;

-- Business use: identify above-average earners for compensation review.
-- Type: non-correlated, single-row, WHERE clause.

Example 2 — Second highest salary (interview classic)

-- Method 1: subquery approach — works in all SQL dialects
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- Method 2: OFFSET (PostgreSQL, MySQL, SQLite)
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

-- Method 3: generalised to Nth highest with DENSE_RANK
SELECT salary FROM (
  SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM employees
) ranked
WHERE rnk = 2 LIMIT 1;

Example 3 — Products that have never been ordered

-- NOT EXISTS is NULL-safe — preferred over NOT IN here
SELECT p.name, p.category, p.price
FROM products p
WHERE NOT EXISTS (
  SELECT 1 FROM order_items oi WHERE oi.product_id = p.id
);

-- Business use: identify dead inventory for removal or promotion.
-- Type: correlated NOT EXISTS.

Example 4 — Customers who have placed at least one order

-- EXISTS short-circuits at the first matching row
SELECT u.first_name, u.email, u.country
FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
)
ORDER BY u.first_name;

-- Equivalent with IN (similar plan in modern optimizers):
SELECT first_name, email FROM users
WHERE id IN (SELECT DISTINCT user_id FROM orders);

Example 5 — Customers who have NEVER placed an order

SELECT u.first_name, u.email, u.created_at
FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
)
ORDER BY u.created_at DESC;

-- Business use: identify unactivated users for onboarding campaigns.
-- NOT EXISTS is safer than NOT IN when NULLs may be present.

Example 6 — Orders above average completed order value

SELECT id, user_id, total, order_date
FROM orders
WHERE total > (
  SELECT AVG(total) FROM orders WHERE status = 'completed'
)
AND status = 'completed'
ORDER BY total DESC
LIMIT 20;

-- Business use: surface high-value orders for priority analysis.

Example 7 — Each employee vs their department average (correlated)

SELECT
  e.first_name,
  e.department,
  e.salary,
  ROUND((SELECT AVG(salary) FROM employees
         WHERE department = e.department), 0) AS dept_avg,
  ROUND(e.salary - (SELECT AVG(salary) FROM employees
                    WHERE department = e.department), 0) AS diff
FROM employees e
ORDER BY e.department, diff DESC;

-- Business use: compensation equity analysis within departments.
-- Type: correlated scalar subquery in SELECT clause.

Example 8 — Top-selling products by revenue (inline view)

SELECT product_name, total_revenue, units_sold
FROM (
  SELECT
    p.name                             AS product_name,
    SUM(oi.quantity * oi.unit_price)   AS total_revenue,
    SUM(oi.quantity)                   AS units_sold
  FROM order_items oi
  JOIN products p ON p.id = oi.product_id
  JOIN orders o   ON o.id = oi.order_id
  WHERE o.status = 'completed'
  GROUP BY p.id, p.name
) product_stats
WHERE total_revenue > 5000
ORDER BY total_revenue DESC
LIMIT 20;

-- The inline view aggregates. The outer WHERE filters.
-- Without the inline view you cannot filter on the computed total_revenue.

Example 9 — Departments with headcount above company average

SELECT department, headcount
FROM (
  SELECT department, COUNT(*) AS headcount
  FROM employees
  GROUP BY department
) dept_counts
WHERE headcount > (
  SELECT AVG(headcount) FROM (
    SELECT COUNT(*) AS headcount
    FROM employees
    GROUP BY department
  ) all_counts
)
ORDER BY headcount DESC;

-- Nested inline views: aggregate, then aggregate the aggregation.
-- Common in analytics when you need to compare groups to a group average.

Example 10 — Most recent order and order count per active user

SELECT u.first_name, u.email,
  (SELECT MAX(order_date)
   FROM orders WHERE user_id = u.id) AS last_order_date,
  (SELECT COUNT(*)
   FROM orders WHERE user_id = u.id) AS total_orders
FROM users u
WHERE EXISTS (SELECT 1 FROM orders WHERE user_id = u.id)
ORDER BY last_order_date DESC;

-- Business use: user activity summary for CRM and re-engagement.
-- Type: correlated scalar subqueries in SELECT clause.

Example 11 — Categories with above-average product count

SELECT c.name, COUNT(p.id) AS product_count
FROM categories c
JOIN products p ON p.category_id = c.id
GROUP BY c.id, c.name
HAVING COUNT(p.id) > (
  SELECT AVG(cnt)
  FROM (SELECT COUNT(*) AS cnt FROM products GROUP BY category_id) cat_counts
)
ORDER BY product_count DESC;

-- Subquery in HAVING filters groups using a computed average.

Example 12 — Employees earning more than every employee in Sales

-- ALL: salary must exceed every single value in the Sales salary list
SELECT first_name, salary, department
FROM employees
WHERE salary > ALL (
  SELECT salary FROM employees WHERE department = 'Sales'
)
AND department != 'Sales'
ORDER BY salary DESC;

-- If Sales salaries are [50000, 60000, 70000],
-- this returns non-Sales employees earning above 70000 (the max).

Example 13 — Find duplicate email addresses

-- Pattern: GROUP BY + HAVING COUNT > 1 — memorise this for interviews
SELECT email, COUNT(*) AS occurrences
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;

-- To retrieve the full duplicate rows (not just the count):
SELECT * FROM users
WHERE email IN (
  SELECT email FROM users GROUP BY email HAVING COUNT(*) > 1
)
ORDER BY email;

Example 14 — Lapsed customers: last order over 90 days ago

SELECT u.first_name, u.email,
  MAX(o.order_date) AS last_order_date
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.first_name, u.email
HAVING MAX(o.order_date) < DATE('now', '-90 days')
ORDER BY last_order_date ASC;

-- Alternative using a correlated subquery in WHERE:
SELECT first_name, email FROM users u
WHERE (SELECT MAX(order_date) FROM orders WHERE user_id = u.id)
      < DATE('now', '-90 days');

Example 15 — Monthly revenue vs overall monthly average

SELECT month, revenue,
  ROUND((SELECT AVG(rev)
         FROM (SELECT STRFTIME('%Y-%m', order_date) AS m,
                      SUM(total) AS rev
               FROM orders WHERE status = 'completed'
               GROUP BY m) all_months), 2) AS avg_monthly_revenue
FROM (
  SELECT STRFTIME('%Y-%m', order_date) AS month,
         ROUND(SUM(total), 2)          AS revenue
  FROM orders
  WHERE status = 'completed'
  GROUP BY month
) monthly_revenue
ORDER BY month;

-- Each month's revenue shown next to the overall monthly average.

Example 16 — Rank products within each category (inline view + window function)

SELECT category, name, revenue,
  RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rank_in_category
FROM (
  SELECT
    p.category,
    p.name,
    ROUND(SUM(oi.quantity * oi.unit_price), 2) AS revenue
  FROM order_items oi
  JOIN products p ON p.id = oi.product_id
  JOIN orders o   ON o.id = oi.order_id
  WHERE o.status = 'completed'
  GROUP BY p.id, p.name, p.category
) product_revenue
ORDER BY category, rank_in_category;

-- Combining inline views with window functions is standard in BI reporting.
-- The inline view pre-aggregates; the window function ranks within groups.

Correlated vs Non-Correlated Subqueries

The fundamental difference

The single most important distinction in subquery types is whether the inner query depends on the outer query. A non-correlated subquery is self-contained — it runs once and returns a fixed result the outer query reuses. A correlated subquery contains a reference to a column from the outer query, so it must be re-evaluated for each row the outer query processes. This has significant performance implications and drives most subquery optimization decisions.

Comparison table

Feature                 | Non-Correlated                 | Correlated
------------------------|--------------------------------|----------------------------------
Dependency on outer     | None — fully independent       | References outer query column(s)
Execution frequency     | Once only                      | Once per outer row
Performance             | Generally fast                 | Can be slow on large tables
Recognise it by         | No outer alias inside subquery | Outer table alias used inside
Optimizer rewrite       | Often cached as a constant     | May or may not be rewritten to join
Typical use cases       | Global filters, lookup values  | Per-row comparisons, existence checks
Example operators       | =, IN, >, EXISTS               | = referencing outer.col

Non-correlated: runs once, result reused

-- This subquery runs ONCE and returns a single value (e.g. 72000)
-- The outer query reuses that number for every row it evaluates
SELECT first_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Query cost: one full scan of employees to compute AVG,
-- then one scan of employees to apply the filter.
-- Total: 2 table scans regardless of row count.

Correlated: re-executes for every outer row

-- This subquery references e.department from the outer query
-- It MUST re-execute once for EVERY row in the outer employees scan
SELECT e.first_name, e.salary, e.department
FROM employees e
WHERE e.salary > (
  SELECT AVG(salary)
  FROM employees
  WHERE department = e.department   -- ← references outer query column
);

-- With 500 employees, the inner query runs 500 times.
-- Each execution scans a department subset to compute AVG.
-- Total cost scales with row count × inner query cost.

Rewriting a correlated subquery as a JOIN (often faster)

-- CORRELATED version — potentially slow at scale
SELECT e.first_name, e.salary, e.department
FROM employees e
WHERE e.salary > (
  SELECT AVG(salary) FROM employees WHERE department = e.department
);

-- REWRITTEN as JOIN + inline view — computes dept avgs once
SELECT e.first_name, e.salary, e.department
FROM employees e
JOIN (
  SELECT department, AVG(salary) AS dept_avg
  FROM employees
  GROUP BY department
) dept_avgs ON dept_avgs.department = e.department
WHERE e.salary > dept_avgs.dept_avg;

-- The JOIN version: one GROUP BY pass to build dept_avgs,
-- then one join scan. Far fewer total operations on large tables.
-- Always use EXPLAIN to confirm — modern optimizers may produce
-- an identical plan for both versions automatically.

When correlated subqueries are the right tool

Correlated subqueries are the cleanest solution for per-row existence checks (EXISTS / NOT EXISTS), for computing per-row derived values in the SELECT clause, and when the correlated logic would require an awkward self-join to express otherwise. Do not prematurely rewrite them without checking the execution plan first — PostgreSQL 9+, MySQL 8.0+, and SQL Server 2016+ frequently produce the same physical plan whether you write a correlated subquery or a JOIN.

Scalar Subquery in SQL

What makes a subquery scalar?

A scalar subquery returns exactly one column and one row — a single value. You can use it anywhere a single value is valid: as a derived column in SELECT, in a WHERE comparison with =, inside CASE WHEN expressions, or in arithmetic. The name comes from linear algebra where a scalar is a single number, as opposed to a vector or matrix.

Scalar subquery in SELECT — computed column per row

-- Compare each employee's salary to two benchmarks
SELECT
  e.first_name,
  e.department,
  e.salary,
  (SELECT ROUND(AVG(salary), 0) FROM employees)              AS company_avg,
  (SELECT ROUND(AVG(salary), 0) FROM employees
   WHERE department = e.department)                          AS dept_avg
FROM employees e
ORDER BY e.salary DESC;

-- First scalar subquery: non-correlated — runs once, value reused per row.
-- Second scalar subquery: correlated — runs once per employee row.

Scalar subquery in WHERE — exact value comparison

-- Find the most recent order in the system
SELECT id, user_id, total, order_date
FROM orders
WHERE order_date = (SELECT MAX(order_date) FROM orders);

-- Find products priced above the most expensive Electronics item
SELECT name, price, category
FROM products
WHERE price > (
  SELECT MAX(price) FROM products WHERE category = 'Electronics'
)
ORDER BY price DESC;

Scalar subquery in expressions — inline calculations

-- Each order as a percentage of total completed revenue
SELECT
  id,
  total,
  ROUND(total * 100.0 / (SELECT SUM(total) FROM orders
                         WHERE status = 'completed'), 2) AS pct_of_total
FROM orders
WHERE status = 'completed'
ORDER BY pct_of_total DESC
LIMIT 20;

-- Salary delta from company average as a percentage
SELECT
  first_name,
  salary,
  ROUND((salary - (SELECT AVG(salary) FROM employees)) * 100.0
        / (SELECT AVG(salary) FROM employees), 1) AS pct_above_avg
FROM employees
ORDER BY pct_above_avg DESC;

When scalar subqueries throw errors

-- ERROR: returns more than one row when multiple rows match
-- Do not use = when the subquery could return multiple rows
SELECT first_name FROM employees
WHERE salary = (SELECT salary FROM employees WHERE department = 'Sales');
-- "subquery returns more than one row"

-- SAFE: guarantee uniqueness with aggregate or primary key
SELECT first_name FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees WHERE department = 'Sales');

-- ERROR: returns more than one column
-- SELECT (SELECT first_name, salary FROM employees WHERE id = 1);

-- Rule: scalar subquery must return exactly 1 row × 1 column.

Inline Views in SQL

What is an inline view?

An inline view is a subquery placed in the FROM clause. The outer query treats it exactly like a real table — you can SELECT any of its columns, JOIN it to other tables, filter it with WHERE, group it, and sort it. Unlike a CTE, it has no name visible to other parts of the query. The alias you give it is required in most databases (PostgreSQL, MySQL, SQL Server throw an error without it) and is scoped only to the immediately surrounding query.

Basic inline view: applying WHERE to aggregated results

-- Problem: you cannot use WHERE on COUNT(*) — but HAVING has limitations.
-- Inline views let you filter any computed column with regular WHERE.

SELECT dept, headcount, avg_sal
FROM (
  SELECT
    department              AS dept,
    COUNT(*)                AS headcount,
    ROUND(AVG(salary), 0)   AS avg_sal
  FROM employees
  GROUP BY department
) dept_summary
WHERE headcount BETWEEN 3 AND 15   -- filter computed columns freely
ORDER BY avg_sal DESC;

-- Without the inline view you could not filter on headcount with WHERE.
-- HAVING can only filter on aggregate functions, not computed aliases.

Inline view with JOIN — enriching aggregated results

-- Pre-aggregate order stats per user, then join user details
SELECT
  u.first_name,
  u.country,
  order_stats.total_orders,
  order_stats.lifetime_value
FROM users u
JOIN (
  SELECT
    user_id,
    COUNT(*)               AS total_orders,
    ROUND(SUM(total), 2)   AS lifetime_value
  FROM orders
  WHERE status = 'completed'
  GROUP BY user_id
) order_stats ON order_stats.user_id = u.id
WHERE order_stats.lifetime_value > 500
ORDER BY order_stats.lifetime_value DESC;

-- The inline view computes order stats once.
-- The JOIN enriches each result row with the user's profile fields.

Nested inline views — aggregating an aggregation

-- Average department headcount across the company
SELECT ROUND(AVG(headcount), 1) AS avg_dept_headcount
FROM (
  SELECT COUNT(*) AS headcount
  FROM employees
  GROUP BY department
) dept_sizes;

-- Each department vs that average
SELECT dept, headcount
FROM (
  SELECT department AS dept, COUNT(*) AS headcount
  FROM employees GROUP BY department
) dept_sizes
WHERE headcount > (
  SELECT AVG(cnt)
  FROM (SELECT COUNT(*) AS cnt FROM employees GROUP BY department) x
)
ORDER BY headcount DESC;

Inline view vs CTE: when to use each

Inline views and CTEs (WITH clause) are semantically equivalent in most databases — both define a named intermediate result set the outer query can reference. Use a CTE when you need to reference the same derived data more than once in a single query, when the query has multiple logical steps that benefit from named intermediate stages, or when you need recursive queries. Use an inline view when the derived table is used only once and the query is short enough that adding a WITH clause would add noise rather than clarity.

EXISTS vs IN vs JOIN

Three ways to check for matching rows

When filtering rows based on whether a related row exists in another table, you have three main tools: EXISTS, IN, and JOIN. Each has different semantics, performance characteristics, and readability tradeoffs. Knowing which to use — and why — is a very common SQL interview question and a critical production skill.

Comparison table

Feature              | EXISTS                        | IN                          | JOIN
---------------------|-------------------------------|-----------------------------|-----------------------
Returns from subquery| TRUE / FALSE only             | Set of values               | Combined row set
Short-circuits       | Yes — stops at first match    | No — evaluates full list    | No
NULL handling        | Always safe                   | NOT IN breaks with NULLs    | Depends on JOIN type
Deduplication        | Automatic (1 row per outer)   | Handled by IN semantics     | Often needs DISTINCT
Optimizer rewrites   | Often → semi-join             | Often → semi-join           | Already a join
Best for             | Existence checks, large sets  | Small known value lists     | Retrieving joined columns

EXISTS — stop at first match

-- EXISTS returns TRUE as soon as the subquery produces any row
-- The SELECT list inside EXISTS is irrelevant — use SELECT 1 by convention

-- Users who have at least one completed order
SELECT u.first_name, u.email
FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.user_id = u.id AND o.status = 'completed'
);

-- Products that appear in at least one order item
SELECT p.name, p.price
FROM products p
WHERE EXISTS (
  SELECT 1 FROM order_items oi WHERE oi.product_id = p.id
);

NOT EXISTS — NULL-safe anti-join

-- Products that have NEVER been ordered
-- NOT EXISTS is NULL-safe — use this instead of NOT IN whenever subquery
-- columns might contain NULLs (which breaks NOT IN silently)
SELECT p.name, p.category, p.price
FROM products p
WHERE NOT EXISTS (
  SELECT 1 FROM order_items oi WHERE oi.product_id = p.id
);

-- Why NOT IN can silently return zero rows:
-- If ANY row in the subquery returns NULL, the condition NOT IN (NULL, ...)
-- evaluates to UNKNOWN for every outer row — killing all results.
-- NOT EXISTS does not have this problem.

IN — clean for known value sets

-- IN with a small literal list — no subquery needed
SELECT name, department FROM employees
WHERE department IN ('Engineering', 'Product', 'Design');

-- IN with a subquery — readable for small result sets
SELECT first_name FROM users
WHERE id IN (
  SELECT DISTINCT user_id FROM orders WHERE status = 'completed'
);

-- Performance: PostgreSQL 9+, MySQL 8+ rewrite IN (subquery) as a semi-join
-- producing the same plan as EXISTS. On older engines or very large subquery
-- result sets, EXISTS may be faster. Test with EXPLAIN.

JOIN — when you need columns from the related table

-- KEY RULE: use JOIN when you need to SELECT columns from the related table.
-- EXISTS and IN only filter — they cannot expose the inner table's columns.

-- Need order data alongside user info? Must use JOIN.
SELECT u.first_name, o.id AS order_id, o.total, o.order_date
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'completed'
ORDER BY o.total DESC;

-- Only need to check if user ever ordered? Use EXISTS — simpler.
SELECT u.first_name FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

Practical decision guide

Scenario                                         | Recommended
-------------------------------------------------|--------------------
Filter based on related row existence             | EXISTS
Filter where related row does NOT exist           | NOT EXISTS (not NOT IN)
Match against a small static list of values       | IN ('a', 'b', 'c')
Subquery may return NULLs                         | EXISTS (not IN)
Need columns from the related / matched table     | JOIN
Deduplicate results automatically                 | EXISTS or LEFT JOIN + IS NULL
Aggregate data from a related table               | JOIN + GROUP BY

Subquery vs JOIN: When to Use Each

Are they interchangeable?

Subqueries and JOINs can often express the same logic, but they are not always interchangeable — and choosing the wrong one affects readability, maintainability, and sometimes performance. The core rule: if you only need to filter based on a relationship, EXISTS or IN subqueries are usually cleaner. If you need columns from the related table, you need a JOIN. If you need aggregated data from a related table, an inline view (subquery in FROM) often leads to cleaner SQL than a GROUP BY in a JOIN.

Comparison table

Dimension            | Subquery                           | JOIN
---------------------|------------------------------------|---------------------------------
Primary purpose      | Filter, compute, or derive data    | Combine rows from multiple tables
Exposes related cols | No (unless inline view in FROM)    | Yes — SELECT from any joined table
Deduplication        | Built in for WHERE subqueries      | Often needs DISTINCT
Readability          | Cleaner for existence checks       | Cleaner for multi-column retrieval
Performance          | Comparable after optimizer rewrite | Usually similar; JOIN wins on aggregates
Scalability          | Correlated subqueries can be slow  | Better scaling with proper indexes

Three equivalent queries — choose by intent

-- GOAL: find users who have placed at least one order

-- Option A: IN subquery
SELECT first_name, email FROM users
WHERE id IN (SELECT DISTINCT user_id FROM orders);

-- Option B: EXISTS (semantically clearest for "existence check")
SELECT first_name, email FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

-- Option C: INNER JOIN (needs DISTINCT to avoid duplicating users)
SELECT DISTINCT u.first_name, u.email
FROM users u
INNER JOIN orders o ON o.user_id = u.id;

-- All three return the same result. Modern optimizers produce near-identical
-- plans for A and B. Option C (JOIN) is the most familiar but needs DISTINCT.

Where subqueries clearly win

-- 1. Existence checks without needing joined columns
SELECT name FROM products p
WHERE NOT EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = p.id);

-- 2. Filtering on an aggregated result (inline view)
SELECT department, avg_salary
FROM (
  SELECT department, ROUND(AVG(salary), 0) AS avg_salary
  FROM employees GROUP BY department
) dept_stats
WHERE avg_salary > 75000;

-- 3. Per-row scalar comparisons in SELECT
SELECT first_name, salary,
  salary - (SELECT AVG(salary) FROM employees) AS above_avg
FROM employees;

Where JOINs clearly win

-- When you need columns from the related table — a subquery in WHERE
-- cannot expose inner table columns to the outer SELECT.

SELECT u.first_name, u.country,
       o.id AS order_id, o.total, o.order_date,
       p.name AS product_name, oi.quantity
FROM users u
JOIN orders o      ON o.user_id = u.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p    ON p.id = oi.product_id
WHERE o.status = 'completed'
ORDER BY o.order_date DESC
LIMIT 50;

-- No subquery alone can produce this multi-table, multi-column result.

SQL Subquery Performance Tips

Tip 1 — Always check the execution plan first

-- Run EXPLAIN before assuming a subquery is slow
EXPLAIN
SELECT e.first_name, e.salary
FROM employees e
WHERE e.salary > (
  SELECT AVG(salary) FROM employees WHERE department = e.department
);

-- In PostgreSQL, use EXPLAIN ANALYZE for actual runtimes:
EXPLAIN ANALYZE SELECT ...

-- Look for:
--  "Index Scan" vs "Seq Scan" — index scans are faster
--  "Nested Loop" on large outer — indicates unoptimized correlated query
--  "Hash Join" or "Merge Join" — optimizer rewrote to join (good)
--  Row estimate vs actual — huge gaps mean stale statistics (run ANALYZE)

Tip 2 — Replace hot correlated subqueries with JOINs or CTEs

-- SLOW: correlated subquery on a large table
SELECT e.first_name, e.salary
FROM employees e
WHERE e.salary > (
  SELECT AVG(salary) FROM employees WHERE department = e.department
);

-- FAST: compute dept averages once with GROUP BY, then JOIN
WITH dept_avgs AS (
  SELECT department, AVG(salary) AS dept_avg
  FROM employees GROUP BY department
)
SELECT e.first_name, e.salary
FROM employees e
JOIN dept_avgs d ON d.department = e.department
WHERE e.salary > d.dept_avg;

-- The CTE version: one GROUP BY pass + one JOIN scan.
-- The correlated version: N inner executions for N outer rows.

Tip 3 — Use EXISTS over IN for large subquery results

-- IN materializes the full subquery result before scanning
SELECT first_name FROM users
WHERE id IN (SELECT user_id FROM orders WHERE status = 'completed');

-- EXISTS short-circuits at the first matching row
SELECT first_name FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.status = 'completed'
);

-- On PostgreSQL 9+ and MySQL 8.0+, both often produce the same plan.
-- EXISTS is always safer for NOT IN scenarios (NULL contamination risk).
-- Always benchmark on your actual data volumes — rules of thumb can mislead.

Tip 4 — Index the correlation and filter columns

-- Slow correlated subquery without index:
SELECT first_name FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
-- Sequential scan of orders for every user row

-- Add an index on the JOIN/correlation column:
-- CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Now the inner subquery does an index lookup instead of a full scan.

-- For composite conditions inside the subquery:
-- CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Covers both the correlation column and the filter column.

Tip 5 — Push filters into subqueries, not outside them

-- LESS EFFICIENT: all rows aggregated, then outer WHERE filters
SELECT dept, headcount FROM (
  SELECT department AS dept, COUNT(*) AS headcount
  FROM employees GROUP BY department
) x
WHERE dept = 'Engineering';

-- MORE EFFICIENT: filter before aggregation — fewer rows to GROUP BY
SELECT dept, headcount FROM (
  SELECT department AS dept, COUNT(*) AS headcount
  FROM employees
  WHERE department = 'Engineering'   -- filter INSIDE the subquery
  GROUP BY department
) x;

-- This matters on tables with millions of rows — moving the filter
-- inside dramatically reduces the GROUP BY input size.

Tip 6 — Avoid SELECT * inside subqueries

-- AVOID: fetches unnecessary columns even for existence checks
WHERE id IN (SELECT * FROM orders WHERE status = 'completed')

-- BETTER: fetch only the column you need
WHERE id IN (SELECT user_id FROM orders WHERE status = 'completed')

-- BEST for existence: SELECT 1 communicates intent clearly
WHERE EXISTS (SELECT 1 FROM orders WHERE user_id = u.id)

-- Some optimizers strip SELECT * in subqueries automatically,
-- but being explicit is faster, more portable, and clearer to readers.

Tip 7 — Convert deeply nested subqueries to CTEs

-- Hard to read, optimize, and maintain beyond two levels of nesting
SELECT * FROM (
  SELECT dept, rev FROM (
    SELECT p.category AS dept, SUM(oi.quantity * oi.unit_price) AS rev
    FROM order_items oi JOIN products p ON p.id = oi.product_id
    GROUP BY p.category
  ) cat_rev WHERE rev > 1000
) top_cats WHERE dept LIKE 'E%';

-- CTEs are cleaner and PostgreSQL 12+ can materialize them (optimization fence)
WITH cat_rev AS (
  SELECT p.category AS dept, SUM(oi.quantity * oi.unit_price) AS rev
  FROM order_items oi JOIN products p ON p.id = oi.product_id
  GROUP BY p.category
),
top_cats AS (SELECT dept, rev FROM cat_rev WHERE rev > 1000)
SELECT * FROM top_cats WHERE dept LIKE 'E%';

Common SQL Subquery Mistakes

Mistake 1: Using = with a multi-row subquery

-- ERROR: if subquery returns more than one row, this fails at runtime
SELECT first_name FROM employees
WHERE department = (SELECT department FROM departments);
-- "subquery returns more than one row"

-- FIX: use IN when multiple rows are valid
SELECT first_name FROM employees
WHERE department IN (SELECT name FROM departments WHERE active = 1);

-- FIX: use = only when result is guaranteed scalar (MAX, MIN, primary key)
WHERE department = (SELECT name FROM departments WHERE id = 5);

Mistake 2: NOT IN with NULL values in the subquery

-- SILENT BUG: if product_id contains even one NULL, NOT IN returns NOTHING
SELECT name FROM products
WHERE id NOT IN (SELECT product_id FROM order_items);

-- Why: NULL != anything evaluates to UNKNOWN, not FALSE.
-- NOT IN (1, 2, NULL) = NOT (id = 1 OR id = 2 OR id = NULL)
-- = NOT (... OR UNKNOWN) = UNKNOWN for every row → zero results.

-- FIX: use NOT EXISTS (always NULL-safe)
SELECT name FROM products p
WHERE NOT EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = p.id);

-- FIX: explicitly exclude NULLs from the subquery
WHERE id NOT IN (SELECT product_id FROM order_items WHERE product_id IS NOT NULL);

Mistake 3: Missing alias on an inline view

-- ERROR in PostgreSQL, MySQL, SQL Server:
-- "Every derived table must have its own alias"
SELECT dept, avg_sal FROM (
  SELECT department AS dept, AVG(salary) AS avg_sal
  FROM employees GROUP BY department
);  -- ← missing alias here

-- FIX: always add an alias after the closing parenthesis
SELECT dept, avg_sal FROM (
  SELECT department AS dept, AVG(salary) AS avg_sal
  FROM employees GROUP BY department
) dept_stats;   -- ← required alias

Mistake 4: Scalar subquery returning multiple rows

-- ERROR if more than one user has the first name 'Alice'
SELECT * FROM orders
WHERE user_id = (SELECT id FROM users WHERE first_name = 'Alice');

-- FIX: filter on a unique column (email, primary key)
WHERE user_id = (SELECT id FROM users WHERE email = 'alice@example.com');

-- FIX: use IN if multiple matches are intentional
WHERE user_id IN (SELECT id FROM users WHERE first_name = 'Alice');

Mistake 5: Deeply nesting subqueries beyond two levels

-- Three levels of nesting is the point of no return for readability
SELECT * FROM (
  SELECT dept, rev FROM (
    SELECT p.category AS dept, SUM(oi.quantity * oi.unit_price) AS rev
    FROM order_items oi JOIN products p ON p.id = oi.product_id
    GROUP BY p.category
  ) a WHERE rev > 1000
) b WHERE dept LIKE 'E%';

-- FIX: convert to CTEs the moment nesting exceeds one level
WITH cat_rev AS (
  SELECT p.category AS dept, SUM(oi.quantity * oi.unit_price) AS rev
  FROM order_items oi JOIN products p ON p.id = oi.product_id
  GROUP BY p.category
)
SELECT dept, rev FROM cat_rev WHERE rev > 1000 AND dept LIKE 'E%';

Mistake 6: Assuming all correlated subqueries are slow

Modern query optimizers — PostgreSQL 9+, MySQL 8.0+, SQL Server 2016+ — routinely transform correlated subqueries into efficient semi-join or hash-join plans. Do not prematurely rewrite correct, readable code based on a general performance rule. Run EXPLAIN first. If the plan shows a nested loop on a large table without an index lookup, then rewrite. If it shows a hash semi-join, the optimizer already handled it.

Mistake 7: Forgetting that the EXISTS SELECT list is irrelevant

-- All of these are identical in behavior — EXISTS only checks for row existence
WHERE EXISTS (SELECT * FROM orders WHERE user_id = u.id)
WHERE EXISTS (SELECT 1 FROM orders WHERE user_id = u.id)
WHERE EXISTS (SELECT NULL FROM orders WHERE user_id = u.id)
WHERE EXISTS (SELECT id FROM orders WHERE user_id = u.id)

-- The columns in the EXISTS subquery SELECT list are ignored entirely.
-- Convention: SELECT 1 — it communicates intent and avoids column resolution.

Real-World Use Cases of SQL Subqueries

E-commerce: win-back campaign targeting

-- Customers who ordered 60–90 days ago and have not ordered since
-- Used for targeted re-engagement email campaigns
SELECT u.first_name, u.email,
  MAX(o.order_date) AS last_order
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.first_name, u.email
HAVING MAX(o.order_date) BETWEEN DATE('now', '-90 days') AND DATE('now', '-60 days')
  AND NOT EXISTS (
    SELECT 1 FROM orders o2
    WHERE o2.user_id = u.id
      AND o2.order_date > DATE('now', '-60 days')
  );

HR: identify promotion-eligible employees

-- Employees earning above the 75th percentile in their department
-- with at least 2 years tenure
SELECT e.first_name, e.department, e.salary, e.hire_date
FROM employees e
WHERE e.salary > (
  SELECT s.salary FROM (
    SELECT salary,
           NTILE(4) OVER (PARTITION BY department ORDER BY salary) AS quartile
    FROM employees
    WHERE department = e.department
  ) s WHERE s.quartile >= 3
  ORDER BY s.salary DESC LIMIT 1
)
AND e.hire_date <= DATE('now', '-2 years')
ORDER BY e.department, e.salary DESC;

Finance: flag anomalous orders (fraud detection)

-- Orders more than 3× the user's own historical average — potential fraud
SELECT o.id, o.user_id, o.total, o.order_date
FROM orders o
WHERE o.total > 3 * (
  SELECT AVG(total)
  FROM orders
  WHERE user_id = o.user_id
    AND id != o.id   -- exclude current order from the average
)
AND o.status = 'pending'
ORDER BY o.total DESC;

-- Business use: route flagged orders to manual review queue.

BI / Analytics: category revenue share

-- Percentage of total completed revenue per product category
SELECT
  category,
  category_revenue,
  ROUND(category_revenue * 100.0 / total_revenue, 2) AS revenue_share_pct
FROM (
  SELECT p.category,
         SUM(oi.quantity * oi.unit_price) AS category_revenue
  FROM order_items oi
  JOIN products p ON p.id = oi.product_id
  JOIN orders   o ON o.id = oi.order_id
  WHERE o.status = 'completed'
  GROUP BY p.category
) cat_rev
CROSS JOIN (
  SELECT SUM(oi.quantity * oi.unit_price) AS total_revenue
  FROM order_items oi
  JOIN orders o ON o.id = oi.order_id
  WHERE o.status = 'completed'
) total
ORDER BY revenue_share_pct DESC;

SaaS: identify power users by feature engagement

-- Users who have engaged with more than 3 distinct features in 30 days
SELECT u.id, u.email,
  (SELECT COUNT(DISTINCT feature_name)
   FROM feature_events
   WHERE user_id = u.id
     AND event_date >= DATE('now', '-30 days')) AS features_used
FROM users u
WHERE (
  SELECT COUNT(DISTINCT feature_name)
  FROM feature_events
  WHERE user_id = u.id
    AND event_date >= DATE('now', '-30 days')
) > 3
ORDER BY features_used DESC;

-- Business use: segment power users for beta programs or case studies.

SQL Subquery Interview Questions

Q1: What is a subquery and where can it appear in SQL?

A subquery is a SELECT statement nested inside another SQL statement. It can appear in WHERE (to filter rows), FROM (as an inline view or derived table), SELECT (as a scalar subquery for per-row values), and HAVING (to filter aggregated groups). Subqueries can also be used in INSERT, UPDATE, and DELETE statements to determine which rows to act on.

Q2: What is the difference between correlated and non-correlated subqueries?

A non-correlated subquery runs once independently — it has no reference to the outer query.

SELECT first_name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Inner query has no outer reference. Runs once. Result reused for all rows.

A correlated subquery references a column from the outer query.
It re-executes once per outer row.

SELECT first_name, department FROM employees e
WHERE salary > (SELECT AVG(salary) FROM employees
                WHERE department = e.department);
-- e.department is the outer reference. Runs N times for N outer rows.

Q3: Write a query to find employees earning above their department average.

-- Correlated subquery approach (readable, correct)
SELECT first_name, department, salary
FROM employees e
WHERE salary > (
  SELECT AVG(salary) FROM employees WHERE department = e.department
)
ORDER BY department, salary DESC;

-- JOIN + inline view approach (more performant on large tables)
SELECT e.first_name, e.department, e.salary
FROM employees e
JOIN (
  SELECT department, AVG(salary) AS dept_avg
  FROM employees GROUP BY department
) d ON d.department = e.department
WHERE e.salary > d.dept_avg;

Q4: Find products that have never been ordered.

-- NOT EXISTS (safest — NULL-proof)
SELECT name, category, price
FROM products p
WHERE NOT EXISTS (
  SELECT 1 FROM order_items oi WHERE oi.product_id = p.id
);

-- LEFT JOIN + IS NULL (classic equivalent)
SELECT p.name, p.category, p.price
FROM products p
LEFT JOIN order_items oi ON oi.product_id = p.id
WHERE oi.id IS NULL;

-- NOT IN (only safe if NULLs are explicitly excluded)
SELECT name FROM products
WHERE id NOT IN (
  SELECT product_id FROM order_items WHERE product_id IS NOT NULL
);

Q5: What is the difference between EXISTS and IN? When do you use each?

EXISTS evaluates whether the subquery returns any rows — it short-circuits at the first match and returns TRUE or FALSE. It is always correlated. Use EXISTS for large subqueries and always use NOT EXISTS (never NOT IN) when the subquery might contain NULLs. IN compares a value against a complete result set. Use IN for small fixed value lists. In modern databases (PostgreSQL 9+, MySQL 8+), IN with a subquery is often rewritten to the same semi-join plan as EXISTS. The practical difference is that NOT IN is dangerous with NULLs while NOT EXISTS is always correct.

Q6: What is a scalar subquery? When does it throw an error?

A scalar subquery returns exactly one row and one column.
Use it anywhere a single value is expected: SELECT list, WHERE =, or expressions.

Throws a runtime error when:
• Used with = but returns multiple rows at runtime
• Used in SELECT but returns multiple rows
• Returns more than one column

Safe pattern — guarantees scalar with MAX():
SELECT name,
  (SELECT MAX(price) FROM products) AS max_price
FROM categories;

Unsafe pattern — multiple employees may share 'John':
WHERE id = (SELECT id FROM employees WHERE first_name = 'John')

Q7: Write a query to find the second highest salary.

-- Subquery approach — works in all SQL dialects
SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- DENSE_RANK approach — generalises to Nth highest
SELECT salary FROM (
  SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM employees
) ranked
WHERE rnk = 2 LIMIT 1;

Q8: What is an inline view? How does it differ from a CTE?

An inline view is a subquery in the FROM clause, scoped to the single query it appears in. A CTE (WITH clause) defines a named temporary result set scoped to the entire statement that can be referenced multiple times. CTEs are preferred when derived data is needed more than once or when readability benefits from named logical steps. Inline views are preferred for one-off derived tables in short queries. Both are semantically equivalent in most databases.

Q9: What is the NULL trap with NOT IN?

NOT IN returns zero rows if the subquery produces any NULL value.

Reason: any comparison with NULL returns UNKNOWN, not FALSE.
NOT IN (1, 2, NULL) for any outer value evaluates to UNKNOWN — never TRUE.
UNKNOWN rows are excluded — so the entire result is empty.

This is a silent bug — no error, just wrong (empty) results.

ALWAYS use NOT EXISTS instead:
SELECT name FROM products p
WHERE NOT EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = p.id);

Or filter NULLs explicitly:
WHERE id NOT IN (SELECT product_id FROM order_items WHERE product_id IS NOT NULL);

Q10: How do you rewrite a slow correlated subquery?

Original correlated form (re-runs once per outer row):
SELECT e.first_name, e.salary
FROM employees e
WHERE e.salary > (
  SELECT AVG(salary) FROM employees WHERE department = e.department
);

Rewritten as JOIN + CTE (computes dept averages once):
WITH dept_avgs AS (
  SELECT department, AVG(salary) AS dept_avg
  FROM employees GROUP BY department
)
SELECT e.first_name, e.salary
FROM employees e
JOIN dept_avgs d ON d.department = e.department
WHERE e.salary > d.dept_avg;

Always verify with EXPLAIN first — modern optimizers may already produce
the same plan for both versions, making the rewrite unnecessary.

Q11: Can you use a subquery in a HAVING clause?

Yes. HAVING filters groups after GROUP BY, and a subquery can supply
the comparison value — exactly like in WHERE.

Find departments whose average salary exceeds the company-wide average:
SELECT department, ROUND(AVG(salary), 0) AS avg_sal, COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees)
ORDER BY avg_sal DESC;

Q12: Write a query to find users active in every month of 2025.

SELECT user_id, COUNT(DISTINCT STRFTIME('%m', order_date)) AS months_active
FROM orders
WHERE order_date BETWEEN '2025-01-01' AND '2025-12-31'
GROUP BY user_id
HAVING COUNT(DISTINCT STRFTIME('%m', order_date)) = 12;

-- With user name (wrapping the above as a subquery):
SELECT u.first_name, u.email
FROM users u
WHERE u.id IN (
  SELECT user_id FROM orders
  WHERE order_date BETWEEN '2025-01-01' AND '2025-12-31'
  GROUP BY user_id
  HAVING COUNT(DISTINCT STRFTIME('%m', order_date)) = 12
);

Q13: What is the difference between a subquery in FROM vs WHERE?

A subquery in WHERE is a filter condition — it determines which rows the outer query returns but cannot expose its own columns to the outer SELECT. A subquery in FROM creates an inline view — a temporary derived table whose columns are fully available to the outer SELECT, JOIN, WHERE, and ORDER BY. FROM subqueries expose their columns; WHERE subqueries do not.

Q14: Find the top 2 earners per department using a subquery.

-- Correlated subquery: count how many people in the same dept earn more
SELECT e1.first_name, e1.department, e1.salary
FROM employees e1
WHERE (
  SELECT COUNT(*) FROM employees e2
  WHERE e2.department = e1.department
    AND e2.salary > e1.salary
) < 2   -- fewer than 2 people in same dept earn more than this employee
ORDER BY e1.department, e1.salary DESC;

-- Modern alternative: ROW_NUMBER() window function (cleaner)
SELECT first_name, department, salary
FROM (
  SELECT first_name, department, salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
  FROM employees
) ranked
WHERE rn <= 2;

Q15: What are the performance implications of subqueries in production?

Non-correlated subqueries and inline views generally perform comparably to equivalent JOINs — modern optimizers rewrite them into the same physical plan. Correlated subqueries that execute once per outer row are the main concern: on a 500,000-row table this means 500,000 inner executions unless the optimizer converts the query to a join. Always run EXPLAIN before optimizing. Index the correlation columns — they convert sequential inner scans into fast index lookups. When benchmarking, test both the subquery and JOIN versions on representative data volumes: the optimizer's decision changes with data distribution and table sizes.

Frequently Asked Questions

What is a subquery in SQL?

A subquery is a SELECT statement nested inside another SQL query. The outer query uses the result produced by the inner query to filter rows, compute a value, or define a temporary table. Subqueries can appear in WHERE, FROM, SELECT, and HAVING clauses.

What is the difference between a correlated and non-correlated subquery?

A non-correlated subquery executes once and independently — it has no reference to the outer query. A correlated subquery references one or more columns from the outer query, forcing it to re-execute once per row the outer query processes. Non-correlated subqueries are generally more efficient; correlated subqueries are more flexible for per-row comparisons.

Can a subquery return multiple rows?

Yes — when used with IN, NOT IN, ANY, or ALL. A subquery used with = or in the SELECT clause must return exactly one value. Using = with a multi-row subquery throws a runtime error in all major SQL databases.

Are subqueries bad for performance?

Non-correlated subqueries are rarely a performance problem. Correlated subqueries can be slow if they execute once per outer row on large tables, but modern optimizers (PostgreSQL 9+, MySQL 8+, SQL Server 2016+) often rewrite them into efficient join plans automatically. Always check the execution plan before assuming a subquery is causing a slowdown.

What is a nested query in SQL?

A nested query is another term for a subquery — a SELECT statement embedded inside another SQL statement. The inner SELECT is the nested query or inner query; the outer statement is the parent query. The terms are used interchangeably.

Can subqueries replace JOINs?

For existence checks and filtering, yes — EXISTS and IN subqueries are often cleaner than JOIN + DISTINCT. For retrieving columns from multiple tables, no — subqueries in WHERE cannot expose inner table columns to the outer SELECT. Use the pattern that matches your intent.

What is the difference between EXISTS and IN?

EXISTS checks whether the subquery returns any rows, short-circuits at the first match, and returns TRUE or FALSE. IN compares a value against a complete result set. The critical practical difference: NOT IN is unsafe when the subquery can return NULLs and silently returns zero rows. NOT EXISTS is always NULL-safe.

What is a scalar subquery?

A scalar subquery returns exactly one column and one row — a single value. It can be used anywhere a single value is expected: in SELECT to compute a derived column, in WHERE with =, or in expressions. It throws a runtime error if it returns more than one row.

What is an inline view in SQL?

An inline view is a subquery in the FROM clause that creates a temporary named derived table. The outer query treats it exactly like a real table. An alias is required. Inline views are especially useful for applying WHERE conditions or ORDER BY to aggregated results.

How do subqueries work in MySQL, PostgreSQL, and SQL Server?

Subquery syntax is standard across MySQL, PostgreSQL, Oracle, and SQL Server. The key differences are in optimizer sophistication: PostgreSQL and SQL Server are generally more aggressive at converting correlated subqueries into efficient semi-join plans. MySQL improved significantly in version 8.0. In all databases, use EXPLAIN to confirm the actual execution plan used — it removes the guesswork.

Practice Exercises

Beginner exercises

Run these in the SQLab Hub SQL playground against the pre-loaded tables
(users, orders, products, employees, departments).

1. Find all products priced above the average product price.
   Hint: WHERE price > (SELECT AVG(price) FROM products)

2. List all employees from the department with the highest headcount.
   Hint: WHERE department = (SELECT department FROM employees
         GROUP BY department ORDER BY COUNT(*) DESC LIMIT 1)

3. Find all orders with a total above the average completed order total.
   Hint: WHERE total > (SELECT AVG(total) FROM orders WHERE status = 'completed')

4. List all users who have placed at least one order.
   Hint: WHERE EXISTS (SELECT 1 FROM orders WHERE user_id = u.id)

5. Find users who have NEVER placed an order.
   Hint: WHERE NOT EXISTS (SELECT 1 FROM orders WHERE user_id = u.id)

Intermediate exercises

6. Find employees who earn exactly the maximum salary in their department.
   Hint: correlated subquery WHERE salary = (SELECT MAX(salary) WHERE dept = e.dept)

7. Find the top 3 users by lifetime order value using only a subquery (no window functions).
   Hint: inline view with SUM(total) GROUP BY user_id, outer LIMIT 3.

8. List products ordered in December 2025 but NOT in January 2026.
   Hint: IN (December order_items product_ids) AND NOT IN (January product_ids)

9. Write a query returning each department alongside whether its average salary
   is above or below the company average (use a CASE WHEN + scalar subquery).

10. Find all categories that contain more products than the average category size.
    Hint: HAVING COUNT(p.id) > (SELECT AVG(cnt) FROM (...) x)

Advanced exercises

11. Find the top 2 highest-paid employees per department using a correlated
    subquery (no window functions).
    Hint: WHERE (SELECT COUNT(*) FROM employees e2 WHERE dept = e1.dept
          AND salary > e1.salary) < 2

12. Compute the running average order total per user using only subqueries
    (no window functions, no CTEs).
    Hint: correlated subquery AVG over all prior orders.

13. Find users who placed orders in at least 3 consecutive months in 2025.
    Hint: join orders to itself offset by 1 and 2 months, or use
    a subquery checking for the existence of month+1 and month+2 orders.

14. Build a complete customer summary: each user's total orders, lifetime value,
    first order date, last order date, and percentile rank by lifetime value —
    using correlated subqueries only (no window functions).

15. Write a query to detect products whose average review score is below
    the average review score of their category, using both a correlated
    subquery and an inline view approach. Compare the resulting SQL.

Conclusion

When to use each subquery type

Type                    | Best For
------------------------|--------------------------------------------------
Non-correlated WHERE    | Filter rows based on a global aggregate or lookup
Correlated WHERE        | Per-row comparisons requiring outer query context
EXISTS / NOT EXISTS     | Existence checks, NULL-safe anti-join patterns
IN / NOT IN             | Matching small sets (always exclude NULLs for NOT IN)
Scalar (SELECT clause)  | Inline computed columns, per-row derived values
Inline view (FROM)      | Aggregating aggregates, pre-filtering GROUP BY output
HAVING subquery         | Filter groups based on an external computed value

Performance summary

Non-correlated subqueries and inline views are efficient — the optimizer evaluates them once or rewrites them into optimal join plans. Correlated subqueries can be slow on large tables but are often automatically rewritten by modern optimizers. Always run EXPLAIN before manually rewriting. Index the correlation columns to turn sequential inner scans into fast index lookups. Use NOT EXISTS instead of NOT IN whenever NULLs are possible. Prefer CTEs over deeply nested subqueries for anything beyond two levels of nesting.

Building subquery fluency

The best way to internalize subqueries is to write them against real data. The SQLab Hub SQL playground has 8 pre-loaded tables covering every pattern in this guide — run the examples, modify them, and work through the exercises above. For interview preparation, complement this guide with the Top 30 SQL Interview Questions. For the closest related concepts, read SQL JOINs Explained and the Window Functions guide. Together — subqueries, JOINs, and window functions — you can handle nearly any SQL challenge an interviewer or production codebase can throw at you.

Start practicing SQL for free

Run every query from this guide in the SQLab Hub interactive SQL playground — no signup, no download, instant results.

Open SQL Playground →View SQL Exercises →

Related Articles

Top 50 SQL Interview Questions & Answers (2026 Guide)25 min readSQL JOINs Explained: INNER, LEFT, RIGHT, FULL OUTER & Real Examples24 min readGROUP BY vs HAVING: What's the Difference?10 min read