GROUP BY and HAVING are among the most misunderstood clauses in SQL โ and mixing them up is one of the most common errors beginners make. If you've ever tried filtering a COUNT(*) with WHERE and got a cryptic error, this guide explains exactly why that happens and how to fix it. We cover the difference between GROUP BY and HAVING in depth, with a full comparison, the SQL execution order, real-world examples, and the most common mistakes. Whether you're learning SQL for the first time or preparing for an interview, this is the only guide you'll need. Last updated: May 2026.
1. GROUP BY โ how it groups and aggregates rows 2. HAVING โ how it filters aggregated results (and why WHERE cannot) 3. Key Differences โ GROUP BY vs HAVING side by side 4. SQL Execution Order โ the full query pipeline explained 5. Real-world Examples โ cities, revenue, duplicate detection 6. Common Mistakes โ the errors beginners make and how to fix them 7. FAQs โ quick answers to the most-searched questions
GROUP BY collapses multiple rows into a single summary row per unique group. Think of it as sorting your data into labelled buckets โ one bucket per unique value in the grouping column โ then computing a single number for each bucket. Every column in your SELECT must either appear in GROUP BY or be wrapped in an aggregate function (COUNT, SUM, AVG, MIN, MAX). Selecting a column that is neither grouped nor aggregated is an error in most SQL databases โ and silently wrong in MySQL.
SELECT department, COUNT(*) AS headcount, ROUND(AVG(salary), 0) AS avg_salary FROM employees GROUP BY department ORDER BY avg_salary DESC; -- Output: -- department | headcount | avg_salary -- Engineering | 24 | 95000 -- Marketing | 12 | 72000 -- Sales | 18 | 68000 -- 'Engineering' appears once, representing all 24 engineers. -- COUNT(*) tells you how many employees are in each department. -- AVG(salary) collapses every salary into a single average.
SELECT user_id, COUNT(*) AS total_orders, ROUND(SUM(total), 2) AS lifetime_value FROM orders GROUP BY user_id ORDER BY lifetime_value DESC LIMIT 10; -- Each user_id becomes one row โ summarising all their orders -- into a single count (how many?) and a dollar total (how much?). -- This is the foundation of customer lifetime value analysis. -- See also: SQL JOINs Explained to learn how to join users to orders.
HAVING filters groups AFTER GROUP BY has run. It is the WHERE clause for aggregated results. Here is why HAVING had to be invented: WHERE runs before GROUP BY, so at the point WHERE executes, there is no COUNT(*) or SUM() yet โ aggregation has not happened. You cannot use WHERE to say "only show departments with more than 10 employees" because WHERE does not know what COUNT(*) is at that stage. HAVING solves this. It runs after GROUP BY, so it has full access to every computed aggregate value.
-- WHERE filters individual rows BEFORE grouping: SELECT department, COUNT(*) AS headcount FROM employees WHERE salary > 50000 -- keeps only high-earning employees GROUP BY department; -- HAVING filters groups AFTER grouping: SELECT department, COUNT(*) AS headcount FROM employees GROUP BY department HAVING COUNT(*) > 10; -- keeps only large departments -- You can use BOTH in the same query: SELECT department, COUNT(*) AS headcount FROM employees WHERE salary > 50000 -- filter rows first GROUP BY department HAVING COUNT(*) > 5; -- then filter the resulting groups
SELECT category, COUNT(*) AS product_count, ROUND(AVG(price), 2) AS avg_price FROM products GROUP BY category HAVING AVG(price) > 100 ORDER BY avg_price DESC; -- Only categories where the average price exceeds $100 are returned. -- This is impossible with WHERE โ at the time WHERE runs, -- AVG(price) per category does not yet exist.
WHERE = filter rows (runs before GROUP BY) HAVING = filter groups (runs after GROUP BY) Memory trick: HAVING is WHERE for aggregates.
Feature | GROUP BY | HAVING --------------------|------------------------------------|----------------------------------------- Purpose | Groups rows into summary rows | Filters groups after aggregation Works on | Raw column values | Aggregate results (COUNT, SUM, AVGโฆ) Execution stage | After WHERE, before HAVING | After GROUP BY Common use | COUNT(*) per department | Only departments where COUNT(*) > 10 Can reference | Any column in the table | Aggregate functions only Error if misused | Non-grouped column in SELECT | Aggregate function inside WHERE clause Analogy | Sort rows into labelled buckets | Throw away buckets that are too small
Understanding the SQL execution order is the single fastest way to eliminate WHERE vs HAVING confusion forever. SQL always runs in this fixed order: 1. FROM โ load the table(s), apply JOINs 2. WHERE โ filter individual rows (no aggregates available yet) 3. GROUP BY โ group the remaining rows into buckets 4. HAVING โ filter the groups (aggregates are now computed) 5. SELECT โ compute the final output columns 6. ORDER BY โ sort the result set 7. LIMIT โ return the first N rows Key takeaways: โข WHERE runs BEFORE aggregation โ cannot reference COUNT(*), SUM(), etc. โข HAVING runs AFTER aggregation โ can only filter on aggregated values โข SELECT aliases (like "AS headcount") are not yet defined when WHERE and HAVING execute
SELECT department, COUNT(*) AS headcount FROM employees -- Step 1: load the table WHERE salary > 40000 -- Step 2: keep only employees earning > $40k GROUP BY department -- Step 3: group into one row per department HAVING COUNT(*) > 3 -- Step 4: only keep departments with 4+ employees ORDER BY headcount DESC -- Step 5: sort largest departments first LIMIT 5; -- Step 6: return the top 5 -- WHERE + GROUP BY + HAVING together is extremely common in -- real-world reporting and analytics queries.
You are a marketing analyst preparing a city-level campaign. Budget is limited, so you only want to target cities with at least 50 registered customers โ smaller markets are not worth the spend. SELECT city, COUNT(*) AS customer_count FROM users GROUP BY city HAVING COUNT(*) >= 50 ORDER BY customer_count DESC; -- city | customer_count -- New York | 312 -- Los Angeles | 218 -- Chicago | 143 -- Without HAVING you'd get every city including those with one customer. -- HAVING filters the result down to only the markets large enough to target.
Your sales director wants a weekly report: which product categories generated more than $10,000 from completed orders?
SELECT p.category,
COUNT(oi.id) AS units_sold,
ROUND(SUM(oi.quantity * oi.unit_price), 2) AS revenue
FROM order_items oi
JOIN products p ON p.id = oi.product_id
JOIN orders o ON o.id = oi.order_id
WHERE o.status = 'completed' -- filter to completed orders only
GROUP BY p.category
HAVING SUM(oi.quantity * oi.unit_price) > 10000
ORDER BY revenue DESC;
-- WHERE keeps only completed orders (reduces rows before aggregation).
-- GROUP BY sums up units and revenue per category.
-- HAVING drops any category that did not clear the $10,000 threshold.
-- See the SQL JOINs Explained guide for a deep dive on multi-table JOINs.SELECT email, COUNT(*) AS occurrences FROM users GROUP BY email HAVING COUNT(*) > 1 ORDER BY occurrences DESC; -- Any email that appears more than once is a duplicate. -- The pattern GROUP BY column + HAVING COUNT(*) > 1 is one -- of the most frequently asked SQL interview questions. -- Try it in the SQLab Hub playground on the users table.
-- WRONG โ โ throws an error SELECT department, COUNT(*) AS headcount FROM employees WHERE COUNT(*) > 10 -- Error: aggregate functions not allowed in WHERE GROUP BY department; -- CORRECT โ SELECT department, COUNT(*) AS headcount FROM employees GROUP BY department HAVING COUNT(*) > 10; -- Why it fails: WHERE executes before GROUP BY, so COUNT(*) has -- not been computed yet. There is nothing to compare against.
-- WRONG โ โ ambiguous: which employee's first_name do you want? SELECT department, first_name, COUNT(*) FROM employees GROUP BY department; -- Error in PostgreSQL, SQLite, SQL Server. -- MySQL may silently pick a random value โ never rely on this. -- CORRECT โ SELECT department, COUNT(*) FROM employees GROUP BY department; -- Why it fails: when grouped by department, each result row -- represents many employees. SQL has no rule for which -- employee's first_name to pick, so it throws an error.
-- MAY FAIL depending on your database โ SELECT department, COUNT(*) AS headcount FROM employees GROUP BY department HAVING headcount > 10; -- alias 'headcount' not always recognised -- SAFE โ works in all SQL dialects โ SELECT department, COUNT(*) AS headcount FROM employees GROUP BY department HAVING COUNT(*) > 10; -- PostgreSQL and MySQL 8+ accept aliases in HAVING. -- SQLite and SQL Server do not. Repeating the aggregate is -- portable across every SQL database.
Yes, technically โ but it is rarely useful in practice. Without GROUP BY, the entire table is treated as a single group. SELECT COUNT(*) FROM users HAVING COUNT(*) > 1000; This returns the row count only if the table has more than 1,000 rows, and returns nothing otherwise. In almost every real-world query, HAVING is paired with GROUP BY.
WHERE filters individual rows before grouping and runs before GROUP BY. HAVING filters groups after grouping and runs after GROUP BY. Use WHERE when: you want to exclude rows before aggregation โ for example, WHERE status = 'active' or WHERE date > '2024-01-01'. Use HAVING when: you need to filter based on an aggregate result โ for example, HAVING COUNT(*) > 5 or HAVING SUM(revenue) > 10000. You can use both in the same query: WHERE reduces the input rows first, GROUP BY groups them, then HAVING filters the resulting groups.
WHERE is almost always faster than HAVING for filtering non-aggregate conditions. Reason: WHERE runs before GROUP BY, so it reduces the number of rows the database must aggregate. Fewer input rows means faster grouping and faster overall query. Practical rule: always push filters to WHERE unless the condition requires an aggregate. For example, use WHERE city = 'New York' instead of HAVING city = 'New York' โ both return the same rows, but WHERE is significantly faster because it filters before aggregation.
Use HAVING whenever your filter condition involves an aggregate function. The most common patterns: 1. Filter by count: HAVING COUNT(*) > 10 2. Filter by sum: HAVING SUM(revenue) > 1000 3. Filter by average: HAVING AVG(rating) >= 4.0 4. Find duplicates: HAVING COUNT(*) > 1 5. Minimum threshold: HAVING MIN(price) > 50 If the condition does not involve COUNT, SUM, AVG, MIN, or MAX โ use WHERE. It is faster and semantically clearer.
Yes โ GROUP BY without aggregates returns one row per unique value, exactly like SELECT DISTINCT: SELECT department FROM employees GROUP BY department; -- Same result as: SELECT DISTINCT department FROM employees; In practice this is rare. GROUP BY is almost always paired with at least one aggregate function. If you only need unique values, DISTINCT is cleaner and often faster.
Run every query from this guide in the SQLab Hub interactive SQL playground โ no signup, no download, instant results.