SQL Interview Questions & Answers
The SQL questions actually asked in Data Scientist, Data Analyst, and ML Engineer interviews — joins, aggregation, window functions, CTEs, cohort, retention, and funnel queries. Every question has a worked answer and a short explanation. Free, no signup.
Basics & Filtering
-
Q1
Find the second highest salary from an Employee table.
Easy▼ Show answer ▲ Hide answer
sqlSELECT MAX(salary) AS second_highest FROM employee WHERE salary < (SELECT MAX(salary) FROM employee); -- More general (Nth highest) with a window function: SELECT DISTINCT salary FROM ( SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employee ) t WHERE rnk = 2;Why: The subquery approach is the classic answer, but it returns NULL when there is no second salary. DENSE_RANK handles ties and generalizes to the Nth highest by changing the rank value.
-
Q2
Select all duplicate emails from a Person table.
Easy▼ Show answer ▲ Hide answer
sqlSELECT email FROM person GROUP BY email HAVING COUNT(*) > 1;Why: GROUP BY collapses rows per email; HAVING filters on the aggregate (you cannot use WHERE COUNT(*) because WHERE runs before aggregation).
-
Q3
Why use HAVING instead of WHERE, and in what order do SQL clauses execute?
Easy▼ Show answer ▲ Hide answer
sql-- Logical execution order: -- FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> DISTINCT -> ORDER BY -> LIMIT SELECT department, COUNT(*) AS headcount FROM employee WHERE active = 1 -- filters rows BEFORE grouping GROUP BY department HAVING COUNT(*) > 10 -- filters groups AFTER aggregation ORDER BY headcount DESC;Why: WHERE filters individual rows before grouping and cannot reference aggregates. HAVING filters groups after aggregation. Knowing the logical order explains why aliases from SELECT aren't usable in WHERE.
-
Q4
Find customers who have never placed an order.
Easy▼ Show answer ▲ Hide answer
sqlSELECT c.name FROM customers c LEFT JOIN orders o ON o.customer_id = c.id WHERE o.id IS NULL; -- Equivalent with NOT EXISTS (often faster on large tables): SELECT c.name FROM customers c WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.id );Why: A LEFT JOIN keeps all customers; rows with no matching order have NULL order columns. NOT EXISTS expresses the same anti-join and often optimizes better because it short-circuits.
Joins
-
Q5
Explain the difference between INNER, LEFT, RIGHT, and FULL OUTER JOIN.
Easy▼ Show answer ▲ Hide answer
sql-- INNER: only matching rows from both tables -- LEFT: all left rows + matched right rows (NULLs where no match) -- RIGHT: all right rows + matched left rows -- FULL: all rows from both, NULLs where no match on either side SELECT a.id, b.id FROM a LEFT JOIN b ON a.key = b.key;Why: Interviewers want to hear what happens to unmatched rows. LEFT JOIN + WHERE right.key IS NULL is the canonical anti-join pattern.
-
Q6
Self-join: find all employees who earn more than their manager.
Medium▼ Show answer ▲ Hide answer
sqlSELECT e.name AS employee FROM employee e JOIN employee m ON e.manager_id = m.id WHERE e.salary > m.salary;Why: A self-join aliases the same table twice — one instance as the employee, one as the manager — joining on the manager_id foreign key that points back into the same table.
-
Q7
What is a cross join and when would you intentionally use one?
Medium▼ Show answer ▲ Hide answer
sql-- Generate every (date, product) combination to fill gaps: SELECT d.day, p.product_id FROM calendar d CROSS JOIN products p LEFT JOIN sales s ON s.day = d.day AND s.product_id = p.product_id;Why: A cross join produces the Cartesian product (every row of A with every row of B). It's useful to build a complete grid (e.g., all date-product pairs) so you can LEFT JOIN actual data and surface missing combinations as zeros.
-
Q8
Find pairs of products frequently bought together in the same order.
Medium▼ Show answer ▲ Hide answer
sqlSELECT a.product_id AS p1, b.product_id AS p2, COUNT(*) AS times FROM order_items a JOIN order_items b ON a.order_id = b.order_id AND a.product_id < b.product_id -- avoid (x,x) and duplicate (x,y)/(y,x) GROUP BY a.product_id, b.product_id ORDER BY times DESC LIMIT 10;Why: The a.product_id < b.product_id predicate is the key trick: it removes self-pairs and counts each unordered pair once. This is the basis of simple market-basket / co-purchase analysis.
Aggregation & GROUP BY
-
Q9
Compute revenue per category, and the % each category contributes to total revenue.
Medium▼ Show answer ▲ Hide answer
sqlSELECT category, SUM(amount) AS revenue, ROUND(100.0 * SUM(amount) / SUM(SUM(amount)) OVER (), 2) AS pct_of_total FROM sales GROUP BY category ORDER BY revenue DESC;Why: SUM(SUM(amount)) OVER () is a window over the grouped result — the inner SUM aggregates per category, the outer windowed SUM totals across all groups. This avoids a separate subquery for the grand total.
-
Q10
Pivot: count orders per status into columns (pending, shipped, delivered).
Medium▼ Show answer ▲ Hide answer
sqlSELECT SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) AS pending, SUM(CASE WHEN status = 'shipped' THEN 1 ELSE 0 END) AS shipped, SUM(CASE WHEN status = 'delivered' THEN 1 ELSE 0 END) AS delivered FROM orders;Why: Conditional aggregation (SUM of CASE) is the portable way to pivot rows into columns when your dialect lacks a PIVOT operator. COUNT(CASE WHEN ... THEN 1 END) also works since COUNT ignores NULLs.
-
Q11
Find the average number of orders per customer.
Medium▼ Show answer ▲ Hide answer
sqlSELECT AVG(order_count) AS avg_orders_per_customer FROM ( SELECT customer_id, COUNT(*) AS order_count FROM orders GROUP BY customer_id ) t;Why: You must aggregate twice: first count orders per customer, then average those counts. A single AVG over the raw table would give meaningless results.
-
Q12
Find the median salary per department (without a built-in MEDIAN).
Hard▼ Show answer ▲ Hide answer
sqlSELECT department, AVG(salary) AS median_salary FROM ( SELECT department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary) AS rn_asc, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn_desc, COUNT(*) OVER (PARTITION BY department) AS cnt FROM employee ) t WHERE rn_asc IN (rn_desc, rn_desc - 1, rn_desc + 1) GROUP BY department;Why: The median sits where the ascending and descending row numbers meet (within 1). Averaging the 1-2 middle rows handles both odd and even group sizes. Many dialects now offer PERCENTILE_CONT(0.5).
Subqueries & CTEs
-
Q13
What is a CTE and how does a recursive CTE work? Walk an org hierarchy.
Medium▼ Show answer ▲ Hide answer
sqlWITH RECURSIVE org AS ( SELECT id, name, manager_id, 1 AS lvl FROM employee WHERE manager_id IS NULL -- anchor: the CEO UNION ALL SELECT e.id, e.name, e.manager_id, o.lvl + 1 FROM employee e JOIN org o ON e.manager_id = o.id -- recursive step ) SELECT * FROM org ORDER BY lvl;Why: A CTE (WITH clause) is a named temporary result set that improves readability over nested subqueries. A recursive CTE has an anchor member plus a recursive member that references the CTE itself — ideal for hierarchies, graphs, and number/date series.
-
Q14
Correlated vs non-correlated subquery — what's the difference?
Medium▼ Show answer ▲ Hide answer
sql-- Correlated: inner query references the outer row (runs per outer row) SELECT e.name FROM employee e WHERE e.salary > ( SELECT AVG(salary) FROM employee WHERE department = e.department ); -- Non-correlated: inner query is independent (runs once) SELECT name FROM employee WHERE salary > (SELECT AVG(salary) FROM employee);Why: A correlated subquery depends on a column from the outer query, so conceptually it executes once per outer row (optimizers often rewrite it). A non-correlated subquery is self-contained and evaluated a single time.
-
Q15
Find departments where every employee earns above 50k.
Hard▼ Show answer ▲ Hide answer
sqlSELECT department FROM employee GROUP BY department HAVING MIN(salary) > 50000; -- Or with NOT EXISTS ("no employee below 50k"): SELECT DISTINCT department FROM employee d WHERE NOT EXISTS ( SELECT 1 FROM employee e WHERE e.department = d.department AND e.salary <= 50000 );Why: "All rows satisfy X" is equivalent to "the minimum satisfies X" — so HAVING MIN(salary) > 50000 is the cleanest answer. The NOT EXISTS form expresses the same universal quantifier as 'there is no counterexample'.
Window Functions
-
Q16
Rank employees by salary within each department; give the top 3 per department.
Medium▼ Show answer ▲ Hide answer
sqlWITH ranked AS ( SELECT name, department, salary, DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk FROM employee ) SELECT name, department, salary FROM ranked WHERE rnk <= 3;Why: Top-N-per-group is THE most common window-function interview question. PARTITION BY resets the ranking per department. Use ROW_NUMBER for a strict cut, RANK/DENSE_RANK to keep ties.
-
Q17
Difference between ROW_NUMBER, RANK, and DENSE_RANK.
Medium▼ Show answer ▲ Hide answer
sqlSELECT score, ROW_NUMBER() OVER (ORDER BY score DESC) AS rn, -- 1,2,3,4 (no ties) RANK() OVER (ORDER BY score DESC) AS rnk, -- 1,2,2,4 (gaps) DENSE_RANK() OVER (ORDER BY score DESC) AS drnk -- 1,2,2,3 (no gaps) FROM scores;Why: ROW_NUMBER assigns unique sequential numbers even on ties. RANK gives ties the same rank but leaves gaps. DENSE_RANK gives ties the same rank with no gaps. Pick based on whether ties and gaps matter.
-
Q18
Compute a running (cumulative) total of daily revenue.
Medium▼ Show answer ▲ Hide answer
sqlSELECT day, revenue, SUM(revenue) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total FROM daily_revenue ORDER BY day;Why: The frame clause ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW defines the cumulative window. Without an explicit frame, the default RANGE frame can behave unexpectedly with duplicate order keys.
-
Q19
For each user, find the time gap between consecutive logins.
Medium▼ Show answer ▲ Hide answer
sqlSELECT user_id, login_at, LAG(login_at) OVER (PARTITION BY user_id ORDER BY login_at) AS prev_login, login_at - LAG(login_at) OVER (PARTITION BY user_id ORDER BY login_at) AS gap FROM logins;Why: LAG fetches the previous row's value within the partition; LEAD fetches the next. They are the standard tools for period-over-period deltas, session gaps, and churn detection without a self-join.
-
Q20
Compute a 7-day moving average of daily active users.
Hard▼ Show answer ▲ Hide answer
sqlSELECT day, dau, AVG(dau) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS dau_7d_avg FROM daily_active_users ORDER BY day;Why: A sliding frame of 6 preceding rows + the current row gives a trailing 7-day average. Moving averages smooth out weekday seasonality and are a staple of metrics/analytics rounds.
Dates, Funnels & Retention
-
Q21
Count new users per month from a signups table.
Medium▼ Show answer ▲ Hide answer
sqlSELECT DATE_TRUNC('month', created_at) AS month, COUNT(*) AS new_users FROM users GROUP BY DATE_TRUNC('month', created_at) ORDER BY month; -- MySQL equivalent: -- SELECT DATE_FORMAT(created_at, '%Y-%m-01') AS month, COUNT(*) ...Why: DATE_TRUNC (Postgres) or DATE_FORMAT (MySQL) buckets timestamps into months. Truncating to the first of the month keeps results sortable as real dates rather than strings.
-
Q22
Compute Day-1 retention: of users who signed up, what % returned the next day?
Hard▼ Show answer ▲ Hide answer
sqlWITH signups AS ( SELECT user_id, MIN(DATE(event_time)) AS signup_day FROM events GROUP BY user_id ) SELECT s.signup_day, COUNT(DISTINCT s.user_id) AS cohort_size, COUNT(DISTINCT e.user_id) AS retained_d1, ROUND(100.0 * COUNT(DISTINCT e.user_id) / COUNT(DISTINCT s.user_id), 1) AS d1_pct FROM signups s LEFT JOIN events e ON e.user_id = s.user_id AND DATE(e.event_time) = s.signup_day + INTERVAL '1 day' GROUP BY s.signup_day ORDER BY s.signup_day;Why: Define each user's cohort (signup day), then LEFT JOIN events exactly one day later. COUNT(DISTINCT) on the retained users over the cohort size gives the retention rate. This cohort pattern generalizes to D7/D30.
-
Q23
Funnel: of users who viewed a product, how many added to cart, then purchased?
Hard▼ Show answer ▲ Hide answer
sqlWITH steps AS ( SELECT user_id, MAX(CASE WHEN event = 'view' THEN 1 ELSE 0 END) AS viewed, MAX(CASE WHEN event = 'add_cart' THEN 1 ELSE 0 END) AS added, MAX(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS bought FROM events GROUP BY user_id ) SELECT SUM(viewed) AS viewed, SUM(CASE WHEN viewed = 1 AND added = 1 THEN 1 ELSE 0 END) AS added, SUM(CASE WHEN viewed = 1 AND added = 1 AND bought = 1 THEN 1 ELSE 0 END) AS bought FROM steps;Why: Collapse each user's events into step flags, then count users who completed each successive stage. Conditioning each stage on the previous one (viewed AND added) gives true sequential funnel conversion.
-
Q24
Find the longest streak of consecutive days each user was active (gaps-and-islands).
Hard▼ Show answer ▲ Hide answer
sqlWITH d AS ( SELECT DISTINCT user_id, DATE(event_time) AS day FROM events ), g AS ( SELECT user_id, day, day - (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day))::int AS grp FROM d ) SELECT user_id, MAX(streak) AS longest_streak FROM ( SELECT user_id, grp, COUNT(*) AS streak FROM g GROUP BY user_id, grp ) t GROUP BY user_id;Why: The classic gaps-and-islands trick: subtract a per-user ROW_NUMBER from the date. Consecutive days share the same (date - rownum) value, forming an 'island' you can group and count. This pattern appears constantly in analytics interviews.