Find the second highest salary from an Employee table.

The subquery approach is the classic answer, but it returns NULL when there is no second salary. DENSE_RANK handles ties and generalizes to the Nth highest by changing the rank value.

Select all duplicate emails from a Person table.

GROUP BY collapses rows per email; HAVING filters on the aggregate (you cannot use WHERE COUNT(*) because WHERE runs before aggregation).

Self-join: find all employees who earn more than their manager.

A self-join aliases the same table twice — one instance as the employee, one as the manager — joining on the manager_id foreign key that points back into the same table.

Compute revenue per category, and the % each category contributes to total revenue.

SUM(SUM(amount)) OVER () is a window over the grouped result — the inner SUM aggregates per category, the outer windowed SUM totals across all groups. This avoids a separate subquery for the grand total.

Pivot: count orders per status into columns (pending, shipped, delivered).

Conditional aggregation (SUM of CASE) is the portable way to pivot rows into columns when your dialect lacks a PIVOT operator. COUNT(CASE WHEN ... THEN 1 END) also works since COUNT ignores NULLs.

What is a CTE and how does a recursive CTE work? Walk an org hierarchy.

A CTE (WITH clause) is a named temporary result set that improves readability over nested subqueries. A recursive CTE has an anchor member plus a recursive member that references the CTE itself — ideal for hierarchies, graphs, and number/date series.

Correlated vs non-correlated subquery — what's the difference?

A correlated subquery depends on a column from the outer query, so conceptually it executes once per outer row (optimizers often rewrite it). A non-correlated subquery is self-contained and evaluated a single time.

Rank employees by salary within each department; give the top 3 per department.

Top-N-per-group is THE most common window-function interview question. PARTITION BY resets the ranking per department. Use ROW_NUMBER for a strict cut, RANK/DENSE_RANK to keep ties.

Difference between ROW_NUMBER, RANK, and DENSE_RANK.

ROW_NUMBER assigns unique sequential numbers even on ties. RANK gives ties the same rank but leaves gaps. DENSE_RANK gives ties the same rank with no gaps. Pick based on whether ties and gaps matter.

Count new users per month from a signups table.

DATE_TRUNC (Postgres) or DATE_FORMAT (MySQL) buckets timestamps into months. Truncating to the first of the month keeps results sortable as real dates rather than strings.

24 questions · 6 topics · with solutions

SQL Interview Questions & Answers

The SQL questions actually asked in Data Scientist, Data Analyst, and ML Engineer interviews — joins, aggregation, window functions, CTEs, cohort, retention, and funnel queries. Every question has a worked answer and a short explanation. Free, no signup.

24Questions

6Topics

WinWindow functions

CTERecursive + analytics

Basics & Filtering

Q1
Find the second highest salary from an Employee table.
Easy

AmazonMeta
▼ Show answer ▲ Hide answer
sql
```
SELECT MAX(salary) AS second_highest
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee);

-- More general (Nth highest) with a window function:
SELECT DISTINCT salary
FROM (
  SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM employee
) t
WHERE rnk = 2;
```
Why: The subquery approach is the classic answer, but it returns NULL when there is no second salary. DENSE_RANK handles ties and generalizes to the Nth highest by changing the rank value.
Q2
Select all duplicate emails from a Person table.
Easy

Meta
▼ Show answer ▲ Hide answer
sql
```
SELECT email
FROM person
GROUP BY email
HAVING COUNT(*) > 1;
```
Why: GROUP BY collapses rows per email; HAVING filters on the aggregate (you cannot use WHERE COUNT(*) because WHERE runs before aggregation).

Why use HAVING instead of WHERE, and in what order do SQL clauses execute?

Easy

Google

▼ Show answer ▲ Hide answer

sql

-- Logical execution order:
-- FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> DISTINCT -> ORDER BY -> LIMIT

SELECT department, COUNT(*) AS headcount
FROM employee
WHERE active = 1          -- filters rows BEFORE grouping
GROUP BY department
HAVING COUNT(*) > 10      -- filters groups AFTER aggregation
ORDER BY headcount DESC;

Why: WHERE filters individual rows before grouping and cannot reference aggregates. HAVING filters groups after aggregation. Knowing the logical order explains why aliases from SELECT aren't usable in WHERE.

Q4
Find customers who have never placed an order.
Easy

Amazon
▼ Show answer ▲ Hide answer
sql
```
SELECT c.name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL;

-- Equivalent with NOT EXISTS (often faster on large tables):
SELECT c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
```
Why: A LEFT JOIN keeps all customers; rows with no matching order have NULL order columns. NOT EXISTS expresses the same anti-join and often optimizes better because it short-circuits.

Joins

Explain the difference between INNER, LEFT, RIGHT, and FULL OUTER JOIN.

Easy

GoogleMicrosoft

▼ Show answer ▲ Hide answer

sql

-- INNER: only matching rows from both tables
-- LEFT:  all left rows + matched right rows (NULLs where no match)
-- RIGHT: all right rows + matched left rows
-- FULL:  all rows from both, NULLs where no match on either side

SELECT a.id, b.id
FROM a LEFT JOIN b ON a.key = b.key;

Why: Interviewers want to hear what happens to unmatched rows. LEFT JOIN + WHERE right.key IS NULL is the canonical anti-join pattern.

Q6
Self-join: find all employees who earn more than their manager.
Medium

AmazonApple
▼ Show answer ▲ Hide answer
sql
```
SELECT e.name AS employee
FROM employee e
JOIN employee m ON e.manager_id = m.id
WHERE e.salary > m.salary;
```
Why: A self-join aliases the same table twice — one instance as the employee, one as the manager — joining on the manager_id foreign key that points back into the same table.
Q7
What is a cross join and when would you intentionally use one?
Medium

Google
▼ Show answer ▲ Hide answer
sql
```
-- Generate every (date, product) combination to fill gaps:
SELECT d.day, p.product_id
FROM calendar d
CROSS JOIN products p
LEFT JOIN sales s
  ON s.day = d.day AND s.product_id = p.product_id;
```
Why: A cross join produces the Cartesian product (every row of A with every row of B). It's useful to build a complete grid (e.g., all date-product pairs) so you can LEFT JOIN actual data and surface missing combinations as zeros.
Q8
Find pairs of products frequently bought together in the same order.
Medium

Amazon
▼ Show answer ▲ Hide answer
sql
```
SELECT a.product_id AS p1, b.product_id AS p2, COUNT(*) AS times
FROM order_items a
JOIN order_items b
  ON a.order_id = b.order_id
 AND a.product_id < b.product_id   -- avoid (x,x) and duplicate (x,y)/(y,x)
GROUP BY a.product_id, b.product_id
ORDER BY times DESC
LIMIT 10;
```
Why: The a.product_id < b.product_id predicate is the key trick: it removes self-pairs and counts each unordered pair once. This is the basis of simple market-basket / co-purchase analysis.

Aggregation & GROUP BY

Q9
Compute revenue per category, and the % each category contributes to total revenue.
Medium

MicrosoftUber
▼ Show answer ▲ Hide answer
sql
```
SELECT category,
       SUM(amount) AS revenue,
       ROUND(100.0 * SUM(amount) / SUM(SUM(amount)) OVER (), 2) AS pct_of_total
FROM sales
GROUP BY category
ORDER BY revenue DESC;
```
Why: SUM(SUM(amount)) OVER () is a window over the grouped result — the inner SUM aggregates per category, the outer windowed SUM totals across all groups. This avoids a separate subquery for the grand total.
Q10
Pivot: count orders per status into columns (pending, shipped, delivered).
Medium

Amazon
▼ Show answer ▲ Hide answer
sql
```
SELECT
  SUM(CASE WHEN status = 'pending'   THEN 1 ELSE 0 END) AS pending,
  SUM(CASE WHEN status = 'shipped'   THEN 1 ELSE 0 END) AS shipped,
  SUM(CASE WHEN status = 'delivered' THEN 1 ELSE 0 END) AS delivered
FROM orders;
```
Why: Conditional aggregation (SUM of CASE) is the portable way to pivot rows into columns when your dialect lacks a PIVOT operator. COUNT(CASE WHEN ... THEN 1 END) also works since COUNT ignores NULLs.
Q11
Find the average number of orders per customer.
Medium

Meta
▼ Show answer ▲ Hide answer
sql
```
SELECT AVG(order_count) AS avg_orders_per_customer
FROM (
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY customer_id
) t;
```
Why: You must aggregate twice: first count orders per customer, then average those counts. A single AVG over the raw table would give meaningless results.

Q12

Find the median salary per department (without a built-in MEDIAN).

Hard

Google

▼ Show answer ▲ Hide answer

sql

SELECT department, AVG(salary) AS median_salary
FROM (
  SELECT department, salary,
         ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary)     AS rn_asc,
         ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn_desc,
         COUNT(*)    OVER (PARTITION BY department)                       AS cnt
  FROM employee
) t
WHERE rn_asc IN (rn_desc, rn_desc - 1, rn_desc + 1)
GROUP BY department;

Why: The median sits where the ascending and descending row numbers meet (within 1). Averaging the 1-2 middle rows handles both odd and even group sizes. Many dialects now offer PERCENTILE_CONT(0.5).

Subqueries & CTEs

Q13
What is a CTE and how does a recursive CTE work? Walk an org hierarchy.
Medium

MicrosoftGoogle
▼ Show answer ▲ Hide answer
sql
```
WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS lvl
  FROM employee
  WHERE manager_id IS NULL          -- anchor: the CEO
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.lvl + 1
  FROM employee e
  JOIN org o ON e.manager_id = o.id  -- recursive step
)
SELECT * FROM org ORDER BY lvl;
```
Why: A CTE (WITH clause) is a named temporary result set that improves readability over nested subqueries. A recursive CTE has an anchor member plus a recursive member that references the CTE itself — ideal for hierarchies, graphs, and number/date series.
Q14
Correlated vs non-correlated subquery — what's the difference?
Medium

Amazon
▼ Show answer ▲ Hide answer
sql
```
-- Correlated: inner query references the outer row (runs per outer row)
SELECT e.name
FROM employee e
WHERE e.salary > (
  SELECT AVG(salary) FROM employee WHERE department = e.department
);

-- Non-correlated: inner query is independent (runs once)
SELECT name FROM employee
WHERE salary > (SELECT AVG(salary) FROM employee);
```
Why: A correlated subquery depends on a column from the outer query, so conceptually it executes once per outer row (optimizers often rewrite it). A non-correlated subquery is self-contained and evaluated a single time.
Q15
Find departments where every employee earns above 50k.
Hard

Google
▼ Show answer ▲ Hide answer
sql
```
SELECT department
FROM employee
GROUP BY department
HAVING MIN(salary) > 50000;

-- Or with NOT EXISTS ("no employee below 50k"):
SELECT DISTINCT department FROM employee d
WHERE NOT EXISTS (
  SELECT 1 FROM employee e
  WHERE e.department = d.department AND e.salary <= 50000
);
```
Why: "All rows satisfy X" is equivalent to "the minimum satisfies X" — so HAVING MIN(salary) > 50000 is the cleanest answer. The NOT EXISTS form expresses the same universal quantifier as 'there is no counterexample'.

Window Functions

Q16
Rank employees by salary within each department; give the top 3 per department.
Medium

AmazonMetaUber
▼ Show answer ▲ Hide answer
sql
```
WITH ranked AS (
  SELECT name, department, salary,
         DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
  FROM employee
)
SELECT name, department, salary
FROM ranked
WHERE rnk <= 3;
```
Why: Top-N-per-group is THE most common window-function interview question. PARTITION BY resets the ranking per department. Use ROW_NUMBER for a strict cut, RANK/DENSE_RANK to keep ties.
Q17
Difference between ROW_NUMBER, RANK, and DENSE_RANK.
Medium

GoogleMicrosoft
▼ Show answer ▲ Hide answer
sql
```
SELECT score,
       ROW_NUMBER() OVER (ORDER BY score DESC) AS rn,   -- 1,2,3,4 (no ties)
       RANK()       OVER (ORDER BY score DESC) AS rnk,  -- 1,2,2,4 (gaps)
       DENSE_RANK() OVER (ORDER BY score DESC) AS drnk  -- 1,2,2,3 (no gaps)
FROM scores;
```
Why: ROW_NUMBER assigns unique sequential numbers even on ties. RANK gives ties the same rank but leaves gaps. DENSE_RANK gives ties the same rank with no gaps. Pick based on whether ties and gaps matter.
Q18
Compute a running (cumulative) total of daily revenue.
Medium

UberNetflix
▼ Show answer ▲ Hide answer
sql
```
SELECT day,
       revenue,
       SUM(revenue) OVER (ORDER BY day
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM daily_revenue
ORDER BY day;
```
Why: The frame clause ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW defines the cumulative window. Without an explicit frame, the default RANGE frame can behave unexpectedly with duplicate order keys.
Q19
For each user, find the time gap between consecutive logins.
Medium

MetaUber
▼ Show answer ▲ Hide answer
sql
```
SELECT user_id,
       login_at,
       LAG(login_at) OVER (PARTITION BY user_id ORDER BY login_at) AS prev_login,
       login_at - LAG(login_at) OVER (PARTITION BY user_id ORDER BY login_at) AS gap
FROM logins;
```
Why: LAG fetches the previous row's value within the partition; LEAD fetches the next. They are the standard tools for period-over-period deltas, session gaps, and churn detection without a self-join.
Q20
Compute a 7-day moving average of daily active users.
Hard

NetflixUber
▼ Show answer ▲ Hide answer
sql
```
SELECT day,
       dau,
       AVG(dau) OVER (ORDER BY day
         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS dau_7d_avg
FROM daily_active_users
ORDER BY day;
```
Why: A sliding frame of 6 preceding rows + the current row gives a trailing 7-day average. Moving averages smooth out weekday seasonality and are a staple of metrics/analytics rounds.

Dates, Funnels & Retention

Q21
Count new users per month from a signups table.
Medium

MetaAmazon
▼ Show answer ▲ Hide answer
sql
```
SELECT DATE_TRUNC('month', created_at) AS month,
       COUNT(*) AS new_users
FROM users
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

-- MySQL equivalent:
-- SELECT DATE_FORMAT(created_at, '%Y-%m-01') AS month, COUNT(*) ...
```
Why: DATE_TRUNC (Postgres) or DATE_FORMAT (MySQL) buckets timestamps into months. Truncating to the first of the month keeps results sortable as real dates rather than strings.

Q22

Compute Day-1 retention: of users who signed up, what % returned the next day?

Hard

MetaUberNetflix

▼ Show answer ▲ Hide answer

sql

WITH signups AS (
  SELECT user_id, MIN(DATE(event_time)) AS signup_day
  FROM events GROUP BY user_id
)
SELECT s.signup_day,
       COUNT(DISTINCT s.user_id) AS cohort_size,
       COUNT(DISTINCT e.user_id) AS retained_d1,
       ROUND(100.0 * COUNT(DISTINCT e.user_id) / COUNT(DISTINCT s.user_id), 1) AS d1_pct
FROM signups s
LEFT JOIN events e
  ON e.user_id = s.user_id
 AND DATE(e.event_time) = s.signup_day + INTERVAL '1 day'
GROUP BY s.signup_day
ORDER BY s.signup_day;

Why: Define each user's cohort (signup day), then LEFT JOIN events exactly one day later. COUNT(DISTINCT) on the retained users over the cohort size gives the retention rate. This cohort pattern generalizes to D7/D30.

Q23

Funnel: of users who viewed a product, how many added to cart, then purchased?

Hard

AmazonUber

▼ Show answer ▲ Hide answer

sql

WITH steps AS (
  SELECT user_id,
         MAX(CASE WHEN event = 'view'     THEN 1 ELSE 0 END) AS viewed,
         MAX(CASE WHEN event = 'add_cart' THEN 1 ELSE 0 END) AS added,
         MAX(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS bought
  FROM events
  GROUP BY user_id
)
SELECT SUM(viewed) AS viewed,
       SUM(CASE WHEN viewed = 1 AND added = 1 THEN 1 ELSE 0 END) AS added,
       SUM(CASE WHEN viewed = 1 AND added = 1 AND bought = 1 THEN 1 ELSE 0 END) AS bought
FROM steps;

Why: Collapse each user's events into step flags, then count users who completed each successive stage. Conditioning each stage on the previous one (viewed AND added) gives true sequential funnel conversion.

Q24
Find the longest streak of consecutive days each user was active (gaps-and-islands).
Hard

GoogleMeta
▼ Show answer ▲ Hide answer
sql
```
WITH d AS (
  SELECT DISTINCT user_id, DATE(event_time) AS day FROM events
),
g AS (
  SELECT user_id, day,
         day - (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day))::int AS grp
  FROM d
)
SELECT user_id, MAX(streak) AS longest_streak
FROM (
  SELECT user_id, grp, COUNT(*) AS streak
  FROM g GROUP BY user_id, grp
) t
GROUP BY user_id;
```
Why: The classic gaps-and-islands trick: subtract a per-user ROW_NUMBER from the date. Consecutive days share the same (date - rownum) value, forming an 'island' you can group and count. This pattern appears constantly in analytics interviews.

SQL Interview Questions & Answers

Basics & Filtering

Find the second highest salary from an Employee table.

Select all duplicate emails from a Person table.

Why use HAVING instead of WHERE, and in what order do SQL clauses execute?

Find customers who have never placed an order.

Joins

Explain the difference between INNER, LEFT, RIGHT, and FULL OUTER JOIN.

Self-join: find all employees who earn more than their manager.

What is a cross join and when would you intentionally use one?

Find pairs of products frequently bought together in the same order.

Aggregation & GROUP BY

Compute revenue per category, and the % each category contributes to total revenue.

Pivot: count orders per status into columns (pending, shipped, delivered).

Find the average number of orders per customer.

Find the median salary per department (without a built-in MEDIAN).

Subqueries & CTEs

What is a CTE and how does a recursive CTE work? Walk an org hierarchy.

Correlated vs non-correlated subquery — what's the difference?

Find departments where every employee earns above 50k.

Window Functions

Rank employees by salary within each department; give the top 3 per department.

Difference between ROW_NUMBER, RANK, and DENSE_RANK.

Compute a running (cumulative) total of daily revenue.

For each user, find the time gap between consecutive logins.

Compute a 7-day moving average of daily active users.

Dates, Funnels & Retention

Count new users per month from a signups table.

Compute Day-1 retention: of users who signed up, what % returned the next day?

Funnel: of users who viewed a product, how many added to cart, then purchased?

Find the longest streak of consecutive days each user was active (gaps-and-islands).

Keep prepping