24 questions · 6 topics · with solutions

SQL Interview Questions & Answers

The SQL questions actually asked in Data Scientist, Data Analyst, and ML Engineer interviews — joins, aggregation, window functions, CTEs, cohort, retention, and funnel queries. Every question has a worked answer and a short explanation. Free, no signup.

24Questions
6Topics
WinWindow functions
CTERecursive + analytics

Basics & Filtering

  1. Q1

    Find the second highest salary from an Employee table.

    Easy
    AmazonMeta
    ▼ Show answer ▲ Hide answer
    sql
    SELECT MAX(salary) AS second_highest
    FROM employee
    WHERE salary < (SELECT MAX(salary) FROM employee);
    
    -- More general (Nth highest) with a window function:
    SELECT DISTINCT salary
    FROM (
      SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
      FROM employee
    ) t
    WHERE rnk = 2;

    Why: The subquery approach is the classic answer, but it returns NULL when there is no second salary. DENSE_RANK handles ties and generalizes to the Nth highest by changing the rank value.

  2. Q2

    Select all duplicate emails from a Person table.

    Easy
    Meta
    ▼ Show answer ▲ Hide answer
    sql
    SELECT email
    FROM person
    GROUP BY email
    HAVING COUNT(*) > 1;

    Why: GROUP BY collapses rows per email; HAVING filters on the aggregate (you cannot use WHERE COUNT(*) because WHERE runs before aggregation).

  3. Q3

    Why use HAVING instead of WHERE, and in what order do SQL clauses execute?

    Easy
    Google
    ▼ Show answer ▲ Hide answer
    sql
    -- Logical execution order:
    -- FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> DISTINCT -> ORDER BY -> LIMIT
    
    SELECT department, COUNT(*) AS headcount
    FROM employee
    WHERE active = 1          -- filters rows BEFORE grouping
    GROUP BY department
    HAVING COUNT(*) > 10      -- filters groups AFTER aggregation
    ORDER BY headcount DESC;

    Why: WHERE filters individual rows before grouping and cannot reference aggregates. HAVING filters groups after aggregation. Knowing the logical order explains why aliases from SELECT aren't usable in WHERE.

  4. Q4

    Find customers who have never placed an order.

    Easy
    Amazon
    ▼ Show answer ▲ Hide answer
    sql
    SELECT c.name
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.id IS NULL;
    
    -- Equivalent with NOT EXISTS (often faster on large tables):
    SELECT c.name
    FROM customers c
    WHERE NOT EXISTS (
      SELECT 1 FROM orders o WHERE o.customer_id = c.id
    );

    Why: A LEFT JOIN keeps all customers; rows with no matching order have NULL order columns. NOT EXISTS expresses the same anti-join and often optimizes better because it short-circuits.

Joins

  1. Q5

    Explain the difference between INNER, LEFT, RIGHT, and FULL OUTER JOIN.

    Easy
    GoogleMicrosoft
    ▼ Show answer ▲ Hide answer
    sql
    -- INNER: only matching rows from both tables
    -- LEFT:  all left rows + matched right rows (NULLs where no match)
    -- RIGHT: all right rows + matched left rows
    -- FULL:  all rows from both, NULLs where no match on either side
    
    SELECT a.id, b.id
    FROM a LEFT JOIN b ON a.key = b.key;

    Why: Interviewers want to hear what happens to unmatched rows. LEFT JOIN + WHERE right.key IS NULL is the canonical anti-join pattern.

  2. Q6

    Self-join: find all employees who earn more than their manager.

    Medium
    AmazonApple
    ▼ Show answer ▲ Hide answer
    sql
    SELECT e.name AS employee
    FROM employee e
    JOIN employee m ON e.manager_id = m.id
    WHERE e.salary > m.salary;

    Why: A self-join aliases the same table twice — one instance as the employee, one as the manager — joining on the manager_id foreign key that points back into the same table.

  3. Q7

    What is a cross join and when would you intentionally use one?

    Medium
    Google
    ▼ Show answer ▲ Hide answer
    sql
    -- Generate every (date, product) combination to fill gaps:
    SELECT d.day, p.product_id
    FROM calendar d
    CROSS JOIN products p
    LEFT JOIN sales s
      ON s.day = d.day AND s.product_id = p.product_id;

    Why: A cross join produces the Cartesian product (every row of A with every row of B). It's useful to build a complete grid (e.g., all date-product pairs) so you can LEFT JOIN actual data and surface missing combinations as zeros.

  4. Q8

    Find pairs of products frequently bought together in the same order.

    Medium
    Amazon
    ▼ Show answer ▲ Hide answer
    sql
    SELECT a.product_id AS p1, b.product_id AS p2, COUNT(*) AS times
    FROM order_items a
    JOIN order_items b
      ON a.order_id = b.order_id
     AND a.product_id < b.product_id   -- avoid (x,x) and duplicate (x,y)/(y,x)
    GROUP BY a.product_id, b.product_id
    ORDER BY times DESC
    LIMIT 10;

    Why: The a.product_id < b.product_id predicate is the key trick: it removes self-pairs and counts each unordered pair once. This is the basis of simple market-basket / co-purchase analysis.

Aggregation & GROUP BY

  1. Q9

    Compute revenue per category, and the % each category contributes to total revenue.

    Medium
    MicrosoftUber
    ▼ Show answer ▲ Hide answer
    sql
    SELECT category,
           SUM(amount) AS revenue,
           ROUND(100.0 * SUM(amount) / SUM(SUM(amount)) OVER (), 2) AS pct_of_total
    FROM sales
    GROUP BY category
    ORDER BY revenue DESC;

    Why: SUM(SUM(amount)) OVER () is a window over the grouped result — the inner SUM aggregates per category, the outer windowed SUM totals across all groups. This avoids a separate subquery for the grand total.

  2. Q10

    Pivot: count orders per status into columns (pending, shipped, delivered).

    Medium
    Amazon
    ▼ Show answer ▲ Hide answer
    sql
    SELECT
      SUM(CASE WHEN status = 'pending'   THEN 1 ELSE 0 END) AS pending,
      SUM(CASE WHEN status = 'shipped'   THEN 1 ELSE 0 END) AS shipped,
      SUM(CASE WHEN status = 'delivered' THEN 1 ELSE 0 END) AS delivered
    FROM orders;

    Why: Conditional aggregation (SUM of CASE) is the portable way to pivot rows into columns when your dialect lacks a PIVOT operator. COUNT(CASE WHEN ... THEN 1 END) also works since COUNT ignores NULLs.

  3. Q11

    Find the average number of orders per customer.

    Medium
    Meta
    ▼ Show answer ▲ Hide answer
    sql
    SELECT AVG(order_count) AS avg_orders_per_customer
    FROM (
      SELECT customer_id, COUNT(*) AS order_count
      FROM orders
      GROUP BY customer_id
    ) t;

    Why: You must aggregate twice: first count orders per customer, then average those counts. A single AVG over the raw table would give meaningless results.

  4. Q12

    Find the median salary per department (without a built-in MEDIAN).

    Hard
    Google
    ▼ Show answer ▲ Hide answer
    sql
    SELECT department, AVG(salary) AS median_salary
    FROM (
      SELECT department, salary,
             ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary)     AS rn_asc,
             ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn_desc,
             COUNT(*)    OVER (PARTITION BY department)                       AS cnt
      FROM employee
    ) t
    WHERE rn_asc IN (rn_desc, rn_desc - 1, rn_desc + 1)
    GROUP BY department;

    Why: The median sits where the ascending and descending row numbers meet (within 1). Averaging the 1-2 middle rows handles both odd and even group sizes. Many dialects now offer PERCENTILE_CONT(0.5).

Subqueries & CTEs

  1. Q13

    What is a CTE and how does a recursive CTE work? Walk an org hierarchy.

    Medium
    MicrosoftGoogle
    ▼ Show answer ▲ Hide answer
    sql
    WITH RECURSIVE org AS (
      SELECT id, name, manager_id, 1 AS lvl
      FROM employee
      WHERE manager_id IS NULL          -- anchor: the CEO
      UNION ALL
      SELECT e.id, e.name, e.manager_id, o.lvl + 1
      FROM employee e
      JOIN org o ON e.manager_id = o.id  -- recursive step
    )
    SELECT * FROM org ORDER BY lvl;

    Why: A CTE (WITH clause) is a named temporary result set that improves readability over nested subqueries. A recursive CTE has an anchor member plus a recursive member that references the CTE itself — ideal for hierarchies, graphs, and number/date series.

  2. Q14

    Correlated vs non-correlated subquery — what's the difference?

    Medium
    Amazon
    ▼ Show answer ▲ Hide answer
    sql
    -- Correlated: inner query references the outer row (runs per outer row)
    SELECT e.name
    FROM employee e
    WHERE e.salary > (
      SELECT AVG(salary) FROM employee WHERE department = e.department
    );
    
    -- Non-correlated: inner query is independent (runs once)
    SELECT name FROM employee
    WHERE salary > (SELECT AVG(salary) FROM employee);

    Why: A correlated subquery depends on a column from the outer query, so conceptually it executes once per outer row (optimizers often rewrite it). A non-correlated subquery is self-contained and evaluated a single time.

  3. Q15

    Find departments where every employee earns above 50k.

    Hard
    Google
    ▼ Show answer ▲ Hide answer
    sql
    SELECT department
    FROM employee
    GROUP BY department
    HAVING MIN(salary) > 50000;
    
    -- Or with NOT EXISTS ("no employee below 50k"):
    SELECT DISTINCT department FROM employee d
    WHERE NOT EXISTS (
      SELECT 1 FROM employee e
      WHERE e.department = d.department AND e.salary <= 50000
    );

    Why: "All rows satisfy X" is equivalent to "the minimum satisfies X" — so HAVING MIN(salary) > 50000 is the cleanest answer. The NOT EXISTS form expresses the same universal quantifier as 'there is no counterexample'.

Window Functions

  1. Q16

    Rank employees by salary within each department; give the top 3 per department.

    Medium
    AmazonMetaUber
    ▼ Show answer ▲ Hide answer
    sql
    WITH ranked AS (
      SELECT name, department, salary,
             DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
      FROM employee
    )
    SELECT name, department, salary
    FROM ranked
    WHERE rnk <= 3;

    Why: Top-N-per-group is THE most common window-function interview question. PARTITION BY resets the ranking per department. Use ROW_NUMBER for a strict cut, RANK/DENSE_RANK to keep ties.

  2. Q17

    Difference between ROW_NUMBER, RANK, and DENSE_RANK.

    Medium
    GoogleMicrosoft
    ▼ Show answer ▲ Hide answer
    sql
    SELECT score,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS rn,   -- 1,2,3,4 (no ties)
           RANK()       OVER (ORDER BY score DESC) AS rnk,  -- 1,2,2,4 (gaps)
           DENSE_RANK() OVER (ORDER BY score DESC) AS drnk  -- 1,2,2,3 (no gaps)
    FROM scores;

    Why: ROW_NUMBER assigns unique sequential numbers even on ties. RANK gives ties the same rank but leaves gaps. DENSE_RANK gives ties the same rank with no gaps. Pick based on whether ties and gaps matter.

  3. Q18

    Compute a running (cumulative) total of daily revenue.

    Medium
    UberNetflix
    ▼ Show answer ▲ Hide answer
    sql
    SELECT day,
           revenue,
           SUM(revenue) OVER (ORDER BY day
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM daily_revenue
    ORDER BY day;

    Why: The frame clause ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW defines the cumulative window. Without an explicit frame, the default RANGE frame can behave unexpectedly with duplicate order keys.

  4. Q19

    For each user, find the time gap between consecutive logins.

    Medium
    MetaUber
    ▼ Show answer ▲ Hide answer
    sql
    SELECT user_id,
           login_at,
           LAG(login_at) OVER (PARTITION BY user_id ORDER BY login_at) AS prev_login,
           login_at - LAG(login_at) OVER (PARTITION BY user_id ORDER BY login_at) AS gap
    FROM logins;

    Why: LAG fetches the previous row's value within the partition; LEAD fetches the next. They are the standard tools for period-over-period deltas, session gaps, and churn detection without a self-join.

  5. Q20

    Compute a 7-day moving average of daily active users.

    Hard
    NetflixUber
    ▼ Show answer ▲ Hide answer
    sql
    SELECT day,
           dau,
           AVG(dau) OVER (ORDER BY day
             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS dau_7d_avg
    FROM daily_active_users
    ORDER BY day;

    Why: A sliding frame of 6 preceding rows + the current row gives a trailing 7-day average. Moving averages smooth out weekday seasonality and are a staple of metrics/analytics rounds.

Dates, Funnels & Retention

  1. Q21

    Count new users per month from a signups table.

    Medium
    MetaAmazon
    ▼ Show answer ▲ Hide answer
    sql
    SELECT DATE_TRUNC('month', created_at) AS month,
           COUNT(*) AS new_users
    FROM users
    GROUP BY DATE_TRUNC('month', created_at)
    ORDER BY month;
    
    -- MySQL equivalent:
    -- SELECT DATE_FORMAT(created_at, '%Y-%m-01') AS month, COUNT(*) ...

    Why: DATE_TRUNC (Postgres) or DATE_FORMAT (MySQL) buckets timestamps into months. Truncating to the first of the month keeps results sortable as real dates rather than strings.

  2. Q22

    Compute Day-1 retention: of users who signed up, what % returned the next day?

    Hard
    MetaUberNetflix
    ▼ Show answer ▲ Hide answer
    sql
    WITH signups AS (
      SELECT user_id, MIN(DATE(event_time)) AS signup_day
      FROM events GROUP BY user_id
    )
    SELECT s.signup_day,
           COUNT(DISTINCT s.user_id) AS cohort_size,
           COUNT(DISTINCT e.user_id) AS retained_d1,
           ROUND(100.0 * COUNT(DISTINCT e.user_id) / COUNT(DISTINCT s.user_id), 1) AS d1_pct
    FROM signups s
    LEFT JOIN events e
      ON e.user_id = s.user_id
     AND DATE(e.event_time) = s.signup_day + INTERVAL '1 day'
    GROUP BY s.signup_day
    ORDER BY s.signup_day;

    Why: Define each user's cohort (signup day), then LEFT JOIN events exactly one day later. COUNT(DISTINCT) on the retained users over the cohort size gives the retention rate. This cohort pattern generalizes to D7/D30.

  3. Q23

    Funnel: of users who viewed a product, how many added to cart, then purchased?

    Hard
    AmazonUber
    ▼ Show answer ▲ Hide answer
    sql
    WITH steps AS (
      SELECT user_id,
             MAX(CASE WHEN event = 'view'     THEN 1 ELSE 0 END) AS viewed,
             MAX(CASE WHEN event = 'add_cart' THEN 1 ELSE 0 END) AS added,
             MAX(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS bought
      FROM events
      GROUP BY user_id
    )
    SELECT SUM(viewed) AS viewed,
           SUM(CASE WHEN viewed = 1 AND added = 1 THEN 1 ELSE 0 END) AS added,
           SUM(CASE WHEN viewed = 1 AND added = 1 AND bought = 1 THEN 1 ELSE 0 END) AS bought
    FROM steps;

    Why: Collapse each user's events into step flags, then count users who completed each successive stage. Conditioning each stage on the previous one (viewed AND added) gives true sequential funnel conversion.

  4. Q24

    Find the longest streak of consecutive days each user was active (gaps-and-islands).

    Hard
    GoogleMeta
    ▼ Show answer ▲ Hide answer
    sql
    WITH d AS (
      SELECT DISTINCT user_id, DATE(event_time) AS day FROM events
    ),
    g AS (
      SELECT user_id, day,
             day - (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day))::int AS grp
      FROM d
    )
    SELECT user_id, MAX(streak) AS longest_streak
    FROM (
      SELECT user_id, grp, COUNT(*) AS streak
      FROM g GROUP BY user_id, grp
    ) t
    GROUP BY user_id;

    Why: The classic gaps-and-islands trick: subtract a per-user ROW_NUMBER from the date. Consecutive days share the same (date - rownum) value, forming an 'island' you can group and count. This pattern appears constantly in analytics interviews.