Curated · 49+ tips · AI/ML/DS specific

Interview Tips & Tricks

Battle-tested guidance across ML, deep learning, LLMs, statistics, SQL, and behavioral rounds.

General Interview Strategy

Research the company deeply

Read their engineering blog, recent papers, and product launches. Mention specific projects in your answers — it signals real interest.

Use the STAR method

Situation, Task, Action, Result. Structure every behavioral answer this way. Practice 8-10 stories that cover leadership, conflict, failure, impact.

Always end with metrics

Don't just say 'I built a model.' Say 'I built a churn model that reduced false positives by 32%, saving the team 12 hours/week of manual review.'

Have 3 questions ready for them

Ask about the team's hardest current problem, how success is measured in the first 90 days, and what the interviewer enjoys most. Never ask anything Google would tell you.

Practice the 60-second intro

Past → Present → Future. 'I'm an X who did Y, currently doing Z, looking for opportunities to do W.' Rehearse it until it feels natural, not robotic.

Mirror the interviewer's vocabulary

If they say 'experiment' use 'experiment', not 'A/B test'. If they say 'model' don't switch to 'algorithm'. Subconsciously signals you're already part of the team.

Reset after a bad answer

Don't dwell. Say 'Let me reframe that' or 'Actually, the cleaner answer is...' and move on. Interviewers tend to weigh trajectory over individual misses, so a clean recovery can still leave a strong overall impression.

Machine Learning Fundamentals

Bias-variance tradeoff

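High bias = underfitting (model too simple). High variance = overfitting (model memorizes noise). Be ready to explain which fix addresses which: regularization, more data, or a simpler model reduce variance; a more expressive model, better features, or weaker regularization reduce bias.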

Know your evaluation metrics cold

Precision, recall, F1, AUC-ROC, AUC-PR. When does each matter? Hint: imbalanced classes → PR-AUC over ROC-AUC. Fraud detection → recall matters more than accuracy.
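
To make the imbalance point concrete, here is a minimal sketch that computes the threshold metrics from confusion-matrix counts. The counts and the function name `classification_metrics` are hypothetical, chosen only for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical fraud data: 1,000 transactions, 10 of them fraudulent.
# A model that flags nothing looks accurate but catches no fraud.
flag_nothing = classification_metrics(tp=0, fp=0, fn=10, tn=990)

# A model that flags 20 transactions and catches 8 of the 10 frauds.
useful = classification_metrics(tp=8, fp=12, fn=2, tn=978)
```

The second model has lower accuracy than the do-nothing model yet far higher recall, which is the argument for not leading with accuracy on imbalanced problems.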

Cross-validation pitfalls

Time series → use TimeSeriesSplit, never random splits. Grouped data (same user multiple rows) → GroupKFold. Leakage from preprocessing on full dataset → always fit on train only.
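
A short sketch of the three rules, assuming scikit-learn and NumPy are installed. The arrays and user ids are toy data invented for the example.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, TimeSeriesSplit

X = np.arange(12, dtype=float).reshape(-1, 1)   # 12 rows, already in time order
y = np.zeros(12)

# Time series: every training index precedes every test index.
ts_folds = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, test_idx in ts_folds:
    assert train_idx.max() < test_idx.min()

# Grouped data: hypothetical user ids, three rows per user.
users = np.repeat(["u1", "u2", "u3", "u4"], 3)
group_folds = list(GroupKFold(n_splits=2).split(X, y, groups=users))
for train_idx, test_idx in group_folds:
    # No user appears on both sides of the split.
    assert set(users[train_idx]).isdisjoint(users[test_idx])

# Leakage: compute scaling statistics on the training rows only,
# then apply them unchanged to the test rows.
train_idx, test_idx = ts_folds[-1]
mean, std = X[train_idx].mean(), X[train_idx].std()
X_test_scaled = (X[test_idx] - mean) / std
```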

Feature engineering > model choice

A linear model with great features beats XGBoost with bad features. Talk about feature creation, encoding categoricals (target, one-hot, embeddings), handling missing values, and scaling.

Why does a tree split that way?

Gini impurity, entropy, information gain. For regression: variance reduction (MSE). Random Forest = bagging trees on bootstrap samples + random feature subsets per split.
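
The split criteria for classification can be sketched in a few lines. This is an illustrative implementation with made-up labels, not how any particular library computes them internally.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4
# A split that separates the classes perfectly recovers all of the entropy.
perfect = information_gain(parent, ["yes"] * 4, ["no"] * 4)
# A split whose children mirror the parent's class mix gains nothing.
useless = information_gain(parent, ["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"])
```

A tree picks the candidate split with the largest gain; passing `impurity=gini` swaps the criterion without changing the rest.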

Handling class imbalance

Don't reflexively reach for SMOTE. Try class weights first, then resampling, then anomaly-detection framing. Always evaluate on the original distribution — resampling the test set lies to you.
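
As a sketch of the "class weights first" step, the widely used "balanced" heuristic sets each class weight to n_samples / (n_classes * class_count). The helper name and the 95/5 split below are assumptions for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}

labels = [0] * 95 + [1] * 5            # hypothetical 95/5 imbalance
weights = balanced_class_weights(labels)

# After weighting, both classes contribute the same total weight to the loss.
total_weight = {cls: weights[cls] * labels.count(cls) for cls in weights}
```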

Calibration matters more than accuracy

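A model with 0.9 confidence should be right about 90% of the time. XGBoost and neural nets are often miscalibrated. Use Platt scaling or isotonic regression, fit on held-out data. Critical for thresholding, expected-value decisions, and comparing scores across models; pure ranking metrics such as AUC barely change under these monotone recalibrations.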
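
One way to check the "0.9 should be right 90% of the time" property is a binned calibration error. The sketch below is a simple binary variant (predicted probability of the positive class against the observed positive rate), with hypothetical scores.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Size-weighted average of |observed positive rate - mean predicted prob| per bin.

    probs are predicted P(y=1); labels are 0/1.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 lands in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(p for p, _ in bucket) / len(bucket)
        pos_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(pos_rate - mean_conf)
    return ece

# Hypothetical scores: the model says 0.9 for ten examples.
calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])          # right 9 of 10
overconfident = expected_calibration_error([0.9] * 10, [1] * 6 + [0] * 4)   # right 6 of 10
```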

Deep Learning

Vanishing/exploding gradients

For vanishing gradients: ReLU-family activations instead of sigmoid/tanh, batch normalization, and residual connections (ResNet). For exploding gradients: gradient clipping. Be ready to draw a backprop diagram.
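
Gradient clipping is the easiest of these to show in isolation. A minimal sketch of clipping by global L2 norm on a flat list of gradient values follows; real frameworks apply the same idea across all parameter tensors.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale gradient values so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)          # already small enough: leave untouched
    scale = max_norm / norm
    return [g * scale for g in grads]

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)   # norm 5 is scaled down to 1
untouched = clip_by_global_norm([0.3, 0.4], max_norm=1.0)
```

Clipping keeps the gradient's direction and only shrinks its length, which is why it tames exploding updates without changing where the step points.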

Transformer architecture from memory

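Self-attention: Q, K, V matrices. Attention(Q,K,V) = softmax(QK^T / sqrt(d_k)) V. Multi-head attention runs h attention heads in parallel, each on its own learned projection of Q, K, and V, then concatenates the outputs and applies a final linear projection. Positional encoding adds order info.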
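
The attention formula translates almost line for line into NumPy. This is a single-head sketch without masking or learned projections; the shapes and random inputs are arbitrary choices for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for 2-D arrays of shape (seq_len, dim)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 query positions, d_k = 8
K = rng.normal(size=(6, 8))    # 6 key positions
V = rng.normal(size=(6, 16))   # d_v = 16
out, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` is a probability distribution over the key positions, and `out` has one d_v-dimensional vector per query.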

Regularization techniques

Dropout (random zeroing), L2 weight decay, early stopping, data augmentation, label smoothing. Know when each helps.

Optimizer differences

SGD = stable, slow. Momentum = accelerates in consistent directions. Adam = adaptive learning rate per param, works out of the box. AdamW = Adam with weight decay decoupled from the gradient update instead of folded in as L2 regularization.

How would you debug a model that won't learn?

Check learning rate (too high → exploding loss, too low → no progress), verify data loading, overfit a single batch first, check loss function correctness, inspect gradients.

Batch size effects

Larger batch = more stable gradients and more parallelism, but it can generalize worse (one common explanation is convergence to sharp minima). Smaller batch = noisier gradients, often better generalization. Scale learning rate linearly with batch size as a starting heuristic.

Why use mixed precision (FP16/BF16)?

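Up to roughly 2x speedup and close to half the activation memory on GPUs with tensor cores; actual gains depend on the model and hardware. FP16 needs loss scaling to avoid gradient underflow; BF16 keeps FP32's exponent range (with fewer mantissa bits), so it usually works without loss scaling. Default to BF16 on A100/H100.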
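
The underflow problem that loss scaling solves can be reproduced with NumPy's float16 type. The gradient value and the scale factor below are illustrative assumptions; NumPy has no native BF16 type, so only the FP16 side is shown.

```python
import numpy as np

tiny_grad = 1e-8                         # hypothetical small gradient value
underflowed = np.float16(tiny_grad)      # below FP16's smallest subnormal: becomes 0

loss_scale = 65536.0                     # 2**16, an illustrative scale factor
scaled = np.float16(tiny_grad * loss_scale)             # representable in FP16
recovered = np.float32(scaled) / np.float32(loss_scale)  # unscale in FP32
```

Scaling the loss scales every gradient by the same factor, so small values survive the FP16 backward pass and are divided back out before the weight update.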

LLMs & Generative AI

RAG vs fine-tuning

RAG = inject context at inference time. Best for changing knowledge. Fine-tuning = teach style/format/domain. Combine both for production. Always mention chunking strategy and embedding model choice.

Why does my LLM hallucinate?

Out-of-distribution queries, weak retrieval (irrelevant chunks), context too long for the model to attend to reliably, and low-temperature decoding that makes wrong answers look consistent and so masks the issue. Solutions: better retrieval, citations, confidence scoring, function calling.

Prompt engineering basics

Few-shot examples, chain-of-thought ('think step by step'), output format constraints (JSON schema), role assignment, delimiters. Know the difference between the system prompt and the user prompt.

Vector DB selection

Pinecone (managed, easy), Chroma (open-source, local), Weaviate (hybrid search), pgvector (use Postgres you already have). Compare on cost, latency, hybrid search support.

Evaluating LLM applications

BLEU/ROUGE are weak for open generation. Use LLM-as-judge, human eval rubrics, faithfulness/groundedness for RAG, retrieval recall@k. Trace + log every call (LangSmith, Helicone).
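
Of the metrics listed, retrieval recall@k is the simplest to compute yourself. A minimal sketch, with hypothetical chunk ids standing in for a labeled query:

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k retrieved list."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

# Hypothetical ranked retrieval result and ground-truth relevant chunks.
retrieved = ["c7", "c2", "c9", "c4", "c1"]
relevant = {"c2", "c4", "c8"}

recall_3 = recall_at_k(retrieved, relevant, k=3)   # only c2 is in the top 3
recall_5 = recall_at_k(retrieved, relevant, k=5)   # c2 and c4 are in the top 5
```

If recall@k is low, no amount of prompt tuning fixes the answer, because the generator never sees the right chunk.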

Token economics

Know the cost per 1M tokens for the model you're using. Cache repeated prompts (Anthropic prompt caching, OpenAI cached input), use smaller models for routing/classification, reserve the big model for hard reasoning. Streaming hides latency, not cost.

Agents — when not to use them

Agents add latency, cost, and failure modes. If the workflow is deterministic, just chain calls. Reach for agents when the path branches based on intermediate output. Always bound recursion and tool-call count.
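
The "always bound tool-call count" advice can be sketched as a loop with a hard budget. Everything here is hypothetical: `choose_action` stands in for an LLM call, and the toy policy and `lookup` tool exist only to make the example runnable.

```python
def run_agent(choose_action, tools, task, max_tool_calls=5):
    """Loop until choose_action returns a final answer or the tool budget runs out.

    choose_action(task, history) returns ("final", answer)
    or ("tool", tool_name, argument).
    """
    history = []
    for _ in range(max_tool_calls):
        action = choose_action(task, history)
        if action[0] == "final":
            return action[1], history
        _, name, arg = action
        history.append((name, arg, tools[name](arg)))
    # Budget exhausted: fail loudly instead of looping forever.
    raise RuntimeError(f"no answer after {max_tool_calls} tool calls")

def toy_policy(task, history):
    """Call the lookup tool once, then answer with its observation."""
    if not history:
        return ("tool", "lookup", task)
    return ("final", history[-1][2])

answer, trace = run_agent(toy_policy, {"lookup": str.upper}, "hello")
```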

Statistics & Probability

p-values demystified

p-value = probability of observing data at least this extreme IF the null hypothesis is true. It's NOT the probability the null is true. p < 0.05 doesn't mean the effect is meaningful, so talk about effect size too.
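
The definition can be computed directly for a coin-flip example. The sketch below is an exact two-sided binomial test that sums the probability of every outcome no more likely than the observed one; the 60-heads-in-100 scenario is made up.

```python
from math import comb

def two_sided_binomial_p(successes, n, p_null=0.5):
    """P(an outcome at least as unlikely as the observed one | null is true)."""
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small tolerance so outcomes tied with the observed one are included.
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))

p_value = two_sided_binomial_p(60, 100)   # hypothetical: 60 heads in 100 flips
```

Here the p-value comes out a little above 0.05. Every term in the sum is computed assuming the coin is fair, which is why the result says nothing about the probability that the coin is fair.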

A/B testing in practice

Power analysis BEFORE the test (sample size, MDE). Watch for peeking (early stopping inflates false positives). Use CUPED for variance reduction. Two-tailed by default. Bonferroni if testing multiple metrics.
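
A rough power calculation for a conversion-rate test fits in a few lines. This sketch uses the standard normal-approximation formula for a two-sided two-proportion test; the baseline rate and MDE are hypothetical inputs.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p_baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect an absolute lift of mde_abs."""
    p_treat = p_baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p_baseline * (1 - p_baseline) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs**2)

# Hypothetical test: 10% baseline conversion, detect a 1-point absolute lift.
n_small_lift = sample_size_per_arm(p_baseline=0.10, mde_abs=0.01)
n_large_lift = sample_size_per_arm(p_baseline=0.10, mde_abs=0.02)
```

Halving the MDE roughly quadruples the required sample, which is the number to bring up when someone asks for a one-week test on a small effect.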

Central Limit Theorem

Sampling distribution of the mean approaches normal as n grows, regardless of the underlying distribution (as long as its variance is finite). This is why t-tests work on non-normal data with large samples.

Bayesian vs frequentist

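Frequentist: 'p-value < 0.05'. Bayesian: 'posterior probability the effect is positive is 96%'. Bayesian results are often easier for stakeholders to interpret and degrade more gracefully under peeking, though repeated looks with a fixed decision threshold still raise the error rate.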

Confidence intervals

A 95% CI means 95% of intervals constructed this way contain the true parameter. It does NOT mean 'the parameter is in this interval with 95% probability' — that's the Bayesian credible interval.
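
The "95% of intervals constructed this way" reading can be checked by simulation. This sketch assumes a normal population with known sigma so a z-interval applies; the true mean, sigma, sample size, and seed are arbitrary.

```python
import random
from statistics import NormalDist, mean

def ci_covers(true_mean, sigma, n, rng, level=0.95):
    """Draw one sample, build a known-sigma z-interval, report whether it covers."""
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n**0.5
    return abs(mean(sample) - true_mean) <= half_width

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 2000
coverage = sum(ci_covers(10.0, 2.0, 30, rng) for _ in range(trials)) / trials
```

Across repeated experiments the fraction of intervals that contain the true mean lands near 0.95. Any single interval either contains it or does not.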

Simpson's paradox

An effect that appears in groups can reverse when groups are combined (or vice versa). Always check whether your aggregate metric hides subgroup heterogeneity. Classic example: UC Berkeley admissions case.
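
A small numeric example makes the reversal easy to explain on a whiteboard. The counts below are hypothetical, chosen only to produce the effect; they are not the Berkeley figures.

```python
# Hypothetical (admitted, applied) counts per department and group.
data = {
    "dept_A": {"men": (80, 100), "women": (9, 10)},
    "dept_B": {"men": (2, 10), "women": (30, 100)},
}

def rate(admitted, applied):
    return admitted / applied

def overall(group):
    """Admission rate for a group with both departments pooled."""
    admitted = sum(dept[group][0] for dept in data.values())
    applied = sum(dept[group][1] for dept in data.values())
    return rate(admitted, applied)

# Within every department, women are admitted at the higher rate.
per_dept_women_higher = all(
    rate(*dept["women"]) > rate(*dept["men"]) for dept in data.values()
)

# Pooled, the ordering flips: most women applied to the department
# with the low admission rate.
pooled_women_lower = overall("women") < overall("men")
```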

Correlation pitfalls

Pearson catches linear relationships, Spearman catches monotonic ones, and mutual information can detect any dependence given enough data. A correlation of 0 doesn't mean independence (e.g., y = x² with x symmetric around 0). Always plot your data; Anscombe's quartet is the canonical warning.
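
The y = x² case takes a few lines to verify. The sketch below implements Pearson's r by hand on a small symmetric grid of x values picked for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]      # y is fully determined by x
r = pearson(xs, ys)           # yet the linear correlation is zero
```

The positive and negative halves of the parabola cancel in the covariance, so r is zero even though knowing x tells you y exactly.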

SQL & Coding

Window functions are everywhere

ROW_NUMBER, RANK, DENSE_RANK for ranking. LAG/LEAD for previous/next row. SUM() OVER (PARTITION BY ... ORDER BY ...) for running totals. Practice 'top N per group' until automatic.
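
The "top N per group" pattern can be practiced locally with Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 'ann', 300), ('east', 'bob', 500), ('east', 'cat', 400),
        ('west', 'dan', 200), ('west', 'eve', 900), ('west', 'fay', 100);
""")

# Top 2 reps per region: number the rows inside each partition,
# then filter on that number in an outer query.
top_two = conn.execute("""
    SELECT region, rep, amount
    FROM (
        SELECT region, rep, amount,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
        FROM sales
    ) AS ranked
    WHERE rn <= 2
    ORDER BY region, amount DESC
""").fetchall()
```

The subquery is needed because a window function's result cannot be referenced in the WHERE clause of the same SELECT.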

Self joins for finding duplicates / relationships

Find users who referred others, or find rows that share a value (such as an email) under different ids. SELECT a.id, b.id FROM t a JOIN t b ON a.email = b.email AND a.id < b.id.
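
The duplicate-finding query can be run end to end with sqlite3. The `users` table and its contents are hypothetical stand-ins for the table `t` in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO users VALUES
        (1, 'a@example.com'), (2, 'b@example.com'),
        (3, 'a@example.com'), (4, 'a@example.com');
""")

# a.id < b.id keeps each duplicate pair once and drops self-matches.
pairs = conn.execute("""
    SELECT a.id, b.id
    FROM users a
    JOIN users b ON a.email = b.email AND a.id < b.id
    ORDER BY a.id, b.id
""").fetchall()
```

With `a.id <> b.id` instead, every pair would appear twice, once in each order.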

EXPLAIN your query

When asked 'is this efficient?', talk about indexes, full scans vs index seeks, hash joins vs nested loops. Adding LIMIT doesn't help much if the database has to sort the full result first; an index that already provides the order does.

Python: don't reinvent the wheel

Counter, defaultdict, itertools.groupby, bisect, heapq. Interviewers love when you reach for the right tool. Use list comprehensions for clarity, generators for memory.
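
A quick tour of those standard-library tools on toy data, one line or so each:

```python
import bisect
import heapq
from collections import Counter, defaultdict
from itertools import groupby

words = ["pear", "apple", "fig", "plum", "apple", "fig", "apple"]

# Counter: frequency counts without a manual dict.
top_word = Counter(words).most_common(1)

# defaultdict: group items without checking whether the key exists.
by_length = defaultdict(list)
for w in words:
    by_length[len(w)].append(w)

# groupby only merges adjacent items, so sort by the same key first.
by_initial = {k: list(g) for k, g in groupby(sorted(set(words)), key=lambda w: w[0])}

# heapq: top-k without sorting the whole list.
largest_three = heapq.nlargest(3, [5, 1, 9, 3, 7])

# bisect: binary search for an insertion point in a sorted list.
insert_at = bisect.bisect_left([10, 20, 30, 40], 25)
```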

Talk while coding

Narrate your thinking. 'I'll start with brute force, then optimize.' Walk through an example before coding. Ask about constraints (input size, edge cases). Don't go silent.

CTEs over nested subqueries

WITH clauses are readable, debuggable, and most modern optimizers handle them as well as subqueries. Chain them to express multi-step logic. Bonus: easy to test each CTE in isolation.

Big-O is necessary, not sufficient

O(n) with cache misses can lose to O(n log n) that fits in L2. Mention constants and memory access patterns when relevant. For ML interviews: vectorized NumPy beats a pure-Python loop even at the same asymptotic complexity.

Behavioral & Soft Skills

Tell me about a time you failed

Pick a real failure with real consequences. Show what you learned and how you applied it. Avoid humble brags ('I worked too hard'). Show humility, growth, and ownership.

Conflict with a colleague

Frame it as collaborative problem-solving, not 'I was right'. Show you listened, sought to understand, found a path forward, and maintained the relationship.

Why are you leaving your current job?

Pull toward something new, not push away from something bad. 'I'm looking for more X' beats 'My current manager is Y'. Never trash-talk previous employers.

Where do you see yourself in 5 years?

Show ambition tied to the company's growth, not a different industry. Tech lead, IC depth, founding a team — pick one and back it with what you'd need to learn.

Salary negotiation

Never give a number first. Say 'I'd like to focus on the role first; I'm sure we can align on compensation when we get there.' When pressed, give a range with the low end above your real floor.

Disagree with your manager — give a real example

Interviewers want to see you can push back constructively, not that you always agree. Walk through how you stated your case, what data you brought, how you escalated (or yielded), and what the outcome taught you.

Project you're most proud of

Pick one with quantifiable impact AND personal ownership (not 'my team did X'). Be ready for follow-ups: what tradeoffs you made, what you'd redo, what was the hardest part. Generic answers signal you weren't really driving it.