
The senior ML interview canon

50 questions across 8 categories. Each with what L4 / L5 / L6 answers actually sound like, the tells that get a strong-hire vote, and the tells that get you down-leveled.

ML Fundamentals 8

  • Bayesian vs frequentist: a practitioner's framing

    The textbook distinction is philosophical. The practitioner distinction is whether you can sample from a posterior cheaply, and whether you need uncertainty for downstream decisions.

  • Explain backprop in your own words

    The textbook answer is the chain rule. The senior answer is what backprop is doing as a system: a reverse-mode auto-diff pass that reuses intermediate computations to get all gradients in one extra forward-cost pass.
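The "system view" above can be made concrete with a toy reverse-mode auto-diff sketch (the `Value` class and its API here are illustrative, not from any real framework): each op records a local backward rule, and one topologically ordered reverse pass reuses the forward intermediates to fill in every gradient.

```python
class Value:
    """A scalar node in a computation graph; toy reverse-mode autodiff."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological order guarantees a node's grad is complete before it
        # is pushed to its parents -- this is the single reverse pass.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x = Value(3.0)
y = Value(4.0)
z = x * y + x   # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
```

Note that `x` feeds two ops, and its gradient accumulates contributions from both; that accumulation is the "reusing intermediates" a senior answer should name.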

  • How do you choose a learning rate?

    The right answer is a procedure, not a number. The wrong answers are 'use the default' and 'try a few values.'
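One such procedure is a learning-rate range test: sweep candidate LRs geometrically, train briefly at each, and keep the largest one that still makes progress. A minimal sketch on a toy quadratic loss (the loss, the candidate grid, and the step count are all assumptions for illustration):

```python
def loss(w):
    # Toy convex loss standing in for a real training loss (assumption).
    return (w - 2.0) ** 2

def grad(w):
    return 2.0 * (w - 2.0)

def lr_range_test(lrs, steps=20, w0=0.0):
    """Run a few SGD steps at each candidate LR; return (lr, final_loss) pairs."""
    results = []
    for lr in lrs:
        w = w0
        for _ in range(steps):
            w -= lr * grad(w)
        results.append((lr, loss(w)))
    return results

candidates = [10 ** e for e in range(-4, 1)]   # 1e-4 .. 1
results = lr_range_test(candidates)
start = loss(0.0)
# The procedure: keep LRs whose loss still decreased, take the largest.
good = [lr for lr, final in results if final < start]
best = max(good)
```

On this toy problem the LR of 1.0 oscillates without progress and gets rejected; the same shape of test (short runs, geometric grid, keep the largest LR that still improves) carries over to real models.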

  • How do you choose a loss function?

    The loss is the objective. Picking the wrong one means optimizing for the wrong thing, no matter how well you train. The senior answer derives the loss from the problem, not from a list.

  • L1 vs L2 regularization, beyond the formula

    To most candidates the math looks interchangeable: penalty terms added to the loss. The senior signal is the Bayesian interpretation, the optimization geometry, and knowing when each is the right choice.
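The optimization-geometry point fits in two one-liners: per step, an L1 penalty acts like soft-thresholding (producing exact zeros), while an L2 penalty shrinks weights multiplicatively (never exactly zero). A sketch using the closed-form updates, with an arbitrary λ = 0.1:

```python
def soft_threshold(w, lam):
    """Proximal operator of the L1 penalty: pulls w toward zero and clips
    to exactly zero once |w| <= lam -- this is where sparsity comes from."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def l2_shrink(w, lam):
    """Closed-form effect of an L2 penalty step: multiplicative shrinkage,
    which never produces exact zeros."""
    return w / (1.0 + lam)

weights = [3.0, 0.05, -0.5, 0.0]
l1_w = [soft_threshold(w, 0.1) for w in weights]  # small weight -> exactly 0
l2_w = [l2_shrink(w, 0.1) for w in weights]       # small weight -> smaller, not 0
```

The 0.05 weight survives L2 (merely shrunk) but is zeroed out by L1, which is the geometric story behind "L1 for feature selection, L2 for stability."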

  • Walk me through the bias-variance tradeoff

    The classic warm-up question. The L4 answer is the formula; the L6 answer is what it tells you about model selection in production.
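The formula in question, for squared error at a point \(x\) with label noise variance \(\sigma^2\) and a learner \(\hat f\) trained on random draws of the training set:

```latex
\mathbb{E}\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```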

  • When would you not use cross-validation?

    Cross-validation is a tool, not a default. The senior answer names the cases where it's wrong, expensive, or misleading.

  • Why does dropout work?

    The trick is that there are three valid explanations and they all matter. Which ones you reach for tells the interviewer your level.
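One of those explanations, the noise-injection / expectation-preserving view, is visible directly in inverted dropout: zero each activation with probability p at train time, scale survivors by 1/(1−p) so the expected activation is unchanged, and do nothing at test time. A framework-free sketch (sizes and seed are arbitrary):

```python
import random

def dropout(xs, p, training, rng):
    """Inverted dropout: at train time, drop each value with probability p
    and scale survivors by 1/(1-p) so E[output] == input; at test time it
    is the identity, so no rescaling is needed at inference."""
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

rng = random.Random(0)
acts = [1.0] * 10000
dropped = dropout(acts, 0.5, True, rng)
mean = sum(dropped) / len(dropped)   # close to 1.0 by construction
```

The expectation-matching scaling is why train-time and test-time networks see activations on the same scale, which is the piece of the answer most L4 candidates miss.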

Deep Learning Production 6

LLM Systems 13

Recsys & Search 8

  • Design Amazon's "people also bought"

    A simple-sounding feature with deep recsys ground underneath. The senior answer chooses between item-item collaborative filtering, embedding similarity, and learned co-purchase models, with explicit handling of feedback loops.

  • Design Spotify's homepage

    A multi-shelf, multi-objective recommendation surface. The senior answer scopes the shelves first, then designs each as its own ranker with a meta-layer above.

  • Design YouTube's recommender

    The canonical recsys design question. The real test is whether you'll dive into model architecture or scope the problem first.

  • How would you do cold-start for a new user?

    Cold-start is solved by combining minimal explicit signal, demographic and contextual fallbacks, and aggressive exploration in the first few sessions.

  • How would you evaluate a search ranker?

    Search ranking eval is offline metrics for development, A/B for shipping, and human raters for absolute calibration. The senior answer uses all three and respects what each measures.
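The offline leg of that triad usually means graded-relevance metrics such as NDCG@k; a minimal sketch (the relevance labels below are hypothetical):

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k over graded relevance labels listed in ranked order --
    a standard offline metric for search ranking development."""
    def dcg(rels):
        # Log-position discount: gains lower in the ranking count for less.
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

perfect = ndcg_at_k([3, 2, 1, 0], k=4)   # ideal ordering
swapped = ndcg_at_k([0, 2, 1, 3], k=4)   # best document ranked last
```

Normalizing by the ideal DCG is what makes scores comparable across queries with different label distributions.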

  • Negative sampling strategies: what actually matters

    Choice of negatives often matters more than choice of model. The senior answer ranks the strategies (in-batch, hard, BM25-mined, model-mined) and explains the trade-offs.
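The cheapest of those strategies, in-batch negatives, can be sketched without any model at all: every other example's positive in the batch serves as a negative for this query (the queries and item ids below are hypothetical):

```python
def in_batch_negatives(batch):
    """For a batch of (query, positive) pairs, treat every other example's
    positive as a negative for this query -- the cheapest strategy, since
    it reuses items already in the batch and costs no extra retrieval."""
    examples = []
    for i, (query, pos) in enumerate(batch):
        negs = [p for j, (_, p) in enumerate(batch) if j != i]
        examples.append((query, pos, negs))
    return examples

batch = [("red shoes", "item_a"), ("laptop", "item_b"), ("mug", "item_c")]
triples = in_batch_negatives(batch)
```

The trade-off a senior answer names: in-batch negatives are dominated by popular items and are rarely "hard," which is why hard or model-mined negatives are layered on top.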

  • Recsys in the LLM era: what changes?

    Most of recsys hasn't changed; LLMs add new capabilities at specific stages. The senior answer names which stages benefit and which don't.

  • Two-tower vs cross-encoder: when to use which?

    The recsys / search architecture decision that comes up in every retrieval interview. The right answer is 'both, in sequence.'
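"Both, in sequence" means a cheap two-tower retrieval pass over the full corpus, then an expensive cross-encoder rerank over only the survivors. A stand-in sketch with hypothetical embeddings and a hypothetical cross scorer:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def retrieve(query_emb, item_embs, k):
    """Two-tower stage: score every item with a cheap dot product between
    independently precomputed embeddings, keep the top k."""
    scored = sorted(item_embs.items(),
                    key=lambda kv: dot(query_emb, kv[1]), reverse=True)
    return [item for item, _ in scored[:k]]

def rerank(query, candidates, cross_score):
    """Cross-encoder stage: an expensive joint (query, item) scorer,
    affordable because it only sees the small retrieved set."""
    return sorted(candidates, key=lambda item: cross_score(query, item),
                  reverse=True)

# Hypothetical 2-d embeddings and lookup-table scorer, for illustration only.
items = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
cands = retrieve([1.0, 0.0], items, k=2)
final = rerank("q", cands, lambda q, i: {"a": 0.2, "b": 0.9, "c": 0.0}[i])
```

The structural point: the two-tower scorer factorizes into per-side embeddings (so items can be pre-indexed), while the cross-encoder cannot, which is exactly why it is too expensive to run over the full corpus.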

ML System Design 5

  • Design a content moderation system

    Moderation is a multi-policy classification problem at scale, with appeals, human review, and adversarial users. The senior answer separates policy from model and treats human review as part of the system.

  • Design a feature store from scratch

    A feature store solves training-serving skew, feature reuse, and lineage. The senior answer explains why each property matters and what a minimum viable version looks like.

  • Design fraud detection for a payment company

    Fraud has the worst data of any ML problem: extreme class imbalance, biased labels, adversarial actors, and direct money on the line. The senior answer respects all four.

  • Design ML monitoring

    Most ML systems fail silently. Monitoring is what tells you. The senior answer monitors data, model, and outcome layers separately.
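At the data layer, a common drift signal is the Population Stability Index over binned feature distributions; a minimal sketch (the 0.1 / 0.25 thresholds are a widely used rule of thumb, not a universal standard):

```python
import math

def psi(expected, actual):
    """Population Stability Index between two binned distributions given as
    fractions summing to 1. Rule of thumb: < 0.1 stable, > 0.25 major shift."""
    eps = 1e-6  # guard against empty bins
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

baseline = [0.25, 0.25, 0.25, 0.25]   # training-time feature histogram
same     = [0.25, 0.25, 0.25, 0.25]
shifted  = [0.10, 0.20, 0.30, 0.40]   # serving-time histogram after drift
```

A check like this per feature covers the data layer; the model layer (score distributions) and outcome layer (delayed labels, business metrics) need their own monitors, which is the separation the blurb above calls for.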

  • Design real-time personalization

    Real-time personalization fails most often at the data infrastructure, not the model. The senior answer designs the feature freshness and serving stack first.

Behavioral 5

Math 3

Coding 2

  • Debug this training loop

    A live coding question with a paste of buggy training code. The senior signal is the order in which you find bugs and what your debugging procedure looks like.

  • Implement KNN efficiently

    The naive solution is one line. The interview is about scaling: when does naive fail, and what do you do?
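The first scaling step past the one-liner is usually a bounded heap: still a linear scan, but O(n log k) instead of sorting all n distances, with O(k) memory. A stdlib-only sketch (beyond this, the senior answer reaches for KD-trees or approximate indexes such as HNSW):

```python
import heapq

def knn(points, query, k):
    """k nearest neighbors by squared Euclidean distance.
    A max-heap of size k (via negated distances) avoids a full sort:
    O(n log k) time, O(k) extra memory."""
    heap = []
    for idx, p in enumerate(points):
        d = sum((a - b) ** 2 for a, b in zip(p, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif -heap[0][0] > d:
            # Current point beats the worst of the k kept so far.
            heapq.heapreplace(heap, (-d, idx))
    # Return indices ordered nearest-first.
    return [idx for _, idx in sorted((-nd, i) for nd, i in heap)]

pts = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (0.5, 0.0)]
nearest = knn(pts, (0.0, 0.0), k=2)
```

The follow-up the interviewer is fishing for: when n or d grows further, exact scans lose to space-partitioning or approximate methods, and the answer becomes a recall/latency trade-off rather than an algorithmic one.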