<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>mlmentorship</title><description>Senior ML &amp; AI interview prep. Essays, interview questions with leveled answers (L4/L5/L6), reference notes, system design case studies. Free.</description><link>https://mlmentorship.com/</link><language>en</language><item><title>Actor-critic methods</title><link>https://mlmentorship.com/blog/2026-05-07-actor-critic-methods/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-actor-critic-methods/</guid><description>Policy gradient with a learned value baseline. The actor picks actions; the critic estimates how good they were. The architecture under PPO, A3C, SAC, and most modern RL.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Advantage estimation and GAE</title><link>https://mlmentorship.com/blog/2026-05-07-advantage-estimation-and-gae/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-advantage-estimation-and-gae/</guid><description>Policy gradients need a low-variance estimate of how much better an action was than average. GAE is the standard answer: an exponentially weighted blend of n-step returns.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Anchor boxes and non-maximum suppression</title><link>https://mlmentorship.com/blog/2026-05-07-anchor-boxes-and-nms/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-anchor-boxes-and-nms/</guid><description>Object detectors predict thousands of overlapping boxes. Anchors give each prediction a prior shape; NMS prunes near-duplicates. The pre-DETR pipeline that defined the field for a decade.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Alternating least squares for collaborative filtering</title><link>https://mlmentorship.com/blog/2026-05-07-alternating-least-squares/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-alternating-least-squares/</guid><description>Factorize the user-item matrix into two low-rank factors. Each is a linear regression given the other, so alternate. The classical recsys workhorse before deep learning.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Approximate nearest neighbors: HNSW, IVF, and product quantization</title><link>https://mlmentorship.com/blog/2026-05-07-approximate-nearest-neighbors/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-approximate-nearest-neighbors/</guid><description>Exact k-NN over a billion vectors is infeasible. ANN trades a small recall hit for a 100x to 10,000x speedup. The reason vector search at scale exists.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>BERT and masked language modeling</title><link>https://mlmentorship.com/blog/2026-05-07-bert-and-masked-language-modeling/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-bert-and-masked-language-modeling/</guid><description>Train a transformer to fill in randomly masked tokens. 
The result is a bidirectional encoder that broke a dozen NLP benchmarks at once and defined the pretrain-then-finetune era.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Convolution as matrix multiplication (im2col)</title><link>https://mlmentorship.com/blog/2026-05-07-convolution-as-matmul/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-convolution-as-matmul/</guid><description>A 2D convolution is a matmul in disguise. Unfold the input into columns, multiply by a flattened filter matrix. The reason CNNs run fast on the same hardware as transformers.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Decoding strategies: greedy, beam, top-k, top-p, temperature</title><link>https://mlmentorship.com/blog/2026-05-07-decoding-strategies/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-decoding-strategies/</guid><description>Same model, different samplers, very different outputs. The choice of decoder is often more impactful than the last percent of training. Know the tradeoffs.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Designing a RAG system that actually works</title><link>https://mlmentorship.com/blog/2026-05-07-designing-rag-that-works/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-designing-rag-that-works/</guid><description>RAG fails most often at retrieval, not generation. A practitioner&apos;s guide to the architecture, the failure modes, and what production teams actually do in 2026.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>Epistemic vs aleatoric uncertainty</title><link>https://mlmentorship.com/blog/2026-05-07-epistemic-vs-aleatoric-uncertainty/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-epistemic-vs-aleatoric-uncertainty/</guid><description>Epistemic uncertainty shrinks with more data; aleatoric does not. Conflating them produces miscalibrated systems and wasted data collection. The distinction every senior ML engineer should be able to articulate.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Factorization machines</title><link>https://mlmentorship.com/blog/2026-05-07-factorization-machines/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-factorization-machines/</guid><description>Linear models can&apos;t capture feature interactions. Polynomial models have too many parameters. Factorization machines find a middle path: factorize the interaction matrix and learn an embedding per feature.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Exploration vs exploitation: epsilon-greedy, UCB, Thompson sampling</title><link>https://mlmentorship.com/blog/2026-05-07-exploration-vs-exploitation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-exploration-vs-exploitation/</guid><description>An RL or bandit agent has to keep trying new actions to learn while taking the best-known action to score. 
Three classical strategies, each with a different way of resolving the tension.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Gaussian processes</title><link>https://mlmentorship.com/blog/2026-05-07-gaussian-processes/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-gaussian-processes/</guid><description>A distribution over functions defined entirely by a covariance kernel. Predicts both a mean and a calibrated uncertainty. Beautiful theory, brutal scaling.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Forward-backward and Viterbi: dynamic programming on chains</title><link>https://mlmentorship.com/blog/2026-05-07-forward-backward-and-viterbi/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-forward-backward-and-viterbi/</guid><description>Sum and max over exponentially many paths in linear time. Forward-backward computes posteriors over hidden states; Viterbi finds the most likely state sequence. The same idea, two semirings.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Graph neural networks: message passing as A·X·W</title><link>https://mlmentorship.com/blog/2026-05-07-graph-neural-networks/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-graph-neural-networks/</guid><description>Neighbors carry signal. A graph neural network averages each node&apos;s neighborhood and projects with a learned matrix. The same matmul as a CNN, on irregular structure.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Kernel methods and the kernel trick</title><link>https://mlmentorship.com/blog/2026-05-07-kernel-methods-and-the-kernel-trick/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-kernel-methods-and-the-kernel-trick/</guid><description>Compute inner products in a high-dimensional feature space without ever materializing the features. The mathematical move that lets a linear classifier draw nonlinear boundaries.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Knowledge distillation</title><link>https://mlmentorship.com/blog/2026-05-07-knowledge-distillation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-knowledge-distillation/</guid><description>Train a small student to match a large teacher&apos;s outputs. The student gets richer signal than from hard labels because the teacher&apos;s soft probabilities encode similarity structure.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Multi-head attention: why one head is not enough</title><link>https://mlmentorship.com/blog/2026-05-07-multi-head-attention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-multi-head-attention/</guid><description>Run h independent attention computations in parallel, then concatenate. Each head specializes in a different relation. 
The mechanism most senior candidates can write but few can motivate.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Pruning: structured vs unstructured sparsity</title><link>https://mlmentorship.com/blog/2026-05-07-pruning/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-pruning/</guid><description>Set unimportant weights to zero, recover most of the accuracy. Unstructured pruning shrinks model size; structured pruning shrinks inference time. They solve different problems.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>LSTM and GRU: gating as Hadamard products</title><link>https://mlmentorship.com/blog/2026-05-07-lstm-and-gru/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-lstm-and-gru/</guid><description>Recurrent networks fail because gradients vanish through repeated matmul. Gates fix this by using elementwise multiplication to control information flow. Then transformers replaced them anyway.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Self-attention vs cross-attention</title><link>https://mlmentorship.com/blog/2026-05-07-self-attention-vs-cross-attention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-self-attention-vs-cross-attention/</guid><description>Same operation, different inputs. Self-attention reads from one sequence; cross-attention reads from another. The distinction every encoder-decoder architecture rests on.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>t-SNE and UMAP: nonlinear dimensionality reduction</title><link>https://mlmentorship.com/blog/2026-05-07-tsne-and-umap/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-tsne-and-umap/</guid><description>Both project high-dimensional data to 2D for visualization by preserving local neighborhoods. Both are easy to misread. Know what they show and what they hide.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Word embeddings: Word2Vec, GloVe, and the geometry of meaning</title><link>https://mlmentorship.com/blog/2026-05-07-word-embeddings/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-word-embeddings/</guid><description>Map words to dense vectors so that similar words land near each other. The breakthrough that proved meaning lives in geometry, not symbols.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>LLM Evals: The hardest part of shipping LLMs, and why most teams get it wrong</title><link>https://mlmentorship.com/blog/2026-05-05-llm-evals-the-hardest-part/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-05-llm-evals-the-hardest-part/</guid><description>Your model is only as good as your eval. Your eval is a product. Treat it like one. 
The patterns that separate teams that ship from teams that thrash.</description><pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>Weight initialization (Kaiming, Xavier)</title><link>https://mlmentorship.com/blog/2026-05-05-weight-initialization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-05-weight-initialization/</guid><description>Set the initial variance of each layer&apos;s weights so that activations and gradients neither explode nor vanish through depth. The single most impactful one-line decision in deep nets.</description><pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Walk me through speculative decoding</title><link>https://mlmentorship.com/blog/2026-05-04-walk-through-speculative-decoding/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-04-walk-through-speculative-decoding/</guid><description>The interview signal is whether you understand why decoding is memory-bound and why the verify pass is essentially free.</description><pubDate>Mon, 04 May 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>How do you A/B test a chatbot?</title><link>https://mlmentorship.com/blog/2026-05-01-ab-test-chatbot/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-01-ab-test-chatbot/</guid><description>Chatbot A/B testing has all the hard parts of regular A/B testing plus delayed feedback, conversational state, and metrics that are hard to define.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Activation checkpointing</title><link>https://mlmentorship.com/blog/2026-05-01-activation-checkpointing/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-01-activation-checkpointing/</guid><description>Trade compute for memory: drop activations during the forward pass and recompute them during the backward pass. The cheapest way to fit a larger model on the same GPU.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Proximal Policy Optimization (PPO)</title><link>https://mlmentorship.com/blog/2026-05-01-ppo/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-01-ppo/</guid><description>Constrain policy updates with a clipped surrogate objective. The default actor-critic algorithm in 2026. For robotics, games, and RLHF.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Why does Adam sometimes generalize worse than SGD?</title><link>https://mlmentorship.com/blog/2026-04-29-adam-vs-sgd-generalization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-29-adam-vs-sgd-generalization/</guid><description>Adam usually trains faster but in some settings finds sharper minima with worse generalization. The senior answer names the regimes where this happens and the modern fixes.</description><pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Design real-time personalization</title><link>https://mlmentorship.com/blog/2026-04-28-real-time-personalization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-28-real-time-personalization/</guid><description>Real-time personalization fails most often at the data infrastructure, not the model. 
The senior answer designs the feature freshness and serving stack first.</description><pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>GPU memory hierarchy: HBM, SRAM, and why I/O matters more than FLOPs</title><link>https://mlmentorship.com/blog/2026-04-27-gpu-memory-hierarchy/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-27-gpu-memory-hierarchy/</guid><description>Modern GPUs are memory-bound for almost everything except big matmuls. Understanding HBM vs. SRAM bandwidth is the prerequisite for FlashAttention, KV-cache reasoning, and inference cost models.</description><pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Why is softmax + cross-entropy the right pairing?</title><link>https://mlmentorship.com/blog/2026-04-26-softmax-cross-entropy-pairing/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-26-softmax-cross-entropy-pairing/</guid><description>The gradient simplifies to (p - y), and that&apos;s not a coincidence. The senior answer derives this and connects to GLMs and numerical stability.</description><pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>What L5 vs L6 actually means at FAANG ML</title><link>https://mlmentorship.com/blog/2026-04-23-l5-vs-l6-faang-ml/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-23-l5-vs-l6-faang-ml/</guid><description>Level lines are mostly invisible from the outside but sharp on the inside. A practical calibration of L4 through L7 in ML / Applied Scientist tracks.</description><pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>How do you scope an ambiguous problem?</title><link>https://mlmentorship.com/blog/2026-04-23-scope-ambiguous-problem/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-23-scope-ambiguous-problem/</guid><description>Scoping is the single most important senior skill. The interview tests whether you have a process, not just a definition.</description><pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Continuous batching for LLM serving</title><link>https://mlmentorship.com/blog/2026-04-22-continuous-batching/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-22-continuous-batching/</guid><description>Let new requests join an in-flight batch at every decode step instead of waiting for the slowest one. The other half of why vLLM is fast.</description><pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Value-based vs. policy-based RL</title><link>https://mlmentorship.com/blog/2026-04-22-value-vs-policy-rl/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-22-value-vs-policy-rl/</guid><description>Two paradigms in reinforcement learning. Value-based learns Q(s, a) and acts greedily; policy-based directly parametrizes the policy. When to use which.</description><pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Gaussian mixture models</title><link>https://mlmentorship.com/blog/2026-04-20-gaussian-mixture-models/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-20-gaussian-mixture-models/</guid><description>Model data as a weighted sum of K Gaussians. 
Soft clustering, density estimation, and the canonical EM example.</description><pubDate>Mon, 20 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Decision trees</title><link>https://mlmentorship.com/blog/2026-04-19-decision-trees/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-19-decision-trees/</guid><description>Recursively split the feature space along axis-aligned thresholds chosen to maximize a purity criterion. The base learner of GBDT and random forests.</description><pubDate>Sun, 19 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How to think about LLM inference cost</title><link>https://mlmentorship.com/blog/2026-04-18-llm-inference-cost/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-18-llm-inference-cost/</guid><description>Most teams calculate inference cost by multiplying token price by token count. The actual cost structure has five layers and most of the optimization wins are in the bottom four.</description><pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>SGD with momentum</title><link>https://mlmentorship.com/blog/2026-04-18-sgd-with-momentum/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-18-sgd-with-momentum/</guid><description>Add a moving average of past gradients to the update. Smoother trajectories, faster convergence in narrow valleys, and the foundation of Adam&apos;s first moment.</description><pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Logistic regression</title><link>https://mlmentorship.com/blog/2026-04-17-logistic-regression/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-17-logistic-regression/</guid><description>The linear model adapted to binary classification: pass a linear combination through a sigmoid, train by maximum likelihood. Still the strongest non-trivial baseline for tabular classification.</description><pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Negative sampling strategies: what actually matters</title><link>https://mlmentorship.com/blog/2026-04-17-negative-sampling-strategies/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-17-negative-sampling-strategies/</guid><description>Choice of negatives often matters more than choice of model. The senior answer ranks the strategies (in-batch, hard, BM25-mined, model-mined) and explains the trade-offs.</description><pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Hidden Markov models</title><link>https://mlmentorship.com/blog/2026-04-16-hidden-markov-models/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-16-hidden-markov-models/</guid><description>A latent Markov chain emits observations through a per-state distribution. Forward-backward, Viterbi, Baum-Welch. The classical sequence model toolkit.</description><pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>System design case study: building personalized search ranking</title><link>https://mlmentorship.com/blog/2026-04-14-personalized-search-ranking/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-14-personalized-search-ranking/</guid><description>An end-to-end design of a personalized search ranking system at scale, from problem framing through deployment and monitoring. 
The same template works for most ML system design interviews.</description><pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate><category>guides</category><category>system-design</category></item><item><title>Explain backprop in your own words</title><link>https://mlmentorship.com/blog/2026-04-13-explain-backprop/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-13-explain-backprop/</guid><description>The textbook answer is the chain rule. The senior answer is what backprop is doing as a system: a reverse-mode auto-diff pass that reuses intermediate computations to get all gradients at roughly the cost of one extra forward pass.</description><pubDate>Mon, 13 Apr 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Z-loss</title><link>https://mlmentorship.com/blog/2026-04-12-z-loss/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-12-z-loss/</guid><description>An auxiliary loss term that penalizes the squared log-partition function of the softmax. Started as a stability hack for logit blowup. Now used as the default regularizer on logit scale during long or deep cooldowns.</description><pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>FlashAttention</title><link>https://mlmentorship.com/blog/2026-04-10-flashattention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-10-flashattention/</guid><description>I/O-aware exact attention. Replaces the O(n²) HBM traffic with a tiled streaming softmax in SRAM. The single most important kernel-level optimization in modern transformers.</description><pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Pipeline parallelism</title><link>https://mlmentorship.com/blog/2026-04-06-pipeline-parallelism/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-06-pipeline-parallelism/</guid><description>Split the model across GPUs by layer; pipeline micro-batches through the stages. The way to scale across slow interconnects when TP isn&apos;t viable.</description><pubDate>Mon, 06 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>KV cache: how LLM inference avoids quadratic decode cost</title><link>https://mlmentorship.com/blog/2026-04-04-kv-cache/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-04-kv-cache/</guid><description>The single most important optimization in autoregressive decoding. Without it, generating 1000 tokens would cost O(1000³) attention operations instead of O(1000²).</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Naive Bayes</title><link>https://mlmentorship.com/blog/2026-04-04-naive-bayes/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-04-naive-bayes/</guid><description>A trivially simple generative classifier that assumes features are conditionally independent given the class. Fast, parameter-light, surprisingly hard to beat on text.</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Autoregressive vs. diffusion generation</title><link>https://mlmentorship.com/blog/2026-04-02-autoregressive-vs-diffusion/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-02-autoregressive-vs-diffusion/</guid><description>Two paradigms for generative modeling: predict the next element step-by-step (autoregressive) or iteratively denoise from pure noise (diffusion). 
Different costs, different strengths.</description><pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Object detection: Faster R-CNN, YOLO, DETR</title><link>https://mlmentorship.com/blog/2026-03-29-object-detection-overview/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-29-object-detection-overview/</guid><description>Localize and classify objects in an image. The three main architectural families: two-stage proposal-based, one-stage grid-based, and transformer-based.</description><pubDate>Sun, 29 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Mixture of Experts (MoE)</title><link>https://mlmentorship.com/blog/2026-03-27-mixture-of-experts/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-27-mixture-of-experts/</guid><description>Replace one large feed-forward block with N smaller experts and a router that activates only k of them per token. Trades parameter count for compute.</description><pubDate>Fri, 27 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>DBSCAN</title><link>https://mlmentorship.com/blog/2026-03-25-dbscan/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-25-dbscan/</guid><description>Density-based clustering: form clusters from regions of high point density, label sparse points as noise. Handles arbitrary cluster shapes; no k to specify.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>ROC, PR curves, and AUC</title><link>https://mlmentorship.com/blog/2026-03-25-roc-pr-auc/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-25-roc-pr-auc/</guid><description>What ROC-AUC and PR-AUC measure, when to use which, and why ROC-AUC is misleading on heavy class imbalance.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Exploding and vanishing gradients</title><link>https://mlmentorship.com/blog/2026-03-24-exploding-vanishing-gradients/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-24-exploding-vanishing-gradients/</guid><description>Why deep networks were untrainable before residuals, normalization, and ReLU. The math of gradient magnitudes through depth and the standard fixes.</description><pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How do you choose a learning rate?</title><link>https://mlmentorship.com/blog/2026-03-22-how-to-choose-learning-rate/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-22-how-to-choose-learning-rate/</guid><description>The right answer is a procedure, not a number. The wrong answers are &apos;use the default&apos; and &apos;try a few values.&apos;</description><pubDate>Sun, 22 Mar 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>The 5 things every applied scientist interview is actually testing for</title><link>https://mlmentorship.com/blog/2026-03-20-five-things-as-interview-tests/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-20-five-things-as-interview-tests/</guid><description>Strip away the questions and the role-specific jargon. Every senior AS loop is checking the same five things. 
If you know what they are, the prep gets sharper.</description><pubDate>Fri, 20 Mar 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>RoPE, ALiBi, and the modern positional encoding landscape</title><link>https://mlmentorship.com/blog/2026-03-15-positional-encoding/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-15-positional-encoding/</guid><description>Sinusoidal positional encoding is in the original transformer paper and not in any modern LLM. Here&apos;s what replaced it and why.</description><pubDate>Sun, 15 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How do you choose a loss function?</title><link>https://mlmentorship.com/blog/2026-03-09-how-to-choose-loss-function/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-09-how-to-choose-loss-function/</guid><description>The loss is the objective. Picking the wrong one means optimizing for the wrong thing, no matter how well you train. The senior answer derives the loss from the problem, not from a list.</description><pubDate>Mon, 09 Mar 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Design a system for safe LLM deployment in healthcare</title><link>https://mlmentorship.com/blog/2026-03-07-llm-deployment-healthcare/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-07-llm-deployment-healthcare/</guid><description>Healthcare adds three constraints on top of normal LLM deployment: regulatory compliance, low tolerance for harm, and a workflow that already has clinicians as the final decision-maker.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Probabilistic graphical models</title><link>https://mlmentorship.com/blog/2026-03-03-graphical-models/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-03-graphical-models/</guid><description>Express joint distributions as graphs whose structure encodes conditional independence. Bayesian networks (directed) and Markov random fields (undirected).</description><pubDate>Tue, 03 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Calibration: when your model says 80% it should be right 80% of the time</title><link>https://mlmentorship.com/blog/2026-03-01-calibration/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-01-calibration/</guid><description>Accuracy isn&apos;t enough; you also want predictions to mean what they say. Calibration is the difference.</description><pubDate>Sun, 01 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Applied Scientist vs MLE vs Research Engineer: what these roles actually do</title><link>https://mlmentorship.com/blog/2026-02-28-as-vs-mle-vs-re/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-28-as-vs-mle-vs-re/</guid><description>The role taxonomy is confusing because companies use the same titles to mean different things. Here&apos;s the actual decomposition, and which one you should target.</description><pubDate>Sat, 28 Feb 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>Weight decay vs. L2 regularization</title><link>https://mlmentorship.com/blog/2026-02-27-weight-decay-vs-l2/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-27-weight-decay-vs-l2/</guid><description>L2 adds ½λ‖θ‖² to the loss; weight decay shrinks θ multiplicatively at each step. They are equivalent under SGD but not under Adam. 
That difference is why AdamW exists.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Label smoothing</title><link>https://mlmentorship.com/blog/2026-02-24-label-smoothing/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-24-label-smoothing/</guid><description>Replace one-hot targets with a softened distribution that puts ε mass on the wrong classes. Improves calibration, sometimes hurts retrieval.</description><pubDate>Tue, 24 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Matrix calculus for ML</title><link>https://mlmentorship.com/blog/2026-02-23-matrix-calculus/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-23-matrix-calculus/</guid><description>Gradients, Jacobians, and Hessians for vector- and matrix-valued functions. The minimum needed to derive backprop and second-order methods.</description><pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Embedding spaces and similarity metrics</title><link>https://mlmentorship.com/blog/2026-02-22-embedding-spaces-and-similarity/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-22-embedding-spaces-and-similarity/</guid><description>How learned vector representations encode meaning, and why cosine similarity is the default metric for retrieval and recsys.</description><pubDate>Sun, 22 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Q-learning</title><link>https://mlmentorship.com/blog/2026-02-22-q-learning/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-22-q-learning/</guid><description>Learn the action-value function Q(s, a) by Bellman backups. The foundation of value-based RL. DQN, Rainbow, and the original Atari breakthroughs.</description><pubDate>Sun, 22 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Gradient boosting (xgboost, lightgbm, catboost)</title><link>https://mlmentorship.com/blog/2026-02-21-gradient-boosting/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-21-gradient-boosting/</guid><description>Train trees sequentially, each one fitting the negative gradient of the loss with respect to the current ensemble&apos;s prediction. The dominant tabular learner in 2026.</description><pubDate>Sat, 21 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Explain backprop through time</title><link>https://mlmentorship.com/blog/2026-02-16-bptt-backprop-through-time/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-16-bptt-backprop-through-time/</guid><description>BPTT is just backprop on the unrolled computation graph of a recurrent network. The interview signal is whether you understand truncation and what it costs.</description><pubDate>Mon, 16 Feb 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Matrix factorization for recsys</title><link>https://mlmentorship.com/blog/2026-02-16-matrix-factorization-recsys/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-16-matrix-factorization-recsys/</guid><description>Decompose the user-item interaction matrix into user and item embeddings whose dot product approximates the rating. 
The original collaborative filtering.</description><pubDate>Mon, 16 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Design Amazon&apos;s people also bought</title><link>https://mlmentorship.com/blog/2026-02-13-people-also-bought/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-13-people-also-bought/</guid><description>A simple-sounding feature with deep recsys ground underneath. The senior answer chooses between item-item collaborative filtering, embedding similarity, and learned co-purchase models, with explicit handling of feedback loops.</description><pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Speculative decoding</title><link>https://mlmentorship.com/blog/2026-02-13-speculative-decoding/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-13-speculative-decoding/</guid><description>Break the autoregressive serial bottleneck without changing the output distribution. 2-3× inference speedup, free.</description><pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Gradient accumulation</title><link>https://mlmentorship.com/blog/2026-02-09-gradient-accumulation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-09-gradient-accumulation/</guid><description>Run several forward-backward passes before each optimizer step to simulate a larger effective batch size without the memory cost.</description><pubDate>Mon, 09 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Sparse attention (BigBird, Longformer)</title><link>https://mlmentorship.com/blog/2026-02-08-sparse-attention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-08-sparse-attention/</guid><description>Replace the dense n×n attention mask with a sparse pattern that has O(n) non-zeros while preserving information flow across the full sequence.</description><pubDate>Sun, 08 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How would you reduce LLM inference cost by 10x?</title><link>https://mlmentorship.com/blog/2026-02-05-reduce-llm-inference-cost-10x/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-05-reduce-llm-inference-cost-10x/</guid><description>The cost-engineering question. The L6 answer doesn&apos;t pick a technique, it diagnoses where the cost is, then picks five.</description><pubDate>Thu, 05 Feb 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Bayesian vs frequentist: a practitioner&apos;s framing</title><link>https://mlmentorship.com/blog/2026-02-02-bayesian-vs-frequentist/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-02-bayesian-vs-frequentist/</guid><description>The textbook distinction is philosophical. The practitioner distinction is whether you can sample from a posterior cheaply, and whether you need uncertainty for downstream decisions.</description><pubDate>Mon, 02 Feb 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Floating-point formats: FP32, FP16, BF16, FP8, TF32</title><link>https://mlmentorship.com/blog/2026-02-01-floating-point-formats/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-01-floating-point-formats/</guid><description>How modern accelerators trade precision for speed. 
The bit layouts of every numeric format that appears in deep learning.</description><pubDate>Sun, 01 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Regularization: L1, L2, dropout, early stopping, and the modern view</title><link>https://mlmentorship.com/blog/2026-01-30-regularization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-30-regularization/</guid><description>The classical regularizers + the modern reality that SGD&apos;s noise is itself a regularizer. The hierarchy of choices when your model is overfitting.</description><pubDate>Fri, 30 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Random forests</title><link>https://mlmentorship.com/blog/2026-01-27-random-forests/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-27-random-forests/</guid><description>Bag deep decision trees plus random feature subsets per split. Variance averaging beats any single tree; the dominant out-of-the-box ensemble before GBDT.</description><pubDate>Tue, 27 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Sequence packing with block-diagonal masks</title><link>https://mlmentorship.com/blog/2026-01-25-sequence-packing/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-25-sequence-packing/</guid><description>Concatenate multiple short examples into one fixed-length sequence to eliminate padding waste. The single largest throughput win for training on skewed-length corpora.</description><pubDate>Sun, 25 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Design a feature store from scratch</title><link>https://mlmentorship.com/blog/2026-01-23-design-feature-store/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-23-design-feature-store/</guid><description>A feature store solves training-serving skew, feature reuse, and lineage. The senior answer explains why each property matters and what minimum viable looks like.</description><pubDate>Fri, 23 Jan 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Positive (semi-)definite matrices</title><link>https://mlmentorship.com/blog/2026-01-23-positive-definite-matrices/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-23-positive-definite-matrices/</guid><description>Matrices that define inner products and proper covariances. The geometry of PSD: ellipsoids, not arbitrary shapes.</description><pubDate>Fri, 23 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Lessons from Marin 8B: what an open pretraining log actually teaches you</title><link>https://mlmentorship.com/blog/2026-01-21-lessons-from-marin-8b/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-21-lessons-from-marin-8b/</guid><description>Marin trained the first open-source 8B model to beat Llama 3.1 8B and published every mistake. The transferable lessons aren&apos;t about TPUs. They&apos;re about how to run pretraining like a science.</description><pubDate>Wed, 21 Jan 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>Tokenization: BPE, WordPiece, and the LLM era</title><link>https://mlmentorship.com/blog/2026-01-20-tokenization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-20-tokenization/</guid><description>The critical input layer between text and model. 
Tokenization mismatch is a frequent source of production LLM bugs.</description><pubDate>Tue, 20 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Activation functions</title><link>https://mlmentorship.com/blog/2026-01-19-activation-functions/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-19-activation-functions/</guid><description>ReLU, GELU, swish, sigmoid, tanh. What each does, why GELU/swish replaced ReLU in transformers, and when to use which.</description><pubDate>Mon, 19 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Bias and variance of estimators</title><link>https://mlmentorship.com/blog/2026-01-13-bias-variance-of-estimators/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-13-bias-variance-of-estimators/</guid><description>An estimator has bias (systematic error) and variance (sample-to-sample wobble). Mean-squared error decomposes into the two.</description><pubDate>Tue, 13 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How would you evaluate an LLM application you&apos;ve built?</title><link>https://mlmentorship.com/blog/2026-01-10-how-would-you-evaluate-an-llm-application/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-10-how-would-you-evaluate-an-llm-application/</guid><description>A level-defining question. The same words elicit a junior, senior, or staff answer. The rubric below shows the differences.</description><pubDate>Sat, 10 Jan 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Learning rate schedules: warmup and cosine decay</title><link>https://mlmentorship.com/blog/2026-01-05-learning-rate-schedules/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-05-learning-rate-schedules/</guid><description>Why almost every modern training run linearly warms up the LR over a few hundred steps and then decays it on a cosine to near zero.</description><pubDate>Mon, 05 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Generative adversarial networks (GANs)</title><link>https://mlmentorship.com/blog/2026-01-01-gans-overview/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-01-gans-overview/</guid><description>Two networks compete: a generator produces samples, a discriminator distinguishes them from real data. Sharp samples, training instability, mostly displaced by diffusion in 2026.</description><pubDate>Thu, 01 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Universal approximation theorem</title><link>https://mlmentorship.com/blog/2025-12-29-universal-approximation-theorem/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-29-universal-approximation-theorem/</guid><description>A neural network with one hidden layer and enough units can approximate any continuous function on a compact domain. What it does and doesn&apos;t say about deep learning.</description><pubDate>Mon, 29 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>BatchNorm vs LayerNorm (and the transformer wrinkle)</title><link>https://mlmentorship.com/blog/2025-12-28-batchnorm-vs-layernorm/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-28-batchnorm-vs-layernorm/</guid><description>These look similar and aren&apos;t. Mixing them up in interviews is one of the cheapest ways to lose level points. 
Here&apos;s the right mental model.</description><pubDate>Sun, 28 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>RLHF, DPO, and the alignment training stack</title><link>https://mlmentorship.com/blog/2025-12-28-rlhf-and-dpo/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-28-rlhf-and-dpo/</guid><description>How LLMs get from &apos;next-token predictor&apos; to &apos;helpful assistant.&apos; The post-training pipeline in 2026.</description><pubDate>Sun, 28 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Design fraud detection for a payment company</title><link>https://mlmentorship.com/blog/2025-12-25-design-fraud-detection/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-25-design-fraud-detection/</guid><description>Fraud has the worst data of any ML problem: heavily imbalanced, biased labels, adversarial actors, and direct money on the line. The senior answer respects all four.</description><pubDate>Thu, 25 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Derive logistic regression from MLE</title><link>https://mlmentorship.com/blog/2025-12-23-derive-logistic-regression/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-23-derive-logistic-regression/</guid><description>Standard math-screen question. The senior signal is whether you can derive it cleanly and connect MLE to cross-entropy.</description><pubDate>Tue, 23 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Grouped-query and multi-query attention (GQA, MQA)</title><link>https://mlmentorship.com/blog/2025-12-23-grouped-query-attention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-23-grouped-query-attention/</guid><description>Share K and V heads across query heads to shrink the KV cache 4-8x with negligible quality loss. Standard in modern decoder LLMs.</description><pubDate>Tue, 23 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Tensor parallelism</title><link>https://mlmentorship.com/blog/2025-12-23-tensor-parallelism/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-23-tensor-parallelism/</guid><description>Split a single matrix multiplication across multiple GPUs. The way to fit one transformer layer that doesn&apos;t fit on a single device.</description><pubDate>Tue, 23 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>A/B testing for ML systems</title><link>https://mlmentorship.com/blog/2025-12-22-ab-testing-for-ml/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-22-ab-testing-for-ml/</guid><description>The framework for proving a model change actually helps. Statistical power, novelty effects, network effects, all the things people get wrong.</description><pubDate>Mon, 22 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Central limit theorem</title><link>https://mlmentorship.com/blog/2025-12-21-central-limit-theorem/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-21-central-limit-theorem/</guid><description>Sums of many independent random variables become Gaussian. Why nearly every error bar in ML and statistics is computed from a normal distribution.</description><pubDate>Sun, 21 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Prefill vs. decode: the two phases of LLM inference</title><link>https://mlmentorship.com/blog/2025-12-14-prefill-vs-decode/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-14-prefill-vs-decode/</guid><description>LLM inference has two cost regimes with very different bottlenecks. Mixing them up leads to wrong cost models and bad serving decisions.</description><pubDate>Sun, 14 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Design Spotify&apos;s homepage</title><link>https://mlmentorship.com/blog/2025-12-12-design-spotify-homepage/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-12-design-spotify-homepage/</guid><description>A multi-shelf, multi-objective recommendation surface. The senior answer scopes the shelves first, then designs each as its own ranker with a meta-layer above.</description><pubDate>Fri, 12 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Recsys in the LLM era: what changes?</title><link>https://mlmentorship.com/blog/2025-12-12-recsys-llm-era/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-12-recsys-llm-era/</guid><description>Most of recsys hasn&apos;t changed; LLMs add new capabilities at specific stages. The senior answer names which stages benefit and which don&apos;t.</description><pubDate>Fri, 12 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Vision transformers (ViT)</title><link>https://mlmentorship.com/blog/2025-12-10-vision-transformers/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-10-vision-transformers/</guid><description>Apply a standard transformer to a sequence of image patches. Beats CNNs at scale; the dominant backbone for foundation vision models in 2026.</description><pubDate>Wed, 10 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Mixed precision: what&apos;s actually happening?</title><link>https://mlmentorship.com/blog/2025-12-08-mixed-precision-deep/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-08-mixed-precision-deep/</guid><description>Beyond &apos;use BF16&apos;. The senior answer explains what stays in FP32, why loss scaling exists for FP16, and the memory split.</description><pubDate>Mon, 08 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>What&apos;s the most over-rated technique in ML right now?</title><link>https://mlmentorship.com/blog/2025-12-07-most-overrated-technique/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-07-most-overrated-technique/</guid><description>A trap question that rewards taste. Strong opinions, defended with reasoning, are the senior signal. 
Weak opinions or &apos;I don&apos;t know&apos; both lose.</description><pubDate>Sun, 07 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Linear attention (Linformer, Performer, kernel methods)</title><link>https://mlmentorship.com/blog/2025-12-06-linear-attention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-06-linear-attention/</guid><description>Approximate the softmax attention matrix with a low-rank or kernel factorization so cost is linear in sequence length.</description><pubDate>Sat, 06 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>L1 vs L2 regularization, beyond the formula</title><link>https://mlmentorship.com/blog/2025-12-04-l1-vs-l2-beyond-formula/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-04-l1-vs-l2-beyond-formula/</guid><description>To most candidates the math is identical: penalty terms in the loss. The senior signal is the Bayesian interpretation, the optimization geometry, and when each is the right choice.</description><pubDate>Thu, 04 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Exponential family</title><link>https://mlmentorship.com/blog/2025-12-03-exponential-family/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-03-exponential-family/</guid><description>A unified family of distributions (Gaussian, Bernoulli, Poisson, Beta, Gamma, etc.) with shared properties: sufficient statistics, conjugate priors, simple MLE.</description><pubDate>Wed, 03 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Determinant and volume</title><link>https://mlmentorship.com/blog/2025-12-02-determinant-and-volume/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-02-determinant-and-volume/</guid><description>The determinant of a matrix is the signed volume scaling factor of the linear map. Zero determinant means the map collapses dimensions.</description><pubDate>Tue, 02 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Cross-validation strategies</title><link>https://mlmentorship.com/blog/2025-12-01-cross-validation-strategies/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-01-cross-validation-strategies/</guid><description>Hold-out, k-fold, stratified, grouped, and time-series CV. And when each one is and isn&apos;t appropriate.</description><pubDate>Mon, 01 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Debug this training loop</title><link>https://mlmentorship.com/blog/2025-12-01-debug-training-loop/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-01-debug-training-loop/</guid><description>A live coding question built around a paste of buggy training code. 
The senior signal is the order in which you find bugs and what your debugging procedure looks like.</description><pubDate>Mon, 01 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>How would you do cold-start for a new user?</title><link>https://mlmentorship.com/blog/2025-11-30-cold-start-new-user/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-30-cold-start-new-user/</guid><description>Cold-start is solved by combining minimal explicit signal, demographic and contextual fallbacks, and aggressive exploration in the first few sessions.</description><pubDate>Sun, 30 Nov 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Walk me through how you&apos;d train a 100B parameter model</title><link>https://mlmentorship.com/blog/2025-11-28-train-100b-model/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-28-train-100b-model/</guid><description>The question is about parallelism and memory, not about modeling. The L6 answer combines data, tensor, pipeline, and FSDP/ZeRO sharding into a coherent strategy.</description><pubDate>Fri, 28 Nov 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Implement attention from scratch</title><link>https://mlmentorship.com/blog/2025-11-27-implement-attention-from-scratch/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-27-implement-attention-from-scratch/</guid><description>The coding question that doubles as a depth check. The code is short; the conversation around it tells the level.</description><pubDate>Thu, 27 Nov 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Markov chains</title><link>https://mlmentorship.com/blog/2025-11-27-markov-chains/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-27-markov-chains/</guid><description>Stochastic processes where the future depends only on the present, not the past. Foundation of HMMs, MCMC, and many sequence models.</description><pubDate>Thu, 27 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Residual connections</title><link>https://mlmentorship.com/blog/2025-11-27-residual-connections/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-27-residual-connections/</guid><description>Add the input of a block to its output. Lets gradients flow unimpeded through depth and made networks deeper than 30 layers practical for the first time.</description><pubDate>Thu, 27 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Diffusion models</title><link>https://mlmentorship.com/blog/2025-11-22-diffusion-models/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-22-diffusion-models/</guid><description>Learn to invert a fixed noising process. The dominant generative paradigm for images, audio, video, and molecules in 2026.</description><pubDate>Sat, 22 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Variational autoencoders (VAE)</title><link>https://mlmentorship.com/blog/2025-11-22-variational-autoencoders/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-22-variational-autoencoders/</guid><description>Encode inputs to a latent distribution, decode samples back, optimize evidence lower bound. 
The cleanest gateway to deep generative models.</description><pubDate>Sat, 22 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Policy gradient methods</title><link>https://mlmentorship.com/blog/2025-11-20-policy-gradient/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-20-policy-gradient/</guid><description>Directly optimize the policy by following the gradient of expected return. REINFORCE, actor-critic, and the foundation of modern RL.</description><pubDate>Thu, 20 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>SVM and the kernel trick</title><link>https://mlmentorship.com/blog/2025-11-16-svm-and-kernels/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-16-svm-and-kernels/</guid><description>Maximum-margin classifier with a kernel that lets it operate in implicit high-dimensional feature spaces. Beautiful theory; less common in 2026 production.</description><pubDate>Sun, 16 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Bayes&apos; rule and the posterior</title><link>https://mlmentorship.com/blog/2025-11-15-bayes-rule-and-posterior/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-15-bayes-rule-and-posterior/</guid><description>How to update beliefs given evidence: posterior ∝ likelihood × prior. The foundation of Bayesian inference, naive Bayes, and probabilistic graphical models.</description><pubDate>Sat, 15 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Long-context LLMs: training and serving techniques</title><link>https://mlmentorship.com/blog/2025-11-15-long-context-llms/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-15-long-context-llms/</guid><description>What makes a 1M-token context model work. Position-encoding extension, attention kernels, KV-cache management, and the tradeoffs.</description><pubDate>Sat, 15 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>FSDP and ZeRO: sharding optimizer state, gradients, and parameters</title><link>https://mlmentorship.com/blog/2025-11-14-fsdp-and-zero/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-14-fsdp-and-zero/</guid><description>How modern training scales beyond a single GPU&apos;s memory by partitioning the optimizer state, gradients, and parameters across the data-parallel group.</description><pubDate>Fri, 14 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How do you evaluate an agent?</title><link>https://mlmentorship.com/blog/2025-11-11-evaluate-an-agent/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-11-evaluate-an-agent/</guid><description>Agent eval is harder than chat eval because there are intermediate steps, tool calls, and long-horizon outcomes. The senior answer evaluates trajectories, not just final outputs.</description><pubDate>Tue, 11 Nov 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Microannealing and midtraining</title><link>https://mlmentorship.com/blog/2025-11-11-microannealing/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-11-microannealing/</guid><description>A short cooldown applied to a mostly-trained checkpoint with a small fraction of candidate data mixed in. 
The standard mid-training probe for whether a new dataset is worth including.</description><pubDate>Tue, 11 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Expectation-Maximization (EM)</title><link>https://mlmentorship.com/blog/2025-11-09-expectation-maximization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-09-expectation-maximization/</guid><description>Iterate between estimating latent variables given parameters (E-step) and updating parameters given latents (M-step). The standard tool for latent-variable MLE when the latents are unobserved.</description><pubDate>Sun, 09 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Backpropagation</title><link>https://mlmentorship.com/blog/2025-11-06-backpropagation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-06-backpropagation/</guid><description>Reverse-mode automatic differentiation applied to a computation graph. The algorithm that computes gradients for every parameter in one backward pass.</description><pubDate>Thu, 06 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>ResNet</title><link>https://mlmentorship.com/blog/2025-11-05-resnet/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-05-resnet/</guid><description>Residual connections enabled networks deeper than 30 layers to train. Still the dominant backbone for transfer learning in 2026.</description><pubDate>Wed, 05 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Transformer architecture: a senior-level mental model</title><link>https://mlmentorship.com/blog/2025-11-05-transformer-architecture/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-05-transformer-architecture/</guid><description>Strip away the diagram clutter. A transformer is a stack of (residual + LayerNorm + (attention or FFN)) blocks. Understanding why each piece is there is more important than memorizing the diagram.</description><pubDate>Wed, 05 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Confusion matrix and classification metrics</title><link>https://mlmentorship.com/blog/2025-11-04-confusion-matrix-and-classification-metrics/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-04-confusion-matrix-and-classification-metrics/</guid><description>The 2x2 (or KxK) table of predictions vs. truth that every classification metric is computed from. The Rosetta stone of binary classification.</description><pubDate>Tue, 04 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Explain the reparameterization trick</title><link>https://mlmentorship.com/blog/2025-10-29-reparameterization-trick/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-29-reparameterization-trick/</guid><description>How VAEs propagate gradients through a sampling step. The senior answer explains the why (you can&apos;t differentiate through a sample) and the how (move the randomness outside the parameters).</description><pubDate>Wed, 29 Oct 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>k-means clustering</title><link>https://mlmentorship.com/blog/2025-10-26-k-means-clustering/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-26-k-means-clustering/</guid><description>Partition n points into k clusters by minimizing within-cluster variance. 
Lloyd&apos;s algorithm: alternate assigning points to nearest center and recomputing centers.</description><pubDate>Sun, 26 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Normalizing flows</title><link>https://mlmentorship.com/blog/2025-10-26-normalizing-flows/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-26-normalizing-flows/</guid><description>Generative models built from invertible transformations. Compute exact likelihoods and sample efficiently, at the cost of architectural restrictions.</description><pubDate>Sun, 26 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>KL divergence</title><link>https://mlmentorship.com/blog/2025-10-23-kl-divergence/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-23-kl-divergence/</guid><description>Asymmetric distance between probability distributions. Cross-entropy minus entropy. The mathematical glue holding most of probabilistic ML together.</description><pubDate>Thu, 23 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Ranking metrics: NDCG, MAP, MRR</title><link>https://mlmentorship.com/blog/2025-10-23-ranking-metrics-ndcg-map-mrr/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-23-ranking-metrics-ndcg-map-mrr/</guid><description>Beyond binary precision-recall: how to measure ranking quality when order matters and labels are graded.</description><pubDate>Thu, 23 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Design YouTube&apos;s recommender</title><link>https://mlmentorship.com/blog/2025-10-20-design-youtube-recommender/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-20-design-youtube-recommender/</guid><description>The canonical recsys design question. The real test is whether you&apos;ll dive into model architecture or scope the problem first.</description><pubDate>Mon, 20 Oct 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Mixup and CutMix</title><link>https://mlmentorship.com/blog/2025-10-17-mixup-and-cutmix/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-17-mixup-and-cutmix/</guid><description>Two data-augmentation schemes that train on convex combinations of pairs of inputs and their labels. Strong regularization for image classification; sometimes used in audio and tabular.</description><pubDate>Fri, 17 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How do you deal with class imbalance in 2026?</title><link>https://mlmentorship.com/blog/2025-10-13-class-imbalance/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-13-class-imbalance/</guid><description>Class weighting and SMOTE are the textbook answers and often the wrong ones. The senior answer matches the technique to the imbalance ratio, the cost asymmetry, and the metric you actually care about.</description><pubDate>Mon, 13 Oct 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>The attention mechanism</title><link>https://mlmentorship.com/blog/2025-10-12-attention-mechanism/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-12-attention-mechanism/</guid><description>Compute a weighted sum of values, weights derived from query-key similarity. 
The single operation that powers transformers, retrieval, and most of modern ML.</description><pubDate>Sun, 12 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How do you decide what to work on?</title><link>https://mlmentorship.com/blog/2025-10-11-decide-what-to-work-on/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-11-decide-what-to-work-on/</guid><description>The senior signal here is that you have an explicit prioritization framework, not just a list of interests. The L6 answer connects user value, technical leverage, and team strategy.</description><pubDate>Sat, 11 Oct 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Perplexity and bits per token</title><link>https://mlmentorship.com/blog/2025-10-02-perplexity-and-bits-per-token/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-02-perplexity-and-bits-per-token/</guid><description>The standard intrinsic metric for language models. What it measures, what units to use, and why it&apos;s a poor end-product evaluation.</description><pubDate>Thu, 02 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Monte Carlo and importance sampling</title><link>https://mlmentorship.com/blog/2025-10-01-monte-carlo-and-importance-sampling/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-01-monte-carlo-and-importance-sampling/</guid><description>Estimate expectations by averaging over random samples. The simplest way to compute integrals you can&apos;t compute analytically.</description><pubDate>Wed, 01 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Dropout</title><link>https://mlmentorship.com/blog/2025-09-30-dropout/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-30-dropout/</guid><description>Randomly zero out a fraction of activations during training. The simplest stochastic regularizer; still standard in vision and many NLP architectures.</description><pubDate>Tue, 30 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Rotary position embeddings (RoPE)</title><link>https://mlmentorship.com/blog/2025-09-29-rotary-position-embeddings/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-29-rotary-position-embeddings/</guid><description>The dominant position encoding for modern LLMs. Encodes relative position by rotating Q and K in 2D subspaces, enabling clean context extrapolation.</description><pubDate>Mon, 29 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Design a RAG system for legal documents</title><link>https://mlmentorship.com/blog/2025-09-26-rag-for-legal-docs/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-26-rag-for-legal-docs/</guid><description>Legal RAG amplifies every standard RAG concern: precise citations, no hallucinations, regulated domain, dense documents with structure. The senior answer addresses each.</description><pubDate>Fri, 26 Sep 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Maximum likelihood estimation</title><link>https://mlmentorship.com/blog/2025-09-22-maximum-likelihood-estimation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-22-maximum-likelihood-estimation/</guid><description>The dominant statistical principle: pick parameters that make the observed data most probable. 
Reduces to minimizing cross-entropy for classification and MSE for Gaussian regression.</description><pubDate>Mon, 22 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>All-reduce and other collectives</title><link>https://mlmentorship.com/blog/2025-09-22-all-reduce-and-collectives/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-22-all-reduce-and-collectives/</guid><description>The communication primitives behind every distributed training job. All-reduce, all-gather, reduce-scatter, broadcast. What they do, costs, and when each is used.</description><pubDate>Mon, 22 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Mixed precision training: FP16, BF16, and FP8</title><link>https://mlmentorship.com/blog/2025-09-21-mixed-precision-training/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-21-mixed-precision-training/</guid><description>How modern transformers train at 2-4× the throughput of FP32 without quality loss. The bit layouts matter; the loss-scaling recipe matters more.</description><pubDate>Sun, 21 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Precision, recall, and F1</title><link>https://mlmentorship.com/blog/2025-09-20-precision-recall-f1/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-20-precision-recall-f1/</guid><description>The three metrics every classifier interview asks about. Their definitions, when to optimize which, and the F-beta generalization.</description><pubDate>Sat, 20 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Linear regression</title><link>https://mlmentorship.com/blog/2025-09-18-linear-regression/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-18-linear-regression/</guid><description>Predict a continuous target as a linear combination of features by minimizing squared error. Closed-form solution, MLE under Gaussian noise, and the foundation everything else builds on.</description><pubDate>Thu, 18 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Adam, AdamW, and the modern optimizer landscape</title><link>https://mlmentorship.com/blog/2025-09-15-adam-and-adamw/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-15-adam-and-adamw/</guid><description>Why Adam works, why AdamW is the version you actually want, and what&apos;s changed in the optimizer landscape since 2018.</description><pubDate>Mon, 15 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>CNN architecture</title><link>https://mlmentorship.com/blog/2025-09-15-cnn-architecture/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-15-cnn-architecture/</guid><description>Convolutions encode translation equivariance and locality. The structural inductive bias that powered the deep learning revolution in vision.</description><pubDate>Mon, 15 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Matrices as linear maps</title><link>https://mlmentorship.com/blog/2025-09-15-matrices-as-linear-maps/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-15-matrices-as-linear-maps/</guid><description>A matrix is a linear function from one vector space to another. Every operation in ML. Projection, rotation, basis change, gradient flow. 
All of it is matrix multiplication.</description><pubDate>Mon, 15 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>SVD and PCA</title><link>https://mlmentorship.com/blog/2025-09-09-svd-and-pca/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-09-svd-and-pca/</guid><description>The singular value decomposition factorizes any matrix into rotation × stretching × rotation. PCA is SVD applied to mean-centered data.</description><pubDate>Tue, 09 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Encoder-decoder architectures</title><link>https://mlmentorship.com/blog/2025-09-06-encoder-decoder-architectures/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-06-encoder-decoder-architectures/</guid><description>An encoder summarizes the input into a representation; a decoder generates the output conditioned on it. The structure behind translation, T5, summarization, and many multimodal models.</description><pubDate>Sat, 06 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Cross-entropy and softmax</title><link>https://mlmentorship.com/blog/2025-09-05-cross-entropy-softmax/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-05-cross-entropy-softmax/</guid><description>The pairing isn&apos;t arbitrary. Cross-entropy is the negative log-likelihood under a categorical distribution, and the softmax+CE gradient simplifies to (p − y), which is why it&apos;s stable.</description><pubDate>Fri, 05 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>RAG: retrieval-augmented generation</title><link>https://mlmentorship.com/blog/2025-09-02-rag-overview/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-02-rag-overview/</guid><description>The standard pattern for grounding LLMs in your own data. Reference page; the full essay is linked at the bottom.</description><pubDate>Tue, 02 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>WSD and WSD-S learning rate schedules</title><link>https://mlmentorship.com/blog/2025-08-31-wsd-and-wsd-s/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-31-wsd-and-wsd-s/</guid><description>Warmup-Stable-Decay holds the LR flat for most of training and decays at the end. WSD-S adds cyclic decay-and-rewarm probes. Both are designed for pretraining where you don&apos;t know the total token budget upfront.</description><pubDate>Sun, 31 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Two-tower retrieval</title><link>https://mlmentorship.com/blog/2025-08-23-two-tower-retrieval/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-23-two-tower-retrieval/</guid><description>Encode queries and items with separate networks into a shared embedding space; retrieve by approximate nearest neighbors. The default architecture for industrial recommenders and search.</description><pubDate>Sat, 23 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Eigenvalues and the spectral theorem</title><link>https://mlmentorship.com/blog/2025-08-21-eigenvalues-and-spectral-theorem/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-21-eigenvalues-and-spectral-theorem/</guid><description>Eigenvectors are directions a matrix only stretches. 
The spectral theorem says symmetric matrices have a full orthogonal eigenbasis with real eigenvalues.</description><pubDate>Thu, 21 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Quantization: INT8, INT4, FP8, and the inference cost picture</title><link>https://mlmentorship.com/blog/2025-08-21-quantization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-21-quantization/</guid><description>Reduce model precision to shrink memory and speed up inference. The trade-offs are real but increasingly small with modern techniques.</description><pubDate>Thu, 21 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Expected Calibration Error (ECE)</title><link>https://mlmentorship.com/blog/2025-08-18-expected-calibration-error/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-18-expected-calibration-error/</guid><description>How well do predicted probabilities match empirical frequencies? Bin predictions by confidence, compare bin-mean confidence to bin-accuracy.</description><pubDate>Mon, 18 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Gradient clipping</title><link>https://mlmentorship.com/blog/2025-08-17-gradient-clipping/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-17-gradient-clipping/</guid><description>Cap the norm of the gradient before each optimizer step. The simplest and most reliable defense against training instability.</description><pubDate>Sun, 17 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Implement KNN efficiently</title><link>https://mlmentorship.com/blog/2025-08-17-implement-knn/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-17-implement-knn/</guid><description>The naive solution is one line. The interview is about scaling: when does naive fail, and what do you do?</description><pubDate>Sun, 17 Aug 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>PagedAttention and the vLLM serving model</title><link>https://mlmentorship.com/blog/2025-08-13-paged-attention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-13-paged-attention/</guid><description>Treat the KV cache like virtual memory: allocate in fixed-size pages, share pages across sequences, eliminate fragmentation. The reason vLLM is the default LLM server.</description><pubDate>Wed, 13 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Walk me through the bias-variance tradeoff</title><link>https://mlmentorship.com/blog/2025-07-27-bias-variance-tradeoff/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-07-27-bias-variance-tradeoff/</guid><description>The classic warm-up question. The L4 answer is the formula; the L6 answer is what it tells you about model selection in production.</description><pubDate>Sun, 27 Jul 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Tell me about a time you disagreed with someone senior</title><link>https://mlmentorship.com/blog/2025-07-15-disagreed-with-senior/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-07-15-disagreed-with-senior/</guid><description>The standard behavioral question. 
The interviewer is checking whether you can hold technical positions, push back productively, and update on new information.</description><pubDate>Tue, 15 Jul 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Fine-tuning vs prompting: the deep version</title><link>https://mlmentorship.com/blog/2025-06-27-fine-tuning-deep/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-06-27-fine-tuning-deep/</guid><description>Past the basic decision tree. The senior answer covers SFT, LoRA, DPO, continued pretraining, and the operational trade-offs each introduces.</description><pubDate>Fri, 27 Jun 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Two-tower vs cross-encoder: when to use which?</title><link>https://mlmentorship.com/blog/2025-06-24-two-tower-vs-cross-encoder/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-06-24-two-tower-vs-cross-encoder/</guid><description>The recsys / search architecture decision that comes up in every retrieval interview. The right answer is &apos;both, in sequence.&apos;</description><pubDate>Tue, 24 Jun 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>How would you build evals for a coding assistant?</title><link>https://mlmentorship.com/blog/2025-06-07-evals-for-coding-assistant/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-06-07-evals-for-coding-assistant/</guid><description>Code is one of the few LLM domains where ground truth is verifiable. Use that. The senior answer combines verifiable metrics with human review for what verification can&apos;t catch.</description><pubDate>Sat, 07 Jun 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Design a content moderation system</title><link>https://mlmentorship.com/blog/2025-05-29-content-moderation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-05-29-content-moderation/</guid><description>Moderation is a multi-policy classification problem at scale, with appeals, human review, and adversarial users. The senior answer separates policy from model and treats human review as part of the system.</description><pubDate>Thu, 29 May 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>When would you not use cross-validation?</title><link>https://mlmentorship.com/blog/2025-05-25-when-not-cross-validation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-05-25-when-not-cross-validation/</guid><description>Cross-validation is a tool, not a default. The senior answer names the cases where it&apos;s wrong, expensive, or misleading.</description><pubDate>Sun, 25 May 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>When would you fine-tune vs prompt vs RAG?</title><link>https://mlmentorship.com/blog/2025-05-21-fine-tune-vs-prompt-vs-rag/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-05-21-fine-tune-vs-prompt-vs-rag/</guid><description>The most-asked LLM design question of 2026. 
The answer is a decision tree, not a preference.</description><pubDate>Wed, 21 May 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>How would you debug a model that&apos;s not learning?</title><link>https://mlmentorship.com/blog/2025-05-14-debug-model-not-learning/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-05-14-debug-model-not-learning/</guid><description>The &apos;tell me how you&apos;d debug&apos; question is a behavioral round in disguise. The interviewer is probing your debugging instinct, not testing facts.</description><pubDate>Wed, 14 May 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Tell me about your most ambitious project</title><link>https://mlmentorship.com/blog/2025-05-11-most-ambitious-project/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-05-11-most-ambitious-project/</guid><description>The interview is checking the size of problem you can hold in your head and the structure of how you describe it. Specificity wins.</description><pubDate>Sun, 11 May 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>How do you handle hallucinations in production?</title><link>https://mlmentorship.com/blog/2025-04-25-handle-hallucinations-in-production/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-04-25-handle-hallucinations-in-production/</guid><description>There is no single solution. The senior answer is a layered system that catches different hallucination types at different stages.</description><pubDate>Fri, 25 Apr 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Build an LLM coding assistant from scratch</title><link>https://mlmentorship.com/blog/2025-04-12-build-llm-coding-assistant/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-04-12-build-llm-coding-assistant/</guid><description>The architecture decision space is large: model choice, context retrieval, IDE integration, evals. The senior answer scopes the use case before any of it.</description><pubDate>Sat, 12 Apr 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>How would you evaluate a search ranker?</title><link>https://mlmentorship.com/blog/2025-04-09-evaluate-search-ranker/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-04-09-evaluate-search-ranker/</guid><description>Search ranking eval is offline metrics for development, A/B for shipping, and human raters for absolute calibration. The senior answer uses all three and respects what each measures.</description><pubDate>Wed, 09 Apr 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Design ML monitoring</title><link>https://mlmentorship.com/blog/2025-03-17-design-ml-monitoring/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-03-17-design-ml-monitoring/</guid><description>Most ML systems fail silently. Monitoring is what tells you. The senior answer monitors data, model, and outcome layers separately.</description><pubDate>Mon, 17 Mar 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Why does dropout work?</title><link>https://mlmentorship.com/blog/2025-03-16-why-does-dropout-work/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-03-16-why-does-dropout-work/</guid><description>The trick is that there are three valid explanations and they all matter. 
Which ones you reach for tells the interviewer your level.</description><pubDate>Sun, 16 Mar 2025 00:00:00 GMT</pubDate><category>questions</category></item></channel></rss>