<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>mlmentorship</title><description>Senior ML &amp; AI interview prep. Essays, interview questions with leveled answers (L4/L5/L6), reference notes, system design case studies. Free.</description><link>https://mlmentorship.com/</link><language>en</language><item><title>Actor-critic methods</title><link>https://mlmentorship.com/blog/2026-05-07-actor-critic-methods/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-actor-critic-methods/</guid><description>Policy gradient with a learned value baseline. The actor picks actions; the critic estimates how good they were. The architecture under PPO, A3C, SAC, and most modern RL.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Advantage estimation and GAE</title><link>https://mlmentorship.com/blog/2026-05-07-advantage-estimation-and-gae/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-advantage-estimation-and-gae/</guid><description>Policy gradients need a low-variance estimate of how much better an action was than average. GAE is the standard answer: an exponentially weighted blend of n-step returns.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Anchor boxes and non-maximum suppression</title><link>https://mlmentorship.com/blog/2026-05-07-anchor-boxes-and-nms/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-anchor-boxes-and-nms/</guid><description>Object detectors predict thousands of overlapping boxes. Anchors give each prediction a prior shape; NMS prunes near-duplicates. The pre-DETR pipeline that defined the field for a decade.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Alternating least squares for collaborative filtering</title><link>https://mlmentorship.com/blog/2026-05-07-alternating-least-squares/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-alternating-least-squares/</guid><description>Factorize the user-item matrix into two low-rank factors. Each is a linear regression given the other, so alternate. The classical recsys workhorse before deep learning.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Approximate nearest neighbors: HNSW, IVF, and product quantization</title><link>https://mlmentorship.com/blog/2026-05-07-approximate-nearest-neighbors/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-approximate-nearest-neighbors/</guid><description>Exact k-NN over a billion vectors is infeasible. ANN trades a small recall hit for a 100x to 10,000x speedup. The reason vector search at scale exists.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>BERT and masked language modeling</title><link>https://mlmentorship.com/blog/2026-05-07-bert-and-masked-language-modeling/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-bert-and-masked-language-modeling/</guid><description>Train a transformer to fill in randomly masked tokens. 
The result is a bidirectional encoder that broke a dozen NLP benchmarks at once and defined the pretrain-then-finetune era.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Convolution as matrix multiplication (im2col)</title><link>https://mlmentorship.com/blog/2026-05-07-convolution-as-matmul/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-convolution-as-matmul/</guid><description>A 2D convolution is a matmul in disguise. Unfold the input into columns, multiply by a flattened filter matrix. The reason CNNs run fast on the same hardware as transformers.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Decoding strategies: greedy, beam, top-k, top-p, temperature</title><link>https://mlmentorship.com/blog/2026-05-07-decoding-strategies/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-decoding-strategies/</guid><description>Same model, different samplers, very different outputs. The choice of decoder is often more impactful than the last percent of training. Know the tradeoffs.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Designing a RAG system that actually works</title><link>https://mlmentorship.com/blog/2026-05-07-designing-rag-that-works/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-designing-rag-that-works/</guid><description>RAG fails most often at retrieval, not generation. A practitioner&apos;s guide to the architecture, the failure modes, and what production teams actually do in 2026.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>Epistemic vs aleatoric uncertainty</title><link>https://mlmentorship.com/blog/2026-05-07-epistemic-vs-aleatoric-uncertainty/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-epistemic-vs-aleatoric-uncertainty/</guid><description>Epistemic uncertainty shrinks with more data; aleatoric does not. Conflating them produces miscalibrated systems and wasted data collection. The distinction every senior ML engineer should be able to articulate.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Factorization machines</title><link>https://mlmentorship.com/blog/2026-05-07-factorization-machines/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-factorization-machines/</guid><description>Linear models can&apos;t capture feature interactions. Polynomial models have too many parameters. Factorization machines find a middle path: factorize the interaction matrix and learn an embedding per feature.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Exploration vs exploitation: epsilon-greedy, UCB, Thompson sampling</title><link>https://mlmentorship.com/blog/2026-05-07-exploration-vs-exploitation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-exploration-vs-exploitation/</guid><description>An RL or bandit agent has to keep trying new actions to learn while taking the best-known action to score. 
Three classical strategies, each with a different way of resolving the tension.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Gaussian processes</title><link>https://mlmentorship.com/blog/2026-05-07-gaussian-processes/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-gaussian-processes/</guid><description>A distribution over functions defined entirely by a covariance kernel. Predicts both a mean and a calibrated uncertainty. Beautiful theory, brutal scaling.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Forward-backward and Viterbi: dynamic programming on chains</title><link>https://mlmentorship.com/blog/2026-05-07-forward-backward-and-viterbi/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-forward-backward-and-viterbi/</guid><description>Sum and max over exponentially many paths in linear time. Forward-backward computes posteriors over hidden states; Viterbi finds the most likely state sequence. The same idea, two semirings.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Graph neural networks: message passing as A·X·W</title><link>https://mlmentorship.com/blog/2026-05-07-graph-neural-networks/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-graph-neural-networks/</guid><description>Neighbors carry signal. A graph neural network averages each node&apos;s neighborhood and projects with a learned matrix. The same matmul as a CNN, on irregular structure.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Kernel methods and the kernel trick</title><link>https://mlmentorship.com/blog/2026-05-07-kernel-methods-and-the-kernel-trick/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-kernel-methods-and-the-kernel-trick/</guid><description>Compute inner products in a high-dimensional feature space without ever materializing the features. The mathematical move that lets a linear classifier draw nonlinear boundaries.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Knowledge distillation</title><link>https://mlmentorship.com/blog/2026-05-07-knowledge-distillation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-knowledge-distillation/</guid><description>Train a small student to match a large teacher&apos;s outputs. The student gets richer signal than from hard labels because the teacher&apos;s soft probabilities encode similarity structure.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Multi-head attention: why one head is not enough</title><link>https://mlmentorship.com/blog/2026-05-07-multi-head-attention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-multi-head-attention/</guid><description>Run h independent attention computations in parallel, then concatenate. Each head specializes in a different relation. 
The mechanism most senior candidates can write but few can motivate.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Pruning: structured vs unstructured sparsity</title><link>https://mlmentorship.com/blog/2026-05-07-pruning/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-pruning/</guid><description>Set unimportant weights to zero, recover most of the accuracy. Unstructured pruning shrinks model size; structured pruning shrinks inference time. They solve different problems.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>LSTM and GRU: gating as Hadamard products</title><link>https://mlmentorship.com/blog/2026-05-07-lstm-and-gru/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-lstm-and-gru/</guid><description>Recurrent networks fail because gradients vanish through repeated matmul. Gates fix this by using elementwise multiplication to control information flow. Then transformers replaced them anyway.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Self-attention vs cross-attention</title><link>https://mlmentorship.com/blog/2026-05-07-self-attention-vs-cross-attention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-self-attention-vs-cross-attention/</guid><description>Same operation, different inputs. Self-attention reads from one sequence; cross-attention reads from another. The distinction every encoder-decoder architecture rests on.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>t-SNE and UMAP: nonlinear dimensionality reduction</title><link>https://mlmentorship.com/blog/2026-05-07-tsne-and-umap/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-tsne-and-umap/</guid><description>Both project high-dimensional data to 2D for visualization by preserving local neighborhoods. Both are easy to misread. Know what they show and what they hide.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Word embeddings: Word2Vec, GloVe, and the geometry of meaning</title><link>https://mlmentorship.com/blog/2026-05-07-word-embeddings/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-07-word-embeddings/</guid><description>Map words to dense vectors so that similar words land near each other. The breakthrough that proved meaning lives in geometry, not symbols.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>LLM Evals: The hardest part of shipping LLMs, and why most teams get it wrong</title><link>https://mlmentorship.com/blog/2026-05-05-llm-evals-the-hardest-part/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-05-llm-evals-the-hardest-part/</guid><description>Your model is only as good as your eval. Your eval is a product. Treat it like one. 
The patterns that separate teams that ship from teams that thrash.</description><pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>Weight initialization (Kaiming, Xavier)</title><link>https://mlmentorship.com/blog/2026-05-05-weight-initialization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-05-weight-initialization/</guid><description>Set the initial variance of each layer&apos;s weights so that activations and gradients neither explode nor vanish through depth. The single most impactful one-line decision in deep nets.</description><pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Walk me through speculative decoding</title><link>https://mlmentorship.com/blog/2026-05-04-walk-through-speculative-decoding/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-04-walk-through-speculative-decoding/</guid><description>The interview signal is whether you understand why decoding is memory-bound and why the verify pass is essentially free.</description><pubDate>Mon, 04 May 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>How do you A/B test a chatbot?</title><link>https://mlmentorship.com/blog/2026-05-01-ab-test-chatbot/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-01-ab-test-chatbot/</guid><description>Chatbot A/B testing has all the hard parts of regular A/B testing plus delayed feedback, conversational state, and metrics that are hard to define.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Activation checkpointing</title><link>https://mlmentorship.com/blog/2026-05-01-activation-checkpointing/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-01-activation-checkpointing/</guid><description>Trade compute for memory: drop activations during the forward pass and recompute them during the backward pass. The cheapest way to fit a larger model on the same GPU.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Proximal Policy Optimization (PPO)</title><link>https://mlmentorship.com/blog/2026-05-01-ppo/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-05-01-ppo/</guid><description>Constrain policy updates with a clipped surrogate objective. The default actor-critic algorithm in 2026. For robotics, games, and RLHF.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Why does Adam sometimes generalize worse than SGD?</title><link>https://mlmentorship.com/blog/2026-04-29-adam-vs-sgd-generalization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-29-adam-vs-sgd-generalization/</guid><description>Adam usually trains faster but in some settings finds sharper minima with worse generalization. The senior answer names the regimes where this happens and the modern fixes.</description><pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Design real-time personalization</title><link>https://mlmentorship.com/blog/2026-04-28-real-time-personalization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-28-real-time-personalization/</guid><description>Real-time personalization fails most often at the data infrastructure, not the model. 
The senior answer designs the feature freshness and serving stack first.</description><pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>GPU memory hierarchy: HBM, SRAM, and why I/O matters more than FLOPs</title><link>https://mlmentorship.com/blog/2026-04-27-gpu-memory-hierarchy/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-27-gpu-memory-hierarchy/</guid><description>Modern GPUs are memory-bound for almost everything except big matmuls. Understanding HBM vs. SRAM bandwidth is the prerequisite for FlashAttention, KV-cache reasoning, and inference cost models.</description><pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Why is softmax + cross-entropy the right pairing?</title><link>https://mlmentorship.com/blog/2026-04-26-softmax-cross-entropy-pairing/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-26-softmax-cross-entropy-pairing/</guid><description>The gradient simplifies to (p - y), and that&apos;s not a coincidence. The senior answer derives this and connects to GLMs and numerical stability.</description><pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>What L5 vs L6 actually means at FAANG ML</title><link>https://mlmentorship.com/blog/2026-04-23-l5-vs-l6-faang-ml/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-23-l5-vs-l6-faang-ml/</guid><description>Level lines are mostly invisible from the outside but sharp on the inside. A practical calibration of L4 through L7 in ML / Applied Scientist tracks.</description><pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>How do you scope an ambiguous problem?</title><link>https://mlmentorship.com/blog/2026-04-23-scope-ambiguous-problem/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-23-scope-ambiguous-problem/</guid><description>Scoping is the single most important senior skill. The interview tests whether you have a process, not just a definition.</description><pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Continuous batching for LLM serving</title><link>https://mlmentorship.com/blog/2026-04-22-continuous-batching/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-22-continuous-batching/</guid><description>Let new requests join an in-flight batch at every decode step instead of waiting for the slowest one. The other half of why vLLM is fast.</description><pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Value-based vs. policy-based RL</title><link>https://mlmentorship.com/blog/2026-04-22-value-vs-policy-rl/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-22-value-vs-policy-rl/</guid><description>Two paradigms in reinforcement learning. Value-based learns Q(s, a) and acts greedily; policy-based directly parametrizes the policy. When to use which.</description><pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Gaussian mixture models</title><link>https://mlmentorship.com/blog/2026-04-20-gaussian-mixture-models/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-20-gaussian-mixture-models/</guid><description>Model data as a weighted sum of K Gaussians. 
Soft clustering, density estimation, and the canonical EM example.</description><pubDate>Mon, 20 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Decision trees</title><link>https://mlmentorship.com/blog/2026-04-19-decision-trees/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-19-decision-trees/</guid><description>Recursively split the feature space along axis-aligned thresholds chosen to maximize a purity criterion. The base learner of GBDT and random forests.</description><pubDate>Sun, 19 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How to think about LLM inference cost</title><link>https://mlmentorship.com/blog/2026-04-18-llm-inference-cost/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-18-llm-inference-cost/</guid><description>Most teams calculate inference cost by multiplying token price by token count. The actual cost structure has five layers and most of the optimization wins are in the bottom four.</description><pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>SGD with momentum</title><link>https://mlmentorship.com/blog/2026-04-18-sgd-with-momentum/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-18-sgd-with-momentum/</guid><description>Add a moving average of past gradients to the update. Smoother trajectories, faster convergence in narrow valleys, and the foundation of Adam&apos;s first moment.</description><pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Logistic regression</title><link>https://mlmentorship.com/blog/2026-04-17-logistic-regression/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-17-logistic-regression/</guid><description>The linear model adapted to binary classification: pass a linear combination through a sigmoid, train by maximum likelihood. Still the strongest non-trivial baseline for tabular classification.</description><pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Negative sampling strategies: what actually matters</title><link>https://mlmentorship.com/blog/2026-04-17-negative-sampling-strategies/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-17-negative-sampling-strategies/</guid><description>Choice of negatives often matters more than choice of model. The senior answer ranks the strategies (in-batch, hard, BM25-mined, model-mined) and explains the trade-offs.</description><pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Hidden Markov models</title><link>https://mlmentorship.com/blog/2026-04-16-hidden-markov-models/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-16-hidden-markov-models/</guid><description>A latent Markov chain emits observations through a per-state distribution. Forward-backward, Viterbi, Baum-Welch. The classical sequence model toolkit.</description><pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>System design case study: building personalized search ranking</title><link>https://mlmentorship.com/blog/2026-04-14-personalized-search-ranking/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-14-personalized-search-ranking/</guid><description>An end-to-end design of a personalized search ranking system at scale, from problem framing through deployment and monitoring. 
The same template works for most ML system design interviews.</description><pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate><category>guides</category><category>system-design</category></item><item><title>Explain backprop in your own words</title><link>https://mlmentorship.com/blog/2026-04-13-explain-backprop/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-13-explain-backprop/</guid><description>The textbook answer is the chain rule. The senior answer is what backprop is doing as a system: a reverse-mode auto-diff pass that reuses intermediate computations to get all gradients at roughly the cost of one extra forward pass.</description><pubDate>Mon, 13 Apr 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Z-loss</title><link>https://mlmentorship.com/blog/2026-04-12-z-loss/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-12-z-loss/</guid><description>An auxiliary loss term that penalizes the squared log-partition function of the softmax. Started as a stability hack for logit blowup. Now used as the default regularizer on logit scale during long or deep cooldowns.</description><pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>FlashAttention</title><link>https://mlmentorship.com/blog/2026-04-10-flashattention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-10-flashattention/</guid><description>I/O-aware exact attention. Replaces the O(n²) HBM traffic with a tiled streaming softmax in SRAM. The single most important kernel-level optimization in modern transformers.</description><pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Pipeline parallelism</title><link>https://mlmentorship.com/blog/2026-04-06-pipeline-parallelism/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-06-pipeline-parallelism/</guid><description>Split the model across GPUs by layer; pipeline micro-batches through the stages. The way to scale across slow interconnects when TP isn&apos;t viable.</description><pubDate>Mon, 06 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>KV cache: how LLM inference avoids quadratic decode cost</title><link>https://mlmentorship.com/blog/2026-04-04-kv-cache/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-04-kv-cache/</guid><description>The single most important optimization in autoregressive decoding. Without it, generating 1000 tokens would cost O(1000³) attention operations instead of O(1000²).</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Naive Bayes</title><link>https://mlmentorship.com/blog/2026-04-04-naive-bayes/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-04-naive-bayes/</guid><description>A trivially simple generative classifier that assumes features are conditionally independent given the class. Fast, parameter-light, surprisingly hard to beat on text.</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Autoregressive vs. diffusion generation</title><link>https://mlmentorship.com/blog/2026-04-02-autoregressive-vs-diffusion/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-04-02-autoregressive-vs-diffusion/</guid><description>Two paradigms for generative modeling: predict the next element step-by-step (autoregressive) or iteratively denoise from pure noise (diffusion). 
Different costs, different strengths.</description><pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Object detection: Faster R-CNN, YOLO, DETR</title><link>https://mlmentorship.com/blog/2026-03-29-object-detection-overview/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-29-object-detection-overview/</guid><description>Localize and classify objects in an image. The three main architectural families: two-stage proposal-based, one-stage grid-based, and transformer-based.</description><pubDate>Sun, 29 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Mixture of Experts (MoE)</title><link>https://mlmentorship.com/blog/2026-03-27-mixture-of-experts/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-27-mixture-of-experts/</guid><description>Replace one large feed-forward block with N smaller experts and a router that activates only k of them per token. Trades parameter count for compute.</description><pubDate>Fri, 27 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>DBSCAN</title><link>https://mlmentorship.com/blog/2026-03-25-dbscan/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-25-dbscan/</guid><description>Density-based clustering: form clusters from regions of high point density, label sparse points as noise. Handles arbitrary cluster shapes; no k to specify.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>ROC, PR curves, and AUC</title><link>https://mlmentorship.com/blog/2026-03-25-roc-pr-auc/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-25-roc-pr-auc/</guid><description>What ROC-AUC and PR-AUC measure, when to use which, and why ROC-AUC is misleading on heavy class imbalance.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Exploding and vanishing gradients</title><link>https://mlmentorship.com/blog/2026-03-24-exploding-vanishing-gradients/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-24-exploding-vanishing-gradients/</guid><description>Why deep networks were untrainable before residuals, normalization, and ReLU. The math of gradient magnitudes through depth and the standard fixes.</description><pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How do you choose a learning rate?</title><link>https://mlmentorship.com/blog/2026-03-22-how-to-choose-learning-rate/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-22-how-to-choose-learning-rate/</guid><description>The right answer is a procedure, not a number. The wrong answers are &apos;use the default&apos; and &apos;try a few values.&apos;</description><pubDate>Sun, 22 Mar 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>The 5 things every applied scientist interview is actually testing for</title><link>https://mlmentorship.com/blog/2026-03-20-five-things-as-interview-tests/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-20-five-things-as-interview-tests/</guid><description>Strip away the questions and the role-specific jargon. Every senior AS loop is checking the same five things. 
If you know what they are, the prep gets sharper.</description><pubDate>Fri, 20 Mar 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>RoPE, ALiBi, and the modern positional encoding landscape</title><link>https://mlmentorship.com/blog/2026-03-15-positional-encoding/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-15-positional-encoding/</guid><description>Sinusoidal positional encoding is in the original transformer paper and not in any modern LLM. Here&apos;s what replaced it and why.</description><pubDate>Sun, 15 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How do you choose a loss function?</title><link>https://mlmentorship.com/blog/2026-03-09-how-to-choose-loss-function/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-09-how-to-choose-loss-function/</guid><description>The loss is the objective. Picking the wrong one means optimizing for the wrong thing, no matter how well you train. The senior answer derives the loss from the problem, not from a list.</description><pubDate>Mon, 09 Mar 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Design a system for safe LLM deployment in healthcare</title><link>https://mlmentorship.com/blog/2026-03-07-llm-deployment-healthcare/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-07-llm-deployment-healthcare/</guid><description>Healthcare adds three constraints on top of normal LLM deployment: regulatory compliance, low tolerance for harm, and a workflow that already has clinicians as the final decision-maker.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Probabilistic graphical models</title><link>https://mlmentorship.com/blog/2026-03-03-graphical-models/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-03-graphical-models/</guid><description>Express joint distributions as graphs whose structure encodes conditional independence. Bayesian networks (directed) and Markov random fields (undirected).</description><pubDate>Tue, 03 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Calibration: when your model says 80% it should be right 80% of the time</title><link>https://mlmentorship.com/blog/2026-03-01-calibration/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-03-01-calibration/</guid><description>Accuracy isn&apos;t enough; you also want predictions to mean what they say. Calibration is the difference.</description><pubDate>Sun, 01 Mar 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Applied Scientist vs MLE vs Research Engineer: what these roles actually do</title><link>https://mlmentorship.com/blog/2026-02-28-as-vs-mle-vs-re/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-28-as-vs-mle-vs-re/</guid><description>The role taxonomy is confusing because companies use the same titles to mean different things. Here&apos;s the actual decomposition, and which one you should target.</description><pubDate>Sat, 28 Feb 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>Weight decay vs. L2 regularization</title><link>https://mlmentorship.com/blog/2026-02-27-weight-decay-vs-l2/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-27-weight-decay-vs-l2/</guid><description>L2 adds ½λ‖θ‖² to the loss; weight decay shrinks θ multiplicatively at each step. They are equivalent under SGD but not under Adam. 
That difference is why AdamW exists.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Label smoothing</title><link>https://mlmentorship.com/blog/2026-02-24-label-smoothing/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-24-label-smoothing/</guid><description>Replace one-hot targets with a softened distribution that puts ε mass on the wrong classes. Improves calibration, sometimes hurts retrieval.</description><pubDate>Tue, 24 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Matrix calculus for ML</title><link>https://mlmentorship.com/blog/2026-02-23-matrix-calculus/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-23-matrix-calculus/</guid><description>Gradients, Jacobians, and Hessians for vector- and matrix-valued functions. The minimum needed to derive backprop and second-order methods.</description><pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Embedding spaces and similarity metrics</title><link>https://mlmentorship.com/blog/2026-02-22-embedding-spaces-and-similarity/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-22-embedding-spaces-and-similarity/</guid><description>How learned vector representations encode meaning, and why cosine similarity is the default metric for retrieval and recsys.</description><pubDate>Sun, 22 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Q-learning</title><link>https://mlmentorship.com/blog/2026-02-22-q-learning/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-22-q-learning/</guid><description>Learn the action-value function Q(s, a) by Bellman backups. The foundation of value-based RL. DQN, Rainbow, and the original Atari breakthroughs.</description><pubDate>Sun, 22 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Gradient boosting (xgboost, lightgbm, catboost)</title><link>https://mlmentorship.com/blog/2026-02-21-gradient-boosting/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-21-gradient-boosting/</guid><description>Train trees sequentially, each one fitting the negative gradient of the loss with respect to the current ensemble&apos;s prediction. The dominant tabular learner in 2026.</description><pubDate>Sat, 21 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Explain backprop through time</title><link>https://mlmentorship.com/blog/2026-02-16-bptt-backprop-through-time/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-16-bptt-backprop-through-time/</guid><description>BPTT is just backprop on the unrolled computation graph of a recurrent network. The interview signal is whether you understand truncation and what it costs.</description><pubDate>Mon, 16 Feb 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Matrix factorization for recsys</title><link>https://mlmentorship.com/blog/2026-02-16-matrix-factorization-recsys/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-16-matrix-factorization-recsys/</guid><description>Decompose the user-item interaction matrix into user and item embeddings whose dot product approximates the rating. 
The original collaborative filtering.</description><pubDate>Mon, 16 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Design Amazon&apos;s people also bought</title><link>https://mlmentorship.com/blog/2026-02-13-people-also-bought/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-13-people-also-bought/</guid><description>A simple-sounding feature with deep recsys ground underneath. The senior answer chooses between item-item collaborative filtering, embedding similarity, and learned co-purchase models, with explicit handling of feedback loops.</description><pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Speculative decoding</title><link>https://mlmentorship.com/blog/2026-02-13-speculative-decoding/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-13-speculative-decoding/</guid><description>Break the autoregressive serial bottleneck without changing the output distribution. 2-3× inference speedup, free.</description><pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Gradient accumulation</title><link>https://mlmentorship.com/blog/2026-02-09-gradient-accumulation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-09-gradient-accumulation/</guid><description>Run several forward-backward passes before each optimizer step to simulate a larger effective batch size without the memory cost.</description><pubDate>Mon, 09 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Sparse attention (BigBird, Longformer)</title><link>https://mlmentorship.com/blog/2026-02-08-sparse-attention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-08-sparse-attention/</guid><description>Replace the dense n×n attention mask with a sparse pattern that has O(n) non-zeros while preserving information flow across the full sequence.</description><pubDate>Sun, 08 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How would you reduce LLM inference cost by 10x?</title><link>https://mlmentorship.com/blog/2026-02-05-reduce-llm-inference-cost-10x/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-05-reduce-llm-inference-cost-10x/</guid><description>The cost-engineering question. The L6 answer doesn&apos;t pick a technique, it diagnoses where the cost is, then picks five.</description><pubDate>Thu, 05 Feb 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Bayesian vs frequentist: a practitioner&apos;s framing</title><link>https://mlmentorship.com/blog/2026-02-02-bayesian-vs-frequentist/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-02-bayesian-vs-frequentist/</guid><description>The textbook distinction is philosophical. The practitioner distinction is whether you can sample from a posterior cheaply, and whether you need uncertainty for downstream decisions.</description><pubDate>Mon, 02 Feb 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Floating-point formats: FP32, FP16, BF16, FP8, TF32</title><link>https://mlmentorship.com/blog/2026-02-01-floating-point-formats/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-02-01-floating-point-formats/</guid><description>How modern accelerators trade precision for speed. 
The bit layouts of every numeric format that appears in deep learning.</description><pubDate>Sun, 01 Feb 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Regularization: L1, L2, dropout, early stopping, and the modern view</title><link>https://mlmentorship.com/blog/2026-01-30-regularization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-30-regularization/</guid><description>The classical regularizers + the modern reality that SGD&apos;s noise is itself a regularizer. The hierarchy of choices when your model is overfitting.</description><pubDate>Fri, 30 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Random forests</title><link>https://mlmentorship.com/blog/2026-01-27-random-forests/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-27-random-forests/</guid><description>Bag deep decision trees plus random feature subsets per split. Variance averaging beats any single tree; the dominant out-of-the-box ensemble before GBDT.</description><pubDate>Tue, 27 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Sequence packing with block-diagonal masks</title><link>https://mlmentorship.com/blog/2026-01-25-sequence-packing/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-25-sequence-packing/</guid><description>Concatenate multiple short examples into one fixed-length sequence to eliminate padding waste. The single largest throughput win for training on skewed-length corpora.</description><pubDate>Sun, 25 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Design a feature store from scratch</title><link>https://mlmentorship.com/blog/2026-01-23-design-feature-store/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-23-design-feature-store/</guid><description>A feature store solves training-serving skew, feature reuse, and lineage. The senior answer explains why each property matters and what minimum viable looks like.</description><pubDate>Fri, 23 Jan 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Positive (semi-)definite matrices</title><link>https://mlmentorship.com/blog/2026-01-23-positive-definite-matrices/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-23-positive-definite-matrices/</guid><description>Matrices that define inner products and proper covariances. The geometry of PSD: ellipsoids, not arbitrary shapes.</description><pubDate>Fri, 23 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Lessons from Marin 8B: what an open pretraining log actually teaches you</title><link>https://mlmentorship.com/blog/2026-01-21-lessons-from-marin-8b/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-21-lessons-from-marin-8b/</guid><description>Marin trained the first open-source 8B model to beat Llama 3.1 8B and published every mistake. The transferable lessons aren&apos;t about TPUs. They&apos;re about how to run pretraining like a science.</description><pubDate>Wed, 21 Jan 2026 00:00:00 GMT</pubDate><category>guides</category></item><item><title>Tokenization: BPE, WordPiece, and the LLM era</title><link>https://mlmentorship.com/blog/2026-01-20-tokenization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-20-tokenization/</guid><description>The critical input layer between text and model. 
Tokenization mismatch is a frequent source of production LLM bugs.</description><pubDate>Tue, 20 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Activation functions</title><link>https://mlmentorship.com/blog/2026-01-19-activation-functions/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-19-activation-functions/</guid><description>ReLU, GELU, swish, sigmoid, tanh. What each does, why GELU/swish replaced ReLU in transformers, and when to use which.</description><pubDate>Mon, 19 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Bias and variance of estimators</title><link>https://mlmentorship.com/blog/2026-01-13-bias-variance-of-estimators/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-13-bias-variance-of-estimators/</guid><description>An estimator has bias (systematic error) and variance (sample-to-sample wobble). Mean-squared error decomposes into the two.</description><pubDate>Tue, 13 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How would you evaluate an LLM application you&apos;ve built?</title><link>https://mlmentorship.com/blog/2026-01-10-how-would-you-evaluate-an-llm-application/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-10-how-would-you-evaluate-an-llm-application/</guid><description>A level-defining question. The same words elicit a junior, senior, or staff answer. The rubric below shows the differences.</description><pubDate>Sat, 10 Jan 2026 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Learning rate schedules: warmup and cosine decay</title><link>https://mlmentorship.com/blog/2026-01-05-learning-rate-schedules/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-05-learning-rate-schedules/</guid><description>Why almost every modern training run linearly warms up the LR over a few hundred steps and then decays it on a cosine to near zero.</description><pubDate>Mon, 05 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Generative adversarial networks (GANs)</title><link>https://mlmentorship.com/blog/2026-01-01-gans-overview/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2026-01-01-gans-overview/</guid><description>Two networks compete: a generator produces samples, a discriminator distinguishes them from real data. Sharp samples, training instability, mostly displaced by diffusion in 2026.</description><pubDate>Thu, 01 Jan 2026 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Universal approximation theorem</title><link>https://mlmentorship.com/blog/2025-12-29-universal-approximation-theorem/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-29-universal-approximation-theorem/</guid><description>A neural network with one hidden layer and enough units can approximate any continuous function on a compact domain. What it does and doesn&apos;t say about deep learning.</description><pubDate>Mon, 29 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>BatchNorm vs LayerNorm (and the transformer wrinkle)</title><link>https://mlmentorship.com/blog/2025-12-28-batchnorm-vs-layernorm/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-28-batchnorm-vs-layernorm/</guid><description>These look similar and aren&apos;t. Mixing them up in interviews is one of the cheapest ways to lose level points. 
Here&apos;s the right mental model.</description><pubDate>Sun, 28 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>RLHF, DPO, and the alignment training stack</title><link>https://mlmentorship.com/blog/2025-12-28-rlhf-and-dpo/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-28-rlhf-and-dpo/</guid><description>How LLMs get from &apos;next-token predictor&apos; to &apos;helpful assistant.&apos; The post-training pipeline in 2026.</description><pubDate>Sun, 28 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Design fraud detection for a payment company</title><link>https://mlmentorship.com/blog/2025-12-25-design-fraud-detection/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-25-design-fraud-detection/</guid><description>Fraud has the worst data of any ML problem: heavily imbalanced, biased labels, adversarial actors, and direct money on the line. The senior answer respects all four.</description><pubDate>Thu, 25 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Derive logistic regression from MLE</title><link>https://mlmentorship.com/blog/2025-12-23-derive-logistic-regression/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-23-derive-logistic-regression/</guid><description>Standard math-screen question. The senior signal is whether you can derive it cleanly and connect MLE to cross-entropy.</description><pubDate>Tue, 23 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Grouped-query and multi-query attention (GQA, MQA)</title><link>https://mlmentorship.com/blog/2025-12-23-grouped-query-attention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-23-grouped-query-attention/</guid><description>Share K and V heads across query heads to shrink the KV cache 4-8x with negligible quality loss. Standard in modern decoder LLMs.</description><pubDate>Tue, 23 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Tensor parallelism</title><link>https://mlmentorship.com/blog/2025-12-23-tensor-parallelism/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-23-tensor-parallelism/</guid><description>Split a single matrix multiplication across multiple GPUs. The way to fit one transformer layer that doesn&apos;t fit on a single device.</description><pubDate>Tue, 23 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>A/B testing for ML systems</title><link>https://mlmentorship.com/blog/2025-12-22-ab-testing-for-ml/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-22-ab-testing-for-ml/</guid><description>The framework for proving a model change actually helps. Statistical power, novelty effects, network effects, all the things people get wrong.</description><pubDate>Mon, 22 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Central limit theorem</title><link>https://mlmentorship.com/blog/2025-12-21-central-limit-theorem/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-21-central-limit-theorem/</guid><description>Sums of many independent random variables become Gaussian. Why nearly every error bar in ML and statistics is computed from a normal distribution.</description><pubDate>Sun, 21 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Prefill vs. decode: the two phases of LLM inference</title><link>https://mlmentorship.com/blog/2025-12-14-prefill-vs-decode/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-14-prefill-vs-decode/</guid><description>LLM inference has two cost regimes with very different bottlenecks. Mixing them up leads to wrong cost models and bad serving decisions.</description><pubDate>Sun, 14 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Design Spotify&apos;s homepage</title><link>https://mlmentorship.com/blog/2025-12-12-design-spotify-homepage/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-12-design-spotify-homepage/</guid><description>A multi-shelf, multi-objective recommendation surface. The senior answer scopes the shelves first, then designs each as its own ranker with a meta-layer above.</description><pubDate>Fri, 12 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Recsys in the LLM era: what changes?</title><link>https://mlmentorship.com/blog/2025-12-12-recsys-llm-era/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-12-recsys-llm-era/</guid><description>Most of recsys hasn&apos;t changed; LLMs add new capabilities at specific stages. The senior answer names which stages benefit and which don&apos;t.</description><pubDate>Fri, 12 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Vision transformers (ViT)</title><link>https://mlmentorship.com/blog/2025-12-10-vision-transformers/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-10-vision-transformers/</guid><description>Apply a standard transformer to a sequence of image patches. Beats CNNs at scale; the dominant backbone for foundation vision models in 2026.</description><pubDate>Wed, 10 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Mixed precision: what&apos;s actually happening?</title><link>https://mlmentorship.com/blog/2025-12-08-mixed-precision-deep/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-08-mixed-precision-deep/</guid><description>Beyond &apos;use BF16&apos;. The senior answer explains what stays in FP32, why loss scaling exists for FP16, and the memory split.</description><pubDate>Mon, 08 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>What&apos;s the most over-rated technique in ML right now?</title><link>https://mlmentorship.com/blog/2025-12-07-most-overrated-technique/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-07-most-overrated-technique/</guid><description>A trap question that rewards taste. Strong opinions, defended with reasoning, are the senior signal. 
Weak opinions or &apos;I don&apos;t know&apos; both lose.</description><pubDate>Sun, 07 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Linear attention (Linformer, Performer, kernel methods)</title><link>https://mlmentorship.com/blog/2025-12-06-linear-attention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-06-linear-attention/</guid><description>Approximate the softmax attention matrix with a low-rank or kernel factorization so cost is linear in sequence length.</description><pubDate>Sat, 06 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>L1 vs L2 regularization, beyond the formula</title><link>https://mlmentorship.com/blog/2025-12-04-l1-vs-l2-beyond-formula/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-04-l1-vs-l2-beyond-formula/</guid><description>To most candidates the math is identical: penalty terms in the loss. The senior signal is the Bayesian interpretation, the optimization geometry, and when each is the right choice.</description><pubDate>Thu, 04 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Exponential family</title><link>https://mlmentorship.com/blog/2025-12-03-exponential-family/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-03-exponential-family/</guid><description>A unified family of distributions (Gaussian, Bernoulli, Poisson, Beta, Gamma, etc.) with shared properties: sufficient statistics, conjugate priors, simple MLE.</description><pubDate>Wed, 03 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Determinant and volume</title><link>https://mlmentorship.com/blog/2025-12-02-determinant-and-volume/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-02-determinant-and-volume/</guid><description>The determinant of a matrix is the signed volume scaling factor of the linear map. Zero determinant means the map collapses dimensions.</description><pubDate>Tue, 02 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Cross-validation strategies</title><link>https://mlmentorship.com/blog/2025-12-01-cross-validation-strategies/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-01-cross-validation-strategies/</guid><description>Hold-out, k-fold, stratified, grouped, and time-series CV. And when each one is and isn&apos;t appropriate.</description><pubDate>Mon, 01 Dec 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Debug this training loop</title><link>https://mlmentorship.com/blog/2025-12-01-debug-training-loop/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-12-01-debug-training-loop/</guid><description>A live coding question built around a paste of buggy training code. 
The senior signal is the order in which you find bugs and what your debugging procedure looks like.</description><pubDate>Mon, 01 Dec 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>How would you do cold-start for a new user?</title><link>https://mlmentorship.com/blog/2025-11-30-cold-start-new-user/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-30-cold-start-new-user/</guid><description>Cold-start is solved by combining minimal explicit signal, demographic and contextual fallbacks, and aggressive exploration in the first few sessions.</description><pubDate>Sun, 30 Nov 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Walk me through how you&apos;d train a 100B parameter model</title><link>https://mlmentorship.com/blog/2025-11-28-train-100b-model/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-28-train-100b-model/</guid><description>The question is about parallelism and memory, not about modeling. The L6 answer combines data, tensor, pipeline, and FSDP/ZeRO sharding into a coherent strategy.</description><pubDate>Fri, 28 Nov 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Implement attention from scratch</title><link>https://mlmentorship.com/blog/2025-11-27-implement-attention-from-scratch/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-27-implement-attention-from-scratch/</guid><description>The coding question that doubles as a depth check. The code is short; the conversation around it tells the level.</description><pubDate>Thu, 27 Nov 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Markov chains</title><link>https://mlmentorship.com/blog/2025-11-27-markov-chains/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-27-markov-chains/</guid><description>Stochastic processes where the future depends only on the present, not the past. Foundation of HMMs, MCMC, and many sequence models.</description><pubDate>Thu, 27 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Residual connections</title><link>https://mlmentorship.com/blog/2025-11-27-residual-connections/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-27-residual-connections/</guid><description>Add the input of a block to its output. Lets gradients flow unimpeded through depth and made networks deeper than 30 layers practical for the first time.</description><pubDate>Thu, 27 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Diffusion models</title><link>https://mlmentorship.com/blog/2025-11-22-diffusion-models/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-22-diffusion-models/</guid><description>Learn to invert a fixed noising process. The dominant generative paradigm for images, audio, video, and molecules in 2026.</description><pubDate>Sat, 22 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Variational autoencoders (VAE)</title><link>https://mlmentorship.com/blog/2025-11-22-variational-autoencoders/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-22-variational-autoencoders/</guid><description>Encode inputs to a latent distribution, decode samples back, optimize evidence lower bound. 
The cleanest gateway to deep generative models.</description><pubDate>Sat, 22 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Policy gradient methods</title><link>https://mlmentorship.com/blog/2025-11-20-policy-gradient/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-20-policy-gradient/</guid><description>Directly optimize the policy by following the gradient of expected return. REINFORCE, actor-critic, and the foundation of modern RL.</description><pubDate>Thu, 20 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>SVM and the kernel trick</title><link>https://mlmentorship.com/blog/2025-11-16-svm-and-kernels/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-16-svm-and-kernels/</guid><description>Maximum-margin classifier with a kernel that lets it operate in implicit high-dimensional feature spaces. Beautiful theory; less common in 2026 production.</description><pubDate>Sun, 16 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Bayes&apos; rule and the posterior</title><link>https://mlmentorship.com/blog/2025-11-15-bayes-rule-and-posterior/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-15-bayes-rule-and-posterior/</guid><description>How to update beliefs given evidence: posterior ∝ likelihood × prior. The foundation of Bayesian inference, naive Bayes, and probabilistic graphical models.</description><pubDate>Sat, 15 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Long-context LLMs: training and serving techniques</title><link>https://mlmentorship.com/blog/2025-11-15-long-context-llms/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-15-long-context-llms/</guid><description>What makes a 1M-token context model work. Position-encoding extension, attention kernels, KV-cache management, and the tradeoffs.</description><pubDate>Sat, 15 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>FSDP and ZeRO: sharding optimizer state, gradients, and parameters</title><link>https://mlmentorship.com/blog/2025-11-14-fsdp-and-zero/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-14-fsdp-and-zero/</guid><description>How modern training scales beyond a single GPU&apos;s memory by partitioning the optimizer state, gradients, and parameters across the data-parallel group.</description><pubDate>Fri, 14 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How do you evaluate an agent?</title><link>https://mlmentorship.com/blog/2025-11-11-evaluate-an-agent/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-11-evaluate-an-agent/</guid><description>Agent eval is harder than chat eval because there are intermediate steps, tool calls, and long-horizon outcomes. The senior answer evaluates trajectories, not just final outputs.</description><pubDate>Tue, 11 Nov 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Microannealing and midtraining</title><link>https://mlmentorship.com/blog/2025-11-11-microannealing/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-11-microannealing/</guid><description>A short cooldown applied to a mostly-trained checkpoint with a small fraction of candidate data mixed in. 
The standard mid-training probe for whether a new dataset is worth including.</description><pubDate>Tue, 11 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Expectation-Maximization (EM)</title><link>https://mlmentorship.com/blog/2025-11-09-expectation-maximization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-09-expectation-maximization/</guid><description>Iterate between estimating latent variables given parameters (E-step) and updating parameters given latents (M-step). The standard tool for latent-variable MLE when the latents are unobserved.</description><pubDate>Sun, 09 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Backpropagation</title><link>https://mlmentorship.com/blog/2025-11-06-backpropagation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-06-backpropagation/</guid><description>Reverse-mode automatic differentiation applied to a computation graph. The algorithm that computes gradients for every parameter in one backward pass.</description><pubDate>Thu, 06 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>ResNet</title><link>https://mlmentorship.com/blog/2025-11-05-resnet/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-05-resnet/</guid><description>Residual connections enabled networks deeper than 30 layers to train. Still the dominant backbone for transfer learning in 2026.</description><pubDate>Wed, 05 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Transformer architecture: a senior-level mental model</title><link>https://mlmentorship.com/blog/2025-11-05-transformer-architecture/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-05-transformer-architecture/</guid><description>Strip away the diagram clutter. A transformer is a stack of (residual + LayerNorm + (attention or FFN)) blocks. Understanding why each piece is there is more important than memorizing the diagram.</description><pubDate>Wed, 05 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Confusion matrix and classification metrics</title><link>https://mlmentorship.com/blog/2025-11-04-confusion-matrix-and-classification-metrics/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-11-04-confusion-matrix-and-classification-metrics/</guid><description>The 2x2 (or KxK) table of predictions vs. truth that every classification metric is computed from. The Rosetta stone of binary classification.</description><pubDate>Tue, 04 Nov 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Explain the reparameterization trick</title><link>https://mlmentorship.com/blog/2025-10-29-reparameterization-trick/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-29-reparameterization-trick/</guid><description>How VAEs propagate gradients through a sampling step. The senior answer explains the why (you can&apos;t differentiate through a sample) and the how (move the randomness outside the parameters).</description><pubDate>Wed, 29 Oct 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>k-means clustering</title><link>https://mlmentorship.com/blog/2025-10-26-k-means-clustering/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-26-k-means-clustering/</guid><description>Partition n points into k clusters by minimizing within-cluster variance. 
Lloyd&apos;s algorithm: alternate assigning points to nearest center and recomputing centers.</description><pubDate>Sun, 26 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Normalizing flows</title><link>https://mlmentorship.com/blog/2025-10-26-normalizing-flows/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-26-normalizing-flows/</guid><description>Generative models built from invertible transformations. Compute exact likelihoods and sample efficiently, at the cost of architectural restrictions.</description><pubDate>Sun, 26 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>KL divergence</title><link>https://mlmentorship.com/blog/2025-10-23-kl-divergence/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-23-kl-divergence/</guid><description>Asymmetric distance between probability distributions. Cross-entropy minus entropy. The mathematical glue holding most of probabilistic ML together.</description><pubDate>Thu, 23 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Ranking metrics: NDCG, MAP, MRR</title><link>https://mlmentorship.com/blog/2025-10-23-ranking-metrics-ndcg-map-mrr/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-23-ranking-metrics-ndcg-map-mrr/</guid><description>Beyond binary precision-recall: how to measure ranking quality when order matters and labels are graded.</description><pubDate>Thu, 23 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Design YouTube&apos;s recommender</title><link>https://mlmentorship.com/blog/2025-10-20-design-youtube-recommender/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-20-design-youtube-recommender/</guid><description>The canonical recsys design question. The real test is whether you&apos;ll dive into model architecture or scope the problem first.</description><pubDate>Mon, 20 Oct 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Mixup and CutMix</title><link>https://mlmentorship.com/blog/2025-10-17-mixup-and-cutmix/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-17-mixup-and-cutmix/</guid><description>Two data-augmentation schemes that train on convex combinations of pairs of inputs and their labels. Strong regularization for image classification; sometimes used in audio and tabular.</description><pubDate>Fri, 17 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How do you deal with class imbalance in 2026?</title><link>https://mlmentorship.com/blog/2025-10-13-class-imbalance/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-13-class-imbalance/</guid><description>Class weighting and SMOTE are the textbook answers and often the wrong ones. The senior answer matches the technique to the imbalance ratio, the cost asymmetry, and the metric you actually care about.</description><pubDate>Mon, 13 Oct 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>The attention mechanism</title><link>https://mlmentorship.com/blog/2025-10-12-attention-mechanism/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-12-attention-mechanism/</guid><description>Compute a weighted sum of values, weights derived from query-key similarity. 
The single operation that powers transformers, retrieval, and most of modern ML.</description><pubDate>Sun, 12 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>How do you decide what to work on?</title><link>https://mlmentorship.com/blog/2025-10-11-decide-what-to-work-on/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-11-decide-what-to-work-on/</guid><description>The senior signal here is that you have an explicit prioritization framework, not just a list of interests. The L6 answer connects user value, technical leverage, and team strategy.</description><pubDate>Sat, 11 Oct 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Perplexity and bits per token</title><link>https://mlmentorship.com/blog/2025-10-02-perplexity-and-bits-per-token/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-02-perplexity-and-bits-per-token/</guid><description>The standard intrinsic metric for language models. What it measures, what units to use, and why it&apos;s a poor end-product evaluation.</description><pubDate>Thu, 02 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Monte Carlo and importance sampling</title><link>https://mlmentorship.com/blog/2025-10-01-monte-carlo-and-importance-sampling/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-10-01-monte-carlo-and-importance-sampling/</guid><description>Estimate expectations by averaging over random samples. The simplest way to compute integrals you can&apos;t compute analytically.</description><pubDate>Wed, 01 Oct 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Dropout</title><link>https://mlmentorship.com/blog/2025-09-30-dropout/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-30-dropout/</guid><description>Randomly zero out a fraction of activations during training. The simplest stochastic regularizer; still standard in vision and many NLP architectures.</description><pubDate>Tue, 30 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Rotary position embeddings (RoPE)</title><link>https://mlmentorship.com/blog/2025-09-29-rotary-position-embeddings/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-29-rotary-position-embeddings/</guid><description>The dominant position encoding for modern LLMs. Encodes relative position by rotating Q and K in 2D subspaces, enabling clean context extrapolation.</description><pubDate>Mon, 29 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Design a RAG system for legal documents</title><link>https://mlmentorship.com/blog/2025-09-26-rag-for-legal-docs/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-26-rag-for-legal-docs/</guid><description>Legal RAG amplifies every standard RAG concern: precise citations, no hallucinations, regulated domain, dense documents with structure. The senior answer addresses each.</description><pubDate>Fri, 26 Sep 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Maximum likelihood estimation</title><link>https://mlmentorship.com/blog/2025-09-22-maximum-likelihood-estimation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-22-maximum-likelihood-estimation/</guid><description>The dominant statistical principle: pick parameters that make the observed data most probable. 
Reduces to minimizing cross-entropy for classification and MSE for Gaussian regression.</description><pubDate>Mon, 22 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>All-reduce and other collectives</title><link>https://mlmentorship.com/blog/2025-09-22-all-reduce-and-collectives/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-22-all-reduce-and-collectives/</guid><description>The communication primitives behind every distributed training job. All-reduce, all-gather, reduce-scatter, broadcast. What they do, costs, and when each is used.</description><pubDate>Mon, 22 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Mixed precision training: FP16, BF16, and FP8</title><link>https://mlmentorship.com/blog/2025-09-21-mixed-precision-training/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-21-mixed-precision-training/</guid><description>How modern transformers train at 2-4× the throughput of FP32 without quality loss. The bit layouts matter; the loss-scaling recipe matters more.</description><pubDate>Sun, 21 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Precision, recall, and F1</title><link>https://mlmentorship.com/blog/2025-09-20-precision-recall-f1/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-20-precision-recall-f1/</guid><description>The three metrics every classifier interview asks about. Their definitions, when to optimize which, and the F-beta generalization.</description><pubDate>Sat, 20 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Linear regression</title><link>https://mlmentorship.com/blog/2025-09-18-linear-regression/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-18-linear-regression/</guid><description>Predict a continuous target as a linear combination of features by minimizing squared error. Closed-form solution, MLE under Gaussian noise, and the foundation everything else builds on.</description><pubDate>Thu, 18 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Adam, AdamW, and the modern optimizer landscape</title><link>https://mlmentorship.com/blog/2025-09-15-adam-and-adamw/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-15-adam-and-adamw/</guid><description>Why Adam works, why AdamW is the version you actually want, and what&apos;s changed in the optimizer landscape since 2018.</description><pubDate>Mon, 15 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>CNN architecture</title><link>https://mlmentorship.com/blog/2025-09-15-cnn-architecture/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-15-cnn-architecture/</guid><description>Convolutions encode translation equivariance and locality. The structural inductive bias that powered the deep learning revolution in vision.</description><pubDate>Mon, 15 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Matrices as linear maps</title><link>https://mlmentorship.com/blog/2025-09-15-matrices-as-linear-maps/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-15-matrices-as-linear-maps/</guid><description>A matrix is a linear function from one vector space to another. Every operation in ML. Projection, rotation, basis change, gradient flow. 
All of it is matrix multiplication.</description><pubDate>Mon, 15 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>SVD and PCA</title><link>https://mlmentorship.com/blog/2025-09-09-svd-and-pca/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-09-svd-and-pca/</guid><description>The singular value decomposition factorizes any matrix into rotation × stretching × rotation. PCA is SVD applied to mean-centered data.</description><pubDate>Tue, 09 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Encoder-decoder architectures</title><link>https://mlmentorship.com/blog/2025-09-06-encoder-decoder-architectures/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-06-encoder-decoder-architectures/</guid><description>An encoder summarizes the input into a representation; a decoder generates the output conditioned on it. The structure behind translation, T5, summarization, and many multimodal models.</description><pubDate>Sat, 06 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Cross-entropy and softmax</title><link>https://mlmentorship.com/blog/2025-09-05-cross-entropy-softmax/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-05-cross-entropy-softmax/</guid><description>The pairing isn&apos;t arbitrary. Cross-entropy is the negative log-likelihood under a categorical distribution, and the softmax+CE gradient simplifies to (p − y), which is why it&apos;s stable.</description><pubDate>Fri, 05 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>RAG: retrieval-augmented generation</title><link>https://mlmentorship.com/blog/2025-09-02-rag-overview/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-09-02-rag-overview/</guid><description>The standard pattern for grounding LLMs in your own data. Reference page; the full essay is linked at the bottom.</description><pubDate>Tue, 02 Sep 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>WSD and WSD-S learning rate schedules</title><link>https://mlmentorship.com/blog/2025-08-31-wsd-and-wsd-s/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-31-wsd-and-wsd-s/</guid><description>Warmup-Stable-Decay holds the LR flat for most of training and decays at the end. WSD-S adds cyclic decay-and-rewarm probes. Both are designed for pretraining where you don&apos;t know the total token budget upfront.</description><pubDate>Sun, 31 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Two-tower retrieval</title><link>https://mlmentorship.com/blog/2025-08-23-two-tower-retrieval/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-23-two-tower-retrieval/</guid><description>Encode queries and items with separate networks into a shared embedding space; retrieve by approximate nearest neighbors. The default architecture for industrial recommenders and search.</description><pubDate>Sat, 23 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Eigenvalues and the spectral theorem</title><link>https://mlmentorship.com/blog/2025-08-21-eigenvalues-and-spectral-theorem/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-21-eigenvalues-and-spectral-theorem/</guid><description>Eigenvectors are directions a matrix only stretches. 
The spectral theorem says symmetric matrices have a full orthogonal eigenbasis with real eigenvalues.</description><pubDate>Thu, 21 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Quantization: INT8, INT4, FP8, and the inference cost picture</title><link>https://mlmentorship.com/blog/2025-08-21-quantization/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-21-quantization/</guid><description>Reduce model precision to shrink memory and speed up inference. The trade-offs are real but increasingly small with modern techniques.</description><pubDate>Thu, 21 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Expected Calibration Error (ECE)</title><link>https://mlmentorship.com/blog/2025-08-18-expected-calibration-error/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-18-expected-calibration-error/</guid><description>How well do predicted probabilities match empirical frequencies? Bin predictions by confidence, compare bin-mean confidence to bin-accuracy.</description><pubDate>Mon, 18 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Gradient clipping</title><link>https://mlmentorship.com/blog/2025-08-17-gradient-clipping/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-17-gradient-clipping/</guid><description>Cap the norm of the gradient before each optimizer step. The simplest and most reliable defense against training instability.</description><pubDate>Sun, 17 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Implement KNN efficiently</title><link>https://mlmentorship.com/blog/2025-08-17-implement-knn/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-17-implement-knn/</guid><description>The naive solution is one line. The interview is about scaling: when does naive fail, and what do you do?</description><pubDate>Sun, 17 Aug 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>PagedAttention and the vLLM serving model</title><link>https://mlmentorship.com/blog/2025-08-13-paged-attention/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-08-13-paged-attention/</guid><description>Treat the KV cache like virtual memory: allocate in fixed-size pages, share pages across sequences, eliminate fragmentation. The reason vLLM is the default LLM server.</description><pubDate>Wed, 13 Aug 2025 00:00:00 GMT</pubDate><category>concepts</category></item><item><title>Walk me through the bias-variance tradeoff</title><link>https://mlmentorship.com/blog/2025-07-27-bias-variance-tradeoff/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-07-27-bias-variance-tradeoff/</guid><description>The classic warm-up question. The L4 answer is the formula; the L6 answer is what it tells you about model selection in production.</description><pubDate>Sun, 27 Jul 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Tell me about a time you disagreed with someone senior</title><link>https://mlmentorship.com/blog/2025-07-15-disagreed-with-senior/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-07-15-disagreed-with-senior/</guid><description>The standard behavioral question. 
The interviewer is checking whether you can hold technical positions, push back productively, and update on new information.</description><pubDate>Tue, 15 Jul 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Fine-tuning vs prompting: the deep version</title><link>https://mlmentorship.com/blog/2025-06-27-fine-tuning-deep/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-06-27-fine-tuning-deep/</guid><description>Past the basic decision tree. The senior answer covers SFT, LoRA, DPO, continued pretraining, and the operational trade-offs each introduces.</description><pubDate>Fri, 27 Jun 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Two-tower vs cross-encoder: when to use which?</title><link>https://mlmentorship.com/blog/2025-06-24-two-tower-vs-cross-encoder/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-06-24-two-tower-vs-cross-encoder/</guid><description>The recsys / search architecture decision that comes up in every retrieval interview. The right answer is &apos;both, in sequence.&apos;</description><pubDate>Tue, 24 Jun 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>How would you build evals for a coding assistant?</title><link>https://mlmentorship.com/blog/2025-06-07-evals-for-coding-assistant/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-06-07-evals-for-coding-assistant/</guid><description>Code is one of the few LLM domains where ground truth is verifiable. Use that. The senior answer combines verifiable metrics with human review for what verification can&apos;t catch.</description><pubDate>Sat, 07 Jun 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Design a content moderation system</title><link>https://mlmentorship.com/blog/2025-05-29-content-moderation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-05-29-content-moderation/</guid><description>Moderation is a multi-policy classification problem at scale, with appeals, human review, and adversarial users. The senior answer separates policy from model and treats human review as part of the system.</description><pubDate>Thu, 29 May 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>When would you not use cross-validation?</title><link>https://mlmentorship.com/blog/2025-05-25-when-not-cross-validation/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-05-25-when-not-cross-validation/</guid><description>Cross-validation is a tool, not a default. The senior answer names the cases where it&apos;s wrong, expensive, or misleading.</description><pubDate>Sun, 25 May 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>When would you fine-tune vs prompt vs RAG?</title><link>https://mlmentorship.com/blog/2025-05-21-fine-tune-vs-prompt-vs-rag/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-05-21-fine-tune-vs-prompt-vs-rag/</guid><description>The most-asked LLM design question of 2026. 
The answer is a decision tree, not a preference.</description><pubDate>Wed, 21 May 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>How would you debug a model that&apos;s not learning?</title><link>https://mlmentorship.com/blog/2025-05-14-debug-model-not-learning/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-05-14-debug-model-not-learning/</guid><description>The &apos;tell me how you&apos;d debug&apos; question is a behavioral round in disguise. The interviewer is probing your debugging instinct, not testing facts.</description><pubDate>Wed, 14 May 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Tell me about your most ambitious project</title><link>https://mlmentorship.com/blog/2025-05-11-most-ambitious-project/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-05-11-most-ambitious-project/</guid><description>The interview is checking the size of problem you can hold in your head and the structure of how you describe it. Specificity wins.</description><pubDate>Sun, 11 May 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>How do you handle hallucinations in production?</title><link>https://mlmentorship.com/blog/2025-04-25-handle-hallucinations-in-production/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-04-25-handle-hallucinations-in-production/</guid><description>There is no single solution. The senior answer is a layered system that catches different hallucination types at different stages.</description><pubDate>Fri, 25 Apr 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Build an LLM coding assistant from scratch</title><link>https://mlmentorship.com/blog/2025-04-12-build-llm-coding-assistant/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-04-12-build-llm-coding-assistant/</guid><description>The architecture decision space is large: model choice, context retrieval, IDE integration, evals. The senior answer scopes the use case before any of it.</description><pubDate>Sat, 12 Apr 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>How would you evaluate a search ranker?</title><link>https://mlmentorship.com/blog/2025-04-09-evaluate-search-ranker/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-04-09-evaluate-search-ranker/</guid><description>Search ranking eval is offline metrics for development, A/B for shipping, and human raters for absolute calibration. The senior answer uses all three and respects what each measures.</description><pubDate>Wed, 09 Apr 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Design ML monitoring</title><link>https://mlmentorship.com/blog/2025-03-17-design-ml-monitoring/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-03-17-design-ml-monitoring/</guid><description>Most ML systems fail silently. Monitoring is what tells you. The senior answer monitors data, model, and outcome layers separately.</description><pubDate>Mon, 17 Mar 2025 00:00:00 GMT</pubDate><category>questions</category></item><item><title>Why does dropout work?</title><link>https://mlmentorship.com/blog/2025-03-16-why-does-dropout-work/</link><guid isPermaLink="true">https://mlmentorship.com/blog/2025-03-16-why-does-dropout-work/</guid><description>The trick is that there are three valid explanations and they all matter. 
Which ones you reach for tells the interviewer your level.</description><pubDate>Sun, 16 Mar 2025 00:00:00 GMT</pubDate><category>questions</category></item></channel></rss>