
ROC, PR curves, and AUC

What ROC-AUC and PR-AUC measure, when to use which, and why ROC-AUC is misleading on heavy class imbalance.

Reviewed · 3 min read

One-line definition

For a binary classifier producing scores, the ROC curve plots true positive rate (TPR) vs. false positive rate (FPR) as the decision threshold sweeps from its highest value (nothing flagged) to its lowest (everything flagged). The PR curve plots precision vs. recall over the same sweep. The area under each curve (AUC) is a single-number summary independent of any threshold choice.

Why it matters

A confusion matrix at one threshold tells you about that threshold only; the curves and their AUCs summarize the classifier’s separability across all thresholds. Picking the wrong curve for your imbalance regime yields misleadingly optimistic numbers (ROC-AUC near 1 on heavy imbalance is common but uninformative).

The four base counts

|                   | Predicted positive | Predicted negative |
|-------------------|--------------------|--------------------|
| Actually positive | TP                 | FN                 |
| Actually negative | FP                 | TN                 |

From these: TPR (recall) = TP / (TP + FN), FPR = FP / (FP + TN), and precision = TP / (TP + FP).
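These derived rates can be sketched in a few lines; the function and example numbers below are illustrative, not from any library:

```python
def rates(tp, fp, fn, tn):
    """Return (tpr, fpr, precision) for one confusion matrix."""
    tpr = tp / (tp + fn)        # recall / sensitivity: positives caught
    fpr = fp / (fp + tn)        # false alarm rate among true negatives
    precision = tp / (tp + fp)  # fraction of flagged items that are right
    return tpr, fpr, precision

# Example: 90 of 100 positives caught, 50 false alarms among 9900 negatives.
tpr, fpr, precision = rates(tp=90, fp=50, fn=10, tn=9850)
# tpr = 0.9, fpr ~ 0.005, precision ~ 0.64
```

Note how the imbalance shows up: FPR looks tiny (0.5%) while precision is only about 64%, because the 9900 negatives dwarf the 100 positives.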

ROC curve and ROC-AUC

  • X-axis: FPR (false alarm rate among true negatives).
  • Y-axis: TPR (recall on positives).
  • Random classifier: diagonal from (0,0) to (1,1), AUC = 0.5.
  • Perfect classifier: top-left corner, AUC = 1.0.

Probabilistic interpretation: ROC-AUC is the probability that a uniformly random positive example scores higher than a uniformly random negative example.
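That pairwise interpretation gives a direct (if O(n·m)) way to compute ROC-AUC. A minimal sketch, with illustrative scores (ties count as half a win):

```python
def roc_auc_pairwise(pos_scores, neg_scores):
    """ROC-AUC as P(random positive outscores random negative); ties = 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

pos = [0.9, 0.8, 0.4]  # scores of the actual positives
neg = [0.7, 0.3, 0.2]  # scores of the actual negatives
auc = roc_auc_pairwise(pos, neg)
# 8 of the 9 (positive, negative) pairs are correctly ordered -> auc = 8/9
```

Library implementations (e.g. scikit-learn's `roc_auc_score`) compute the same quantity from sorted scores in O(n log n).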

ROC-AUC is invariant to class balance because TPR and FPR are normalized within each class. Sounds great. But this is also its failure mode. On heavy imbalance (1% positives) a model that’s “good at not flagging the 99% obvious negatives” gets a high ROC-AUC even when its top-1000 predictions are mostly wrong.

PR curve and PR-AUC

  • X-axis: Recall.
  • Y-axis: Precision.
  • Random classifier: horizontal line at p, the positive class prior (fraction of positives). AUC = p in expectation.
  • Perfect classifier: top-right corner.

Probabilistic interpretation: PR-AUC is commonly estimated by average precision (AP): the precision at each positive’s rank in the score-sorted list, averaged over all positives.

PR-AUC depends explicitly on class balance. That’s why it is more honest than ROC-AUC under imbalance. Halving the positive prior halves the random-baseline PR-AUC.
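The prior dependence is easy to see with a tiny average-precision implementation. Here the "uninformative" ranking is constructed deterministically (positives interleaved evenly, which is what a no-signal scorer gives you on average); the function name is illustrative:

```python
def average_precision(ranked_labels):
    """AP: mean of precision@k, evaluated at the rank of each positive."""
    hits, total = 0, 0.0
    for k, y in enumerate(ranked_labels, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits

# Uninformative ranking at a 1% prior: positives evenly interleaved.
ranking_1pct = [1 if k % 100 == 0 else 0 for k in range(1, 10001)]
# Same construction at a 0.5% prior.
ranking_half = [1 if k % 200 == 0 else 0 for k in range(1, 10001)]

ap_1pct = average_precision(ranking_1pct)  # ~0.01: the 1% prior
ap_half = average_precision(ranking_half)  # ~0.005: halved with the prior
```

ROC-AUC for both rankings would sit near 0.5 regardless of the prior; the PR baseline moves with it.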

Which to report

| Setting                                  | Default metric                             |
|------------------------------------------|--------------------------------------------|
| Balanced or near-balanced classification | ROC-AUC                                    |
| Imbalanced (positives < 5%)              | PR-AUC, plus precision@k                   |
| Search / retrieval / recsys              | PR-AUC, recall@k, NDCG                     |
| Fraud / anomaly detection                | PR-AUC, precision@high-recall point        |
| Medical screening                        | PR-AUC + sensitivity at fixed specificity  |

For interview answers: if you give ROC-AUC alone for a fraud problem (1% positives), expect a follow-up.

Numerical example

A model on a 1%-positive dataset:

  • Random baseline: ROC-AUC = 0.50, PR-AUC = 0.01.
  • “Good” model: ROC-AUC = 0.98, PR-AUC = 0.30.

The ROC-AUC of 0.98 is impressive-looking but ordinary; PR-AUC of 0.30 is the honest score. Always report both when imbalance is significant.

Relationship between the two

  • Shape: the ROC curve is monotonically non-decreasing (though not necessarily concave); the PR curve can be jagged and non-monotonic.
  • Pareto-equivalent for ranking: one model’s curve dominates another’s in ROC space iff it dominates in PR space (Davis & Goadrich, 2006).
  • AUCs are not directly comparable across the two curves: the same model typically has very different ROC-AUC and PR-AUC values.

Common pitfalls

  • Reporting ROC-AUC on a 99/1 dataset and calling 0.95 “great.” It’s likely not.
  • Comparing PR-AUC across datasets with different positive priors. They live on different baselines.
  • Treating high AUC as a deployment-ready signal. AUC measures separability across all thresholds; production needs one threshold, picked from a confusion matrix at the operating point.
  • Confusing ROC-AUC with “accuracy.” They are different quantities; accuracy is threshold-dependent, AUC is not.
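The third pitfall (AUC vs. an operating point) is worth making concrete. One common recipe is to pick the lowest threshold whose precision clears a floor; since recall is monotone non-increasing in the threshold, the first qualifying threshold in ascending order maximizes recall subject to the constraint. A hedged sketch with illustrative names and data:

```python
def pick_threshold(scores, labels, min_precision):
    """Lowest threshold whose precision meets the floor (max recall
    subject to it). Returns None if no threshold qualifies."""
    for t in sorted(set(scores)):  # ascending: first hit = highest recall
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if tp and tp / (tp + fp) >= min_precision:
            return t
    return None

# Toy validation set: scores paired with true labels.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1,    1,   0,   1,   0,   0,   0,   0]
t = pick_threshold(scores, labels, min_precision=0.75)
# At t = 0.7, four items are flagged: 3 TP, 1 FP -> precision 0.75.
```

This is the step AUC skips: the deployed system runs at one point on the curve, chosen on validation data against a product constraint.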