One-line definition
For a binary classifier producing scores, the ROC curve plots true positive rate (TPR) vs. false positive rate (FPR) as the decision threshold sweeps from its highest value to its lowest. The PR curve plots precision vs. recall over the same sweep. The area under each curve (AUC) is a single-number summary independent of any threshold choice.
Why it matters
A confusion matrix at one threshold tells you about that threshold; the curves and their AUCs summarize the classifier’s separability across all thresholds. Picking the wrong curve for your imbalance regime gives you misleading optimism (ROC-AUC near 1 on heavy imbalance is common but uninformative).
The four base counts
| | Predicted positive | Predicted negative |
|---|---|---|
| Actually positive | TP | FN |
| Actually negative | FP | TN |
From these:
- TPR (recall) = TP / (TP + FN)
- FPR = FP / (FP + TN)
- Precision = TP / (TP + FP)
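A minimal sketch of the same arithmetic (the counts are made up for illustration):

```python
# Turn the four confusion-matrix counts into the rates used below.
def rates(tp: int, fp: int, fn: int, tn: int) -> dict:
    return {
        "tpr_recall": tp / (tp + fn),   # TPR = recall = TP / (TP + FN)
        "fpr": fp / (fp + tn),          # FPR = FP / (FP + TN)
        "precision": tp / (tp + fp),    # precision = TP / (TP + FP)
    }

print(rates(tp=80, fp=40, fn=20, tn=860))
# {'tpr_recall': 0.8, 'fpr': 0.044..., 'precision': 0.666...}
```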
ROC curve and ROC-AUC
- X-axis: FPR (false alarm rate among true negatives).
- Y-axis: TPR (recall on positives).
- Random classifier: diagonal from (0,0) to (1,1), AUC = 0.5.
- Perfect classifier: top-left corner, AUC = 1.0.
Probabilistic interpretation: ROC-AUC is the probability that a uniformly random positive example scores higher than a uniformly random negative example.
ROC-AUC is invariant to class balance because TPR and FPR are normalized within each class. Sounds great. But this is also its failure mode. On heavy imbalance (1% positives) a model that’s “good at not flagging the 99% obvious negatives” gets a high ROC-AUC even when its top-1000 predictions are mostly wrong.
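That pairwise reading is easy to check directly. A brute-force sketch (O(positives × negatives), fine for toy data; the scores and labels below are invented):

```python
# ROC-AUC as P(score of a random positive > score of a random negative);
# ties count as half a win.
def roc_auc_pairwise(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,   0,   1,   0,   1,    0,   0,   1,   0,   0]
print(roc_auc_pairwise(scores, labels))  # 17/24 ≈ 0.708
```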
PR curve and PR-AUC
- X-axis: Recall.
- Y-axis: Precision.
- Random classifier: horizontal line at π, the positive class prior P(y = 1). AUC = π.
- Perfect classifier: top-right corner.
Interpretation: PR-AUC is usually computed as average precision (AP), the precision at each threshold weighted by the increase in recall gained there.
PR-AUC depends explicitly on class balance. That’s why it is more honest than ROC-AUC under imbalance. Halving the positive prior halves the random-baseline PR-AUC.
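A sketch of that computation on invented toy data (this is the step-wise AP definition, the same one scikit-learn uses for average_precision_score):

```python
# PR-AUC approximated as average precision: walk down the ranking and add
# precision * (increase in recall) each time a positive is encountered.
def average_precision(scores, labels):
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    tp = fp = 0
    ap = 0.0
    for i in order:
        if labels[i] == 1:
            tp += 1
            ap += (tp / (tp + fp)) * (1 / n_pos)  # precision * recall step
        else:
            fp += 1
    return ap

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,   0,   1,   0,   1,    0,   0,   1,   0,   0]
print(average_precision(scores, labels))  # ≈ 0.69 on this toy ranking
```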
Which to report
| Setting | Default metric |
|---|---|
| Balanced or near-balanced classification | ROC-AUC |
| Imbalanced (positives < 5%) | PR-AUC, plus precision@k |
| Search / retrieval / recsys | PR-AUC, recall@k, NDCG |
| Fraud / anomaly detection | PR-AUC, precision@high-recall point |
| Medical screening | PR-AUC + sensitivity at fixed specificity |
For interview answers: if you give ROC-AUC alone for a fraud problem (1% positives), expect a follow-up.
Numerical example
A model on a 1%-positive dataset:
- Random baseline: ROC-AUC = 0.50, PR-AUC = 0.01.
- “Good” model: ROC-AUC = 0.98, PR-AUC = 0.30.
The ROC-AUC of 0.98 looks impressive but is routine under heavy imbalance; the PR-AUC of 0.30 (30× the 0.01 baseline, yet far from 1) is the more honest score. Always report both when imbalance is significant.
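A sketch of that contrast on synthetic data (assumes scikit-learn is installed; the exact numbers will vary with the seed and model):

```python
# Same model, same test set, scored both ways on a ~1%-positive problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.99], random_state=0)  # ~1% positives
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print("ROC-AUC:", roc_auc_score(y_te, scores))            # typically high
print("PR-AUC :", average_precision_score(y_te, scores))  # typically much lower
```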
Relationship between the two
- Shape: the ROC curve is monotonically non-decreasing; the PR curve can be jagged and non-monotonic.
- Dominance carries over: one model’s curve dominates another’s in ROC space iff it dominates in PR space (Davis & Goadrich, 2006).
- The AUCs are not directly comparable across the two curves: the same model can have very different ROC-AUC and PR-AUC values.
Common pitfalls
- Reporting ROC-AUC on a 99/1 dataset and calling 0.95 “great.” It’s likely not.
- Comparing PR-AUC across datasets with different positive priors. They live on different baselines.
- Treating high AUC as a deployment-ready signal. AUC measures separability across all thresholds; production needs one threshold, picked from a confusion matrix at the operating point (see the sketch after this list).
- Confusing ROC-AUC with “accuracy.” They are different quantities; accuracy is threshold-dependent, AUC is not.
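One way to pick that operating threshold, sketched as the highest-precision point that still meets a recall floor (the 0.80 floor is an invented requirement; assumes scikit-learn):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_recall(y_true, scores, min_recall=0.80):
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision/recall have one more entry than thresholds; drop the end point.
    meets_floor = recall[:-1] >= min_recall
    if not meets_floor.any():
        raise ValueError("no threshold reaches the recall floor")
    best = np.argmax(np.where(meets_floor, precision[:-1], -np.inf))
    return thresholds[best], precision[best], recall[best]
```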
Related
- Calibration. Predicted probabilities should match empirical frequencies.
- Class imbalance. Handling imbalanced training data.