Specificity measures: of all the actual negatives, what fraction did the model correctly identify as negative?

The denominator is all the negatives — those the model correctly classified (TN) plus those it incorrectly flagged (FP).

Also called the true negative rate (TNR). It’s the mirror image of recall (sensitivity, TPR):

  • Recall (TPR) — fraction of actual positives caught.
  • Specificity (TNR) — fraction of actual negatives correctly identified.

High specificity means the model rarely raises false alarms. The applications that care most about specificity are those where a false alarm is costly:

  • Spam filtering, where a legitimate email going to the spam folder (FP) is more harmful than a few spam messages getting through.
  • Drug testing, where wrongly flagging an innocent person is worse than missing some users.
  • Quality control, where wrongly rejecting good product is more expensive than missing the occasional defect.

The complement of specificity is the False positive rate (FPR):

FPR is what the ROC curve plots on the x-axis (against TPR on the y-axis), so specificity and FPR are essentially the same information in different forms.

The recall-specificity tradeoff

Tuning a classifier’s threshold trades recall against specificity:

  • Lower threshold (more eager to predict positive) → higher recall, lower specificity.
  • Higher threshold (more conservative) → lower recall, higher specificity.

The ROC curve traces out this tradeoff across all thresholds. The right operating point depends on the costs in the application.

In scikit-learn

scikit-learn doesn’t have a standalone specificity_score, but it’s easy to compute from the Confusion matrix:

from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
specificity = tn / (tn + fp)

Or equivalently as the recall of the negative class — pass pos_label=0 to recall_score. The wine-quality classifier from the Introduction to Data Science textbook ends up with recall 0.84 and specificity 0.55 — a real asymmetry, the kind of thing accuracy alone wouldn’t reveal.