Understanding Recall, Precision, and Accuracy: Key Metrics in Machine Learning
- Rajat Patyal
- Mar 2, 2025
- 2 min read
When evaluating a machine learning model, particularly in classification tasks, understanding metrics such as Recall, Precision, and Accuracy is essential. These metrics help determine how well a model is performing and guide decisions on optimizing it for specific use cases.
In this blog, we will explore the differences between these metrics, their real-world use cases, and the impact of True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN) on model performance.
1. Defining the Key Metrics
Accuracy
Accuracy is the simplest and most commonly used metric, calculated as:
Accuracy = \frac{TP + TN}{TP + FP + FN + TN}
It measures the percentage of correctly classified instances out of the total instances. However, accuracy can be misleading in imbalanced datasets. For instance, in a dataset with 99 negative cases and 1 positive case, a model that predicts every instance as negative achieves 99% accuracy, yet it is useless if detecting that single positive case is what matters.
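To make that concrete, here is a minimal sketch of the imbalanced case, assuming scikit-learn is available; the 99/1 data is invented to mirror the example above.

```python
# A sketch of the 99-negative / 1-positive example above (hypothetical data).
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 99 + [1]   # 99 negative cases, 1 positive case
y_pred = [0] * 100        # a "model" that always predicts negative

print(accuracy_score(y_true, y_pred))  # 0.99 -> looks impressive
print(recall_score(y_true, y_pred))    # 0.0  -> the one critical positive is missed
```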
Precision
Precision focuses on the correctness of positive predictions and is calculated as:
Precision = \frac{TP}{TP + FP}
Precision is crucial in scenarios where false positives (FP) are costly. A high precision means fewer incorrect positive predictions.
Recall (Sensitivity)
Recall measures the ability to correctly identify actual positives and is calculated as:
Recall = \frac{TP}{TP + FN}
It is critical in cases where false negatives (FN) are costly, ensuring that most actual positives are detected.
F1-Score
Since precision and recall often have a trade-off, the F1-score provides a balance:
F1\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall}
A high F1-score indicates a good balance between precision and recall.
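As a rough illustration, the snippet below computes all three metrics with scikit-learn; the labels and predictions are invented purely for demonstration.

```python
# Sketch: precision, recall, and F1 on hypothetical labels and predictions.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # model output

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")
```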
2. Impact of TP, FP, FN, and TN
True Positives (TP): Correctly predicted positive cases. A high TP count means the model is catching the positives it should.
False Positives (FP): Incorrectly predicted positive cases. High FP rates reduce precision.
False Negatives (FN): Missed actual positives. High FN rates reduce recall.
True Negatives (TN): Correctly predicted negative cases. High TN improves overall accuracy but does not necessarily improve recall or precision.
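If you want these raw counts rather than the ratios, scikit-learn's confusion matrix exposes them directly; a small sketch using the same invented labels as above:

```python
# Sketch: pulling TP, FP, FN, TN out of a binary confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# For binary labels {0, 1}, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
```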
3. Use Cases and Trade-offs
Use Case 1: Medical Diagnosis (High Recall Priority)
Example: Detecting cancer from medical images.
Why recall matters: Missing a cancer case (FN) is far more dangerous than incorrectly flagging a healthy person as having cancer (FP).
The model should prioritize recall to minimize FN, even at the cost of some FP, as the threshold sketch below illustrates.
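One common way to shift this balance is to lower the classifier's decision threshold. The sketch below is illustrative only: the probabilities are invented and the 0.3 cutoff is not a recommendation.

```python
# Sketch: lowering the decision threshold to favor recall over precision.
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])                           # hypothetical labels
y_prob = np.array([0.10, 0.40, 0.35, 0.32, 0.80, 0.45, 0.60, 0.90])   # model's P(positive)

for threshold in (0.5, 0.3):  # default cutoff vs. a lower, recall-friendly cutoff
    y_pred = (y_prob >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_true, y_pred):.2f}, "
          f"recall={recall_score(y_true, y_pred):.2f}")
```

The same knob works in the opposite direction for the spam-filtering case below: raising the threshold trades recall away for higher precision.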
Use Case 2: Spam Detection (High Precision Priority)
Example: Email spam filtering.
Why precision matters: Marking an important email as spam (FP) can cause users to miss crucial emails.
The model should prioritize precision, reducing FP, even if some spam emails (FN) pass through.
Use Case 3: Fraud Detection (Balanced F1-Score Priority)
Example: Credit card fraud detection.
Why balance matters: Both FP (blocking legitimate transactions) and FN (missing fraud) have severe consequences.
The model should optimize the F1-score to balance precision and recall.
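When both error types are costly, one practical heuristic is to sweep the threshold and keep the one with the highest F1. A sketch, again with placeholder scores rather than real fraud data:

```python
# Sketch: choosing the decision threshold that maximizes F1.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.05, 0.30, 0.40, 0.35, 0.70, 0.55, 0.60, 0.90, 0.20, 0.80])

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
# precision and recall have one more entry than thresholds; drop the last to align them
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best = np.argmax(f1)
print(f"best threshold={thresholds[best]:.2f}, F1={f1[best]:.2f}")
```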
4. Conclusion
Choosing the right metric depends on the problem you are solving. Accuracy is not always the best measure, especially in imbalanced datasets. Recall and precision often have a trade-off, and selecting the right balance ensures optimal model performance. Understanding the impact of TP, FP, FN, and TN helps tailor models to specific needs, whether it be medical diagnosis, spam detection, or fraud prevention.
