P4-metric

The P₄ metric ^[1]^[2] (also known as FS or Symmetric F ^[3]) is a measure for evaluating the performance of a binary classifier. It is calculated from precision, recall, specificity, and NPV (negative predictive value). Its definition is similar to that of the F₁ metric; however, it addresses criticisms leveled against the definition of F₁ and may therefore be understood as an extension of the F₁ metric.

Like the other known metrics, the P₄ metric is a function of: TP (true positives), TN (true negatives), FP (false positives), FN (false negatives).

Justification

The key concept of the P₄ metric is to leverage the four key conditional probabilities:

$P(+\mid C{+})$ — the probability that the sample is positive, provided the classifier result was positive.
$P(C{+}\mid +)$ — the probability that the classifier result will be positive, provided the sample is positive.
$P(C{-}\mid -)$ — the probability that the classifier result will be negative, provided the sample is negative.
$P(-\mid C{-})$ — the probability the sample is negative, provided the classifier result was negative.

The main assumption behind this metric is that all the probabilities mentioned above are close to 1 for a properly designed binary classifier. Indeed, $\mathrm {P} _{4}=1$ if, and only if, all of the probabilities above are equal to 1. Another important feature is that $\mathrm {P} _{4}$ tends to zero when any of the above probabilities tends to zero.

Definition

P₄ is defined as the harmonic mean of the four key conditional probabilities:

$$\mathrm{P}_{4} = \frac{4}{\frac{1}{P(+\mid C{+})} + \frac{1}{P(C{+}\mid +)} + \frac{1}{P(C{-}\mid -)} + \frac{1}{P(-\mid C{-})}} = \frac{4}{\frac{1}{\mathit{precision}} + \frac{1}{\mathit{recall}} + \frac{1}{\mathit{specificity}} + \frac{1}{\mathit{NPV}}}.$$

In terms of TP, TN, FP and FN, it can be calculated as follows:

$$\mathrm{P}_{4} = \frac{4\cdot \mathrm{TP}\cdot \mathrm{TN}}{4\cdot \mathrm{TP}\cdot \mathrm{TN} + (\mathrm{TP} + \mathrm{TN})\cdot (\mathrm{FP} + \mathrm{FN})}.$$
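As an illustration, the two forms of the definition can be checked against each other with a minimal Python sketch. The function names `p4_from_counts` and `p4_from_rates` and the sample counts are hypothetical, and returning 0.0 in the undefined 0/0 case is an assumption made here, not something the definition prescribes.

```python
def p4_from_counts(tp, tn, fp, fn):
    """P4 computed directly from the confusion-matrix counts."""
    numerator = 4 * tp * tn
    denominator = numerator + (tp + tn) * (fp + fn)
    # Assumption: treat the undefined 0/0 case as 0.0.
    return numerator / denominator if denominator else 0.0


def p4_from_rates(tp, tn, fp, fn):
    """P4 as the harmonic mean of precision, recall, specificity and NPV.

    Only usable when all four ratios are defined and non-zero.
    """
    precision = tp / (tp + fp)      # P(+ | C+)
    recall = tp / (tp + fn)         # P(C+ | +)
    specificity = tn / (tn + fp)    # P(C- | -)
    npv = tn / (tn + fn)            # P(- | C-)
    return 4 / (1 / precision + 1 / recall + 1 / specificity + 1 / npv)


# Hypothetical confusion matrix, chosen only for illustration.
counts = dict(tp=80, tn=90, fp=10, fn=20)
print(p4_from_counts(**counts))
print(p4_from_rates(**counts))
```

Both calls should print the same value up to floating-point rounding. The count-based form is the more convenient of the two in practice, because it stays defined when TP or TN is zero.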

Evaluation of the binary classifier performance

Evaluating the performance of binary classifiers is a multidisciplinary concept. It ranges from the evaluation of medical and psychiatric tests to machine learning classifiers in a variety of fields. Thus, many of the metrics in use exist under several names, some defined independently.

		Predicted condition		^Sources:^[4]^[5]^[6]^[7]^[8]^[9]^[10]^[11]
	Total population $= P + N$	Predicted positive	Predicted negative	Informedness, bookmaker informedness (BM) $= TPR + TNR - 1$	Prevalence threshold (PT) $= \frac{\sqrt{TPR \times FPR} - FPR}{TPR - FPR}$
Actual condition	Real Positive (P) ^[a]	True positive (TP), hit ^[b]	False negative (FN), miss, underestimation	True positive rate (TPR), recall, sensitivity (SEN), probability of detection, hit rate, power $= TP / P$ $= 1 - FNR$	False negative rate (FNR), miss rate, type II error ^[c] $= FN / P$ $= 1 - TPR$
Actual condition	Real Negative (N) ^[d]	False positive (FP), false alarm, overestimation	True negative (TN), correct rejection ^[e]	False positive rate (FPR), probability of false alarm, fall-out, type I error ^[f] $= FP / N$ $= 1 - TNR$	True negative rate (TNR), specificity (SPC), selectivity $= TN / N$ $= 1 - FPR$
	Prevalence $= P / (P + N)$	Positive predictive value (PPV), precision $= TP / (TP + FP)$ $= 1 - FDR$	False omission rate (FOR) $= FN / (TN + FN)$ $= 1 - NPV$	Positive likelihood ratio (LR+) $= TPR / FPR$	Negative likelihood ratio (LR−) $= FNR / TNR$
	Accuracy (ACC) $= (TP + TN) / (P + N)$	False discovery rate (FDR) $= FP / (TP + FP)$ $= 1 - PPV$	Negative predictive value (NPV) $= TN / (TN + FN)$ $= 1 - FOR$	Markedness (MK), deltaP (Δp) $= PPV + NPV - 1$	Diagnostic odds ratio (DOR) $= LR+ / LR-$
	Balanced accuracy (BA) $= (TPR + TNR) / 2$	F₁ score $= 2 \, PPV \times TPR / (PPV + TPR)$ $= 2 \, TP / (2 \, TP + FP + FN)$	Fowlkes–Mallows index (FM) $= \sqrt{PPV \times TPR}$	phi or Matthews correlation coefficient (MCC) $= \sqrt{TPR \times TNR \times PPV \times NPV} - \sqrt{FNR \times FPR \times FOR \times FDR}$	Threat score (TS), critical success index (CSI), Jaccard index $= TP / (TP + FN + FP)$

[a] The number of real positive cases in the data.
[b] A test result that correctly indicates the presence of a condition or characteristic.
[c] Type II error: a test result which wrongly indicates that a particular condition or attribute is absent.
[d] The number of real negative cases in the data.
[e] A test result that correctly indicates the absence of a condition or characteristic.
[f] Type I error: a test result which wrongly indicates that a particular condition or attribute is present.
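Several of the quantities in the table can be derived from the four counts, as in the following minimal Python sketch. The function name `table_metrics`, the dictionary keys and the sample counts are illustrative choices, not part of the source. The sketch assumes that every denominator is non-zero.

```python
import math


def table_metrics(tp, tn, fp, fn):
    """Compute a selection of confusion-matrix metrics from raw counts."""
    p, n = tp + fn, tn + fp                  # real positives, real negatives
    tpr, tnr = tp / p, tn / n                # recall, specificity
    fnr, fpr = 1 - tpr, 1 - tnr              # miss rate, fall-out
    ppv, npv = tp / (tp + fp), tn / (tn + fn)
    fdr, fomr = 1 - ppv, 1 - npv             # fomr: false omission rate (FOR)
    return {
        "ACC": (tp + tn) / (p + n),
        "BA": (tpr + tnr) / 2,
        "F1": 2 * ppv * tpr / (ppv + tpr),
        "FM": math.sqrt(ppv * tpr),
        "BM": tpr + tnr - 1,
        "MK": ppv + npv - 1,
        "MCC": math.sqrt(tpr * tnr * ppv * npv)
        - math.sqrt(fnr * fpr * fomr * fdr),
        "TS": tp / (tp + fn + fp),
    }


# Hypothetical counts, used only to exercise the formulas.
for name, value in table_metrics(tp=80, tn=90, fp=10, fn=20).items():
    print(f"{name}: {value:.4f}")
```

The MCC line follows the square-root form given in the table. For these counts it agrees with the more common form (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)).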

Properties of the P₄ metric

Symmetry: in contrast to the F₁ metric, P₄ is symmetric. Its value does not change when the dataset labels are swapped, that is, when positives are renamed negatives and negatives are renamed positives.
Range: $\mathrm {P} _{4}\in [0,1]$.
Achieving $\mathrm {P} _{4}\approx 1$ requires all four key conditional probabilities to be close to 1.
For $\mathrm {P} _{4}\approx 0$ it is sufficient that one of the four key conditional probabilities is close to 0.
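The symmetry property can be illustrated with a minimal Python sketch. Swapping the labels exchanges TP with TN and FP with FN. The helper `swap_labels` and the counts used here are hypothetical, chosen only to make the contrast with F₁ visible.

```python
def p4(tp, tn, fp, fn):
    """P4 from confusion-matrix counts."""
    return 4 * tp * tn / (4 * tp * tn + (tp + tn) * (fp + fn))


def f1(tp, tn, fp, fn):
    """F1 from confusion-matrix counts (TN is unused by definition)."""
    return 2 * tp / (2 * tp + fp + fn)


def swap_labels(tp, tn, fp, fn):
    """Rename positives as negatives and vice versa: TP <-> TN, FP <-> FN."""
    return tn, tp, fn, fp


original = (80, 900, 15, 5)        # hypothetical (TP, TN, FP, FN)
swapped = swap_labels(*original)

print("P4:", p4(*original), p4(*swapped))   # identical values
print("F1:", f1(*original), f1(*swapped))   # different values
```

P₄ is unchanged because its formula is invariant under the exchange TP ↔ TN, FP ↔ FN. F₁ ignores TN, so its value depends on which class is called positive.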

Examples, comparing with the other metrics

Dependency table for selected metrics ("true" means the metric depends on the given probability, "false" means it does not):

	$P(+\mid C{+})$	$P(C{+}\mid +)$	$P(C{-}\mid -)$	$P(-\mid C{-})$
P₄	true	true	true	true
F₁	true	true	false	false
Informedness	false	true	true	false
Markedness	true	false	false	true

Metrics that do not depend on a given probability are prone to misrepresentation when the probability approaches 0.

Example 1: Rare disease detection test

Let us consider a medical test used to detect a rare disease. Suppose the population size is 100000 and 0.05% of the population is infected. Further suppose the following test performance: 95% of all positive individuals are classified correctly (TPR=0.95) and 95% of all negative individuals are classified correctly (TNR=0.95). In such a case, because of the strong class imbalance and despite the high test accuracy (0.95), the probability that an individual who has been classified as positive is in fact positive is very low:

$$P(+\mid C{+}) = 0.0095.$$

We can observe how this low probability is reflected in some of the metrics:

$\mathrm {P} _{4}=0.0370$ ,
$\mathrm {F} _{1}=0.0188$ ,
$\mathrm {J} =\mathbf {0.9100}$ (Informedness / Youden index),
$\mathrm {MK} =0.0095$ (Markedness).
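The values listed above can be reproduced with a short Python sketch. It rests on an assumption that is not stated in the text: the expected counts are rounded to whole individuals, giving TP = 48, FN = 2, FP = 4998 and TN = 94952. With the unrounded expected counts (47.5 true positives and so on) the results differ slightly in the last digits. The function name `metrics` and the dictionary keys are illustrative.

```python
def metrics(tp, tn, fp, fn):
    """Return P4, F1, informedness (J) and markedness (MK) from counts."""
    precision = tp / (tp + fp)      # P(+ | C+)
    recall = tp / (tp + fn)         # P(C+ | +)
    specificity = tn / (tn + fp)    # P(C- | -)
    npv = tn / (tn + fn)            # P(- | C-)
    return {
        "precision": precision,
        "P4": 4 / (1 / precision + 1 / recall + 1 / specificity + 1 / npv),
        "F1": 2 * precision * recall / (precision + recall),
        "J": recall + specificity - 1,
        "MK": precision + npv - 1,
    }


# Assumed whole-number counts for Example 1 (50 infected out of 100000).
example_1 = metrics(tp=48, tn=94952, fp=4998, fn=2)
for name, value in example_1.items():
    print(f"{name}: {value:.4f}")
```

Informedness stays high because it depends only on recall and specificity. The other three metrics involve precision and therefore reflect the low value of $P(+\mid C{+})$.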

Example 2: Image recognition — cats vs dogs

Consider the problem of training a neural-network-based image classifier on only two types of images: those containing dogs (labeled as 0) and those containing cats (labeled as 1). Thus, the goal is to distinguish between cats and dogs. Suppose that the classifier overpredicts in favour of cats ("positive" samples): 99.99% of cats are classified correctly and only 1% of dogs are classified correctly. Further, suppose that the image dataset consists of 100000 images, 90% of which are pictures of cats and 10% of which are pictures of dogs. In this situation, the probability that a picture containing a dog will be classified correctly is quite low: