Confusion Matrix Use case in the evaluation of Cyberattacks

Riyadaga
4 min readJun 5, 2021


Overview

Cyberattacks are becoming increasingly sophisticated, necessitating efficient intrusion detection mechanisms that monitor computer resources and report anomalous or suspicious activity.

Zero-day intrusion detection is a serious challenge: hundreds of thousands of new intrusions are detected every day, and the damage they cause is increasingly harmful and can compromise business continuity.

Intrusion Detection Systems (IDSs) are software or hardware systems capable of identifying such malicious activity in computer systems. Their goal is to monitor the computer system and detect abnormal behavior that a conventional packet filter could not.

Many IDSs use a single classifier to identify intrusions. The basic approach is to use machine learning to build a model of trustworthy activity, and then compare new behavior against this model.
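The idea can be sketched in a few lines of Python. This is a deliberately minimal illustration, not a real IDS: it assumes a single numeric feature (say, requests per minute), learns a profile of normal traffic, and flags anything far from that profile.

```python
# Minimal sketch of "model trustworthy activity, then compare new behavior".
# The feature, baseline values, and 3-sigma rule are illustrative assumptions.
from statistics import mean, stdev

def fit_normal_profile(samples):
    """Learn a simple profile (mean and std dev) from trusted traffic."""
    return mean(samples), stdev(samples)

def is_intrusion(value, profile, k=3.0):
    """Flag behavior more than k standard deviations from the profile."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma

# Trusted baseline: request rates observed during normal operation.
baseline = [50, 52, 48, 51, 49, 50, 53, 47]
profile = fit_normal_profile(baseline)

print(is_intrusion(51, profile))   # False: within the normal range
print(is_intrusion(500, profile))  # True: anomalous spike
```

A real IDS would use many features and a proper classifier, but the evaluation question is the same: how often are these intrusion/normal calls right or wrong?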

The classifier is evaluated using a metric called the confusion matrix. You may be wondering: what is a confusion matrix? Let me tell you, it is not as confusing as its name suggests.

Confusion Matrix

A confusion matrix is a matrix that represents the result of classification. It represents true and false classification results and visualizes the accuracy of a classifier by comparing the actual and predicted classes.

True positive (TP): Intrusions that are successfully detected by the IDS.

False Positive (FP): Normal/non-intrusive behavior that is wrongly classified as intrusive by the IDS.

True Negative (TN): Normal/non-intrusive behavior that is successfully labeled as normal/non-intrusive by the IDS.

False Negative (FN): Intrusions that are missed by the IDS and classified as normal/non-intrusive.
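The four cells above can be tallied directly from a list of ground-truth labels and IDS predictions. The labels below are made up for illustration, with 1 meaning intrusion and 0 meaning normal.

```python
# Tally the four confusion-matrix cells for an IDS evaluation.
# 1 = intrusion, 0 = normal; both lists are invented example data.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]  # ground truth
predicted = [1, 0, 0, 1, 1, 0, 1, 0]  # IDS output

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(tp, fp, tn, fn)  # 3 1 3 1
```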

Types of Error:

A False Positive is a Type I error: "False Positive" is a false "True", which has only one F. A False Negative is a Type II error: "False Negative" is a false "False", which has two F's, making it Type II. (Kudos to Riley Dallas for this mnemonic!)

Other important terms derived from a confusion matrix:

Accuracy can be computed from the confusion matrix: it is the proportion of all predictions that are correct, (TP + TN) / (TP + FP + TN + FN).

Precision is the accuracy on cases predicted to be positive: TP / (TP + FP).

Recall, also called sensitivity, probability of detection, or true positive rate, is the ratio of correct positive predictions to the total number of positive examples: TP / (TP + FN).

F-score: We often want a model with both high precision and high recall. A good combination of the two is the F-score, their harmonic mean, which is high only when both precision and recall are high.
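All of these metrics fall out of the four confusion-matrix counts. The counts below are illustrative values, not from a real IDS evaluation.

```python
# Compute accuracy, precision, recall, and F-score from confusion-matrix
# counts. The counts are invented example values.
tp, fp, tn, fn = 3, 1, 3, 1

accuracy  = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)           # a.k.a. sensitivity / detection rate
f_score   = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, recall, f_score)  # 0.75 0.75 0.75 0.75
```

For an IDS, recall matters especially: a missed intrusion (FN) is usually far costlier than a false alarm (FP).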

ROC Curve: The ROC curve plots the true positive rate against the false positive rate at various cut points. It demonstrates the trade-off between sensitivity (recall) and specificity (the true negative rate).
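A sketch of where those cut points come from: sweep a decision threshold over the classifier's scores and record the (FPR, TPR) pair at each threshold. The scores and labels here are invented for illustration.

```python
# Generate ROC points by sweeping a threshold over classifier scores.
# 1 = intrusion, 0 = normal; scores and labels are made-up example data.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # higher = more intrusion-like

def roc_point(threshold):
    """Return (FPR, TPR) when scoring >= threshold counts as an intrusion."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for a, p in zip(actual, pred) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, pred) if a == 0 and p == 1)
    tpr = tp / sum(actual)                  # true positive rate (recall)
    fpr = fp / (len(actual) - sum(actual))  # false positive rate
    return fpr, tpr

for t in (0.85, 0.65, 0.3):
    print(t, roc_point(t))
```

Lowering the threshold catches more intrusions (higher TPR) but raises more false alarms (higher FPR); plotting all the points traces out the ROC curve.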

A typical ROC curve

Why do you need a confusion matrix?

Here are the benefits of using a confusion matrix:

  • It shows where a classification model gets confused when it makes predictions.
  • It gives you insight not only into the errors your classifier makes, but also into the types of errors being made.
  • This breakdown helps you overcome the limitation of using classification accuracy alone.
  • Every column of the confusion matrix represents the instances of a predicted class.
  • Every row of the confusion matrix represents the instances of an actual class.

Thanks for reading :)
