In the last article, we started to look at the implemented algorithm and examined the model score along with the confusion matrix. In this article, I'm going to give a high-level overview of some of the methods we can use to evaluate model performance.
Classification Accuracy
Classification Accuracy is the first approach we took. It's a simple way to measure how accurate a model is: the count of correct predictions divided by the total number of predictions. This approach works well if we have a reasonable split between the outcome classes. However, if we had a churn dataset where 95 percent of users churned and 5 percent did not, we may have a problem. Imagine our model was correct for every 'YES' case but wrong for every 'NO' case: it would still report 95 percent accuracy while being useless for the minority class. In medical or financial cases, where misclassification is costly or even life threatening, we need to use other methods to validate our model.
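To make that pitfall concrete, here is a minimal sketch using scikit-learn with made-up labels (not the article's dataset), showing a lazy model that predicts 'YES' for everything:

```python
from sklearn.metrics import accuracy_score

# Made-up labels purely for illustration:
# 95% of the points belong to the 'YES' (1) class, 5% to the 'NO' (0) class
y_true = [1] * 95 + [0] * 5

# A lazy "model" that predicts 'YES' for every single point
y_pred = [1] * 100

# 0.95 - the accuracy looks great, yet the 'NO' class is never detected
print(accuracy_score(y_true, y_pred))
```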
Confusion Matrix
A confusion matrix is, of course, one way to look a little deeper into the problem. As we discussed in the previous article, it gives us a better view of when the algorithm is correct and when it is incorrect. In our diabetes example below, we can see that the majority of misclassifications come from incorrectly predicting that people do not have diabetes when in fact they do.
| Predicted \ Actual | Has Diabetes | Does Not Have Diabetes |
| --- | --- | --- |
| Has Diabetes | 92 | 5 |
| Does Not Have Diabetes | 31 | 26 |
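If you're following along in scikit-learn, a confusion matrix can be produced directly. The sketch below uses made-up labels; in the real project the arguments would be the test labels and the model's predictions:

```python
from sklearn.metrics import confusion_matrix

# Toy labels purely for illustration: 1 = has diabetes, 0 = does not
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

# By scikit-learn convention, rows are the actual classes (sorted: 0 then 1)
# and columns are the predicted classes
print(confusion_matrix(y_true, y_pred))
```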
ROC Curves
The ROC Curve and the Area Under the Curve (AUC) are used for binary classification problems. The chart plots the True Positive Rate against the False Positive Rate.
The True Positive Rate (also known as sensitivity) is the proportion of positive points which were correctly classified as positive. That's True Positive / (True Positive + False Negative). False Negative points SHOULD have been predicted as positive, so the total number of positive points in the dataset is the sum of true positives and false negatives.
The False Positive Rate is the mirror image for the negative class: it's the proportion of negative points which were incorrectly classified as positive (False Positive / (False Positive + True Negative)). It equals 1 minus the specificity, where specificity is the proportion of negative points correctly classified as negative (True Negative / (True Negative + False Positive)).
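Here's a tiny sketch of those two rates computed from raw counts; the numbers are arbitrary and chosen only to illustrate the formulas:

```python
# Arbitrary example counts
tp, fn = 40, 10   # actual positives: correctly predicted vs missed
tn, fp = 30, 20   # actual negatives: correctly predicted vs wrongly flagged

tpr = tp / (tp + fn)          # True Positive Rate (sensitivity) = 0.8
fpr = fp / (fp + tn)          # False Positive Rate = 0.4
specificity = tn / (tn + fp)  # True Negative Rate = 1 - fpr = 0.6

print(tpr, fpr, specificity)
```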
Okay, so what is a ROC curve? Each time we change the threshold of a binary classifier, we get a new confusion matrix, and therefore a new (False Positive Rate, True Positive Rate) pair that we plot as a point on the graph. Joining those points produces the ROC curve, which summarizes the confusion matrices for all of the thresholds we evaluate.
We can plot multiple ROC curves on the same graph for different models (e.g. Random Forest vs Logistic Regression); whichever has the highest area under the curve wins.

Let’s look at the example below for some more detail. Here, with the current threshold we’ve chosen, we have the confusion matrix [[4, 2], [0, 2]]. That gives a sensitivity of 1 and a specificity of 0.5, i.e. a false positive rate of 0.5. We can plot that point on the graph, denoted by the black X.
Ideally, you want a high true positive rate while keeping the number of false positives as low as possible. So the threshold whose point lies closest to the very top left of the graph is usually the best one to choose.
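As a rough sketch of how this could look in scikit-learn, the code below builds a ROC curve and computes its AUC; the synthetic dataset and logistic regression model are stand-ins for illustration, not the diabetes model from this series:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, purely for illustration
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# roc_curve sweeps the threshold for us and returns one (FPR, TPR) pair per threshold
fpr, tpr, thresholds = roc_curve(y_test, probs)
print("AUC:", roc_auc_score(y_test, probs))

plt.plot(fpr, tpr, label="Logistic Regression")
plt.plot([0, 1], [0, 1], linestyle="--", label="Random guess")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```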

Mean Absolute Error (MAE)
The Mean Absolute Error (MAE) is super simple. It is the average of the absolute errors, i.e. the absolute differences between the correct values and the values predicted by the model.
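As a quick illustration with made-up numbers, MAE can be computed by hand or with scikit-learn:

```python
from sklearn.metrics import mean_absolute_error

# Made-up actual and predicted values, purely for illustration
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Average of the absolute differences: (0.5 + 0.0 + 1.5 + 1.0) / 4 = 0.75
print(mean_absolute_error(y_true, y_pred))
```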
MAE is an example of a loss/cost function. A loss function measures the error for a single data point, while a cost function summarizes the error across the entire input dataset.
In the next article, we’ll start looking at implementing a ROC curve on our dataset, and we will start tuning our model.