ML

MAE | RMSE | MAPE : Measures of model accuracy for data scientists

Mean Absolute Error (MAE) This simply takes the difference between the predicted value and the actual value for every prediction and takes an average of the result. However, to avoid values cancelling one another out, it takes the absolute value (which means, it makes all the values positive). Let’s consider an example. In the below, […]

Read more
ML

Using ROC Curves & AUC

This is a snippet from my upcoming book ‘Data Badass’ (pictured below): The ROC Curve & the Area Under Curve (AUC) is used for binary classification problems. The ROC curve chart looks at the True Positive Rate vs the False Positive Rate. Ideally, you want to reduce the number of false positives as much as […]

Read more
ML

Can we successfully implement Agile in data science?

Agile is about iterative development and delivering tangible products/features quickly, which provides the business with value and ROI faster than a traditional waterfall project. Consider the example of a piece of accounting software. Overall, it’s going to have 50 features to support the accounts team. To deliver all of the features in a waterfall fashion, […]

Read more
ML

The Data Scientist Statistics Learning Plan For 2021

As data scientists, we need to be comfortable with mathematics. If you Google what you need to know, you’ll find answers stating you need to fully understand linear algebra; calculus and how to calculate all of the algorthms we use by hand. I’m not going to downplay the importance of understanding how the algorithm works, […]

Read more
ML

My Three Favourite Supervised Regression Machine Learning Model Options

In this article, we’re going to cover what models you could use to predict continuous numeric values. You have a number of model choices – let’s discuss three: Linear regression Random forest regressor  Gradient Boosting Tree These are big enough topics to have their own articles and indeed, they shall. But for now, let’s give […]

Read more
ML

How Do We Go About Data Exploration As Data Engineers?

In the previous article, I highlighted the below phases as being part of the ML workflow. Data cleansing, formatting Data exploration Feature engineering and feature selection Initial machine learning model implementation Comparison of different models Hyper parameter tuning on the best model Evaluation of the model accuracy on testing data set Understanding the model In […]

Read more
ML

How Do Data Scientists Carry Out Data Cleaning?

When we’re working through a data science problem, there really are a few main steps which we need to take. These are outlined below: Data cleansing, formatting Data exploration Feature engineering and feature selection Initial machine learning model implementation Comparison of different models Hyper parameter tuning on the best model Evaluation of the model accuracy […]

Read more