ML

## MAE | RMSE | MAPE : Measures of model accuracy for data scientists

Mean Absolute Error (MAE): This takes the difference between the predicted value and the actual value for every prediction and averages the results. To stop positive and negative errors cancelling one another out, it first takes the absolute value of each difference (that is, it makes every value positive). Let’s consider an example. In the below, […]
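As a rough illustration of the calculation described in this excerpt, here is a minimal Python sketch. The function name `mean_absolute_error` and the sample numbers are made up for illustration and are not taken from the original article; the point is only to show why the absolute value matters.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between predictions and actuals."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    if not actual:
        raise ValueError("need at least one prediction")
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data, chosen so that some errors are positive and some negative.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]

# Without the absolute value, errors of +2, -2, +3 and -1 partly cancel out.
mean_signed_error = sum(p - a for a, p in zip(actual, predicted)) / len(actual)

mae = mean_absolute_error(actual, predicted)

print(mean_signed_error)  # 0.5
print(mae)                # 2.0
```

The signed average suggests the model is off by only 0.5 on average, while the MAE shows the typical miss is 2.0.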

ML

## Using ROC Curves & AUC

This is a snippet from my upcoming book ‘Data Badass’ (pictured below): The ROC curve and the Area Under the Curve (AUC) are used for binary classification problems. The ROC curve plots the True Positive Rate against the False Positive Rate. Ideally, you want to reduce the number of false positives as much as […]
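To make the two rates concrete, here is a small Python sketch that computes the True Positive Rate and False Positive Rate at a single classification threshold; an ROC curve is what you get when you repeat this across many thresholds. The helper name `tpr_fpr`, the labels and the scores are all hypothetical, and the sketch assumes the data contains at least one positive and one negative example.

```python
def tpr_fpr(labels, scores, threshold):
    """Return (true positive rate, false positive rate) at one threshold.

    labels: 1 for the positive class, 0 for the negative class.
    scores: model scores; a score >= threshold is predicted positive.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    # Assumes both classes are present, so neither denominator is zero.
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical labels and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.8, 0.1]

# Each threshold gives one point on the ROC curve.
for threshold in (0.3, 0.5, 0.7):
    tpr, fpr = tpr_fpr(labels, scores, threshold)
    print(threshold, round(tpr, 3), round(fpr, 3))
```

Raising the threshold generally lowers the false positive rate, but it can lower the true positive rate too, which is the trade-off the curve visualises.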

ML

## Can we successfully implement Agile in data science?

Agile is about iterative development and delivering tangible products/features quickly, which provides the business with value and ROI faster than a traditional waterfall project. Consider the example of a piece of accounting software. Overall, it’s going to have 50 features to support the accounts team. To deliver all of the features in a waterfall fashion, […]

ML

## The Data Scientist Statistics Learning Plan For 2021

As data scientists, we need to be comfortable with mathematics. If you Google what you need to know, you’ll find answers stating you need to fully understand linear algebra, calculus, and how to calculate all of the algorithms we use by hand. I’m not going to downplay the importance of understanding how the algorithm works, […]

ML

## Boost Your Random Forest Machine Learning Model Accuracy With Gradient Boosted Machines

In a traditional random forest, there is parallel learning. In the below, we can see that each model samples data from the overall dataset and produces a model from it. It does this in parallel and independently – no model has any influence on any other model. Gradient boosting seeks to improve on a weak […]

ML

## My Three Favourite Supervised Regression Machine Learning Model Options

In this article, we’re going to cover what models you could use to predict continuous numeric values. You have a number of model choices – let’s discuss three: linear regression, random forest regressor, and gradient boosting tree. These are big enough topics to have their own articles and indeed, they shall. But for now, let’s give […]

ML

## How Do We Go About Data Exploration As Data Engineers?

In the previous article, I highlighted the below phases as being part of the ML workflow.

1. Data cleansing, formatting
2. Data exploration
3. Feature engineering and feature selection
4. Initial machine learning model implementation
5. Comparison of different models
6. Hyperparameter tuning on the best model
7. Evaluation of the model accuracy on testing data set
8. Understanding the model

In […]