ML

Using Association Rule Mining To Determine How Likely Is Y, Given X.

Association rule mining is an unsupervised learning technique that finds patterns in our data, telling us how likely it is that Y occurs given that X has occurred. In other words, it finds features (or dimensions) which frequently appear together. Note: just because someone buys burgers with ketchup does not mean someone that buys […]
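To make "how likely is Y, given X" concrete, here is a minimal sketch of the two measures usually used for this, support and confidence, in plain Python. The transactions and the helper names `support` and `confidence` are invented for illustration; they are not taken from the post.

```python
# Hypothetical shopping baskets, invented purely for illustration.
transactions = [
    {"burger", "ketchup"},
    {"burger", "ketchup", "fries"},
    {"ketchup", "fries"},
    {"ketchup"},
    {"burger", "ketchup"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Estimate of P(Y | X): support of X and Y together, divided by support of X."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


burger_to_ketchup = confidence({"burger"}, {"ketchup"}, transactions)
ketchup_to_burger = confidence({"ketchup"}, {"burger"}, transactions)

print(f"burger -> ketchup: {burger_to_ketchup:.2f}")  # 1.00
print(f"ketchup -> burger: {ketchup_to_burger:.2f}")  # 0.60
```

In this toy data the two directions of the rule have different confidence, so a rule X → Y should not be read as Y → X.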

ML

Exploring Data Using Seaborn And Pandas As A Data Scientist

Here we have a dataset from Kaggle; all patients in this dataset are females aged 21 and above. The task is to take that data and build a classification model which will determine, given some information about a patient, whether they are diabetic. In this post, we’re going to ingest and explore the dataset and […]

ML

How Does The Naive Bayes Machine Learning Algorithm Work?

A Naive Bayes algorithm is a classifier. It takes into account the probability of each feature occurring and determines the overall probability of each target class (outcome). From that, it takes the class with the highest probability and returns that as its prediction. It’s called naive because it acts as though features don’t depend on one another […]
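The idea above can be sketched in a few lines of plain Python for categorical features. This is a rough illustration, not the post's implementation: the toy rows, the `fit` and `predict` helpers, and the use of add-one (Laplace) smoothing are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (outlook, windy) -> did we play outside?
rows = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"),
    (("sunny", "no"), "yes"),
    (("rainy", "no"), "yes"),
    (("rainy", "yes"), "no"),
]


def fit(rows):
    """Count how often each class, and each feature value within a class, occurs."""
    class_counts = Counter(label for _, label in rows)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    for features, label in rows:
        for i, value in enumerate(features):
            feature_counts[(label, i)][value] += 1
    return class_counts, feature_counts


def predict(features, class_counts, feature_counts, n_values=2):
    """Score each class as prior * product of per-feature likelihoods.

    Multiplying the per-feature probabilities is the "naive" independence
    assumption. `n_values` is the assumed number of distinct values per
    feature, used for add-one smoothing so unseen values do not zero a score.
    """
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for i, value in enumerate(features):
            score *= (feature_counts[(label, i)][value] + 1) / (count + n_values)
        scores[label] = score
    return max(scores, key=scores.get), scores


class_counts, feature_counts = fit(rows)
prediction, scores = predict(("sunny", "no"), class_counts, feature_counts)
print(prediction, scores)
```

The scores are not normalised, which is fine for picking the most likely class; dividing each by their sum would turn them into probabilities.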

ML

What On Earth Is The Support Vector Machines (SVM) ML Model?

Support vector machines are supervised classification models in which each observation (or data point) is plotted in an N-dimensional space, where N is the number of features. If we had height and weight, we would have a two-dimensional graph. And if we were trying to classify those people as average or overweight, we would […]

ML

My Three Favourite Supervised Regression Machine Learning Model Options

In this article, we’re going to cover what models you could use to predict continuous numeric values. You have a number of model choices – let’s discuss three: linear regression, random forest regressor, and gradient boosting tree. These are big enough topics to have their own articles and indeed, they shall. But for now, let’s give […]

ML

Three Methods For Feature Engineering and Selection For Data Engineers

As with the previous sections in this series, there is a little overlap – but not a huge amount. The techniques we’re going to discuss are related to feature engineering and feature selection. Binning data: binning is a really useful technique. It’s a way to convert continuous variables into discrete variables by bucketing them in […]
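As a small illustration of binning, the sketch below buckets a continuous variable (age) into labelled ranges using only the standard library. The bin edges, the labels, and the `bin_value` helper are hypothetical choices made for this example.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels; each edge is the inclusive lower bound
# of the bin that follows it.
edges = [18, 35, 60]
labels = ["under 18", "18-34", "35-59", "60+"]


def bin_value(value, edges=edges, labels=labels):
    """Return the label of the bucket that `value` falls into."""
    return labels[bisect_right(edges, value)]


ages = [5, 18, 34, 35, 59, 60, 72]
binned = [bin_value(age) for age in ages]
print(binned)
# ['under 18', '18-34', '18-34', '35-59', '35-59', '60+', '60+']
```

When the data lives in a pandas DataFrame, `pandas.cut` provides the same kind of bucketing.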
