ML

Exploring Data Using Seaborn And Pandas As A Data Scientist

Here we have a dataset from Kaggle; all of the patients in this dataset are females aged 21 and above. The task is to take that data and build a classification model which, given some information about a patient, will determine whether she is diabetic. In this post, we’re going to ingest and explore the dataset and […]
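
A minimal sketch of the kind of first look described here, using pandas. The DataFrame below is a made-up stand-in with assumed column names (`Age`, `Glucose`, `BMI`, `Outcome`). It is not the Kaggle data itself, so with the real file you would load it with `pd.read_csv` instead.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle data: six made-up patients.
# Column names are assumptions; the real file's columns may differ.
df = pd.DataFrame({
    "Age": [21, 35, 50, 28, 63, 44],
    "Glucose": [85, 168, 137, 92, 183, 101],
    "BMI": [26.6, 43.1, 33.6, 23.3, 30.1, 31.0],
    "Outcome": [0, 1, 1, 0, 1, 0],  # 1 = diabetic, 0 = not diabetic
})

# Rows and columns, plus the type of each column.
print(df.shape)
print(df.dtypes)

# Count, mean, std, min, quartiles and max for every numeric column.
print(df.describe())

# Missing values per column (all zero for this toy frame).
print(df.isna().sum())

# Class balance of the target we want to predict.
print(df["Outcome"].value_counts())

# Average of each feature per outcome: a first hint at which features separate the classes.
feature_means = df.groupby("Outcome").mean()
print(feature_means)
```

For the visual side, seaborn's `sns.pairplot(df, hue="Outcome")` draws a scatter plot for every pair of features, coloured by outcome.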

ML

How Does The Naive Bayes Machine Learning Algorithm Work?

A Naive Bayes algorithm is a classifier. For each target class (outcome), it combines the probability of that class with the probability of each feature value occurring within that class, giving an overall probability for the class. It then returns the class with the highest probability as its prediction. The reason it’s called naive is that it acts as though the features don’t depend on one another […]
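
To make the "multiply the per-feature probabilities, pick the highest" idea concrete, here is a minimal categorical Naive Bayes in plain Python. It is a sketch under stated assumptions: the toy rows and labels are made up, the functions `fit` and `predict` are hypothetical names, and add-one (Laplace) smoothing is one common choice rather than necessarily the one the full post uses.

```python
import math
from collections import Counter, defaultdict


def fit(rows, labels):
    """Count how often each class occurs, and how often each feature value
    occurs within each class."""
    class_counts = Counter(labels)
    # feature_counts[cls][i][value] = times feature i had `value` in class `cls`
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    # Distinct values seen per feature, needed for Laplace smoothing.
    feature_values = defaultdict(set)
    for row, cls in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[cls][i][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def predict(model, row):
    """Return the class with the highest P(class) * prod_i P(feature_i | class)."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_cls, best_score = None, -math.inf
    for cls, n_cls in class_counts.items():
        # Sum logs instead of multiplying probabilities, to avoid underflow.
        score = math.log(n_cls / total)
        for i, value in enumerate(row):
            count = feature_counts[cls][i][value]
            # Add-one smoothing: an unseen value lowers, rather than zeroes, the score.
            score += math.log((count + 1) / (n_cls + len(feature_values[i])))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


# Hypothetical toy data: (glucose level, family history) -> outcome.
rows = [("high", "yes"), ("high", "no"), ("normal", "no"),
        ("normal", "no"), ("high", "yes"), ("normal", "yes")]
labels = ["diabetic", "diabetic", "healthy", "healthy", "diabetic", "healthy"]

model = fit(rows, labels)
print(predict(model, ("high", "yes")))    # diabetic
print(predict(model, ("normal", "no")))   # healthy
```

The inner loop simply adds one log-probability per feature, which is exactly where the independence ("naive") assumption lives: no term looks at two features together.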

ML

What On Earth Is The Support Vector Machine (SVM) ML Model?

Support vector machines are supervised classification models in which each observation (or data point) is plotted as a point in N-dimensional space, where N is the number of features. If we had height and weight, we would have a 2-dimensional graph. And if we were trying to classify those people as average or overweight, we would […]
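
A linear SVM looks for the hyperplane (a straight line, in the 2-D height/weight case) that separates the two classes with the widest margin. As a sketch, assuming a linear kernel and made-up data, the code below trains one by sub-gradient descent on the regularised hinge loss. Libraries such as scikit-learn (`sklearn.svm.SVC`) use dedicated solvers and also support non-linear kernels; the values of `lam`, `lr` and `epochs` here are illustrative, not tuned.

```python
def standardise(points):
    """Rescale each feature to zero mean and unit variance.
    Returns the scaled points and a function that applies the same scaling."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    stds = [(sum((p[d] - means[d]) ** 2 for p in points) / n) ** 0.5 or 1.0
            for d in range(dims)]

    def scale(p):
        return [(p[d] - means[d]) / stds[d] for d in range(dims)]

    return [scale(p) for p in points], scale


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b)))
    with full-batch sub-gradient descent. Labels must be +1 or -1."""
    n, dims = len(X), len(X[0])
    w, b = [0.0] * dims, 0.0
    for _ in range(epochs):
        grad_w = [lam * wd for wd in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wd * xd for wd, xd in zip(w, xi)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for d in range(dims):
                    grad_w[d] -= yi * xi[d] / n
                grad_b -= yi / n
        w = [wd - lr * gd for wd, gd in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """+1 or -1 depending on which side of the hyperplane w.x + b = 0 x falls."""
    return 1 if sum(wd * xd for wd, xd in zip(w, x)) + b >= 0 else -1


# Hypothetical (height cm, weight kg) data: -1 = average, +1 = overweight.
people = [(170, 65), (180, 72), (160, 55), (175, 68),
          (170, 90), (180, 100), (160, 80), (175, 95)]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]

X, scale = standardise(people)
w, b = train_linear_svm(X, labels)
print(predict(w, b, scale((172, 98))))  # 1 (overweight)
print(predict(w, b, scale((172, 60))))  # -1 (average)
```

Standardising first matters because height and weight sit on different scales; without it, one feature would dominate the margin.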

ML

My Three Favourite Supervised Regression Machine Learning Model Options

In this article, we’re going to cover which models you could use to predict continuous numeric values. You have a number of model choices – let’s discuss three: linear regression, the random forest regressor and the gradient boosting tree. Each of these is a big enough topic to have its own article and, indeed, each will get one. But for now, let’s give […]
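
Of the three, linear regression is the easiest to show from scratch. This is a minimal ordinary-least-squares sketch for a single feature, with made-up data; it does not cover the random forest or gradient boosting options, and `fit_simple_linear_regression` is a hypothetical helper name.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the common 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, roughly following y = 2x + 1 with some noise.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)        # approximately 1.97 and 1.09
print(intercept + slope * 6)   # predicted value for x = 6
```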

ML

Three Methods For Feature Engineering and Selection For Data Engineers

As with the previous sections in this series, there is a little overlap – but not a huge amount. The techniques we’re going to discuss are related to feature engineering and feature selection. Binning data: binning is a really useful technique. It’s a way to convert continuous variables into discrete variables by bucketing them in […]
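
A minimal binning sketch using only the standard library's `bisect` module. The age edges and bucket labels are made-up examples, and `bin_values` is a hypothetical helper; in pandas, `pd.cut` does the same job on a whole column (pass `right=False` to get the same left-closed buckets as below).

```python
from bisect import bisect_right


def bin_values(values, edges, labels):
    """Map each continuous value to a discrete bucket label.

    `edges` are the inner boundaries in ascending order. A value equal to an
    edge goes into the bucket above it. Needs len(labels) == len(edges) + 1.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    # bisect_right counts how many edges are <= value, i.e. the bucket index.
    return [labels[bisect_right(edges, v)] for v in values]


# Hypothetical ages bucketed into four age bands.
ages = [21, 29, 30, 45, 64, 65, 80]
bands = bin_values(ages, edges=[30, 45, 65], labels=["21-29", "30-44", "45-64", "65+"])
print(bands)  # ['21-29', '21-29', '30-44', '45-64', '45-64', '65+', '65+']
```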

ML

How Do We Go About Data Exploration As Data Engineers?

In the previous article, I highlighted the phases below as being part of the ML workflow: data cleansing and formatting; data exploration; feature engineering and feature selection; initial machine learning model implementation; comparison of different models; hyperparameter tuning on the best model; evaluation of the model accuracy on the testing dataset; and understanding the model. In […]

ML

How Do Data Scientists Carry Out Data Cleaning?

When we’re working through a data science problem, there are a few main steps which we need to take: data cleansing and formatting; data exploration; feature engineering and feature selection; initial machine learning model implementation; comparison of different models; hyperparameter tuning on the best model; evaluation of the model accuracy […]

ML

Outlier and Anomaly Detection Using Isolation Forest For Data Scientists

Detecting outliers in high-dimensional data is hard. There are so many observations across so many dimensions that plotting the data is often impossible, and even when you can plot it, interpreting it with the naked eye is extremely challenging. Before we get into how we may identify the outliers using an isolation forest, […]
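
The core idea of an isolation forest is that outliers are easier to isolate: if you keep splitting the data on a random feature at a random value, an unusual point ends up alone after only a few splits. Below is a from-scratch sketch of that idea in plain Python, using the scoring formula from the original isolation forest paper; scikit-learn ships a production version as `sklearn.ensemble.IsolationForest`. The tree count, sample size and the data are illustrative assumptions, and all function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): average path length of an unsuccessful binary-search-tree lookup
    among n points, used to normalise isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2 * harmonic - 2 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value in its range."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all points identical on this feature: stop here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Splits needed to reach x's leaf, plus an estimate for the points
    still sharing that leaf."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest_scores(data, n_trees=100, sample_size=64, seed=0):
    """Anomaly score per point: values near 1 are isolated quickly (likely
    outliers); values well below 0.5 look normal."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(data, size), 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path_length(size) or 1.0  # avoid dividing by zero for tiny samples
    scores = []
    for x in data:
        mean_path = sum(path_length(x, tree) for tree in trees) / n_trees
        scores.append(2 ** (-mean_path / norm))
    return scores


# Hypothetical data: 200 points around the origin plus one obvious outlier.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
data.append((8.0, 8.0))

scores = isolation_forest_scores(data)
print(max(range(len(data)), key=scores.__getitem__))  # index of the most anomalous point
```

Nothing here needs plotting: each point gets a single number, which is what makes the approach workable in many dimensions.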

Python

Calculating Distance Between Two Geo Points In Python

As you may be aware, I am an online Python tutor, and I’m often asked quite specific questions. This week, I was asked to show a simple way to find the distance in kilometres between two geographic locations. So, here it is – the easiest way to approach this problem is to […]
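
One common standard-library approach is the haversine formula, which gives the great-circle distance between two latitude/longitude points. It is a sketch assuming a spherical Earth with a mean radius of 6371 km, and it may not be the method the full post settles on; for higher accuracy on the real ellipsoid, a library such as geopy offers `geopy.distance.geodesic`. The coordinates below are approximate and only for illustration.

```python
import math

# Mean Earth radius in km; treating the Earth as a sphere costs some accuracy.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the value just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Approximate coordinates for central London and central Paris.
london = (51.5074, -0.1278)
paris = (48.8566, 2.3522)
print(round(haversine_km(*london, *paris), 1))
```

For London to Paris this comes out at a little over 340 km.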

Python

A UK Postcode Validation Script In Python

The script below takes a UK postcode as input and checks that it matches a valid format. I have handled the following formats, where X is a letter and 1 is a digit: X11XX, XX11XX and XX111XX. To do this, I use some regular expressions. Let’s look at one: “^[a-zA-Z]{1}[0-9]{2}[a-zA-Z]{2}”. Here, the beginning of the string (denoted with a ^) needs to be a letter. […]
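
A minimal sketch of the same check with Python's `re` module, one pattern per listed format. Two assumptions to flag: spaces are stripped before matching, so "M1 1AE" is checked as "M11AE", and `re.fullmatch` is used so the whole string must match. The quoted pattern only anchors the start with `^`, so with `re.match` a string such as "A12BCxyz" would also pass. Real UK postcodes include further formats (for example "SW1A 1AA") that these three patterns reject. `is_valid_postcode` is a hypothetical name.

```python
import re

# One pattern per format in the post, where X is a letter and 1 is a digit.
# re.fullmatch anchors both ends, so trailing characters are rejected too.
POSTCODE_PATTERNS = [
    re.compile(r"[A-Za-z][0-9]{2}[A-Za-z]{2}"),      # X11XX
    re.compile(r"[A-Za-z]{2}[0-9]{2}[A-Za-z]{2}"),   # XX11XX
    re.compile(r"[A-Za-z]{2}[0-9]{3}[A-Za-z]{2}"),   # XX111XX
]


def is_valid_postcode(postcode):
    """True if the postcode matches one of the three formats.
    Assumption: spaces are ignored before matching."""
    compact = postcode.replace(" ", "")
    return any(pattern.fullmatch(compact) for pattern in POSTCODE_PATTERNS)


for candidate in ["M1 1AE", "SW11AA", "AB123CD", "M11AEX", "12ABC"]:
    print(candidate, is_valid_postcode(candidate))
```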
