# DATA. IT’S THE FUTURE.

Upskilling in data engineering and data science could be the best thing you can do for your career. The articles on Kodey aim to support you on your journey.

## LATEST POSTS

### ML Series: How Likely Is Y, Given X. Association Rule Mining

Association rule mining is an unsupervised learning algorithm which finds patterns in our data, whereby we know how likely it is that Y occurs in the event of X. In other words, it finds features […]
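As a rough illustration of "how likely is Y, given X", the confidence of a rule X → Y can be computed as the support of X and Y together divided by the support of X. The sketch below is not taken from the article: the baskets are made up, and the helper names `support` and `confidence` are assumptions chosen for this example.

```python
# Hypothetical shopping baskets, invented purely for illustration.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    matches = sum(1 for basket in transactions if itemset <= basket)
    return matches / len(transactions)


def confidence(x, y, transactions):
    """Estimate of P(Y | X): support of X and Y together over support of X."""
    return support(x | y, transactions) / support(x, transactions)


# Bread appears in 3 of 4 baskets, bread and butter together in 2 of 4,
# so the confidence of the rule bread -> butter is 2/3.
bread_to_butter = confidence({"bread"}, {"butter"}, transactions)
print(round(bread_to_butter, 2))
```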

### ML Series: A Random Forest Implementation With An Imbalanced Dataset

For this article, I wanted to demonstrate an end-to-end model implementation, including tuning for a dataset. When I chose the dataset here, I did not realise that the initial, untuned implementation would achieve […]

### ML Series: Methods For Increasing Model Accuracy

Tuning our machine learning model is of course critical to its accuracy. There are two methods for tuning models: tuning the data and tuning the parameters. We ‘tune’ the data to clean it up, select […]

### ML Series: Accuracy Metrics To Prove Or Improve Our Diabetes Algorithm

In the last article, we started to look at the implemented algorithm and looked at the model score along with the confusion matrix. In this article, I am going to cover off a high level […]

### ML Series: Predicting Diabetes; Initial Model Implementation & Confusion Matrix

Following on from my previous article, we are continuing our journey towards predicting whether someone has diabetes. The way we do this is pretty straightforward. First, we get our data ready to be ingested […]

### ML Series: Predicting Diabetes; Exploring Data

Here we have a dataset from Kaggle; all records within this dataset are for females aged 21 and above. The task is to take that data and build a classification model which will determine, given some […]

### ML Series: K-Nearest Neighbours Described

K-nearest neighbours is one of the simplest machine learning models to explain. Let’s say we have lots of historical data with a class assigned. In the below, you can see that we have a […]

### ML Series: An Introduction To Naive Bayes

A Naive Bayes algorithm is a classifier. It takes into account the probability of each feature occurring and determines the overall probability of the target class (outcome). From that, it takes the highest probability and […]

### ML Series: What On Earth Are Support Vector Machines?

Support vector machines are supervised classification models, within which each observation (or data point) is plotted in an N-dimensional space, where N is the number of features. If we had height and weight, we would […]

### ML Series: A Look At Gradient Boosted Trees

In a traditional random forest, there is parallel learning. In the below, we can see that each model samples data from the overall dataset and produces a model from it. It does this in parallel […]

### ML Series: Supervised Regression Model Options

In this article, we’re going to cover what models you could use to predict continuous numeric values. You have a number of model choices – let’s discuss three: Linear regression, Random forest regressor, Gradient Boosting […]

### ML Series: Feature Engineering and Selection

As with the previous sections in this series, there is a little overlap – but not a huge amount. The techniques we’re going to discuss are related to feature engineering and feature selection. Binning data […]

### ML Series: Exploring Our Data

In the previous article, I highlighted the below phases as being part of the ML workflow: data cleansing and formatting; data exploration; feature engineering and feature selection; initial machine learning model implementation; comparison of different models […]

### ML Series: Deep Dive On Data Cleansing

When we’re working through a data science problem, there really are a few main steps which we need to take. These are outlined below: data cleansing and formatting; data exploration; feature engineering and feature selection; initial […]

### Calculating Distance Between Two Geo Points In Python

As you may be aware, I am a Python tutor online and quite often I get asked pretty specific questions. This week, I was asked to show a simple way to find the distance in […]

### A UK Postcode Validation Script In Python

The below script takes the input of a UK postcode and ensures that it matches a valid format. I have handled the below formats: X11XX, XX11XX and XX111XX. To do this, I use some regular expressions. […]
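A minimal sketch of this kind of check, assuming X stands for a letter and 1 for a digit. This is not the script from the article: the pattern and the function name `is_valid_postcode` are assumptions for illustration, and the pattern only covers the three formats listed, not every valid UK postcode.

```python
import re

# Assumed pattern covering only three formats, where X is a letter and 1 is
# a digit: X11XX, XX11XX and XX111XX.
POSTCODE_PATTERN = re.compile(r"(?:[A-Z][0-9]{2}|[A-Z]{2}[0-9]{2,3})[A-Z]{2}")


def is_valid_postcode(postcode):
    """Return True if the postcode matches one of the three formats."""
    # Normalise: drop spaces and upper-case so "m1 1aa" is treated as "M11AA".
    cleaned = postcode.replace(" ", "").upper()
    return POSTCODE_PATTERN.fullmatch(cleaned) is not None


print(is_valid_postcode("M1 1AA"))
print(is_valid_postcode("12345"))
```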

### Making a simple hangman game in Python

Today I thought it might be cool to make a super simple little text-based game in Python in my spare 15 minutes. So I made this hangman game. As you can see, the word […]

### Python Brainteaser: Formatting Numbers with Commas

Often, you will want to format your numbers in Python because it’s quite hard for a user to read 10000000000 and immediately understand it as: 10,000,000,000. So, this little article will show you one possible […]
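One possible approach, which may or may not be the one the article goes on to show, is Python's built-in `,` format specifier. The variable names here are just for illustration.

```python
big_number = 10000000000

# The "," format specifier inserts a comma as the thousands separator.
with_fstring = f"{big_number:,}"
with_format = format(big_number, ",")

print(with_fstring)  # 10,000,000,000
```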

### Introducing Zip Lists in Python

The Python zip function is incredibly useful. It takes in one or more iterables (lists, tuples, dictionaries etc.) and returns an iterator. Well, that’s as clear as mud, right? Let’s look at an example […]
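A small sketch with made-up data (the names and ages are hypothetical, not from the article): `zip` pairs up the items of two lists position by position, and the resulting iterator can be turned into a list or a dictionary.

```python
# Hypothetical example data.
names = ["Ann", "Bob", "Cara"]
ages = [31, 45, 27]

# zip returns a lazy iterator of tuples; list() consumes it.
paired = list(zip(names, ages))

# A common use: build a lookup dictionary from two parallel lists.
age_by_name = dict(zip(names, ages))

print(paired)  # [('Ann', 31), ('Bob', 45), ('Cara', 27)]
```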

### Quick Tip: Lambda Functions

Those of you who know me will know I am not a fan of lambda functions. I just don’t buy into the notion of making code as concise as possible to the detriment of readability. […]

### Quick Tip: Filter function in Python

This is a really quick post about filtering in Python. In the below, you will see that we have a list of animals. I want to filter that list based on a boolean condition – […]
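As a sketch of the idea, with a made-up list and a made-up condition (the article's own condition is not shown here): `filter` keeps only the items for which the given function returns `True`.

```python
# Hypothetical list of animals.
animals = ["cat", "dog", "elephant", "giraffe", "ant"]


def has_long_name(animal):
    """Assumed boolean condition: the name has more than three letters."""
    return len(animal) > 3


# filter returns a lazy iterator, so wrap it in list() to see the result.
long_named = list(filter(has_long_name, animals))

print(long_named)  # ['elephant', 'giraffe']
```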

### Maintaining state while threading in Python

If you have ever worked with threading in Python before, you may have encountered issues where everything gets a little bit out of step (if you’re incrementing a counter, 2 threads may simultaneously try to […]
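One common way to keep a shared counter in step is to guard the update with a `threading.Lock`. The sketch below assumes that approach, which may differ from the one the article takes; the function name and thread counts are arbitrary.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(times):
    """Add 1 to the shared counter `times` times, one thread at a time."""
    global counter
    for _ in range(times):
        # Only one thread can hold the lock, so the read-modify-write
        # on counter cannot interleave with another thread's update.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 40000
```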

### A crash course in threading and multiprocessing in python

When you first start looking into asynchronous processing in Python, you’ll come across a couple of terms: threading and multiprocessing. The first part of this article, then, is about understanding what those two terms mean […]

### UPDATED: All about classes in python

Classes are great when you’re working with concepts in the real world. They are by no means a necessary construct in Python; they’re mostly used for code simplification, readability and reusability. We’ll walk through some […]

### Percentiles, Quartiles And Boxplots

Percentiles divide a population into 100 equal groups according to the distribution of values. A percentile can be between 1 and 99 – whatever number you pick, X% should fall below […]
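A minimal sketch using the standard library's `statistics.quantiles` (Python 3.8+) on a made-up sample. The data is hypothetical, and other libraries may use slightly different interpolation rules, so exact cut points can vary between tools.

```python
import statistics

# Hypothetical sample: the whole numbers 1 to 100.
values = list(range(1, 101))

# n=100 returns the 99 cut points that split the data into 100 groups.
percentile_cuts = statistics.quantiles(values, n=100)

# n=4 returns the three quartile cut points: Q1, the median and Q3.
quartile_cuts = statistics.quantiles(values, n=4)

print(len(percentile_cuts), quartile_cuts)
```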
