ML

Top Five Ways To Avoid Data Leakage As A Data Scientist

Data leakage is where we accidentally share data between the test and training sets, so that our predictions look more accurate than they will ever be in the real world. If you've run your model against your test dataset and got 99% accuracy, you probably have data leakage. Maybe you don't, but you probably do. […]
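As a rough illustration of the definition above, here is a minimal sketch of one way rows can end up shared between training and test sets: exact duplicates that straddle the split. The toy data and the helper names (`shared_rows`, `split_without_overlap`) are assumptions made up for this example, and this is only one form of leakage, not necessarily one of the five the post covers.

```python
import random

def shared_rows(train, test):
    """Return the rows that appear in both the training and the test set."""
    return set(train) & set(test)

def split_without_overlap(rows, test_fraction=0.2, seed=0):
    """Drop exact duplicate rows, shuffle, then split into (train, test)."""
    unique_rows = list(dict.fromkeys(rows))  # de-duplicate, keeping first-seen order
    rng = random.Random(seed)
    rng.shuffle(unique_rows)
    n_test = max(1, int(len(unique_rows) * test_fraction))
    return unique_rows[n_test:], unique_rows[:n_test]

# Hypothetical toy data: (feature, label) tuples, two of which are repeated.
rows = [(1.0, 0), (2.0, 1), (2.0, 1), (3.0, 0), (4.0, 1), (1.0, 0), (5.0, 1)]

# Naive positional split: the duplicate (1.0, 0) lands on both sides.
naive_train, naive_test = rows[:5], rows[5:]
naive_leak = shared_rows(naive_train, naive_test)

# De-duplicating before splitting removes this particular kind of overlap.
train, test = split_without_overlap(rows)
leak = shared_rows(train, test)

print("naive split shares:", naive_leak)
print("de-duplicated split shares:", leak)
```

A check like `shared_rows` is cheap to run before training. It only catches exact duplicates, so it will not detect subtler leaks such as preprocessing statistics computed on the full dataset.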

ML

Using Association Rule Mining To Determine How Likely Is Y, Given X.

Association rule mining is an unsupervised learning technique that finds patterns in our data, telling us how likely it is that Y occurs given that X has occurred. In other words, it finds features (or dimensions) which frequently appear together. Note: just because someone buys burgers with ketchup does not mean someone who buys […]
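To make "how likely is Y, given X" concrete, the sketch below computes the standard confidence measure for a rule X → Y: the number of transactions containing both X and Y, divided by the number containing X. The basket data and the function names are invented for illustration. A full association rule miner would also search for frequent itemsets, which this sketch does not attempt.

```python
def count_containing(transactions, items):
    """Count the transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in transactions if items <= basket)

def confidence(transactions, x, y):
    """Estimate P(Y | X) as count(X and Y) / count(X)."""
    x_count = count_containing(transactions, x)
    if x_count == 0:
        raise ValueError("X never occurs, so confidence is undefined")
    return count_containing(transactions, set(x) | set(y)) / x_count

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"burgers", "ketchup"},
    {"burgers", "ketchup", "buns"},
    {"burgers", "buns"},
    {"ketchup"},
    {"ketchup", "fries"},
    {"buns"},
]

burgers_to_ketchup = confidence(transactions, {"burgers"}, {"ketchup"})
ketchup_to_burgers = confidence(transactions, {"ketchup"}, {"burgers"})

print(f"burgers -> ketchup: {burgers_to_ketchup:.2f}")
print(f"ketchup -> burgers: {ketchup_to_burgers:.2f}")
```

In this toy data the two directions give different values, because confidence is conditioned on the left-hand side of the rule and is not symmetric in general.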

ML

Exploring Data Using Seaborn And Pandas As A Data Scientist

Here we have a dataset from Kaggle; every patient in this dataset is female and aged 21 or above. The task is to take that data and build a classification model which will determine, given some information about a patient, whether they are diabetic. In this post, we're going to ingest and explore the dataset and […]
