Month: October 2020
12 posts
Bamboolib: The Most Flexible Pandas GUI?
As data engineers and data scientists, we spend a lot of time exploring data. When you’re working with…
Another Pandas GUI: Pandas Profiling
Another GUI for Pandas! YES – they’re coming out of the woodwork now! But this one is a…
A GUI For Pandas! Is This A Game Changer?
Pandas is the de facto library for data analysis in Python, and for good reason. It’s infinitely flexible, relatively performant…
Top Five Ways To Avoid Data Leakage As A Data Scientist
Data leakage is where we accidentally share data between the test and training sets, so that our predictions are…
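As a minimal sketch of one common safeguard against leakage (an illustration, not necessarily one of the five covered in the post), the scaling statistics below are learned from the training split only and then reused, unchanged, on the test split. The helper names `fit_scaler` and `transform` and the toy numbers are hypothetical.

```python
from statistics import mean, pstdev

def fit_scaler(train_values):
    """Learn scaling parameters from the training split only (hypothetical helper)."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against zero variance
    return mu, sigma

def transform(values, params):
    """Apply previously fitted parameters; the test split is never used to refit."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

data = [2.0, 4.0, 6.0, 8.0, 100.0]   # toy values, purely illustrative
train, test = data[:4], data[4:]     # split BEFORE any preprocessing

params = fit_scaler(train)           # mean and std come from train only
train_scaled = transform(train, params)
test_scaled = transform(test, params)

# Leaky alternative (avoid): fit_scaler(data) would let the test value 100.0
# pull the mean up to 24.0, so information from the test set would shape training.
```

The same principle applies to any fitted preprocessing step (imputation, encoding, feature selection): fit on the training data, then only transform the test data.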
Using Association Rule Mining To Determine How Likely Y Is, Given X
Association rule mining is an unsupervised learning algorithm which finds patterns in our data, whereby we know how…
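To make "how likely is Y, given X" concrete, here is a small sketch using the standard rule measures: support (how often an itemset appears), confidence (an estimate of P(Y | X)) and lift (confidence relative to how common Y is anyway). The shopping-basket transactions are hypothetical, and this is not presented as the post's own implementation.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Estimate P(Y | X): support of X and Y together divided by support of X."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """Confidence relative to Y's baseline frequency; > 1 means X makes Y more likely."""
    return confidence(x, y, transactions) / support(y, transactions)

# Hypothetical baskets, purely illustrative
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(confidence({"bread"}, {"milk"}, transactions))  # 2 of the 3 bread baskets contain milk
```

Here the rule bread → milk has a lift below 1 (8/9), so in this toy data buying bread makes milk slightly less likely than its baseline rate of 0.75.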
How To Implement A Random Forest Machine Learning Model On An Imbalanced Dataset
For this article, I wanted to demonstrate an end-to-end model implementation, including tuning for an imbalanced…
Methods For Increasing Model Accuracy (Dimensionality Reduction; Class Imbalance; Hyperparameters)
Tuning our machine learning model is, of course, critical to its accuracy. There are two methods for tuning…
The Four Key Accuracy Metrics To Prove Or Improve Our Machine Learning Model Performance
In the last article, we started looking at the implemented algorithm and its model score…
Implementing A Logistic Regression Machine Learning Model On Our Diabetes Dataset
Following on from my previous article, we are continuing our journey towards predicting whether someone has diabetes. The…
Exploring Data Using Seaborn And Pandas As A Data Scientist
Here we have a dataset from Kaggle; everyone in this dataset is female and aged 21 or above.…