LATEST ARTICLES
The MySQL Cheatsheet: Data Types
Understanding the different data types we have at our disposal is key when designing database tables. We need to ensure…
COURSE REVIEWS
A few reviews from my popular courses: No Nonsense Python (~12,000 students); Data Analytics in Python using Pandas (~20,000 students); Crash Course in PySpark (~7,000 students)
MACHINE LEARNING ARTICLES
Important clustering concepts to keep in mind
First things first, what is k-means clustering? It’s an algorithm…
Data Badass Free Book for Data Leaders
My new book ‘Data Badass’ is available to view using this…
Ensemble modelling to improve your model performance
In my last article, I spoke about auto-sklearn. I said that the library…
Getting started with scikit-learn AutoML
An AutoML workflow should be able to preprocess data; select the right…
DEEP LEARNING
Deep learning for beginners (Part 8): Improving our tuning script & using the Keras tuner
This is the eighth and final part in the…
Deep learning for beginners (Part 7): neural network design (layers & neurons)
This is the seventh part in the Deep…
Deep learning for beginners (Part 6): more terminology to optimise our Keras model
This is the sixth part in the Deep…
Deep learning for beginners (Part 5): our first foray into Keras
This is the fifth part in the Deep…
SQL / HIVE ARTICLES
The MySQL Cheatsheet: Data Types
Understanding the different data types we have at our disposal is key…
Composite indexes in MySQL explained
Just like an index in the back of a textbook, a table index allows us…
How to create tonnes of dummy data in MySQL
When we are testing various database concepts, we need mock data.…
MySQL Querying A Partitioned Table
Querying partitions in MySQL is a nice, straightforward process. However,…
APACHE SPARK
Managing small file issues when writing to Hive with Spark SQL
Hive sits on top of HDFS (the Hadoop…
Improving performance when calculating percentiles in Spark
Performance is a major concern when…
Overcoming Futures Timeout & Read Timeout errors in PySpark
This article covers a less-than-orthodox…
Handling data skew / imbalanced partitions in PySpark
Data skew is a real problem in Spark. It…