Month: April 2021
5 posts
Expose your ML model via a simple API in Python
As data scientists, it is important that we have a method of sharing the insight from our models.…
Parameters to make your Hive queries perform better
In my experience, Hive's performance can be extremely variable, which can make it difficult…
Keeping your Hive queries clean with CTEs
This is a short article about keeping your queries as readable and performant as possible…
Working with dates in Apache Hive
Working with dates is one of those tedious things we frequently come across as data engineers. The frustration…
Improving performance when calculating percentiles in Spark
Performance is a major concern when you’re working in a distributed environment with a massive amount of data.…