Browsing Category
Data
16 posts
NLP Crash Course (Part 2): Approaches to text classification
In the last article, we spoke about how you would go about preparing your data for your NLP…
NLP Crash Course (Part 1): Preparing data for NLP
Natural Language Processing (NLP) is one of the major topics in data science. We use it to understand…
Data Strategy: Implementation Framework
DMBOK is the Data Management Body of Knowledge framework that we use when we’re putting our data management…
Keeping your Hive queries clean with CTEs
This is a short article about keeping your queries as readable and performant as possible…
Working with dates in Apache Hive
Working with dates is one of those tedious things we frequently come across as data engineers. The frustration…
An introduction to structured data modelling
This is an introductory chapter of my upcoming book ‘Data Badass’ (pictured below): Data modelling is all about…
An introduction to data structures for aspiring data engineers
This is an introductory chapter of my upcoming book ‘Data Badass’ (pictured below). Data Types: Data types are…
A guide to windowing functions in Hive for data analysis
Windowing functions in Hive are super useful. They make otherwise challenging analysis much easier. Let’s…
The Hive SQL Crash Course For Data Analysts
SQL is one of the most in-demand data skills. The language has been adopted by many database platforms,…
The data scientist learning plan for 2021
When you look online at what it takes to become a data scientist, it’s enough to make your…