Timeseries decomposition is a mathematical procedure which allows us to transform a single timeseries into multiple series. These help us to extract seasonality and trend information easily. Doing this in Python is quite a simple task, and I have outlined it below. However, before we get into that, we need to understand the difference between additive […]

## An introduction to timeseries models (AR, MA, ARMA and ARIMA)

Timeseries forecasting is quite a big topic to cover. I’ve spoken about key terminology and exponential smoothing in this article and I’ve spoken about how we might remove timeseries outliers here. In this post, I am going to discuss the different components of the ARIMA model (AR and MA), in addition to the ARIMA model […]

## Key terminology & process for timeseries analysis including exponential smoothing

Timeseries analysis is incredibly powerful but can get quite confusing. There is a lot of terminology which we need to understand before we can really progress with making a forecast. Ultimately, timeseries analysis is all about analysing and forecasting data that is indexed in equally spaced increments of time, e.g. minutes, seconds, days, weeks, months, […]

## MAE | RMSE | MAPE : Measures of model accuracy for data scientists

Mean Absolute Error (MAE): this takes the difference between the predicted value and the actual value for every prediction, converts each difference to its absolute value (that is, makes it positive) so that errors cannot cancel one another out, and then averages the results. Let’s consider an example. In the below, […]
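As a rough illustration of the MAE calculation described in this excerpt, here is a minimal sketch in plain Python. The function name `mean_absolute_error` and the sample values are made up for illustration and are not taken from the original post. The signed average is included only to show why the absolute value matters.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values, chosen so that the signed errors cancel out exactly.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

# Signed errors are +2, -2, +3, -3, so their plain average is 0.
mean_signed_error = sum(p - a for a, p in zip(actual, predicted)) / len(actual)

# Absolute errors are 2, 2, 3, 3, so the MAE is 10 / 4.
mae = mean_absolute_error(actual, predicted)

print(mean_signed_error)  # 0.0
print(mae)                # 2.5
```

Without the absolute value, the model in this toy example would look perfect even though every prediction is off by 2 or 3.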

## Data Badass Early Preview

An early view of my new book ‘Data Badass’ is available to view using this link. It’s not yet been through thorough editing and will be added to over time but I am keen to gather some feedback. It’s a book that covers the data basics; data platforms (including Hadoop, Kafka, Flume, Hive, Spark) and […]

## Using ROC Curves & AUC

This is a snippet from my upcoming book ‘Data Badass’ (pictured below): The ROC curve and the Area Under the Curve (AUC) are used for binary classification problems. The ROC curve plots the True Positive Rate against the False Positive Rate. Ideally, you want to reduce the number of false positives as much as […]
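To make the two rates mentioned in this excerpt concrete, here is a minimal sketch, assuming labels of 1 (positive) and 0 (negative) and a classifier that outputs a score per example. The helper `tpr_fpr`, the scores and the threshold are all hypothetical and are not from the book. Each threshold gives one point on the ROC curve.

```python
def tpr_fpr(labels, scores, threshold):
    """Return (true positive rate, false positive rate) at a given threshold.

    An example is predicted positive when its score is >= threshold.
    Assumes labels contain at least one positive (1) and one negative (0).
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    tpr = tp / (tp + fn)  # share of actual positives that were caught
    fpr = fp / (fp + tn)  # share of actual negatives that were wrongly flagged
    return tpr, fpr


# Hypothetical labels and model scores.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]

point = tpr_fpr(labels, scores, threshold=0.5)
print(point)  # (0.75, 0.25)
```

Sweeping the threshold from high to low and plotting each resulting (FPR, TPR) pair traces out the curve.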

## The data scientist learning plan for 2021

When you look online at what it takes to become a data scientist, it’s enough to make your brain melt. You have people telling you that you need to be an expert statistician / mathematician; you need to be a top-level coder in 15 different languages; be proficient with every type of SQL/NoSQL database on […]

## Can we successfully implement Agile in data science?

Agile is about iterative development and delivering tangible products/features quickly, which provides the business with value and ROI faster than a traditional waterfall project. Consider the example of a piece of accounting software. Overall, it’s going to have 50 features to support the accounts team. To deliver all of the features in a waterfall fashion, […]

## The Data Scientist Statistics Learning Plan For 2021

As data scientists, we need to be comfortable with mathematics. If you Google what you need to know, you’ll find answers stating you need to fully understand linear algebra, calculus and how to calculate all of the algorithms we use by hand. I’m not going to downplay the importance of understanding how the algorithm works, […]

## A Guide To Basic Linear Algebra Notation For Machine Learning

Often, you’ll be looking around on the web for an answer to a question you have about an algorithm & you are presented with a formulae-heavy answer on a forum. If you don’t know the notation, this is going to give you a headache. So this article aims to cover off much of the common […]

## The Ultimate Guide To Linear Regression For Aspiring Data Scientists

Regression is about finding the relationship between some input variable(s) and an outcome. Let’s think about a simple example of height and weight. We need to understand the relationship between the two – intuition can tell us that as height increases, so does weight. The idea of regression is to create a mathematical formula which […]

## How Do Data Scientists Deal With Outliers In Timeseries Analysis To Reveal Trends and Patterns?

Welcome back to the series on timeseries analysis. In this article, we’re going to discuss plotting timeseries data and smoothing the data to handle outliers & make finding a trend a little bit easier. In the below, I have ingested the timeseries data into a dataframe called df. From that dataframe, I have then set […]
