Timeseries decomposition is a mathematical procedure that splits a single timeseries into several component series. These components make it easy to extract the trend and the seasonality.
Doing this in Python is quite simple, and I have outlined it below. Before we get into that, however, we need to understand the difference between additive and multiplicative seasonality. Essentially, additive means the seasonal variation is constant: it doesn't really change as the timeseries value increases. In a multiplicative timeseries, the seasonal fluctuations grow as the value increases.
If you look at the terribly drawn charts below (sorry), you can see a demonstration of this. To the left, we see a seasonal dataset with an upward trend. But note that the dotted red lines stay a consistent distance apart throughout. This is because the magnitude of the seasonal spikes has not changed as the trend goes upward. This is therefore an additive seasonal timeseries.
To the right, we see the opposite. Yes, there is a similar upward trend to the dataset, but the seasonal variation grows as the values rise, so the dotted lines get further and further apart as the dataset goes on. This is an example of a multiplicative seasonal timeseries.
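To put numbers on the two charts, here is a minimal sketch that builds one toy series of each kind and measures the peak-to-trough swing in the first and last cycle. Everything in it (the cycle length, the trend, the size of the seasonal effect, the function names) is made up for illustration and has nothing to do with the dataset used later in this post.

```python
import math

PERIOD = 12  # hypothetical: 12 observations per seasonal cycle
CYCLES = 5   # hypothetical: five full cycles of data

def make_series(kind):
    """Build a toy series with a linear upward trend and a sine-shaped season."""
    values = []
    for t in range(PERIOD * CYCLES):
        trend = 10 + 0.5 * t
        season = math.sin(2 * math.pi * t / PERIOD)
        if kind == "additive":
            # A fixed-size seasonal swing is added on top of the trend.
            values.append(trend + 3 * season)
        else:
            # The seasonal swing is a fraction of the trend, so it grows with it.
            values.append(trend * (1 + 0.3 * season))
    return values

def swing(values, cycle):
    """Peak-to-trough range of one cycle (cycle 0 is the first)."""
    chunk = values[cycle * PERIOD:(cycle + 1) * PERIOD]
    return max(chunk) - min(chunk)

additive = make_series("additive")
multiplicative = make_series("multiplicative")

print("additive:      ", swing(additive, 0), swing(additive, CYCLES - 1))
print("multiplicative:", swing(multiplicative, 0), swing(multiplicative, CYCLES - 1))
```

The additive series should report the same swing for both cycles, while the multiplicative one should report a clearly larger swing in the last cycle. That is the numeric version of the dotted lines staying parallel or fanning out.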
The output of the Python code at the bottom of this post is the chart below. Here we have the original timeseries at the top along with three other charts:
- The trend chart shows you the general trend of the dataset. You can see that the overall trend is upwards.
- The seasonal chart extracts the seasonality from the dataset.
- The resid chart is the left over (or residual) values, after the trend and seasonality have been removed. Ideally, this is close to a stationary series.
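To make the three components less of a black box, here is a simplified, dependency-free sketch of a classical additive decomposition: a centred moving average for the trend, the average detrended value at each position in the cycle for the seasonality, and whatever is left for the residual. This is only an illustration of the idea, not the statsmodels implementation; the function name and the toy data are hypothetical, and the sketch only handles an odd period.

```python
import math

def decompose_additive(values, period):
    """Simplified classical additive decomposition (odd `period` only)."""
    n = len(values)
    half = period // 2

    # Trend: centred moving average over one full period.
    # The first and last `half` points have no full window, so they stay None.
    trend = [None] * n
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        trend[i] = sum(window) / period

    # Seasonal: average the detrended values at each position in the cycle...
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(values[i] - trend[i])
    phase_means = [sum(b) / len(b) for b in buckets]
    # ...then centre them so the seasonal effects sum to zero over a cycle.
    offset = sum(phase_means) / period
    seasonal = [phase_means[i % period] - offset for i in range(n)]

    # Residual: whatever the trend and the seasonality do not explain.
    resid = [None if trend[i] is None else values[i] - trend[i] - seasonal[i]
             for i in range(n)]
    return trend, seasonal, resid

# Toy data: a weekly cycle (period 7) on top of a gentle upward trend.
toy = [20 + 0.1 * t + 2 * math.sin(2 * math.pi * t / 7) for t in range(42)]
trend, seasonal, resid = decompose_additive(toy, period=7)
```

Wherever the trend is defined, adding the three parts back together returns the original value. Because the toy data is a perfect trend plus a perfect cycle, the residual there is essentially zero; on real data it holds the noise.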
In the code below:
- We bring our dataset in (which is the dataset found here).
- We then set the Date as the index for the timeseries
- We then plot the timeseries data
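    import pandas as pd
    import matplotlib.pyplot as plt
    # Jupyter magic to show plots inline; leave this line out in a plain script
    %matplotlib inline

    # Load the dataset and use the Date column as the index
    df = pd.read_csv('/home/Datasets/seasons.csv')
    ts = df.set_index('Date')
    # Keep only the temperature column as our timeseries
    ts = ts['Temp']
    plt.plot(ts)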
From here, we then run the seasonal decomposition. We use a period of 365 because the observations are daily and the seasonality seems to repeat yearly (i.e. 365 intervals in a year). We can also see that the dataset is additive, as the magnitude of the seasonal spikes is quite consistent.
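    from matplotlib import rcParams
    import statsmodels.api as sm

    # seasonal_decompose does not accept missing values, so fill them with 0
    ts = ts.fillna(0)
    # Make the figure wide enough to read all four charts
    rcParams['figure.figsize'] = 18, 8
    # Additive model, one seasonal cycle every 365 observations
    decomposition = sm.tsa.seasonal_decompose(ts, model='additive', period=365)
    fig = decomposition.plot()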
Interestingly, we have also uncovered a trend that was hard to see in the original data: overall, there is an upward trend from about 60% of the way through the timeseries.
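The series in this post was treated as additive. If your own data looks multiplicative instead, seasonal_decompose also accepts model='multiplicative', and a common alternative is to take the log of the series so that the multiplicative effects become additive ones. The sketch below shows the log trick on made-up data; the cycle length, growth rate and seasonal factor are all assumptions for illustration. Note that logs need strictly positive values, so filling gaps with 0 would not work on this route.

```python
import math

PERIOD = 12  # hypothetical cycle length for the toy data
CYCLES = 5   # hypothetical number of full cycles

# Toy multiplicative series: a seasonal factor scales an exponentially growing trend.
series = [10 * 1.02 ** t * (1 + 0.3 * math.sin(2 * math.pi * t / PERIOD))
          for t in range(PERIOD * CYCLES)]

# Taking logs turns trend * season into log(trend) + log(season).
logged = [math.log(v) for v in series]

def swing(values, cycle):
    """Peak-to-trough range of one cycle (cycle 0 is the first)."""
    chunk = values[cycle * PERIOD:(cycle + 1) * PERIOD]
    return max(chunk) - min(chunk)

print("raw:   ", swing(series, 0), swing(series, CYCLES - 1))
print("logged:", swing(logged, 0), swing(logged, CYCLES - 1))
```

On the raw values the swing grows from the first cycle to the last, while on the logged values it stays the same, which is the behaviour an additive decomposition expects.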