An introduction to timeseries models (AR, MA, ARMA and ARIMA)

Timeseries forecasting is quite a big topic to cover. I’ve spoken about key terminology and exponential smoothing in this article and I’ve spoken about how we might remove timeseries outliers here. In this post, I am going to discuss the different components of the ARIMA model (AR and MA), in addition to the ARIMA model itself.

AR Model

First, the AR piece. This stands for autoregressive. As we know, regression is where we predict a value Y based on X. In the autoregressive model, the X is the past value of Y. So, we are trying to predict Y based on the historic values of Y (e.g. predict the value of Y today, based on yesterday’s value of Y).

The idea is that, if we provide the historic data, we can identify patterns, including seasonality and trends, which will make our forecasting stronger.

Let’s say we’re trying to predict the sales of coconuts in the coming months, to ensure we have the correct amount of inventory to fulfil our orders. Here, we are trying to predict Ct (where C stands for coconut and t for time) based on Ct-1, Ct-2, Ct-3 and so on, where the number after t is the lag we’re looking at. For example, t-1 would be yesterday in a daily timeseries.
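To make the coconut example concrete, here is a minimal sketch of the simplest case, where today’s sales are predicted from yesterday’s sales only. The sales figures and the fit_ar1 helper are made up for illustration, and the fit is plain least squares, which is an assumption on my part rather than what any particular forecasting library does.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_(t-1)."""
    x = series[:-1]  # yesterday's values (lag 1)
    y = series[1:]   # today's values
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var              # weight given to yesterday's value
    c = mean_y - phi * mean_x    # constant term
    return c, phi


# Hypothetical daily coconut sales (not real data)
coconut_sales = [20.0, 22.0, 23.0, 23.5, 23.75]

c, phi = fit_ar1(coconut_sales)
next_day = c + phi * coconut_sales[-1]  # forecast for tomorrow
print(c, phi, next_day)
```

With more lags, the same idea applies: each extra lag gets its own weight, and the weights are fitted together.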

We should be careful here: using all of the historic data can lead to overfitting. We only want to use the lags which are statistically significant, i.e. the lags which do influence the current value. This is where the PACF (Partial Autocorrelation Function) plot comes in.

Below, I have mocked up this plot. The lags are along the X axis and the correlation is on the Y axis. Anything outside of the shaded area is considered statistically significant. Hence, only lags 1, 4, 5, 7, 11, 12 and 13 influence today’s value.

The PACF looks at the influence of each lag on t, ignoring the influence of the points in between: it’s the direct effect of t-3 on t, for example. With an ACF plot, which we will discuss a bit later, this is not the case. An ACF plot also takes into account the influence of t-3 on t-2, of t-2 on t-1 and of t-1 on t, so it includes the indirect effect of t-3 on t.
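For anyone who wants to see the numbers behind such a plot, below is a rough sketch that computes the ACF and PACF by hand. A few assumptions: the series is simulated (not real sales data), the PACF uses the Durbin-Levinson recursion, and the "shaded area" is approximated with the commonly used 95% band of 1.96 / sqrt(n). In practice you would normally let a statistics library draw the plot for you.

```python
import math
import random


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelation via the Durbin-Levinson recursion."""
    rho = acf(series, max_lag)
    result = [1.0]
    prev = []  # AR coefficients of the order k-1 model
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den  # direct effect of lag k
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        prev.append(phi_kk)
        result.append(phi_kk)
    return result


# Simulated series where each value depends directly on the previous one only
rng = random.Random(0)
series = [0.0]
for _ in range(499):
    series.append(0.7 * series[-1] + rng.gauss(0, 1))

max_lag = 10
acf_vals = acf(series, max_lag)
pacf_vals = pacf(series, max_lag)

# Approximate 95% significance band (the "shaded area")
band = 1.96 / math.sqrt(len(series))
significant_lags = [k for k in range(1, max_lag + 1) if abs(pacf_vals[k]) > band]
print(significant_lags)
```

The lags printed at the end are the ones that fall outside the band, i.e. the candidates to keep in the AR model.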


The hyperparameter which we tune with the AR model is called P, also known as the order. It refers to how many lagged values of Y we include in the model. If P = 1, we predict t from t-1 only. If P = 2 or P = 3, we also include the values two or three periods (lags) back.

MA Model

An MA model is a moving average model. We adjust our prediction for the current period based on the error of previous periods. In the below example we predicted 10 items would sell in time period 1. The actual number was 8, so our error was -2. We then adjust our prediction for the next period by 0.5 * error (0.5 is the constant we have chosen), i.e. by -1. Hence, in time period 2, we take the average of 10 and subtract 1, leaving us with a prediction of 9. This continues for each time period.
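The arithmetic above can be sketched in a few lines. The first actual value (8), the average (10) and the constant (0.5) come from the example; the later actuals and the ma1_predictions helper are hypothetical, added only so the loop has something to run on.

```python
def ma1_predictions(actuals, mean, theta):
    """Predict each period as the mean plus theta times the previous error."""
    predictions = []
    errors = []
    prev_error = 0.0  # no earlier error exists, so the first prediction is the mean
    for actual in actuals:
        prediction = mean + theta * prev_error
        error = actual - prediction
        predictions.append(prediction)
        errors.append(error)
        prev_error = error
    return predictions, errors


# Period 1 actual (8) is from the example; 11 and 9 are made-up values
actuals = [8, 11, 9]
predictions, errors = ma1_predictions(actuals, mean=10, theta=0.5)

for period, (p, a, e) in enumerate(zip(predictions, actuals, errors), start=1):
    print(period, p, a, e)
```

Note that each prediction starts again from the average of 10 and only the adjustment changes from period to period.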


So it’s very similar to the AR model except that, instead of looking at previous values of Y to make our prediction, we look at the previous errors in our predictions.

An MA(1) model says that the target value Y is based on the previous error. The number of previous errors we use is the parameter Q, so here Q = 1. How wrong you were yesterday informs what you predict today. The first prediction is the average of the series.

The PACF we saw earlier relates to the AR model and the ACF relates to the MA model. The PACF looks at the direct effect of a given lag on t. The ACF also captures the effect of t-2 on t-1 and the effect of t-1 on t, so it is not just the direct correlation between t-2 and t, as a PACF would show.

ARMA Models

The ARMA model uses P and Q as its input parameters, both of which we’ve discussed above. ARMA(1,1) is ARMA(P,Q) with P = 1 and Q = 1. We look at last month’s actual value to make the prediction, as we would with the autoregressive model, but we also look at last month’s error, as we do in the moving average model.
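As a rough illustration of how the two pieces combine in ARMA(1,1), the sketch below adds a weighted previous value (the AR part) and a weighted previous error (the MA part) to a constant. The series, the constant and both weights are hypothetical; in a real model they would be estimated from the data rather than picked by hand.

```python
def arma11_one_step(series, c, phi, theta):
    """One-step-ahead ARMA(1,1) predictions, ending with the next unseen period."""
    predictions = []
    prev_error = 0.0  # assume no error before the first prediction
    for t in range(1, len(series) + 1):
        # AR part: previous value; MA part: previous error
        prediction = c + phi * series[t - 1] + theta * prev_error
        predictions.append(prediction)
        if t < len(series):
            prev_error = series[t] - prediction
    return predictions


# Hypothetical monthly sales and hand-picked parameters
sales = [10, 8, 9]
predictions = arma11_one_step(sales, c=5, phi=0.5, theta=0.5)
print(predictions)
```

The last element of the list is the forecast for the month after the final observation.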

ARIMA Models

ARIMA stands for Autoregressive Integrated Moving Average. We’ve seen all of these terms above, except for integrated. This simply refers to the number of times you need to difference your timeseries data in order to achieve stationarity. I have discussed differencing and stationarity here. We therefore have ARIMA(P,D,Q), where P comes from the autoregressive model, Q comes from the MA model and D is the number of times we need to difference our data.
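As a quick reminder of what D counts, here is a small sketch of differencing: each pass replaces the series with the change from one period to the next. The two series and the difference helper are made up for illustration.

```python
def difference(series, d=1):
    """Difference a series d times (each pass shortens it by one value)."""
    for _ in range(d):
        series = [later - earlier for earlier, later in zip(series, series[1:])]
    return series


# Hypothetical series with a straight-line trend: one difference removes it
linear_trend = [3, 5, 7, 9, 11]
once = difference(linear_trend, d=1)

# Hypothetical series with a curved trend: it takes two differences
curved_trend = [1, 4, 9, 16, 25]
twice = difference(curved_trend, d=2)

print(once, twice)
```

In the first case D = 1 would be enough, whereas the second series would need D = 2.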

An ARMA model is the same as an ARIMA model which does not require differencing (D = 0). Hence, ARIMA is useful where we have a trend which needs to be removed. In other situations, we can use an ARMA model.

Where we have seasonality, we can use the SARIMA model. This adds three new orders, so it becomes SARIMA (p,d,q)(P,D,Q), where the second set of parameters describes the autoregressive, integration and moving average orders for the seasonal component of the timeseries. The seasonal part also needs the length of the season (e.g. 12 for monthly data with a yearly cycle).
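The seasonal D works like ordinary differencing, except that each value is compared with the value one full season earlier rather than with the previous period. A minimal sketch, using made-up quarterly figures and a hypothetical seasonal_difference helper:

```python
def seasonal_difference(series, period):
    """Subtract the value from one full season earlier."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Hypothetical quarterly sales: the same pattern each year, shifted up by 2
quarterly_sales = [10, 20, 30, 40, 12, 22, 32, 42]

deseasonalised = seasonal_difference(quarterly_sales, period=4)
print(deseasonalised)
```

After one seasonal difference the repeating within-year pattern is gone and only the year-on-year change is left.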

Where we have multiple input features, we can use the ARIMAX or SARIMAX models. The X here stands for eXogenous, which essentially means it’s not a univariate prediction: multiple features can be taken into account to predict t.