Moving average model (MA)

A moving average model of order \(q\), MA(\(q\)), expresses the current value as a constant plus a linear combination of the current and the \(q\) most recent past white-noise errors. Unlike AR models, finite-order MA models are stationary for every choice of parameter values. The invertibility condition instead determines whether the model can be rewritten as a convergent AR(\(\infty\)) representation.

Definition

\[y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}\]

where \(\varepsilon_t \sim \text{WN}(0, \sigma^2)\). Using the lag operator \(L\), defined by \(L^j \varepsilon_t = \varepsilon_{t-j}\):

\[y_t - \mu = \Theta(L)\,\varepsilon_t, \qquad \Theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q\]

The \(\theta_i\) measure how past shocks propagate into current values. With a positive \(\theta_1\), a positive shock today also raises tomorrow’s value; with a negative \(\theta_1\), tomorrow’s value moves in the opposite direction, partially reversing today’s shock.
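A minimal R sketch of the definition, assuming Gaussian white noise (any white noise would do). The helper `simulate_ma` and the coefficients \(\theta = (0.6, -0.3)\), \(\mu = 1\) are illustrative choices, not from the text. It builds \(y_t\) term by term and then checks the result against `stats::filter`, which applies the lag polynomial \(\Theta(L)\) to the shocks as a one-sided convolution.

```r
# Hypothetical helper: simulate y_t = mu + eps_t + theta_1 eps_{t-1} + ... + theta_q eps_{t-q}
simulate_ma <- function(n, theta, mu = 0, sigma = 1) {
  q <- length(theta)
  eps <- rnorm(n + q, mean = 0, sd = sigma)   # q pre-sample shocks followed by n in-sample shocks
  y <- numeric(n)
  for (t in 1:n) {
    idx <- t + q                              # position of eps_t in the shock vector
    y[t] <- mu + eps[idx] + sum(theta * eps[idx - seq_len(q)])
  }
  list(y = y, eps = eps)
}

set.seed(123)
theta <- c(0.6, -0.3)
sim <- simulate_ma(200, theta, mu = 1)

# Same series via Theta(L) = 1 + 0.6 L - 0.3 L^2 applied to the shocks (one-sided filter)
via_filter <- 1 + stats::filter(sim$eps, c(1, theta), method = "convolution", sides = 1)
via_filter <- as.numeric(via_filter)[-(1:length(theta))]   # drop the q leading NAs
```

The explicit loop mirrors the equation; the `filter` version is the vectorized equivalent of \(\mu + \Theta(L)\varepsilon_t\).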

MA processes are always stationary

For any finite \(q\) and any values of \(\theta_1, \ldots, \theta_q\):

\[E[y_t] = \mu, \quad \text{Var}(y_t) = \sigma^2(1 + \theta_1^2 + \cdots + \theta_q^2)\]

Both are constants, independent of \(t\). The autocovariance, with the convention \(\theta_0 = 1\), is:

\[\gamma_k = \begin{cases} \sigma^2 \sum_{j=0}^{q-k} \theta_j \theta_{j+k} & k = 0, 1, \ldots, q \\ 0 & k > q \end{cases}\]

It depends only on the lag \(k\), not on \(t\) (and the \(k = 0\) case reproduces the variance above), so every MA(\(q\)) is weakly stationary. This contrasts with AR models, where stationarity requires all roots of \(\Phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p\) to lie outside the unit circle, which for AR(1) reduces to \(|\phi_1| < 1\).
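The autocovariance formula translates directly into code. This sketch assumes illustrative values \(\theta = (0.5, 0.3)\) and \(\sigma^2 = 2\); the helper `ma_acvf` is hypothetical. It compares the formula with the sample autocovariance of a long simulated series from `stats::arima.sim`.

```r
# Hypothetical helper: gamma_k = sigma2 * sum_{j=0}^{q-k} theta_j theta_{j+k}, with theta_0 = 1
ma_acvf <- function(theta, sigma2 = 1, lag.max = length(theta) + 2) {
  th <- c(1, theta)                  # prepend theta_0 = 1
  q <- length(theta)
  sapply(0:lag.max, function(k) {
    if (k > q) return(0)             # no shared shocks beyond lag q
    sigma2 * sum(th[1:(q - k + 1)] * th[(1 + k):(q + 1)])
  })
}

gam <- ma_acvf(c(0.5, 0.3), sigma2 = 2)   # lags 0..4

# Long simulation with innovation sd = sqrt(2), i.e. sigma^2 = 2
set.seed(99)
y <- arima.sim(model = list(ma = c(0.5, 0.3)), n = 1e5, sd = sqrt(2))
gam_hat <- as.numeric(acf(y, lag.max = 4, type = "covariance", plot = FALSE)$acf)
round(rbind(theory = gam, sample = gam_hat), 3)
```

Here \(\gamma_0 = 2(1 + 0.25 + 0.09) = 2.68\), \(\gamma_1 = 2(0.5 + 0.5 \cdot 0.3) = 1.3\), \(\gamma_2 = 2 \cdot 0.3 = 0.6\), and every lag beyond 2 is zero, whatever values of \(\theta\) are chosen.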

ACF and PACF of MA processes

The theoretical ACF of MA(\(q\)) cuts off after lag \(q\): it is exactly zero at every lag beyond \(q\). With \(\theta_0 = 1\):

\[\rho_k = \begin{cases} \dfrac{\sum_{j=0}^{q-k} \theta_j \theta_{j+k}}{1 + \theta_1^2 + \cdots + \theta_q^2} & k = 1, \ldots, q \\ 0 & k > q \end{cases}\]

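The PACF, by contrast, tails off: it decays toward zero without cutting off, either geometrically (possibly alternating in sign) or as a damped sine wave when \(\Theta(z)\) has complex roots. This is the mirror image of the AR pattern and the key identifier for MA models.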
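Both patterns can be checked with `stats::ARMAacf`, which returns the theoretical ACF (or PACF) of an ARMA model. The sketch below assumes illustrative coefficients: an MA(2) with \(\theta = (0.5, 0.3)\) for the ACF cutoff, and an MA(1) with \(\theta_1 = 0.5\) for the PACF, using the known closed form \(\phi_{kk} = -(-\theta_1)^k (1 - \theta_1^2) / (1 - \theta_1^{2(k+1)})\).

```r
# MA(2): the ACF from the formula matches ARMAacf and is zero after lag q = 2
theta <- c(0.5, 0.3)
th <- c(1, theta)                                  # theta_0 = 1
denom <- sum(th^2)                                 # 1 + theta_1^2 + theta_2^2
rho_formula <- c(1,
                 sum(th[1:2] * th[2:3]) / denom,   # lag 1
                 th[1] * th[3] / denom,            # lag 2
                 0, 0, 0)                          # lags 3-5
rho_exact <- unname(ARMAacf(ma = theta, lag.max = 5))   # lags 0..5

# MA(1): the PACF never cuts off; it alternates in sign and shrinks geometrically
theta1 <- 0.5
k <- 1:6
pacf_formula <- -(-theta1)^k * (1 - theta1^2) / (1 - theta1^(2 * (k + 1)))
pacf_exact <- unname(ARMAacf(ma = theta1, lag.max = 6, pacf = TRUE))   # lags 1..6
round(pacf_exact, 4)
```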

Figure: ACF and PACF of MA(1) and MA(2) processes, showing the cutoff in the ACF and the tailing off in the PACF.

Invertibility

An MA(\(q\)) process is invertible if all roots of \(\Theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q = 0\) lie outside the unit circle, that is, satisfy \(|z| > 1\).
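The root condition can be checked numerically with base R's `polyroot`, which takes polynomial coefficients in increasing powers of \(z\). The helper name `is_invertible` and the test coefficients are illustrative.

```r
# Hypothetical helper: TRUE if every root of Theta(z) = 1 + theta_1 z + ... + theta_q z^q has |z| > 1
is_invertible <- function(theta) {
  roots <- polyroot(c(1, theta))   # coefficients of 1, z, z^2, ...
  all(Mod(roots) > 1)
}

is_invertible(0.5)          # TRUE:  root at z = -2
is_invertible(2)            # FALSE: root at z = -0.5
is_invertible(c(0.5, 0.3))  # TRUE:  complex pair with modulus sqrt(1 / 0.3)
is_invertible(c(2.5, 1))    # FALSE: 1 + 2.5z + z^2 = (1 + 2z)(1 + 0.5z), roots -0.5 and -2
```

For MA(1) this reduces to \(|\theta_1| < 1\), since the single root is \(z = -1/\theta_1\).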

Invertibility means the MA process can be written as an AR(\(\infty\)) (taking \(\mu = 0\) for simplicity):

\[y_t = \sum_{j=1}^\infty \pi_j y_{t-j} + \varepsilon_t\]

where the \(\pi_j\) coefficients decay to zero. This AR(\(\infty\)) representation is what makes the model useful in practice: it expresses each unobservable shock \(\varepsilon_t\) in terms of current and past observations, which is what forecasting and likelihood evaluation require.

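For MA(1) with \(\mu = 0\), \(y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}\) is invertible if and only if \(|\theta_1| < 1\). Solving for \(\varepsilon_t = y_t - \theta_1 \varepsilon_{t-1}\) and substituting the same expression for \(\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots\) repeatedly gives the AR(\(\infty\)) representation: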

\[\varepsilon_t = y_t - \theta_1 y_{t-1} + \theta_1^2 y_{t-2} - \theta_1^3 y_{t-3} + \cdots = \sum_{j=0}^\infty (-\theta_1)^j y_{t-j}\]

which converges only when \(|\theta_1| < 1\).
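The effect of the convergence condition can be seen in a finite sample. This sketch assumes a zero-mean MA(1) and recovers the shocks with the recursion \(\varepsilon_t = y_t - \theta_1 \varepsilon_{t-1}\); unrolled, that is the AR(\(\infty\)) sum above truncated at the start of the sample, with the unknown pre-sample shock \(\varepsilon_0\) set to zero. The helper `recover_shocks`, the sample size and the two \(\theta_1\) values are illustrative.

```r
# Hypothetical helper: eps_hat_t = y_t - theta * eps_hat_{t-1}, starting from eps_hat_0 = 0
recover_shocks <- function(y, theta) {
  eps_hat <- numeric(length(y))
  prev <- 0
  for (t in seq_along(y)) {
    eps_hat[t] <- y[t] - theta * prev
    prev <- eps_hat[t]
  }
  eps_hat
}

set.seed(1)
n <- 60
eps <- rnorm(n + 1)                              # eps[1] plays the role of eps_0
ma1 <- function(theta) eps[-1] + theta * eps[-(n + 1)]

# Error in the recovered shock at the last date; analytically it equals -(-theta)^n * eps_0
err_inv    <- recover_shocks(ma1(0.5), 0.5)[n] - eps[n + 1]
err_noninv <- recover_shocks(ma1(2), 2)[n] - eps[n + 1]
c(err_inv, err_noninv)   # rounding-level versus roughly 2^60 * |eps_0|
```

With \(|\theta_1| < 1\) the initialization error dies out geometrically, so the data pin down the shocks; with \(|\theta_1| > 1\) the same error is amplified at every step.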

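⚠️ Without the invertibility restriction, MA models are not identifiable

For MA(1), a non-invertible model with parameter \(\theta_1\) (\(|\theta_1| > 1\)) has an invertible twin with parameter \(1/\theta_1\) and the same ACF. More generally, replacing each root of \(\Theta(z)\) that lies inside the unit circle by its reciprocal gives an invertible MA(\(q\)) with the same ACF (with \(\sigma^2\) rescaled so the autocovariances match too). The two models cannot be distinguished from the ACF alone.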
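A minimal sketch of root flipping for a general MA(\(q\)) with real coefficients, assuming the illustrative non-invertible MA(2) \(\Theta(z) = 1 + 2.5z + z^2 = (1 + 2z)(1 + 0.5z)\), which has one root at \(z = -0.5\). The helper `flip_to_invertible` is hypothetical. It compares only the ACF; for the autocovariances to match as well, \(\sigma^2\) must also be multiplied by \(1/|r|^2\) for each flipped root \(r\) (here by 4).

```r
# Hypothetical helper: replace every root r of Theta(z) with |r| < 1 by 1/r and rebuild Theta(z)
flip_to_invertible <- function(theta) {
  roots <- polyroot(c(1, theta))          # coefficients in increasing powers of z
  inside <- Mod(roots) < 1
  roots[inside] <- 1 / roots[inside]
  # Rebuild Theta(z) = prod_i (1 - z / r_i), which keeps theta_0 = 1
  coefs <- complex(real = 1)
  for (r in roots) coefs <- c(coefs, 0) - c(0, coefs) / r
  Re(coefs[-1])                           # imaginary parts cancel (up to rounding) for real theta
}

theta_noninv <- c(2.5, 1)
theta_twin <- flip_to_invertible(theta_noninv)
theta_twin                                # approximately c(1, 0.25), i.e. (1 + 0.5z)^2
ARMAacf(ma = theta_noninv, lag.max = 4)
ARMAacf(ma = theta_twin, lag.max = 4)     # identical ACF
```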

Example: MA(1) with \(\theta_1 = 2\) and MA(1) with \(\theta_1 = 0.5\) have the same autocorrelation, since \(\rho_1 = \theta_1/(1+\theta_1^2)\) gives \(2/5 = 0.5/1.25 = 0.4\). By convention, we always choose the invertible solution (\(|\theta_1| < 1\)), and standard estimation software typically reports it.
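A quick numerical check of this example, plus the \(\theta_1 = 0.8\) versus \(1.25\) pair discussed next. The sketch also shows the variance rescaling \((\theta_1, \sigma^2) \to (1/\theta_1, \theta_1^2 \sigma^2)\) under which even the autocovariances \(\gamma_0 = \sigma^2(1 + \theta_1^2)\) and \(\gamma_1 = \sigma^2\theta_1\) coincide; \(\sigma^2 = 1\) is an illustrative choice.

```r
# rho_1 = theta / (1 + theta^2) is unchanged when theta is replaced by 1 / theta
rho1 <- function(theta) theta / (1 + theta^2)
rho1(c(2, 0.5, 0.8, 1.25))   # 0.4, 0.4, about 0.488, about 0.488

# Autocovariances also match after rescaling sigma^2
theta <- 2
sigma2 <- 1
gamma_noninv <- sigma2 * c(1 + theta^2, theta)                          # gamma_0, gamma_1
gamma_inv    <- (theta^2 * sigma2) * c(1 + (1 / theta)^2, 1 / theta)    # both equal c(5, 2)

# The full theoretical ACFs agree at every lag
all.equal(ARMAacf(ma = 2, lag.max = 3), ARMAacf(ma = 0.5, lag.max = 3))
```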

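Figure: Two MA(1) processes with \(\theta_1 = 0.8\) (invertible) and \(\theta_1 = 1.25\) (non-invertible), with their sample ACF and PACF.

Both series have nearly identical sample ACF at lag 1 (the theoretical value is \(0.8/1.64 = 1.25/2.5625 \approx 0.488\) for both), confirming the observational equivalence. Because the theoretical PACF is determined entirely by the ACF, the two theoretical PACFs are identical as well; any visible difference between the sample PACFs is sampling noise.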

Example: MA(1) for stock returns

Daily excess returns of a stock often show a small negative MA(1) component: a day of above-average returns tends to be followed by a day of slightly below-average returns. A common explanation is the bid-ask bounce, where recorded prices alternate between the bid and the ask.
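A sketch of fitting such a model with base R's `arima`, assuming simulated standardized returns rather than market data; \(\theta_1 = -0.1\), the sample size and the seed are illustrative values, not estimates from any real stock.

```r
# Illustrative only: simulated standardized excess returns with a small negative MA(1) term
set.seed(2024)
theta_true <- -0.1
r <- arima.sim(model = list(ma = theta_true), n = 5000)

fit <- arima(r, order = c(0, 0, 1))        # MA(1) with a mean (intercept) term
theta_hat <- unname(coef(fit)["ma1"])
se_hat <- unname(sqrt(diag(fit$var.coef))["ma1"])
c(estimate = theta_hat, std_error = se_hat)
```

With 5000 observations the standard error is roughly \(\sqrt{(1 - \theta_1^2)/n} \approx 0.014\), so an effect of this size is clearly detectable.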

Figure: MA(1) fit to simulated stock returns, showing the model fit and the impulse response function.

The impulse response function (IRF) shows the effect of a unit shock on current and future values. For MA(\(q\)), the response at horizon \(h\) is \(\theta_h\) (with \(\theta_0 = 1\)) for \(h \le q\) and exactly zero afterwards: shocks have finite memory. This contrasts with stationary AR models, where the response decays toward zero but does not die out after any finite horizon.
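The IRF weights can be computed with `stats::ARMAtoMA`, which returns the \(\psi\) weights of the MA(\(\infty\)) representation from horizon 1 onward; the impact response \(\psi_0 = 1\) is prepended by hand. The MA(2) coefficients \((0.6, -0.3)\) and the AR(1) coefficient \(0.6\) are illustrative.

```r
h <- 0:8
# psi weights start at horizon 1; prepend the impact response psi_0 = 1
irf_ma2 <- c(1, ARMAtoMA(ma = c(0.6, -0.3), lag.max = 8))
irf_ar1 <- c(1, ARMAtoMA(ar = 0.6, lag.max = 8))
rbind(horizon = h, MA2 = irf_ma2, AR1 = round(irf_ar1, 4))
```

The MA(2) response is \((1, 0.6, -0.3)\) followed by exact zeros, while the AR(1) response is \(0.6^h\), positive at every horizon.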

MA vs AR: key differences

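| Property | AR(\(p\)) | MA(\(q\)) |
|---|---|---|
| Always stationary | No (roots of \(\Phi(z)\) must lie outside the unit circle) | Yes |
| Always invertible | Yes | No (roots of \(\Theta(z)\) must lie outside the unit circle) |
| ACF | Tails off | Cuts off after lag \(q\) |
| PACF | Cuts off after lag \(p\) | Tails off |
| Memory of shocks | Infinite (geometric decay) | Finite (\(q\) periods) |
| Estimation | OLS, Yule-Walker, MLE | MLE or conditional least squares (non-linear) |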

💡 Fitting MA models in R

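# Simulated MA(1) series for illustration; replace y with your own data
set.seed(42)
y <- arima.sim(model = list(ma = -0.3), n = 500)

# Fit MA(1): order = c(p, d, q)
fit1 <- arima(y, order = c(0, 0, 1))

# Fit MA(2)
fit2 <- arima(y, order = c(0, 0, 2))

# Check invertibility: every root of Theta(z) should have modulus > 1
Mod(polyroot(c(1, coef(fit1)["ma1"])))
Mod(polyroot(c(1, coef(fit2)[c("ma1", "ma2")])))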

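Unlike AR models, MA parameters cannot be estimated by OLS: the regressors would be the past errors \(\varepsilon_{t-j}\), which are unobservable. Reconstructing them recursively from the data makes the residuals non-linear functions of \(\theta\), so estimation relies on numerical optimization. MLE is the standard method; conditional least squares is a common alternative.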
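To make the non-linearity concrete, here is a minimal conditional sum of squares (CSS) sketch for a zero-mean MA(1): residuals are rebuilt recursively with \(\varepsilon_0 = 0\), and their sum of squares is minimized over \(\theta_1\) with `optimize`. The objective name `css_objective`, the true value \(0.6\) and the sample size are illustrative. The result is compared with `arima(..., method = "CSS")` and with the default maximum likelihood fit.

```r
# Hypothetical objective: sum of squared reconstructed shocks for a zero-mean MA(1), eps_0 = 0
css_objective <- function(theta, y) {
  e <- numeric(length(y))
  prev <- 0
  for (t in seq_along(y)) {
    e[t] <- y[t] - theta * prev   # each e_t depends on theta through all earlier residuals
    prev <- e[t]
  }
  sum(e^2)
}

set.seed(7)
y <- as.numeric(arima.sim(model = list(ma = 0.6), n = 1000))

# Search only the invertible region
theta_css <- optimize(css_objective, interval = c(-0.99, 0.99), y = y)$minimum
theta_r   <- unname(coef(arima(y, order = c(0, 0, 1), include.mean = FALSE, method = "CSS"))["ma1"])
theta_ml  <- unname(coef(arima(y, order = c(0, 0, 1), include.mean = FALSE, method = "ML"))["ma1"])
round(c(css = theta_css, css_arima = theta_r, ml = theta_ml), 3)
```

Because the objective is a polynomial of growing degree in \(\theta_1\), there is no closed-form solution like OLS; CSS and MLE typically agree closely in large samples.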