# Kalman Filter

Suppose we have a system as follows: \begin{align} \alpha_t &= T \alpha_{t-1} + \eta_t \\ y_t &= Z \alpha_{t} + \epsilon_t \\ \end{align}

where

• $\alpha: m\times 1$
• $\eta : m\times 1$
• $T: m\times m$
• $y: n\times 1$
• $\epsilon: n\times 1$
• $Z: n\times m$

and

• $E[\eta_t] = 0$
• $E[\eta_t\eta_t'] = Q_{m\times m}$
• $E[\epsilon_t] = 0$
• $E[\epsilon_t\epsilon'_t] = H_{n\times n}$
• $E[\epsilon_t\eta_s'] = 0$ for all $t$ and $s$.

$\alpha_t$ is unobserved, $y_t$ is observed.

Our goal is to write the system above as: \begin{align*} \hat{\alpha}_{t+1|t} & = T\hat{\alpha}_{t|t-1} + K_tv_t\\ y_t &= Z\hat{\alpha}_{t|t-1} + v_t \end{align*}

Let's get started.

Suppose we are given the observations until time $t-1$, i.e. we know $y_0,\ldots,y_{t-1}$.

First, define the time $t-1$ estimate for the state vector $\alpha$:

$$\hat{\alpha}_{t-1} = E[\alpha_{t-1}|y_0,\ldots,y_{t-1}]$$

The variance of the estimation error is:

$$P_{t-1} = E[(\alpha_{t-1}-\hat{\alpha}_{t-1})(\alpha_{t-1}-\hat{\alpha}_{t-1})']$$

Then, we can estimate $\alpha$ for period $t$ using $\hat{\alpha}_{t-1}$:

$$\hat{\alpha}_{t|t-1} = T \hat{\alpha}_{t-1}$$

The variance of the prediction error is:

\begin{align*} P_{t|t-1} & = E[(\alpha_{t}-\hat{\alpha}_{t|t-1})(\alpha_{t}-\hat{\alpha}_{t|t-1})'] \\ & = E[(T\alpha_{t-1}+\eta_t - T\hat{\alpha}_{t-1}) (T\alpha_{t-1}+\eta_t - T\hat{\alpha}_{t-1})'] \\ & = E[(T(\alpha_{t-1}-\hat{\alpha}_{t-1}) + \eta_t)(T(\alpha_{t-1}-\hat{\alpha}_{t-1}) + \eta_t)'] \\ & = T E[(\alpha_{t-1}-\hat{\alpha}_{t-1})(\alpha_{t-1}-\hat{\alpha}_{t-1})']T'+TE[(\alpha_{t-1}-\hat{\alpha}_{t-1})\eta_t'] + E[\eta_t(\alpha_{t-1}-\hat{\alpha}_{t-1})']T' + E[\eta_t\eta_t']\\ & = T E[(\alpha_{t-1}-\hat{\alpha}_{t-1})(\alpha_{t-1}-\hat{\alpha}_{t-1})']T' + E[\eta_t\eta_t']\\ & = T P_{t-1} T' + Q \end{align*}
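These two prediction formulas are easy to check numerically. The matrices below are arbitrary illustrative values (not part of the models estimated later); the sketch just evaluates $\hat{\alpha}_{t|t-1} = T\hat{\alpha}_{t-1}$ and $P_{t|t-1} = TP_{t-1}T' + Q$:

```python
import numpy as np

# Arbitrary example system with m = 2 (illustrative values only)
T = np.array([[0.9, 0.1],
              [0.0, 0.5]])
Q = np.array([[0.04, 0.0],
              [0.0, 0.01]])

# Current filtered estimate and its error covariance
alpha_hat = np.array([[1.0], [0.5]])
P = np.array([[0.2, 0.05],
              [0.05, 0.1]])

# Prediction step
alpha_pred = T @ alpha_hat        # \hat{alpha}_{t|t-1} = T \hat{alpha}_{t-1}
P_pred = T @ P @ T.T + Q          # P_{t|t-1} = T P_{t-1} T' + Q

print(alpha_pred.ravel())         # [0.95 0.25]
print(P_pred)                     # symmetric, as a covariance must be
```

Note that $P_{t|t-1}$ stays symmetric automatically, since $TP_{t-1}T'$ and $Q$ both are.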

Now, suppose we observe $y_t$. After observing the new data, we should update our estimations and predictions for $\alpha$.

Before observing $y_t$, our prediction for $y_t$ was

$$\hat{y}_{t|t-1} = Z\hat{\alpha}_{t|t-1}$$

Now that we observe $y_t$, we can say that our prediction error for $y_t$, or the innovation, is

\begin{align*} v_t &= y_t - \hat{y}_{t|t-1} \\ & = Z\alpha_t + \epsilon_t - Z\hat{\alpha}_{t|t-1} \\ & = Z(\alpha_t - \hat{\alpha}_{t|t-1}) + \epsilon_t \end{align*}

The variance of innovation is:

\begin{align*} F_t &= E[(Z(\alpha_t - \hat{\alpha}_{t|t-1}) + \epsilon_t )(Z(\alpha_t - \hat{\alpha}_{t|t-1}) + \epsilon_t )'] \\ &= Z E[(\alpha_{t}-\hat{\alpha}_{t|t-1})(\alpha_{t}-\hat{\alpha}_{t|t-1})']Z'+ZE[(\alpha_{t}-\hat{\alpha}_{t|t-1})\epsilon_t'] + E[\epsilon_t(\alpha_{t}-\hat{\alpha}_{t|t-1})']Z' + E[\epsilon_t\epsilon_t']\\ &= Z E[(\alpha_{t}-\hat{\alpha}_{t|t-1})(\alpha_{t}-\hat{\alpha}_{t|t-1})']Z'+ E[\epsilon_t\epsilon_t']\\ & = Z P_{t|t-1} Z' + H \end{align*}

The covariance of the innovation with the prediction error $(\alpha_t - \hat{\alpha}_{t|t-1})$ is

\begin{align*} G_t &= E[(y_t-\hat{y}_{t|t-1})(\alpha_t-\hat{\alpha}_{t|t-1})']\\ &= E[(Z(\alpha_t - \hat{\alpha}_{t|t-1}) + \epsilon_t )(\alpha_t-\hat{\alpha}_{t|t-1})'] \\ &= Z E[(\alpha_t - \hat{\alpha}_{t|t-1})(\alpha_t-\hat{\alpha}_{t|t-1})'] + E[\epsilon_t(\alpha_t-\hat{\alpha}_{t|t-1})'] \\ & = Z E[(\alpha_t - \hat{\alpha}_{t|t-1})(\alpha_t-\hat{\alpha}_{t|t-1})']\\ & = Z P_{t|t-1} \end{align*}

Now, let's take one step back. Assume $A$ and $B$ are two jointly normal random vectors with means $\bar{A}$ and $\bar{B}$ respectively, and with variance-covariance matrix

$$\Sigma = \left[\begin{array}{cc}\Sigma_{AA}& \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB}\end{array}\right]$$

Then,

\begin{align*} E[A|B] &= \bar{A} + \Sigma_{AB}\Sigma_{BB}^{-1}(B-\bar{B}) \\ Var[A|B] &= \Sigma_{AA} - \Sigma_{AB}\Sigma_{BB}^{-1}\Sigma_{BA} \end{align*}
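This conditional-normal formula can be verified by simulation. The means and covariance below are arbitrary illustrative values; we condition on draws where $B$ falls in a narrow window around a point $b_0$ and compare the sample moments of $A$ with the closed-form expressions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint normal (A, B), both scalars here for simplicity (illustrative values)
mean = np.array([1.0, 2.0])           # (A_bar, B_bar)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])        # [[S_AA, S_AB], [S_BA, S_BB]]

draws = rng.multivariate_normal(mean, Sigma, size=500_000)
A, B = draws[:, 0], draws[:, 1]

# Condition on B close to b0 and take sample moments of A
b0 = 3.0
mask = np.abs(B - b0) < 0.05
cond_mean_mc = A[mask].mean()
cond_var_mc = A[mask].var()

# Closed-form conditional moments
cond_mean = mean[0] + Sigma[0, 1] / Sigma[1, 1] * (b0 - mean[1])   # = 1.3
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]            # = 0.82

print(cond_mean, cond_mean_mc)   # close up to Monte Carlo error
print(cond_var, cond_var_mc)     # close up to Monte Carlo error
```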

Now, let's think of $A$ as our $\alpha$, and $B$ as our $y$. Then,

\begin{align*} \hat{\alpha}_t &= \hat{\alpha}_{t|t-1} + (ZP_{t|t-1})'F_t^{-1}(y_t - Z\hat{\alpha}_{t|t-1}) \\ & = \hat{\alpha}_{t|t-1} + P_{t|t-1}Z'F_t^{-1}v_t \\ P_t &= P_{t|t-1} - (ZP_{t|t-1})'F_t^{-1}ZP_{t|t-1}\\ &= P_{t|t-1} - P_{t|t-1}Z'F_t^{-1}ZP_{t|t-1}\\ \end{align*}

So, after observing $y_t$, we updated our prediction about $\alpha$ from $\hat{\alpha}_{t|t-1}$ to $\hat{\alpha}_{t}$.

If we multiply both sides of the equation for $\hat{\alpha}_t$ by $T$, we get \begin{align*} T\hat{\alpha}_t & = T\hat{\alpha}_{t|t-1} + TP_{t|t-1}Z'F_t^{-1}v_t \\ \Rightarrow \hat{\alpha}_{t+1|t} &= T\hat{\alpha}_{t|t-1} + TP_{t|t-1}Z'F_t^{-1}v_t \\ &= T\hat{\alpha}_{t|t-1} + K_tv_t \end{align*}

where $K_t \equiv TP_{t|t-1}Z'F_t^{-1}$ is called the Kalman gain.
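Collecting the recursions above, one full step of the filter can be sketched as a small function. This is an illustrative sketch in the prediction form derived above (the helper name `kalman_step` is ours, not from any library):

```python
import numpy as np

def kalman_step(alpha_pred, P_pred, y_t, T, Z, Q, H):
    """One recursion of the filter in prediction form: takes
    (alpha_{t|t-1}, P_{t|t-1}) and y_t, returns (alpha_{t+1|t}, P_{t+1|t})
    together with the innovation v_t and its variance F_t."""
    v = y_t - Z @ alpha_pred                      # innovation v_t
    F = Z @ P_pred @ Z.T + H                      # F_t = Z P_{t|t-1} Z' + H
    K = T @ P_pred @ Z.T @ np.linalg.inv(F)       # Kalman gain K_t
    alpha_next = T @ alpha_pred + K @ v           # alpha_{t+1|t}
    # P_t = P_{t|t-1} - P_{t|t-1} Z' F^{-1} Z P_{t|t-1}, then propagate
    # through the state equation: P_{t+1|t} = T P_t T' + Q
    P_t = P_pred - P_pred @ Z.T @ np.linalg.inv(F) @ Z @ P_pred
    P_next = T @ P_t @ T.T + Q
    return alpha_next, P_next, v, F

# Scalar AR(1)-style example (illustrative values)
T = np.array([[0.6]]); Z = np.array([[1.0]])
Q = np.array([[0.04]]); H = np.array([[0.0]])
a, P, v, F = kalman_step(np.zeros((1, 1)), np.array([[0.04]]), 1.0, T, Z, Q, H)
print(a, P)   # with H = 0 the state is observed exactly, so a = 0.6*y_t, P = Q
```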

## AR(1) Process

Assume the process for $x_t$ is governed by

\begin{equation} x_t = \rho x_{t-1} + \epsilon_t \end{equation}

Then, we can write this process as follows:

\begin{align*} \alpha_t &= \rho \alpha_{t-1} + \epsilon_{t} \\ x_t &= \alpha_t \end{align*}

So, in this model, the matrices we defined above are: $$T = [\rho] \\ Z = [1] \\ Q = [\sigma^2] \\ H = [0]$$

Suppose we set $\rho = 0.6$ and $\sigma = 0.2$, generate a sample of $x_t$ with these values, run the Kalman filter on it, and estimate the parameters by maximum likelihood. Here is how we do it.

```python
import numpy as np
from scipy import optimize

# AR(p) process, p = 1
numT = 1000

# True parameters
rho = 0.6
sigma = 0.2

# Generate the observed values
y = np.zeros(numT)
y[0] = np.random.randn()
for i in range(numT - 1):
    y[i + 1] = rho * y[i] + sigma * np.random.randn()

Z = np.array([[1.0]])
H = np.zeros((1, 1))

# Negative log-likelihood of the data under params = (rho, sigma)
def loglik(params):
    T = np.array([[params[0]]])
    Q = np.array([[params[1] ** 2]])

    alpha_old = np.zeros((1, 1))
    P_old = np.zeros((1, 1))
    obj = 0.0
    for t in range(len(y)):
        # Prediction step
        alpha_int = T @ alpha_old
        P_int = T @ P_old @ T.T + Q
        # Innovation and its variance
        v = y[t] - Z @ alpha_int
        F = Z @ P_int @ Z.T + H
        # Update step
        alpha_old = alpha_int + P_int @ Z.T @ np.linalg.inv(F) @ v
        P_old = P_int - P_int @ Z.T @ np.linalg.inv(F) @ Z @ P_int
        # Accumulate minus the Gaussian log-likelihood of the innovation
        obj -= (-(1 / 2) * np.log(2 * np.pi) - (1 / 2) * np.log(np.linalg.det(F))
                - (1 / 2) * (v.T @ np.linalg.inv(F) @ v)).item()
    return obj

params_init = np.array([0.1, 0.1])
bds = ((None, None), (0, None))
u = optimize.minimize(loglik, params_init, method="TNC", bounds=bds, tol=1e-8)
print(u.x)
```

```
[0.60955509 0.19680586]
```


## AR(2) Process

Now, let's do the same exercise, but this time assuming that the process for $x_t$ is governed by

\begin{equation} x_t = \rho_1 x_{t-1} + \rho_2 x_{t-2} + \epsilon_t \end{equation} Then, we can write this process as follows:

$$\left[\begin{array}{c}x_{t} \\ x_{t-1}\end{array}\right]= \left[\begin{array}{cc}\rho_1 & \rho_2 \\ 1 & 0\end{array}\right] \left[\begin{array}{c}x_{t-1}\\ x_{t-2}\end{array}\right] + \left[\begin{array}{c}\epsilon_{t}\\ 0\end{array}\right] \\ x_t = \left[\begin{array}{cc}1 & 0 \end{array}\right]\left[\begin{array}{c}x_{t} \\ x_{t-1}\end{array}\right]$$

So, in this model, the matrices we defined above are: $$T = \left[\begin{array}{cc}\rho_1& \rho_2 \\ 1& 0 \end{array} \right] \\ Z = \left[\begin{array}{cc}1 & 0\end{array}\right] \\ Q = \left[\begin{array}{cc}\sigma^2& 0 \\ 0 & 0 \end{array}\right] \\ H = [0]$$
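As a quick, purely illustrative sanity check, we can iterate the companion-form state equation alongside the scalar AR(2) recursion and confirm they produce the same $x_t$ (parameter values are the ones used below):

```python
import numpy as np

rho_1, rho_2, sigma = 0.6, -0.2, 0.2
T = np.array([[rho_1, rho_2],
              [1.0, 0.0]])
rng = np.random.default_rng(1)

# Iterate the state equation and the scalar recursion side by side,
# feeding both the same shocks
alpha = np.zeros((2, 1))
x_prev2, x_prev1 = 0.0, 0.0
for _ in range(50):
    e = sigma * rng.standard_normal()
    alpha = T @ alpha + np.array([[e], [0.0]])
    x_t = rho_1 * x_prev1 + rho_2 * x_prev2 + e
    x_prev2, x_prev1 = x_prev1, x_t

print(np.isclose(alpha[0, 0], x_prev1))  # True: both give the same x_t
```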

Suppose we set $\rho_1 = 0.6$, $\rho_2 = -0.2$, and $\sigma = 0.2$, generate a sample of $x_t$ with these values, run the Kalman filter on it, and estimate the parameters by maximum likelihood. Here is how we do it.

```python
import numpy as np
from scipy import optimize

# AR(2) process
numT = 1000

# True parameters
rho_1 = 0.6
rho_2 = -0.2
sigma = 0.2

# Generate the observed values
y = np.zeros(numT)
y[0] = np.random.randn()
y[1] = np.random.randn()
for i in range(numT - 2):
    y[i + 2] = rho_1 * y[i + 1] + rho_2 * y[i] + sigma * np.random.randn()

Z = np.array([[1.0, 0.0]])
H = np.zeros((1, 1))

# Negative log-likelihood of the data under params = (rho_1, rho_2, sigma)
def loglik(params):
    T = np.zeros((2, 2))
    T[0, :] = params[0:2]
    T[1, 0] = 1.0
    Q = np.zeros((2, 2))
    Q[0, 0] = params[2] ** 2

    alpha_old = np.zeros((2, 1))
    P_old = np.zeros((2, 2))
    obj = 0.0
    for t in range(len(y)):
        # Prediction step
        alpha_int = T @ alpha_old
        P_int = T @ P_old @ T.T + Q
        # Innovation and its variance
        v = y[t] - Z @ alpha_int
        F = Z @ P_int @ Z.T + H
        # Update step
        alpha_old = alpha_int + P_int @ Z.T @ np.linalg.inv(F) @ v
        P_old = P_int - P_int @ Z.T @ np.linalg.inv(F) @ Z @ P_int
        # Accumulate minus the Gaussian log-likelihood of the innovation
        obj -= (-(1 / 2) * np.log(2 * np.pi) - (1 / 2) * np.log(np.linalg.det(F))
                - (1 / 2) * (v.T @ np.linalg.inv(F) @ v)).item()
    return obj

params_init = np.array([0.1, 0.1, 0.1])
bds = ((None, None), (None, None), (0, None))
u = optimize.minimize(loglik, params_init, method="TNC", bounds=bds, tol=1e-8)
print(u.x)
```

```
[ 0.6311042  -0.25174543  0.21006471]
```


## MA(1) Process

Assume the process for $x_t$ is governed by

\begin{equation} x_t = \epsilon_{t} + \theta \epsilon_{t-1} \end{equation}

We can write the state equation as

$$\left[\begin{array}{c}\epsilon_{t} \\ \epsilon_{t-1} \end{array}\right] = \left[\begin{array}{cc}0& 0 \\ 1& 0 \end{array} \right]\left[\begin{array}{c}\epsilon_{t-1} \\ \epsilon_{t-2} \end{array}\right] + \left[\begin{array}{c}\epsilon_{t} \\ 0 \end{array}\right]$$ and the observation equation as

$$x_t = \left[\begin{array}{cc}1 & \theta \end{array}\right] \left[\begin{array}{c}\epsilon_{t} \\ \epsilon_{t-1} \end{array}\right]$$

So, in this model, the matrices we defined above are: $$T = \left[\begin{array}{cc}0& 0 \\ 1& 0 \end{array} \right] \\ Z = \left[\begin{array}{cc}1 & \theta \end{array}\right] \\ Q = \left[\begin{array}{cc}\sigma^2& 0 \\ 0 & 0 \end{array}\right] \\ H = [0]$$
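Before estimating, we can sanity-check (again purely illustratively) that this state-space form reproduces the MA(1) definition $x_t = \epsilon_t + \theta\epsilon_{t-1}$, by driving the state equation and the direct recursion with the same shocks:

```python
import numpy as np

theta, sigma = -0.6, 0.2
T = np.array([[0.0, 0.0],
              [1.0, 0.0]])
Z = np.array([[1.0, theta]])

rng = np.random.default_rng(2)
alpha = np.zeros((2, 1))
eps_prev = 0.0
for _ in range(20):
    e = sigma * rng.standard_normal()
    alpha = T @ alpha + np.array([[e], [0.0]])   # state: (eps_t, eps_{t-1})
    x_state = (Z @ alpha)[0, 0]                  # observation equation
    x_direct = e + theta * eps_prev              # MA(1) definition
    assert np.isclose(x_state, x_direct)
    eps_prev = e
print("state-space form matches the MA(1) recursion")
```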

Suppose we set $\theta = -0.6$ and $\sigma = 0.2$, generate a sample of $x_t$ with these values, run the Kalman filter on it, and estimate the parameters by maximum likelihood. Here is how we do it.

```python
import numpy as np
from scipy import optimize

# MA(q) process, q = 1
q = 1
r = q + 1          # dimension of the state vector
numT = 1000

# True parameters
sigma = 0.2
theta = -0.6

eps = sigma * np.random.randn(numT)

# Generate the observed values
y = np.zeros(numT)
y[0] = eps[0]
for i in range(numT - 1):
    y[i + 1] = eps[i + 1] + theta * eps[i]

T = np.zeros((r, r))
T[1, 0] = 1.0
H = np.zeros((1, 1))

# Negative log-likelihood of the data under params = (theta, sigma)
def loglik(params):
    theta = params[0]
    sigma = params[1]
    Z = np.array([[1.0, theta]])
    Q = np.zeros((r, r))
    Q[0, 0] = sigma ** 2

    alpha_old = np.zeros((r, 1))
    P_old = np.zeros((r, r))
    obj = 0.0
    for t in range(len(y)):
        # Prediction step
        alpha_int = T @ alpha_old
        P_int = T @ P_old @ T.T + Q
        # Innovation and its variance
        v = y[t] - Z @ alpha_int
        F = Z @ P_int @ Z.T + H
        # Update step
        alpha_old = alpha_int + P_int @ Z.T @ np.linalg.inv(F) @ v
        P_old = P_int - P_int @ Z.T @ np.linalg.inv(F) @ Z @ P_int
        # Accumulate minus the Gaussian log-likelihood of the innovation
        obj -= (-(1 / 2) * np.log(2 * np.pi) - (1 / 2) * np.log(np.linalg.det(F))
                - (1 / 2) * (v.T @ np.linalg.inv(F) @ v)).item()
    return obj

params_init = np.array([0.3, 0.1])
bds = ((None, None), (0.00001, None))
u = optimize.minimize(loglik, params_init, method="TNC", bounds=bds, tol=1e-8)
print(u.x)
```

```
[-0.61455049  0.19894218]
```


## Random walk

Finally, assume the process for $x_t$ is governed by $$x_t = \mu_t + \epsilon_t, \qquad \mu_t = \mu_{t-1} + \eta_t$$ In what follows we set the measurement noise to zero ($H = [0]$), so $x_t$ itself follows a pure random walk. We can write this system as \begin{align*} \alpha_t &= \alpha_{t-1} + \eta_t \\ x_t &= \alpha_t \end{align*} So, in this model, the matrices we defined above are: $$T = [1] \\ Z = [1] \\ Q = [\sigma^2] \\ H = [0]$$

Suppose we set $\sigma = 0.2$, generate a sample of $x_t$ with this value, run the Kalman filter on it, and estimate the parameter by maximum likelihood. Here is how we do it.

```python
import numpy as np
from scipy import optimize

# Random walk
numT = 1000

# True parameter
sigma = 0.2
eps = sigma * np.random.randn(numT)

# Generate the observed values
y = np.zeros(numT)
y[0] = np.random.randn()
for i in range(numT - 1):
    y[i + 1] = y[i] + eps[i]

Z = np.array([[1.0]])
T = np.array([[1.0]])
H = np.zeros((1, 1))

# Negative log-likelihood of the data under params = (sigma,)
def loglik(params):
    Q = np.array([[params[0] ** 2]])

    alpha_old = np.zeros((1, 1))
    P_old = np.zeros((1, 1))
    obj = 0.0
    for t in range(len(y)):
        # Prediction step
        alpha_int = T @ alpha_old
        P_int = T @ P_old @ T.T + Q
        # Innovation and its variance
        v = y[t] - Z @ alpha_int
        F = Z @ P_int @ Z.T + H
        # Update step
        alpha_old = alpha_int + P_int @ Z.T @ np.linalg.inv(F) @ v
        P_old = P_int - P_int @ Z.T @ np.linalg.inv(F) @ Z @ P_int
        # Accumulate minus the Gaussian log-likelihood of the innovation
        obj -= (-(1 / 2) * np.log(2 * np.pi) - (1 / 2) * np.log(np.linalg.det(F))
                - (1 / 2) * (v.T @ np.linalg.inv(F) @ v)).item()
    return obj

params_init = np.array([0.3])
u = optimize.minimize(loglik, params_init, method="TNC", tol=1e-8)
print(u.x)
```

```
[0.19600354]
```