Kalman Filter

Suppose we have a system as follows: $$ \begin{align} \alpha_t &= T \alpha_{t-1} + \eta_t \\ y_t &= Z \alpha_{t} + \epsilon_t \\ \end{align} $$

where

  • $\alpha: m\times 1$
  • $\eta : m\times 1$
  • $T: m\times m$
  • $y: n\times 1$
  • $\epsilon: n\times 1$
  • $Z: n\times m$

and

  • $E[\eta_t] = 0$
  • $E[\eta_t\eta_t'] = Q_{m\times m}$
  • $E[\epsilon_t] = 0$
  • $E[\epsilon_t\epsilon'_t] = H_{n\times n}$
  • $E[\epsilon_t\eta_s'] = 0$ for all $t$ and $s$.

$\alpha_t$ is unobserved, $y_t$ is observed.

Our goal is to write the system above as: $$ \begin{align*} \hat{\alpha}_{t+1|t} & = T\hat{\alpha}_{t|t-1} + K_tv_t\\ y_t &= Z\hat{\alpha}_{t|t-1} + v_t \end{align*} $$

Let's get started.

Suppose we are given the observations up to time $t-1$, i.e. we know $y_0,\ldots,y_{t-1}$.

First, define the time $t-1$ estimate for the state vector $\alpha$:

$$ \hat{\alpha}_{t-1} = E[\alpha_{t-1}|y_0,\ldots,y_{t-1}] $$

The variance of the estimation error is:

$$ P_{t-1} = E[(\alpha_{t-1}-\hat{\alpha}_{t-1})(\alpha_{t-1}-\hat{\alpha}_{t-1})'] $$

Then, we can predict the state in period $t$ using $\hat{\alpha}_{t-1}$:

$$ \hat{\alpha}_{t|t-1} = T \hat{\alpha}_{t-1} $$

The variance of the prediction error is:

$$ \begin{align*} P_{t|t-1} & = E[(\alpha_{t}-\hat{\alpha}_{t|t-1})(\alpha_{t}-\hat{\alpha}_{t|t-1})'] \\ & = E[(T\alpha_{t-1}+\eta_t - T\hat{\alpha}_{t-1}) (T\alpha_{t-1}+\eta_t - T\hat{\alpha}_{t-1})'] \\ & = E[(T(\alpha_{t-1}-\hat{\alpha}_{t-1}) + \eta_t)(T(\alpha_{t-1}-\hat{\alpha}_{t-1}) + \eta_t)'] \\ & = T E[(\alpha_{t-1}-\hat{\alpha}_{t-1})(\alpha_{t-1}-\hat{\alpha}_{t-1})']T'+TE[(\alpha_{t-1}-\hat{\alpha}_{t-1})\eta_t'] + E[\eta_t(\alpha_{t-1}-\hat{\alpha}_{t-1})']T' + E[\eta_t\eta_t']\\ & = T E[(\alpha_{t-1}-\hat{\alpha}_{t-1})(\alpha_{t-1}-\hat{\alpha}_{t-1})']T' + E[\eta_t\eta_t']\\ & = T P_{t-1} T' + Q \end{align*} $$
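(The cross terms vanish because $\eta_t$ is uncorrelated with the time-$(t-1)$ estimation error.) In code, the prediction step is just these two matrix products. A minimal NumPy sketch, assuming `T`, `Q`, `alpha_hat` and `P` are arrays with the dimensions given above; the function name is only illustrative:

import numpy as np

def predict(alpha_hat, P, T, Q):
    # alpha_{t|t-1} = T * alpha_{t-1}
    alpha_pred = T @ alpha_hat
    # P_{t|t-1} = T P_{t-1} T' + Q
    P_pred = T @ P @ T.T + Q
    return alpha_pred, P_pred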

Now, suppose we observe $y_t$. After observing the new data point, we should update our estimate of $\alpha$.

Before observing $y_t$, our prediction for $y_t$ was

$$ \hat{y}_{t|t-1} = Z\hat{\alpha}_{t|t-1} $$

Now that we have observed $y_t$, the prediction error for $y_t$, also called the innovation, is

$$ \begin{align*} v_t &= y_t - \hat{y}_{t|t-1} \\ & = Z\alpha_t + \epsilon_t - Z\hat{\alpha}_{t|t-1} \\ & = Z(\alpha_t - \hat{\alpha}_{t|t-1}) + \epsilon_t \end{align*} $$

The variance of the innovation is:

$$ \begin{align*} F_t &= E[(Z(\alpha_t - \hat{\alpha}_{t|t-1}) + \epsilon_t )(Z(\alpha_t - \hat{\alpha}_{t|t-1}) + \epsilon_t )'] \\ &= Z E[(\alpha_{t}-\hat{\alpha}_{t|t-1})(\alpha_{t}-\hat{\alpha}_{t|t-1})']Z'+ZE[(\alpha_{t}-\hat{\alpha}_{t|t-1})\epsilon_t'] + E[\epsilon_t(\alpha_{t}-\hat{\alpha}_{t|t-1})']Z' + E[\epsilon_t\epsilon_t']\\ &= Z E[(\alpha_{t}-\hat{\alpha}_{t|t-1})(\alpha_{t}-\hat{\alpha}_{t|t-1})']Z'+ E[\epsilon_t\epsilon_t']\\ & = Z P_{t|t-1} Z' + H \end{align*} $$

The covariance of the innovation with the prediction error $(\alpha_t - \hat{\alpha}_{t|t-1})$ is

$$ \begin{align*} G_t &= E[(y_t-\hat{y}_{t|t-1})(\alpha_t-\hat{\alpha}_{t|t-1})']\\ &= E[(Z(\alpha_t - \hat{\alpha}_{t|t-1}) + \epsilon_t )(\alpha_t-\hat{\alpha}_{t|t-1})'] \\ &= Z E[(\alpha_t - \hat{\alpha}_{t|t-1})(\alpha_t-\hat{\alpha}_{t|t-1})'] + E[\epsilon_t(\alpha_t-\hat{\alpha}_{t|t-1})'] \\ & = Z E[(\alpha_t - \hat{\alpha}_{t|t-1})(\alpha_t-\hat{\alpha}_{t|t-1})']\\ & = Z P_{t|t-1} \end{align*} $$

Now, let's take one step back. Assume $A$ and $B$ are two jointly normal random vectors with means $\bar{A}$ and $\bar{B}$ respectively, and with variance-covariance matrix

$$ \Sigma = \left[\begin{array}{cc}\Sigma_{AA}& \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB}\end{array}\right] $$

Then,

$$ \begin{align*} E[A|B] &= \bar{A} + \Sigma_{AB}\Sigma_{BB}^{-1}(B-\bar{B}) \\ Var[A|B] &= \Sigma_{AA} - \Sigma_{AB}\Sigma_{BB}^{-1}\Sigma_{BA} \end{align*} $$
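This is the standard conditional-distribution result for jointly normal variables. As a quick numerical sanity check of the formula (not part of the derivation), the sketch below simulates a scalar pair $(A, B)$ and compares the least-squares regression of $A$ on $B$ with $\Sigma_{AB}\Sigma_{BB}^{-1}$, and the residual variance with $\Sigma_{AA} - \Sigma_{AB}\Sigma_{BB}^{-1}\Sigma_{BA}$; the means and covariance used here are arbitrary illustrative values:

import numpy as np

rng = np.random.default_rng(0)
mean = np.array([1.0, -0.5])                  # (A_bar, B_bar)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])                # [[S_AA, S_AB], [S_BA, S_BB]]

A, B = rng.multivariate_normal(mean, Sigma, size=200_000).T

# E[A|B] is linear in B with slope Sigma_AB / Sigma_BB
slope, intercept = np.polyfit(B, A, 1)
print(slope, Sigma[0, 1] / Sigma[1, 1])       # both approximately 0.8

# Var[A|B] = Sigma_AA - Sigma_AB^2 / Sigma_BB
resid_var = np.var(A - (intercept + slope * B))
print(resid_var, Sigma[0, 0] - Sigma[0, 1]**2 / Sigma[1, 1])   # both approximately 1.36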

Now, let's think of $A$ as our $\alpha_t$ and $B$ as our $y_t$, conditioning on $y_0,\ldots,y_{t-1}$ throughout. Then $\bar{A} = \hat{\alpha}_{t|t-1}$, $\bar{B} = \hat{y}_{t|t-1} = Z\hat{\alpha}_{t|t-1}$, $\Sigma_{AA} = P_{t|t-1}$, $\Sigma_{AB} = (ZP_{t|t-1})' = G_t'$, and $\Sigma_{BB} = F_t$, so

$$ \begin{align*} \hat{\alpha}_t &= \hat{\alpha}_{t|t-1} + (ZP_{t|t-1})'F_t^{-1}(y_t - Z\hat{\alpha}_{t|t-1}) \\ & = \hat{\alpha}_{t|t-1} + P_{t|t-1}Z'F_t^{-1}v_t \\ P_t &= P_{t|t-1} - (ZP_{t|t-1})'F_t^{-1}ZP_{t|t-1}\\ &= P_{t|t-1} - P_{t|t-1}Z'F_t^{-1}ZP_{t|t-1}\\ \end{align*} $$

So, after observing $y_t$, we have updated our estimate of $\alpha_t$ from $\hat{\alpha}_{t|t-1}$ to $\hat{\alpha}_{t}$.

If we multiply both sides of the equation for $\hat{\alpha}_t$ by $T$ and note that $T\hat{\alpha}_t = \hat{\alpha}_{t+1|t}$ (the one-step-ahead prediction made from the updated estimate), we get $$ \begin{align*} T\hat{\alpha}_t & = T\hat{\alpha}_{t|t-1} + TP_{t|t-1}Z'F_t^{-1}v_t \\ \Rightarrow \hat{\alpha}_{t+1|t} &= T\hat{\alpha}_{t|t-1} + TP_{t|t-1}Z'F_t^{-1}v_t \\ &= T\hat{\alpha}_{t|t-1} + K_tv_t \end{align*} $$

where $K_t \equiv TP_{t|t-1}Z'F_t^{-1}$ is called the Kalman gain.
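Putting the pieces together, one pass of the filter maps $(\hat{\alpha}_{t|t-1}, P_{t|t-1})$ and $y_t$ into $(\hat{\alpha}_{t+1|t}, P_{t+1|t})$. Below is a minimal sketch of that pass, written directly from the formulas above; the function and variable names are illustrative, and `T`, `Z`, `Q`, `H` are assumed to be NumPy arrays of the sizes defined earlier:

import numpy as np

def kalman_step(alpha_pred, P_pred, y_t, T, Z, Q, H):
    v = y_t - Z @ alpha_pred                              # innovation v_t
    F = Z @ P_pred @ Z.T + H                              # innovation variance F_t
    Finv = np.linalg.inv(F)
    K = T @ P_pred @ Z.T @ Finv                           # Kalman gain K_t
    alpha_next = T @ alpha_pred + K @ v                   # alpha_{t+1|t}
    P_filt = P_pred - P_pred @ Z.T @ Finv @ Z @ P_pred    # P_t
    P_next = T @ P_filt @ T.T + Q                         # P_{t+1|t}
    return alpha_next, P_next, v, F

This is exactly the recursion iterated by the log-likelihood functions below, which accumulate the Gaussian log density of each innovation $v_t$ with variance $F_t$.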

AR(1) Process

Assume the process for $x_t$ is governed by

\begin{equation} x_t = \rho x_{t-1} + \epsilon_t \end{equation}

Then, we can write this process as follows:

\begin{equation} \alpha_t = \rho \alpha_{t-1} + \epsilon_{t} \\ x_t = \alpha_t \end{equation}

So, in this model, the AR innovation $\epsilon_t$ plays the role of the state noise $\eta_t$, there is no observation noise, and the matrices we defined above are: $$ T = [\rho] \\ Z = [1] \\ Q = [\sigma^2] \\ H = [0] $$

Suppose we set $\rho = 0.6$ and $\sigma = 0.2$, generate a sample of $x_t$ with these values, and then estimate the parameters by maximum likelihood, using the Kalman filter to evaluate the likelihood of the simulated series. Here is how we do it.

In [1]:
import numpy as np
from scipy import optimize

# AR(p) process, p = 1

numT = 1000

# Initialization
y = np.zeros((numT,))
y[0] = np.random.randn()

## True parameters:
rho = 0.6
sigma = 0.2

# We generate the observed values here.
for i in range(numT-1):
    y[i+1] = rho*y[i] + sigma*np.random.randn()

# State-space matrices that do not depend on the parameters
Z = np.zeros((1, 1))
Z[0, 0] = 1
H = np.zeros((1, 1))

# Negative log-likelihood function (to be minimized)
def loglik(params):
    T = np.zeros((1, 1))
    T[0, 0] = params[0]
    Q = np.zeros((1, 1))
    Q[0, 0] = params[1]**2

    alpha_old = np.zeros((1, 1))
    P_old = np.zeros((1, 1))
    obj = 0
    t = 0

    while t < len(y):
        # Prediction step
        alpha_int = T @ alpha_old
        P_int = T @ P_old @ T.T + Q
        # Innovation and its variance
        v = y[t] - Z @ alpha_int
        F = Z @ P_int @ Z.T + H
        # Update step
        alpha_new = alpha_int + P_int @ Z.T @ np.linalg.inv(F) @ v
        P_new = P_int - P_int @ Z.T @ np.linalg.inv(F) @ Z @ P_int

        P_old = P_new
        alpha_old = alpha_new
        # Accumulate the negative Gaussian log density of the innovation
        obj -= (-(1/2)*np.log(2*np.pi) - (1/2)*np.log(np.linalg.det(F))
                - (1/2)*(v.T @ np.linalg.inv(F) @ v))
        t = t + 1
    return obj.item()

params_init = np.array([0.1, 0.1])
bds = ((None, None), (0, None))
u = optimize.minimize(loglik, params_init, method="TNC", bounds=bds, tol=1e-8)
print(u.x)
[0.60955509 0.19680586]
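The point estimates are close to the true values $\rho = 0.6$ and $\sigma = 0.2$. As an optional cross-check (this assumes the statsmodels package is installed and is not part of the exercise above), SARIMAX fits the same AR(1) model by maximum likelihood with its own Kalman filter:

from statsmodels.tsa.statespace.sarimax import SARIMAX

res = SARIMAX(y, order=(1, 0, 0)).fit(disp=False)
print(res.params)   # [ar.L1, sigma2]; the square root of sigma2 should be close to 0.2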

AR(2) Process

Now, let's do the same exercise, but this time assuming that the process for $x_t$ is governed by

\begin{equation} x_t = \rho_1 x_{t-1} + \rho_2 x_{t-2} + \epsilon_t \end{equation} Then, we can write this process as follows:

$$ \left[\begin{array}{c}x_{t} \\ x_{t-1}\end{array}\right]= \left[\begin{array}{cc}\rho_1 & \rho_2 \\ 1 & 0\end{array}\right] \left[\begin{array}{c}x_{t-1}\\ x_{t-2}\end{array}\right] + \left[\begin{array}{c}\epsilon_{t}\\ 0\end{array}\right] \\ x_t = \left[\begin{array}{cc}1 & 0 \end{array}\right]\left[\begin{array}{c}x_{t} \\ x_{t-1}\end{array}\right] $$

So, in this model, the matrices we defined above are: $$ T = \left[\begin{array}{cc}\rho_1& \rho_2 \\ 1& 0 \end{array} \right] \\ Z = \left[\begin{array}{cc}1 & 0\end{array}\right] \\ Q = \left[\begin{array}{cc}\sigma^2& 0 \\ 0 & 0 \end{array}\right] \\ H = 0\\ $$

Suppose we set $\rho_1 = 0.6$, $\rho_2 = -0.2$, and $\sigma = 0.2$, generate a sample of $x_t$ with these values, and then estimate the parameters by maximum likelihood, using the Kalman filter to evaluate the likelihood of the simulated series. Here is how we do it.

In [2]:
import numpy as np
from scipy import optimize

# AR(2) process

numT = 1000

# Initialization
y = np.zeros((numT,))
y[0] = np.random.randn()
y[1] = np.random.randn()

## True parameters:
rho_1 = 0.6
rho_2 = -0.2
sigma = 0.2

# We generate the observed values here.
for i in range(numT-2):
    y[i+2] = rho_1*y[i+1] + rho_2*y[i] + sigma*np.random.randn()

# State-space matrices; the first row of T and the (0,0) entry of Q
# are filled in with the candidate parameters inside loglik.
Q = np.zeros((2, 2))
Z = np.zeros((1, 2))
Z[0, 0] = 1
T = np.zeros((2, 2))
T[1, 0] = 1
H = np.zeros((1, 1))

# Negative log-likelihood function (to be minimized)
def loglik(params):
    T[0, :] = params[0:2]
    Q[0, 0] = params[2]**2
    alpha_old = np.zeros((2, 1))
    P_old = np.zeros((2, 2))
    obj = 0
    t = 0
    while t < len(y):
        # Prediction step
        alpha_int = T @ alpha_old
        P_int = T @ P_old @ T.T + Q
        # Innovation and its variance
        v = y[t] - Z @ alpha_int
        F = Z @ P_int @ Z.T + H
        # Update step
        alpha_new = alpha_int + P_int @ Z.T @ np.linalg.inv(F) @ v
        P_new = P_int - P_int @ Z.T @ np.linalg.inv(F) @ Z @ P_int

        P_old = P_new
        alpha_old = alpha_new
        # Accumulate the negative Gaussian log density of the innovation
        obj -= (-(1/2)*np.log(2*np.pi) - (1/2)*np.log(np.linalg.det(F))
                - (1/2)*(v.T @ np.linalg.inv(F) @ v))
        t = t + 1
    return obj.item()

params_init = np.array([0.1, 0.1, 0.1])
bds = ((None, None), (None, None), (0, None))
u = optimize.minimize(loglik, params_init, method="TNC", bounds=bds, tol=1e-8)
print(u.x)
[ 0.6311042  -0.25174543  0.21006471]
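The estimates are again close to the true values. As a quick, purely illustrative sanity check on the estimated dynamics: the AR(2) is stationary when the eigenvalues of the transition matrix $T$ lie strictly inside the unit circle, which we can verify directly from the estimated coefficients:

rho1_hat, rho2_hat, sigma_hat = u.x
T_hat = np.array([[rho1_hat, rho2_hat],
                  [1.0,      0.0     ]])
print(np.abs(np.linalg.eigvals(T_hat)))   # both moduli should be below 1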

MA(1) Process

Assume the process for $x_t$ is governed by

\begin{equation} x_t = \epsilon_{t} + \theta \epsilon_{t-1} \end{equation}

We can write the state equation as

$$ \left[\begin{array}{c}\epsilon_{t} \\ \epsilon_{t-1} \end{array}\right] = \left[\begin{array}{cc}0& 0 \\ 1& 0 \end{array} \right]\left[\begin{array}{c}\epsilon_{t-1} \\ \epsilon_{t-2} \end{array}\right] + \left[\begin{array}{c}\epsilon_{t} \\ 0 \end{array}\right] $$ and the observation equation as

$$ x_t = \left[\begin{array}{cc}1 & \theta \end{array}\right] \left[\begin{array}{c}\epsilon_{t} \\ \epsilon_{t-1} \end{array}\right] $$

So, in this model, the matrices we defined above are: $$ T = \left[\begin{array}{cc}0& 0 \\ 1& 0 \end{array} \right] \\ Z = \left[\begin{array}{cc}1 & \theta \end{array}\right] \\ Q = \left[\begin{array}{cc}\sigma^2& 0 \\ 0 & 0 \end{array}\right] \\ H = 0\\ $$

Suppose we set $\theta = -0.6$ and $\sigma = 0.2$, generate a sample of $x_t$ with these values, and then estimate the parameters by maximum likelihood, using the Kalman filter to evaluate the likelihood of the simulated series. Here is how we do it.

In [3]:
import numpy as np
from scipy import optimize

# MA(1) process

q = 1
r = q + 1          # dimension of the state vector
numT = 1000

## True parameters:
sigma = 0.2
theta = -0.6

eps = sigma*np.random.randn(numT)

# We generate the observed values here.
y = np.zeros((numT,))
y[0] = eps[0]      # ignore the theta*eps[-1] term for the first observation
for i in range(numT-1):
    y[i+1] = eps[i+1] + theta*eps[i]

# Transition matrix does not depend on the parameters
T = np.zeros((r, r))
T[1, 0] = 1

# Negative log-likelihood function (to be minimized)
def loglik(params):
    theta = params[0]
    sigma = params[1]
    Z = np.zeros((1, r))
    Z[0, 0] = 1
    Z[0, 1] = theta
    Q = np.zeros((r, r))
    Q[0, 0] = sigma**2
    H = np.zeros((1, 1))
    alpha_old = np.zeros((r, 1))
    P_old = np.zeros((r, r))
    obj = 0
    t = 0
    while t < len(y):
        # Prediction step
        alpha_int = T @ alpha_old
        P_int = T @ P_old @ T.T + Q
        # Innovation and its variance
        v = y[t] - Z @ alpha_int
        F = Z @ P_int @ Z.T + H
        # Update step
        alpha_new = alpha_int + P_int @ Z.T @ np.linalg.inv(F) @ v
        P_new = P_int - P_int @ Z.T @ np.linalg.inv(F) @ Z @ P_int

        P_old = P_new
        alpha_old = alpha_new
        # Accumulate the negative Gaussian log density of the innovation
        obj -= (-(1/2)*np.log(2*np.pi) - (1/2)*np.log(np.linalg.det(F))
                - (1/2)*(v.T @ np.linalg.inv(F) @ v))
        t = t + 1
    return obj.item()

params_init = np.array([0.3, 0.1])
bds = ((None, None), (0.00001, None))
u = optimize.minimize(loglik, params_init, method="TNC", bounds=bds, tol=1e-8)
print(u.x)
[-0.61455049  0.19894218]
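For an MA(1) process the lag-1 autocorrelation is $\theta/(1+\theta^2)$ and all higher-order autocorrelations are zero, which gives a quick, purely illustrative check of the estimate against the sample autocorrelation of the simulated series:

theta_hat, sigma_hat = u.x
acf1 = np.corrcoef(y[:-1], y[1:])[0, 1]          # sample lag-1 autocorrelation
print(acf1, theta_hat / (1 + theta_hat**2))      # both should be around -0.44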

Random walk

Finally, assume the process for $x_t$ is governed by the local level model

$$ x_t = \mu_t + \epsilon_t, \quad \mu_t = \mu_{t-1} + \eta_t $$

with the observation noise switched off, so that $x_t$ itself follows a random walk. We can write this system as

$$ \alpha_t = \alpha_{t-1} + \eta_{t} \\ x_t = \alpha_t $$

So, in this model, the matrices we defined above are: $$ T = [1] \\ Z = [1] \\ Q = [\sigma^2] \\ H = [0] $$

Suppose we set $\sigma = 0.2$, generate a sample of $x_t$ with this value, and then estimate $\sigma$ by maximum likelihood, using the Kalman filter to evaluate the likelihood of the simulated series. Here is how we do it.

In [4]:
import numpy as np
from scipy import optimize

# Random walk

numT = 1000

## True parameters:
sigma = 0.2
eps = sigma*np.random.randn(numT)

# We generate the observed values here.
y = np.zeros((numT,))
y[0] = np.random.randn()
for i in range(numT-1):
    y[i+1] = y[i] + eps[i]

# State-space matrices that do not depend on the parameter
Z = np.zeros((1, 1))
Z[0, 0] = 1
T = np.zeros((1, 1))
T[0, 0] = 1
H = np.zeros((1, 1))

# Negative log-likelihood function (to be minimized)
def loglik(params):
    sigma = params[0]
    Q = np.zeros((1, 1))
    Q[0, 0] = sigma**2
    alpha_old = np.zeros((1, 1))
    P_old = np.zeros((1, 1))
    obj = 0
    t = 0
    while t < len(y):
        # Prediction step
        alpha_int = T @ alpha_old
        P_int = T @ P_old @ T.T + Q
        # Innovation and its variance
        v = y[t] - Z @ alpha_int
        F = Z @ P_int @ Z.T + H
        # Update step
        alpha_new = alpha_int + P_int @ Z.T @ np.linalg.inv(F) @ v
        P_new = P_int - P_int @ Z.T @ np.linalg.inv(F) @ Z @ P_int

        P_old = P_new
        alpha_old = alpha_new
        # Accumulate the negative Gaussian log density of the innovation
        obj -= (-(1/2)*np.log(2*np.pi) - (1/2)*np.log(np.linalg.det(F))
                - (1/2)*(v.T @ np.linalg.inv(F) @ v))
        t = t + 1
    return obj.item()

params_init = np.array([0.3])
u = optimize.minimize(loglik, params_init, method="TNC", tol=1e-8)
print(u.x)
[0.19600354]
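Since there is no observation noise here ($H = 0$), the model implies $x_t - x_{t-1} = \eta_t$, so the maximum likelihood estimate of $\sigma$ can be cross-checked against the standard deviation of the first differences of the simulated series:

print(np.std(np.diff(y)))   # should be close to the estimate above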