Bayesian inference in dynamic models: an overview
The following algorithms all try to infer the hidden state of a dynamic
model from measurements. The input is a dynamic model and a measurement
sequence and the output is an approximate posterior distribution over the
hidden state at one or many times. Only discrete-time models are discussed
here.
Inferring only the most recent hidden state is known as filtering;
inferring past states is known as smoothing.
Most filtering methods are online, which means they
process each measurement exactly once, after which it can be discarded.
This allows them to operate with a fixed amount of memory.
The opposite of online is offline or batch.
There are standard ways to turn an online filtering algorithm into a
batch filtering or smoothing algorithm.
Therefore, the overview is divided into two parts: online filtering and
batch filtering/smoothing.
Some of these algorithms are general algorithms for approximate Bayesian
inference and others are specialized for dynamic models. With the
description of each algorithm is a partial list of references. I've
included more references for algorithms which are less
well-known.
Online filtering algorithms
The algorithms are grouped according to how they represent the posterior
distribution over the hidden state (their belief).
Gaussian belief
The following algorithms use a multivariate Gaussian for their belief. In
fact, most of them are more general than this: they could use any
exponential family as their belief.
 Kalman filter
 The Kalman filter only applies to models with Gaussian noise, linear
state equations, and linear measurement equations, i.e.
s_t = A s_(t-1) + noise
x_t = C s_t + noise
For these models the state posterior really is Gaussian, and the Kalman
filter is exact.
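As a concrete illustration, here is a minimal scalar Kalman filter in Python; the model constants (A, Q, C, R) and the measurement sequence are made-up values, not from any particular application.

```python
def kalman_step(mean, var, x, A=1.0, Q=0.1, C=1.0, R=0.5):
    """One predict/update cycle for s_t = A s_(t-1) + noise(Q),
    x_t = C s_t + noise(R)."""
    # Predict: push the Gaussian belief through the state equation.
    mean_pred = A * mean
    var_pred = A * A * var + Q
    # Update: condition on the measurement x_t.
    K = var_pred * C / (C * C * var_pred + R)        # Kalman gain
    mean_new = mean_pred + K * (x - C * mean_pred)
    var_new = (1 - K * C) * var_pred
    return mean_new, var_new

mean, var = 0.0, 1.0                  # initial Gaussian belief
for x in [0.9, 1.1, 1.0]:             # measurements near s = 1
    mean, var = kalman_step(mean, var, x)
```

Each measurement is processed exactly once and then discarded, so the memory footprint is just the two numbers (mean, var).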
 Extended Kalman filter
 The Extended Kalman filter applies to models with Gaussian noise.
The state and measurement equations are linearized by a Taylor expansion
about the current state estimate. The noise variance in the equations is
not changed, i.e. the additional error due to linearization is not modeled.
After linearization, the Kalman filter is applied.
 "Stochastic models, estimation and control", Peter S. Maybeck,
Volume 2, Chapter 12, 1982.
 "A linear approximation method for probabilistic inference",
Ross Shachter, UAI'1990.
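A scalar sketch of the idea, with an illustrative nonlinear measurement h(s) = s^2 (my own choice): the Jacobian of h at the predicted mean replaces C in the Kalman update, and, as noted above, no extra variance is added for the linearization error.

```python
def h(s):
    return s * s            # nonlinear measurement function

def h_prime(s):
    return 2.0 * s          # Jacobian of h, evaluated at the estimate

def ekf_step(mean, var, x, Q=0.01, R=0.1):
    # Predict (identity state dynamics here, for simplicity).
    mean_pred, var_pred = mean, var + Q
    # Linearize the measurement equation about the predicted mean.
    H = h_prime(mean_pred)
    K = var_pred * H / (H * H * var_pred + R)
    mean_new = mean_pred + K * (x - h(mean_pred))
    var_new = (1 - K * H) * var_pred
    return mean_new, var_new

mean, var = 1.0, 0.5
for x in [4.2, 3.9, 4.0]:   # measurements consistent with s near 2
    mean, var = ekf_step(mean, var, x)
```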
 Bottleneck filter
 This algorithm applies to any type of measurement equation.
The measurement equation is rewritten in terms of an intermediate
bottleneck variable b_t, such that p(x_t | b_t) is
simple while p(b_t | s_t) may be complicated. At each time step,
the Gaussian belief on s_t is converted into a Gaussian belief
on b_t (usually involving approximations), b_t is
updated exactly from x_t (since p(x_t | b_t) is
simple), and the new Gaussian belief on b_t is converted back to
a Gaussian belief on s_t.
("Bottleneck" is my own terminology. In the West paper below, they
used Gamma distributions.)
 "Dynamic Generalized Linear Models and Bayesian Forecasting,"
M. West, P. J. Harrison, & H. S. Migon, J Am Stat Assoc 80:73-97, 1985.
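A sketch of the dynamic-GLM flavor of this idea, in the spirit of the Gamma choice in the West et al. paper: take b_t = exp(s_t) with a Poisson measurement, so p(x_t | b_t) is conjugate to a Gamma belief on b_t. The moment-matching conversions and the numerical digamma/trigamma are my own simplifications.

```python
import math

def digamma(a, h=1e-5):     # numerical derivative of log-Gamma
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def trigamma(a, h=1e-4):    # numerical second derivative of log-Gamma
    return (math.lgamma(a + h) - 2 * math.lgamma(a) + math.lgamma(a - h)) / h ** 2

def bottleneck_update(m, v, x):
    # Gaussian belief on s -> Gamma belief on b = exp(s), by matching
    # the lognormal mean and variance.
    b_mean = math.exp(m + v / 2)
    b_var = (math.exp(v) - 1) * math.exp(2 * m + v)
    alpha = b_mean ** 2 / b_var
    beta = alpha / b_mean
    # Exact conjugate update from the Poisson count x (p(x | b) is simple).
    alpha, beta = alpha + x, beta + 1.0
    # Gamma belief on b -> Gaussian belief on s = log b, by matching
    # the mean and variance of log b.
    m_new = digamma(alpha) - math.log(beta)
    v_new = trigamma(alpha)
    return m_new, v_new

m, v = bottleneck_update(0.0, 0.5, x=3)
```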
 Linear-update filter
a.k.a. linear-regression filter or "statistical linearization" filter

This algorithm applies to any type of measurement equation.
The measurement equation is converted into a linear-Gaussian equation
by regressing the observation onto the state.
The result is a Kalman filter whose Kalman gain is
cov(state,measurement)/var(measurement).
Note that the regression is done without reference to the actual measurement.
I call it "linear-update" because the update to the state is always a linear
function of the measurement.
A variety of approximations have been proposed when
cov(state,measurement)
is not available analytically. The unscented filter,
central difference filter, and divided difference filter are
filters of this type.
 "Stochastic models, estimation and control", Peter S. Maybeck,
Volume 2, Chapter 12, 1982.
 "Kalman Filters for nonlinear systems: a comparison of performance",
Tine Lefebvre, Herman Bruyninckx, Joris De Schutter.
 "The Unscented Kalman Filter for Nonlinear Estimation",
Eric A. Wan and Rudolph van der Merwe, 2000.
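A sketch of the linear-update idea where cov(state, measurement) is approximated by plain Monte Carlo (the unscented filter would use deterministic sigma points instead); the cubic measurement function and noise levels are illustrative assumptions. Note the regression uses only samples, never the actual measurement x.

```python
import random

random.seed(0)

def h(s):
    return s * s * s        # nonlinear measurement function

def linear_update(mean, var, x, R=0.1, n=10000):
    # Sample the predicted state belief and propagate through h.
    ss = [random.gauss(mean, var ** 0.5) for _ in range(n)]
    xs = [h(s) for s in ss]
    x_mean = sum(xs) / n
    # Regress the measurement onto the state (ignoring the actual x).
    cov_sx = sum((s - mean) * (xi - x_mean) for s, xi in zip(ss, xs)) / n
    var_x = sum((xi - x_mean) ** 2 for xi in xs) / n + R
    K = cov_sx / var_x                       # cov(s,x)/var(x): Kalman gain
    # The update is a linear function of the measurement x.
    mean_new = mean + K * (x - x_mean)
    var_new = var - K * cov_sx
    return mean_new, var_new

mean, var = linear_update(1.0, 0.3, x=1.5)
```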
 Assumed-density filter a.k.a. moment matching
 This algorithm chooses the Gaussian belief which is "closest", in some
sense, to the exact state posterior given previous beliefs.
Usually this amounts to matching the moments of the exact posterior.
This is the most general approximation technique proposed to date: it
utilizes not only the form of the measurement equation but also the
measurement itself. The assumed-density filter requires analytic or
numerical solution of integrals over the measurement distribution.
For this, one could use Monte Carlo, quadrature, or Laplace's method.

"Expectation Propagation for approximate Bayesian inference",
T. Minka, Uncertainty in AI'2001.
 "Stochastic models, estimation and control", Peter S. Maybeck,
Volume 2, Chapter 12.7, 1982.

"A nonlinear filtering algorithm based on an approximation of the conditional distribution",
H. J. Kushner and A. S. Budhiraja, IEEE Trans Automatic Control
45(3):580-585, 2000.

"Gaussian filters for nonlinear filtering problems",
K. Ito and K. Q. Xiong, IEEE Trans Automatic Control 45(5):910-927, 2000.

"Approximate Learning in Complex Dynamic Bayesian Networks",
Settimi, Smith, & Gargoum, UAI'1999.

"A comparison of sequential learning methods for incomplete data",
R. G. Cowell, A. P. Dawid, & P. Sebastiani, Bayesian Statistics 5, 1996.

"A Bayesian Approach to Online Learning",
Manfred Opper, On-Line Learning in Neural Networks, 1999.
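A 1-D sketch of moment matching: the exact one-step posterior (Gaussian prior times likelihood) is integrated on a grid, a crude form of the quadrature mentioned above, and replaced by the Gaussian with the same first two moments. The heavy-tailed measurement model is an illustrative assumption.

```python
import math

def adf_update(m, v, x, R=0.5, grid_n=2000, width=8.0):
    """Project p(s | x) proportional to N(s; m, v) * p(x | s) onto a Gaussian
    by matching its first two moments, computed by grid quadrature."""
    lo, hi = m - width * v ** 0.5, m + width * v ** 0.5
    ds = (hi - lo) / grid_n
    grid = [lo + (i + 0.5) * ds for i in range(grid_n)]

    def prior(s):
        return math.exp(-0.5 * (s - m) ** 2 / v)

    def lik(s):                                 # Cauchy-like, non-Gaussian
        return 1.0 / (1.0 + (x - s) ** 2 / R)

    w = [prior(s) * lik(s) for s in grid]       # unnormalized exact posterior
    Z = sum(w)
    mean = sum(wi * s for wi, s in zip(w, grid)) / Z               # 1st moment
    var = sum(wi * (s - mean) ** 2 for wi, s in zip(w, grid)) / Z  # 2nd moment
    return mean, var

mean, var = adf_update(0.0, 1.0, x=1.0)
```

Unlike the linear-update filter, the actual measurement x enters the integrals, so the posterior variance can depend on how surprising x was.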
Mixture of Gaussian belief
A natural choice in moving beyond a single Gaussian is to use a mixture of
Gaussian belief. Unfortunately, an algorithm for general dynamic
models has proven elusive. Instead, existing algorithms assume that the
dynamic model is a mixture of linear-Gaussian models, i.e. it switches
randomly between different linear-Gaussian state/measurement equations.
This can be understood as having discrete state variables in addition to
the continuous ones.
For these models, the true state posterior is a mixture of Gaussians, but a
very big one. The algorithms focus on reducing the size of this mixture,
in an online way.
Most of them generalize beyond Gaussian to any exponential family.
 Assumed-density filter
a.k.a. "pseudo-Bayes"
 Same as assumed-density filtering for a single Gaussian, but now the
belief representation is categorical for the discrete state component and
conditional Gaussian for the continuous state component, producing a
mixture of Gaussian marginal for the continuous state component. For each
setting of the discrete state component, this algorithm chooses the
Gaussian which is "closest" to the exact state posterior given previous
beliefs. A drawback of this algorithm is that the size of the mixture is
predetermined by the cardinality of the discrete state component. However,
it does allow the state/measurement equations, conditional on the discrete
state, to be non-Gaussian.
 Gaussian-sum filter
 This is a family of algorithms which propagate the exact posterior for
one step, giving a large Gaussian mixture, and then reduce the mixture
as needed. Methods for reducing the mixture include:
 Drop mixture components with low weight

 Sample mixture components according to weight
 Used in Rao-Blackwellised particle filters.
 Repeatedly merge most similar pair of mixture components

 Minimize divergence between the large and small mixture
by nonlinear optimization
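Two of the reduction steps above can be sketched for a 1-D mixture stored as (weight, mean, variance) triples; the thresholds and merge rule (a moment-preserving merge) are my own illustrative choices.

```python
def drop_low_weight(mix, threshold=0.01):
    """Drop mixture components with low weight, then renormalize."""
    kept = [(w, m, v) for w, m, v in mix if w >= threshold]
    total = sum(w for w, _, _ in kept)
    return [(w / total, m, v) for w, m, v in kept]

def merge_pair(c1, c2):
    """Merge two components into the single Gaussian with the same
    total weight, mean, and variance (moment-preserving merge)."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)

mix = [(0.005, 5.0, 1.0), (0.495, 0.0, 1.0), (0.5, 0.2, 1.0)]
mix = drop_low_weight(mix)            # leaves 2 components
merged = merge_pair(mix[0], mix[1])   # reduce the similar pair to one Gaussian
```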

Nonparametric belief
 Histogram filter
 Quantize the state space and represent the belief by a histogram.
The filtering equations are then those of a hidden Markov model.

"A general filter for measurements with any probability distribution",
Y. Rosenberg and M. Werman, CVPR'97, 654-659.

"Non-Gaussian state-space modeling of nonstationary time series",
G. Kitagawa, J Am Stat Assoc 82:1032-1063, 1987.

"Recursive Bayesian estimation using piecewise constant approximations",
Kramer and Sorenson, Automatica 24(6):789-801, 1988.
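A histogram-filter sketch on a quantized 1-D state: the belief is a vector of cell probabilities and each step is the HMM forward recursion. The random-walk dynamics and Gaussian measurement model are my own illustrative choices.

```python
import math

CELLS = [0.1 * i for i in range(-20, 21)]    # quantized state space
belief = [1.0 / len(CELLS)] * len(CELLS)     # uniform initial histogram

def trans(s_new, s_old, q=0.2):              # random-walk dynamics kernel
    return math.exp(-0.5 * (s_new - s_old) ** 2 / q)

def lik(x, s, r=0.1):                        # Gaussian measurement model
    return math.exp(-0.5 * (x - s) ** 2 / r)

def histogram_step(belief, x):
    # Predict: sum over previous cells (HMM forward pass).
    pred = [sum(trans(s, s_old) * b for s_old, b in zip(CELLS, belief))
            for s in CELLS]
    # Update: weight each cell by the likelihood, then renormalize.
    post = [p * lik(x, s) for p, s in zip(pred, CELLS)]
    Z = sum(post)
    return [p / Z for p in post]

for x in [0.5, 0.6, 0.5]:
    belief = histogram_step(belief, x)
mode = CELLS[max(range(len(CELLS)), key=lambda i: belief[i])]
```

The cost per step is quadratic in the number of cells, which is why this representation struggles in high-dimensional state spaces.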
 Particle filter

Represent the state posterior by a large set of samples drawn from the
distribution. The particles are updated online to always represent the
current state posterior.
The most common way to do this is to pick a particle for the previous
state, sample a particle for the current state using the state equation,
and weight the particle by the probability of the measurement.
Resample the particles according to weight to get a set of unweighted
particles.
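The steps above can be sketched as a bootstrap particle filter; the random-walk state equation and Gaussian measurement model are illustrative assumptions.

```python
import math
import random

random.seed(1)

def pf_step(particles, x, q=0.1, r=0.2):
    # Sample each particle's successor from the state equation.
    moved = [s + random.gauss(0.0, q ** 0.5) for s in particles]
    # Weight each particle by the probability of the measurement.
    w = [math.exp(-0.5 * (x - s) ** 2 / r) for s in moved]
    # Resample by weight to get an unweighted particle set.
    return random.choices(moved, weights=w, k=len(particles))

particles = [random.gauss(0.0, 1.0) for _ in range(2000)]   # initial belief
for x in [1.0, 1.1, 0.9]:
    particles = pf_step(particles, x)
est = sum(particles) / len(particles)    # posterior mean estimate
```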
 Particle filter with interpolation

A particle filter where you increase diversity by fitting a density to
the particles and resampling from that density.
 Particle filter with MCMC steps

A particle filter where you increase diversity by including MCMC steps.

"Following a moving target - Monte Carlo inference for dynamic Bayesian models",
W. Gilks and C. Berzuini, J Royal Stat Soc B 63(1):127-146, 2001.

"Particle filters for state estimation of jump Markov linear systems",
A. Doucet, N. J. Gordon, & V. Krishnamurthy, IEEE Trans Signal Processing 49(3):613-624, 2001.

"Sequential importance sampling for nonparametric Bayes models:
The next generation",
S. N. MacEachern, M. Clyde, J. S. Liu,
Canadian J of Statistics 27(2):251-267, 1999.
 MCMC filtering
 MCMC can be specially designed to provide efficient, boundedmemory
filtering, for example via randomized Gibbs sampling.
Batch filtering/smoothing algorithms
 Kalman smoothing
 Used with the Kalman filter, or any filter which linearizes the
state equations, e.g. EKF, UF, LUF, ADF.
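A scalar sketch of Kalman (Rauch-Tung-Striebel style) smoothing: run the Kalman filter forward, storing the predicted and filtered moments, then sweep backward folding in information from the future. The model constants are the same made-up values as in the filtering sketch above.

```python
def rts_smooth(xs, m0=0.0, v0=1.0, A=1.0, Q=0.1, C=1.0, R=0.5):
    # Forward Kalman filter, storing filtered (fm, fv) and
    # predicted (pm, pv) moments at each step.
    fm, fv, pm, pv = [], [], [], []
    m, v = m0, v0
    for x in xs:
        mp, vp = A * m, A * A * v + Q                     # predict
        K = vp * C / (C * C * vp + R)
        m, v = mp + K * (x - C * mp), (1 - K * C) * vp    # update
        fm.append(m); fv.append(v); pm.append(mp); pv.append(vp)
    # Backward pass: correct each filtered belief using the future.
    sm, sv = fm[:], fv[:]
    for t in range(len(xs) - 2, -1, -1):
        J = fv[t] * A / pv[t + 1]                         # smoother gain
        sm[t] = fm[t] + J * (sm[t + 1] - pm[t + 1])
        sv[t] = fv[t] + J * J * (sv[t + 1] - pv[t + 1])
    return sm, sv

sm, sv = rts_smooth([0.9, 1.1, 1.0])
```

The smoothed variance at early times is never larger than the filtered variance, since smoothing only adds information.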
 Expectation Propagation
 Provides batch filtering and smoothing.
Can be used with any method for linearizing the measurement equations,
e.g. EKF, UF, LUF, ADF. Unlike Kalman smoothing, the measurement equations
are re-linearized until a globally stable solution is reached, giving
better results.
 Variational lower bounds
 Provides batch filtering and smoothing. Meshes well with parameter
estimation.
 Two-filter smoothing
 Run filtering forward and independently in reverse and combine the results.
Useful for Gaussian-sum filters.

"The two-filter formula for smoothing and an implementation of the
Gaussian-sum smoother",
G. Kitagawa, Annals Institute of Statistical Mathematics 46(4):605-623, 1994.
 Particle smoothing by sampling
 Reweight and resample the particles at time t,
based on a sampled particle from time t+1.

"Monte Carlo smoothing with application to audio signal
enhancement",
W. Fong, S. Godsill, A. Doucet, & M. West,
IEEE Trans. on Signal Processing, to appear, 2001.

"Monte Carlo filter and smoother for non-Gaussian nonlinear state
space models", G. Kitagawa, J. of Computational and Graphical
Statistics 5:1-25, 1996.
 Particle smoothing by interpolation
 Reweight and resample the particles
at time t, based on a fitted density to the particles at time
t+1.
 MCMC
 Gibbs sampling or Metropolis-Hastings.
 Markov Random Field algorithms
 Relaxation, etc.
Last modified: Thu Mar 22 14:48:18 GMT 2007