\documentclass{article}
\usepackage{ifthen, graphicx}
\begin{document}
\newcounter{OldSection}
\newcounter{ParCount}
\newcommand{\para}{
\vspace{.4cm}
\ifthenelse { \value{OldSection} < \value{section} }
{ \setcounter{OldSection}{ \value{section} }
\setcounter{ParCount}{ 0 } }
{}
\stepcounter{ParCount}
\noindent
\bf \arabic{section}.\arabic{ParCount}. \rm \hspace{.2cm}
}
\Large \begin{center}
Stochastic Calculus Notes, Lecture 2 \\
\normalsize
Last modified \today
\end{center} \normalsize
\section{Forward and Backward Equations for Markov chains}
\para Introduction:
Forward and backward equations are useful ways to get answers to quantitative
questions about Markov chains.
The probabilities $u(k,t) = P(X(t) = k)$
satisfy a forward equation that allows us to compute all the numbers
$u(k,t+1)$ once all the numbers $u(j,t)$ are known.
This moves us forward from time $t$ to time $t+1$.
The expected values $f(k,t) = E[V(X(T)) \mid X(t) = k]$ (for $t < T$) satisfy
a backward equation that allows us to calculate the numbers $f(k,t)$ once
all the $f(j,t+1)$ are known.
A {\em duality} relation allows us to infer the
forward equation from the backward equation, or conversely.
The transition matrix is the {\em generator} of both equations, though
in different ways.
There are many related problems that have solutions involving forward and
backward equations.
Two treated here are hitting probabilities and random compound interest.
\para Forward equation, functional version:
Let $u(k,t) = P(X(t) = k)$.
The law of total probability gives
\begin{eqnarray*}
u(k,t+1) & = & P(X(t+1) = k) \\
& = & \sum_j P(X(t+1) = k \mid X(t) = j) \cdot P(X(t) = j ) \; .
\end{eqnarray*}
Therefore
\begin{equation}
u(k,t+1) = \sum_j P_{jk} u(j,t) \; .
\label{ffe} \end{equation}
This is the {\em forward equation} for probabilities.
It is also called the Kolmogorov forward equation or the Chapman-Kolmogorov
equation.
Once $u(j,t)$ is known for all $j \in \cal S$, (\ref{ffe}) gives
$u(k,t+1)$ for any $k$.
Thus, we can go forward in time from $t=0$ to
$t=1$, etc.\ and calculate all the numbers $u(k,t)$.
Note that if we just wanted one number, say $u(17,49)$, still we would
have to calculate many related quantities, all the $u(j,t)$ for
$t < 49$.
If the state space is too large, this direct forward equation approach may
be impractical.
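As a concrete illustration, the forward recursion can be carried out numerically. The sketch below uses a hypothetical three-state transition matrix and starting state, chosen only for illustration:

```python
import numpy as np

# Hypothetical 3-state chain; P[j, k] = P(X(t+1) = k | X(t) = j)
P = np.array([[0.5,  0.5,  0.0],
              [0.25, 0.5,  0.25],
              [0.0,  0.5,  0.5]])

u = np.array([1.0, 0.0, 0.0])   # u(k,0) = P(X(0) = k): start in state 0
for t in range(49):
    u = u @ P                   # u(k,t+1) = sum_j u(j,t) P[j,k]
# u now holds the distribution u(k,49)
```

Each pass through the loop computes all the numbers $u(k,t+1)$ at once, which is exactly the point made above: to get one entry such as $u(17,49)$ we compute the whole vector at every earlier time.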
\para Row and column vectors:
If $A$ is an $n \times m$ matrix, and $B$ is an $m \times p$ matrix, then $AB$
is $n \times p$.
The matrices are compatible for multiplication because the second dimension
of $A$, the number of columns, matches the first dimension of $B$, the number
of rows.
A matrix with just one column is a
{\em column vector}.\footnote{The physicists' more sophisticated idea that
a vector is a physical quantity with certain transformation properties is
``inoperative" here.}
Just one row makes it a {\em row vector}.
Matrix-vector multiplication is a special case of matrix-matrix multiplication.
We often denote genuine matrices (more than one row and column) with capital
letters and vectors, row or column, with lower case.
In particular, if $u$ is an $n$ dimensional row vector, a $1 \times n$
matrix, and $A$ is an $n \times n$ matrix, then $uA$ is another $n$
dimensional row vector.
We do not write $Au$ for this because that would be incompatible.
Matrix multiplication is always associative.
For example, if $u$ is
a row vector and $A$ and $B$ are square matrices, then
$(uA)B = u(AB)$. We can compute the row vector $uA$ then multiply by $B$,
or we can compute the $n \times n$ matrix $AB$ then multiply by $u$.
If $u$ is a row vector, we usually denote the $k$-th entry by $u_k$
instead of $u_{1k}$. Similarly, the $k$-th entry of column vector
$f$ is $f_k$ instead of $f_{k1}$.
If both $u$ and $f$ have $n$ components, then $uf= \sum_{k=1}^nu_kf_k$ is a
$1 \times 1$ matrix, i.e.\ a number.
Thus, treating row and column vectors as special kinds of matrices makes
the product of a row with a column vector natural, but not, for example,
the product of two column vectors.
\para Forward equation, matrix version:
The probabilities $u(k,t)$ form the components of a row vector, $u(t)$,
with components $u_k(t) = u(k,t)$ (an abuse of notation).
The forward equation (\ref{ffe}) may be expressed (check this)
\begin{equation}
u(t+1) = u(t) P \; .
\label{mfe} \end{equation}
Because matrix multiplication is associative, we have
\begin{equation}
u(t) = u(t-1)P = u(t-2)P^2 = \cdots = u(0) P^t \; .
\label{P^tfe} \end{equation}
Tricks of matrix multiplication give information about the evolution of
probabilities.
For example, we can write a formula for $u(t)$ in terms of the eigenvectors
and eigenvalues of $P$.
Also, we can save effort in computing $u(t)$ for large $t$ by
repeated squaring:
$$
P \rightarrow P^2 \rightarrow \left(P^2\right)^2 = P^4 \rightarrow \cdots
\rightarrow P^{2^k}
$$
using just $k$ matrix multiplications. For example, this computes
$P^{1024}$ using just ten matrix multiplies, instead of a thousand.
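The repeated squaring trick can be sketched as follows; the two-state matrix is hypothetical and the helper name is ours:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])      # hypothetical 2-state transition matrix

def power_by_squaring(P, n):
    """Compute P^n using O(log n) matrix multiplications."""
    result = np.eye(P.shape[0])
    square = P.copy()
    while n > 0:
        if n % 2 == 1:          # fold in this bit of the exponent
            result = result @ square
        square = square @ square
        n //= 2
    return result

P1024 = power_by_squaring(P, 1024)
```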
\para Backward equation, functional version:
Suppose we run the Markov chain until time $T$ then get a ``reward'',
$V(X(T))$.
For $t\leq T$, define the conditional expectations
\begin{equation}
f(k,t) = E\left[ V(X(T))\mid X(t) = k \right] \; .
\label{fDef} \end{equation}
This expression is used so often that it is abbreviated
$$
f(k,t) = E_{k,t}[V(X(T))] \; .
$$
These satisfy a {\em backward equation} that follows from the law of
total probability:
\begin{eqnarray}
f(k,t) & = & E\left[ V(X(T))\mid X(t) = k \right] \nonumber \\
& = & \sum_{j \in {\cal S}}
E\left[ V(X(T)) \mid X(t) = k \mbox{ and } X(t+1) = j \right]
\cdot P(X(t+1) = j \mid X(t) = k) \nonumber \\
f(k,t) & = & \sum_{j \in {\cal S}} f(j,t+1) P_{kj} \; .
\label{fbe} \end{eqnarray}
The Markov property is used to infer that
$$
E[V(X(T)) \mid X(t) = k \mbox{ and }X(t+1) = j]
= E_{j,t+1}[V(X(T))] \;.
$$
The dynamics (\ref{fbe}) must be supplemented with the {\em final condition}
\begin{equation}
f(k,T) = V(k) \; .
\label{beFinal} \end{equation}
Using these, we may compute all the numbers $f(k,T-1)$, then all the numbers
$f(k,T-2)$, etc.
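A minimal numerical sketch of the backward recursion, again with a hypothetical chain and reward vector:

```python
import numpy as np

P = np.array([[0.5,  0.5,  0.0],
              [0.25, 0.5,  0.25],
              [0.0,  0.5,  0.5]])   # hypothetical chain
V = np.array([1.0, 2.0, 4.0])       # hypothetical reward V(k)
T = 10

f = V.copy()                        # final condition f(k,T) = V(k)
for t in range(T):
    f = P @ f                       # f(k,t) = sum_j P[k,j] f(j,t+1)
# f[k] is now f(k,0) = E[V(X(T)) | X(0) = k]
```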
\para Backward equation using modern conditional expectation:
As usual, ${\cal F}_t$ denotes the $\sigma$-algebra generated by
$X(0)$, $\ldots$, $X(t)$.
Define $F(t) = E[V(X(T))\mid {\cal F}_t]$.
The left side is a random variable that is measurable in ${\cal F}_t$,
which means that $F(t)$ is a function of $(X(0),\ldots,X(t))$.
The Markov property implies that $F(t)$ actually is measurable with respect
to ${\cal G}_t$, the $\sigma$-algebra generated by $X(t)$ alone.
This means that $F(t)$ is a function of $X(t)$ alone, which is to say that
there is a function $f(k,t)$ so that $F(t) = f(X(t),t)$, and
$$
f(X(t),t) = E[V(X(T)) \mid {\cal F}_t] = E[V(X(T))\mid {\cal G}_t] \; .
$$
Since ${\cal G}_t$ is generated by the partition into the events
$\left\{X(t) = k \right\}$, this is the same as the definition
(\ref{fDef}).
Moreover, because ${\cal F}_t \subseteq {\cal F}_{t+1}$ and
$F(t+1) = E[V(X(T)) \mid {\cal F}_{t+1}]$, the tower property gives
$$
E[V(X(T)) \mid {\cal F}_t] = E[F(t+1)\mid {\cal F}_t ] \; ,
$$
so that, again using the Markov property,
\begin{equation}
F(t) = E[F(t+1) \mid {\cal G}_t ] \; .
\label{modBe} \end{equation}
Note that this is a version of the tower property.
On the event $\left\{X(t) = k\right\}$, the right side above takes the value
$$
\sum_{j \in \cal S} f(j,t+1)\cdot P(X(t+1)=j \mid X(t) = k) \; .
$$
Thus, (\ref{modBe}) is the same as the backward equation (\ref{fbe}).
In the continuous time versions to come, (\ref{modBe}) will be very handy.
\para Backward equation, matrix version:
We organize the numbers $f(k,t)$ into a column vector
$f(t) = (f(1,t), f(2,t), \cdots)^t$.
It is barely an abuse to write $f(t)$ both for a function of $k$ and a vector.
After all, any computer programmer knows that a vector really is a function
of the index.
The backward equation (\ref{fbe}) then is equivalent to (check this)
\begin{equation}
f(t) = Pf(t+1) \; .
\label{mbe} \end{equation}
Again the associativity of matrix multiplication lets us write, for example,
$$
f(t) = P^{T-t}V \; ,
$$
writing $V$ for the vector of values of $V$.
\para Invariant expectation value:
We combine the conditional expectations (\ref{fDef}) with the probabilities
$u(k,t)$ with the law of total probability to get, for any $t$,
\begin{eqnarray*}
E[V(X(T))] & = & \sum_{k \in \cal S} P(X(t) = k)\cdot E[V(X(T)) \mid X(t) = k]\\
& = & \sum_{k \in \cal S} u(k,t) f(k,t) \\
& = & u(t) f(t) \; .
\end{eqnarray*}
The last line is a natural example of an inner product between a row
vector and a column vector.
Note that the product $E[V(X(T))] = u(t) f(t)$ does not depend on
$t$ even though $u(t)$ and $f(t)$ are different for different $t$.
For this {\em invariance} to be possible, the forward evolution equation for
$u$ and the backward equation for $f$ must be related.
\para Relationship between the forward and backward equations:
It often is possible to derive the backward equation from the forward
equation and conversely using the invariance of $u(t)f(t)$.
For example, suppose we know that $f(t) = Pf(t+1)$.
Then $u(t+1)f(t+1)=u(t)f(t)$ may be rewritten $u(t+1)f(t+1)=u(t)Pf(t+1)$,
which may be rearranged as (using rules of matrix multiplication)
$$
\left(\; u(t+1) - u(t) P \; \right) f(t+1) = 0 \; .
$$
If this is true for enough linearly independent vectors $f(t+1)$, then
the vector $u(t+1) - u(t)P$ must be zero, which is the matrix version of
the forward equation (\ref{mfe}).
A theoretically minded reader can verify that enough $f$
vectors are produced if the transition matrix is nonsingular and we choose
a linearly independent family of ``reward'' vectors, $V$. In the same
way, the backward evolution of $f$ is a consequence of invariance and
the forward evolution of $u$.
We now have two ways to evaluate $E[V(X(T))]$: (i) start with given $u(0)$,
compute $u(T) = u(0)P^T$, evaluate $u(T)V$, or (ii) start with given
$V = f(T)$, compute $f(0) = P^TV$, then evaluate $u(0)f(0)$.
The former might be preferable, for example, if we had a number of
different reward functions to evaluate.
We could compute $u(T)$ once, then evaluate $u(T)V$ for all our $V$ vectors.
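Both routes are easy to compare numerically; the chain, reward, and initial distribution below are hypothetical:

```python
import numpy as np

P = np.array([[0.5,  0.5,  0.0],
              [0.25, 0.5,  0.25],
              [0.0,  0.5,  0.5]])   # hypothetical chain
V = np.array([1.0, 2.0, 4.0])       # hypothetical reward vector
u0 = np.array([0.2, 0.3, 0.5])      # hypothetical initial distribution
T = 8

# route (i): push probabilities forward to time T, then pair with V
uT = u0 @ np.linalg.matrix_power(P, T)
answer_i = uT @ V

# route (ii): pull rewards backward to time 0, then pair with u(0)
f0 = np.linalg.matrix_power(P, T) @ V
answer_ii = u0 @ f0
```

The two answers agree because matrix multiplication is associative: $(u(0)P^T)V = u(0)(P^TV)$.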
\para Duality:
In its simplest form, {\em duality} is the relationship between a
matrix and its transpose.
The set of column vectors with $n$ components is a vector space of dimension
$n$.
The set of $n$ component row vectors is the {\em dual space}, which has
the same dimension but may be considered to be a different space.
We can combine an element of a vector space with an element of its dual to
get a number: row vector $u$ multiplied by column vector $f$ yields
the number $uf$.
Any linear transformation on the vector space of column vectors is
represented by an $n \times n$ matrix, $P$.
This matrix also defines a linear transformation, the {\em dual transformation},
on the dual space of row vectors, given by $u \rightarrow uP$.
This is the sense in which the forward and backward equations are dual
to each other.
Some people prefer not to use row vectors and instead think of organizing the
probabilities $u(k,t)$ into a column vector that is the transpose of
what we called $u(t)$. For them, the forward equation would be written
$u(t+1) = P^t u(t)$ (note the notational problem: the $t$ in $P^t$ means
``transpose'' while the $t$ in $u(t)$ and $f(t)$ refers to time.). The
invariance relation for them would be $u^t(t+1)f(t+1) = u^t(t)f(t)$.
The transpose of a matrix is often called its {\em dual}.
\para Hitting probabilities, backwards:
The hitting probability for state 1 up to time $T$ is
\begin{equation}
P\left( X(t) = 1 \mbox{ for some $t \in [0,T]$}\right) \; .
\label{HP} \end{equation}
Here and below we write $[a,b]$ for all the {\em integers} between $a$ and
$b$, including $a$ and/or $b$ if they are integers.
Hitting probabilities can be computed using forward or backward equations,
often by modifying $P$ and adding {\em boundary conditions}.
For one backward equation approach, define
\begin{equation}
f(k,t) = P\left( X(t^{\prime})=1
\mbox{ for some $t^{\prime}\in[t,T]$}\mid X(t) = k \right)
\; .
\label{fDefHit} \end{equation}
Clearly,
\begin{equation}
f(1,t) = 1 \mbox{ for all $t$,}
\label{bbc} \end{equation}
and
\begin{equation}
f(k,T) = 0 \mbox{ for $k \neq 1$.}
\label{finalCondHit} \end{equation}
Moreover, if $k\neq 1$, the law of total probability yields a backward
relation
\begin{equation}
f(k,t) = \sum_{j\in \cal S}P_{kj} f(j,t+1) \; .
\label{beHit} \end{equation}
The difference between this and the plain backward equation (\ref{fbe})
is that the relation (\ref{beHit}) holds only for {\em interior} states
$k \neq 1$, while the boundary condition (\ref{bbc}) supplies the values
of $f(1,t)$. The sum on the right of (\ref{beHit}) includes
the term corresponding to state $j=1$.
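A sketch of this boundary-condition recursion, with a hypothetical three-state chain in which state 0 (indexing from zero) plays the role of the target state:

```python
import numpy as np

P = np.array([[0.5,  0.5,  0.0],
              [0.25, 0.5,  0.25],
              [0.0,  0.5,  0.5]])   # hypothetical chain; state 0 is the target
T = 20

f = np.zeros(3)
f[0] = 1.0                          # final condition: f(k,T) = 0 except at the target
for t in range(T):
    f = P @ f                       # backward relation at interior states
    f[0] = 1.0                      # boundary condition: f(target, t) = 1
# f[k] = P(hit the target at some time in [0,T] | X(0) = k)
```

Resetting the boundary entry after each multiplication enforces the boundary condition while the matrix product supplies the interior relation.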
\para Hitting probabilities, forward:
We also can compute the hitting probabilities (\ref{HP}) using a forward
equation approach.
Define the {\em survival probabilities}
\begin{equation}
u(k,t) = P\left( X(t)=k
\mbox{ and } X(t^{\prime}) \neq 1
\mbox{ for $\displaystyle t^{\prime} \in [0,t] $}\right) \; .
\label{sp} \end{equation}
These satisfy the obvious {\em boundary condition}
\begin{equation}
u(1,t) = 0 \; ,
\label{fbc} \end{equation}
and initial condition
\begin{equation}
u(k,0) = P(X(0) = k) \mbox{ for $k \neq 1$.}
\label{ficHit} \end{equation}
The forward equation is (as the reader should check)
\begin{equation}
u(k,t+1) = \sum_{j \in \cal S} u(j,t) P_{jk} \; .
\label{feHit} \end{equation}
We may include or exclude the term with $j=1$ on the right because $u(1,t) = 0$.
Of course, (\ref{feHit}) applies only at interior states $k\neq 1$.
The overall probability of survival up to time $T$ is
$\sum_{k \in \cal S}u(k,T)$ and the hitting probability is the complementary
$1-\sum_{k \in \cal S} u(k,T)$.
The matrix vector formulation of this involves the row vector
$$
\widetilde{u}(t) = (u(2,t), u(3,t), \ldots)
$$
and the matrix
$\widetilde{P}$ formed from $P$ by removing the first row and column.
The evolution equation (\ref{feHit}) and boundary condition (\ref{fbc}) are
both expressed by the matrix equation
$$
\widetilde{u}(t+1) = \widetilde{u}(t)\widetilde{P} \; .
$$
Note that $\widetilde{P}$ is not a stochastic matrix because some of the row
sums are less than one:
$$
\sum_{j \neq 1} P_{kj} < 1 \;\;\;\mbox{ if } \;\;\;P_{k1}>0\; .
$$
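The sub-stochastic evolution can be sketched numerically; again the chain and initial distribution are hypothetical, with state 0 as the absorbing boundary:

```python
import numpy as np

P = np.array([[0.5,  0.5,  0.0],
              [0.25, 0.5,  0.25],
              [0.0,  0.5,  0.5]])   # hypothetical chain; state 0 is the boundary
T = 20

P_tilde = P[1:, 1:]                 # delete the boundary state's row and column
u = np.array([0.5, 0.5])            # hypothetical initial distribution on survivors
for t in range(T):
    u = u @ P_tilde                 # sub-stochastic evolution of survival probabilities
survival = u.sum()                  # P(no visit to the boundary in [0,T])
hitting = 1.0 - survival
```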
\para Absorbing boundaries:
{\em Absorbing boundaries} are another way to think about hitting and
survival probabilities.
The absorbing boundary Markov chain is the same as the original chain
(same transition probabilities) as long as the state is not one of the
boundary states. In the absorbing chain, the state never again changes
after it visits an absorbing boundary point.
If $\overline{P}$ is the transition
matrix of the absorbing chain and $P$ is the original transition matrix,
this means that $\overline{P}_{jk} = P_{jk}$ if $j$ is not a boundary
state, while $\overline{P}_{jk} = 0$ if $j$ is a boundary state and $k\neq j$.
The probabilities $u(k,t)$ for the absorbing chain are the same as the
survival probabilities (\ref{sp}) for the original chain.
\para Running cost:
Suppose we have a {\em running cost} function, $W(x)$, and we want
to calculate
\begin{equation}
f = E\left[ \sum_{t=0}^T W(X(t))\right] \; .
\label{Running} \end{equation}
Sums like this are called {\em path dependent} because their value depends
on the whole path, not just the final value $X(T)$.
We can calculate (\ref{Running}) with the forward equation using
\begin{eqnarray}
f & = & \sum_{t=0}^T E\left[W(X(t))\right] \nonumber \\
& = & \sum_{t=0}^T u(t) W \; .
\label{fEqRunning} \end{eqnarray}
Here $W$ is the column vector with components $W_k = W(k)$.
We compute the probabilities that are the components of the $u(t)$ using
the standard forward equation (\ref{mfe}) and sum the products
(\ref{fEqRunning}).
One backward equation approach uses the quantities
\begin{equation}
f(k,t) = E_{k,t} \left[\sum_{t^{\prime}=t}^T W(X(t^{\prime}))\right] \; .
\label{fDefRunning} \end{equation}
These satisfy (check this):
\begin{equation}
f(t) = Pf(t+1) + W \; .
\label{bEqRunning} \end{equation}
Starting with $f(T) = W$, we work backwards with (\ref{bEqRunning}) until
we reach the desired $f(0)$.
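The forward and backward routes can be checked against each other numerically; the chain and cost vector below are hypothetical:

```python
import numpy as np

P = np.array([[0.5,  0.5,  0.0],
              [0.25, 0.5,  0.25],
              [0.0,  0.5,  0.5]])   # hypothetical chain
W = np.array([1.0, 0.0, 2.0])       # hypothetical running cost W(k)
u0 = np.array([1.0, 0.0, 0.0])
T = 12

# backward route: f(t) = P f(t+1) + W, starting from f(T) = W
f = W.copy()
for t in range(T):
    f = P @ f + W
by_backward = u0 @ f

# forward route: sum_{t=0}^{T} u(t) W with u(t+1) = u(t) P
u = u0.copy()
by_forward = u @ W
for t in range(T):
    u = u @ P
    by_forward += u @ W
```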
\para Multiplicative functionals:
For some reason, a function of a function is often called a {\em functional}.
The path, $X(t)$, is a function of $t$, so a function, $F(X)$, that depends
on the whole path is often called a functional. Some applications call for
finding the expected value of a multiplicative functional:
\begin{equation}
f = E\left[ \prod_{t=0}^T V(X(t))\right] \; .
\label{FK} \end{equation}
For example, $X(t)$ could represent the state of a financial market and
$V(k) = 1 + r(k)$ the growth factor when the interest rate in state $k$ is
$r(k)$. Then (\ref{FK}) would be the expected total compounding factor.
We also can write $V(k) = e^{W(k)}$, so that
$$
\prod V(X(t)) = \exp\left[\sum W(X(t)) \right] = e^Z \; ,
$$
with $Z = \sum W(X(t))$.
This does not solve the problem of evaluating (\ref{FK}) because
$E\left[ e^Z\right] \neq e^{E[Z]}$.
The backward equation approach uses the intermediate quantities
$$
f(k,t) = E_{k,t} \left[ \prod_{t^{\prime}=t}^T V(X(t^{\prime}))\right] \; .
$$
The $t^{\prime} = t$ term in the product has $V(X(t)) = V(k)$.
The final condition is $f(k,T) = V(k)$.
The backward evolution equation is derived more or less as before:
\begin{eqnarray}
f(k,t) & = & E_{k,t} \left[ V(k) \prod_{t^{\prime} > t} V(X(t^{\prime}))\right]
\nonumber \\
& = & V(k) E_{k,t}\left[ \prod_{t^{\prime}=t+1}^T V(X(t^{\prime}))\right]
\nonumber \\
& = & V(k) E_{k,t} \left[ f(X(t+1),t+1)\right] \;\;\; \mbox{(the tower property)}
\nonumber \\
f(k,t) & = & V(k) \bigl( P f(t+1)\bigr) (k) \; .
\label{bEqMult}
\end{eqnarray}
In the last line on the right, $f(t+1)$ is the column vector with components
$f(k,t+1)$ and $Pf(t+1)$ is the matrix vector product.
We write $\displaystyle \bigl(Pf(t+1)\bigr)(k)$ for the $k^{\mbox{\em th}}$
component of the column vector $Pf(t+1)$.
We could express the whole thing in matrix terms using $\mbox{diag}(V)$,
the diagonal matrix with $V(k)$ in the $(k,k)$ position:
$$
f(t) = \mbox{diag}(V)Pf(t+1) \; .
$$
A version of (\ref{bEqMult}) for Brownian motion is called the Feynman-Kac
formula.
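A numerical sketch of the backward recursion in its matrix form, with hypothetical per-state interest rates:

```python
import numpy as np

P = np.array([[0.5,  0.5,  0.0],
              [0.25, 0.5,  0.25],
              [0.0,  0.5,  0.5]])   # hypothetical chain
r = np.array([0.01, 0.02, 0.03])    # hypothetical per-state interest rates
V = 1.0 + r
T = 10

f = V.copy()                        # final condition f(k,T) = V(k)
for t in range(T):
    f = V * (P @ f)                 # f(t) = diag(V) P f(t+1), done componentwise
# f[k] = E_k[ prod_{t'=0}^{T} V(X(t')) ]
```

Multiplying componentwise by `V` is the same as applying the diagonal matrix $\mbox{diag}(V)$, but avoids forming it explicitly.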
\para Branching processes:
One forward equation approach to (\ref{FK}) leads to a different interpretation
of the answer.
Let $B(k,t)$ be the event $\left\{X(t) = k \right\}$ and $I(k,t)$ the
indicator function of $B(k,t)$.
That is $I(k,t,X) = 1$ if $X \in B(k,t)$ (i.e.\ $X(t) = k$), and
$I(k,t,X) = 0$ otherwise.
Leaving the argument $X$ out of $I(k,t)$ is in keeping with the probabilists'
habit of omitting the arguments of functions when the argument is the
underlying random outcome.
We have $u(k,t) = E[I(k,t)]$.
The forward equation for the quantities
\begin{equation}
g(k,t) = E\left[ I(k,t) \prod_{t^{\prime}=0}^t V(X(t^{\prime})) \right]
\label{eNum} \end{equation}
is (see homework):
\begin{equation}
g(k,t) = V(k) \bigl(g(t-1)P\bigr)(k)\; .
\label{fEqMult} \end{equation}
This is also the forward equation for a {\em branching process} with branching
factors $V(k)$. At time $t$, the branching process has $N(k,t)$
{\em particles}, or {\em walkers}, at state $k$. The numbers $N(k,t)$ are
random. A time step of the branching process has two parts.
First, each particle takes one step of the Markov chain.
A particle at state $j$ goes to state $k$ with probability $P_{jk}$.
All steps for all particles are independent.
Then, each particle at state $k$ does a {\em branching} or {\em birth/death}
step in which the particle is replaced by a random number of particles
with expected number $V(k)$. For example, if $V(k) = 1/2$, we could
delete the particle (death) with probability one half. If $V(k) = 2.8$, we
could keep the existing particle, add a second one, then add a third with
probability $.8$, for an expected number $1 + 1 + .8 = 2.8$.
All particles are treated independently.
If there are $m$ particles in state $k$ before the birth/death step,
the expected number after the birth/death step is $V(k) m$.
The expected number of particles, $g(k,t) = E[N(k,t)]$, satisfies
(\ref{fEqMult}).
When $V(k) = 1$ for all $k$ there need be no birth or death.
There will be just one particle, the path $X(t)$.
The number of particles at state $k$ at time $t$, $N(k,t)$, will be
zero if $X(t) \neq k$ or one if $X(t) = k$. In fact, $N(k,t) = I(k,t)(X)$.
The expected values will be $g(k,t) = E[N(k,t)] = E[I(k,t)] = u(k,t)$.
The branching process representation of (\ref{FK}) is possible when
$V(k) \geq 0$ for all $k$.
Monte Carlo methods based on branching processes are more
accurate than direct Monte Carlo in many cases.
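The forward recursion for the expected particle counts is easy to sketch numerically; the chain and branching factors are hypothetical, and the answer is checked against the backward route:

```python
import numpy as np

P = np.array([[0.5,  0.5,  0.0],
              [0.25, 0.5,  0.25],
              [0.0,  0.5,  0.5]])   # hypothetical chain
V = np.array([1.01, 1.02, 1.03])    # hypothetical branching factors
u0 = np.array([1.0, 0.0, 0.0])
T = 10

g = V * u0                          # g(k,0) = V(k) u(k,0)
for t in range(T):
    g = V * (g @ P)                 # g(k,t) = V(k) (g(t-1) P)(k)
expected_product = g.sum()          # E[ prod_{t=0}^{T} V(X(t)) ]
```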
\section{Lattices, trees, and random walk}
\para Introduction:
Random walk on a lattice is an important example where the abstract
theory of Markov chains is used.
It is the simplest model of something
randomly moving through space with none of the subtlety of Brownian motion,
though random walk on a lattice is a useful approximation to Brownian
motion, and vice versa.
The forward and backward equations take a specific simple form for
lattice random walk and it is often possible to calculate or approximate
the solutions by hand.
Boundary conditions will be applied at the boundaries of lattices, hence
the name.
We pursue forward and backward equations for several reasons.
First, they often are the best way to calculate expectations
and hitting probabilities.
Second, many theoretical qualitative properties of specific Markov chains
are understood using backward or forward equations.
Third, they help explain and motivate the partial differential equations
that arise as backward and forward equations for diffusion processes.
\para Simple random walk:
The state space for simple random walk is the integers, positive and
negative. At each time, the walker has three choices: (A) move up
one, (B) do not move, (C) move down one.
The probabilities are
$P(A) = P(X(t+1) = X(t)+1) = a$, $P(B) = P(X(t+1) = X(t)) = b$, and
$P(C) = P(X(t+1) = X(t)-1) = c$.
Naturally, we need $a$, $b$, and $c$ to be
non-negative and $a+b+c = 1$.
The transition
matrix\footnote{This ``matrix'' is infinite when the state space is infinite.
Matrix multiplication is still defined. For example, the $k$ component of
$uP$ is given by $(uP)_k = \sum_j u_j P_{jk}$.
This possibly infinite sum has only three nonzero terms when $P$
is tridiagonal.} has $b$ on the diagonal
($P_{kk} = b$ for all $k$), $a$ on the {\em super-diagonal}
($P_{k,k+1} = a$ for all $k$), and $c$ on the {\em sub diagonal}. All
other matrix elements $P_{jk}$ are zero.
This Markov chain is {\em homogeneous} or {\em translation invariant}:
the probabilities of moving up or down are independent of $X(t)$.
A {\em translation} by $k$ is a shift of everything by $k$ (I do not know
why this is called ``translation''). Translation invariance means,
for example, that the probability of going from $m$ to $l$ in $s$ steps
is the same as the probability of going from $m+k$ to $l+k$ in $s$ steps:
$P(X(t+s) = l \mid X(t) = m) = P(X(t+s) = l+k \mid X(t) = m+k)$.
It is common to simplify general discussions by choosing $k$ so that
$X(0) = 0$. Mathematicians often say ``without loss of generality''
or ``w.l.o.g.'' when doing so.
Often, particularly when discussing multidimensional random walk, we use
$x$, $y$, etc.\ instead of $j$, $k$, etc.\ to denote lattice points
(states of the Markov chain). Probabilists often use lower case Latin
letters for general possible values of a random variable, while using the
capital letter for the random variable itself. Thus, we might write
$P_{xy} = P(X(t+1) = y \mid X(t) = x)$. As an exercise in definition
unwrapping, review Lecture 1 and check that this is the same as
$P_{X(t),x} = P(X(t+1) = x \mid {\cal F}_t)$.
\para Gaussian approximation, drift, and volatility:
We can write $X(t+1) = X(t) + Y(t)$, where $P(Y(t) = 1) = a$,
$P(Y(t) = 0) = b$, and $P(Y(t) = -1) = c$. The random variables
$Y(t)$ are independent of each other because of the Markov property
and homogeneity. Assuming (without loss of generality) that $X(0)=0$,
we have
\begin{equation}
X(t) = \sum_{s=0}^{t-1} Y(s) \; ,
\label{XtSum} \end{equation}
which expresses $X(t)$ as a sum of {\em iid} (independent and identically
distributed) random variables. The central limit theorem then tells us
that for large $t$, $X(t)$ is approximately Gaussian with mean
$\mu t$ and variance $\sigma^2 t$, where $\mu = E[Y(t)] = a - c$
and $\sigma^2 = \mbox{var}[Y(t)] = a+c-(a-c)^2$. These are called
{\em drift} and {\em volatility}\footnote{People use the term
{\em volatility} in two distinct ways. In the Black Scholes theory,
volatility means something else.} respectively. The mean and variance
of $X(t)$ grow linearly in time with rate $\mu$ and $\sigma^2$
respectively. Figure 1 shows some probability distributions for
simple random walk.
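A Monte Carlo sketch of the drift and volatility formulas, with hypothetical values of $a$, $b$, $c$:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 0.3, 0.5, 0.2             # hypothetical up / stay / down probabilities
t, n_paths = 100, 50000

# X(t) is a sum of iid increments Y in {+1, 0, -1}
Y = rng.choice([1, 0, -1], size=(n_paths, t), p=[a, b, c])
X = Y.sum(axis=1)

mu = a - c                          # drift E[Y]
sigma2 = a + c - (a - c) ** 2       # variance per step
```

The sample mean and variance of the simulated $X(t)$ should be close to $\mu t$ and $\sigma^2 t$ respectively.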
\begin{figure}
\begin{center}
\includegraphics[height=4.0in]{l2Figures/srw.eps}
\caption{The probability distributions after $T=8$ (top) and $T=60$ (bottom)
steps for simple random walk. The smooth curve and circles represent the
central limit theorem Gaussian approximation. The plots have different
probability and $k$ scales. Values not shown have very small probability.
\label{SRWfig}}
\end{center}
\end{figure}
\para Trees:
Simple random walk can be thought of as a sequence of decisions.
At each time you decide: up($A$), stay($B$), or down($C$). A more
general sequence of decisions is a {\em decision tree}. In a general
decision tree, making choice $A$ at time $0$ then $B$ at time one would
have a different result than choosing first $B$ then $A$. After $t$
decisions, there could be $3^t$ different decision paths and results.
The simple random walk decision tree is {\em recombining}, which means that
many different decision paths lead to the same $X(t)$. For example,
starting (w.l.o.g.) with $X(0) = 0$, the paths $ABB$, $CAA$, $BBA$, etc.\
all lead to $X(3)=1$.
A recombining tree is much smaller than a general decision tree.
For simple random walk, after $t$ steps there are $2t+1$
possible states, instead of up to $3^t$. For $t=10$, this is
$21$ instead of about $60$ thousand.
\para Urn models:
Urn models illustrate several features of more general random walks.
Unlike simple random walk, urn models are {\em mean reverting} and have
{\em steady state probabilities} that determine their large time behavior.
We will come back to them when we discuss {\em scaling} in future lectures.
The simple urn contains $n$ balls that are identical except for their
color. There are $k$ red balls and $n-k$ green ones. At each step,
someone chooses one of the balls at random, with each ball equally likely
to be chosen. He or she replaces the chosen ball with a fresh ball that
is red with probability $p$ and green with probability $1-p$. All choices
are independent. The number of red balls decreases by one if he or she
removes a red ball and returns a green one. This happens with probability
$(k/n)\cdot(1-p)$. Similarly, the $k \rightarrow k+1$ probability is
$((n-k)/n)\cdot p$. In formal terms, the state space is the integers
from $0$ to $n$ and the transition probabilities are
$$
P_{k,k-1} = \frac{k(1-p)}{n} \; , \;\;\;
P_{kk} = \frac{(2p-1)k + (1-p)n}{n} \; , \;\;\;
P_{k,k+1} = \frac{(n-k)p}{n} \; ,
$$
$$
P_{jk} = 0 \mbox{ otherwise.}
$$
As a check that these formulas are right, note that $P_{k,k-1} + P_{kk} + P_{k,k+1} = 1$.
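The transition probabilities can be assembled and checked numerically; the urn size and replacement probability are hypothetical:

```python
import numpy as np

n, p = 10, 0.3                      # hypothetical urn size and replacement probability
P = np.zeros((n + 1, n + 1))
for k in range(n + 1):
    down = k * (1 - p) / n          # draw a red ball, put back a green one
    up = (n - k) * p / n            # draw a green ball, put back a red one
    P[k, k] = 1.0 - up - down       # equals ((2p-1)k + (1-p)n)/n
    if k > 0:
        P[k, k - 1] = down
    if k < n:
        P[k, k + 1] = up
```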
\para Urn model steady state:
For the simple urn model, the probabilities $u(k,t) = P(X(t)=k)$
converge to steady state probabilities, $v(k)$, as $t \rightarrow \infty$.
This is illustrated in Figure (\ref{UrnProbfig}). The steady state
probabilities are
$$
v(k) = {n \choose k} p^k (1-p)^{n-k} \;.
$$
\begin{figure}
\begin{center}
\includegraphics[height=4.0in]{l2Figures/UrnProb.eps}
\caption{The probability distributions for the simple urn model
plotted every $T$ time steps. The first curve is blue, low, and flat.
The last one is red and most peaked in the center. The computation starts
with each state being equally likely. Over time, states near the edges
become less likely.}
\label{UrnProbfig}
\end{center}
\end{figure}
The steady state probabilities have the property that if $u(k,t) = v(k)$
for all $k$, then $u(k,t+1) = v(k)$ also for all $k$. This is
{\em statistical steady state} because the probabilities have reached
steady state values though the states themselves keep changing, as
in Figure (\ref{UrnPathsfig}).
\begin{figure}
\begin{center}
\includegraphics[height=4.0in]{l2Figures/UrnPaths.eps}
\caption{A Monte-Carlo sampling of 11 paths from the simple urn model.
At time $t=0$ (the left edge), the paths are evenly spaced within the state
space.}
\label{UrnPathsfig}
\end{center}
\end{figure}
In matrix vector notation, we can form the row vector, $v$, with entries
$v(k)$. Then $v$ is a statistical steady state if $vP=v$. It is no
coincidence that $v(k)$ is the probability of getting $k$ red balls
in $n$ independent trials with probability $p$ for each trial. The
steady state expected number of red balls is
$$
E_v[X] = np \; ,
$$
where the notation $E_v[\,\cdot\,]$ refers to expectation in the probability
distribution $v$.
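The steady state relation $vP = v$ and the mean $E_v[X] = np$ can be checked numerically for a hypothetical small urn:

```python
import numpy as np
from math import comb

n, p = 10, 0.3                      # hypothetical urn parameters
P = np.zeros((n + 1, n + 1))
for k in range(n + 1):
    down = k * (1 - p) / n
    up = (n - k) * p / n
    P[k, k] = 1.0 - up - down
    if k > 0:
        P[k, k - 1] = down
    if k < n:
        P[k, k + 1] = up

# binomial steady state v(k) = C(n,k) p^k (1-p)^(n-k)
v = np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])
```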
\para Urn model mean reversion:
If we let $m(t)$ be the expected value of $X(t)$, then a calculation
using the transition probabilities gives the relation
\begin{equation}
m(t+1) = m(t) + \frac{1}{n} \left( np - m(t) \right) \; .
\label{UrnMean1} \end{equation}
This relation shows not only that $m(t) = np$ is a steady state value
($m(t) = np$ implies $m(t+1) = np$), but also that $m(t) \rightarrow np$
as $t \rightarrow \infty$ (if $r(t) = m(t) - np$, then
$r(t+1) = \alpha r(t)$ with
$\left| \alpha \right|= \left| 1-\frac{1}{n} \right|<1$).
Another way of expressing mean reversion will be useful in discussing
stochastic differential equations later. Because the urn model is a
Markov chain,
$$
E\left[X(t+1) \mid {\cal F}_t \right] = E\left[X(t+1) \mid X(t) \right] \; .
$$
Again using the transition probabilities, we get
\begin{equation}
E\left[X(t+1)\mid {\cal F}_t\right] = X(t)
+ \frac{1}{n} \left( np - X(t) \right) \; .
\label{UrnMean2} \end{equation}
If $X(t) > np$, the expected increment
$$
E[\Delta X(t) \mid {\cal F}_t] = E[X(t+1) - X(t) \mid {\cal F}_t]
= \frac{1}{n} \left( np - X(t) \right)
$$
is negative. If $X(t) < np$, it is positive.
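The mean recursion (\ref{UrnMean1}) is easy to iterate; the urn parameters and starting mean are hypothetical:

```python
n, p = 10, 0.3                      # hypothetical urn parameters
m = 0.0                             # assume E[X(0)] = 0
history = [m]
for t in range(100):
    m = m + (n * p - m) / n         # m(t+1) = m(t) + (np - m(t))/n
    history.append(m)
# the gap m(t) - np shrinks by the factor 1 - 1/n each step, so m(t) -> np
```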
\para Boundaries:
The terms {\em boundary}, {\em interior}, {\em region}, etc.\ as used
in the general discussion of Markov chain hitting probabilities come
from applications in lattice Markov chains such as simple random walk.
For example, the region $x > \beta$ has boundary $x=\beta$.
The quantities
$$
u(x,t) = P(X(t) = x \mbox{ and } X(s) > \beta \mbox{ for } 0 \leq s \leq t)
$$
satisfy the forward equation (just (\ref{ffe}) in this special case)
$$
u(x,t+1) = a u(x-1,t) + bu(x,t) + c u(x+1,t)
$$
for $x>\beta$ together with the {\em absorbing boundary condition}
$u(\beta,t) = 0$. We could create a finite state space Markov chain by
considering a region bounded on both sides, with absorbing boundary
conditions at each end.
\para Lattices:
The $d$ dimensional lattice with spacing $h>0$ is the set
of points $hx = (hx_1,\ldots,hx_d)$, where $x$ are integer lattice points.
In the present discussion, the scaling is irrelevant, so we use the
unit lattice.
We say that lattice points $x$ and $y$ are {\em neighbors} if
$$
\left| x_j - y_j \right| \leq 1 \mbox{ for all coordinates }j = 1, \ldots,d \;.
$$
\end{document}