(This section assumes familiarity with basic probability theory using mathematicians' terminology. References on this include the probability books by G. C. Rota, W. Feller, Hoel and Stone, and B. V. Gnedenko.)
Many discrete-time, discrete-state-space stochastic models are Markov chains.
Such a Markov chain is characterized by its state space, $\mathcal{S}$, and its
transition matrix, $P$, whose entries are the transition probabilities
\[
p(x,y) = \Pr\left(X_{t+1} = y \mid X_t = x\right).
\]
We use the following notations:
\[
0 \le p(x,y) \le 1 ,
\]
and
\[
\sum_{y \in \mathcal{S}} p(x,y) = 1 \quad \text{for all } x \in \mathcal{S}.
\]
The first is because the $p(x,y)$ are probabilities, the second because the state $x$ must go somewhere, possibly back to $x$. It is not true that $\sum_{x \in \mathcal{S}} p(x,y) = 1$.
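For concreteness, here is a minimal NumPy sketch (the three-state matrix is made up for illustration) checking these two properties and showing that the column sums are unconstrained:

```python
import numpy as np

# A made-up 3-state transition matrix; rows are indexed by the current
# state x, columns by the next state y.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.5, 0.3]])

# Every entry is a probability...
assert np.all((P >= 0.0) & (P <= 1.0))
# ...and every row sums to 1: from x the chain must go somewhere.
assert np.allclose(P.sum(axis=1), 1.0)
# The column sums are NOT constrained to equal 1.
print(P.sum(axis=0))  # [0.8, 1.4, 0.8]
```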
The Markov property is that knowledge of the state at time $t$ is all the information about the present and past relevant to predicting the future. That is:
\[
\Pr\left(X_{t+1} = y \mid X_t = x,\ X_{t-1} = z_1,\ X_{t-2} = z_2,\ \ldots\right) = p(x,y)
\]
no matter what extra history information ($X_{t-1} = z_1$, $X_{t-2} = z_2$, $\ldots$) we have.
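A simulation makes the point concrete: to sample $X_{t+1}$ one only ever consults the current state, never the history. A minimal sketch, reusing the matrix `P` from above (the function name `step` is made up):

```python
rng = np.random.default_rng(seed=0)

def step(x, P, rng):
    """Sample X_{t+1} given X_t = x. The next state is drawn from
    row x of P; no earlier history enters the computation."""
    return rng.choice(len(P), p=P[x])

x = 0            # X_0
path = [x]
for t in range(10):
    x = step(x, P, rng)
    path.append(x)
print(path)      # one realization of X_0, ..., X_10
```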
This may be thought of as a lack of long-term memory. It may also be
thought of as a completeness property of the model: the state space is rich
enough to characterize the state of the system at time $t$ completely.
The evolution equation for the probabilities $u(x,t) = \Pr(X_t = x)$ is found using conditional probability:
\[
u(x, t+1) = \sum_{y \in \mathcal{S}} u(y, t)\, p(y, x) . \tag{1}
\]
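Written as code, equation (1) is a double loop over states. A sketch under the conventions above (`evolve` is a name invented here):

```python
def evolve(u, P):
    """One step of (1): u(x, t+1) = sum over y of u(y, t) * p(y, x)."""
    n = len(u)
    u_next = np.zeros(n)
    for x in range(n):
        for y in range(n):
            u_next[x] += u[y] * P[y, x]
    return u_next

u0 = np.array([1.0, 0.0, 0.0])  # start in the first state with probability 1
print(evolve(u0, P))            # equals row 0 of P: [0.5, 0.3, 0.2]
```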
To express (1) in matrix form, we suppose that the state space, $\mathcal{S}$,
is finite, and that the states have been numbered $x_1$, $\ldots$, $x_n$.
The transition matrix, $P$, is $n \times n$ and has $(i,j)$ entry
$p(x_i, x_j)$. We sometimes conflate $i$ with $x_i$ and
write $p(i,j)$ for $p(x_i, x_j)$; until you start programming the computer, there is
no need to order the states. With this convention,
(1) can be interpreted
as vector-matrix multiplication if we define a row vector $u(t)$
with components $u_k(t)$,
where we have written $u_k(t)$
for $u(x_k, t)$. As long as ordering is
unimportant, we could also write $u_x(t)$ for $u(x,t)$. Now,
(1) can be rewritten
\[
u(t+1) = u(t) P . \tag{2}
\]
Since $u(t)$ is a row vector, the expression $P u(t)$ does
not make sense because the dimensions of the matrices are incompatible
for matrix multiplication. The
convention of using a row vector for the probabilities, and therefore putting
the vector to the left of the matrix, is common in applied probability.
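In NumPy the row-vector convention is a one-liner, and the dimension mismatch becomes visible if the row vector is made explicit. Continuing the sketch above:

```python
u_row = u0.reshape(1, -1)   # explicit 1 x n row vector
u1 = u_row @ P              # (1 x n)(n x n) -> 1 x n, as in (2)
assert np.allclose(u1[0], evolve(u0, P))
# P @ u_row would raise a ValueError: (n x n)(1 x n) is incompatible.
```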
The relation (2) can be used repeatedly to yield
\[
u(t) = u(0) P^t , \tag{3}
\]
where $P^t$ means $P$ to the power $t$, not the transpose of $P$.
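Equation (3) can be checked against repeated application of (2); a short sketch using `numpy.linalg.matrix_power`:

```python
t = 5
u_t = u0 @ np.linalg.matrix_power(P, t)   # u(0) P^t, per (3)

u_iter = u0.copy()
for _ in range(t):                        # apply (2) t times
    u_iter = u_iter @ P
assert np.allclose(u_t, u_iter)
```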