4. Economies as Dynamical Systems

4.1. Reduced-form representation

Is there a common theme underlying the different models that we will study, models that span so many decades and modelling philosophies?

Fortunately, the answer is yes.

The similarity is in the “final” mathematical or statistical form, i.e., the “reduced-form” representation of a model’s dynamics.

What may differ within the generic form are the different theories’ implied magnitudes and directions of causal relationships.

Even if the models vary in their underlying assumptions or economic-political philosophies, at the end of the day, after we have solved our agents’ decision problems and their interactions through markets, our model’s (dynamic) equilibrium can be represented by a recursive or Markovian state-space form.

This vital recursive mapping will suffice to describe the evolution of the model economy. End users can then take their preferred model and its resulting recursive system to do things like fitting the model to data and testing it empirically, or performing counterfactual policy analyses and simulations.

4.2. Recursive maps by example

Suppose, for now, we let a macroeconomy be represented by a simple probability model.

Warning

Note that there is no explicit economic theory behind this model (yet)! The (implicit) workflow here, as was the case with old-fashioned policy modelling, is:

  • Write down a statistical model
  • Obtain values for unconstrained parameters. (Hold this thought for now; we’ll come to this point later).
  • Then weave an economic/policy narrative around the statistical relationship.

So, for the moment, we are undertaking a modelling exercise like that of the policy modellers from the 1950s-1960s. Such policy-modelling practice survives today in the form of statistical vector autoregression (VAR) models.

To make sure we all start from the same page, consider a linear probability model,

\[Y_{t} = \beta X_{t} + \gamma \varepsilon_{t}, \qquad \varepsilon_{t} \sim \varphi\]

where

  • \(\beta\) and \(\gamma\) are given parameters;
  • \(Y_{t}\) is some endogenous variable, say, real GDP (let’s call this the state of the economy);
  • \(X_{t}\) is some variable that is observable—to both the allegorical agent in the model economy and to the statistician/modeller—at data location \(t\). Suppose the index variable \(t\) keeps track of natural time. Assume time is countable: \(t \in \mathbb{N} := \{0, 1, 2, ...\}\); and
  • \(\varepsilon_{t}\) is an exogenous shifter to the state of the economy which is governed by a given probability distribution \(\varphi\).

This linear probability model should be familiar to any reader with a first course in statistics or econometrics.

What if we make the content of \(X_{t}\) explicit?

Example 1

\(X_{t} := Y_{t-1}\).

Plug this into the previous linear probability model to get

\[Y_{t} = \beta Y_{t-1} + \gamma \varepsilon_{t}, \qquad \varepsilon_{t} \sim \varphi\]

This is a scalar first-order linear stochastic difference equation (LSDE), i.e., an AR(1) process; a minimal simulation sketch is given immediately below. After that, we keep working with the reduced-form model of an economy and consider a second example.
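To fix ideas, here is a minimal Python sketch that simulates one sample path of this AR(1) process. The parameter values \(\beta = 0.9\), \(\gamma = 1\), the standard normal choice of \(\varphi\), and the seed are all purely illustrative assumptions.

```python
import numpy as np

# Assumed, purely illustrative parameter values
beta, gamma = 0.9, 1.0
T = 100                            # number of periods to simulate
rng = np.random.default_rng(1234)

Y = np.empty(T + 1)
Y[0] = 0.0                         # initial condition Y_0
eps = rng.standard_normal(T + 1)   # epsilon_t ~ phi, here assumed standard normal

# Iterate the recursion Y_t = beta * Y_{t-1} + gamma * eps_t
for t in range(1, T + 1):
    Y[t] = beta * Y[t - 1] + gamma * eps[t]

print(Y[:5])                       # first few realizations of the simulated state
```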

Example 2

\(X_{t} := Y_{t-1}\) and \(\varepsilon_{t} = 1\) for all \(t\).

Plug this into the previous linear probability model to get

\[Y_{t} = \beta Y_{t-1} + \gamma,\]

As long as \(\beta \neq 1\), a particular solution to this deterministic difference equation exists and takes the form of a constant solution:

\[Y_{t} = Y = \frac{\gamma}{1-\beta},\]

for all \(t \in \mathbb{N}\).

We often call this solution \(Y\) a stationary point, or, a steady state.

Exercise

Show that the general solution to this difference equation is

\[Y_{t} = \left( Y_{0} - Y\right)\beta^{t} + Y.\]
  1. Explain in words what this function says.
  2. Pick some numerical values for the parameters \((\beta,\gamma)\). Write Python code and plot the general solution as a graph in \((t, Y_{t})\)-space (a possible starting sketch follows below).
      1. What happens to your graph if \(| \beta | > 1\)?
      2. What happens to your graph if \(| \beta | < 1\)?
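As a possible starting point for part 2, here is a minimal Python sketch. The values \(\beta = 0.9\), \(\gamma = 1\) and \(Y_{0} = 5\) are illustrative assumptions; re-run it with \(|\beta| > 1\) to answer the last two parts.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed illustrative parameters; try |beta| > 1 as well
beta, gamma = 0.9, 1.0
Y_ss = gamma / (1.0 - beta)        # steady state Y = gamma / (1 - beta)
Y0 = 5.0                           # an assumed initial condition
T = 50

t = np.arange(T + 1)
Y = (Y0 - Y_ss) * beta**t + Y_ss   # the general solution Y_t = (Y_0 - Y) beta^t + Y

plt.plot(t, Y, marker="o", label=r"$Y_t$")
plt.axhline(Y_ss, linestyle="--", label="steady state")
plt.xlabel("t")
plt.ylabel(r"$Y_t$")
plt.legend()
plt.show()
```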

4.3. Linear stochastic difference equation systems

Consider now the generalization of the scalar (univariate) example above to a vector of state variables. We often call such systems linear stochastic difference equations (LSDE):

\[Y_{t} = A Y_{t-1} + C \varepsilon_{t}, \qquad \varepsilon_{t} \sim \varphi,\]

where \(Y_{t}\) is now a vector of endogenous state variables, \(A\) and \(C\) are conformable coefficient matrices, and \(\varepsilon_{t}\) is a vector of exogenous shocks.

Note

Many empirical and policy models will have (approximate) solutions of this form. In this course, we will begin with models whose solutions are either exact or approximate linear recursive self-maps, as in all the toy examples above.

What if we have a model with arbitrary dependency of its current state on an arbitrarily long record of its past? Not a problem. We will see that in general, we can re-define the problem to make it Markovian or recursive again. The trick will be in expanding the notion of the model’s state space and re-defining appropriate “dummy” or auxiliary state variables.
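For instance (a generic illustration, not tied to any particular model in this course), suppose the current state depends on two lags:

\[Y_{t} = \beta_{1} Y_{t-1} + \beta_{2} Y_{t-2} + \gamma \varepsilon_{t}.\]

Defining the auxiliary state vector \(x_{t} := (Y_{t}, Y_{t-1})'\), we can stack the system as

\[\begin{split}\underbrace{\left( \begin{array}{c} Y_{t} \\ Y_{t-1} \end{array} \right)}_{x_{t}} = \underbrace{\left( \begin{array}{cc} \beta_{1} & \beta_{2} \\ 1 & 0 \end{array} \right)}_{A} \left( \begin{array}{c} Y_{t-1} \\ Y_{t-2} \end{array} \right) + \underbrace{\left( \begin{array}{c} \gamma \\ 0 \end{array} \right)}_{C} \varepsilon_{t},\end{split}\]

which is a first-order, hence Markovian, system in the expanded state \(x_{t}\).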

4.4. Markov Chains

It turns out that the previous probability models, viewed as specific representations of stochastic processes, are examples of a class of stochastic processes called Markov processes.

A stochastic process \(\{\varepsilon_{t}\}\) has the Markov property if for all \(k\geq 1\) and all \(t\),

\[\Pr \left( \left. \varepsilon_{t+1}\right\vert \varepsilon_{t},\varepsilon_{t-1},...,\varepsilon_{t-k}\right) =\Pr \left( \left. \varepsilon_{t+1}\right\vert \varepsilon_{t}\right) .\]

In words, this says that, when the present is known, the future and the past are conditionally independent. Informally, we say:

\[\Pr \left( \left. \text{Future} \ \right\vert \text{Present and Past}\right) =\Pr \left( \left. \text{Future} \ \right\vert \text{Present}\right)\]

or

\[\Pr \left( \left. \text{Future and Past} \ \right\vert \text{Present}\right) =\Pr \left( \left. \text{Future} \ \right\vert \text{Present}\right) \Pr \left( \left. \text{Past} \ \right\vert \text{Present}\right).\]

A very special and practical class of Markov processes is called Markov Chains. In many applications, researchers approximate a continuous Markov process, e.g., the AR(1) model above, as a finite-state-space Markov chain for computational tractability. Here we will focus on time-homogeneous, finite-state-space Markov Chains.

4.4.1. Time-homogeneous and finite-state Markov chains

We note a few important results for the finite-state, time-homogeneous Markov chain and the continuous state Markov chain or Markov Process.

A Markov chain is probably the simplest stochastic process one can utilize to model sequences of random variables. We will focus on time-homogeneous finite-state Markov chains here. Recall the following definition.

Let’s suppose the number of states for \(\varepsilon_{t}\) is restricted to a finite constant \(n \in \mathbb{N}\). So \(\varepsilon_{t}\) can only take on finitely many values, indexed by \(1,...,n\).

A time-invariant (homogeneous) Markov chain is defined by:

  1. The finite state space for the Markov chain \(S =\{s_1,...,s_n\}\).

  2. An \(n\times n\) transition matrix \(P\) that contains the probabilities of moving from one state to another in one period. So an entry in this matrix is

    \[P_{ij}=\Pr \left( \left. \varepsilon_{t+1}=s_{j}\right\vert \varepsilon_{t}=s_{i}\right)\]

    for \(i,j=1,...,n\).

  3. A \(1 \times n\) vector \(\lambda _{0}\) – containing the initial distribution across states – whose \(i\)th element is the probability of being in state \(i\) at time \(0\):

    \[\lambda _{0i}=\Pr \left( \varepsilon_{0}=s_{i}\right) .\]

For this definition to be valid, the matrix \(P\) and the vector \(\lambda_{0}\) must satisfy the following:

  1. \(P_{ij}\geq 0\),

  2. The rows of \(P\) are probability distributions and so must sum to \(1\):

    \[\sum_{j=1}^{n}P_{ij} = \Pr \left( \left. \varepsilon_{1}=s_{1} \cup \varepsilon_{1}=s_{2} \cup \cdots \cup \varepsilon_{1}=s_{n} \ \right\vert \ \varepsilon_{0}=s_{i}\right) = 1,\]
  3. \(\sum_{i=1}^{n}\lambda _{0i}=1\).

A matrix \(P\) with nonnegative entries whose rows each sum to one, \(\sum_{j=1}^{n}P_{ij}=1\) for every \(i\), is called a stochastic matrix, Markov matrix, or transition probability matrix.
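To make this concrete, here is a minimal Python sketch of a three-state chain. The transition matrix and initial distribution are purely illustrative assumptions; the checks verify the conditions above.

```python
import numpy as np

# An assumed, purely illustrative 3-state transition matrix:
# row i holds Pr(eps_{t+1} = s_j | eps_t = s_i)
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

# An assumed initial distribution over the states {s_1, s_2, s_3}
lambda0 = np.array([1.0, 0.0, 0.0])

# Validity checks: nonnegative entries, rows of P sum to 1, lambda0 sums to 1
assert np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0)
assert np.all(lambda0 >= 0) and np.isclose(lambda0.sum(), 1.0)
```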

4.4.2. Conditional probabilities

Recall that, for a Markov chain \(\left( P,\lambda_{0} \right)\), the probability of moving from state \(i\) to state \(j\) in one period is the element \(P_{ij}\) of the stochastic matrix \(P\):

\[P_{ij}=\Pr \left( \left. \varepsilon_{t+1}=s_{j}\right\vert \varepsilon_{t}=s_{i}\right) .\]

We can show that

\[\Pr \left( \left. \varepsilon_{t+k}=s_{j}\right\vert \varepsilon_{t}=s_{i}\right) =P_{ij}^{\left( k\right) }.\]

where \(P_{ij}^{\left( k\right) }\) is the \(i,j\) element of the matrix \(P^{k}=\underset{k\text{ times}}{\underbrace{PPP\cdots P}}\).
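In code, the \(k\)-step conditional probabilities are just entries of the matrix power \(P^{k}\). A minimal sketch, reusing the illustrative \(P\) assumed above:

```python
import numpy as np

# Same assumed, illustrative transition matrix as before
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

k = 3
Pk = np.linalg.matrix_power(P, k)   # P^k

# Pr(eps_{t+k} = s_3 | eps_t = s_1): row 1, column 3 (0-based indices 0 and 2)
print(Pk[0, 2])
```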

4.4.3. Unconditional probability distributions

Now we can recursively find the unconditional probability distribution of \(\varepsilon_{t}\) for each period \(t\). Start at \(t=0\) with the given initial \(1\times n\) unconditional distribution \(\lambda_{0}\). One step ahead we have

\[\lambda _{1}=\lambda_{0}P\]

so we have

\[\begin{split}\begin{aligned} \lambda _{2} & = &\lambda_{1}P=\lambda_{0}P^{2} \\ \vdots & \\ \lambda_{t} & = &\lambda _{0} P^{t}\end{aligned}\end{split}\]

where the \(i\)th element of \(\lambda_{t}\) is \(\Pr\left( \varepsilon_{t}=s_{i}\right)\). Or we have

\[\lambda_{t+1} = \lambda _{t} P.\]
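The recursion \(\lambda_{t+1} = \lambda_{t}P\) is straightforward to iterate numerically. A minimal sketch, with the same illustrative \(P\) and \(\lambda_{0}\) assumed earlier, which also confirms agreement with the closed form \(\lambda_{t} = \lambda_{0}P^{t}\):

```python
import numpy as np

# Same assumed, illustrative P and lambda_0 as before
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
lambda0 = np.array([1.0, 0.0, 0.0])

T = 10
lam = lambda0.copy()
for _ in range(T):
    lam = lam @ P                   # lambda_{t+1} = lambda_t P

# Closed form: lambda_T = lambda_0 P^T
assert np.allclose(lam, lambda0 @ np.linalg.matrix_power(P, T))
print(lam)                          # unconditional distribution of eps_T
```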

4.4.4. Stationary distributions

In this section we want to show that if \(P\) is a stochastic matrix, then we are guaranteed at least one stationary distribution \(\lambda^{\ast}\) that satisfies the set of equations \(\lambda^{\ast} = \lambda^{\ast}P\). The statement is easily proved by appealing to Brouwer’s fixed point theorem. But first we state some preliminary results (which may be obvious to some) that will help build toward proving the existence of an invariant distribution \(\lambda^{\ast}\).

If \(P\) is a stochastic matrix defining the mapping \(P: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}\), then \(P\) is \(d_{1}\)-nonexpansive on \(\mathbb{R}^{n}\), i.e. \(\|\lambda P \|_{1} \leq \| \lambda \|_{1}\) for any \(\lambda \in \mathbb{R}^{n}\).

Note that \(\sum_{j=1}^{n}P_{ij} = 1\). Since

\[\begin{split}\lambda P = \left( \begin{array}{c} \sum_{i=1}^{n} \lambda_{i}P_{i1} \\ \vdots \\ \sum_{i=1}^{n} \lambda_{i}P_{in} \end{array} \right)',\end{split}\]

then

\[\|\lambda P \|_{1} = \sum_{j=1}^{n} \left| \sum_{i=1}^{n} \lambda_{i}P_{ij} \right| \leq \sum_{j=1}^{n}\sum_{i=1}^{n} \left| \lambda_{i} \right| P_{ij} = \sum_{i=1}^{n} \left| \lambda_{i} \right| \sum_{j=1}^{n} P_{ij} = \sum_{i=1}^{n} \left| \lambda_{i} \right| = \| \lambda \|_{1}.\]

If the stochastic matrix \(P\) defines an operator \(P: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}\) that is \(d_{1}\)-nonexpansive on \(\mathbb{R}^{n}\), then \(P\) is continuous at every \(\lambda \in \mathbb{R}^{n}\).

Pick any two points \(\lambda, \mu \in \mathbb{R}^{n}\) and fix an \(\epsilon > 0\). By linearity of the map \(\lambda \mapsto \lambda P\) and the nonexpansiveness result above,

\[\| \lambda P -\mu P \|_{1} = \| \left( \lambda - \mu \right) P \|_{1} \leq \| \lambda - \mu \|_{1}.\]

So, choosing \(\delta = \epsilon\), whenever \(\| \lambda - \mu \|_{1} < \delta\) we have \(\| \lambda P -\mu P \|_{1} < \epsilon\). Therefore \(P\) is continuous on \(\mathbb{R}^{n}\) (in fact, Lipschitz continuous with constant \(1\)).

Let the set of probability distributions on \(S = \{s_{1},...,s_{n}\}\) be given by \(\mathcal{P}(S) =\{ \lambda \in \mathbb{R}^{n} \,|\, \lambda_{i} \geq 0, \ \sum_{i=1}^{n}\lambda_{i} = 1 \}\). That is, \(\mathcal{P}(S)\) is just the unit simplex in \(\mathbb{R}^{n}\). Now, if \(P\) is a stochastic matrix, then the image of \(\mathcal{P}(S)\) under \(P\) satisfies \(P[\mathcal{P}(S)] \subset \mathcal{P}(S)\). In other words, \(P\) restricted to the set of probability distributions \(\mathcal{P}(S)\) maps into \(\mathcal{P}(S)\) again.

Given a stochastic matrix \(P\), there is always at least one stationary distribution \(\lambda^{\ast}\) such that \(\lambda^{\ast} = \lambda^{\ast}P\), \(\lambda_{i}^{\ast} \geq 0\) and \(\sum_{i=1}^n \lambda^{\ast}_i = 1\).

We apply Brouwer’s fixed point theorem. We have previously shown that \(P\) is continuous on all \(\mathbb{R}^{n}\). Therefore it is also continuous on \(\mathcal{P}(S) \subset \mathbb{R}^{n}\). Next, \(\mathcal{P}(S)\) is a compact and convex subset of \(\mathbb{R}^{n}\). (Why?) Then by Brouwer’s fixed point theorem there exists at least one fixed point of the mapping \(P: \mathcal{P}(S) \rightarrow \mathcal{P}(S)\).
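Numerically, one common way to compute a stationary distribution is via a left eigenvector of \(P\) associated with the unit eigenvalue, normalized to sum to one. A minimal sketch with the illustrative \(P\) assumed above:

```python
import numpy as np

# Same assumed, illustrative transition matrix as before
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

# Left eigenvectors of P are right eigenvectors of P.T
eigvals, eigvecs = np.linalg.eig(P.T)
# Pick the eigenvector associated with the eigenvalue closest to 1
v = eigvecs[:, np.argmin(np.abs(eigvals - 1.0))].real
lam_star = v / v.sum()              # normalize so the entries sum to 1

assert np.allclose(lam_star @ P, lam_star)   # lambda* = lambda* P
print(lam_star)
```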

4.4.5. Asymptotic stability

The previous theorem only gives existence of a stationary distribution. When can we say that the Markov chain eventually forgets its starting point – i.e., when does a unique invariant distribution \(\lambda^{\ast} = \lim_{t \rightarrow \infty}\lambda_0 P^{t}\) exist? Before we state the condition, we recall a useful result:

\((\mathbb{R}^{n},\| \cdot \|_{1})\) is a complete metric space. Also, a closed subset of a complete metric space is also complete.

We leave the proof as an exercise for the motivated reader. Since \(\mathcal{P}(S) \subset \mathbb{R}^{n}\) is closed, \(( \mathcal{P}(S),\| \cdot \|_{1})\) is also complete. We will use this in the next main result on asymptotic stability.

If \(P_{ij} > 0\) for all \(i,j = 1,...,n\), then there exists a unique invariant distribution \(\lambda^{\ast} = \lim_{t \rightarrow \infty}\lambda_0 P^{t}\) satisfying \(\lambda^{\ast} = \lambda^{\ast}P\), regardless of \(\lambda_{0}\).

Let \(\mathcal{P}(S)\) be the set of all probability distributions on the state space \(S\). The Markov matrix defines an operator \(P: \mathcal{P}(S) \rightarrow \mathcal{P}(S)\) via \(\lambda_{t+1} = \lambda_{t}P\). Equip \(\mathcal{P}(S)\) with the metric induced by the \(\ell_{1}\) norm, \(d_{1}(x,y) = \| x-y \|_1 = \sum_{i=1}^{n} |x_i -y_i|\) for \(x,y \in \mathcal{P}(S)\). Claim: \(P\) is a contraction of modulus \(1-\delta\) on the complete metric space \((\mathcal{P}(S),\| \cdot \|_1)\), where \(\delta = \sum_{j=1}^n \min_{i} P_{ij} > 0\). We leave the proof of this claim as an exercise for the motivated reader. By the contraction mapping (Banach fixed-point) theorem, for any initial \(\lambda_0 \in \mathcal{P}(S)\), \(\lambda^{\ast} = \lim_{t \rightarrow \infty}\lambda_0 P^{t}\) exists and is unique.

The last theorem can in fact be stated under a weaker assumption.

If there exists some integer \(\tau \geq 1\) such that \(P_{ij}^{(\tau)} > 0\) for all \(i,j = 1,...,n\), then there exists a unique invariant distribution \(\lambda^{\ast} = \lim_{t \rightarrow \infty}\lambda_0 P^{t}\) satisfying \(\lambda^{\ast} = \lambda^{\ast}P\), regardless of \(\lambda_{0}\).

The proof is essentially the same: all we now require is that the operator \(P: \mathcal{P}(S) \rightarrow \mathcal{P}(S)\) is a \(\tau\)-stage contraction mapping, i.e., that \(P^{\tau}\) is a contraction.
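Since every entry of the illustrative \(P\) assumed above is strictly positive, the first asymptotic-stability theorem applies (with \(\tau = 1\)). A minimal sketch showing that two very different initial distributions converge to the same limit:

```python
import numpy as np

# Same assumed, illustrative transition matrix; all entries strictly positive
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

P_T = np.linalg.matrix_power(P, 50)       # P^t for large t

lam_a = np.array([1.0, 0.0, 0.0]) @ P_T   # start surely in state 1
lam_b = np.array([0.0, 0.0, 1.0]) @ P_T   # start surely in state 3

# Both chains forget their starting points and agree on the invariant distribution
assert np.allclose(lam_a, lam_b)
print(lam_a)
```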

Sometimes we want to look at functions of the underlying Markov chain. We state a law of large numbers for Markov chains.

Let \(h: S \rightarrow \mathbb{R}\). If \(\{ \varepsilon_t \}\) is a Markov chain \((P,\lambda_0)\) on the finite set \(S = \{s_1,...,s_n \}\) such that it is asymptotically stable with stationary distribution \(\lambda^{\ast}\), then as \(T \rightarrow \infty\),

\[\frac{1}{T+1} \sum_{t=0}^{T} h(\varepsilon_{t}) \rightarrow \sum_{j=1}^{n} h(s_j) \lambda^{\ast} (s_j)\]

with probability one.

Intuitively, this says that the sample average of length \(T+1\) of the stochastic process \(\{h(\varepsilon_{t}) \}\) converges (with probability one) to the “true” expected value with respect to the invariant distribution \(\lambda^{\ast}\), as the sample size gets larger. That is, the probability of realizing a sample path that does not satisfy this convergence is zero. So we can have a good approximation of the stationary distribution with the sample distribution of size \(T+1\) when \(T\) is “very large”.
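Here is a minimal simulation sketch of this law of large numbers, again using the illustrative \(P\) assumed above; the state values and the function \(h\) are also purely illustrative assumptions.

```python
import numpy as np

# Same assumed, illustrative transition matrix as before
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
s = np.array([-1.0, 0.0, 2.0])       # assumed numerical values for s_1, s_2, s_3

def h(x):
    return x ** 2                    # an arbitrary (assumed) function h: S -> R

rng = np.random.default_rng(42)
T = 100_000
state = 0                            # start in state s_1
total = 0.0
for t in range(T + 1):
    total += h(s[state])
    state = rng.choice(3, p=P[state])    # draw the next state from row `state` of P
sample_mean = total / (T + 1)

# Expectation of h under the stationary distribution lambda*
eigvals, eigvecs = np.linalg.eig(P.T)
lam_star = eigvecs[:, np.argmin(np.abs(eigvals - 1.0))].real
lam_star = lam_star / lam_star.sum()

print(sample_mean, h(s) @ lam_star)  # these should be close for large T
```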

4.5. Postscript

Now take a look back at what we’ve done here; then look ahead to the rest of this course. Overall, the models we’ll be studying will generally have solutions to their respective decision-making or equilibrium concepts in the form of a recursive function:

\[x_{t+1} = F(x_{t},w_{t})\]

where \((x,w) \mapsto F(x,w)\) is some general (possibly nonlinear) recursive map, \(x_{t}\) represents the state vector of the model economy, and \(w_{t}\) is some forcing process exogenous to the model system.

We can then take this resulting recursive map to simulate and study the dynamic behavior of the model economy.