7. Dynamic Programming

Previously, we used the special-case optimal growth model (with Cobb-Douglas production technology and log preference representation) to heuristically motivate two approaches to solving the benevolent social planner’s infinite-horizon program. As an aside, we also asked you to characterize that economy’s decentralized competitive equilibrium, in which allocations are sustained through markets with the mysterious Walrasian pricing system.

Coming up next, we’ll deal with the theory of dynamic programming—the nuts and bolts behind our heuristic example from the previous chapter.

7.1. Infinite Horizon Program

Let’s look at the infinite-horizon deterministic decision problem more formally this time. Our time domain is \(\mathbb{N} = \{0,1,...\}\). At each period \(t \in \mathbb{N}\), the set of possible locations of the state of the system is \(X \subset \mathbb{R}^n\). Let \(\{x_t\}_{t \in \mathbb{N}} \in X^{\mathbb{N}}\) denote the sequence of state vectors (i.e. a function from \(\mathbb{N}\) to \(X\)). Suppose the evolution of the state is summarized by a transition law that takes a current state-action pair and maps it to some next-period state, i.e. \((x_{t},u_{t}) \mapsto f(x_{t},u_{t})\). We often write this controllable Markov process as:

\begin{equation} x_{t+1} = f(x_t,u_t), \label{state transition 1} \end{equation}

with the initial position of the state \(x_0\) given.

Let the action set for each \(t\) be \(A \subset \mathbb{R}^k\). For each current state \(x_t \in X\), the set of feasible actions is \(\Gamma(x_t)\), where \(\Gamma: X \rightarrow P(A)\). [1]

Footnotes

[1]Recall that an object like \(\Gamma: X \rightarrow P(A)\) is called a multifunction, a set-valued function, or a correspondence, and \(P(A)\) denotes the set of all subsets of \(A\).

A decision maker standing at an arbitrary period \(0\) picks a sequence of controls \(\{u_t\}_{t \in \mathbb{N}}\) to obtain the maximal discounted sum of her per-period payoffs \(U(u_t,x_t)\) over time. We write the optimal value of this problem as the value function.

\begin{align} \text{(P1)} \qquad v(x_0) = & \sup_{\{u_t \in \Gamma(x_t)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t U(u_t,x_t) \label{Criterion P1}\\ & \text{s.t.} \notag \\ & \qquad x_{t+1} = f(x_t,u_t) \label{State transition P1} \\ & \qquad x_0 \ \text{given}. \label{Initial state P1} \end{align}

Since the decision problem is Markov, all we need to predict the future path of the state of the system is the current state \(x_t\) and the sequence of controls from \(t\) onward. Therefore the value of the optimal problem is a function of the current state only, \(v(x_t)\). In economics, this is also called the indirect (total discounted) utility.

Note

Often we assume a more specific problem:

\begin{align*} \text{(P1')} \qquad v(x_0) = & \max_{\{u_t \in \Gamma(x_t)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t U(u_t,x_t) \\ & \text{s.t.} \notag \\ & \qquad x_{t+1} = f(x_t,u_t) \label{State transition P1b} \\ & \qquad x_0 \ \text{given}. \end{align*}

In most applications, \(U: A \times X \rightarrow \mathbb{R}\) is a bounded, twice continuously differentiable function. If \(U\) is not assumed bounded, then it suffices for \(X\) and \(A\) to be compact, so that \(X \times A\) is compact by Tychonoff’s theorem for product topologies, and for \(U\) to be continuous on this product space. Continuity on a compact set then implies that \(U\) is bounded, and we will obtain a maximum in (P1).

The next major result we want to get to is the one that says the infinite-sequence solution to (P1) can be found recursively as the solution to the “Bellman (functional) equation”, also known as the Principle of Optimality. This boils down to the following indirect steps. First, we need to be able to find a well-defined value function \(v\) that yields the value of the infinite-sequence programming problem in (P1) starting from any initial state \(x_{0}\) and which also satisfies the Principle of Optimality. The existence of such an indirect utility function \(v\) is taken care of in Section From value function to Bellman functionals.

Second, we want to know if this \(v\) is unique. Moreover, we want to be able to say if a solution in terms of an optimal strategy, say \(\sigma^{\ast}\), exists, given \(v\). The groundwork for addressing these questions is done in Section [section: existence of optimal strategy].

Third, we may also wish to characterize conditions under which an optimal strategy is unique. In many applications, especially empirical ones, the researcher would like the model to deliver sharper predictions to take to the data. Uniqueness of a solution facilitates that, e.g., in the application of econometric methods. This property is often model-specific, so we will look at it by way of a familiar optimal-growth model (see Example). Finally, in Section Computing optimal strategies, we will reinforce what we learnt theoretically by applying the theoretical solution algorithm approximately on the computer.

7.1.1. The value function, histories and strategies

A neat way to think about (P1) is to use the construct of a history-dependent strategy. We’ll break (P1) down into its constituent parts, starting with the notion of histories, moving on to the construct of strategies, payoff flows and thus strategy-dependent total payoffs, and finally to the idea of the value function.

First we list a few notations:

  • A \(t\)-history: \(h^t = \{x_0,u_0,...,x_{t-1},u_{t-1},x_t\}\).
  • Set of all possible \(t\)-histories: Let \(H_0 = X\) and, for \(t = 1, 2, ...\), let \(H_t\) denote the set of all possible \(t\)-histories \(h^t\), so that \(H_t \subset (X \times A)^{t} \times X\).
  • Period \(t\) state under history \(h^t\): \(x_t[h^t]\).
  • A strategy \(\sigma = \{ \sigma_t(h^t)\}_{t=0}^{\infty}\) is a plan of action specified as a function of the revealed history, such that for each \(t \in \mathbb{N}\), \(\sigma_t : H_t \rightarrow A\) and the actions are feasible for each history: \(\sigma_t(h^t) \in \Gamma(x_t[h^t])\).
  • Set of all strategies: \(\Sigma\).

We shall now see how strategies define unique state-action vectors and thus a unique sequence of payoffs. Suppose our decision maker fixes her plan of action at the sequence \(\sigma\). From an initial state \(x_0\) and for a given \(\sigma\), we can pin down a unique trajectory for the state in future periods. How? We do this by recursion. First, set \(x_0(\sigma,x_0) = x_0\), so \(h^0(\sigma,x_0) = x_0\), and select \(u_0(\sigma,x_0) = \sigma_0(h^0(\sigma,x_0))\). This says that the time-\(0\) action is consistently selected from the \(0\)-th component of the strategy \(\sigma\), when starting out at state \(x_0\). Then we have

\[\begin{split}\begin{aligned} x_1(\sigma,x_0) =& f(x_0(\sigma,x_0),u_0(\sigma,x_0)) \\ h^1(\sigma,x_0) =& \{ x_0(\sigma,x_0),u_0(\sigma,x_0),x_1(\sigma,x_0)\} \end{aligned}\end{split}\]

Once the time-\(0\) state and action pin down the location of the state \(x_1 = x_1(\sigma,x_0)\), history \(h^1(\sigma,x_0)\) is collected. Given this history, at time \(1\), the decision maker again picks an action \(u_1 = u_1(\sigma,x_0)\) consistent with strategy \(\sigma\) starting out from \(x_0\):

\[\begin{split}\begin{aligned} u_1(\sigma,x_0) =& \sigma_1(h^1(\sigma,x_0)) \\ x_2(\sigma,x_0) =& f(x_1(\sigma,x_0),u_1(\sigma,x_0))\end{aligned}\end{split}\]

and so on. So in general for \(t \in \mathbb{N}\), we have the recursion under \(\sigma\) starting from \(x_0\) as

\[\begin{split}\begin{aligned} h^t(\sigma,x_0) =& \{ x_0(\sigma,x_0),u_0(\sigma,x_0),...,x_t(\sigma,x_0)\} \\ u_t(\sigma,x_0) =& \sigma_t(h^t(\sigma,x_0)) \\ x_{t+1}(\sigma,x_0) =& f(x_t(\sigma,x_0),u_t(\sigma,x_0))\end{aligned}\end{split}\]

So we can generate the infinite sequence of states and actions \(\{x_t(\sigma,x_0),u_t(\sigma,x_0)\}_{t \in \mathbb{N}}\). If we know these objects, we can also assign payoffs to them. More precisely, each strategy \(\sigma\), starting from initial state \(x_0\), generates a period-\(t\) return:

\[U_t(\sigma)(x_0) = U[ x_t(\sigma,x_0),u_t(\sigma,x_0)].\]

Define the total discounted returns from \(x_0\) under strategy \(\sigma\) as

\[W(\sigma)(x_0) = \sum_{t=0}^{\infty}\beta^t U_t(\sigma)(x_0).\]

By definition the value function is the maximal of such total discounted returns across all possible strategies:

\[v(x_0) = \sup_{\sigma \in \Sigma} W(\sigma)(x_0).\]

What this suggests is that we can construct all possible strategies from \(\Sigma\) and evaluate the discounted lifetime payoff of each strategy. (But again, there are uncountably many such infinite sequences of actions to consider!)
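To make this construction concrete, here is a minimal Python sketch that simulates the recursion \(h^t \mapsto u_t \mapsto x_{t+1}\) under one particular strategy and computes a truncated approximation to \(W(\sigma)(x_0)\). The primitives (a log/Cobb-Douglas growth-style problem) and the fixed-fraction strategy are illustrative assumptions chosen only so that the sketch runs; nothing here is claimed to be optimal.

```python
import numpy as np

# Illustrative primitives (assumptions for this sketch, not part of the
# general theory): a log/Cobb-Douglas growth-style example with state
# x = capital, action u = consumption, and full depreciation.
beta, alpha, A = 0.95, 0.3, 1.0
U = lambda x, u: np.log(u)              # per-period payoff U(x_t, u_t)
f = lambda x, u: A * x**alpha - u       # transition x_{t+1} = f(x_t, u_t)
Gamma = lambda x: (0.0, A * x**alpha)   # feasible actions: u_t in [0, A x_t^alpha]

# One particular strategy: consume a fixed fraction of output each period.
# A strategy may depend on the whole history h^t, but this one depends on
# h^t only through the current state x_t (a Markov strategy).
def sigma(t, history):
    x_t = history[-1]                   # current state is the last entry of h^t
    return 0.6 * A * x_t**alpha

def W_sigma(x0, T=600):
    """Truncated approximation of W(sigma)(x0) = sum_t beta^t U_t(sigma)(x0)."""
    history = [x0]                      # h^0 = {x_0}
    total, x = 0.0, x0
    for t in range(T):
        u = sigma(t, history)           # u_t = sigma_t(h^t)
        lo, hi = Gamma(x)
        assert lo <= u <= hi            # feasibility: u_t in Gamma(x_t)
        total += beta**t * U(x, u)      # accumulate beta^t U(x_t, u_t)
        x = f(x, u)                     # x_{t+1} = f(x_t, u_t)
        history += [u, x]               # h^{t+1} = {h^t, u_t, x_{t+1}}
    return total

print(W_sigma(x0=1.0))                  # discounted payoff of this one strategy
```

Enumerating and evaluating every strategy in \(\Sigma\) this way is of course infeasible; this is exactly the issue the recursive (Bellman) approach developed below resolves.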

A strategy \(\sigma^{\ast}\) is said to be an optimal strategy, if it induces the maximal total discounted return beginning from any initial state \(x_{0}\). That is, \(\sigma^{\ast}\) is optimal if,

\[W(\sigma^{\ast})(x_0) = v(x_0).\]

for all initial states \(x_0\). However, this does not say that an optimal strategy will always exist. We will come back to this issue later, as the second part of a three-part problem. We first need to show that the problem (P1) can be solved indirectly in terms of a well-defined value function \(v\). A related problem is that this value function may be unbounded, so that we may not be able to compare strategies; for example, two different strategies may each yield a total discounted reward of \(+\infty\).

So we have two problems at hand. Let’s deal with the second one first (i.e. whether \(v\) is well-defined), and then, we will come back to the first regarding existence of a solution in terms of an optimal strategy, \(\sigma^{\ast}\).

A typical assumption to ensure that \(v\) is well-defined would be the following:

Assumption [\(U\) bounded]

There is a real number \(K < +\infty\) such that \(| U(u_t,x_t) | \leq K\) for all \((u_t,x_t) \in A \times X\).

Alternatively, we could assume that the product space \(A \times X\) is compact, and \(U\) is continuous on \(A \times X\). Then \(U\) will automatically be bounded. Either of these assumptions will ensure that we have a well-defined value function. Specifically we will have

Lemma

Assume that \(U\) is bounded. Then \(v: X \rightarrow \mathbb{R}\) is bounded.

For any given \(x_0 \in X\), per-period payoffs are bounded in absolute value by \(K\) by assumption. So, for any strategy \(\sigma\), the total discounted reward is bounded by the geometric sum: \(|W(\sigma)(x_0)| \leq \sum_{t=0}^{\infty}\beta^{t}K = K/(1-\beta)\). By definition, \(v(x_0) = \sup_{\sigma}W(\sigma)(x_0)\), so \(v(x_0)\) is also bounded.

The last Lemma thus allows us to assign finite numbers when ordering or ranking alternative strategies.

Lemma

Assume that \(U\) is bounded. Given any \(x_0 \in X\), and any \(\epsilon > 0\), there exists a (possibly \(\epsilon\)-dependent) strategy, \(\sigma := \sigma_{\epsilon}(x_{0})\) such that \(W(\sigma)(x_0) \geq v(x_0) - \epsilon\).

You can prove this as follows: Fix any \(x_0 \in X\) and \(\epsilon >0\). Suppose there does not exist any \(\sigma\) such that \(W(\sigma)(x_0) \geq v(x_0) - \epsilon\). Then \(W(\sigma)(x_0) < v(x_0) - \epsilon\) for every \(\sigma \in \Sigma\), so that \(v(x_0) = \sup_{\sigma}W(\sigma)(x_0) \leq v(x_0) - \epsilon < v(x_0)\). We have a contradiction.

This Lemma will be useful in the next section where we will prove the Bellman Principle of Optimality. Recall that this is the first of the three parts involved in finding a solution to (P1).

7.1.2. From value function to Bellman functionals

Previously we concluded that we can construct all possible strategies from \(\Sigma\) and evaluate the discounted lifetime payoff of each strategy. An optimal strategy, \(\sigma^{\ast}\) is said to be one that yields maximal total discounted return \(v(x_0)\). At this stage, an optimal strategy, \(\sigma^{\ast}\), need not always exist. For now, though, even if it does exist, how do we evaluate all \(\sigma \in \Sigma\)?

First and foremost, we will need to prove one of the most important results: that the value function \(v: X \rightarrow \mathbb{R}\) of (P1) has a recursive representation. This recursive representation, called the Bellman (functional) equation, is intuitively like a machine (or operator) that maps a value function in the set of candidate value functions into the same set. The Bellman equation essentially says that one’s actions or decisions along the optimal path have to be dynamically consistent. That is, once we are on this path, there is no incentive to deviate from its prescription at any future decision node. Another way to say this is that if the planner were given the opportunity to reformulate her initial, date-\(0\) optimal plan (i.e., strategy) at a later date \(t>0\), the choices she makes contingent on the state at each future date would remain consistent with the original date-\(0\) plan.

Theorem [Bellman principle of optimality]

Let \(x\) denote the current state and \(x'\) the next-period state. For each \(x \in X\), the value function \(v: X \rightarrow \mathbb{R}\) of (P1) satisfies

(1)\[v(x) = \sup_{u \in \Gamma(x)} \{ U(x,u) + \beta v(x') \} \ \text{s.t.} \ x' = f(x,u)\]

To prove this statement, let \(W: X \rightarrow \mathbb{R}\) be defined by

\[W(x) = \sup_{u \in \Gamma(x)} \{ U(x,u) + \beta v(f(x,u)) \},\]

for any \(x \in X\). We will show that \(v(x) = W(x)\) for any \(x \in X\), in two steps.

Step 1.  Fix any \(x \in X\) and \(\epsilon >0\). Let \(u \in \Gamma(x)\) and \(x' = f(x,u)\). This says that the action is selected from the feasible set at state \(x\), and the state obeys the transition law \(f\) under action \(u\). Now pick a strategy \(\sigma\) that is feasible from the continuation state \(x'\). By Lemma [strategy exist], such a strategy can be chosen to satisfy:

\[W(\sigma)(x') \geq v(x') - \epsilon.\]

Then, since the strategy that chooses \(u\) at \(x\) today and follows \(\sigma\) from \(x'\) thereafter is itself feasible from \(x\), and \(v(x)\) is the supremum over all feasible strategies,

\[\begin{split}v(x) \geq & U(x,u) + \beta W(\sigma)(f(x,u)) \\ \geq & U(x,u) + \beta v(f(x,u)) - \beta\epsilon.\end{split}\]

Since this holds for all \(u \in \Gamma(x)\),

\[v(x) \geq \sup_{u \in \Gamma(x)} \{ U(x,u) + \beta v(f(x,u)) \} - \beta\epsilon,\]

and since \(\epsilon >0\) is arbitrary,

\[v(x) \geq \sup_{u \in \Gamma(x)} \{ U(x,u) + \beta v(f(x,u)) \} = W(x).\]

Step 2.  Fix any \(x \in X\) and \(\epsilon >0\). Let \(\sigma\) be a strategy such that

\[W(\sigma)(x) \geq v(x) - \epsilon.\]

Let the first action of \(\sigma\) starting at \(x\) be \(u = \sigma_0(x)\), so that \(x_1 = f(x,u)\). Define the continuation strategy under \(\sigma\), following \(\sigma_0(x)\), to be \(\sigma |_1\). Since \(\sigma |_1\) is a feasible (though not necessarily optimal) continuation strategy from \(x_1\),

\[v(x_1) \geq W (\sigma \mid_{1} )(x_1).\]

So then,

\[\begin{split}\begin{aligned} v(x) - \epsilon & \leq W(\sigma)(x) \\ & = U(x,u) + \beta W(\sigma \mid_1)[f(x,u)] \\ & \leq U(x,u) + \beta v(f(x,u)) \\ & \leq \sup_{u \in \Gamma(x)} \{ U(x,u) + \beta v(f(x,u)) \} = W(x).\end{aligned}\end{split}\]

Since \(\epsilon > 0\) is arbitrary,

\[v(x) \leq \sup_{u \in \Gamma(x)} \{ U(x,u) + \beta v(f(x,u)) \} = W(x).\]

Since, for any \(x \in X\), \(v(x) \geq W(x)\) and \(v(x) \leq W(x)\), then \(v(x) = W(x)\).

We have now shown that the sequence problem (P1) is the same as the recursive problem written down as the Bellman equation. The next big question is when the Bellman equation, and therefore (P1), has a solution. We can answer this by using the Bellman equation itself.

7.1.3. Existence of optimal strategy

Now, we look at the second part of our three-part recursive deconstruction of the infinite-sequence problem in (P1). You may wish to first review the material on metric spaces and functional analysis, e.g., in [Su1996] or [HL1996]. Readers already familiar with these concepts may proceed.

We have shown previously that the optimized value or the value function of the infinite-horizon sequence problem (P1) satisfied the Bellman functional equation. Our aim here is to do the following:

  1. Show that under regularity assumptions, there exists a unique value function that solves the Bellman equation.
  2. Show that a sequence of actions is an optimal strategy if and only if the discounted total payoff under this strategy also satisfies the Bellman equation for every current state.
  3. Consider a useful class of strategies called stationary strategies and show that there can exist a stationary strategy such that the discounted total payoff under this particular type of strategy also satisfies the Bellman equation for every current state.

We will apply the Banach fixed-point theorem, often known as the contraction mapping theorem. Recall we started by showing that we can write down a Bellman equation for the sequence problem. The problem then becomes one of not knowing what the value function \(v: X \rightarrow \mathbb{R}\) looks like. The idea behind this powerful theorem is that, under some regularity assumptions, we can start with any guess of \(v(x)\) for all \(x \in X\), and apply the Bellman operator to produce another guess of \(v(x)\). The theorem tells us that this iterative procedure will eventually converge to a unique value function \(v: X \rightarrow \mathbb{R}\) that satisfies “both sides” of the Bellman equation.

7.1.4. A small technical detour

We will continue dealing with bounded value functions. Let \(B (X)\) denote the set of all bounded functions from \(X\) to \(\mathbb{R}\). So then \(v \in B (X)\). We use the sup-norm metric to measure how “close” two functions \(v,w \in B (X)\) are. Let the sup-norm distance be given by

\[d_{\infty}(v,w) = \sup_{x \in X} \mid v(x)-w(x) \mid.\]

This distance function gives us the least upper bound of the absolute distance between two functions evaluated at every \(x \in X\). So in practice, if this real number converges to zero, it implies that the values of the two functions are “the same” at every \(x \in X\).

What do we mean by the Bellman operator? Let the map \(T\) on \(B(X)\) be defined as follows. For any \(w \in B(X)\), let \(Tw\) be a function such that

\[Tw(x) = \sup_{u \in \Gamma(x)} \{ U(x,u) + \beta w(f(x,u)) \}\]

at any \(x \in X\). Since \(U\) and \(w\) are bounded, then \(Tw \in B(X)\). So our Bellman equation (on the RHS) defines the operator \(T: B(X) \rightarrow B(X)\). It maps the set of bounded functions into itself. A fixed-point of this operator will give us the value function of (P1) that we have been seeking.
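As a concrete illustration of the operator \(T\) and the metric \(d_{\infty}\), here is a minimal sketch that applies \(T\) once to a candidate \(w \in B(X)\) represented on a finite grid. The discretization, the grid bounds, and the growth-style primitives (with the control taken to be next period's state, so that \(f(x,u)=u\)) are assumptions made purely for illustration.

```python
import numpy as np

# Assumed toy primitives for illustration: growth-type problem with
# state x and control u equal to next period's state, so f(x, u) = u.
beta, alpha, A = 0.95, 0.3, 1.0
grid = np.linspace(1e-2, 2.0, 200)        # finite grid approximating X

def U(x, u):
    return np.log(A * x**alpha - u)        # payoff of choosing next state u at x

def T(w):
    """One application of the Bellman operator: (Tw)(x) = sup_u {U(x,u) + beta*w(f(x,u))}."""
    Tw = np.empty_like(w)
    for i, x in enumerate(grid):
        feasible = grid < A * x**alpha     # Gamma(x): next states affordable at x
        values = U(x, grid[feasible]) + beta * w[feasible]
        Tw[i] = values.max()               # sup over the (discretized) feasible set
    return Tw

def d_inf(v, w):
    """Sup-norm distance between two functions represented on the grid."""
    return np.abs(v - w).max()

w0 = np.zeros_like(grid)                   # an arbitrary bounded guess w in B(X)
w1 = T(w0)
print(d_inf(w1, w0))                       # how far one application of T moved the guess
```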

Recall that a complete metric space is one where each Cauchy sequence in that space converges to a point in the same space. We then have an important property for our set of value functions.

Lemma

Let \(d(v,w) = \sup_{x \in X} \mid v(x)-w(x) \mid\) for \(v,w \in B (X)\). The metric space (\(B (X),d)\) is complete.

Let \(\{v_n\}\) be a Cauchy sequence in \(B (X)\), where for each \(n\), \(v_n : X \rightarrow \mathbb{R}\). For each fixed \(x \in X\), \(\{v_n (x)\}\) is a Cauchy sequence in \(\mathbb{R}\). Since \(\mathbb{R}\) is complete, \(\{v_n (x)\}\) converges; call the limit \(v(x)\), so that \(v: X \rightarrow \mathbb{R}\) and \(v_n (x) \rightarrow v(x)\) for every \(x \in X\). Since \(\{v_n\}\) is Cauchy in the sup-norm, it is uniformly bounded, so the pointwise limit \(v\) is bounded and \(v \in B (X)\). We have shown pointwise convergence of the sequence \(\{v_n (x)\}\) to a limit \(v(x) \in \mathbb{R}\) for each fixed \(x \in X\). Next we show that the sequence of functions \(\{v_n\}\) converges to \(v \in B(X)\) uniformly. Since \(\{v_n\}\) is a Cauchy sequence in \(B (X)\), for any \(\epsilon > 0\), there exists \(N(\epsilon) \in \mathbb{N}\) such that \(m,n \geq N(\epsilon)\) implies, for every \(x \in X\),

\[| v_n(x)-v_m(x) | < \epsilon.\]

Letting \(m \rightarrow \infty\) and using \(v_m(x) \rightarrow v(x)\), we obtain, for every \(x \in X\) and all \(n \geq N(\epsilon)\),

\[| v_n(x)-v(x) | \leq \epsilon.\]

So \(\{v_n\}\) converges to \(v\in B (X)\) uniformly (which is the same as sup-norm convergence).

Definition

Let \((S,d)\) be a metric space and the map \(T: S \rightarrow S\). Write \(Tw := T(w)\) for the value of \(T\) at \(w \in S\). \(T\) is a contraction with modulus \(0 \leq \beta < 1\) if \(d(Tw,Tv) \leq \beta d(w,v)\) for all \(w,v \in S\).

A contraction mapping \(T\) arises in a very useful result called the Contraction Mapping Theorem, or

Theorem [Banach Fixed Point Theorem]

If \((S,d)\) is a complete metric space and \(T: S \rightarrow S\) is a contraction, then there is a fixed point for \(T\) and it is unique.

First we prove existence – that \(T\) has at least one fixed point. Then we prove this fixed point is unique.

Step 1. Fix a \(w \in S\), let \(T^n w := T(T^{n-1}w)\). Since \(T\) is a contraction, then \(d(T^{n+1}w,T^n w) \leq \beta d(T^n w, T^{n-1} w)\), so that \(d(T^{n+1}w,T^n w) \leq \beta^n d(Tw,w)\). So for any \(m,n \in \mathbb{N}\) such that \(m > n\), we have

\[\begin{split}d(T^m w, T^n w) \leq & d(T^m w, T^{m-1} w) + ... + d(T^{n+1}w, T^n w) \qquad \text{(by triangle inequality)} \\ \leq & (\beta^{m-1} + ... + \beta^n)d(T w,w) \\ = & (1 + \beta + ... + \beta^{m-n-1})\beta^n d(Tw,w) \\ \leq & \frac{\beta^n}{1-\beta}d(Tw,w).\end{split}\]

Since \(w\) is fixed, \(d(Tw,w)\) is a fixed real number. Therefore, as \(n \rightarrow \infty\), \(d(T^m w, T^n w) \rightarrow 0\) for any \(m > n\), so \(\{T^n w\}\) is a Cauchy sequence. Since \(S\) is complete, \(\{T^n w\}\) converges to a limit \(v \in S\). To show that \(v\) is a fixed point, take \(w_0 := w\) (so that \(T^n w_0 \rightarrow v\)) and note that for all \(n \in \mathbb{N}\),

\[\begin{split}d(Tv,v) \leq & d(Tv,T^n w_0) + d(T^n w_0, v) \\ \leq & \beta d(v, T^{n-1}w_0) + d(T^n w_0, v).\end{split}\]

Since \(d(v, T^{n-1}w_0) \rightarrow 0\) and \(d(T^n w_0, v) \rightarrow 0\) as \(n \rightarrow \infty\), while the left-hand side does not depend on \(n\), it must be that \(d(Tv,v) = 0\), i.e., \(Tv = v\).

Step 2. Suppose \(v\) and \(\hat{v}\) were both fixed points of \(T\) and \(v \neq \hat{v}\). Then it must be that \(d(Tv,T \hat{v})= d(v,\hat{v}) > 0\). Since \(T\) is a contraction, then \(d(Tv,T \hat{v}) \leq \beta d(v,\hat{v})\). Since \(\beta < 1\), then \(d(Tv,T \hat{v}) < d(v,\hat{v})\). A contradiction. So \(v\) is the unique fixed point of \(T\).
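The proof is constructive: iterating \(T\) from any starting point \(w\) converges to the fixed point, and letting \(m \rightarrow \infty\) in the displayed bound gives the error estimate \(d(T^n w, v) \leq \beta^n d(Tw,w)/(1-\beta)\). Here is a minimal numerical sketch of this geometric convergence, using an assumed toy contraction on \((\mathbb{R},|\cdot|)\) rather than a Bellman operator.

```python
# A toy contraction on (R, |.|): T(x) = 0.5*x + 1 has modulus beta = 0.5
# and unique fixed point v = 2.  (Purely illustrative; any contraction works.)
beta = 0.5
T = lambda x: beta * x + 1.0

w = 10.0                                   # arbitrary starting point w in S
d_Tw_w = abs(T(w) - w)                     # d(Tw, w)
x = w
for n in range(1, 21):
    x = T(x)                               # x = T^n w
    error = abs(x - 2.0)                   # d(T^n w, v), with v = 2 the fixed point
    bound = beta**n * d_Tw_w / (1 - beta)  # theoretical bound from the proof
    assert error <= bound + 1e-12          # geometric convergence as predicted
    print(n, error, bound)
```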

We can make use of the following result to verify whether \(T\) is a contraction mapping.

Lemma [Blackwell’s sufficient conditions for a contraction]

Let \(M: B(X) \rightarrow B(X)\) be any map satisfying

  1. Monotonicity: For any \(v,w \in B(X)\), \(w \geq v \Rightarrow Mw \geq Mv\).
  2. Discounting: There exists a \(0 \leq \beta < 1\) such that \(M(w+c) \leq Mw + \beta c\), for all \(w \in B(X)\) and all constants \(c \geq 0\). (Define \((w+c)(x) = w(x) + c\).)

Then \(M\) is a contraction with modulus \(\beta\).

Let \(v,w \in B(X)\). For any \(x \in X\), we have

\[\begin{aligned} w(x) - v(x) \leq & \sup_{x \in X} | w(x) - v(x) | = \Vert w - v \Vert \Rightarrow w(x) \leq v(x) + \Vert w - v \Vert.\end{aligned}\]

Applying the operator \(M\), using monotonicity and then discounting, we have

\[Mw(x) \leq M(v + \Vert w - v \Vert)(x) \leq Mv(x) + \beta \Vert w - v \Vert.\]

so that \(Mw(x) - Mv(x) \leq \beta \Vert w - v \Vert\).

By the symmetric argument, \(v(x) - w(x) \leq \Vert w - v \Vert\), so that \(v(x) \leq w(x) + \Vert w - v \Vert\), and hence

\[Mv(x) \leq M(w + \Vert w - v \Vert)(x) \leq Mw(x) + \beta \Vert w - v \Vert\]

or \(Mv(x) - Mw(x) \leq \beta \Vert w - v \Vert\).

Combining the two inequalities, we have \(| Mw(x) - Mv(x) | \leq \beta \Vert w - v \Vert\) for every \(x \in X\), so then

\[\Vert Mw - Mv \Vert = \sup_{x \in X} | Mw(x) - Mv(x) | \leq \beta \Vert w - v \Vert.\]

Since \(0 \leq \beta < 1\), \(M\) is a contraction with modulus \(\beta\).

7.1.5. A fixed-point theorem for the Bellman equation

Now we are back on track and ready to prove that there exists a unique solution to the Bellman equation problem. First we can show that the Bellman operator defines a mapping \(T: B(X) \rightarrow B(X)\) that is a contraction mapping. Then we can just apply the Banach fixed point theorem to prove the existence and uniqueness of the solution.

Exercise

  1. Show that the Bellman equation (1) satisfies Blackwell’s sufficient conditions.

Theorem

The value function \(v: X \rightarrow \mathbb{R}\) of (P1) is the unique fixed point of the Bellman operator \(T: B(X) \rightarrow B(X)\). That is, if \(w \in B(X)\) is any function satisfying

\[w(x) = \sup_{u \in \Gamma(x)} \{ U(x,u) + \beta w(f(x,u)) \}\]

at any \(x \in X\), then it must be that \(w = v\).

By Blackwell’s sufficient conditions (see the exercise above), \(T: B(X) \rightarrow B(X)\) is a contraction with modulus \(\beta\). Since \((B(X),d_{\infty})\) is a complete metric space, \(v: X \rightarrow \mathbb{R}\) is the unique fixed point of \(T\) by Banach’s Fixed Point Theorem.

7.1.6. Optimal strategies

Finally we have our second main result.

Theorem

Assume \(U\) is bounded. A strategy \(\sigma\) is optimal if and only if \(W(\sigma)\) satisfies the Bellman equation

\[W(\sigma)(x) = \sup_{u \in \Gamma(x)} \{ U(x,u) + \beta W(\sigma) (f(x,u))\}\]

at each \(x \in X\).

(If.) Suppose \(W(\sigma)\) satisfies

\[W(\sigma)(x) = \sup_{u \in \Gamma(x)} \{ U(x,u) + \beta W(\sigma) (f(x,u))\}\]

at each \(x \in X\). Since \(U\) is bounded by Assumption [U bounded], \(W(\sigma)\) is also bounded. By Theorem [exist unique v], \(W(\sigma) =v\). Finally, \(W(\sigma) =v\) implies that \(\sigma\) is an optimal strategy.

(Only if.) Now let \(\sigma\) be an optimal strategy. By definition of the value function \(v\), it must be that \(W(\sigma) =v\), so indeed

\[W(\sigma)(x) = \sup_{u \in \Gamma(x)} \{ U(x,u) + \beta W(\sigma) (f(x,u))\}\]

at all \(x \in X\).

So now we have the following tools handy: As long as we can show our Bellman operator defines a mapping \(T\) which is a contraction on \(B(X)\), we can apply the Banach fixed-point theorem to prove that there is a unique value function solving the Bellman equation. Moreover, we presented a Theorem which says that a strategy (or infinite sequence of actions) is optimal if and only if it induces (or supports) a total payoff that satisfies the Bellman equation.

7.1.7. Stationary optimal strategies

Why a stationary strategy? What is a stationary strategy? Consider any \(t\)-history \(h^t = (x_0,u_0,...,x_t)\) in a dynamic programming problem described by the set of objects \(\{ X, A, U, f, \Gamma, \beta\}\). Suppose we start out from initial state \(x = x_0\), and suppose each period’s best action function does not depend on the time that action is taken, but only on the state prevailing at that time. Then we can deduce that we face the same dynamic programming problem \(\{ X, A, U, f, \Gamma, \beta\}\) if we start out with initial state \(x = x_t\), for any \(t \geq 0\). So it appears that there is no additional advantage gained by conditioning one’s decision rule each period on anything other than the current state (not even the current date nor the entire history leading up to the current date).

A Markovian strategy \(\pi\) for \(\{ X, A, U, f, \Gamma, \beta\}\) is a strategy such that \(\pi = \{\pi_t\}_{t \in \mathbb{N}}\) and \(\pi_t = \pi_t(x_t[h^t])\), where for each \(t\), \(\pi_t : X \rightarrow A\) such that \(\pi_t(x_t) \in \Gamma(x_t)\).

Notice that \(\pi_t = \pi_t(x_t[h^t])\) is the requirement that each period’s action is conditioned on the history \(h^t\) only insofar as it affects the current state. Further, this action has to be in the set of feasible actions determined by the current state.

A Markovian strategy \(\pi = \{\pi_t\}_{t \in \mathbb{N}}\) with the further property that \(\pi_t(x) = \pi_{\tau}(x) = \pi(x)\) for all \(x \in X\) and all \(t,\tau \in \mathbb{N}\), and \(t \neq \tau\), is called a stationary strategy.

Note the further restriction: the decision functions for each period are time-invariant functions of the current state only.

7.1.8. Optimality and stationary strategies

We now show that with additional regularity assumptions, there exists a strategy \(\pi^{\ast}\) from the class of “stationary strategies” that is optimal – viz. this strategy satisfies the Bellman Principle of Optimality.

Assumption

  1. \(U\) is continuous on \(X \times A\).
  2. \(f\) is continuous on \(X \times A\).
  3. \(\Gamma\) is a continuous, compact valued correspondence on \(X\).

With these additional assumptions along with the assumption that \(U\) is bounded on \(X \times A\), we will show the following three conclusions:

  1. Existence of a unique continuous and bounded value function that satisfies the Bellman Principle of Optimality;
  2. Existence of a well-defined feasible action correspondence admitting a stationary optimal strategy that satisfies the Bellman Principle of Optimality; and
  3. This stationary strategy delivers a total discounted payoff that is equal to the value function, and is indeed an optimal strategy.

7.1.9. Existence of a unique continuous value function

First, we want to show the existence of a unique continuous and bounded value function that satisfies the Bellman Principle of Optimality. Notice now with Assumptions [\(U\) bounded]-[\(\Gamma\) continuous compact], we can focus on the space of bounded and continuous functions from \(X\) to \(\mathbb{R}\) denoted by \(C_b(X)\). First we recall the definition of a continuous function which uses the \(\epsilon\)-\(\delta\) idea.

Definition

Let \((S,d)\) be a metric space and \(f: S \rightarrow \mathbb{R}\). The function \(f\) is said to be continuous at \(x \in S\) if for any fixed \(\epsilon > 0\) there exists a \(\delta > 0\) such that \(d(x,y) < \delta\) implies \(|f(x) - f(y)| < \epsilon\). Then \(f\) is continuous on \(S\) if \(f\) is continuous at \(x\) for all \(x \in S\).

Also there is the idea of uniform convergence.

Definition

Let \((Y,\rho)\) be a metric space. Let \(\{f_n\}\) be a sequence of functions such that for each \(n \in \mathbb{N}\), \(f_n: S \rightarrow Y\). \(\{f_n\}\) converges uniformly to \(f: S \rightarrow Y\) if given \(\epsilon >0\), there exists \(N(\epsilon) \in \mathbb{N}\) such that for all \(n \geq N(\epsilon)\), \(\rho(f_n(x),f(x)) < \epsilon\) for all \(x \in S\).

These definitions are used in the following result, which will in turn be used in the proof of the subsequent lemma.

Theorem [Uniform convergence]

Let \(\{f_n\}\) be a sequence of functions from \(S\) to metric space \((Y,\rho)\) such that \(f_n\) converges to \(f\) uniformly. If each function \(f_n\) is bounded and continuous, then the limit \(f: S \rightarrow Y\) is also bounded and continuous.

We first show that \(f\) is bounded. By uniform convergence, there exists \(N\) such that \(\rho(f_N(x), f(x)) < 1\) for all \(x \in S\); since \(f_N\) is bounded, \(f\) must also be bounded.

To show that \(f\) is continuous, we need to show \(f\) is continuous at any \(x \in S\). Fix any \(\epsilon >0\), the triangle inequality implies

\[\rho (f(x),f(y)) \leq \rho(f(x),f_n (x)) + \rho(f_n (x),f_n (y)) + \rho(f_n(y),f (y)).\]

Since \(f_n\) converges to \(f\) uniformly, then there exists some \(n \geq N(\epsilon/3)\) such that \(\rho(f(x),f_n (x)) < \epsilon/3\) for all \(x \in S\) and also \(\rho(f_n(y),f (y)) < \epsilon/3\) for all \(y \in S\). Since \(f_n\) is continuous, there exists \(\delta >0\) such that \(d(x,y) < \delta\) implies \(\rho(f_n (x),f_n (y)) < \epsilon/3\), so that

\[\begin{split}\begin{aligned} \rho (f(x),f(y)) \leq & \rho(f(x),f_n (x)) + \rho(f_n (x),f_n (y)) + \rho(f_n(y),f (y)) \\ < & \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon.\end{aligned}\end{split}\]

Therefore \(f\) is bounded and continuous.

Now we can just set \(S = X\) and \((Y,\rho) = (\mathbb{R},|\cdot|)\) when using the Uniform Convergence Theorem in what follows.

Lemma

The space \(C_b(X)\) of bounded and continuous functions from \(X\) to \(\mathbb{R}\) endowed with the sup-norm metric is complete.

The proof is identical to the proof for the result that \((B(X), d_{\infty})\) is a complete metric space, with the addition that any Cauchy sequence \(\{v_n\}\), with \(v_n \in C_b(X)\), is uniformly convergent implying that the limit is a bounded and continuous function by the Uniform Convergence Theorem.

Previously we defined the Bellman operator \(T\) on \(B(X)\). Now the space in which our candidate value functions live is \(C_b(X)\). Define the operator \(T: C_b(X) \rightarrow C_b(X)\) by

\[Tw(x) = \max_{u \in \Gamma(x)} \{ U(x,u) + \beta w(f(x,u))\}\]

for each \(x \in X\). The image \(Tw\) of the function \(w\) is clearly bounded on \(X\) since \(w\) and \(U\) are bounded functions. Further, since \(w\) and \(f\) are continuous functions, the composite function \(w\circ f : X \times A \rightarrow \mathbb{R}\) given by \((w \circ f)(x,u):= w(f(x,u))\) for all \((x,u) \in X \times A\), is also continuous. Since for each \(x \in X\) the set of feasible actions \(\Gamma(x)\) is assumed compact, the maximum on the RHS is well defined. Then the Maximum Theorem says that \(Tw\) is also continuous on \(X\) – viz. if we think of each \(x \in X\) as a “parameter” defining the initial condition of the RHS problem, then as we vary the initial condition, the value of the optimal program will vary with it continuously.

Therefore we have the following observations:

Lemma

\(T\) maps \(C_b(X)\) into \(C_b(X)\).

Furthermore, it is easy to show that \(T\) satisfies Blackwell’s sufficient conditions, with the operator \(M\) replaced by \(T: C_b(X) \rightarrow C_b(X)\). So then we have,

Lemma

\(T : C_b(X) \rightarrow C_b(X)\) is a contraction with modulus \(\beta\).

Finally, by Banach’s fixed point theorem, we can show the existence of a unique continuous and bounded value function that satisfies the Bellman Principle of Optimality.

Theorem

There exists a unique \(w^{\ast} \in C_b(X)\) such that given each \(x \in X\)

\[w^{\ast}(x) = \max_{u \in \Gamma(x)} \{ U(x,u) + \beta w^{\ast} (f(x,u))\}.\]

In other words \(T\) has a unique fixed point \(w^{\ast}\).

7.1.10. Existence of stationary optimal strategy

Second, we show the existence of a well-defined feasible action correspondence admitting a stationary strategy that satisfies the Bellman Principle of Optimality.

Define \(G^{\ast}: X \rightarrow P(A)\) by

\[G^{\ast}(x) = \text{arg} \ \max_{u \in \Gamma(x)} \{ U(x,u) + \beta w^{\ast} (f(x,u))\}.\]

By the Maximum Theorem \(G^{\ast}\) is a nonempty upper-semicontinuous correspondence. Thus there exists a function \(\pi^{\ast} : X \rightarrow A\) such that for each \(x \in X\), \(\pi^{\ast}(x) \in G^{\ast}(x) \subset \Gamma(x)\). By construction, at all \(x \in X\) and for any \(\tilde{u} \neq \pi^{\ast} (x)\), it must be that

\[\begin{split}\begin{aligned} w^{\ast}(x) = & U(x,\pi^{\ast}(x)) + \beta w^{\ast} [f(x,\pi^{\ast}(x))] \\ \geq & U(x,\tilde{u}) + \beta w^{\ast} [f(x,\tilde{u})], \qquad \tilde{u} \in \Gamma(x).\end{aligned}\end{split}\]

So the function \(\pi^{\ast} : X \rightarrow A\) defines a stationary optimal strategy.

7.1.11. Total discounted payoff under stationary optimal strategy

Third and last, we want to show that the stationary strategy delivers a total discounted payoff that is equal to the value function, and is thus a stationary optimal strategy. The idea is that if we could decompose the sequence problem into a recursive one, we should be able to reconstitute the sequence problem from the recursive one, and they would have the same value function under the stationary optimal strategy.

Here is how we do this formally. Let \(\pi^{\ast}\) be the stationary optimal strategy as defined in the last section. Fix any initial state \(x \in X\). Then the sequence \(\{ x_t(x,\pi^{\ast}),u_t(x,\pi^{\ast})\}\) is the sequence of state and action pairs resulting from \(\pi^{\ast}\) beginning from \(x\). Assign each period-\(t\) state-action pair the payoff \(U_t(\pi^{\ast})(x) := U[x_t(x,\pi^{\ast}),u_t(x,\pi^{\ast})]\) from \(\pi^{\ast}\) beginning from \(x\). To lighten our notation we now write \(x_t := x_t(x,\pi^{\ast})\) and \(u_t := u_t(x,\pi^{\ast})\).

By definition,

\[W(\pi^{\ast})(x) = \sum_{t=0}^{\infty}\beta^t U_t(\pi^{\ast})(x).\]

From the previous section, we know

\[\begin{split}\begin{aligned} w^{\ast}(x) = & U(x,\pi^{\ast}(x)) + \beta w^{\ast} [f(x,\pi^{\ast}(x))] \\ = & U(x,\pi^{\ast}(x)) + \beta w^{\ast} (x_1) \\ = & U(x,\pi^{\ast}(x)) + \beta U(x_1,u_1) + \beta^2 w^{\ast} [f(x_1,u_1)] \\ = & U_0(\pi^{\ast})(x) + \beta U_1(\pi^{\ast})(x) + \beta^2 w^{\ast} [x_2 (\pi^{\ast},x)].\end{aligned}\end{split}\]

By recursive forward substitution, we have for any \(T \geq 1\),

\[w^{\ast}(x) = \sum_{t=0}^{T-1} \beta^t U_t (\pi^{\ast})(x) + \beta^T w^{\ast} [x_T (\pi^{\ast},x)].\]

Since \(w^{\ast}\) is a bounded function and \(\beta < 1\), the last term \(\beta^T w^{\ast} [x_T (\pi^{\ast},x)]\) vanishes as \(T \rightarrow \infty\), so

\[w^{\ast}(x) = \sum_{t=0}^{\infty} \beta^t U_t (\pi^{\ast})(x).\]

So we have shown \(w^{\ast} = W(\pi^{\ast})\). By Theorem [exist v bounded-continuous] we have a unique fixed-point \(w^{\ast}\) satisfying the Bellman equation. So it must be that

\[W(\pi^{\ast})(x) = \max_{u \in \Gamma(x)} \{ U(x,u) + \beta W(\pi^{\ast}) [f(x,u)]\}.\]

We have shown that the value function for the sequence problem is the same as that for the Bellman equation under the stationary optimal strategy. Finally, to close the logic, note that by Theorem [optimal strategy], the last equation confirms that indeed the stationary strategy \(\pi^{\ast}\) is an optimal strategy.

Great! So after all that hard work, we can put together the following gem of a result.

Theorem

If the stationary dynamic programming problem \(\{X,A,\Gamma,f,U,\beta \}\) satisfies all the previous assumptions, then there exists a stationary optimal policy \(\pi^{\ast}\). Furthermore the value function \(v = W(\pi^{\ast})\) is bounded and continuous on \(X\), and satisfies for each \(x \in X\),

\[\begin{split}\begin{aligned} v(x) = & \max_{u \in \Gamma(x)} \{ U(x,u) + \beta v (f(x,u)) \} \\ = & U(x,\pi^{\ast} (x)) + \beta W(\pi^{\ast}) (f(x,\pi^{\ast} (x))). \end{aligned}\end{split}\]
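The forward-substitution argument above also doubles as a computational recipe: the total payoff \(W(\pi)\) of any given stationary policy \(\pi\) (optimal or not) can be computed by iterating the map \(w \mapsto U(\cdot,\pi(\cdot)) + \beta\, w(f(\cdot,\pi(\cdot)))\), which is itself a contraction of modulus \(\beta\). Here is a minimal sketch on a grid; the primitives, the grid, and the particular policy are illustrative assumptions (the control is again taken to be next period's state), and the policy shown is not claimed to be optimal.

```python
import numpy as np

# Assumed toy primitives (illustration only): growth-type problem where the
# control is next period's state, so the transition is f(x, u) = u.
beta, alpha, A = 0.95, 0.3, 1.0
grid = np.linspace(1e-2, 2.0, 200)            # finite grid approximating X

def U_of(x, xprime):
    return np.log(A * x**alpha - xprime)       # payoff when saving xprime out of A x^alpha

# A candidate stationary policy pi: save a fixed fraction of output.
def pi(x):
    return 0.3 * A * x**alpha

def evaluate_policy(pi, tol=1e-10):
    """Compute W(pi) on the grid by iterating w <- U(x, pi(x)) + beta * w(pi(x))."""
    xprime = pi(grid)                          # next state under the stationary policy
    flow = U_of(grid, xprime)                  # U(x, pi(x)) at every grid point
    w = np.zeros_like(grid)
    while True:
        w_next = flow + beta * np.interp(xprime, grid, w)   # interpolate w off the grid
        if np.abs(w_next - w).max() < tol:
            return w_next
        w = w_next

W_pi = evaluate_policy(pi)
print(W_pi[:5])                                # W(pi)(x) at the first few grid points
```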

7.2. Example

Now we can reconsider our friendly example, the Cass-Koopmans optimal growth model, but more generally. Recall that we could only solve a special case of this model by hand. Now we can develop a way to approximately solve this model in general.

First we set out the model description, and then we apply the results we have learned so far to check whether optimal strategies exist in this model. It turns out the answer is in the affirmative. We then go further and impose additional restrictions on the shape of the preference and production functions, so that the result is strengthened to deliver a unique stationary optimal strategy or solution. Before actually computing the solution to the model, we can deduce the behavior of the unique stationary optimal strategy from characterizations of optimality in the growth problem itself.

7.2.1. The deterministic optimal growth problem

First we recall the basic ingredients of the model. There exists:

  • A single good - can be consumed or invested;
  • Capital investment flow in period \(t\), \(x_t\);
  • Capital depreciates at rate \(\delta \in (0,1]\);
  • Capital stock accumulation: \(k_{t+1} = (1-\delta)k_t + x_t\) with initial stock \(k_0 > 0\);
  • Production function, \(F: \mathbb{R}_+ \rightarrow \mathbb{R}_+\). Output from production function: \(y_t = F(k_t)\); so that
  • Total available resources at time \(t\) is \(f(k_t):= (1-\delta)k_t + F(k_t)\); resource constraint: \(f(k_t) \geq c_t + k_{t+1}\).
  • Feasibility: \(0 \leq k_{t+1} \leq f(k_t)\) for all \(t \in \mathbb{N}\).
  • Utility function: \(U: \mathbb{R}_+ \rightarrow \mathbb{R}\), where for each \(c_t \in \mathbb{R}_+\), \(U(c_t) \in \mathbb{R}\); and
  • Subjective discount factor \(\beta \in [0,1)\).

The sequence problem is one of maximizing

\[\sum_{t=0}^{\infty} \beta^t U(c_t)\]

subject to

\[\begin{split}\begin{aligned} k_0 = & k \ \text{given}, \\ f(k_t) \geq & c_t + k_{t+1}, \\ 0 \leq & k_{t+1} \leq f(k_t), \\ c_t, k_t \in & \mathbb{R}_+.\end{aligned}\end{split}\]

for all \(t \in \mathbb{N}\).

In an optimal program output will not be wasted, so the resource constraint holds with equality. The sequence problem is now one of maximizing

\[\sum_{t=0}^{\infty} \beta^t U(c_t)\]

subject to

\[\begin{split}\begin{aligned} k_0 = & k \ \text{given}, \\ f(k_t) = & c_t + k_{t+1}, \\ 0 \leq & k_{t+1} \leq f(k_t).\end{aligned}\end{split}\]

for all \(t \in \mathbb{N}\).

We can cast this model in the dynamic programming framework. To do so, we specialize the following objects from the previous general theory:

  • State space, \(X = \mathbb{R}_+\). State variable \(k \in X\).
  • Action space, \(A = \mathbb{R}_+\).
  • State transition function \(g: X \times A \rightarrow X\), such that for each \((k,c) \in X \times A\), the next period’s state is \(k' = g(k,c) = f(k) - c\).
  • Feasible action correspondence, \(\Gamma(k) = [0,f(k)]\).
  • Per period payoff from action \(c \in \Gamma(k)\) given state \(k\), \(U(k,c) = U(c)\).

The 6-tuple \(\{ X,A,\Gamma,U,g,\beta\}\) fully describes the elements of a stationary discounted dynamic programming problem in the optimal growth model.

We would like to check the following items and questions for this example model:

  1. When do (stationary) optimal strategies exist?
  2. Is an optimal strategy unique here?
  3. Characterizing optimal strategy. What are the dynamic properties – i.e. the trajectory of \(\{c_t,k_t\}_{t \in \mathbb{N}}\) under the optimal strategy? What is the behavior of the transitional path (short run)? The steady state (long run)?

7.2.2. Existence of optimal strategies

Rather than assume \(U: \mathbb{R}_+ \rightarrow \mathbb{R}\) to be a bounded function to guarantee existence of optimal strategies, we will make the alternative assumption that \(X\) and \(A\) are compact, and that \(U\) is continuous. Specifically, let \(X = A = [0,\overline{k}]\), \(\overline{k} < +\infty\). Since \(U\) is continuous on this compact set, it will be bounded. If we further assume \(f: X \rightarrow \mathbb{R}_+\) to be continuous, nondecreasing and bounded on \(X\), then an optimal strategy exists in the following sense.

Proposition

There exists a stationary optimal strategy \(\pi: X \rightarrow A\) for the optimal growth model given by \(\{ X,A,\Gamma,U,g,\beta\}\), such that

\[\begin{split}\begin{aligned} v(k) = W(\pi)(k) = & \max_{k' \in \Gamma(k)} \{ U(f(k) - k') + \beta v(k') \} \\ = & U(f(k) - \pi(k)) + \beta v(\pi(k)). \end{aligned}\end{split}\]

So far our assumptions are as before in the general theory. What insights can we squeeze out from this model apart from knowing that stationary optimal strategies do exist under the above assumptions? It turns out we can show that the value function inherits some of the properties of the primitives (preference and technology functions, \(U\) and \(f\)).

Proposition

\(v: X \rightarrow \mathbb{R}\) is a nondecreasing function on \(X\).

(This proof is not that precise!) Define \(T: C_b(X) \rightarrow C_b(X)\) by

\[Tw(k) = \max_{k' \in [0,f(k)]} \{ U(f(k)-k') + \beta w(k')\}\]

\(T\) can be shown to be a contraction on \((C_b(X),d_{\infty})\). Since \(C_b(X)\) is complete, \(T: C_b(X) \rightarrow C_b(X)\) has a unique fixed point \(v \in C_b(X)\), so \(v = Tv\) is bounded and continuous. Since \(f\) is nondecreasing on \(X\) and \(c \in \Gamma(k) = [0,f(k)]\), the feasible set \(\Gamma(k)\) expands with \(k\) and, for fixed \(c\), the continuation state \(k' = g(k,c) = f(k) - c\) is nondecreasing in \(k\). Starting from any \(w\) on \(X\) that is nondecreasing, \(Tw\) is then also nondecreasing on \(X\); since the set of nondecreasing functions in \(C_b(X)\) is closed in the sup-norm, the fixed point \(v\) is nondecreasing on \(X\).

It seems we cannot say much more about the behavior of the model without more structure. So we proceed, as always, to impose additional assumptions, beginning with further restrictions on the preferences \(U\).

Assumption

\(U: \mathbb{R}_+ \rightarrow \mathbb{R}\) is strictly increasing on \(\mathbb{R}_+\).

Then we have:

Proposition

\(v: X \rightarrow \mathbb{R}\) is a strictly increasing function on \(X\).

Exercise

  1. Prove the last result.

    Hint: Use the Bellman principle of optimality. Also show that the feasible action correspondence is monotone. Use these two facts, with some (weak) inequalities, to show the result.

We can impose a further restriction on the convexity of preferences:

Assumption

\(U: \mathbb{R}_+ \rightarrow \mathbb{R}\) is strictly concave on \(\mathbb{R}_+\).

With the assumptions we have so far on \(U\) and \(f\) we can then deduce the following observation. This proposition says that if the initial stock of capital in any period increases from \(k\) to \(\hat{k}\), then the optimal savings level beginning from state \(\hat{k}\) must be at least as large as that beginning from \(k\).

Proposition

Given the above assumptions, the optimal savings level \(\pi(k) := f(k) - c(k)\) under the optimal strategy \(\pi\), where \(k' = \pi(k)\), is nondecreasing on \(X\).

We prove this by contradiction. Suppose that there exist some \(k,\hat{k} \in X\) such that \(k < \hat{k}\) and \(f(k) < f(\hat{k})\), but \(\pi(k) > \pi(\hat{k})\). (This configuration is consistent with the earlier assumption that \(f\) is nondecreasing.) Note that \(f(\hat{k}) > f(k) \geq \pi(k)\), where the second weak inequality comes from feasibility at \(k\). That is, \(\pi(k)\) is also feasible at \(\hat{k}\). Also, since \(\pi(\hat{k}) < \pi(k) \leq f(k)\), \(\pi(\hat{k})\) is also feasible at \(k\). Since \(\pi(k)\) is part of an optimal strategy starting from \(k\), we must have

\[v(k) = U(f(k) - \pi(k)) + \beta v[\pi(k)] \geq U(f(k) - \pi(\hat{k})) + \beta v[\pi(\hat{k})].\]

The last weak inequality arises from the fact that \(\pi(k)\) is optimal at \(k\). Similarly, Since \(\pi(\hat{k})\) is constructed from an optimal strategy starting from \(\hat{k}\),

\[v(\hat{k}) = U(f(\hat{k}) - \pi(\hat{k})) + \beta v[\pi(\hat{k})] \geq U(f(\hat{k}) - \pi(k)) + \beta v[\pi(k)].\]

From the inequalities we have

\[\begin{split}\begin{aligned} U(f(k) - \pi(k)) - U(f(k) - \pi(\hat{k})) & \geq \beta \{ v[\pi(\hat{k})] - v[\pi(k)]\} \\ \geq &U(f(\hat{k}) -\pi(k)) - U(f(\hat{k}) - \pi(\hat{k})) ,\end{aligned}\end{split}\]

so that

\[U(f(k) - \pi(\hat{k})) - U(f(k) -\pi(k)) \leq U(f(\hat{k}) - \pi(\hat{k})) - U(f(\hat{k}) - \pi(k)).\]

Note that \((f(k) - \pi(\hat{k})) \in \mathbb{R}_+\) and \((f(k) - \pi(k)) \in \mathbb{R}_+\) are the same distance apart (using the usual metric on \(\mathbb{R}\)) as \((f(\hat{k}) - \pi(\hat{k}))\in \mathbb{R}_+\) and \((f(\hat{k}) - \pi(k))\in \mathbb{R}_+\). Since \(f(k) < f(\hat{k})\), by strict concavity of \(U\) and \(U\) is increasing on \(\mathbb{R}_+\), we must have

\[U(f(k) - \pi(\hat{k})) - U(f(k) - \pi(k)) > U(f(\hat{k}) - \pi(\hat{k})) - U(f(\hat{k}) - \pi(k)).\]

But we arrive at a contradiction.

7.2.3. Uniqueness of optimal strategy

Now we step up the level of restriction on the primitives of the model. We now add the following assumption.

Assumption

\(f\) is (weakly) concave on \(\mathbb{R}_+\).

Recall we defined \(f(k) = F(k) + (1-\delta)k\). So clearly, this assumption requires that \(F\) be concave on \(\mathbb{R}_+\), since the linear part in \(k\) is already (weakly) concave.

This assumption clearly restricts the class of production functions we can consider. (Usual parametric suspects are the linear, CES or Cobb-Douglas forms). However, it does buy us the uniqueness of the optimal strategy \(\pi\). We show this in two parts.

Proposition

Under the assumptions on \(U\) and \(f\) above, the value function \(v\) is (weakly) concave on \(X\).

Exercise

Prove the last result.

Hint: Pick any two initial states \(k, \tilde{k} \in X\), such that \(k \neq \tilde{k}\), and pick any real number \(\lambda \in (0,1)\). By (weak) concavity we mean that for the convex combination of states \(k_{\lambda} = \lambda k + (1-\lambda) \tilde{k}\), we need to show that \(v(k_{\lambda}) \geq \lambda v(k) + (1-\lambda) v(\tilde{k})\).

The astute reader will have noted that we have yet to assume differentiability of the primitive functions and that this is the reason why we use the more general definition of “concavity” of a (not necessarily differentiable) function.

Proposition [unique optimal strategy]

Under the assumptions on \(U\) and \(f\) above, the correspondence \(G^{\ast}: X \rightarrow P(A)\) defined by

\[G^{\ast}(k) = \arg\max_{k' \in \Gamma(k)} \{ U(f(k)-k') + \beta v (k')\}\]

is a singleton set (a set of only one maximizer \(k'\)) for each state \(k \in X\). Therefore \(G^{\ast}\) admits a unique optimal strategy \(\pi\). Furthermore, \(\pi\) is a continuous function on \(X\).

By assumption \(U\) is strictly concave and \(f\) is concave. We have previously shown that the value function \(v\) inherits (weak) concavity. Hence the maximand on the RHS of the Bellman equation,

\[U(f(k)-k') + \beta v(k'),\]

is strictly concave as a function of \(k'\): it is the sum of a strictly concave function of \(k'\) and a concave one, maximized over the convex set \(\Gamma(k) = [0,f(k)]\). So the set of maximizers at each \(k\) contains exactly one maximizer \(k'\). As a single-valued correspondence, \(G^{\ast}\) admits a unique selection \(\pi: X \rightarrow A\). Further, since \(G^{\ast}\) is upper-semicontinuous and single valued, \(\pi\) must be a continuous function.

Note that this theorem not only ensures that there is a unique stationary optimal strategy, but also that the optimal strategy is unique within the larger class of all (possibly history-dependent) strategies.

7.2.4. Characterizing optimal strategies

Without solving for the strategies, can we say anything meaningful about them? We can if we tighten the basic assumptions of the model further. The next set of assumptions relate to differentiability of the primitives of the model.

Assumption

\(U \in C^{1}((0,\infty))\) and \(\lim_{c \searrow 0} U'(c) = \infty\).

This assumption says that the per-period payoff \(U\) is once continuously differentiable on \((0,\infty)\) and that the Inada condition holds (the marginal utility of consumption tends to infinity as consumption goes to zero). This buys us the outcome that if \(c_t\) is positive but \(c_{t+1} = 0\), a decision maker can increase total utility by shifting a small amount of consumption from period \(t\) to \(t+1\). But for this argument to be complete, we are implicitly assuming that the resources shifted out of period-\(t\) consumption, when saved in productive capital yielding \(f(k_{t+1})\) in period \(t+1\), more than compensate for the loss in consumption in period \(t\). The next assumption ensures that this is the case.

Assumption

\(f \in C^{1}((0,\infty))\) and \(\lim_{k \searrow 0} f'(k) > 1/ \beta\).

For example, the Inada condition of this assumption says if \(k_{t+1}\) tends to zero – which implies consumption is \(c_t = f(k_t)\) – then the marginal product of capital (imputed return on capital) must exceed the subjective gross return \(1/\beta\), so that the agent can always plan better (i.e. achieve a higher total discounted payoff) by reducing \(c_t\) and thus increasing \(k_{t+1}\) to lower \(f'(k_{t+1})\) until it equates with \(1/\beta\). So this condition, together with Assumption [U is C1] ensures that in each state \(k_t\), consumption and capital are always non-zero, and they also would never hit the upper bound \(f(k_t)\). So the solutions must always be interior, as the next result states.

Proposition

Given the assumptions above, the solution \(k' = \pi(k)\) is such that \(\pi(k) \in (0, f(k))\) for all \(k \in X\).

Exercise

Prove this.

This theorem tells us that with the additional assumptions above, our solutions will always be interior (the optimal saving policy picks a capital for next period that is always feasible and consumption is always positive).

A straightforward implication of this result is that the first-order condition for optimality in this model always holds with equality.

Proposition [Euler equation]

Under Assumptions [U bounded]-[f is C1], for each \(k >0\), \(\pi\) satisfies

\[U_c [f(k)-\pi(k)] = \beta U_c [f(k')-\pi(k')] f_k (\pi(k))\]

such that \(k' = \pi(k)\).

Often the dependency of the decision function on the current state \(k\) is suppressed in the notation, so that we can write more compactly

\[U_c [c_t] = \beta U_c [c_{t+1}] f_k (k_{t+1})\]

such that for all \(t \in \mathbb{N}\),

\[k_{t+1} = f(k_t) - c_t.\]
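As a quick sanity check of the Euler equation, recall the special case from the earlier chapter (log utility, Cobb-Douglas production, full depreciation), whose optimal policy is known in closed form: \(k_{t+1} = \alpha\beta A k_t^{\alpha}\), and hence \(c_t = (1-\alpha\beta)A k_t^{\alpha}\). The small sketch below verifies numerically that this policy makes the Euler residual vanish; the parameter values are arbitrary choices for the check.

```python
import numpy as np

# Special case: U(c) = ln(c), f(k) = A*k**alpha (delta = 1).  The known
# closed-form optimal savings policy is k' = alpha*beta*A*k**alpha.
alpha, beta, A = 0.3, 0.95, 1.0
U_c = lambda c: 1.0 / c                        # marginal utility of consumption
f = lambda k: A * k**alpha                     # total resources (full depreciation)
f_k = lambda k: alpha * A * k**(alpha - 1.0)   # marginal product of capital
pi = lambda k: alpha * beta * A * k**alpha     # candidate optimal savings policy

def euler_residual(k):
    """U_c[f(k) - pi(k)] - beta * U_c[f(k') - pi(k')] * f_k(k'), with k' = pi(k)."""
    kp = pi(k)
    return U_c(f(k) - pi(k)) - beta * U_c(f(kp) - pi(kp)) * f_k(kp)

for k in [0.1, 0.5, 1.0, 2.0]:
    print(k, euler_residual(k))                # residuals should be (numerically) zero
```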

Finally, using the Euler equation we can show that the optimal consumption function is monotone in the capital stock. Define \(c(k) = f(k) - \pi(k)\).

Proposition

Given the assumptions so far, \(c\) is increasing on \(X\). That is, for \(k > \tilde{k}\), \(c(k) > c(\tilde{k})\).

Exercise

Prove this.

Another characteristic of the optimal growth plan is that for any initial capital stock \(k \in X\), the unique sequence of states of the system converges to a unique steady state limit. To show this, we first define the notion of steady state or long run for the model as the state \(k_{ss} \in X\) such that \(k_{ss} = k_{t+1} = k_t\) or

\[k_{ss} = f(k_{ss}) - c(k_{ss}).\]

So if we begin the system at \(k_{ss}\), we will remain at the same point forever under the optimal policy \(\pi(k_{ss})\) or \(c(k_{ss})\). The (modified) golden rule consumption level \(c^{\ast} := c(k_{ss}) = c_{ss}\) is such that \(U_c(c_t) = U_c(c_{t+1}) = U_c(c(k_{ss}))\) for all \(t\), so from the Euler equation we get

\[f_k (k_{ss}) = 1/\beta.\]

Notice that \(k_{ss}\), and thus \(c^{\ast} := c(k_{ss})\), depend only on \(f\) and \(\beta\). (As an exercise, check what that relationship says.)
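For instance, with the Cobb-Douglas technology \(f(k) = Ak^{\alpha} + (1-\delta)k\) used in the numerical example below, the condition \(f_k(k_{ss}) = 1/\beta\) reads \(\alpha A k_{ss}^{\alpha-1} + 1 - \delta = 1/\beta\), so that \(k_{ss} = \left[\alpha A/(1/\beta - 1 + \delta)\right]^{1/(1-\alpha)}\). A minimal check, with arbitrary parameter values assumed only for illustration:

```python
# Steady state from f_k(k_ss) = 1/beta with f(k) = A*k**alpha + (1-delta)*k.
# Parameter values are arbitrary illustrative choices.
alpha, beta, delta, A = 0.3, 0.95, 0.1, 1.0
k_ss = (alpha * A / (1.0 / beta - 1.0 + delta)) ** (1.0 / (1.0 - alpha))
c_ss = A * k_ss**alpha + (1.0 - delta) * k_ss - k_ss   # c_ss = f(k_ss) - k_ss
print(k_ss, c_ss)
```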

Now we can talk about transition to the long run.

Proposition

Given any initial condition \(k \in X\), the sequence of states \(\{k_{t+1}(k)\}_{t \in \mathbb{N}}\) under the optimal policy function \(\pi: X \rightarrow A\), and the sequence of consumption levels \(\{c_t(k)\}_{t \in \mathbb{N}}\), converge to \(k_{ss}\) and \(c_{ss}\) respectively. Furthermore, \(k_{ss}\) and \(c_{ss}\) are unique.

Fix a \(k \in X\). Since the saving function \(\pi\) and also \(f\) are nondecreasing on \(X\), \(\{k_{t+1}(k)\}_{t \in \mathbb{N}}\) is a monotone sequence. Since the consumption function \(c\) is also nondecreasing on \(X\), \(\{c_t(k)\}_{t \in \mathbb{N}}\) is also monotone. Both sequences are bounded (they live in the compact sets \(X\) and \([0,f(\overline{k})]\), respectively), so both have limits, \(k_{\infty}\) and \(c_{\infty}\), respectively. Since

\[k_{t+1} = f(k_t) -c_t\]

for all \(t \in \mathbb{N}\), it must also be that

\[k_{\infty} = f(k_{\infty}) -c_{\infty}\]

since \(f\) is continuous. Furthermore, since

\[U_c [c_t] = \beta U_c [c_{t+1}] f_k (f(k_t) -c_t),\]

then, in the limit (which is well defined since \(f,U \in C^{1}((0,\infty))\)),

\[U_c [c_{\infty}] = \beta U_c [c_{\infty}] f_k (f(k_{\infty}) -c_{\infty}) \Rightarrow f'(k_{\infty}) = 1/\beta.\]

Since, by Theorem [unique optimal strategy], the optimal solutions \(\{k_{t+1}(k)\}_{t \in \mathbb{N}}\) and \(\{c_t(k)\}_{t \in \mathbb{N}}\) are unique, the limits \(k_{\infty}\) and \(c_{\infty}\) are respectively unique. By definition these points are steady-state solutions, so \(k_{\infty} =k_{ss}\) and \(c_{\infty} = c_{ss}\).

7.2.5. Computing optimal strategies

So we know how to check when solutions exist and when they can be unique in the optimal growth model. We also learned how to characterize or describe the features of such optimal strategies. But it turns out in most cases we cannot solve the problem in closed-form (i.e. using pen and paper).

Since we cannot solve more general problems by hand, we have to turn to our trusty computers to do that task. Note that there are many ways to attack the problem, but we will have time to look at maybe one or two methods. A good numerical recipe has to be well-informed by theory as we shall observe in our accompanying TutoLabo session.
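As a small preview of the approach, here is a minimal value-function-iteration sketch for the deterministic growth model on a discretized state space. The CRRA and Cobb-Douglas functional forms, parameter values, grid, and convergence tolerance are all illustrative assumptions, and the crude grid-to-grid choice of \(k'\) is only meant to convey the idea; a more careful treatment is left to the TutoLabo session.

```python
import numpy as np

# Deterministic optimal growth solved by value function iteration on a grid.
# Functional forms and parameter values are illustrative assumptions only.
alpha, beta, delta, sigma, A = 0.3, 0.95, 0.1, 2.0, 1.0

def U(c):
    return (c**(1.0 - sigma) - 1.0) / (1.0 - sigma)      # CRRA utility

def f(k):
    return A * k**alpha + (1.0 - delta) * k              # total resources

k_ss = (alpha * A / (1.0 / beta - 1.0 + delta)) ** (1.0 / (1.0 - alpha))
grid = np.linspace(0.1 * k_ss, 2.0 * k_ss, 300)           # grid around the steady state

# Consumption c = f(k) - k' for every (k, k') pair on the grid; infeasible
# choices (c <= 0) get payoff -inf so they are never selected.
C = f(grid)[:, None] - grid[None, :]
R = np.where(C > 0, U(np.maximum(C, 1e-12)), -np.inf)

v = np.zeros_like(grid)
while True:
    v_new = (R + beta * v[None, :]).max(axis=1)           # Bellman operator on the grid
    if np.abs(v_new - v).max() < 1e-8:
        break
    v = v_new

policy = grid[(R + beta * v[None, :]).argmax(axis=1)]     # optimal k' at each grid point k

# Simulate the capital path from a low initial stock.
k, path = grid[0], []
for t in range(60):
    path.append(k)
    k = policy[np.searchsorted(grid, k)]
print(k_ss, path[-1])
```

The simulated capital path at the end illustrates the convergence result from the previous section: starting below the steady state, capital rises monotonically toward (the grid approximation of) \(k_{ss}\).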

Note

Next, we’ll get our hands dirty in the accompanying TutoLabo sessions.

7.3. Stochastic Dynamic Programs

We now consider a simple extension of the deterministic dynamic programming problem to a stochastic case. We will assume some exogenous finite-state Markov chain “shock” that perturbs the previously deterministic transition function for the (endogenous) state vector. For example, suppose \(x_t\) is the current endogenous state variable (e.g. capital in the growth model) and \(u_t\) is the current action (e.g. consumption). The random variable \(\varepsilon_{t+1}\) is the realization of a finite-state Markov chain \((P,\lambda_{0})\) at the beginning of time \(t+1\), where \(P\) is a stochastic matrix containing the conditional probabilities of moving from one state to another and \(\lambda_{0}\) is the initial distribution over the finite states. (Section Time-homogeneous and finite-state Markov chains reviews some properties of time-homogeneous Markov chains.)

So now we may have a stochastic evolution of the (endogenous) state vector described by:

\[x_{t+1} = F(x_t, u_t, \varepsilon_{t+1}).\]

Notice that now, at the beginning of \(t\), \(x_t\) is realized, but, some action \(u_t\) has to be taken before the random shock (with finitely many probable realizations) \(\varepsilon_{t+1}\) is known.

Therefore, when making a plan or strategy of such actions, a decision maker can no longer just plan a deterministic sequence of actions, but must plan a sequence of alternative state-contingent actions. Intuitively, to be able to make such a comprehensive list of fall-back plans, the decision maker must be able to form “correct” expectations of the stochastic evolution of future states for each choice of contingent plan. We will be using the usual von Neumann-Morgenstern notion of expected utility to model decision making in such risky environments.

7.3.1. The Bellman principle of optimality

Let \(\varepsilon_{t} \in S = \{s_{1},...,s_{n}\}\), where \(\{\varepsilon_{t}\}\) is generated by a Markov chain \((P,\lambda_{0})\). Suppose the current state is \((x_{t},\varepsilon_{t}) = (x,s_{i})\), where \(i \in \{1,...,n\}\). So now the Bellman equation is given by

\[V(x,s_{i}) = \sup_{x' \in \Gamma(x,s_{i})} U(x,x',s_{i}) + \beta \sum_{j=1}^{n}P_{ij}V(x',s_{j})\]

for all \((x,s_{i}) \in X \times S\).

For every \(x \in X\), there is a vector

\[\mathbb{R}^{n} \ni \mathbf{v}(x) = (V(x,s_{1}),...,V(x,s_{n})) \equiv (V_{1}(x),...,V_{n}(x)).\]

Define the space of all such vectors of bounded continuous real-valued functions, each vector mapping \(X\) into \(\mathbb{R}^{n}\), as \([C_{b}(X)]^{n}\). Consider the metric space \(([C_{b}(X)]^{n},d)\) with metric \(d: [C_{b}(X)]^{n} \times [C_{b}(X)]^{n} \rightarrow \mathbb{R}_{+}\) given either by

\[d_{\infty}^{n}(\mathbf{v},\mathbf{v'}) = \sum_{i=1}^{n}d_{\infty}(V_{i},V'_{i}) = \sum_{i=1}^{n} \sup_{x \in X} \mid V(x,s_{i}) - V'(x,s_{i})\mid,\]

or

\[d_{\infty}^{\max}(\mathbf{v},\mathbf{v'}) = \max_{i \in \{1,...,n\}} \{ d_{\infty}(V_{i},V'_{i})\} = \max_{i \in \{1,...,n\}} \left\{ \sup_{x \in X} \mid V(x,s_{i}) - V'(x,s_{i}) \mid \right\}.\]

This metric space is complete. (Check this as an exercise.)

The idea is that if we fix each current \(\varepsilon = s_{i}\), for every \(x \in X\), the RHS of the Bellman equation defines an operator \(T_{i}\) that maps a continuous bounded function into another continuous bounded function, so \(T_{i} : C_{b}(X) \rightarrow C_{b}(X)\), \(i=1,...,n\). This is because for fixed \(\varepsilon = s_{i}\) the expected continuation value on the RHS of the Bellman equation, \(\sum_{j=1}^{n}P_{ij}V(x',s_{j})\), is just a convex combination of all probable continuation values, each contingent on the realization of \(\varepsilon ' = s_{j}\). So “stacking” these \(T_{i}\)’s together we have for each \(x \in X\), \(T: [C_{b}(X)]^{n} \rightarrow [C_{b}(X)]^{n}\) defined as

\[\begin{split}\begin{aligned} T\mathbf{v}(x) =& \left[ \begin{array}{c} T_{1}V(x,s_{1}) \\ \vdots \\ T_{n}V(x,s_{n}) \end{array} \right] \\ & \qquad = \left[ \begin{array}{c} \sup_{x' \in \Gamma(x,s_{1})} U(x,x',s_{1}) + \beta \sum_{j=1}^{n}P_{1j}V(x',s_{j}) \\ \vdots \\ \sup_{x' \in \Gamma(x,s_{n})} U(x,x',s_{n}) + \beta \sum_{j=1}^{n}P_{nj}V(x',s_{j}) \end{array} \right].\end{aligned}\end{split}\]

Since each \(i\)-th component of \(T\), \(T_{i}\) is a contraction mapping on a complete metric space \((C_{b}(X), d_{\infty})\), by our metric on \([C_{b}(X)]^{n}\) it is straightforward to show that \(T: [C_{b}(X)]^{n} \rightarrow [C_{b}(X)]^{n}\) is also a contraction mapping on the complete metric space \(([C_{b}(X)]^{n},d)\), where \(d:=d_{\infty}^{n}\) or \(d:=d_{\infty}^{\max}\).
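To see the stacking at work numerically, note that for functions represented on a finite grid the expected continuation value \(\sum_{j}P_{ij}V(x',s_{j})\) is just a matrix product of \(P\) with the array of continuation values. Below is a minimal sketch of one application of the stacked operator \(T\); the two-state chain, the grid, and the growth-style primitives (with the control equal to next period's state, as in the Bellman equation above) are illustrative assumptions.

```python
import numpy as np

# Illustrative stacked Bellman operator for a 2-state Markov shock.
# Control is next period's state x', so the transition is x' itself.
beta, alpha = 0.95, 0.3
S = np.array([0.9, 1.1])                     # shock values s_1, s_2
P = np.array([[0.8, 0.2],                    # P[i, j] = Prob(s_j next | s_i now)
              [0.3, 0.7]])
grid = np.linspace(1e-2, 2.0, 150)           # finite grid approximating X

def T(V):
    """V has shape (n_grid, n_shocks); returns TV of the same shape."""
    EV = V @ P.T                             # EV[m, i] = sum_j P[i, j] * V(x_m, s_j)
    TV = np.empty_like(V)
    for i, s in enumerate(S):
        for m, x in enumerate(grid):
            resources = s * x**alpha          # feasible next states: x' < resources
            feasible = grid < resources
            values = np.log(resources - grid[feasible]) + beta * EV[feasible, i]
            TV[m, i] = values.max()
    return TV

V0 = np.zeros((grid.size, S.size))           # any bounded guess in [C_b(X)]^n
V1 = T(V0)
print(np.abs(V1 - V0).max())                 # sup-norm distance moved by one application
```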

This suggests that we can still borrow the existence results (and also uniqueness) for optimal strategies from the deterministic dynamic programming problem for our optimal plans when there is risk arising from the Markov chain shock(s). Without proving them again, we state the following results.

First, if we assume, for each value of the shock in \(S\):

  1. \(U: X \times A \rightarrow \mathbb{R}\) is bounded and continuous;
  2. \(F: X \times A \rightarrow X\) is continuous; and
  3. \(\Gamma: X \rightarrow P(A)\) is compact-valued and continuous,

then the value function is bounded on \(X \times S\) and, for each \(\varepsilon \in S\), continuous on \(X\). Furthermore, if \(U\) is strictly increasing, \(V\) is strictly increasing on \(X\), since the conditional expectation operator is just a convex combination. Also, if \(U\) is strictly concave and \(F\) is weakly concave for each \(\varepsilon\), then \(V_{i}\) is weakly concave on \(X\) for each \(i = 1,...,n\), and there exists a unique stationary optimal strategy for each \(\varepsilon \in S\).

As an illustration of these results (and to keep you on the edge of your seats), we shall compute an example of a stochastic growth model with a finite-state Markov chain shock next.

7.3.2. Numerical example

Here we illustrate some examples of computing stochastic dynamic programming problems on the computer. Why do we do this? Recall that we could solve the earlier special case of the stochastic growth model by hand because we assumed log utility, 100% capital depreciation per period, and a Cobb-Douglas production function. Things are less amenable to pen and paper in a more general setting, so we need to resort to numerical computation.

In this example we will solve the more generally parameterized stochastic growth model using Python. We now deal with an \(N\)-state Markov chain \((P,\lambda_{0})\) for technology shocks, \(A_{t}(i) \in S = \{ A(1),A(2),...,A(N) \}\). The planner’s problem in the stochastic optimal growth model is to solve

(2)\[V(k,A(i)) = \max_{k' \in \Gamma(k,A(i))} U(c) + \beta \sum_{j=1}^{N}P_{ij}V[k',A'(j)]\]

subject to

\[\begin{split}\begin{aligned} & c = f(k,A(i)) - k' \\ & \Gamma(k,A(i)) = \left\{ k' : k' \in [0, f(k,A(i))] \right\},\end{aligned}\end{split}\]

for any \((k,A(i)) \in X \times S\).

To begin, we equip the preference and technology functions, respectively, with the specific parametric forms:

\[\begin{split}\begin{aligned} U(c) \begin{cases} = \frac{c^{1-\sigma} - 1}{1-\sigma} & \sigma > 0, \ \sigma \neq 1 \\ \\ = \ln(c) & \sigma \rightarrow 1 \end{cases} \end{aligned}\end{split}\]

and

\[f(k,A(i)) = A(i)k^{\alpha} + (1-\delta)k; \ \ \alpha \in (0,1), \delta \in (0,1].\]
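A minimal Python implementation along these lines is sketched below. The particular parameter values, the three-state chain, the capital grid, and the convergence tolerance are illustrative choices only; a more careful treatment of grid design and accuracy is left to the accompanying TutoLabo session.

```python
import numpy as np

# Stochastic optimal growth by value function iteration on a product grid.
# All numbers below are illustrative choices, not calibrated values.
alpha, beta, delta, sigma = 0.3, 0.95, 0.1, 2.0
A = np.array([0.9, 1.0, 1.1])                      # technology states A(1),...,A(N)
P = np.array([[0.70, 0.20, 0.10],                  # P[i, j] = Prob(A(j) next | A(i))
              [0.15, 0.70, 0.15],
              [0.10, 0.20, 0.70]])
N = A.size

def U(c):
    return (c**(1.0 - sigma) - 1.0) / (1.0 - sigma)

def f(k, a):
    return a * k**alpha + (1.0 - delta) * k        # f(k, A(i))

# Deterministic (A = 1) steady state, used only to center the capital grid.
k_ss = (alpha / (1.0 / beta - 1.0 + delta)) ** (1.0 / (1.0 - alpha))
grid = np.linspace(0.1 * k_ss, 2.0 * k_ss, 200)    # endogenous state grid
M = grid.size

# R[i, m, j] = U(f(k_m, A(i)) - k'_j), or -inf if the choice is infeasible.
C = f(grid[None, :, None], A[:, None, None]) - grid[None, None, :]
R = np.where(C > 0, U(np.maximum(C, 1e-12)), -np.inf)

V = np.zeros((M, N))                               # V[m, i] approximates V(k_m, A(i))
while True:
    EV = V @ P.T                                   # EV[m, i] = sum_j P[i, j] V(k_m, A(j))
    V_new = (R + beta * EV.T[:, None, :]).max(axis=2).T
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new

policy = grid[(R + beta * EV.T[:, None, :]).argmax(axis=2)].T
print(policy[:5, :])                               # optimal k'(k, A(i)) at low capital levels
```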

Footnotes

[2]Since \((k,A) \mapsto f(k,A) \in \mathcal{X}\) defines a mapping from the product state space of \(N \times N_{grid}\) points to the endogenous state space \(\mathcal{X}\) with \(N_{grid}\) points.
[3] The set of maximizers, recall, is

\[G^{\ast}(k,A(i)) = \left\{ k' \in \Gamma(k,A(i)) \ : \ V(k,A(i)) = U(f(k,A(i)) - k') + \beta \sum_{j=1}^{N}P_{ij}V[k',A(j)] \right\}, \qquad (k,A(i)) \in \mathcal{X} \times S.\]

Note

Next, we’ll get our hands dirty in the accompanying TutoLabo sessions.

[Su1996] Sundaram, Rangarajan K., A First Course in Optimization Theory, Cambridge University Press, 1996.
[HL1996] Hernandez-Lerma, Onesimo and Jean Bernard Lasserre, Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer Verlag, 1996.
[HL1999] Hernandez-Lerma, Onesimo and Jean Bernard Lasserre, Further Topics on Discrete-Time Markov Control Processes, Springer Verlag, 1999.