Commit ef22741 (1 parent: ce96498)

For mpi: v^0 set to min_(s, a) r(s, a) / (1-beta) to guarantee convergence
1 file changed: quantecon/markov/mdp.py (6 additions, 3 deletions)
@@ -598,8 +598,11 @@ def solve(self, method='policy_iteration',
             Solution method.
 
         v_init : array_like(float, ndim=1), optional(default=None)
-            Initial value function, of length n. If None, set v_init(s)
-            = max_a r(s, a).
+            Initial value function, of length n. If None, `v_init` is
+            set such that v_init(s) = max_a r(s, a) for value iteration
+            and policy iteration; for modified policy iteration,
+            v_init(s) = min_(s', a) r(s', a)/(1 - beta) to guarantee
+            convergence.
 
         epsilon : scalar(float), optional(default=None)
             Value for epsilon-optimality. If None, the value stored in
@@ -733,7 +736,7 @@ def midrange(z):
 
         v = np.empty(self.num_states)
         if v_init is None:
-            self.s_wise_max(self.R, out=v)
+            v[:] = self.R[self.R > -np.inf].min() / (1 - self.beta)
         else:
             v[:] = v_init
 
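Why this initial value guarantees convergence: with v^0(s) = min_(s', a) r(s', a) / (1 - beta) for every s, one application of the Bellman operator T gives T v^0 >= min r + beta * min r / (1 - beta) = v^0, so v^0 <= T v^0 componentwise, which is the standard sufficient condition for modified policy iteration to converge monotonically to the optimal value function (cf. Puterman, Markov Decision Processes). The filter self.R > -np.inf in the committed line is needed because infeasible state-action pairs carry a reward of -inf, and the minimum must be taken over feasible pairs only.

Below is a minimal sketch of modified policy iteration with this initialization, not the quantecon implementation; the toy rewards R, transitions Q, and the helper bellman are all hypothetical and chosen only to make the script self-contained.

# Minimal sketch of modified policy iteration (not the quantecon code).
# The toy MDP (R, Q, beta, k) is hypothetical; it illustrates why
# v0 = min_(s, a) r(s, a) / (1 - beta) is a safe starting point:
# it satisfies v0 <= T v0, so the iterates rise monotonically to v*.
import numpy as np

R = np.array([[5.0, 10.0],            # rewards r(s, a), shape (n, m)
              [-1.0, 2.0]])
Q = np.array([[[0.9, 0.1],            # transition probs Q[s, a, s']
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.3, 0.7]]])
beta = 0.95                           # discount factor
k = 20                                # partial policy-evaluation sweeps
tol = 1e-8

n = R.shape[0]

def bellman(v):
    # Action values r(s, a) + beta * sum_s' Q[s, a, s'] * v(s')
    vals = R + beta * Q @ v
    return vals.max(axis=1), vals.argmax(axis=1)

# Initialization from this commit: the smallest reward, received forever
v = np.full(n, R.min() / (1 - beta))

while True:
    u, sigma = bellman(v)             # one Bellman step and greedy policy
    if np.max(np.abs(u - v)) < tol:
        break
    # k rounds of partial evaluation of the greedy policy sigma
    R_sigma = R[np.arange(n), sigma]
    Q_sigma = Q[np.arange(n), sigma]
    v = u
    for _ in range(k):
        v = R_sigma + beta * Q_sigma @ v

print(v)  # approximately the optimal values

By contrast, the old default v_init(s) = max_a r(s, a) can start above v*, in which case the monotone-convergence argument for modified policy iteration does not apply; that is exactly what this commit fixes for the 'modified_policy_iteration' method while leaving the other solution methods unchanged.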