On Markov decision models with an absorbing set
We study a Markov decision process with countable state and action spaces and bounded rewards accruing until entry into an absorbing set. Two optimality criteria are considered: the classical total reward criterion and a target-level criterion. For all discount factors smaller than a critical one, the standard results of dynamic programming (optimality equation, optimality of a decision rule, value iteration) are shown to hold. Value iteration is combined with an extrapolation that yields upper and lower bounds on the value function at each step of the iteration. The asymptotic behavior of the extrapolation method, as well as the characterizations of the critical discount factor, are based on the Perron-Frobenius theory for nonlinear operators. The special case of a Markov decision model with a random horizon is studied in detail. Finally, as a byproduct, an efficient computation of the mean entrance time of a Markov chain into an absorbing set is obtained.
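The byproduct mentioned above can be illustrated with a small sketch. For a finite Markov chain, the vector of mean entrance times into the absorbing set satisfies the fixed-point equation m = 1 + Qm, where Q is the transition matrix restricted to the transient states; when the chain is transient (spectral radius of Q below one), successive approximation converges to m. The chain and states below are an illustrative textbook example, not taken from the paper, and the paper's actual extrapolation scheme is not reproduced here.

```python
# Mean entrance time into an absorbing set via successive approximation
# m_{k+1} = 1 + Q m_k, with Q the transition matrix restricted to the
# transient states. Converges when Q is transient (spectral radius < 1).
# Example chain chosen for illustration; not from the paper.

def mean_entrance_times(Q, tol=1e-12, max_iter=100000):
    """Iterate m <- 1 + Q m until successive iterates differ by < tol."""
    n = len(Q)
    m = [0.0] * n
    for _ in range(max_iter):
        new = [1.0 + sum(Q[i][j] * m[j] for j in range(n)) for i in range(n)]
        if max(abs(new[i] - m[i]) for i in range(n)) < tol:
            return new
        m = new
    raise RuntimeError("iteration did not converge; chain may not be transient")

# Symmetric random walk on {0, 1, 2, 3, 4}, states 0 and 4 absorbing.
# Q is the transition matrix restricted to the transient states {1, 2, 3}.
Q = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]

m = mean_entrance_times(Q)
print([round(x, 6) for x in m])  # known answer i*(4-i): [3.0, 4.0, 3.0]
```

For this walk the exact mean absorption times are i(4 - i) from state i, which the iteration recovers; the paper's extrapolation would additionally bracket the limit with upper and lower bounds at every step.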
Keywords: Markov decision processes, transient Markov decision processes, extrapolation methods, Perron-Frobenius theory, expected total reward criterion, target-level criterion, absorbing sets, Markov chains, hitting times