
On Markov decision models with an absorbing set

  • Karl-Heinz Waldmann
Chapter
Part of the CISM International Centre for Mechanical Sciences book series (CISM, volume 482)

Abstract

We study a Markov decision process with countable state and action spaces and bounded rewards earned up to the entrance into an absorbing set. Two optimality criteria are considered: the classical total reward criterion and a target-level criterion. For all discount factors smaller than a critical one, the standard results of dynamic programming (optimality equation, optimality of a decision rule, value iteration) are shown to hold. Value iteration is combined with an extrapolation giving upper and lower bounds on the value function at each step of the iteration. The asymptotic behavior of the extrapolation method as well as the characterization of the critical discount factor rest on the Perron-Frobenius theory for nonlinear operators. The special case of a Markov decision model with a random horizon is studied in detail. Finally, as a byproduct, an efficient computation of the mean entrance time of a Markov chain into an absorbing set is obtained.
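To make the extrapolation idea concrete, the following is a minimal numerical sketch, not the chapter's algorithm: value iteration on a finite model whose transition matrices are substochastic on the non-absorbed states (the missing probability mass represents absorption), combined with crude contraction-based bounds that are valid only for discount factors β < 1. The chapter's Perron-Frobenius-based bounds are sharper and extend to all discount factors below the critical one; the model data, function name, and parameters below are illustrative assumptions. The mean-entrance-time byproduct appears at the end, for a fixed chain, as the expected total reward with reward 1 per step and no discounting.

```python
import numpy as np

def value_iteration_with_bounds(P, r, beta, tol=1e-8, max_iter=10_000):
    """Value iteration with sup-norm extrapolation bounds (sketch).

    P[a] : substochastic transition matrix among the transient states under
           action a (row deficits are the absorption probabilities).
    r[a] : one-step reward vector under action a.
    beta : discount factor, restricted here to beta < 1 so that the
           optimality operator is a beta-contraction in the sup norm.
    """
    n_states = P[0].shape[0]
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # One application of the optimality operator: Q-values, then maximize.
        q = np.stack([r[a] + beta * P[a] @ v for a in range(len(P))])
        v_new = q.max(axis=0)
        # Contraction-based extrapolation: the value function is sandwiched
        # within err of v_new at every step of the iteration.
        err = beta / (1.0 - beta) * np.abs(v_new - v).max()
        if 2.0 * err < tol:
            policy = q.argmax(axis=0)
            return v_new, policy, v_new - err, v_new + err
        v = v_new
    raise RuntimeError("no convergence within max_iter iterations")

# Illustrative data: two transient states, two actions; rows sum to less
# than 1, the deficit being the probability of entering the absorbing set.
P = [np.array([[0.5, 0.3], [0.2, 0.6]]),   # action 0
     np.array([[0.1, 0.7], [0.4, 0.3]])]   # action 1
r = [np.array([1.0, 2.0]), np.array([1.5, 0.5])]
v, policy, lower, upper = value_iteration_with_bounds(P, r, beta=0.9)
print(v, policy)

# Byproduct for a fixed chain Q on the transient states: the vector m of
# mean entrance times into the absorbing set solves (I - Q) m = 1, which
# is well defined because the spectral radius of Q is below 1.
Q = P[0]
m = np.linalg.solve(np.eye(2) - Q, np.ones(2))
print(m)
```

The two-sided bound used here is the generic one for a β-contraction; exploiting the substochastic structure, as the chapter does via Perron-Frobenius theory, yields tighter bounds and permits discount factors up to the critical value, which may exceed 1.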

Keywords

Markov decision processes · transient Markov decision processes · extrapolation methods · Perron-Frobenius theory · expected total reward criterion · target-level criterion · absorbing sets · Markov chains · hitting times



Copyright information

© CISM, Udine 2006

Authors and Affiliations

  • Karl-Heinz Waldmann
    1. Inst. Wirtschaftstheorie und Operations Research, Universität Karlsruhe, Karlsruhe, Germany
