CONSTRAINED MARKOV DECISION PROCESSES By Eitan ALTMAN CRC Press / Chapman and Hall, March 1999 Contents 1 Introduction 1 1.1 Examples of constrained dynamic control problems 1 1.2 On solution approaches for CMDPs with expected costs 3 1.3 Other types of CMDPs 5 1.4 Cost criteria and assumptions 7 1.5 The convex analytical approach and occupation measures 8 1.6 Linear Programming and Lagrangian approach for CMDPs 10 1.7 About the methodology 12 1.8 The structure of the book 17 I Part One: Finite MDPs 19 2 Markov decision processes 21 2.1 The model 21 2.2 Cost criteria and the constrained problem 23 2.3 Some notation 24 2.4 The dominance of Markov policies 25 3 The discounted cost 27 3.1 Occupation measure and the primal LP 27 3.2 Dynamic programming and dual LP: the unconstrained case 30 3.3 Constrained control: Lagrangian approach 32 3.4 The dual LP 33 3.5 Number of randomizations 34 4 The expected average cost 37 4.1 Occupation measure and the primal LP 37 4.2 Equivalent Linear Program 41 4.3 The Dual Program 42 4.4 Number of randomizations 43 5 Flow and service control in a single-server queue 45 5.1 The model 45 5.2 The Lagrangian 47 5.3 The original constrained problem 53 5.4 Structure of randomization and implementation issues 53 5.5 On coordination between controllers 54 5.6 Open questions 55 II Part Two: Infinite MDPs 57 6 MDPs with infinite state and action spaces 59 6.1 The model 59 6.2 Cost criteria 61 6.3 Mixed policies and topologic structure 62 6.4 The dominance of Markov policies 63 6.5 Aggregation of states 65 6.6 Extra randomization in the policies 68 6.7 Equivalent quasi-Markov model and quasi-Markov policies 70 7 The total cost: classification of MDPs 75 7.1 Transient and Absorbing MDPs 75 7.2 MDPs with uniform Lyapunov functions 77 7.3 Equivalence of MDP with unbounded and bounded costs 78 7.4 Properties of MDPs with uniform Lyapunov functions 84 7.5 Properties for fixed initial distribution 89 7.6 Examples of uniform Lyapunov functions 93 7.7 Contracting MDPs 96 8 The total cost: occupation measures and the primal LP 101 8.1 Occupation measure 101 8.2 Continuity of occupation measures 104 8.3 More properties of MDPs 110 8.4 Characterization of the sets of occupation measure 110 8.5 Relation between cost and occupation measure 112 8.6 Dominating classes of policies 114 8.7 Equivalent Linear Program 115 8.8 The dual program 116 9 The total cost: Dynamic and Linear Programming 117 9.1 Non-constrained control: Dynamic and Linear Programming 118 9.2 Super-harmonic functions and Linear Programming 122 9.3 Set of achievable costs 127 9.4 Constrained control: Lagrangian approach 128 9.5 The Dual LP 131 9.6 State truncation 132 9.7 A second LP approach for optimal mixed policies 133 9.8 More on unbounded costs 134 10 The discounted cost 137 10.1 The equivalent total cost model 137 10.2 Occupation measure and LP 138 10.3 Non-negative immediate cost 138 10.4 Weak contracting assumptions and Lyapunov functions 139 10.5 Example: flow and service control 140 11 The expected average cost 143 11.1 Occupation measure 143 11.2 Completeness properties of stationary policies 147 11.3 Relation between cost and occupation measure 150 11.4 Dominating classes of policies 154 11.5 Equivalent Linear Program 157 11.6 The Dual Program 158 11.7 The contracting framework 158 11.8 Other conditions for the uniform integrability 160 11.9 The case of uniform Lyapunov conditions 161 12 Expected average cost: Dynamic Programming and LP 165 12.1 The non-constrained case: optimality inequality 165 12.2 Non-constrained control: cost bounded below 169 12.3 Dynamic programming and uniform Lyapunov function 171 12.4 Superharmonic functions and linear programming 173 12.5 Set of achievable costs 176 12.6 Constrained control: Lagrangian approach 176 12.7 The dual LP 178 12.8 A second LP approach for optimal mixed policies 179 III Part Three: Asymptotic methods and approximations 181 13 Sensitivity analysis 183 13.1 Introduction 183 13.2 Approximation of the values 186 13.3 Approximation and robustness of the policies 190 14 Convergence of discounted constrained MDPs 193 14.1 Convergence in the discount factor 193 14.2 Convergence to the expected average cost 194 14.3 The case of uniform Lyapunov function 195 15 Convergence as the horizon tends to infinity 199 15.1 The discounted cost 199 15.2 The expected average cost: stationary policies 200 15.3 The expected average cost: general policies 201 16 State truncation and approximation 205 16.1 The approximating sets of states 206 16.2 Scheme I: the total cost 208 16.3 Scheme II: the total cost 211 16.4 Scheme III: the total cost 214 16.5 The expected average cost 214 16.6 Infinite MDPs: on the number of randomizations 215 17 Appendix: Convergence of probability measures 217 18 References 221 19 List of Symbols and Notation 235 Index 239 For more information, see http://www.crcpress.com/ or send an email to altman@sophia.inria.fr