CONSTRAINED MARKOV DECISION PROCESSES
		By
	    Eitan ALTMAN

CRC Press / Chapman and Hall, March 1999


Contents 

1 Introduction 1

1.1 Examples of constrained dynamic control problems 1 
1.2 On solution approaches for CMDPs with expected costs 3 
1.3 Other types of CMDPs 5 
1.4 Cost criteria and assumptions 7 
1.5 The convex analytical approach and occupation measures 8 
1.6 Linear Programming and Lagrangian approach for CMDPs 10 
1.7 About the methodology 12 
1.8 The structure of the book 17

I Part One: Finite MDPs 19 

2 Markov decision processes 21

2.1 The model 21 
2.2 Cost criteria and the constrained problem 23 
2.3 Some notation 24 
2.4 The dominance of Markov policies 25

3 The discounted cost 27

3.1 Occupation measure and the primal LP 27 
3.2 Dynamic programming and dual LP: the unconstrained case 30 
3.3 Constrained control: Lagrangian approach 32 
3.4 The dual LP 33 3.5 Number of randomizations 34

4 The expected average cost 37

4.1 Occupation measure and the primal LP 37 
4.2 Equivalent Linear Program 41 
4.3 The Dual Program 42 
4.4 Number of randomizations 43

5 Flow and service control in a single-server queue 45

5.1 The model 45 
5.2 The Lagrangian 47
5.3 The original constrained problem 53 
5.4 Structure of randomization and implementation issues 53 
5.5 On coordination between controllers 54 
5.6 Open questions 55

II Part Two: Infinite MDPs 57 

6 MDPs with infinite state and action spaces 59

6.1 The model 59 
6.2 Cost criteria 61 
6.3 Mixed policies and topologic structure  62 
6.4 The dominance of Markov policies 63 
6.5 Aggregation of states  65 
6.6 Extra randomization in the policies  68 
6.7 Equivalent quasi-Markov model and quasi-Markov policies 70

7 The total cost: classification of MDPs 75

7.1 Transient and Absorbing MDPs 75 
7.2 MDPs with uniform Lyapunov functions 77 
7.3 Equivalence of MDP with unbounded and bounded costs  78 
7.4 Properties of MDPs with uniform Lyapunov functions  84 
7.5 Properties for fixed initial distribution  89 
7.6 Examples of uniform Lyapunov functions 93 
7.7 Contracting MDPs 96

8 The total cost: occupation measures and the primal LP 101

8.1 Occupation measure 101 
8.2 Continuity of occupation measures 104 
8.3 More properties of MDPs  110 
8.4 Characterization of the sets of occupation measure 110 
8.5 Relation between cost and occupation measure 112 
8.6 Dominating classes of policies 114 
8.7 Equivalent Linear Program 115 
8.8 The dual program 116

9 The total cost: Dynamic and Linear Programming 117

9.1 Non-constrained control: Dynamic and Linear Programming 118 
9.2 Super-harmonic functions and Linear Programming 122 
9.3 Set of achievable costs 127 
9.4 Constrained control: Lagrangian approach 128 
9.5 The Dual LP 131 
9.6 State truncation 132 
9.7 A second LP approach for optimal mixed policies 133 
9.8 More on unbounded costs 134

10 The discounted cost 137

10.1 The equivalent total cost model 137 
10.2 Occupation measure and LP 138 
10.3 Non-negative immediate cost 138 
10.4 Weak contracting assumptions and Lyapunov functions 139 
10.5 Example: flow and service control 140

11 The expected average cost 143

11.1 Occupation measure 143 
11.2 Completeness properties of stationary policies 147 
11.3 Relation between cost and occupation measure 150 
11.4 Dominating classes of policies 154 
11.5 Equivalent Linear Program 157 
11.6 The Dual Program 158 
11.7 The contracting framework 158 
11.8 Other conditions for the uniform integrability 160 
11.9 The case of uniform Lyapunov conditions 161

12 Expected average cost: Dynamic Programming and LP 165

12.1 The non-constrained case: optimality inequality 165 
12.2 Non-constrained control: cost bounded below 169 
12.3 Dynamic programming and uniform Lyapunov function 171 
12.4 Superharmonic functions and linear programming 173 
12.5 Set of achievable costs 176 
12.6 Constrained control: Lagrangian approach 176 
12.7 The dual LP 178 
12.8 A second LP approach for optimal mixed policies 179

III Part Three: Asymptotic methods and approximations 181 

13 Sensitivity analysis 183

13.1 Introduction 183 
13.2 Approximation of the values 186 
13.3 Approximation and robustness of the policies 190

14 Convergence of discounted constrained MDPs 193

14.1 Convergence in the discount factor 193 
14.2 Convergence to the expected average cost 194 
14.3 The case of uniform Lyapunov function 195

15 Convergence as the horizon tends to infinity 199

15.1 The discounted cost 199 
15.2 The expected average cost: stationary policies 200 
15.3 The expected average cost: general policies 201

16 State truncation and approximation 205

16.1 The approximating sets of states 206 
16.2 Scheme I: the total cost 208 
16.3 Scheme II: the total cost 211 
16.4 Scheme III: the total cost 214 
16.5 The expected average cost 214 
16.6 Infinite MDPs: on the number of randomizations 215

17 Appendix: Convergence of probability measures 217 

18 References 221 

19 List of Symbols and Notation 235 

Index 239

For more information, see http://www.crcpress.com/
or send an email to altman@sophia.inria.fr