1 Introduction PART I Mathematical background 2 Real analysis and linear algebra 2.1 Definitions and notation 2.1.1 Numbers, sets and vectors 2.1.2 Logical notation 2.1.3 Set algebra 2.1.4 The supremum and infimum 2.1.5 Rounding off 2.1.6 Functions 2.1.7 Sequences and limits 2.1.8 Infinite series 2.1.9 Geometric series 2.1.10 Classes of real valued functions 2.1.11 Graphs 2.1.12 The binomial coefficient 2.1.13 Stirling's approximation of the factorial 2.1.14 L'Hopital's rule 2.1.15 Taylor's theorem 2.1.16 The lp norm 2.1.17 Power means 2.2 Equivalence relationships 2.3 Linear algebra 2.3.1 Matrices 2.3.2 Eigenvalues and spectral decomposition 2.3.3 Symmetric, Hermitian and positive definite matrices 2.3.4 Positive matrices 2.3.5 Stochastic matrices 2.3.6 Nonnegative matrices and graph structure 3 Background - measure theory 3.1 Topological spaces 3.1.1 Bases of topologies 3.1.2 Metric space topologies 3.2 Measure spaces 3.2.1 Formal construction of measures 3.2.2 Completion of measures 3.2.3 Outer measure 3.2.4 Extension of measures 3.2.5 Counting measure 3.2.6 Lebesgue measure 3.2.7 Borel sets 3.2.8 Dynkin system theorem 3.2.9 Signed measures 3.2.10 Decomposition of measures 3.2.11 Measurable functions 3.3 Integration 3.3.1 Convergence of integrals 3.3.2 Lp spaces 3.3.3 Radon-Nikodym derivative 3.4 Product spaces 3.4.1 Product topologies 3.4.2 Product measures 3.4.3 The Kolmogorov extension theorem 4 Background - probability theory 4.1 Probability measures - basic properties 4.2 Moment generating functions (MGF) and cumulant generating functions (CGF) 4.2.1 Moments and cumulants 4.2.2 MGF and CGF of independent sums 4.2.3 Relationship of the CGF to the normal distribution 4.2.4 Probability generating functions 4.3 Conditional distributions 4.4 Martingales 4.4.1 Stopping times 4.5 Some important theorems 4.6 Inequalities for tail probabilities 4.6.1 Chernoff bounds 4.6.2 Chernoff bound for the normal distribution 4.6.3 Chernoff bound for the gamma distribution 4.6.4 Sample means 4.6.5 Some inequalities for bounded random variables 4.7 Stochastic ordering 4.7.1 MGF ordering of the gamma and exponential distribution 4.7.2 Improved bounds based on hazard functions 4.8 Theory of stochastic limits 4.8.1 Covergence of random variables 4.8.2 Convergence of measures 4.8.3 Total variation norm 4.9 Stochastic kernels 4.9.1 Measurability of measure kernels 4.9.2 Continuity of measure kernels 4.10 Convergence of sums 4.11 The law of large numbers 4.12 Extreme value theory 4.13 Maximum likelihood estimation 4.14 Nonparametric estimates of distributions 4.15 Total variation distance for discrete distributions 5 Background - stochastic processes 5.1 Counting processes 5.1.1 Renewal processes 5.1.2 Poisson process 5.2 Markov processes 5.2.1 Discrete state spaces 5.2.2 Global properties of Markov chains 5.2.3 General state spaces 5.2.4 Geometric ergodicity 5.2.5 Spectral properties of Markov chains 5.3 Continuous-time Markov chains 5.3.1 Birth and death processes 5.4 Queueing systems 5.4.1 Queueing systems as birth and death processes 5.4.2 Utilization factor 5.4.3 General queueing systems and embedded Markov chains 5.5 Adapted counting processes 5.5.1 Asymptotic behavior 5.5.2 Relationship to adapted events 6 Functional analysis 6.1 Metric spaces 6.1.1 Contractive mappings 6.2 The Banach fixed point theorem 6.2.1 Stopping rules for fixed point algorithms 6.3 Vector spaces 6.3.1 Quotient spaces 6.3.2 Basis of a vector space 6.3.3 Operators 6.4 Banach spaces 6.4.1 Banach spaces and completeness 6.4.2 Linear operators 6.5 Norms and norm equivalence 6.5.1 Norm dominance 6.5.2 Equivalence properties of norm equivalence classes 6.6 Quotient spaces and seminorms 6.7 Hilbert spaces 6.8 Examples of Banach spaces 6.8.1 Finite dimensional spaces 6.8.2 Matrix norms and the submultiplicative property 6.8.3 Weighted norms on function spaces 6.8.4 Span seminorms 6.8.5 Operators on span quotient spaces 6.9 Measure kernels as linear operators 6.9.1 The contraction property of stochastic kernels 6.9.2 Stochastic kernels and the span seminorm 7 Fixed point equations 7.1 Contraction as a norm equivalence property 7.2 Linear fixed point equations 7.3 The geometric series theorem 7.4 Invariant transformations of fixed point equations 7.5 Fixed point algorithms and the span seminorm 7.5.1 Approximations in the span seminorm 7.5.2 Magnitude of fixed points in the span seminorm 7.6 Stopping rules for fixed point algorithms 7.6.1 Fixed point iteration in the span seminorm 7.7 Perturbations of fixed point equations 8 The distribution of a maximum 8.1 General approach 8.2 Bounds on -M based on MGFs 8.2.1 Sample means 8.2.2 Gamma distribution 8.3 Bounds for varying marginal distributions 8.3.1 Example 8.4 Tail probabilities of maxima 8.4.1 Extreme value distributions 8.4.2 Tail probabilities based on Boole's inequality 8.4.3 The normal case 8.4.4 The gamma(alpha, lambda) case 8.5 Variance mixtures based on random sample sizes 8.6 Bounds for maxima based on the first two moments 8.6.1 Stability PART II General theory of approximate iterative algorithms 9 Background - linear convergence 9.1 Linear convergence 9.2 Construction of envelopes - the nonstochastic case 9.3 Construction of envelopes - the stochastic case 9.4 A version of l'Hopital's rule for series 10 A general theory of approximate iterative algorithms (AIA) 10.1 A general tolerance model 10.2 Example: a preliminary model 10.3 Model elements of an AIA 10.3.1 Lipschitz kernels 10.3.2 Lipschitz convolutions 10.4 A classification system for AIAs 10.4.1 Relative error model 10.5 General inequalities 10.5.1 Hilbert space models of AIAs 10.6 Nonexpansive operators 10.6.1 Application of general inequalities to nonexpansive AIAs 10.6.2 Weakly contractive AIAs 216 10.6.3 Examples 10.6.4 Stochastic approximation (Robbins-Monro algorithm) 10.7 Rates of convergence for AIAs 10.7.1 Monotonicity of the Lipschitz kernel 10.7.2 Case I - strongly contractive models with nonvanishing bounds 10.7.3 Case II - rapidly vanishing approximation error 10.7.4 Case III - approximation error decreasing at contraction rate 10.7.5 Case IV - Approximation error greater than contraction rate 10.7.6 Case V - Contraction rates approaching 1 10.7.7 Adjustments for relative error models 10.7.8 A comparison of Banach space and Hilbert space models 10.8 Stochastic approximation as a weakly contractive algorithm 10.9 Tightness of algorithm tolerance 10.10 Finite bounds 10.10.1 Numerical example 10.11 Summary of convergence rates for strongly contractive models 11 Selection of approximation schedules for coarse-to-fine AIAs 11.1 Extending the tolerance model 11.1.1 Comparison model for tolerance schedules 11.1.2 Regularity conditions for the computation function 11.2 Main result 11.3 Examples of cost functions 11.4 A general principle for AIAs PART III Application to Markov decision processes 12 Markov decision processes (MDP) - background 12.1 Model definition 12.2 The optimal control problem 12.2.1 Adaptive control policies 12.2.2 Optimal control policies 12.3 Dynamic programming and linear operators 12.3.1 The dynamic programming operator (DPO) 12.3.2 Finite horizon dynamic programming 12.3.3 Infinite horizon problem 12.3.4 Classes of MDP 12.3.5 Measurability of the DPO 12.4 Dynamic programming and value iteration 12.4.1 Value iteration and optimality 12.5 Regret and epsilon-optimal solutions 12.6 Banach space structure of dynamic programming 12.6.1 The contraction property 12.6.2 Contraction properties of the DPO 12.6.3 The equivalence of uniform convergence and contraction for the DPO 12.7 Average cost criterion for MDP 13 Markov decision processes - value iteration 13.1 Value iteration on quotient spaces 13.2 Contraction in the span seminorm 13.2.1 Contraction properties of the DPO 13.3 Stopping rules for value iteration 13.4 Value iteration in the span seminorm 13.5 Example: M/D/1/K queueing system 13.6 Efficient calculation of |||QJ |||SP 13.7 Example: M/D/1/K system with optimal control of service capacity 13.8 Policy iteration 13.9 Value iteration for the average cost optimization 14 Model approximation in dynamic programming - general theory 14.1 The general inequality for MDPs 14.2 Model distance 14.3 Regret 14.4 A comment on the approximation of regret 14.5 Example 15 Sampling based approximation methods 15.1 Modeling maxima 15.1.1 Nonuniform sample allocation: Dependence on qmin, and the 'Curse of the Supremum Norm' 15.1.2 Some queueing system examples 15.1.3 Truncated geometric model 15.1.4 M/G/1/K queueing model 15.1.5 Restarting schemes 15.2 Continuous state/action spaces 15.3 Parametric estimation of MDP models 16 Approximate value iteration by truncation 16.1 Truncation algorithm 16.2 Regularity conditions for tolerance-cost model 16.2.1 Suboptimal orderings 16.3 Example 17 Grid approximations of MDPs with continuous state/action spaces 17.1 Discretization methods 17.2 Complexity analysis 17.3 Application of approximation schedules 18 Adaptive control of MDPs 18.1 Regret bounds for adaptive policies 18.2 Definition of an adaptive MDP 18.3 Online parameter estimation 18.4 Exploration schedule Bibliography Subject index
{{comment.content}}