# PhD course "Markov Decision Processes and Stochastic Games"

Many economic decisions are of a dynamic nature. It could be to decide when to replace the engine of a bus with a new engine, or when to mark up or announce sales to manage the large inventory of certain goods in a retail supermarket store, or when to stop further studies and cash one’s education or when to acquire additional training to improve one’s job potential, or when to retire from service and start drawing social security benefits, or when to exit from tough competitions. All such problems are of a dynamic nature. In recent years many such economic problems have attracted the attention of several economists. For example John Rust (Econometrica 1987) used actual data maintained by an individual supervisor at the bus maintenance station at Madison, Wisconsin Metra where the supervisor kept complete records of the replacement of the engines with a new engine based on several factors like the mileage on the bus, the list of components repaired or replaced etc. on 162 buses over a period of roughly 10 years for the period December 1974 to May 1985. Whether the decisions he took can be considered optimal is analyzed by treating the problem as a Markov decision process with the basic assumptions about what he considered as his cost structure and what he would have considered as the state of the system. Mark Kennet (J. Applied Econometrics 1994) studied a similar repair problem for airplanes along the model of Rust. There were initially very many restrictions by the Congress demanding stricter engine overhaul requirements. After some years such restrictions where relaxed by the Congress by a new deregulation policy. Using the existing data on 42 engine histories collected from Pratt and Whitney Inc. Kennet finds that the airlines have drastically chosen to change the engine overhaul policies due to the efficient operation of flying longer hours before the costly overhaul operations. To claim this, the study assumes that efficient firms would choose a maintenance policy which minimizes expected discounted cost by solving a suitable stochastic discrete dynamic programming problem. The model provides insights into the perceived relative costs of engine maintenance, in flight shut down and ordinary operations. John Miller (J. Political Economy 1984) studies the problem of how an inexperienced job searcher can build up a career profile where certain types of jobs are to be sampled with some calculated risk before others are tried and after reaching the equilibrium job turnover rate one should settle down ultimately in a job that is the most appropriate for the person. He models this along the lines of multi-armed bandit problem. In his paper, on “Who Solved the Secretary Problem?”, Tom Ferguson (Statistical Science (1989)) discusses the historical aspects of the problem, including his speculation that Johannes Kepler may have used the optimal strategy to choose his second wife. The article also discusses several generalizations of the problem. Here the main problem is to search for the most beautiful secretary by interviewing one by one till one stops the search. The hitch is that when you reject a secretary the secretary is no more available for a second round of search and the order in which they are searched is quite random. The problem is to develop an optimal search policy which maximizes the probability of selecting the most beautiful secretary.

In studying the dynamics of mark ups and inventory decisions Victor Aguirregabiria (Review of Economic Studies 1999) analyses the interaction between prices and inventory decisions in retail sales, where the model reveals the pattern of long periods without even nominal price changes and very short periods with very low prices via the so called sales promotions. He shows how the fixed ordering costs and inventories play a major role in the types of price decisions. The problems are studied via Markov decision processes. Another example of a price adjustment model is on saltine cracker sales in a small town. Margaret Slade (Review of Economic Studies 1998) models this as a Markov decision process based on the Cournot model of demand as a function of price and the model shows that fixed costs are more relevant for price changes than variable costs. In stock market literature the so called Option pricing markets are quite appropriate as models of dynamic decisions. Often one ends up with qualitative properties of optimal decisions and the real problems are left with no matching solution techniques to arrive at optimal policies.

While models may be useful for qualitative observations the usefulness and applicability of the models will be many fold when suitable computational tools are developed simultaneously. While theoretical developments progress at linear rate algorithmic developments proceed at logarithmic rate!! When dynamic decisions are interactive, among many competing persons in an economy the problems are lot more complex. The problems are quite tricky even at two person dynamic competitions. Here the Markov Decision Problems extend to the so called theory of stochastic games. Often one can appreciate the depth and complexity and the progress in the area of stochastic games can be appreciated only after realizing the difficulties one has to overcome even at the single decision maker’s level.

**Topics will be:**

- Linear Programming and linear economic models, duality and linear complementarity.
- Mixed strategies and Minimax theorem for matrix games, geometric intuitions.
- Prisoner's dilemma, battle of sexes and other models of bimatrix games, Nash equilibrium theore, and computational aspects.
- Economic models of Markov decision processes: Stochastic matrices and basic notions of stationary Markov chain. Secretary problem, investment problems, finite horizon Markov decision processes with discounted and undiscounted payoffs, Optimality equation.
- Constructive approaches to discounted and undiscounted Markov decision processes via linear programming, and policy improvement algorithms together with economic examples.
- Stochastic games and order field property: Models of tax evasion, dynamic advertising competitions and stochastic games with order field property.
- Value for stochastic games via contraction mapping, policy improvement approach to stochastic games of perfect information.
- Big match and other complexities in behavioral strategies, Nash equilibria for structured stochastic games.

**Registration**

The course is free of charge for:

1. DGPE members from AU, KU, CBS, SDU.

2. PhD students from Economics Departments at other Nordic universities.

The course fee is EUR400 for other participants.

Registration no later than 6 June 2016 via webshop.

**Contact**

Peter Sudhölter, Department of Business and Economics, University of Southern Denmark,psu@sam.sdu.dkpauls@sam.sdu.dk