Control of a Markov Chain with Unknown Dynamics and Cost Structure
S. Lakshmivarahan
Additional contact information
S. Lakshmivarahan: University of Oklahoma, School of Electrical Engineering and Computer Science
Chapter 8 in Learning Algorithms Theory and Applications, 1981, pp 228-256 from Springer
Abstract:
This chapter applies the “absolutely expedient” learning algorithms (developed in Chapter 3) to the problem of controlling a finite-state Markov chain whose transition probabilities, as a function of a finite number of control actions, are unknown. At each instant of time, a reward is received that depends on the state of the Markov chain and the control action chosen. This reward is assumed to be a two-valued (binary) random variable whose distribution, as a function of the state and the control action, is unknown, while the sequence of states actually visited by the Markov chain is available. In other words, we consider a Markov chain whose dynamics and reward structure are unknown but whose state is exactly observable.
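The setting described in the abstract can be illustrated with a minimal sketch. It assumes one learning automaton per state and uses the linear reward-inaction update, a standard absolutely expedient scheme; the abstract does not specify the chapter's exact algorithm or performance criterion, so this is an illustration under those assumptions, not the author's method. The names `lri_update`, `simulate`, `transitions`, and `reward_probs` are hypothetical, and the update uses only the immediate binary reward.

```python
import random


def lri_update(probs, action, reward, step=0.05):
    """Linear reward-inaction update (assumed scheme, not taken from the chapter).

    On reward (reward == 1) probability mass moves toward `action`;
    on no reward the action probabilities are left unchanged.
    """
    if reward != 1:
        return list(probs)
    updated = [(1.0 - step) * p for p in probs]
    updated[action] += step  # chosen action: p + step * (1 - p)
    return updated


def simulate(transitions, reward_probs, steps=5000, step=0.05, seed=0):
    """Run one automaton per state on a chain the learner cannot see inside.

    transitions[s][a] is the hidden next-state distribution and
    reward_probs[s][a] the hidden probability of a binary reward.
    The learner observes only the visited state and the reward.
    """
    rng = random.Random(seed)
    n_states = len(transitions)
    n_actions = len(transitions[0])
    # Start every state's automaton from the uniform action distribution.
    policy = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(steps):
        action = rng.choices(range(n_actions), weights=policy[state])[0]
        reward = 1 if rng.random() < reward_probs[state][action] else 0
        policy[state] = lri_update(policy[state], action, reward, step)
        state = rng.choices(range(n_states), weights=transitions[state][action])[0]
    return policy


if __name__ == "__main__":
    # Hypothetical two-state, two-action example; all numbers are made up.
    transitions = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]]
    reward_probs = [[0.8, 0.2], [0.3, 0.9]]
    for s, row in enumerate(simulate(transitions, reward_probs)):
        print(s, [round(p, 3) for p in row])
```

Each state's action probabilities remain a valid distribution after every update, because the reward step scales all entries by `1 - step` and then adds `step` to the chosen action.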
Keywords: Markov Chain; Control Action; Markov Chain Model; Reward Structure; Finite Markov Chain
Date: 1981
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-1-4612-5975-6_8
Ordering information: This item can be ordered from
http://www.springer.com/9781461259756
DOI: 10.1007/978-1-4612-5975-6_8