Deriving the Optimal Strategy for the Two Dice Pig Game via Reinforcement Learning
Tian Zhu and
Merry H. Ma
Additional contact information
Tian Zhu: Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, NY 11794, USA
Merry H. Ma: Stony Brook School, 1 Chapman Pkwy, Stony Brook, NY 11790, USA
Stats, 2022, vol. 5, issue 3, 1-14
Abstract:
Games of chance have historically played a critical role in the development and teaching of probability theory and game theory and, in the modern age, of computer programming and reinforcement learning. In this paper, we derive the optimal strategy for playing the two-dice game Pig, both the standard version and its variant with doubles, coined “Double-Trouble”, using fundamental concepts of reinforcement learning, especially the Markov decision process and dynamic programming. We further compare the newly derived optimal strategy to other popular play strategies in terms of winning chances and the order of play. In particular, we compare it to the popular “hold at n” strategy, which is considered close to the optimal strategy, especially for the best n, for each type of Pig game. For the standard two-player, two-dice, sequential Pig game examined here, we found that “hold at 23” is the best choice, with an average winning chance against the optimal strategy of 0.4747. For the “Double-Trouble” version, we found that “hold at 18” is the best choice, with an average winning chance against the optimal strategy of 0.4733. Furthermore, the duration of each type of game, measured in turns, is examined for practical purposes. For optimal vs. optimal play, or optimal vs. the best “hold at n” strategy, we found that the average number of turns is 19, 23, and 24 for the one-die Pig, standard two-dice Pig, and “Double-Trouble” two-dice Pig games, respectively. We hope our work will inspire students of all ages to invest in the field of reinforcement learning, which is crucial for the development of artificial intelligence and robotics and, subsequently, for the future of humanity.
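To make the rules and the “hold at n” policy concrete, here is a minimal Monte Carlo sketch. It is not the paper's dynamic-programming solver; it assumes the commonly used standard two-dice Pig rules (a roll containing a single 1 forfeits the turn total; double 1s forfeit the entire banked score) and a target of 100 points, and the function names are illustrative only.

```python
import random

TARGET = 100  # first player to bank 100 points wins


def play_turn(score, hold_at, rng):
    """One turn of standard two-dice Pig under a 'hold at n' policy:
    keep rolling until the turn total reaches hold_at (or would win),
    then hold. Returns the net change to the player's banked score."""
    turn_total = 0
    while True:
        d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
        if d1 == 1 and d2 == 1:
            return -score      # double 1s: lose the entire banked score
        if d1 == 1 or d2 == 1:
            return 0           # single 1: turn ends, turn total is lost
        turn_total += d1 + d2
        if score + turn_total >= TARGET or turn_total >= hold_at:
            return turn_total  # hold and bank the turn total


def play_game(hold_a, hold_b, rng):
    """Sequential two-player game; player 0 moves first.
    Returns the index (0 or 1) of the winner."""
    scores, holds = [0, 0], [hold_a, hold_b]
    player = 0
    while True:
        scores[player] += play_turn(scores[player], holds[player], rng)
        if scores[player] >= TARGET:
            return player
        player = 1 - player


def win_rate(hold_a, hold_b, games=20000, seed=0):
    """Estimate the first player's winning chance by simulation."""
    rng = random.Random(seed)
    wins = sum(play_game(hold_a, hold_b, rng) == 0 for _ in range(games))
    return wins / games
```

For example, `win_rate(23, 23)` estimates the first-mover advantage when both players use the paper's best fixed threshold, and sweeping `hold_a` over a range of thresholds against a fixed opponent gives an empirical check on which n performs best.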
Keywords: dynamic programming; game theory; Markov decision process; optimization; two-dice pig game; value iteration
JEL-codes: C1 C10 C11 C14 C15 C16
Date: 2022
Citations: 1
Downloads: (external link)
https://www.mdpi.com/2571-905X/5/3/47/pdf (application/pdf)
https://www.mdpi.com/2571-905X/5/3/47/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jstats:v:5:y:2022:i:3:p:47-818:d:890803
Stats is currently edited by Mrs. Minnie Li