Learning to play Sokoban from videos
Nicolai Benjamin Fricker,
Nicolai Krüger and
Constantin Schubart
IU Discussion Papers - IT & Engineering from IU International University of Applied Sciences
Abstract:
To learn a task through behavior cloning, a dataset consisting of state-action pairs is needed. However, this kind of data is often not available in sufficient quantity or quality. Consequently, several publications have addressed the issue of extracting actions from a sequence of states in order to convert them into corresponding state-action pairs (Torabi et al., 2018; Edwards et al., 2019; Baker et al., 2022; Bruce et al., 2024). Using such a dataset, an agent can then be trained via behavior cloning. For instance, this approach was applied to games such as Cartpole and Mountain Car (Edwards et al., 2019). Additionally, actions were extracted from videos of Minecraft (Baker et al., 2022) and jump 'n' run (platformer) games (Edwards et al., 2019; Bruce et al., 2024) to train deep neural network models to play these games. In this work, videos from YouTube as well as synthetic videos of the game Sokoban were analyzed. Sokoban is a single-player, turn-based game in which the player must push boxes onto target squares (Murase et al., 1996). The actions the player performs in the videos were extracted using a modified version of the training procedure described by Edwards et al. (2019). The resulting state-action pairs were used to train deep neural network models to play Sokoban. These models were further improved with reinforcement learning combined with a Monte Carlo tree search as a planning step. The resulting agent demonstrated moderate playing strength. In addition to learning how to solve a Sokoban puzzle, the rules of Sokoban were learned from videos. This enabled the creation of a Sokoban simulator, which was used to carry out model-based reinforcement learning. This work serves as a proof of concept, demonstrating that it is possible to extract actions from videos of a strategy game, perform behavior cloning, infer the rules of the game, and perform model-based reinforcement learning - all without direct interaction with the game environment.
Code and models are available at https://github.com/loanMaster/sokoban_learning.
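The behavior-cloning step described in the abstract can be illustrated with a minimal sketch. The data format, the linear softmax policy, and the action encoding (0-3 for up/down/left/right over flattened grid states) are assumptions for illustration; the paper itself trains deep neural network models on state-action pairs extracted from videos.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_policy(states, actions, n_actions=4, lr=0.5, epochs=200):
    """Behavior cloning: fit a softmax policy to demonstrated
    state-action pairs by minimizing cross-entropy."""
    n, d = states.shape
    W = np.zeros((d, n_actions))
    onehot = np.eye(n_actions)[actions]
    for _ in range(epochs):
        probs = softmax(states @ W)
        grad = states.T @ (probs - onehot) / n  # cross-entropy gradient
        W -= lr * grad
    return W

def act(W, state):
    """Greedy action from the cloned policy."""
    return int(np.argmax(state @ W))

# Tiny synthetic dataset standing in for video-derived pairs:
# two distinguishable states, each with one demonstrated action.
states = np.array([[1.0, 0.0], [0.0, 1.0]])
actions = np.array([0, 1])
W = train_policy(states, actions)
```

After training, `act(W, states[0])` reproduces the demonstrated action for that state. In the paper, this cloned policy is then refined with reinforcement learning and Monte Carlo tree search.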
Keywords: Imitation learning; behavior cloning; deep neural network models; reinforcement learning
Date: 2024
New Economics Papers: this item is included in nep-big
Downloads: https://www.econstor.eu/bitstream/10419/303523/1/1904169317.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:zbw:iubhit:303523
Bibliographic data for series maintained by ZBW - Leibniz Information Centre for Economics.