CAN YOU BEAT THE ODDS? A REINFORCEMENT LEARNING APPROACH TO OPTIMAL POLICY SEARCH IN BLACKJACK
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: Blackjack is a popular casino card game in which players compete against the dealer to get as close to 21 points as possible without going over. Given the stakes involved, the question naturally arises of whether there exists an optimal strategy for winning consistently. Finding such a strategy, however, can be fairly challenging and requires sophisticated methods. We present a reinforcement learning approach to search for the optimal policy and determine whether human players can beat the odds by implementing such a strategy. In reinforcement learning, an agent interacts with an environment by taking actions and receiving rewards that indicate the value of those actions. A specific type of reinforcement learning, known as Q-learning, keeps track of state-action values that are updated during gameplay. We compare the strategy learned by the Q-learning algorithm with an existing strategy for Blackjack. Additionally, we explore how tuning hyperparameters and including more information about the state of the game affect win-rate performance. No combination of state-space representation and hyperparameters, however, reaches the performance of the existing strategy. Even the existing strategy achieves a win rate of only about 43%, with a negative cumulative average reward over time, indicating that the odds of winning consistently are stacked against the player.
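To illustrate the Q-learning approach the abstract describes, the sketch below trains a tabular Q-table on a heavily simplified Blackjack (aces always count as 1, no splitting or doubling, only hit/stand). The environment rules, state representation (player total, dealer upcard), and hyperparameter values here are illustrative assumptions, not the thesis's actual configuration.

```python
import random
from collections import defaultdict

# Simplified deck: ace counted as 1, face cards as 10 (an assumption
# made to avoid soft-hand logic; the thesis's environment may differ).
CARDS = list(range(1, 11)) + [10, 10, 10]

def dealer_play(total, rng):
    # Dealer hits until reaching at least 17 (standard house rule).
    while total < 17:
        total += rng.choice(CARDS)
    return total

def play_episode(Q, rng, alpha=0.1, gamma=1.0, epsilon=0.1):
    """Play one hand, applying the Q-learning update after each action."""
    player = rng.choice(CARDS) + rng.choice(CARDS)
    upcard = rng.choice(CARDS)
    while True:
        state = (player, upcard)
        # Epsilon-greedy action selection over {0: stand, 1: hit}.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if Q[state][0] >= Q[state][1] else 1
        if action == 1:  # hit
            player += rng.choice(CARDS)
            reward, done = (-1.0, True) if player > 21 else (0.0, False)
        else:  # stand: dealer resolves, hand ends
            dealer = dealer_play(upcard + rng.choice(CARDS), rng)
            if dealer > 21 or player > dealer:
                reward = 1.0
            elif player == dealer:
                reward = 0.0
            else:
                reward = -1.0
            done = True
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        next_state = (player, upcard)
        target = reward if done else reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        if done:
            return reward

rng = random.Random(0)
Q = defaultdict(lambda: [0.0, 0.0])
results = [play_episode(Q, rng) for _ in range(50_000)]
win_rate = sum(r > 0 for r in results) / len(results)
```

Even in this toy setting, the learned greedy policy's win rate stays below 50%, consistent with the abstract's conclusion that the house retains an edge.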
Degree Program: Statistics and Data Science