Implement reinforcement learning techniques and algorithms with the help of real-world examples and recipes
Key Features
Book Description
Reinforcement learning (RL) is a branch of machine learning that has gained popularity in recent times. It allows you to train AI models that learn from their own actions and optimize their behavior. PyTorch has also emerged as the preferred tool for training RL models because of its efficiency and ease of use.
With this book, you'll explore the important RL concepts and the implementation of algorithms in PyTorch 1.x. The recipes in the book, along with real-world examples, will help you master various RL techniques, such as dynamic programming, Monte Carlo simulations, temporal difference, and Q-learning. You'll also gain insights into industry-specific applications of these techniques. Later chapters will guide you through solving problems such as the multi-armed bandit problem and the cartpole problem using the multi-armed bandit algorithm and function approximation. You'll also learn how to use Deep Q-Networks to complete Atari games, along with how to effectively implement policy gradients. Finally, you'll discover how RL techniques are applied to Blackjack, Gridworld environments, internet advertising, and the Flappy Bird game.
By the end of this book, you'll have developed the skills you need to implement popular RL algorithms and use RL techniques to solve real-world problems.
What you will learn
Who this book is for
Machine learning engineers, data scientists, and AI researchers looking for quick solutions to different reinforcement learning problems will find this book useful. Prior knowledge of machine learning concepts is required; experience with PyTorch is useful but not necessary.
Page count: 343
Year of publication: 2019
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Devika Battike
Content Development Editor: Athikho Sapuni Rishana
Senior Editor: Ayaan Hoda
Technical Editor: Utkarsha S. Kadam
Copy Editor: Safis Editing
Project Coordinator: Kirti Pisat
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Shraddha Falebhai
First published: October 2019
Production reference: 1311019
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-83855-196-4
www.packt.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Yuxi (Hayden) Liu is an experienced data scientist who's focused on developing machine learning and deep learning models and systems. He has worked in a variety of data-driven domains and has applied his expertise in reinforcement learning to computational problems. He is an education enthusiast and is the author of a series of machine learning books. His first book, Python Machine Learning By Example, was a #1 bestseller on Amazon India in 2017 and 2018. His other books include R Deep Learning Projects and Hands-On Deep Learning Architectures with Python, published by Packt. He has also published five first-authored IEEE transaction and conference papers during his master's research at the University of Toronto.
Greg Walters has been involved with computers and computer programming since 1972. He is extremely well versed in Visual Basic, Visual Basic .NET, Python, and SQL (using MySQL, SQLite, Microsoft SQL Server, and Oracle), as well as C++, Delphi, Modula-2, Pascal, C, 80x86 Assembler, COBOL, and Fortran. He is a programming trainer and has trained numerous people in many pieces of computer software, including MySQL, Open Database Connectivity, Quattro Pro, Corel Draw!, Paradox, Microsoft Word, Excel, DOS, Windows 3.11, Windows for Workgroups, Windows 95, Windows NT, Windows 2000, Windows XP, and Linux. He is currently retired and, in his spare time, is a musician and an avid cook, but he is also open to working as a freelancer on various projects.
Robert Moni is a PhD student at Budapest University of Technology and Economics (BME) and is also a Deep Learning Expert at Continental's Deep Learning Competence Center in Budapest. He also manages a cooperation project established between Continental and BME with the goal of supporting students in conducting research in the field of deep learning and autonomous driving. His research topic is deep reinforcement learning in complex environments, and his goal is to apply this technology to self-driving vehicles.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
PyTorch 1.x Reinforcement Learning Cookbook
About Packt
Why subscribe?
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Get in touch
Reviews
Getting Started with Reinforcement Learning and PyTorch
Setting up the working environment
How to do it...
How it works...
There's more... 
See also
Installing OpenAI Gym
How to do it...
How it works...
There's more...
See also
Simulating Atari environments
How to do it...
How it works...
There's more...
See also
Simulating the CartPole environment
How to do it...
How it works...
There's more...
Reviewing the fundamentals of PyTorch
How to do it...
There's more...
See also
Implementing and evaluating a random search policy
How to do it...
How it works...
There's more...
Developing the hill-climbing algorithm
How to do it...
How it works...
There's more...
See also
Developing a policy gradient algorithm
How to do it...
How it works...
There's more...
See also
Markov Decision Processes and Dynamic Programming
Technical requirements
Creating a Markov chain
How to do it...
How it works...
There's more...
See also
Creating an MDP
How to do it...
How it works...
There's more...
See also
Performing policy evaluation
How to do it...
How it works...
There's more...
Simulating the FrozenLake environment
Getting ready
How to do it...
How it works...
There's more...
Solving an MDP with a value iteration algorithm
How to do it...
How it works...
There's more...
Solving an MDP with a policy iteration algorithm
How to do it...
How it works...
There's more...
See also
Solving the coin-flipping gamble problem
How to do it...
How it works...
There's more...
Monte Carlo Methods for Making Numerical Estimations
Calculating Pi using the Monte Carlo method
How to do it...
How it works...
There's more...
See also
Performing Monte Carlo policy evaluation
How to do it...
How it works...
There's more...
Playing Blackjack with Monte Carlo prediction
How to do it...
How it works...
There's more...
See also
Performing on-policy Monte Carlo control
How to do it...
How it works...
There's more...
Developing MC control with epsilon-greedy policy
How to do it...
How it works...
Performing off-policy Monte Carlo control
How to do it...
How it works...
There's more...
See also
Developing MC control with weighted importance sampling
How to do it...
How it works...
There's more...
See also
Temporal Difference and Q-Learning
Setting up the Cliff Walking environment playground
Getting ready
How to do it...
How it works...
Developing the Q-learning algorithm
How to do it...
How it works...
There's more...
Setting up the Windy Gridworld environment playground
How to do it...
How it works...
Developing the SARSA algorithm
How to do it...
How it works...
There's more...
Solving the Taxi problem with Q-learning
Getting ready
How to do it...
How it works...
Solving the Taxi problem with SARSA
How to do it...
How it works...
There's more...
Developing the Double Q-learning algorithm
How to do it...
How it works...
See also
Solving Multi-armed Bandit Problems
Creating a multi-armed bandit environment
How to do it...
How it works...
Solving multi-armed bandit problems with the epsilon-greedy policy
How to do it...
How it works...
There's more...
Solving multi-armed bandit problems with the softmax exploration
How to do it...
How it works...
Solving multi-armed bandit problems with the upper confidence bound algorithm
How to do it...
How it works...
There's more...
See also
Solving internet advertising problems with a multi-armed bandit
How to do it...
How it works...
Solving multi-armed bandit problems with the Thompson sampling algorithm
How to do it...
How it works...
See also
Solving internet advertising problems with contextual bandits
How to do it...
How it works...
Scaling Up Learning with Function Approximation
Setting up the Mountain Car environment playground
Getting ready
How to do it...
How it works...
Estimating Q-functions with gradient descent approximation
How to do it...
How it works...
See also
Developing Q-learning with linear function approximation
How to do it...
How it works...
Developing SARSA with linear function approximation
How to do it...
How it works...
Incorporating batching using experience replay
How to do it...
How it works...
Developing Q-learning with neural network function approximation
How to do it...
How it works...
See also
Solving the CartPole problem with function approximation
How to do it...
How it works...
Deep Q-Networks in Action
Developing deep Q-networks
How to do it...
How it works...
See also
Improving DQNs with experience replay
How to do it...
How it works...
Developing double deep Q-Networks
How to do it...
How it works...
Tuning double DQN hyperparameters for CartPole
How to do it...
How it works...
Developing Dueling deep Q-Networks
How to do it...
How it works...
Applying Deep Q-Networks to Atari games
How to do it...
How it works...
Using convolutional neural networks for Atari games
How to do it...
How it works...
See also
Implementing Policy Gradients and Policy Optimization
Implementing the REINFORCE algorithm
How to do it...
How it works...
See also
Developing the REINFORCE algorithm with baseline
How to do it...
How it works...
Implementing the actor-critic algorithm
How to do it...
How it works...
Solving Cliff Walking with the actor-critic algorithm
How to do it...
How it works...
Setting up the continuous Mountain Car environment
How to do it...
How it works...
Solving the continuous Mountain Car environment with the advantage actor-critic network
How to do it...
How it works...
There's more...
See also
Playing CartPole through the cross-entropy method
How to do it...
How it works...
Capstone Project – Playing Flappy Bird with DQN
Setting up the game environment
Getting ready
How to do it...
How it works...
Building a Deep Q-Network to play Flappy Bird
How to do it...
How it works...
Training and tuning the network
How to do it...
How it works...
Deploying the model and playing the game
How to do it...
How it works...
Other Books You May Enjoy
Leave a review - let other readers know what you think
Interest in reinforcement learning has surged because it revolutionizes automation: an agent learns the optimal actions to take in an environment in order to maximize its cumulative reward.
PyTorch 1.x Reinforcement Learning Cookbook introduces you to important reinforcement learning concepts and implementations of algorithms in PyTorch. Each chapter of the book walks you through a different type of reinforcement learning method and its industry-adopted applications. With the help of recipes containing real-world examples, you will build your knowledge of and proficiency in reinforcement learning techniques such as dynamic programming, Monte Carlo methods, temporal difference and Q-learning, multi-armed bandits, function approximation, deep Q-networks, and policy gradients; these techniques are less obscure than you might think. Interesting and easy-to-follow examples, such as Atari games, Blackjack, Gridworld environments, internet advertising, Mountain Car, and Flappy Bird, will keep you engaged until you reach your goal.
By the end of this book, you will have mastered the implementation of popular reinforcement learning algorithms and learned the best practices of applying reinforcement learning techniques to solve other real-world problems.
Machine learning engineers, data scientists, and AI researchers looking for quick solutions to different problems in reinforcement learning will find this book useful. Prior exposure to machine learning concepts is required, while previous experience with PyTorch will be a bonus.
Chapter 1, Getting Started with Reinforcement Learning and PyTorch, is the starting point for readers who are looking forward to beginning this book's step-by-step guide to reinforcement learning with PyTorch. We will set up the working environment and OpenAI Gym and get familiar with reinforcement learning environments using the Atari and CartPole playgrounds. The chapter will also cover the implementation of several basic reinforcement learning algorithms, including random search, hill-climbing, and policy gradient. At the end, readers will also have a chance to review the essentials of PyTorch and get ready for the upcoming learning examples and projects.
Chapter 2, Markov Decision Process and Dynamic Programming, starts with the creation of a Markov chain and a Markov Decision Process, which is the core of most reinforcement learning algorithms. It will then move on to two approaches to solve a Markov Decision Process (MDP), value iteration and policy iteration. We will get more familiar with MDP and the Bellman equation by practicing policy evaluation. We will also demonstrate how to solve the interesting coin flipping gamble problem step by step. At the end, we will learn how to perform dynamic programming to scale up the learning.
Chapter 3, Monte Carlo Methods for Making Numerical Estimations, is focused on Monte Carlo methods. We will start by estimating the value of pi with Monte Carlo. Moving on, we will learn how to use the Monte Carlo method to predict state values and state-action values. We will demonstrate training an agent to win at Blackjack using Monte Carlo. Also, we will explore on-policy, first-visit Monte Carlo control and off-policy Monte Carlo control by developing various algorithms. Monte Carlo Control with an epsilon-greedy policy and weighted importance sampling will also be covered.
Chapter 4, Temporal Difference and Q-Learning, starts by setting up the CliffWalking and Windy Gridworld environment playground, which will be used in temporal difference and Q-Learning. Through our step-by-step guide, readers will explore Temporal Difference for prediction, and will gain practical experience with Q-Learning for off-policy control, and SARSA for on-policy control. We will also work on an interesting project, the taxi problem, and demonstrate how to solve it using the Q-Learning and SARSA algorithms. Finally, we will cover the Double Q-learning algorithm as a bonus section.
Chapter 5, Solving Multi-Armed Bandit Problems, covers the multi-armed bandit algorithm, which is probably one of the most popular algorithms in reinforcement learning. This will start with the creation of a multi-armed bandit problem. We will see how to solve the multi-armed bandit problem using four strategies, these being the epsilon-greedy policy, softmax exploration, the upper confidence bound algorithm, and the Thompson sampling algorithm. We will also work on a billion-dollar problem, online advertising, and demonstrate how to solve it using the multi-armed bandit algorithm. Finally, we will develop a more complex algorithm, the contextual bandit algorithm, and use it to optimize display advertising.
Chapter 6, Scaling Up Learning with Function Approximation, is focused on function approximation and will start with setting up the Mountain Car environment playground. Through our step-by-step guide, we will cover the motivation for function approximation over Table Lookup, and gain experience in incorporating function approximation into existing algorithms such as Q-Learning and SARSA. We will also cover an advanced technique, batching using experience replay. Finally, we will cover how to solve the CartPole problem using what we have learned in the chapter as a whole.
Chapter 7, Deep Q-Networks in Action, covers Deep Q-Learning, or Deep Q Network (DQN), which is considered the most modern reinforcement learning technique. We will develop a DQN model step by step and understand the importance of Experience Replay and a target network in making Deep Q-Learning work in practice. To help readers solve Atari games, we will demonstrate how to incorporate convolutional neural networks into DQNs. We will also cover two DQN variants, Double DQNs and Dueling DQNs. We will cover how to fine-tune a Q-Learning algorithm using Double DQNs as an example.
Chapter 8, Implementing Policy Gradients and Policy Optimization, focuses on policy gradients and optimization and starts by implementing the REINFORCE algorithm. We will then develop the REINFORCE algorithm with the baseline for CliffWalking. We will also implement the actor-critic algorithm and apply it to solve the CliffWalking problem. To scale up the deterministic policy gradient algorithm, we apply tricks from DQN and develop the Deep Deterministic Policy Gradients. As a bit of fun, we train an agent based on the cross-entropy method to play the CartPole game. Finally, we will talk about how to scale up policy gradient methods using the asynchronous actor-critic method and neural networks.
Chapter 9, Capstone Project – Playing Flappy Bird with DQN, takes us through a capstone project – playing Flappy Bird using reinforcement learning. We will apply what we have learned throughout this book to build an intelligent bot. We will focus on building a DQN, fine-tuning model parameters, and deploying the model. Let's see how long the bird can fly in the air.
Data scientists, machine learning engineers, and AI researchers looking for quick solutions to different problems in reinforcement learning will find this book useful. Prior exposure to machine learning concepts is required, while previous experience with PyTorch is not required but will be a bonus.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at www.packt.com.
Select the Support tab.
Click on Code Downloads.
Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/PyTorch-1.x-Reinforcement-Learning-Cookbook. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838551964_ColorImages.pdf.
In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).
To give clear instructions on how to complete a recipe, use these sections as follows:
This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.
This section contains the steps required to follow the recipe.
This section usually consists of a detailed explanation of what happened in the previous section.
This section consists of additional information about the recipe in order to make you more knowledgeable about the recipe.
This section provides helpful links to other useful information for the recipe.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
We kick off our journey of practical reinforcement learning and PyTorch with the basic, yet important, reinforcement learning algorithms, including random search, hill climbing, and policy gradient. We will start by setting up the working environment and OpenAI Gym, and you will become familiar with reinforcement learning environments through the Atari and CartPole playgrounds. We will also demonstrate how to develop algorithms to solve the CartPole problem step by step. Also, we will review the essentials of PyTorch and prepare for the upcoming learning examples and projects.
This chapter contains the following recipes:
Setting up the working environment
Installing OpenAI Gym
Simulating Atari environments
Simulating the CartPole environment
Reviewing the fundamentals of PyTorch
Implementing and evaluating a random search policy
Developing the hill-climbing algorithm
Developing a policy gradient algorithm
Let's get started with setting up the working environment, including the correct versions of Python and Anaconda, and PyTorch as the main framework that is used throughout the book.
Python is the language we use to implement all reinforcement learning algorithms and techniques throughout the book. In this book, we will be using Python 3, or more specifically, 3.6 or above. If you are a Python 2 user, now is the best time for you to switch to Python 3, as Python 2 will no longer be supported after 2020. The transition is very smooth, though, so don't panic.
Anaconda is an open source Python distribution (www.anaconda.com/distribution/) for data science and machine learning. We will be using Anaconda's package manager, conda, to install Python packages, along with pip.
PyTorch (https://pytorch.org/), primarily developed by the Facebook AI Research (FAIR) group, is a trendy machine learning library based on Torch (http://torch.ch/). PyTorch tensors take the place of NumPy's ndarrays, providing more flexibility and compatibility with GPUs. Thanks to its powerful computational graphs and its simple, friendly interface, the PyTorch community is expanding on a daily basis, and the library has seen heavy adoption by more and more tech giants.
Let's see how to properly set up all of these components.
We have just created a tensor of size 3 x 4 in PyTorch. It is an empty matrix. Being empty doesn't mean that all of its elements are zero or null; rather, they are a bunch of meaningless floating-point values that serve as placeholders. Users are expected to set the values later. This is very similar to NumPy's empty array.
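For instance, the following minimal snippet (assuming PyTorch is installed) contrasts an uninitialized tensor with an explicitly zero-initialized one:

```python
import torch

# torch.empty allocates memory without initializing it: the values are
# whatever happened to be in that memory, serving only as placeholders.
x = torch.empty(3, 4)
print(x.shape)  # torch.Size([3, 4])

# Contrast with a constructor that does initialize its elements:
zeros = torch.zeros(3, 4)
print(zeros.sum().item())  # 0.0
```

Use torch.zeros, torch.ones, or torch.rand instead of torch.empty whenever the initial values actually matter.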
Some of you may question the necessity of installing Anaconda and using conda to manage packages since it is easy to install packages with pip. In fact, conda is a better packaging tool than pip. We mainly use conda for the following four reasons:
It handles library dependencies nicely: Installing a package with conda will automatically download all of its dependencies. However, doing so with pip will lead to a warning, and installation will be aborted.
It solves conflicts of packages gracefully: If installing a package requires another package of a specific version (let's say 2.3 or after, for example), conda will update the version of the other package automatically.
It creates a virtual environment easily: A virtual environment is a self-contained package directory tree. Different applications or projects can use different virtual environments. All virtual environments are isolated from each other. It is recommended to use virtual environments so that whatever we do for one application doesn't affect our system environment or any other environment.
It is also compatible with pip: We can still use pip in conda with the following command:
conda install pip
If you are interested in learning more about conda, feel free to check out the following resources:
Conda user guide: https://conda.io/projects/conda/en/latest/user-guide/index.html
Creating and managing virtual environments with conda: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
If you want to get more familiar with PyTorch, you can go through the Getting Started section in the official tutorial at https://pytorch.org/tutorials/#getting-started. We recommend you at least finish the following:
What is PyTorch: https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py
Learning PyTorch with examples: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html
After setting up the working environment, we can now install OpenAI Gym. You can't work on reinforcement learning without using OpenAI Gym, which gives you a variety of environments in which to develop your learning algorithms.
OpenAI (https://openai.com/) is a non-profit research company that is focused on building safe artificial general intelligence (AGI) and ensuring that it benefits humans. OpenAI Gym is a powerful and open source toolkit for developing and comparing reinforcement learning algorithms. It provides an interface to a variety of reinforcement learning simulations and tasks, from walking to moon landing, and from car racing to playing Atari games. See https://gym.openai.com/envs/ for the full list of environments. We can write agents to interact with OpenAI Gym environments using any numerical computation library, such as PyTorch, TensorFlow, or Keras.
There are two ways to install Gym. The first one is to use pip, as follows:
pip install gym
For conda users, remember to install pip first in conda using the following command before installing Gym using pip:
conda install pip
This is because Gym is not officially available in conda as of early 2019.
Another approach is to build from source:
First, clone the package directly from its Git repository:
git clone https://github.com/openai/gym
Go to the downloaded folder and install Gym from there:
cd gym
pip install -e .
And now you are good to go. Feel free to play around with gym.
You can also check the available gym environments by typing the following lines of code:
>>> from gym import envs
>>> print(envs.registry.all())
dict_values([EnvSpec(Copy-v0), EnvSpec(RepeatCopy-v0), EnvSpec(ReversedAddition-v0), EnvSpec(ReversedAddition3-v0), EnvSpec(DuplicatedInput-v0), EnvSpec(Reverse-v0), EnvSpec(CartPole-v0), EnvSpec(CartPole-v1), EnvSpec(MountainCar-v0), EnvSpec(MountainCarContinuous-v0), EnvSpec(Pendulum-v0), EnvSpec(Acrobot-v1), EnvSpec(LunarLander-v2), EnvSpec(LunarLanderContinuous-v2), EnvSpec(BipedalWalker-v2), EnvSpec(BipedalWalkerHardcore-v2), EnvSpec(CarRacing-v0), EnvSpec(Blackjack-v0)
...
...
This will give you a long list of environments if you installed Gym properly. We will play around with some of them in the next recipe, Simulating Atari environments.
Compared to the simple pip approach for installing Gym, the second approach provides more flexibility if you want to add new environments and modify Gym itself.
You may wonder why we need to test reinforcement learning algorithms on Gym's environments since the actual environments we work in can be a lot different. You will recall that reinforcement learning doesn't make many assumptions about the environment, but it gets to know more about the environment by interacting with it. Also, when comparing the performance of different algorithms, we need to apply them to standardized environments. Gym is a perfect benchmark, covering many versatile and easy-to-use environments. This is similar to the datasets that we often use as benchmarks in supervised and unsupervised learning, such as MNIST, Imagenet, MovieLens, and Thomson Reuters News.
Take a look at the official Gym documentation at https://gym.openai.com/docs/.
To get started with Gym, let's play some Atari games with it.
The Atari environments (https://gym.openai.com/envs/#atari) are a variety of Atari 2600 video games, such as Alien, AirRaid, Pong, and Space Race. If you have ever played Atari games, this recipe should be fun for you, as you will play an Atari game, Space Invaders. However, an agent will act on your behalf.
Using Gym, we can easily create an environment instance by calling the make() method with the name of the environment as the parameter.
As you may have noticed, the actions that the agent performs are randomly chosen using the sample() method.
Note that, normally, we would have a more sophisticated agent guided by reinforcement learning algorithms. Here, we just demonstrated how to simulate an environment, and how an agent takes actions regardless of the outcome.
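Putting these pieces together, a typical interaction loop looks like the following sketch. To keep it runnable without Gym or Atari ROMs installed, a tiny stub class stands in for a real environment created with make(); its observations, rewards, and dynamics are invented purely for illustration, but its reset()/step() interface mirrors Gym's:

```python
import random

class StubEnv:
    """Minimal stand-in for a Gym environment, mirroring its API."""
    def __init__(self, n_actions=6, max_steps=10):
        self.n_actions = n_actions
        self.max_steps = max_steps

    def reset(self):
        self.t = 0
        return 0  # initial observation (placeholder)

    def sample_action(self):
        # Plays the role of env.action_space.sample()
        return random.randrange(self.n_actions)

    def step(self, action):
        self.t += 1
        obs, reward = self.t, 1.0          # invented observation/reward
        is_done = self.t >= self.max_steps
        return obs, reward, is_done, {}    # Gym's (obs, reward, done, info)

env = StubEnv()
obs = env.reset()
total_reward, is_done = 0.0, False
while not is_done:
    action = env.sample_action()           # random action, ignoring outcomes
    obs, reward, is_done, info = env.step(action)
    total_reward += reward
print(total_reward)  # 10.0
```

With real Gym, the only changes are env = gym.make('SpaceInvaders-v0'), env.action_space.sample() for the action, and an env.render() call inside the loop.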
Run this a few times and see what we get:
>>> env.action_space.sample()
0
>>> env.action_space.sample()
3
>>> env.action_space.sample()
0
>>> env.action_space.sample()
4
>>> env.action_space.sample()
2
>>> env.action_space.sample()
1
>>> env.action_space.sample()
4
>>> env.action_space.sample()
5
>>> env.action_space.sample()
1
>>> env.action_space.sample()
0
There are six possible actions in total. We can also see this by running the following command:
>>> env.action_space
Discrete(6)
Actions from 0 to 5 stand for No Operation, Fire, Up, Right, Left, and Down, respectively, which are all the moves the spaceship in the game can do.
The step() method will let the agent take the action that is specified as its parameter. The render() method will update the display window based on the latest observation of the environment.
The observation of the environment, new_state, is represented by a 210 x 160 x 3 matrix, as follows:
>>> print(new_state.shape)
(210, 160, 3)
This means that each frame of the display screen is an RGB image of size 210 x 160.
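As a quick sanity check, we can compute the raw memory footprint of one such frame, assuming one byte per channel (as in Gym's uint8 image observations):

```python
height, width, channels = 210, 160, 3

# One uint8 value per channel gives the raw size of a single frame:
bytes_per_frame = height * width * channels
print(bytes_per_frame)       # 100800, i.e. roughly 100 KB per frame

# At 60 frames per second, that is about 6 MB of raw pixels per second:
print(bytes_per_frame * 60)  # 6048000
```

This is why later chapters downsample and stack frames before feeding them to a network rather than using raw observations directly.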
If you are looking to simulate an environment but are not sure of the name you should use in the make() method, you can find it in the table of environments at https://github.com/openai/gym/wiki/Table-of-environments. Besides the name used to call an environment, the table also shows the size of the observation matrix and the number of possible actions. Have fun playing around with the environments.
In this recipe, we will work on simulating one more environment in order to get more familiar with Gym. The CartPole environment is a classic one in reinforcement learning research.
CartPole is a traditional reinforcement learning task in which a pole is placed upright on top of a cart. The agent moves the cart either to the left or to the right by 1 unit in a timestep. The goal is to balance the pole and prevent it from falling over. The pole is considered to have fallen if it is more than 12 degrees from the vertical, or the cart moves 2.4 units away from the origin. An episode terminates when any of the following occurs:
The pole falls over
The number of timesteps reaches 200
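The termination conditions above can be sketched as a small helper function; the thresholds (12 degrees, 2.4 units, 200 timesteps) are taken directly from the description above:

```python
def episode_done(pole_angle_deg, cart_position, timestep):
    """Return True if any of the CartPole termination conditions holds."""
    return (abs(pole_angle_deg) > 12.0   # pole has fallen over
            or abs(cart_position) > 2.4  # cart moved too far from the origin
            or timestep >= 200)          # episode length cap

print(episode_done(5.0, 1.0, 50))    # False: still balanced and in bounds
print(episode_done(13.0, 0.0, 10))   # True: pole fell
print(episode_done(0.0, 2.5, 10))    # True: cart out of bounds
print(episode_done(0.0, 0.0, 200))   # True: time limit reached
```

In the actual environment, Gym performs this check internally and reports the result through the done flag returned by step().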
