PyTorch 1.x Reinforcement Learning Cookbook

Yuxi (Hayden) Liu

Description

Implement reinforcement learning techniques and algorithms with the help of real-world examples and recipes




Key Features



  • Use PyTorch 1.x to design and build self-learning artificial intelligence (AI) models


  • Implement RL algorithms to solve control and optimization challenges faced by data scientists today


  • Apply modern RL libraries to simulate a controlled environment for your projects



Book Description



Reinforcement learning (RL) is a branch of machine learning that has gained popularity in recent times. It allows you to train AI models that learn from their own actions and optimize their behavior. PyTorch has also emerged as the preferred tool for training RL models because of its efficiency and ease of use.






With this book, you'll explore the important RL concepts and the implementation of algorithms in PyTorch 1.x. The recipes in the book, along with real-world examples, will help you master various RL techniques, such as dynamic programming, Monte Carlo simulations, temporal difference, and Q-learning. You'll also gain insights into industry-specific applications of these techniques. Later chapters will guide you through solving the multi-armed bandit problem with bandit algorithms and the CartPole problem with function approximation. You'll also learn how to use Deep Q-Networks to complete Atari games, along with how to effectively implement policy gradients. Finally, you'll discover how RL techniques are applied to Blackjack, Gridworld environments, internet advertising, and the Flappy Bird game.






By the end of this book, you'll have developed the skills you need to implement popular RL algorithms and use RL techniques to solve real-world problems.




What you will learn



  • Use Q-learning and the state–action–reward–state–action (SARSA) algorithm to solve various Gridworld problems


  • Develop a multi-armed bandit algorithm to optimize display advertising


  • Scale up learning and control processes using Deep Q-Networks


  • Simulate Markov Decision Processes, OpenAI Gym environments, and other common control problems


  • Select and build RL models, evaluate their performance, and optimize and deploy them


  • Use policy gradient methods to solve continuous RL problems



Who this book is for



Machine learning engineers, data scientists, and AI researchers looking for quick solutions to different reinforcement learning problems will find this book useful. Prior knowledge of machine learning concepts is required; experience with PyTorch is useful but not necessary.




PyTorch 1.x Reinforcement Learning Cookbook

Over 60 recipes to design, develop, and deploy self-learning AI models using Python


Yuxi (Hayden) Liu


BIRMINGHAM - MUMBAI

PyTorch 1.x Reinforcement Learning Cookbook

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

 

Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Devika Battike
Content Development Editor: Athikho Sapuni Rishana
Senior Editor: Ayaan Hoda
Technical Editor: Utkarsha S. Kadam
Copy Editor: Safis Editing
Project Coordinator: Kirti Pisat
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Shraddha Falebhai

First published: October 2019

Production reference: 1311019

Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK

ISBN 978-1-83855-196-4

www.packt.com

 

Packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

  • Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals

  • Improve your learning with Skill Plans built especially for you

  • Get a free eBook or video every month

  • Fully searchable for easy access to vital information

  • Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. 

Contributors

About the author

Yuxi (Hayden) Liu is an experienced data scientist who's focused on developing machine learning and deep learning models and systems. He has worked in a variety of data-driven domains and has applied his expertise in reinforcement learning to computational problems. He is an education enthusiast and the author of a series of machine learning books. His first book, Python Machine Learning By Example, was a #1 bestseller on Amazon India in 2017 and 2018. His other books include R Deep Learning Projects and Hands-On Deep Learning Architectures with Python, published by Packt. He has also published five first-authored IEEE Transactions and conference papers during his master's research at the University of Toronto.

 

About the reviewers

Greg Walters has been involved with computers and computer programming since 1972. He is well versed in Visual Basic, Visual Basic .NET, Python, and SQL (using MySQL, SQLite, Microsoft SQL Server, and Oracle), as well as C++, Delphi, Modula-2, Pascal, C, 80x86 Assembler, COBOL, and Fortran. He is a programming trainer and has trained numerous people in many pieces of computer software, including MySQL, Open Database Connectivity, Quattro Pro, Corel Draw!, Paradox, Microsoft Word, Excel, DOS, Windows 3.11, Windows for Workgroups, Windows 95, Windows NT, Windows 2000, Windows XP, and Linux. He is currently retired and, in his spare time, is a musician and an avid cook, but he is also open to working as a freelancer on various projects.

 

Robert Moni is a PhD student at Budapest University of Technology and Economics (BME) and is also a Deep Learning Expert at Continental's Deep Learning Competence Center in Budapest. He also manages a cooperation project established between Continental and BME with the goal of supporting students in conducting research in the field of deep learning and autonomous driving. His research topic is deep reinforcement learning in complex environments, and his goal is to apply this technology to self-driving vehicles.


Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

PyTorch 1.x Reinforcement Learning Cookbook

About Packt

Why subscribe?

Contributors

About the author

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Get in touch

Reviews

Getting Started with Reinforcement Learning and PyTorch

Setting up the working environment

How to do it...

How it works...

There's more... 

See also

Installing OpenAI Gym

How to do it...

How it works...

There's more...

See also

Simulating Atari environments

How to do it...

How it works...

There's more...

See also

Simulating the CartPole environment

How to do it...

How it works...

There's more...

Reviewing the fundamentals of PyTorch

How to do it...

There's more...

See also

Implementing and evaluating a random search policy

How to do it...

How it works...

There's more...

Developing the hill-climbing algorithm

How to do it...

How it works...

There's more...

See also

Developing a policy gradient algorithm

How to do it...

How it works...

There's more...

See also

Markov Decision Processes and Dynamic Programming

Technical requirements

Creating a Markov chain

How to do it...

How it works...

There's more...

See also

Creating an MDP

How to do it...

How it works...

There's more...

See also

Performing policy evaluation

How to do it...

How it works...

There's more...

Simulating the FrozenLake environment

Getting ready

How to do it...

How it works...

There's more...

Solving an MDP with a value iteration algorithm

How to do it...

How it works...

There's more...

Solving an MDP with a policy iteration algorithm

How to do it...

How it works...

There's more...

See also

Solving the coin-flipping gamble problem

How to do it...

How it works...

There's more...

Monte Carlo Methods for Making Numerical Estimations

Calculating Pi using the Monte Carlo method

How to do it...

How it works...

There's more...

See also

Performing Monte Carlo policy evaluation

How to do it...

How it works...

There's more...

Playing Blackjack with Monte Carlo prediction

How to do it...

How it works...

There's more...

See also

Performing on-policy Monte Carlo control

How to do it...

How it works...

There's more...

Developing MC control with epsilon-greedy policy

How to do it...

How it works...

Performing off-policy Monte Carlo control

How to do it...

How it works...

There's more...

See also

Developing MC control with weighted importance sampling

How to do it...

How it works...

There's more...

See also

Temporal Difference and Q-Learning

Setting up the Cliff Walking environment playground

Getting ready

How to do it...

How it works...

Developing the Q-learning algorithm

How to do it...

How it works...

There's more...

Setting up the Windy Gridworld environment playground

How to do it...

How it works...

Developing the SARSA algorithm

How to do it...

How it works...

There's more...

Solving the Taxi problem with Q-learning

Getting ready

How to do it...

How it works...

Solving the Taxi problem with SARSA

How to do it...

How it works...

There's more...

Developing the Double Q-learning algorithm

How to do it...

How it works...

See also

Solving Multi-armed Bandit Problems

Creating a multi-armed bandit environment

How to do it...

How it works...

Solving multi-armed bandit problems with the epsilon-greedy policy

How to do it...

How it works...

There's more...

Solving multi-armed bandit problems with the softmax exploration

How to do it...

How it works...

Solving multi-armed bandit problems with the upper confidence bound algorithm

How to do it...

How it works...

There's more...

See also

Solving internet advertising problems with a multi-armed bandit

How to do it...

How it works...

Solving multi-armed bandit problems with the Thompson sampling algorithm

How to do it...

How it works...

See also

Solving internet advertising problems with contextual bandits

How to do it...

How it works...

Scaling Up Learning with Function Approximation

Setting up the Mountain Car environment playground

Getting ready

How to do it...

How it works...

Estimating Q-functions with gradient descent approximation

How to do it...

How it works...

See also

Developing Q-learning with linear function approximation

How to do it...

How it works...

Developing SARSA with linear function approximation

How to do it...

How it works...

Incorporating batching using experience replay

How to do it...

How it works...

Developing Q-learning with neural network function approximation

How to do it...

How it works...

See also

Solving the CartPole problem with function approximation

How to do it...

How it works...

Deep Q-Networks in Action

Developing deep Q-networks

How to do it...

How it works...

See also

Improving DQNs with experience replay

How to do it...

How it works...

Developing double deep Q-Networks

How to do it...

How it works...

Tuning double DQN hyperparameters for CartPole

How to do it...

How it works...

Developing Dueling deep Q-Networks

How to do it...

How it works...

Applying Deep Q-Networks to Atari games

How to do it...

How it works...

Using convolutional neural networks for Atari games

How to do it...

How it works...

See also

Implementing Policy Gradients and Policy Optimization

Implementing the REINFORCE algorithm

How to do it...

How it works...

See also

Developing the REINFORCE algorithm with baseline

How to do it...

How it works...

Implementing the actor-critic algorithm

How to do it...

How it works...

Solving Cliff Walking with the actor-critic algorithm

How to do it...

How it works...

Setting up the continuous Mountain Car environment

How to do it...

How it works...

Solving the continuous Mountain Car environment with the advantage actor-critic network

How to do it...

How it works...

There's more...

See also

Playing CartPole through the cross-entropy method

How to do it...

How it works...

Capstone Project – Playing Flappy Bird with DQN

Setting up the game environment

Getting ready

How to do it...

How it works...

Building a Deep Q-Network to play Flappy Bird

How to do it...

How it works...

Training and tuning the network

How to do it...

How it works...

Deploying the model and playing the game

How to do it...

How it works...

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

Interest in reinforcement learning has surged because it revolutionizes automation: an agent learns the optimal actions to take in an environment in order to maximize its cumulative reward.

PyTorch 1.x Reinforcement Learning Cookbook introduces you to important reinforcement learning concepts and implementations of algorithms in PyTorch. Each chapter of the book walks you through a different type of reinforcement learning method and its industry-adopted applications. With the help of recipes that contain real-world examples, you will build your knowledge and proficiency in reinforcement learning techniques such as dynamic programming, Monte Carlo methods, temporal difference and Q-learning, multi-armed bandits, function approximation, Deep Q-Networks, and policy gradients; these topics are less obscure than you might think. Interesting and easy-to-follow examples, such as Atari games, Blackjack, Gridworld environments, internet advertising, Mountain Car, and Flappy Bird, will keep you engaged until you reach your goal.

By the end of this book, you will have mastered the implementation of popular reinforcement learning algorithms and learned the best practices of applying reinforcement learning techniques to solve other real-world problems.

Who this book is for

Machine learning engineers, data scientists, and AI researchers looking for quick solutions to different problems in reinforcement learning will find this book useful. Prior exposure to machine learning concepts is required, while previous experience with PyTorch will be a bonus.

What this book covers

Chapter 1, Getting Started with Reinforcement Learning and PyTorch, is the starting point for readers who are looking forward to beginning this book's step-by-step guide to reinforcement learning with PyTorch. We will set up the working environment and OpenAI Gym and get familiar with reinforcement learning environments using the Atari and CartPole playgrounds. The chapter will also cover the implementation of several basic reinforcement learning algorithms, including random search, hill-climbing, and policy gradient. At the end, readers will also have a chance to review the essentials of PyTorch and get ready for the upcoming learning examples and projects.

Chapter 2, Markov Decision Processes and Dynamic Programming, starts with the creation of a Markov chain and a Markov Decision Process (MDP), which is the core of most reinforcement learning algorithms. It then moves on to two approaches to solving an MDP: value iteration and policy iteration. We will get more familiar with MDPs and the Bellman equation by practicing policy evaluation. We will also demonstrate how to solve the interesting coin-flipping gamble problem step by step. At the end, we will learn how to perform dynamic programming to scale up the learning.

Chapter 3, Monte Carlo Methods for Making Numerical Estimations, is focused on Monte Carlo methods. We will start by estimating the value of pi with Monte Carlo. Moving on, we will learn how to use the Monte Carlo method to predict state values and state-action values. We will demonstrate training an agent to win at Blackjack using Monte Carlo. Also, we will explore on-policy, first-visit Monte Carlo control and off-policy Monte Carlo control by developing various algorithms. Monte Carlo Control with an epsilon-greedy policy and weighted importance sampling will also be covered.  

Chapter 4, Temporal Difference and Q-Learning, starts by setting up the CliffWalking and Windy Gridworld environment playgrounds, which will be used in temporal difference and Q-learning. Through our step-by-step guide, readers will explore temporal difference for prediction and gain practical experience with Q-learning for off-policy control and SARSA for on-policy control. We will also work on an interesting project, the taxi problem, and demonstrate how to solve it using the Q-learning and SARSA algorithms. Finally, we will cover the Double Q-learning algorithm as a bonus section.

Chapter 5, Solving Multi-Armed Bandit Problems, covers the multi-armed bandit algorithm, which is probably one of the most popular algorithms in reinforcement learning. It starts with the creation of a multi-armed bandit problem. We will see how to solve the multi-armed bandit problem using four strategies, namely the epsilon-greedy policy, softmax exploration, the upper confidence bound algorithm, and Thompson sampling. We will also work on a billion-dollar problem, online advertising, and demonstrate how to solve it using the multi-armed bandit algorithm. Finally, we will develop a more complex algorithm, the contextual bandit algorithm, and use it to optimize display advertising.

Chapter 6, Scaling Up Learning with Function Approximation, is focused on function approximation and will start with setting up the Mountain Car environment playground. Through our step-by-step guide, we will cover the motivation for function approximation over Table Lookup, and gain experience in incorporating function approximation into existing algorithms such as Q-Learning and SARSA. We will also cover an advanced technique, batching using experience replay. Finally, we will cover how to solve the CartPole problem using what we have learned in the chapter as a whole.

Chapter 7, Deep Q-Networks in Action, covers Deep Q-Learning, or Deep Q-Networks (DQN), considered one of the most modern reinforcement learning techniques. We will develop a DQN model step by step and understand the importance of experience replay and a target network in making Deep Q-Learning work in practice. To help readers solve Atari games, we will demonstrate how to incorporate convolutional neural networks into DQNs. We will also cover two DQN variants, Double DQNs and Dueling DQNs, and we will cover how to fine-tune a Q-learning algorithm using Double DQNs as an example.

Chapter 8, Implementing Policy Gradients and Policy Optimization, focuses on policy gradients and optimization and starts by implementing the REINFORCE algorithm. We will then develop the REINFORCE algorithm with the baseline for CliffWalking. We will also implement the actor-critic algorithm and apply it to solve the CliffWalking problem. To scale up the deterministic policy gradient algorithm, we apply tricks from DQN and develop the Deep Deterministic Policy Gradients. As a bit of fun, we train an agent based on the cross-entropy method to play the CartPole game. Finally, we will talk about how to scale up policy gradient methods using the asynchronous actor-critic method and neural networks.

Chapter 9, Capstone Project – Playing Flappy Bird with DQN, takes us through a capstone project – playing Flappy Bird using reinforcement learning. We will apply what we have learned throughout this book to build an intelligent bot. We will focus on building a DQN, fine-tuning model parameters, and deploying the model. Let's see how long the bird can fly in the air. 

To get the most out of this book

Data scientists, machine learning engineers, and AI researchers looking for quick solutions to different problems in reinforcement learning will find this book useful. Prior exposure to machine learning concepts is required, while previous experience with PyTorch is not required but will be a bonus.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packt.com.
2. Select the Support tab.
3. Click on Code Downloads.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/PyTorch-1.x-Reinforcement-Learning-Cookbook. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838551964_ColorImages.pdf.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).

To give clear instructions on how to complete a recipe, use these sections as follows:

Getting ready

This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make you more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Getting Started with Reinforcement Learning and PyTorch

We kick off our journey of practical reinforcement learning and PyTorch with the basic, yet important, reinforcement learning algorithms, including random search, hill climbing, and policy gradient. We will start by setting up the working environment and OpenAI Gym, and you will become familiar with reinforcement learning environments through the Atari and CartPole playgrounds. We will also demonstrate how to develop algorithms to solve the CartPole problem step by step. Finally, we will review the essentials of PyTorch and prepare for the upcoming learning examples and projects.

This chapter contains the following recipes:

Setting up the working environment

Installing OpenAI Gym

Simulating Atari environments

Simulating the CartPole environment

Reviewing the fundamentals of PyTorch

Implementing and evaluating a random search policy

Developing the hill-climbing algorithm

Developing a policy gradient algorithm

Setting up the working environment

Let's get started with setting up the working environment, including the correct versions of Python and Anaconda, and PyTorch as the main framework that is used throughout the book.

Python is the language we use to implement all reinforcement learning algorithms and techniques throughout the book. In this book, we will be using Python 3, or more specifically, 3.6 or above. If you are a Python 2 user, now is the best time for you to switch to Python 3, as Python 2 will no longer be supported after January 1, 2020. The transition is very smooth, though, so don't panic.

Anaconda is an open source Python distribution (www.anaconda.com/distribution/) for data science and machine learning. We will be using Anaconda's package manager, conda, to install Python packages, along with pip.

PyTorch (https://pytorch.org/), primarily developed by the Facebook AI Research (FAIR) Group, is a trendy machine learning library based on Torch (http://torch.ch/). Tensors in PyTorch take the place of NumPy's ndarrays, adding GPU support and greater flexibility. Thanks to its dynamic computational graphs and its simple, friendly interface, the PyTorch community is expanding on a daily basis, and PyTorch has seen heavy adoption by more and more tech giants.

Let's see how to properly set up all of these components.
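For reference, a typical PyTorch installation with conda looks like the following. This is a sketch only; the exact command for your platform and CUDA version is shown at https://pytorch.org/:

conda install pytorch torchvision -c pytorch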

How it works...

We have just created a tensor of size 3 x 4 in PyTorch. It is an empty matrix. Empty doesn't mean all elements have the value null; instead, they are a bunch of meaningless floats that serve as placeholders, and users are expected to set the values later. This is very similar to NumPy's empty array.
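A minimal sketch of the tensor creation described above (assuming PyTorch is installed):

>>> import torch
>>> x = torch.empty(3, 4)  # uninitialized placeholder values, like numpy.empty
>>> x.shape
torch.Size([3, 4])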

There's more... 

Some of you may question the necessity of installing Anaconda and using conda to manage packages since it is easy to install packages with pip. In fact, conda is a better packaging tool than pip. We mainly use conda for the following four reasons:

It handles library dependencies nicely: Installing a package with conda will automatically download all of its dependencies. However, doing so with pip will lead to a warning, and the installation will be aborted.

It solves conflicts between packages gracefully: If installing a package requires another package of a specific version (let's say 2.3 or later), conda will update the version of the other package automatically.

It creates virtual environments easily: A virtual environment is a self-contained package directory tree. Different applications or projects can use different virtual environments, and all virtual environments are isolated from each other. It is recommended to use virtual environments so that whatever we do for one application doesn't affect our system environment or any other environment (see the sketch after this list).

It is also compatible with pip: We can still use pip in conda with the following command:

conda install pip
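As an example, here is a minimal sketch of creating and activating a dedicated virtual environment for this book (the environment name rl and the Python version are illustrative):

conda create -n rl python=3.7
conda activate rl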

See also

If you are interested in learning more about conda, feel free to check out the following resources:

Conda user guide: https://conda.io/projects/conda/en/latest/user-guide/index.html

Creating and managing virtual environments with conda: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html

If you want to get more familiar with PyTorch, you can go through the Getting Started section in the official tutorial at https://pytorch.org/tutorials/#getting-started. We recommend you at least finish the following:

What is PyTorch: https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py

Learning PyTorch with examples: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

Installing OpenAI Gym

After setting up the working environment, we can now install OpenAI Gym. You can't work on reinforcement learning without using OpenAI Gym, which gives you a variety of environments in which to develop your learning algorithms.

OpenAI (https://openai.com/) is a non-profit research company that is focused on building safe artificial general intelligence (AGI) and ensuring that it benefits humans. OpenAI Gym is a powerful and open source toolkit for developing and comparing reinforcement learning algorithms. It provides an interface to a variety of reinforcement learning simulations and tasks, from walking to moon landing, and from car racing to playing Atari games. See https://gym.openai.com/envs/ for the full list of environments. We can write agents that interact with OpenAI Gym environments using any numerical computation library, such as PyTorch, TensorFlow, or Keras.

How to do it...

There are two ways to install Gym. The first one is to use pip, as follows:

pip install gym

For conda users, remember to install pip first in conda using the following command before installing Gym using pip:

conda install pip

This is because Gym is not officially available in conda as of early 2019.

Another approach is to build from source:

1. First, clone the package directly from its Git repository:

git clone https://github.com/openai/gym

2. Go to the downloaded folder and install Gym from there:

cd gym
pip install -e .

And now you are good to go. Feel free to play around with gym.

You can also check the available gym environments by typing the following lines of code:

>>> from gym import envs

>>> print(envs.registry.all())

dict_values([EnvSpec(Copy-v0), EnvSpec(RepeatCopy-v0), EnvSpec(ReversedAddition-v0), EnvSpec(ReversedAddition3-v0), EnvSpec(DuplicatedInput-v0), EnvSpec(Reverse-v0), EnvSpec(CartPole-v0), EnvSpec(CartPole-v1), EnvSpec(MountainCar-v0), EnvSpec(MountainCarContinuous-v0), EnvSpec(Pendulum-v0), EnvSpec(Acrobot-v1), EnvSpec(LunarLander-v2), EnvSpec(LunarLanderContinuous-v2), EnvSpec(BipedalWalker-v2), EnvSpec(BipedalWalkerHardcore-v2), EnvSpec(CarRacing-v0), EnvSpec(Blackjack-v0)

...

...

This will give you a long list of environments if you installed Gym properly. We will play around with some of them in the next recipe, Simulating Atari environments.

How it works...

Compared to the simple pip approach for installing Gym, the second approach provides more flexibility if you want to add new environments and modify Gym itself.

There's more...

You may wonder why we need to test reinforcement learning algorithms on Gym's environments, since the actual environments we work in can be quite different. Recall that reinforcement learning doesn't make many assumptions about the environment; it gets to know the environment by interacting with it. Also, when comparing the performance of different algorithms, we need to apply them to standardized environments. Gym is a perfect benchmark, covering many versatile and easy-to-use environments. This is similar to the datasets we often use as benchmarks in supervised and unsupervised learning, such as MNIST, ImageNet, MovieLens, and Thomson Reuters News.

See also

Take a look at the official Gym documentation at https://gym.openai.com/docs/.

Simulating Atari environments

To get started with Gym, let's play some Atari games with it.

The Atari environments (https://gym.openai.com/envs/#atari) are a variety of Atari 2600 video games, such as Alien, AirRaid, Pong, and Space Race. If you have ever played Atari games, this recipe should be fun for you, as you will play an Atari game, Space Invaders. However, an agent will act on your behalf.

How it works...

Using Gym, we can easily create an environment instance by calling the make() method with the name of the environment as the parameter.

As you may have noticed, the actions that the agent performs are randomly chosen using the sample() method.

Note that, normally, we would have a more sophisticated agent guided by reinforcement learning algorithms. Here, we just demonstrated how to simulate an environment, and how an agent takes actions regardless of the outcome.
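A minimal sketch of the simulation loop described above, assuming the classic Gym API of the time and the Atari extras installed (pip install gym[atari]); the environment name and the number of steps are illustrative:

>>> import gym
>>> env = gym.make('SpaceInvaders-v0')
>>> env.reset()
>>> for _ in range(1000):
...     action = env.action_space.sample()  # pick a random action
...     new_state, reward, is_done, info = env.step(action)
...     env.render()  # refresh the display window
...     if is_done:
...         break
>>> env.close()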

Run this a few times and see what we get:

>>> env.action_space.sample()

0

>>> env.action_space.sample()

3

>>> env.action_space.sample()

0

>>> env.action_space.sample()

4

>>> env.action_space.sample()

2

>>> env.action_space.sample()

1

>>> env.action_space.sample()

4

>>> env.action_space.sample()

5

>>> env.action_space.sample()

1

>>> env.action_space.sample()

0

There are six possible actions in total. We can also see this by running the following command:

>>> env.action_space

Discrete(6)

Actions from 0 to 5 stand for No Operation, Fire, Up, Right, Left, and Down, respectively, which are all the moves the spaceship in the game can do.
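If you want to verify this mapping yourself, Gym's Atari environments expose the action names (a quick check, assuming the env instance created earlier):

>>> env.unwrapped.get_action_meanings()  # lists the human-readable name of each action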

The step() method will let the agent take the action that is specified as its parameter. The render() method will update the display window based on the latest observation of the environment.

The observation of the environment, new_state, is represented by a 210 x 160 x 3 matrix, as follows:

>>> print(new_state.shape)

(210, 160, 3)

This means that each frame of the display screen is an RGB image of size 210 x 160.

See also

If you are looking to simulate an environment but are not sure of the name you should use in the make() method, you can find it in the table of environments at https://github.com/openai/gym/wiki/Table-of-environments. Besides the name used to call an environment, the table also shows the size of the observation matrix and the number of possible actions. Have fun playing around with the environments.

Simulating the CartPole environment

In this recipe, we will work on simulating one more environment in order to get more familiar with Gym. The CartPole environment is a classic one in reinforcement learning research.  

CartPole is a traditional reinforcement learning task in which a pole is placed upright on top of a cart. The agent moves the cart either to the left or to the right by 1 unit in a timestep. The goal is to balance the pole and prevent it from falling over. The pole is considered to have fallen if it is more than 12 degrees from the vertical, or the cart moves 2.4 units away from the origin. An episode terminates when any of the following occurs:

The pole falls over

The number of timesteps reaches 200
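As a quick sketch, instantiating the environment mirrors the Atari case (classic Gym API assumed):

>>> import gym
>>> env = gym.make('CartPole-v0')
>>> env.reset()
>>> env.action_space  # two discrete actions: push the cart left or right
Discrete(2)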