The book begins with a chapter on traditional methods of supervised learning, covering least squares estimation, recursive least squares, least mean squares, and stochastic approximation. Chapter 2 covers single-agent reinforcement learning; topics include learning value functions, Markov decision processes, and TD learning with eligibility traces. Chapter 3 discusses two-player matrix games with both pure and mixed strategies; numerous algorithms and examples are presented. Chapter 4 covers learning in multiplayer stochastic (Markov) games, focusing on two-player grid games, Q-learning, and Nash Q-learning. Chapter 5 discusses differential games, including multi-player differential games, the actor–critic structure, adaptive fuzzy control and fuzzy inference systems, the evader–pursuer game, and the game of guarding a territory. Chapter 6 discusses new ideas on learning within robotic swarms and the innovative idea of the evolution of personality traits.
• Framework for understanding a variety of methods and approaches in multi-agent machine learning.
• Discusses reinforcement learning methods, including several forms of multi-agent Q-learning
• Applicable to research professors and graduate students studying electrical and computer engineering, computer science, and mechanical and aerospace engineering
Page count: 318
Publication year: 2014
Cover
Title
Copyright
Preface
References
Chapter 1: A Brief Review of Supervised Learning
1.1 Least Squares Estimates
1.2 Recursive Least Squares
1.3 Least Mean Squares
1.4 Stochastic Approximation
References
Chapter 2: Single-Agent Reinforcement Learning
2.1 Introduction
2.2 n-Armed Bandit Problem
2.3 The Learning Structure
2.4 The Value Function
2.5 The Optimal Value Functions
2.6 Markov Decision Processes
2.7 Learning Value Functions
2.8 Policy Iteration
2.9 Temporal Difference Learning
2.10 TD Learning of the State-Action Function
2.11 Q-Learning
2.12 Eligibility Traces
References
Chapter 3: Learning in Two-Player Matrix Games
3.1 Matrix Games
3.2 Nash Equilibria in Two-Player Matrix Games
3.3 Linear Programming in Two-Player Zero-Sum Matrix Games
3.4 The Learning Algorithms
3.5 Gradient Ascent Algorithm
3.6 WoLF-IGA Algorithm
3.7 Policy Hill Climbing (PHC)
3.8 WoLF-PHC Algorithm
3.9 Decentralized Learning in Matrix Games
3.10 Learning Automata
3.11 Linear Reward–Inaction Algorithm
3.12 Linear Reward–Penalty Algorithm
3.13 The Lagging Anchor Algorithm
3.14 The Linear Reward–Inaction Lagging Anchor Algorithm
References
Chapter 4: Learning in Multiplayer Stochastic Games
4.1 Introduction
4.2 Multiplayer Stochastic Games
4.3 Minimax-Q Algorithm
4.4 Nash Q-Learning
4.5 The Simplex Algorithm
4.6 The Lemke–Howson Algorithm
4.7 Nash-Q Implementation
4.8 Friend-or-Foe Q-Learning
4.9 Infinite Gradient Ascent
4.10 Policy Hill Climbing
4.11 WoLF-PHC Algorithm
4.12 Guarding a Territory Problem in a Grid World
4.13 Extension of the Linear Reward–Inaction Lagging Anchor Algorithm to Stochastic Games
4.14 The Exponential Moving-Average Q-Learning (EMA Q-Learning) Algorithm
4.15 Simulation and Results Comparing EMA Q-Learning to Other Methods
References
Chapter 5: Differential Games
5.1 Introduction
5.2 A Brief Tutorial on Fuzzy Systems
5.3 Fuzzy Q-Learning
5.4 Fuzzy Actor–Critic Learning
5.5 Homicidal Chauffeur Differential Game
5.6 Fuzzy Controller Structure
5.7 Q(λ)-Learning Fuzzy Inference System
5.9 Learning in the Evader–Pursuer Game with Two Cars
5.10 Simulation of the Game of Two Cars
5.11 Differential Game of Guarding a Territory
5.12 Reward Shaping in the Differential Game of Guarding a Territory
5.13 Simulation Results
References
Chapter 6: Swarm Intelligence and the Evolution of Personality Traits
6.1 Introduction
6.2 The Evolution of Swarm Intelligence
6.3 Representation of the Environment
6.4 Swarm-Based Robotics in Terms of Personalities
6.5 Evolution of Personality Traits
6.6 Simulation Framework
6.7 A Zero-Sum Game Example
6.8 Implementation for Next Sections
6.9 Robots Leaving a Room
6.10 Tracking a Target
6.11 Conclusion
References
Index
End User License Agreement
Chapter 2: Single-Agent Reinforcement Learning
Table 2.1 Temporal difference Q-table learning result.
Table 2.2 Temporal difference Q-table learning result.
Chapter 3: Learning in Two-Player Matrix Games
Table 3.1 Examples of two-player matrix games.
Table 3.2 Comparison of learning algorithms in matrix games.
Table 3.3 Examples of two-player matrix games.
Chapter 4: Learning in Multiplayer Stochastic Games
Table 4.1 Action-value function in Example 4.1.
Table 4.2 Minimax solution for the defender in the given state.
Table 4.3 Minimax solution for the defender in the given state. (a) Q-values of the defender for the state. (b) Linear constraints for the defender in the state.
Table 4.4 States and strategies.
Table 4.5 Grid game 1: Nash Q-values in state (0, 2).
Table 4.6 Grid game 1: Nash Q-values in state (1, 3).
Table 4.7 Comparison of multiagent reinforcement learning algorithms.
Chapter 5: Differential Games
Table 5.1 Tabular format.
Table 5.2 Pursuer's fuzzy decision table before learning.
Table 5.3 Evader's fuzzy decision table before learning.
Table 5.4 Capture time (s) for different numbers of learning episodes.
Table 5.5 The evader's fuzzy decision table after 1000 learning episodes.
Table 5.6 The pursuer's fuzzy decision table after 1000 learning episodes.
Table 5.7 Summary of the time of capture for different numbers of learning episodes in the game of two cars.
Table 5.8 The evader's fuzzy decision table and the output constant after learning.
Table 5.9 The pursuer's fuzzy decision table and the output constant after learning.
Chapter 6: Swarm Intelligence and the Evolution of Personality Traits
Table 6.1 Zero-sum game example.
Table 6.2 Optimal mixed strategies.
Table 6.3 Experimental results obtained for both players.
Table 6.4 Modeling of a game between two robots trying to leave a room.
Table 6.5 Utility payoffs for states.
Table 6.6 Convergence of the personality traits.
Table 6.7 Simulation results.
Chapter 2: Single-Agent Reinforcement Learning
Figure 2-1 Agent–environment interaction in reinforcement learning.
Figure 2-2 Armed bandit with varying ε.
Figure 2-3 Example of the grid world.
Figure 2-4 Values for each of the states.
Figure 2-5 Resulting optimal policies.
Figure 2-6 Resulting state values based on the optimal policies.
Figure 2-7 Comparison of TD learning with and without eligibility traces.
Figure 2-8 Comparison of Q-learning with and without eligibility traces for Q(1, UP).
Chapter 3: Learning in Two-Player Matrix Games
Figure 3-1 Simplex method for player 1 in the matching pennies game.
Figure 3-2 Simplex method for player 1 in the revised matching pennies game.
Figure 3-3 Simplex method in Example 3.3. (a) Simplex method for player 1. (b) Simplex method for player 2.
Figure 3-4 Players' NE strategies versus the game parameter.
Figure 3-5 GA in matching pennies game.
Figure 3-6 PHC matching pennies game, player 1, probability of choosing action 1, heads.
Figure 3-7 PHC matching pennies game, player 1, probability of choosing action 1, heads when player 2 always chooses heads.
Figure 3-8 WoLF-PHC matching pennies game, player 1, probability of choosing action 1.
Figure 3-9 Trajectories of players' strategies during learning in matching pennies.
Figure 3-10 Trajectories of players' strategies during learning in prisoners' dilemma.
Figure 3-11 Trajectories of players' strategies during learning in rock-paper-scissors.
Chapter 4: Learning in Multiplayer Stochastic Games
Figure 4-1 Example of stochastic games. (a) A grid game with two players. (b) The numbered cells in the game. (c) Possible state transitions given the players' joint action. Reproduced from [5], © X. Lu.
Figure 4-2 A grid game. (a) Initial positions of the players. (b) Invader in the top-right versus defender in the bottom-left. (c) Invader in the bottom-left versus defender in the top-right. Reproduced from [5], © X. Lu.
Figure 4-3 Minimax Q-learning for the defender/invader game. Action probability for the defender.
Figure 4-4 Two stochastic games [7]. (a) Grid game 1. (b) Grid game 2.
Figure 4-5 (a) Nash equilibrium of grid game 1. (b) Nash equilibrium of grid game 2. Reproduced from [8] with permission from MIT press.
Figure 4-6 Grid game with barriers, start position (0,1).
Figure 4-7 Constraint equations plotted for the simplex method.
Figure 4-8 Polytope defined by player 1's constraints.
Figure 4-9 Polytope defined by player 2's constraints.
Figure 4-10 Nash-Q learner with exploit-explore. Reproduced from [15], © P. De Beck-Courcelle.
Figure 4-11 Nash-Q learner with explore only. Reproduced from [15], © P. De Beck-Courcelle.
Figure 4-12 Nash-Q learning with exploit only. Reproduced from [15], © P. De Beck-Courcelle.
Figure 4-13 Guarding a territory in a grid world. (a) Initial positions of the players when the game starts. (b) Terminal positions of the players when the game ends. Reproduced from [5], © X. Lu.
Figure 4-14 Players' strategies using the minimax-Q algorithm in the first simulation for the grid game. (a) Defender's strategy components (solid and dashed lines). (b) Invader's strategy components (solid and dashed lines). Reproduced from [5], © X. Lu.
Figure 4-15 Players' strategies using the WoLF-PHC algorithm in the first simulation for the grid game. (a) Defender's strategy components (solid and dashed lines). (b) Invader's strategy components (solid and dashed lines). Reproduced from [5], © X. Lu.
Figure 4-16 Defender's strategy in the second simulation for the grid game. (a) Minimax-Q-learned strategy of the defender against the invader using a fixed strategy. Solid line: probability of the defender moving up; dashed line: probability of the defender moving left. (b) WoLF-PHC-learned strategy of the defender against the invader using a fixed strategy. Solid line: probability of the defender moving up; dashed line: probability of the defender moving left. Reproduced from [5], © X. Lu.
Figure 4-17 A grid game. (a) Initial positions of the players. (b) One of the terminal positions of the players. Reproduced from [5], © X. Lu.
Figure 4-18 Results in the first simulation for the grid game. (a) Result of the minimax-Q-learned strategy of the defender against the minimax-Q-learned strategy of the invader. (b) Result of the WoLF-PHC-learned strategy of the defender against the WoLF-PHC-learned strategy of the invader. Reproduced from [5], © X. Lu.
Figure 4-19 Results in the second simulation for the grid game. (a) Result of the minimax-Q-learned strategy of the defender against the invader using a fixed strategy. (b) Result of the WoLF-PHC-learned strategy of the defender against the invader using a fixed strategy. Reproduced from [5], © X. Lu.
Figure 4-20 Hu and Wellman's grid game. (a) Grid game. (b) Nash equilibrium path 1. (c) Nash equilibrium path 2. Reproduced from [24] © M. Awheda and Schwartz, H. M.
Figure 4-21 Learning trajectories of players' strategies at the initial state in the grid game. Reproduced from [5] © X. Lu.
Figure 4-22 Probability distributions of the second actions for both players in the dilemma game. (a) The EMA Q-learning, (b) PGA-APP, and (c) WPL algorithms are shown. Reproduced from [24] © M. Awheda and Schwartz, H. M.
Figure 4-23 Probability distributions of the first actions for the three players in the three-player matching pennies game. (a) The EMA Q-learning, (b) PGA-APP, and (c) WPL algorithms are shown. Reproduced from [24] © M. Awheda and Schwartz, H. M.
Figure 4-24 Probability distributions of player 1's actions in the Shapley's game. (a) The EMA Q-learning, (b) PGA-APP, and (c) WPL algorithms are shown. Reproduced from [24] © M. Awheda and Schwartz, H. M.
Figure 4-25 Probability distributions of the first actions for both players in the biased game. (a) The EMA Q-learning, (b) PGA-APP, and (c) WPL algorithms are shown. Reproduced from [24] © M. Awheda and Schwartz, H. M.
Figure 4-26 Grid game 1. (a) Probability of action North of player 1 when learning with the EMA Q-learning algorithm with different values of the constant gain. Plots (b) and (c) illustrate the probability of action North of player 1 and player 2, respectively, when learning with the EMA Q-learning, PGA-APP, and WPL algorithms. Reproduced from [24] © M. Awheda and Schwartz, H. M.
Figure 4-27 Two stochastic games [8]. (a) Grid game 1. (b) Grid game 2. Reproduced from [24] © M. Awheda and Schwartz, H. M.
Figure 4-28 (a) Nash equilibrium of grid game 1. (b) Nash equilibrium of grid game 2 [8] with permission from MIT press. Reproduced from [24] © M. Awheda and Schwartz, H. M.
Figure 4-29 Grid game 2. (a) Probability of selecting action North by player 1 when learning with the EMA Q-learning, PGA-APP, and WPL algorithms. (b) Probability of selecting action West by player 2 when learning with the EMA Q-learning, PGA-APP, and WPL algorithms. Reproduced from [24] © M. Awheda and Schwartz, H. M.
Chapter 5: Differential Games
Figure 5-1 Examples of membership functions.
Figure 5-2 Fuzzy system components.
Figure 5-3 Membership functions. (a) Membership functions of five fuzzy sets. (b) Membership functions of seven fuzzy sets.
Figure 5-4 Nonlinear function and its estimation with five rules and seven rules.
Figure 5-5 Estimation error with five rules and seven rules.
Figure 5-6 Basic configuration of fuzzy systems.
Figure 5-7 Architecture of the actor–critic learning system.
Figure 5-8 Homicidal chauffeur problem model.
Figure 5-9 The vehicle cannot turn into the circular region defined by its minimum turning radius.
Figure 5-10 Membership functions before training. (a) Pursuer membership functions before training. (b) Evader membership functions before training.
Figure 5-11 Construction of the learning system where white Gaussian noise is added as an exploration mechanism.
Figure 5-12 The pursuer captures the evader with 100 learning episodes.
Figure 5-13 The evader increases the capture time after 500 learning episodes.
Figure 5-14 The evader learns to escape after 1000 learning episodes.
Figure 5-15 The evader avoids capture for the given angle (in rad).
Figure 5-16 The pursuer can capture the evader for the given angle (in rad).
Figure 5-17 The game of two cars.
Figure 5-18 The pursuer captures the evader with 100 learning episodes.
Figure 5-19 The evader increases the capture time after 500 learning episodes.
Figure 5-20 The evader learns to escape after 1300 learning episodes. (a) The evader learns to escape after 1300 learning episodes. (b) Zoomed version of (a).
Figure 5-21 The pursuer's membership functions after training. (a) The angle difference φ. (b) The rate of change of the angle difference.
Figure 5-22 The evader's membership functions after training. (a) The angle difference φ. (b) The distance d between the pursuer and the evader.
Figure 5-23 The time of capture with the use of eligibility traces in the game of two cars.
Figure 5-24 The differential game of guarding a territory.
Figure 5-25 MFs for the input variable.
Figure 5-26 Membership functions for input variables.
Figure 5-27 Membership functions for input variables.
Figure 5-28 Reinforcement learning with no shaping function in Example 5.2. (a) Trained defender using FQL with no shaping function. (b) Trained defender using FACL with no shaping function.
Figure 5-29 Reinforcement learning with a bad shaping function in Example 5.2. (a) Trained defender using FQL with the bad shaping function in Example 5.2. (b) Trained defender using FACL with the bad shaping function in Example 5.2.
Figure 5-30 Reinforcement learning with a good shaping function in Example 5.2. (a) Trained defender using FQL with the good shaping function in Example 5.2. (b) Trained defender using FACL with the good shaping function in Example 5.2.
Figure 5-31 Initial positions of the defender in the training and testing episodes in Example 5.3.
Figure 5-32 Example 5.3: average performance of the trained defender versus the NE invader. (a) Average performance error in the FQL algorithm. (b) Average performance error in the FACL algorithm.
Figure 5-33 The differential game of guarding a territory with three players.
Figure 5-34 Reinforcement learning without shaping or with a bad shaping function in Example 5.4. (a) Two trained defenders using FACL with no shaping function versus the NE invader after one training trial. (b) Two trained defenders using FACL with the bad shaping function versus the NE invader after one training trial.
Figure 5-35 Two trained defenders using FACL with the good shaping function versus the NE invader after one training trial in Example 5.4.
Figure 5-36 Example 5.5: average performance of the two trained defenders versus the NE invader. (a) Initial positions of the players in the training and testing episodes. (b) Average performance error for the trained defenders versus the NE invader.
Chapter 6: Swarm Intelligence and the Evolution of Personality Traits
Figure 6-1 (a) Actual configuration of the world. (b) The way robot A perceives it. (c) The way robot B perceives it. Reproduced from [21] © S. Givigi and H. M. Schwartz.
Figure 6-2 Simplex of a player with two strategies. Reproduced from [21] © S. Givigi and H. M. Schwartz.
Figure 6-3 Artistic depiction of the problem of robots leaving a room. Reproduced from [21] © S. Givigi and H. M. Schwartz.
Figure 6-4 Artistic depiction of the simulation environment. Reproduced from [21] © S. Givigi and H. M. Schwartz.
Figure 6-5 Utility function and personality traits of one robot. Reproduced from [21] © S. Givigi and H. M. Schwartz.
Figure 6-6 State of the robots during the simulation. Reproduced from [21] © S. Givigi and H. M. Schwartz.
Figure 6-7 State of the simulation when two robots turned courageous. Reproduced from [21] © S. Givigi and H. M. Schwartz.
Figure 6-8 State of the simulation when five robots turned courageous. Reproduced from [21] © S. Givigi and H. M. Schwartz.
Figure 6-9 State of the simulation when 10 robots turned courageous. Reproduced from [21] © S. Givigi and H. M. Schwartz.
Figure 6-10 Robot waiting for a more courageous robot. Reproduced from [21] © S. Givigi and H. M. Schwartz.
Howard M. Schwartz
Department of Systems and Computer Engineering
Carleton University
Copyright © 2014 by John Wiley & Sons, Inc. All rights reserved
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Schwartz, Howard M., editor.
Multi-agent machine learning : a reinforcement approach / Howard M. Schwartz.
pages cm
Includes bibliographical references and index.
ISBN 978-1-118-36208-2 (hardback)
1. Reinforcement learning. 2. Differential games. 3. Swarm intelligence. 4. Machine learning. I. Title.
Q325.6.S39 2014
519.3–dc23
2014016950
For a decade I have taught a course on adaptive control. The course focused on the classical methods of system identification, using such classic texts as Ljung [1, 2]. The course addressed traditional methods of model reference adaptive control and nonlinear adaptive control using Lyapunov techniques. However, the theory had become out of sync with current engineering practice. As such, my own research and the focus of the graduate course changed to include adaptive signal processing, and to incorporate adaptive channel equalization and echo cancellation using the least mean squares (LMS) algorithm. The course name likewise changed, from “Adaptive Control” to “Adaptive and Learning Systems.” My research was still focused on system identification and nonlinear adaptive control with application to robotics. However, by the early 2000s, I had started work with teams of robots. It was now possible to use handy robot kits and low-cost microcontroller boards to build several robots that could work together. The graduate course in adaptive and learning systems changed again; the theoretical material on nonlinear adaptive control using Lyapunov techniques was reduced, replaced with ideas from reinforcement learning. A whole new range of applications developed. The teams of robots had to learn to work together and to compete.
Today, the graduate course focuses on system identification using recursive least squares techniques, some model reference adaptive control (still using Lyapunov techniques), adaptive signal processing using the LMS algorithm, and reinforcement learning using Q-learning. The first two chapters of this book present these ideas in an abridged form, but in sufficient detail to demonstrate the connections among the available learning algorithms: how they are the same and how they differ. There are other texts that cover this material in detail [2–4].
The research then began to focus on teams of robots learning to work together. The work examined applications of robots working together for search and rescue, and for securing important infrastructure and border regions. It also began to focus on reinforcement learning and multiagent reinforcement learning. The robots are the learning agents. How do children learn how to play tag? How do we learn to play football, or how do police work together to capture a criminal? What strategies do we use, and how do we formulate these strategies? Why can I play touch football with a new group of people and quickly assess everyone's capabilities and then take a particular strategy in the game?
As our research team began to delve further into the ideas associated with multiagent machine learning and game theory, we discovered that the published literature covered many ideas but was poorly coordinated or focused. Although there are a few survey articles [5], they do not give sufficient details to appreciate the different methods. The purpose of this book is to introduce the reader to a particular form of machine learning. The book focuses on multiagent machine learning, but it is tied together with the central theme of learning algorithms in general. Learning algorithms come in many different forms. However, they tend to have a similar approach. We will present the differences and similarities of these methods.
This book is based on my own work and the work of several doctoral and masters students who have worked under my supervision over the past 10 years. In particular, I would like to thank Prof. Sidney Givigi. Prof. Givigi was instrumental in developing the ideas and algorithms presented in Chapter 6. The doctoral research of Xiaosong (Eric) Lu has also found its way into this book. The work on guarding a territory is largely based on his doctoral dissertation. Other graduate students who helped me in this work include Badr Al Faiya, Mostafa Awheda, Pascal De Beck-Courcelle, and Sameh Desouky. Without the dedicated work of this group of students, this book would not have been possible.
H. M. Schwartz
Ottawa, Canada
September 2013
[1] L. Ljung, System Identification: Theory for the User. Upper Saddle River, NJ: Prentice Hall, 2nd ed., 1999.
[2] L. Ljung and T. Soderstrom, Theory and Practice of Recursive Identification. Cambridge, Massachusetts: The MIT Press, 1983.
[3] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, Massachusetts: The MIT Press, 1998.
[4] K. J. Åström and B. Wittenmark, Adaptive Control. Boston, Massachusetts: Addison-Wesley Longman, 2nd ed., 1994, ISBN 0201558661.
[5] L. Buşoniu, R. Babuška, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Trans. Syst. Man Cybern. Part C, vol. 38, no. 2, pp. 156–172, 2008.
There are a number of algorithms that are typically used for system identification, adaptive control, adaptive signal processing, and machine learning. These algorithms all have particular similarities and differences, and they all need to process some type of experimental data. How we collect and process the data determines the most suitable algorithm to use. In adaptive control, there is a device referred to as the self-tuning regulator. In this case, the algorithm measures the states as outputs, estimates the model parameters, and outputs the control signals. In reinforcement learning, the algorithms process rewards, estimate value functions, and output actions. Although one may refer to the recursive least squares (RLS) algorithm in the self-tuning regulator as a supervised learning algorithm and to reinforcement learning as an unsupervised learning algorithm, the two are very similar. In this chapter, we will present a number of well-known baseline supervised learning algorithms.
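To make the parallel concrete, here is a minimal sketch (not code from the book) placing a recursive least squares parameter update next to a one-step Q-learning update. The function names, the synthetic data, and the parameters alpha and gamma are assumptions chosen purely for illustration; the point is that both rules correct an old estimate by a gain times a prediction error.

```python
import numpy as np

# Recursive least squares (RLS): estimate theta in y = phi^T theta + noise.
def rls_update(theta, P, phi, y):
    """One RLS step: theta <- theta + K * (y - phi^T theta)."""
    K = P @ phi / (1.0 + phi @ P @ phi)      # gain vector
    theta = theta + K * (y - phi @ theta)    # correct the estimate by the prediction error
    P = P - np.outer(K, phi @ P)             # update the covariance matrix
    return theta, P

# One-step Q-learning: estimate action values Q(s, a) from observed rewards.
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q <- Q + alpha * (target - Q)."""
    target = r + gamma * np.max(Q[s_next])   # bootstrapped target built from the reward
    Q[s, a] += alpha * (target - Q[s, a])    # same estimate-plus-gain-times-error form as RLS
    return Q

# Tiny RLS demo on synthetic data (values assumed purely for illustration).
rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])
theta, P = np.zeros(2), 100.0 * np.eye(2)
for _ in range(200):
    phi = rng.normal(size=2)                     # regressor (measured signals)
    y = phi @ true_theta + 0.01 * rng.normal()   # noisy measurement
    theta, P = rls_update(theta, P, phi, y)
print("RLS estimate:", theta)                    # converges near [2, -1]
```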
