Reinforcement learning (RL) and adaptive dynamic programming (ADP) have been among the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human engineered systems, covering both single-player decision and control and multi-player games. Edited by the pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides important and timely guidance on controlling a wide variety of systems, such as robots, industrial processes, and economic decision-making.
Page count: 1011
Year of publication: 2013
Contents
Cover
Series Page
Title Page
Copyright
Preface
Contributors
Part I: Feedback Control Using RL and ADP
Chapter 1: Reinforcement Learning and Approximate Dynamic Programming (RLADP)—Foundations, Common Misconceptions, and the Challenges Ahead
1.1 Introduction
1.2 What is RLADP?
1.3 Some Basic Challenges in Implementing ADP
Disclaimer
References
Chapter 2: Stable Adaptive Neural Control of Partially Observable Dynamic Systems
2.1 Introduction
2.2 Background
2.3 Stability Bias
2.4 Example Application
References
Chapter 3: Optimal Control of Unknown Nonlinear Discrete-Time Systems Using the Iterative Globalized Dual Heuristic Programming Algorithm
3.1 Background Material
3.2 Neuro-Optimal Control Scheme Based on the Iterative ADP Algorithm
3.3 Generalization
3.4 Simulation Studies
3.5 Summary
References
Chapter 4: Learning and Optimization in Hierarchical Adaptive Critic Design
4.1 Introduction
4.2 Hierarchical ADP Architecture with Multiple-Goal Representation
4.3 Case Study: The Ball-and-Beam System
4.4 Conclusions and Future Work
Acknowledgments
References
Chapter 5: Single Network Adaptive Critics Networks—Development, Analysis, and Applications
5.1 Introduction
5.2 Approximate Dynamic Programming
5.3 SNAC
5.4 J-SNAC
5.5 Finite-SNAC
5.6 Conclusions
Acknowledgments
References
Chapter 6: Linearly Solvable Optimal Control
6.1 Introduction
6.2 Linearly Solvable Optimal Control Problems
6.3 Extension to Risk-Sensitive Control and Game Theory
6.4 Properties and Algorithms
6.5 Conclusions and Future Work
References
Chapter 7: Approximating Optimal Control with Value Gradient Learning
7.1 Introduction
7.2 Value Gradient Learning and BPTT Algorithms
7.3 A Convergence Proof for VGL(1) for Control with Function Approximation
7.4 Vertical Lander Experiment
7.5 Conclusions
References
Chapter 8: A Constrained Backpropagation Approach to Function Approximation and Approximate Dynamic Programming
8.1 Background
8.2 Constrained Backpropagation (CPROP) Approach
8.3 Solution of Partial Differential Equations in Nonstationary Environments
8.4 Preserving Prior Knowledge in Exploratory Adaptive Critic Designs
8.5 Summary
Algebraic ANN Control Matrices
References
Chapter 9: Toward Design of Nonlinear ADP Learning Controllers with Performance Assurance
9.1 Introduction
9.2 Direct Heuristic Dynamic Programming
9.3 A Control Theoretic View on the Direct HDP
9.4 Direct HDP Design with Improved Performance Case 1—Design Guided by a Priori LQR Information
9.5 Direct HDP Design with Improved Performance Case 2—Direct HDP for Coordinated Damping Control of Low-Frequency Oscillation
9.6 Summary
Acknowledgment
References
Chapter 10: Reinforcement Learning Control with Time-Dependent Agent Dynamics
10.1 Introduction
10.2 Q-Learning
10.3 Sampled Data Q-Learning
10.4 System Dynamics Approximation
10.5 Closing Remarks
References
Chapter 11: Online Optimal Control of Nonaffine Nonlinear Discrete-Time Systems without Using Value and Policy Iterations
11.1 Introduction
11.2 Background
11.3 Reinforcement Learning Based Control
11.4 Time-Based Adaptive Dynamic Programming-Based Optimal Control
11.5 Simulation Result
References
Chapter 12: An Actor–Critic–Identifier Architecture for Adaptive Approximate Optimal Control
12.1 Introduction
12.2 Actor–Critic–Identifier Architecture for HJB Approximation
12.3 Actor–Critic Design
12.4 Identifier Design
12.5 Convergence and Stability Analysis
12.6 Simulation
12.7 Conclusion
References
Chapter 13: Robust Adaptive Dynamic Programming
13.1 Introduction
13.2 Optimality Versus Robustness
13.3 Robust-ADP Design for Disturbance Attenuation
13.4 Robust-ADP for Partial-State Feedback Control
13.5 Applications
13.6 Summary
Acknowledgment
References
Part II: Learning and Control in Multiagent Games
Chapter 14: Hybrid Learning in Stochastic Games and Its Application in Network Security
14.1 Introduction
14.2 Two-Person Game
14.3 Learning in NZSGs
14.4 Main Results
14.5 Security Application
14.6 Conclusions and Future Work
Appendix: Assumptions for Stochastic Approximation
References
Chapter 15: Integral Reinforcement Learning for Online Computation of Nash Strategies of Nonzero-Sum Differential Games
15.1 Introduction
15.2 Two-Player Games and Integral Reinforcement Learning
15.3 Continuous-Time Value Iteration to Solve the Riccati Equation
15.4 Online Algorithm to Solve Nonzero-Sum Games
15.5 Analysis of the Online Learning Algorithm for NZS Games
15.6 Simulation Result for the Online Game Algorithm
15.7 Conclusion
References
Chapter 16: Online Learning Algorithms for Optimal Control and Dynamic Games
16.1 Introduction
16.2 Optimal Control and the Continuous Time Hamilton–Jacobi–Bellman Equation
16.3 Online Solution of Nonlinear Two-Player Zero-Sum Games and Hamilton–Jacobi–Isaacs Equation
16.4 Online Solution of Nonlinear Nonzero-Sum Games and Coupled Hamilton–Jacobi Equations
References
Part III: Foundations in MDP and RL
Chapter 17: Lambda-Policy Iteration: A Review and a New Implementation
17.1 Introduction
17.2 Lambda-Policy Iteration without Cost Function Approximation
17.3 Approximate Policy Evaluation Using Projected Equations
17.4 Lambda-Policy Iteration with Cost Function Approximation
17.5 Conclusions
Acknowledgments
References
Chapter 18: Optimal Learning and Approximate Dynamic Programming
18.1 Introduction
18.2 Modeling
18.3 The Four Classes of Policies
18.4 Basic Learning Policies for Policy Search
18.5 Optimal Learning Policies for Policy Search
18.6 Learning with a Physical State
References
Chapter 19: An Introduction to Event-Based Optimization: Theory and Applications
19.1 Introduction
19.2 Literature Review
19.3 Problem Formulation
19.4 Policy Iteration for EBO
19.5 Example: Material Handling Problem
19.6 Conclusions
Acknowledgments
References
Chapter 20: Bounds for Markov Decision Processes
20.1 Introduction
20.2 Problem Formulation
20.3 The Linear Programming Approach
20.4 The Martingale Duality Approach
20.5 The Pathwise Optimization Method
20.6 Applications
20.7 Conclusion
References
Chapter 21: Approximate Dynamic Programming and Backpropagation on Timescales
21.1 Introduction: Timescales Fundamentals
21.2 Dynamic Programming
21.3 Backpropagation
21.4 Conclusions
Acknowledgments
References
Chapter 22: A Survey of Optimistic Planning in Markov Decision Processes
22.1 Introduction
22.2 Optimistic Online Optimization
22.3 Optimistic Planning Algorithms
22.4 Related Planning Algorithms
22.5 Numerical Example
References
Chapter 23: Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning
23.1 Introduction
23.2 The Framework
23.3 The Feature Adaptation Scheme
23.4 Convergence Analysis
23.5 Application to Traffic Signal Control
23.6 Conclusions
References
Chapter 24: Feature Selection for Neuro-Dynamic Programming
24.1 Introduction
24.2 Optimality Equations
24.3 Neuro-Dynamic Algorithms
24.4 Fluid Models
24.5 Diffusion Models
24.6 Mean Field Games
24.7 Conclusions
References
Chapter 25: Approximate Dynamic Programming for Optimizing Oil Production
25.1 Introduction
25.2 Petroleum Reservoir Production Optimization Problem
25.3 Review of Dynamic Programming and Approximate Dynamic Programming
25.4 Approximate Dynamic Programming Algorithm for Reservoir Production Optimization
25.5 Simulation Results
25.6 Concluding Remarks
Acknowledgments
References
Chapter 26: A Learning Strategy for Source Tracking in Unstructured Environments
26.1 Introduction
26.2 Reinforcement Learning
26.3 Light-Following Robot
26.4 Simulation Results
26.5 Experimental Results
26.6 Conclusions and Future Work
Acknowledgments
References
Index
IEEE Press Series on Computational Intelligence
IEEE Press
445 Hoes Lane
Piscataway, NJ 08854
IEEE Press Editorial Board 2012
John Anderson, Editor in Chief
Kenneth Moore, Director of IEEE Book and Information Services (BIS)
Cover Illustration: Courtesy of Frank L. Lewis and Derong Liu
Cover Design: John Wiley & Sons, Inc.
Copyright © 2013 by The Institute of Electrical and Electronics Engineers, Inc.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. All rights reserved
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Reinforcement learning and approximate dynamic programming for feedback control / edited by Frank L. Lewis, Derong Liu.
p. cm.
ISBN 978-1-118-10420-0 (hardback)
1. Reinforcement learning. 2. Feedback control systems. I. Lewis, Frank L.
II. Liu, Derong, 1963-
Q325.6.R464 2012
003′.5—dc23
2012019014
Preface
Modern day society relies on the operation of complex systems including aircraft, automobiles, electric power systems, economic entities, business organizations, banking and finance systems, computer networks, manufacturing systems, and industrial processes. Decision and control are responsible for ensuring that these systems perform properly and meet prescribed performance objectives. The safe, reliable, and efficient control of these systems is essential for our society. Therefore, automatic decision and control systems are ubiquitous in human engineered systems and have had an enormous impact on our lives. As modern systems become more complex and performance requirements more stringent, improved methods of decision and control are required that deliver guaranteed performance and the satisfaction of prescribed goals.
Feedback control works on the principle of observing the actual outputs of a system, comparing them to desired trajectories, and computing a control signal based on that error, which is used to modify the performance of the system to make the actual output follow the desired trajectory. The optimization of sequential decisions or controls that are repeated over time arises in many fields, including artificial intelligence, automatic control systems, power systems, economics, medicine, operations research, resource allocation, collaboration and coalitions, business and finance, and games including chess and backgammon. Optimal control theory provides methods for computing feedback control systems that deliver optimal performance. Optimal controllers optimize user-prescribed performance functions and are normally designed offline by solving Hamilton–Jacobi–Bellman (HJB) design equations. This requires knowledge of the full system dynamics model. However, it is often difficult to determine an accurate dynamical model of practical systems. Moreover, determining optimal control policies for nonlinear systems requires the offline solution of nonlinear HJB equations, which are often difficult or impossible to solve. Dynamic programming (DP) is an algorithmic method for finding optimal solutions in sequential decision problems. DP was developed beginning in the 1950s with the work of Bellman and Pontryagin. DP is fundamentally a backwards-in-time procedure that does not offer methods for solving optimal decision problems in a forward manner in real time.
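To make the backwards-in-time character of DP concrete, consider a standard finite-horizon formulation (the notation here is chosen for illustration and is not taken from any particular chapter of this book). For a discrete-time system x_{k+1} = f(x_k, u_k) with stage cost r(x_k, u_k) and terminal cost φ(x_N), the optimal value functions satisfy the Bellman recursion

\[
V_N(x) = \phi(x), \qquad
V_k(x) = \min_{u}\bigl[\, r(x,u) + V_{k+1}\bigl(f(x,u)\bigr) \bigr],
\qquad k = N-1, N-2, \ldots, 0.
\]

Each V_k can only be computed once V_{k+1} is known, which is why exact DP proceeds offline and backward in time rather than forward along the actual system trajectory.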
The real-time adaptive learning of optimal controllers for complex unknown systems has been solved in nature. Every agent or system is concerned with acting on its environment in such a way as to achieve its goals. Agents seek to learn how to collaborate to improve their chances of survival and propagation. The idea that there is a cause-and-effect relation between actions and rewards is inherent in animal learning. Most organisms in nature act in an optimal fashion to conserve resources while achieving their goals. It is possible to study natural methods of learning and use them to develop computerized machine learning methods that solve sequential decision problems.
Reinforcement learning (RL) describes a family of machine learning systems that operate based on principles used in animals, social groups, and naturally occurring systems. RL methods were used by Ivan Pavlov in the 1890s to train his dogs. RL refers to an actor or agent that interacts with its environment and modifies its actions, or control policies, based on stimuli received in response to its actions. RL computational methods have been developed by the Computational Intelligence Community that solve optimal decision problems in real time and do not require the availability of analytical system models. The RL algorithms are constructed on the idea that successful control decisions should be remembered, by means of a reinforcement signal, such that they become more likely to be used another time. Successful collaborating groups should be reinforced. Although the idea originates from experimental animal learning, it has also been observed that RL has strong support from neurobiology, where it has been noted that the dopamine neurotransmitter in the basal ganglia acts as a reinforcement informational signal, which favors learning at the level of the neurons in the brain. RL techniques were first developed for Markov decision processes having finite state spaces. They have been extended for the control of dynamical systems with infinite state spaces.
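As one concrete instance of this idea for finite-state Markov decision processes (a standard textbook update, not a method singled out by this preface), the Q-learning rule strengthens the estimated value of state–action pairs that led to high reward:

\[
Q(s_k, a_k) \;\leftarrow\; Q(s_k, a_k)
  + \alpha\Bigl[\, r_{k+1} + \gamma \max_{a} Q(s_{k+1}, a) - Q(s_k, a_k) \Bigr],
\]

where α is a learning rate and γ a discount factor; the bracketed temporal-difference term plays the role of the reinforcement signal.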
One class of RL methods is based on the actor–critic structure, where an actor component applies an action or a control policy to the environment, whereas a critic component assesses the value of that action. Actor–critic structures are particularly well adapted for solving optimal decision problems in real time through reinforcement learning techniques. Approximate dynamic programming (ADP) refers to a family of practical actor–critic methods for finding optimal solutions in real time. These techniques use computational enhancements such as function approximation to develop practical algorithms for complex systems with disturbances and uncertain dynamics. Now, the ADP approach has become a key direction for future research in understanding brain intelligence and building intelligent systems.
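The following sketch illustrates the actor–critic structure in its simplest form. It is a minimal example written for this overview, assuming a toy five-state chain problem, a tabular critic, and a softmax actor; it is not an implementation of any specific method described in this book.

```python
# Minimal actor-critic sketch (illustrative only; not a method from this book).
# Assumptions: a toy 5-state chain MDP defined below, a tabular critic, and a
# softmax actor updated with the one-step TD error as the reinforcement signal.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.95
alpha_critic, alpha_actor = 0.1, 0.05

V = np.zeros(n_states)                   # critic: state-value estimates
prefs = np.zeros((n_states, n_actions))  # actor: action preferences (softmax policy)

def step(state, action):
    """Toy dynamics: action 0 moves left, action 1 moves right; reward only at the right end."""
    nxt = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward

for episode in range(2000):
    s = 0
    for t in range(50):
        # Actor: sample an action from the softmax policy for the current state.
        p = np.exp(prefs[s] - prefs[s].max())
        p /= p.sum()
        a = rng.choice(n_actions, p=p)

        s_next, r = step(s, a)

        # Critic: the one-step temporal-difference (TD) error assesses the chosen action.
        delta = r + gamma * V[s_next] - V[s]
        V[s] += alpha_critic * delta

        # Actor: make actions with positive TD error more likely (policy-gradient-style update).
        grad = -p
        grad[a] += 1.0
        prefs[s] += alpha_actor * delta * grad

        s = s_next

print("Learned state values:", np.round(V, 2))
```

The same structure carries over to ADP designs, where the tables are replaced by function approximators such as neural networks and the TD error is derived from a user-prescribed performance function.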
The purpose of this book is to give an exposition of recently developed RL and ADP techniques for decision and control in human engineered systems. Included are both single-player decision and control and multiplayer games. RL is strongly connected, from a theoretical point of view, with both adaptive learning control and optimal control methods. There has been a great deal of interest in RL, and recent work has shown that ideas based on ADP can be used to design a family of adaptive learning algorithms that converge in real time to optimal control solutions by measuring data along the system trajectories. The study of RL and ADP requires methods from many fields, including computational intelligence, automatic control systems, Markov decision processes, stochastic games, psychology, operations research, cybernetics, neural networks, and neurobiology. This book therefore brings together ideas from many communities.
This book has three parts. Part I develops methods for feedback control of systems based on RL and ADP. Part II treats learning and control in multiagent games. Part III presents some ideas of fundamental importance in understanding and implementing decision algorithms in Markov processes.
F.L. Lewis
Derong Liu
Fort Worth, TX
Chicago, IL
Contributors
Eduardo Alonso, School of Informatics, City University, London, UK
Charles W. Anderson, Department of Computer Science, Colorado State University, Fort Collins, CO, USA
Titus Appel, MARHES Lab, Department of Electrical & Computer Engineering, University of New Mexico, Albuquerque, NM, USA
Khalid Aziz, Department of Energy Resources Engineering, Stanford University, Stanford, CA, USA
Robert Babuska, Delft Center for Systems and Control, Delft University of Technology, Delft, The Netherlands
S.N. Balakrishnan, Department of Mechanical and Aerospace Engineering, Missouri University of Science and Technology, Rolla, MO, USA
Tamer Başar, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Dimitri Bertsekas, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
Shubhendu Bhasin, Department of Electrical Engineering, Indian Institute of Technology, Delhi, India
Shalabh Bhatnagar, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
V.S. Borkar, Department of Electrical Engineering, Indian Institute of Technology, Powai, Mumbai, India
Lucian Busoniu, Université de Lorraine, CRAN, UMR 7039 and CNRS, CRAN, UMR 7039, Vandœuvre-lès-Nancy, France
Xi-Ren Cao, Shanghai Jiaotong University, Shanghai, China
W. Chen, Coordinated Science Laboratory, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Vijay Desai, Industrial Engineering and Operations Research, Columbia University, New York, NY, USA
Gianluca Di Muro, Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
Jie Ding, Department of Mechanical and Aerospace Engineering, Missouri University of Science and Technology, Rolla, MO, USA
Warren E. Dixon, Department of Mechanical and Aerospace Engineering, University of Florida, FL, USA
Louis J. Durlofsky, Department of Energy Resources Engineering, Stanford University, Stanford, CA, USA
Krishnamurthy Dvijotham, Computer Science and Engineering, University of Washington, Seattle, WA, USA
Michael Fairbank, School of Informatics, City University, London, UK
Vivek Farias, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA
Silvia Ferrari, Laboratory for Intelligent Systems and Control (LISC), Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
Rafael Fierro, MARHES Lab, Department of Electrical & Computer Engineering, University of New Mexico, Albuquerque, NM, USA
Haibo He, Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA
Ali Heydari, Department of Mechanical and Aerospace Engineering, Missouri University of Science and Technology, Rolla, MO, USA
Dayu Huang, Coordinated Science Laboratory, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
S. Jagannathan, Electrical and Computer Engineering Department, Missouri University of Science and Technology, Rolla, MO, USA
Qing-Shan Jia, Department of Automation, Tsinghua University, Beijing, China
Yu Jiang, Department of Electrical and Computer Engineering, Polytechnic Institute of New York University, Brooklyn, NY, USA
Marcus Johnson, Department of Mechanical and Aerospace Engineering, University of Florida, FL, USA
Zhong-Ping Jiang, Department of Electrical and Computer Engineering, Polytechnic Institute of New York University, Brooklyn, NY, USA
Rushikesh Kamalapurkar, Department of Mechanical and Aerospace Engineering, University of Florida, FL, USA
Kenton Kirkpatrick, Department of Aerospace Engineering, Texas A&M University, College Station, TX, USA
J. Nate Knight, Numerica Corporation, Loveland, CO, USA
F.L. Lewis, UTA Research Institute, University of Texas, Arlington, TX, USA
Derong Liu, State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China
Chao Lu, Department of Electrical Engineering, Tsinghua University, Beijing, P. R. China
Ron Lumia, Department of Mechanical Engineering, University of New Mexico, Albuquerque, NM, USA
P. Mehta, Coordinated Science Laboratory, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Sean Meyn, Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA
Ciamac Moallemi, Graduate School of Business, Columbia University, New York, NY, USA
Remi Munos, SequeL team, INRIA Lille – Nord Europe, France
Zhen Ni, Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA
Warren B. Powell, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA
L.A. Prashanth, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
Danil Prokhorov, Toyota Research Institute North America, Toyota Technical Center, Ann Arbor, MI, USA
Armando A. Rodriguez, School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA
Brandon Rohrer, Sandia National Laboratories, Albuquerque, NM, USA
Keith Rudd, Laboratory for Intelligent Systems and Control (LISC), Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
I.O. Ryzhov, Department of Decision, Operations and Information Technologies, Robert H. Smith School of Business, University of Maryland, College Park, MD, USA
John Seiffertt, Department of Electrical and Computer Engineering, Missouri University of Science & Technology, Rolla, MO, USA
Jennie Si, School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA
A. Surana, United Technologies Research Center, East Hartford, CT, USA
Hamidou Tembine, Telecommunication Department, Supelec, Gif sur Yvette, France
Emanuel Todorov, Applied Mathematics, Computer Science and Engineering, University of Washington, Seattle, WA, USA
Kostas S. Tsakalis, School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA
John Valasek, Department of Aerospace Engineering, Texas A&M University, College Station, TX, USA
K. Vamvoudakis, Center for Control, Dynamical-Systems and Computation, University of California, Santa Barbara, CA, USA
Benjamin Van Roy, Department of Management Science and Engineering and Department of Electrical Engineering, Stanford University, Stanford, CA, USA
Draguna Vrabie, United Technologies Research Center, East Hartford, CT, USA
Ding Wang, State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China
Zheng Wen, Department of Electrical Engineering, Stanford University, Stanford, CA, USA
Paul Werbos, National Science Foundation, Arlington, VA, USA
John Wood, Department of Mechanical Engineering, University of New Mexico, Albuquerque, NM, USA
Don Wunsch, Department of Electrical and Computer Engineering, Missouri University of Science & Technology, Rolla, MO, USA
Lei Yang, College of Information and Control Science and Engineering, Zhejiang University, Hangzhou, China
Qinmin Yang, State Key Laboratory of Industrial Control Technology, Department of Control Science and Engineering, Zhejiang University, Hangzhou, Zhejiang, China
Hassan Zargarzadeh, Embedded Systems and Networking Laboratory, Electrical and Computer Engineering Department, Missouri University of Science and Technology, Rolla, MO, USA
Dongbin Zhao, State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Qianchuan Zhao, Department of Automation, Tsinghua University, Beijing, China
Yanjia Zhao, Department of Automation, Tsinghua University, Beijing, China
Quanyan Zhu, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Part I
Feedback Control Using RL and ADP
Chapter 1
Reinforcement Learning and Approximate Dynamic Programming (RLADP)—Foundations, Common Misconceptions, and the Challenges Ahead
Paul J. Werbos
National Science Foundation (NSF), Arlington, VA, USA
Many new formulations of reinforcement learning and approximate dynamic programming (RLADP) have appeared in recent years, as it has grown in control applications, control theory, operations research, computer science, robotics, and efforts to understand brain intelligence. The chapter reviews the foundations and challenges common to all these areas, in a unified way but with reference to their variations. It highlights cases where experience in one area sheds light on obstacles or common misconceptions in another. Many common beliefs about the limits of RLADP are based on such obstacles and misconceptions, for which solutions already exist. Above all, this chapter pinpoints key opportunities for future research important to the field as a whole and to the larger benefits it offers.
The field of reinforcement learning and approximate dynamic programming (RLADP) has undergone enormous expansion since about 1988 [1], the year of the first NSF workshop on Neural Networks for Control, which evaluated RLADP as one of several important new tools for intelligent control, with or without neural networks. Since then, RLADP has grown enormously in many disciplines of engineering, computer science, and cognitive science, especially in neural networks, control engineering, operations research, robotics, machine learning, and efforts to reverse engineer the higher intelligence of the brain. In 1988, when I began funding this area, many people viewed the area as a small and curious niche within a small niche, but by the year 2006, when the Directorate of Engineering at NSF was reorganized, many program directors said “we all do ADP now.”
Many new tools, serious applications, and stability theorems have appeared, and are still appearing, in ever greater numbers. But at the same time, a wide variety of misconceptions about RLADP have appeared, even within the field itself. The sheer variety of methods and approaches has made it ever more difficult for people to appreciate the underlying unity of the field and of the mathematics, and to take advantage of the best tools and concepts from all parts of the field. At NSF, I have often seen cases where the most advanced and accomplished researchers in the field have become stuck because of fundamental questions or assumptions that were taken care of 30 years before, in a different part of the field. The goal of this chapter is to provide a kind of unified view of the past, present, and future of this field, to address those challenges. I will review many points that, though basic, continue to be obstacles to progress. I will also focus on the larger, long-term research goal of building real-time learning systems which can cope effectively with the degree of system complexity, nonlinearity, random disturbance, computer hardware complexity, and partial observability which even a mouse brain somehow seems to be able to handle [2]. I will also try to clarify issues of notation that have become more and more of a problem as the field grows more diverse. I will try to make this chapter accessible to people across multiple disciplines, but will often make side comments for specialists in different disciplines—as in the next paragraph.
Optimal control, robust control, and adaptive control are often seen as the three main pillars of modern control theory. ADP may be seen as part of optimal control, the part that seeks computationally feasible general methods for the nonlinear stochastic case. It may be seen as a computational tool to find the most accurate possible solutions, subject to computational constraints, to the HJB equation, as required by general nonlinear robust control. It may be formulated as an extension of adaptive control which, because of the implicit “look ahead,” achieves stability under much weaker conditions than the well-known forms of direct and indirect adaptive control. The most impressive practical applications so far have involved highly nonlinear challenges, such as missile interception [3] and continuous production of carbon–carbon thermoplastic parts [4].
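For readers approaching this chapter from outside control theory, the HJB equation referred to here can be written in one standard continuous-time form (the notation is chosen for illustration and is not taken from this chapter). For input-affine dynamics \(\dot{x} = f(x) + g(x)u\) and cost rate \(q(x) + u^{\mathsf T} R u\), the optimal value function V satisfies

\[
0 = \min_{u}\Bigl[\, q(x) + u^{\mathsf T} R u
    + \nabla V(x)^{\mathsf T}\bigl(f(x) + g(x)u\bigr) \Bigr],
\qquad
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\mathsf T} \nabla V(x).
\]

ADP approximates V, and hence the policy u*, with parameterized structures such as neural networks when this equation cannot be solved exactly.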
Continue reading in the full edition!
