A comprehensive exploration of the control schemes of human-robot interactions.

In Human-Robot Interaction Control Using Reinforcement Learning, an expert team of authors delivers a concise overview of human-robot interaction control schemes and insightful presentations of novel, model-free, and reinforcement learning controllers. The book begins with a brief introduction to state-of-the-art human-robot interaction control and reinforcement learning before moving on to describe the typical environment model. The authors also describe some of the most famous identification techniques for parameter estimation.

Human-Robot Interaction Control Using Reinforcement Learning offers rigorous mathematical treatments and demonstrations that facilitate the understanding of control schemes and algorithms. It also describes stability and convergence analysis of human-robot interaction control and reinforcement learning based control. The authors also discuss advanced and cutting-edge topics, like inverse and velocity kinematics solutions, H2 neural control, and likely upcoming developments in the field of robotics.

Readers will also enjoy:

* A thorough introduction to model-based human-robot interaction control
* Comprehensive explorations of model-free human-robot interaction control and human-in-the-loop control using Euler angles
* Practical discussions of reinforcement learning for robot position and force control, as well as continuous-time reinforcement learning for robot force control
* In-depth examinations of robot control in worst-case uncertainty using reinforcement learning and the control of redundant robots using multi-agent reinforcement learning

Perfect for senior undergraduate and graduate students, academic researchers, and industrial practitioners studying and working in the fields of robotics, learning control systems, neural networks, and computational intelligence, Human-Robot Interaction Control Using Reinforcement Learning is also an indispensable resource for students and professionals studying reinforcement learning.
Page count: 285
Publication year: 2021
Cover
Series Page
Title Page
Copyright
Dedication
Author Biographies
List of Figures
List of Tables
Preface
Part I: Human‐robot Interaction Control
1 Introduction
1.1 Human‐Robot Interaction Control
1.2 Reinforcement Learning for Control
1.3 Structure of the Book
References
2 Environment Model of Human‐Robot Interaction
2.1 Impedance and Admittance
2.2 Impedance Model for Human‐Robot Interaction
2.3 Identification of Human‐Robot Interaction Model
2.4 Conclusions
References
3 Model Based Human‐Robot Interaction Control
3.1 Task Space Impedance/Admittance Control
3.2 Joint Space Impedance Control
3.3 Accuracy and Robustness
3.4 Simulations
3.5 Conclusions
References
4 Model Free Human‐Robot Interaction Control
4.1 Task‐Space Control Using Joint‐Space Dynamics
4.2 Task‐Space Control Using Task‐Space Dynamics
4.3 Joint Space Control
4.4 Simulations
4.5 Experiments
4.6 Conclusions
References
5 Human‐in‐the‐loop Control Using Euler Angles
5.1 Introduction
5.2 Joint‐Space Control
5.3 Task‐Space Control
5.4 Experiments
5.5 Conclusions
References
Part II: Reinforcement Learning for Robot Interaction Control
6 Reinforcement Learning for Robot Position/Force Control
6.1 Introduction
6.2 Position/Force Control Using an Impedance Model
6.3 Reinforcement Learning Based Position/Force Control
6.4 Simulations and Experiments
6.5 Conclusions
References
Note
7 Continuous‐Time Reinforcement Learning for Force Control
7.1 Introduction
7.2 K‐means Clustering for Reinforcement Learning
7.3 Position/Force Control Using Reinforcement Learning
7.4 Experiments
7.5 Conclusions
References
8 Robot Control in Worst‐Case Uncertainty Using Reinforcement Learning
8.1 Introduction
8.2 Robust Control Using Discrete‐Time Reinforcement Learning
8.3 Double Q‐Learning with K‐Nearest Neighbors
8.4 Robust Control Using Continuous‐Time Reinforcement Learning
8.5 Simulations and Experiments: Discrete‐Time Case
8.6 Simulations and Experiments: Continuous‐Time Case
8.7 Conclusions
References
Note
9 Redundant Robots Control Using Multi‐Agent Reinforcement Learning
9.1 Introduction
9.2 Redundant Robot Control
9.3 Multi‐Agent Reinforcement Learning for Redundant Robot Control
9.4 Simulations and experiments
9.5 Conclusions
References
10 Robot H2 Neural Control Using Reinforcement Learning
10.1 Introduction
10.2 Neural Control Using Discrete‐Time Reinforcement Learning
10.3 Neural Control in Continuous Time
10.4 Examples
10.5 Conclusion
References
11 Conclusions
A Robot Kinematics and Dynamics
A.1 Kinematics
A.2 Dynamics
A.3 Examples
References
B Reinforcement Learning for Control
B.1 Markov decision processes
B.2 Value functions
B.3 Iterations
B.4 TD learning
Reference
Index
End User License Agreement
IEEE Press
445 Hoes Lane, Piscataway, NJ 08854
IEEE Press Editorial Board
Ekram Hossain,
Editor in Chief
Jón Atli Benediktsson
Xiaoou Li
Jeffrey Reed
Anjan Bose
Lian Yong
Diomidis Spinellis
David Alan Grier
Andreas Molisch
Sarah Spurgeon
Elya B. Joffe
Saeid Nahavandi
Ahmet Murat Tekalp
Wen Yu
CINVESTAV‐IPN
Adolfo Perrusquía
CINVESTAV‐IPN
IEEE Press Series on Systems Science and Engineering
MengChu Zhou, Series Editor
Copyright © 2022 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750‐8400, fax (978) 750‐4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty
While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762‐2974, outside the United States at (317) 572‐3993 or fax (317) 572‐4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging‐in‐Publication Data applied for
Hardback ISBN: 9781119782742
Cover Design: Wiley
Cover Image: © Westend61/Getty Images
To Wen Yu's daughters, Huijia and Lisa
To Adolfo Perrusquía's parents, Adolfo and Graciela
Wen Yu received the B.S. degree in automatic control from Tsinghua University, Beijing, China, in 1990, and the M.S. and Ph.D. degrees, both in electrical engineering, from Northeastern University, Shenyang, China, in 1992 and 1995, respectively. From 1995 to 1996, he served as a lecturer in the Department of Automatic Control at Northeastern University, Shenyang, China. Since 1996, he has been with CINVESTAV-IPN (National Polytechnic Institute), Mexico City, Mexico, where he is currently a professor with the Departamento de Control Automatico. From 2002 to 2003, he held research positions with the Instituto Mexicano del Petroleo. He was a Senior Visiting Research Fellow with Queen's University Belfast, Belfast, U.K., from 2006 to 2007, and a Visiting Associate Professor with the University of California, Santa Cruz, from 2009 to 2010. He has also held a visiting professorship at Northeastern University, China, since 2006. Dr. Wen Yu serves as an associate editor of IEEE Transactions on Cybernetics, Neurocomputing, and the Journal of Intelligent and Fuzzy Systems. He is a member of the Mexican Academy of Sciences.
Adolfo Perrusquía (Member, IEEE) received the B.Eng. degree in mechatronic engineering from the Interdisciplinary Professional Unit on Engineering and Advanced Technologies of the National Polytechnic Institute (UPIITA-IPN), Mexico, in 2014, and the M.Sc. and Ph.D. degrees, both in automatic control, from the Automatic Control Department at the Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV-IPN), Mexico, in 2016 and 2020, respectively. He is currently a research fellow at Cranfield University. He is a member of the IEEE Computational Intelligence Society. His main research interests include robotics, mechanisms, machine learning, reinforcement learning, nonlinear control, system modeling, and system identification.
Figure 1.1 Classic robot control
Figure 1.2 Model compensation control
Figure 1.3 Position/force control
Figure 1.4 Reinforcement learning for control
Figure 2.1 RLC circuit.
Figure 2.2 Mass‐spring‐damper system.
Figure 2.3 Position control.
Figure 2.4 Force control.
Figure 2.5 Second‐order system for environment and robot.
Figure 2.6 Estimation of damping, stiffness and force.
Figure 3.1 Impedance and admittance control.
Figure 3.2 High stiffness environment in task‐space.
Figure 3.3 High stiffness environment in joint space.
Figure 3.4 Low stiffness environment in task space.
Figure 3.5 Low stiffness environment in joint space.
Figure 4.1 Task‐space control using joint‐space dynamics.
Figure 4.2 Model‐free control in high stiffness environment.
Figure 4.3 Position tracking in high stiffness environment.
Figure 4.4 Model‐free control in low stiffness environment.
Figure 4.5 Position tracking in low stiffness environment.
Figure 4.6 Pan and tilt robot with force sensor.
Figure 4.7 Environment for the pan and tilt robot.
Figure 4.8 Tracking results.
Figure 4.9 Pan and tilt robot tracking control.
Figure 4.10 4‐DOF exoskeleton robot with force/torque sensor.
Figure 4.11 Tracking in joint space.
Figure 4.12 Tracking in task space X.
Figure 4.13 Contact force and trajectory tracking.
Figure 5.1 HITL in joint space.
Figure 5.2 HITL in task space.
Figure 5.3 2‐DOF pan and tilt robot.
Figure 5.4 4‐DOF exoskeleton robot.
Figure 5.5 Control of pan and tilt robot in joint space.
Figure 5.6 Control of pan and tilt robot in task space.
Figure 5.7 Control of 4‐DOF exoskeleton robot in joint space.
Figure 5.8 Torques and forces of 4‐DOF exoskeleton robot.
Figure 5.9 Control of 4‐DOF exoskeleton robot in task space.
Figure 6.1 Position/force control.
Figure 6.2 Robot‐environment interaction
Figure 6.3 Position/force control with ,
Figure 6.4 Position/force control with and
Figure 6.5 Experimental setup
Figure 6.6 Environment estimation
Figure 6.7 Experiment results
Figure 7.1 Hybrid reinforcement learning.
Figure 7.2 Learning curves.
Figure 7.3 Control results.
Figure 7.4 Hybrid RL in unknown environments.
Figure 7.5 Comparisons of different methods.
Figure 7.6 Learning process of RL.
Figure 8.1 Pole position.
Figure 8.2 Mean error of RL methods.
Figure 8.3 2‐DOF planar robot.
Figure 8.4 Position regulation.
Figure 8.5 Control actions of the cart‐pole balancing system after 10 seconds.
Figure 8.6 Pole position.
Figure 8.7 Total cumulative reward curve.
Figure 8.8 Control input.
Figure 8.9 Q‐function learning curves.
Figure 8.10 ISE comparisons.
Figure 8.11 Joint position tracking.
Figure 8.12 Total cumulative reward.
Figure 8.13 Q‐function learning curves.
Figure 9.1 Control methods of redundant robots.
Figure 9.2 One hidden layer feedforward network.
Figure 9.3 RL control scheme.
Figure 9.4 Position tracking of simulations.
Figure 9.5 MARL Learning curve.
Figure 9.6 Total reward curve.
Figure 9.7 Position tracking of experiments.
Figure 10.1 Tracking results.
Figure 10.2 Mean squared error.
Figure 10.3 Learning curves.
Figure 10.4 Convergence of kernel matrices.
Figure 10.5 Tracking results.
Figure 10.6 Mean squared error.
Figure 10.7 Learning curves.
Figure 10.8 Convergence of kernel matrices.
Table 4.1 Model‐free controller gains.
Table 4.2 2‐DOF pan and tilt robot control gains.
Table 4.3 Control gains for the 4‐DOF exoskeleton
Table 5.1 Controller gains for the pan and tilt robot
Table 5.2 Controller gains for the exoskeleton
Table 6.1 Learning parameters
Table 6.2 Controller gains
Table 8.1 Learning parameters DT RL: cart pole system
Table 8.2 Learning parameters DT RL: 2‐DOF robot
Table 8.3 Learning parameters CT RL: cart pole system
Table 8.4 Learning parameters CT RL: 2‐DOF robot
Table 9.1 PID control gains
Table 10.1 Parameters of the neural RL
Table A.1 Denavit‐Hartenberg parameters of the pan and tilt robot
Table A.2 Kinematic parameters of the exoskeleton
Table A.3 Denavit‐Hartenberg parameters of the exoskeleton
Table A.4 2‐DOF pan and tilt robot kinematic and dynamic parameters
Robot control is a topic of interest for the development of control theory and its applications. The main theoretical contributions apply linear and nonlinear methods so that the robot can perform specific tasks. Robot interaction control is a growing topic for research and industrial applications. The main goal of any robot-interaction control scheme is to achieve a desired performance between the robot and the environment with safe and precise movements. The environment can be any material or system exogenous to the robot, e.g., a human. The robot-interaction controller can be designed for position, force, or both.
Recently, reinforcement learning techniques have been applied to optimal and robust control through dynamic programming theory. They do not require the system dynamics and can adapt to internal and external changes.
In 2013, the authors began to study human-robot interaction control with intelligent techniques, such as neural networks and fuzzy systems. In 2016, they turned their attention to solving human-robot interaction with reinforcement learning. After four years of work, they present their results on model-based and model-free impedance and admittance control, in both joint space and task space. Human-in-the-loop control is analyzed. Model-free optimal robot-interaction control and the design of position/force control using reinforcement learning are discussed. Reinforcement learning methods are studied in large discrete-time spaces and in continuous time. For redundant robot control, multi-agent reinforcement learning is used. The convergence properties of the reinforcement learning algorithms are analyzed. The robust human-robot interaction control under worst-case uncertainty is reformulated as an optimal control problem, and the optimal controller is designed and realized by reinforcement learning and neural networks.
We assume the readers are familiar with some applications of robot interaction control using classical and advanced controllers. We further develop the systematic analysis of system identification and of model-based and model-free robot interaction controllers. The book is aimed at graduate students as well as practicing engineers. The prerequisites for this book are robot control, nonlinear systems analysis (in particular, the Lyapunov approach), neural networks, optimization techniques, and machine learning. The book is useful for a wide range of researchers and engineers interested in robotics and control.
Many people have contributed to shaping this book. The first author thanks CONACYT for financial support under Grant CONACyT-A1-S-8216 and CINVESTAV under Grant SEP-CINVESTAV-62 and Grant CNR-CINVESTAV; he also thanks his wife, Xiaoou, for her time and dedication. Without her, this book would not have been possible. The second author would like to express his sincere gratitude to his advisor, Prof. Wen Yu, for the continuous support of his Ph.D. study and research and for his patience, motivation, enthusiasm, and immense knowledge. His guidance helped him throughout his research and the writing of this book. He would also like to thank Prof. Alberto Soria, Prof. Rubén Garrido, and Ing. José de Jesús Meza. Last, but not least, the second author thanks his parents, Adolfo and Graciela, for their time and dedication. Without them, this book would not have been possible.
Mexico
Wen Yu, Adolfo Perrusquía
If we know the robot dynamics, we can use them to design model-based controllers (see Figure 1.1). Well-known linear controllers are the proportional-derivative (PD) controller [1], the linear quadratic regulator (LQR), and the proportional-integral-derivative (PID) controller [2]. They rely on linear system theory, so the robot dynamics must be linearized around an operating point. The LQR [3-5] has been used as a basis for the design of reinforcement learning approaches [6].
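Since the LQR is singled out as a basis for later reinforcement learning designs, the short sketch below shows how an LQR gain might be computed for a linearized single-joint model. It is only an illustrative example, not code from the book; the double-integrator model and the weights Q and R are assumptions chosen for the illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Linearized single-joint model (double integrator): state x = [position, velocity]
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

# Illustrative quadratic cost weights
Q = np.diag([10.0, 1.0])
R = np.array([[0.1]])

# Solve the continuous-time algebraic Riccati equation and form the state-feedback gain
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)   # control law u = -K x

print("LQR gain K =", K)
```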
The classic controllers use complete or partial knowledge of the robot's dynamics. In these cases (without considering disturbances), it is possible to design controllers that guarantee perfect tracking performance. By using compensation or pre-compensation techniques, the robot dynamics are canceled and replaced by simpler desired dynamics [7-9]. The control schemes with model compensation or pre-compensation in joint space are shown in Figure 1.2, which indicates the desired reference, the robot's joint position, the joint error, the compensator or pre-compensator of the dynamics, the output of the controller, and the control torque. A typical model-compensation control is the PD controller with gravity compensation, which helps to decrease the steady-state error caused by the gravity terms of the robot dynamics.
When we do not have exact knowledge of the dynamics, these controllers cannot be designed, and we must instead use model-free controllers. Well-known examples are PID control [10, 11], sliding mode control [2, 12], and neural control [13]. These controllers are tuned for a specific plant under certain conditions (disturbances, friction, parameters). When the conditions change, the controllers no longer display the same behavior and may even become unstable. Model-free controllers perform well across different tasks and are relatively easy to tune; however, they cannot guarantee optimal performance and require re-tuning of the control gains when the robot parameters change or a disturbance is applied.
Figure 1.1 Classic robot control
Figure 1.2 Model compensation control
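As a rough illustration of the model-compensation scheme of Figure 1.2, the following sketch implements a PD law with gravity compensation for a hypothetical planar 2-DOF arm; the gravity model, link parameters, and gains are assumptions made for the example, not values from the book.

```python
import numpy as np

def gravity_2dof(q, m1=1.0, m2=1.0, l1=0.5, l2=0.5, g=9.81):
    """Gravity torque vector of an assumed planar 2-DOF arm (centers of mass at mid-link)."""
    g1 = (m1 * l1 / 2 + m2 * l1) * g * np.cos(q[0]) + m2 * l2 / 2 * g * np.cos(q[0] + q[1])
    g2 = m2 * l2 / 2 * g * np.cos(q[0] + q[1])
    return np.array([g1, g2])

def pd_with_gravity_compensation(q, dq, q_des, Kp, Kd):
    """PD control with gravity compensation: tau = Kp*e - Kd*dq + g(q), for a constant reference."""
    e = q_des - q
    return Kp @ e - Kd @ dq + gravity_2dof(q)

# Example usage with diagonal gains
Kp, Kd = np.diag([50.0, 30.0]), np.diag([5.0, 3.0])
tau = pd_with_gravity_compensation(np.array([0.1, -0.2]), np.zeros(2),
                                   np.array([0.5, 0.3]), Kp, Kd)
```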
All of the above controllers are designed for position control and do not consider interaction with the environment. There is a great diversity of work related to interaction, such as stiffness control, force control, hybrid control, and impedance control [14]. Force control regulates the interaction force using P (stiffness control), PD, and PID force controllers [15]. Position control can also be combined with force control to perform position and velocity tracking [16, 17] (see Figure 1.3), which shows the desired force, the contact force, the force error, the output of the force controller, and the position error in task space. The force/position control uses the force for the compensation [17]. It can also use the full dynamics to linearize the closed-loop system for perfect tracking [18].
Figure 1.3 Position/force control
Impedance control [7] addresses the problem of how the robot end-effector should move when it is in contact with the external environment. It uses a desired dynamic model, also known as a mechanical impedance, to design the control. The simplest impedance control is stiffness control, where the robot and the environment interact through a proportional (stiffness) term [19].
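A common second-order form of the desired mechanical impedance, written here only as an illustration (sign conventions and the symbols $M_d$, $B_d$, $K_d$ vary by author, though it is consistent with the second-order impedance model described for Chapter 3), relates the task-space tracking error $\tilde{x} = x - x^d$ to the contact force $f_e$:

$$ M_d\,\ddot{\tilde{x}} + B_d\,\dot{\tilde{x}} + K_d\,\tilde{x} = f_e. $$

Stiffness control corresponds to keeping only the proportional term $K_d\,\tilde{x}$.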
Traditional impedance control linearizes the system by assuming that the robot model is known exactly [20-22]; these algorithms rely on the strong assumption that the exact robot dynamics are available [23]. The robustness of the control lies in the compensation of the model.
Most impedance controllers assume that the desired inertia of the impedance model is equal to the robot inertia. Thus, only the stiffness and damping terms remain, which is equivalent to a PD control law [8, 21, 24]. One way to overcome the inaccuracy of dynamic model compensation is to use adaptive algorithms, neural networks, or other intelligent methods [9, 25-31].
There are several implementations of impedance control. In [32], the impedance control uses human characteristics to obtain the inertia, damping, and stiffness components of the desired impedance. For position control, a PID controller is used, which allows the model compensation to be omitted. Another way to avoid the model, or to proceed without full knowledge of it, is to exploit system characteristics: a high gear-ratio reduction makes the nonlinear terms very small and decouples the system [33].
In mechanical systems, particularly in the haptic field, the admittance is the dynamic mapping from force to motion. The input force "admits" a certain amount of movement [11]. Position control based on impedance or admittance needs the inverse impedance model to obtain the reference position [34-38]. This type of scheme is more complete because it has a double control loop in which the interaction with the environment can be exploited more directly.
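To make the force-to-motion mapping concrete, the following minimal sketch integrates a one-dimensional admittance model to turn a measured contact force into a reference-position offset; the parameters Md, Bd, Kd, the explicit Euler integration, and the constant 5 N force are assumptions chosen only for illustration.

```python
import numpy as np

class AdmittanceFilter:
    """Maps a measured force to a reference-position correction:
       Md*x_dd + Bd*x_d + Kd*x = f  (integrated with explicit Euler)."""
    def __init__(self, Md, Bd, Kd, dt):
        self.Md, self.Bd, self.Kd, self.dt = Md, Bd, Kd, dt
        self.x = 0.0   # position offset "admitted" by the force
        self.xd = 0.0  # its velocity

    def step(self, force):
        xdd = (force - self.Bd * self.xd - self.Kd * self.x) / self.Md
        self.xd += xdd * self.dt
        self.x += self.xd * self.dt
        return self.x

# Example: a constant 5 N push gradually admits motion along one axis
adm = AdmittanceFilter(Md=1.0, Bd=20.0, Kd=100.0, dt=0.001)
offsets = [adm.step(5.0) for _ in range(2000)]
print(offsets[-1])   # approaches f/Kd = 0.05 m
```

In an admittance (position-based) scheme, this offset would be added to the desired trajectory and tracked by an inner position loop, which is the double control loop mentioned above.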
The applications of impedance/admittance control are quite wide; for example, exoskeletons operated by a human. To maintain human safety, low mechanical impedance is required, while tracking control requires high impedance to reject disturbances. Different solutions exist, such as frequency molding and the reduction of mechanical impedance using the poles and zeros of the system [39, 40].
Model-based impedance/admittance control is sensitive to modeling errors. Several modifications of the classical impedance/admittance controllers exist, such as position-based impedance control, which improves robustness in the presence of modeling error by using an internal position control loop [21].
Figure 1.4 shows the control scheme with reinforcement learning. The main difference from the model-free controller in Figure 1.1 is that the reinforcement learning updates its value function at each step using the tracking error and the control torque.
Reinforcement learning schemes were first designed for discrete-time systems with discrete input spaces [6, 41]. Among the best-known methods are Monte Carlo [42], Q-learning [43], Sarsa [44], and critic algorithms [45].
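For reference, here is a minimal tabular Q-learning sketch in the discrete-time, discrete-input setting mentioned above; the state/action discretization, learning rate, and exploration scheme are illustrative assumptions rather than the book's settings.

```python
import numpy as np

n_states, n_actions = 20, 3             # assumed discretization sizes
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # learning rate, discount factor, exploration rate
Q = np.zeros((n_states, n_actions))

def epsilon_greedy(s):
    """Exploration policy used while learning."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def q_learning_update(s, a, r, s_next):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```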
If the input space is large or continuous, the classical reinforcement learning algorithms cannot be implemented directly due to the computational cost, and in most cases the algorithm does not converge to a solution [41, 46]. This problem is known as the curse of dimensionality in machine learning. For robot control, the curse of dimensionality is aggravated because there are several degrees of freedom (DOFs), and each DOF needs its own input space [47, 48]. Disturbances make the dimensionality problem even more acute, because new states and controls must be considered.
To mitigate the curse of dimensionality, model-based techniques can be applied to reinforcement learning [49-51]. These learning methods are very popular; some of them are called "policy search" algorithms [52-59]. However, they require model knowledge to decrease the dimension of the input space.
There is a wide variety of model-free algorithms similar to the discrete-time ones. The main idea behind them is to design adequate rewards and approximators that reduce the computational cost in the presence of a large or continuous input space.
The simplest approximators for reducing the input space are handcrafted methods [60-65]. They speed up learning by searching for regions where the reward is minimized/maximized. The methods of [66, 67] learn from input data, similarly to discrete-time learning algorithms, but the learning time increases. Other techniques are based on previously established actions arranged in a sequential and related way; that is, the actions to be taken at each time instant are defined so that each performs a simple task by itself [68-72]. The main drawback of these methods is that they require expert knowledge to obtain the best regions and to set the predefined actions.
Figure 1.4 Reinforcement learning for control
A linear combination of approximators learns from input data without expert intervention. The approximators most widely used in robot control are inspired by human morphology [73, 74], neural networks [75-77], local models [74, 78], and Gaussian process regression [79-82]. The success of these approximators depends on the adequate choice of their parameters and hyper-parameters.
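As one concrete instance of such a linear combination, the sketch below uses Gaussian radial-basis features with a semi-gradient TD(0) update of the value-function weights; the scalar state, feature centers, widths, and step sizes are assumptions made for the example.

```python
import numpy as np

# Gaussian radial-basis features over a normalized scalar state
centers = np.linspace(0.0, 1.0, 10)
width = 0.1
alpha, gamma = 0.05, 0.95
w = np.zeros_like(centers)   # weights of the linear combination, V(x) = w.T @ phi(x)

def features(x):
    """RBF feature vector phi(x)."""
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

def td0_update(x, r, x_next):
    """Semi-gradient TD(0): w <- w + alpha * (r + gamma*V(x') - V(x)) * phi(x)."""
    global w
    phi = features(x)
    td_error = r + gamma * (w @ features(x_next)) - (w @ phi)
    w += alpha * td_error * phi
```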
A poor reward design can lead to a long learning time, convergence to wrong solutions, or algorithms that never converge to any solution. On the other hand, a proper reward design helps the algorithm find the best solution at each moment in time more quickly. This problem is known as the "curse of reward design" [83].
When model-free methods are used, the reward should be designed so that it adapts to changes in the system and to possible errors. This is extremely useful in robust control problems, where the controller is required to compensate or limit disturbances in order to obtain an optimal performance.
The book consists of two principal parts:
The first part relates to the design of human-robot interaction control in different environments (Chapters 2, 3, 4, and 5).
The second part deals with reinforcement learning for robot interaction control (Chapters 6, 7, 8, 9, and 10).
Part 1
Chapter 2: We address some important concepts for robot interaction control in both a mechanical and an electrical sense. The concepts of impedance and admittance play important roles in the design of robot interaction control and in environment modeling. The typical environment models and some of the most famous identification techniques for parameter estimation of the environment models are introduced.
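As an illustration of the kind of identification technique referred to in this chapter summary, the sketch below runs recursive least squares on an assumed spring-damper contact model f = B*xdot + K*x; the model structure, regressor, forgetting factor, and simulated data are assumptions for the example, not the book's algorithm.

```python
import numpy as np

class RecursiveLeastSquares:
    """RLS estimator for theta in f = phi.T @ theta,
       e.g. a spring-damper environment f = B*xdot + K*x with theta = [B, K]."""
    def __init__(self, n_params, forgetting=0.99):
        self.theta = np.zeros(n_params)
        self.P = 1e3 * np.eye(n_params)   # large initial covariance
        self.lam = forgetting

    def update(self, phi, f_measured):
        P_phi = self.P @ phi
        gain = P_phi / (self.lam + phi @ P_phi)
        error = f_measured - phi @ self.theta
        self.theta += gain * error
        self.P = (self.P - np.outer(gain, P_phi)) / self.lam
        return self.theta

# Example: estimate damping and stiffness from simulated contact data
rls = RecursiveLeastSquares(n_params=2)
B_true, K_true = 15.0, 800.0
for t in np.arange(0.0, 2.0, 0.001):
    x = 0.01 * np.sin(2 * np.pi * t)
    xdot = 0.02 * np.pi * np.cos(2 * np.pi * t)
    f = B_true * xdot + K_true * x + np.random.normal(0.0, 0.01)
    rls.update(np.array([xdot, x]), f)
print(rls.theta)   # should approach [15, 800]
```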
Chapter 3: We discuss our first robot-interaction schemes using impedance and admittance controllers. The classical controllers are based on the design of feedback-linearization control laws. The closed-loop dynamics is reduced to a desired dynamics based on the proposed impedance model, which is designed as a second-order linear system. The precision and robustness problems of classical impedance and admittance control are explained in detail. The applicability of these controllers is illustrated by simulations in two different environments.
Chapter 4: We study some model-free controllers that do not need complete knowledge of the robot dynamics. The model-free controllers are designed for an admittance control scheme. The interaction is governed by the admittance model, while the position controller uses adaptive control, PID control, or sliding mode control. The stability of these controllers is established via Lyapunov theory. The applicability of these algorithms is demonstrated through simulations and experiments with different environments and robots.
Chapter 5: We present a new robot-interaction control scheme known as human-in-the-loop control. Here the environment is the human operator, who has no direct contact with the robot. This method takes the input forces/torques of the human operator and maps them into positions/orientations of the end-effector via the admittance model. Since the human is in the control loop, the operator does not know whether the applied force/torque leads to singular configurations, which is dangerous in real applications. Therefore, the admittance controllers of the previous chapters are modified to avoid the inverse kinematics, and the Jacobian matrix is modified using Euler angles. Experiments illustrate the effectiveness of the approach in both joint and task space.
Part 2
Chapter 6: The previous chapters use the desired impedance/admittance model to achieve the desired robot-environment interaction. In most cases, these interactions do not achieve optimal performance; they exhibit relatively high contact forces or position errors because they require the environment and robot dynamics. This chapter deals with a reinforcement learning approach to position/force control in discrete time. The reinforcement learning techniques can achieve a sub-optimal robot-environment interaction.
