Human-Robot Interaction Control Using Reinforcement Learning

Wen Yu
Description

A comprehensive exploration of the control schemes of human-robot interactions.

In Human-Robot Interaction Control Using Reinforcement Learning, an expert team of authors delivers a concise overview of human-robot interaction control schemes and insightful presentations of novel, model-free and reinforcement learning controllers. The book begins with a brief introduction to state-of-the-art human-robot interaction control and reinforcement learning before moving on to describe the typical environment model. The authors also describe some of the most famous identification techniques for parameter estimation.

Human-Robot Interaction Control Using Reinforcement Learning offers rigorous mathematical treatments and demonstrations that facilitate the understanding of control schemes and algorithms. It also describes stability and convergence analysis of human-robot interaction control and reinforcement learning based control. The authors also discuss advanced and cutting-edge topics, like inverse and velocity kinematics solutions, H2 neural control, and likely upcoming developments in the field of robotics. Readers will also enjoy:

* A thorough introduction to model-based human-robot interaction control
* Comprehensive explorations of model-free human-robot interaction control and human-in-the-loop control using Euler angles
* Practical discussions of reinforcement learning for robot position and force control, as well as continuous time reinforcement learning for robot force control
* In-depth examinations of robot control in worst-case uncertainty using reinforcement learning and the control of redundant robots using multi-agent reinforcement learning

Perfect for senior undergraduate and graduate students, academic researchers, and industrial practitioners studying and working in the fields of robotics, learning control systems, neural networks, and computational intelligence, Human-Robot Interaction Control Using Reinforcement Learning is also an indispensable resource for students and professionals studying reinforcement learning.




Table of Contents

Cover

Series Page

Title Page

Copyright

Dedication

Author Biographies

List of Figures

List of Tables

Preface

Part I: Human‐robot Interaction Control

1 Introduction

1.1 Human‐Robot Interaction Control

1.2 Reinforcement Learning for Control

1.3 Structure of the Book

References

2 Environment Model of Human‐Robot Interaction

2.1 Impedance and Admittance

2.2 Impedance Model for Human‐Robot Interaction

2.3 Identification of Human‐Robot Interaction Model

2.4 Conclusions

References

3 Model Based Human‐Robot Interaction Control

3.1 Task Space Impedance/Admittance Control

3.2 Joint Space Impedance Control

3.3 Accuracy and Robustness

3.4 Simulations

3.5 Conclusions

References

4 Model Free Human‐Robot Interaction Control

4.1 Task‐Space Control Using Joint‐Space Dynamics

4.2 Task‐Space Control Using Task‐Space Dynamics

4.3 Joint Space Control

4.4 Simulations

4.5 Experiments

4.6 Conclusions

References

5 Human‐in‐the‐loop Control Using Euler Angles

5.1 Introduction

5.2 Joint‐Space Control

5.3 Task‐Space Control

5.4 Experiments

5.5 Conclusions

References

Part II: Reinforcement Learning for Robot Interaction Control

6 Reinforcement Learning for Robot Position/Force Control

6.1 Introduction

6.2 Position/Force Control Using an Impedance Model

6.3 Reinforcement Learning Based Position/Force Control

6.4 Simulations and Experiments

6.5 Conclusions

References

Note

7 Continuous‐Time Reinforcement Learning for Force Control

7.1 Introduction

7.2 K‐means Clustering for Reinforcement Learning

7.3 Position/Force Control Using Reinforcement Learning

7.4 Experiments

7.5 Conclusions

References

8 Robot Control in Worst‐Case Uncertainty Using Reinforcement Learning

8.1 Introduction

8.2 Robust Control Using Discrete‐Time Reinforcement Learning

8.3 Double Q‐Learning with k‐Nearest Neighbors

8.4 Robust Control Using Continuous‐Time Reinforcement Learning

8.5 Simulations and Experiments: Discrete‐Time Case

8.6 Simulations and Experiments: Continuous‐Time Case

8.7 Conclusions

References

Note

9 Redundant Robots Control Using Multi‐Agent Reinforcement Learning

9.1 Introduction

9.2 Redundant Robot Control

9.3 Multi‐Agent Reinforcement Learning for Redundant Robot Control

9.4 Simulations and experiments

9.5 Conclusions

References

10 Robot H2 Neural Control Using Reinforcement Learning

10.1 Introduction

10.2 Neural Control Using Discrete‐Time Reinforcement Learning

10.3 Neural Control in Continuous Time

10.4 Examples

10.5 Conclusion

References

11 Conclusions

A Robot Kinematics and Dynamics

A.1 Kinematics

A.2 Dynamics

A.3 Examples

References

B Reinforcement Learning for Control

B.1 Markov decision processes

B.2 Value functions

B.3 Iterations

B.4 TD learning

Reference

Index

End User License Agreement

List of Tables

Chapter 4

Table 4.1 Model‐free controllers gains.

Table 4.2 2‐DOF pan and tilt robot control gains.

Table 4.3 Control gains for the 4‐DOF exoskeleton

Chapter 5

Table 5.1 Controller gains for the pan and tilt robot

Table 5.2 Controller gains for the exoskeleton

Chapter 6

Table 6.1 Learning parameters

Table 6.2 Controllers gains

Chapter 8

Table 8.1 Learning parameters DT RL: cart pole system

Table 8.2 Learning parameters DT RL: 2‐DOF robot

Table 8.3 Learning parameters CT RL: cart pole system

Table 8.4 Learning parameters CT RL: 2‐DOF robot

Chapter 9

Table 9.1 PID control gains

Chapter 10

Table 10.1 Parameters of the neural RL

Appendix A

Table A.1 Denavit‐Hartenberg parameters of the pan and tilt robot

Table A.2 Kinematic parameters of the exoskeleton

Table A.3 Denavit‐Hartenberg parameters of the exoskeleton

Table A.4 2‐DOF pan and tilt robot kinematic and dynamic parameters

List of Illustrations

Chapter 1

Figure 1.1 Classic robot control

Figure 1.2 Model compensation control

Figure 1.3 Position/force control

Figure 1.4 Reinforcement learning for control

Chapter 2

Figure 2.1 RLC circuit.

Figure 2.2 Mass‐spring‐damper system.

Figure 2.3 Position control.

Figure 2.4 Force control.

Figure 2.5 Second‐order system for environment and robot.

Figure 2.6 Estimation of damping, stiffness and force.

Chapter 3

Figure 3.1 Impedance and admittance control.

Figure 3.2 High stiffness environment in task‐space.

Figure 3.3 High stiffness environment in joint space.

Figure 3.4 Low stiffness environment in task space.

Figure 3.5 Low stiffness environment in joint space.

Chapter 4

Figure 4.1 Task‐space control using joint‐space dynamics.

Figure 4.2 Model‐free control in high stiffness environment.

Figure 4.3 Position tracking in high stiffness environment.

Figure 4.4 Model‐free control in low stiffness environment.

Figure 4.5 Position tracking in low stiffness environment.

Figure 4.6 Pan and tilt robot with force sensor.

Figure 4.7 Environment for the pan and tilt robot.

Figure 4.8 Tracking results.

Figure 4.9 Pan and tilt robot tracking control.

Figure 4.10 4‐DOF exoskeleton robot with force/torque sensor.

Figure 4.11 Tracking in joint space.

Figure 4.12 Tracking in task space X.

Figure 4.13 Contact force and trajectory tracking.

Chapter 5

Figure 5.1 HITL in joint space.

Figure 5.2 HITL in task space.

Figure 5.3 2‐DOF pan and tilt robot.

Figure 5.4 4‐DOF exoskeleton robot.

Figure 5.5 Control of pan and tilt robot in joint space.

Figure 5.6 Control of pan and tilt robot in task space.

Figure 5.7 Control of 4‐DOF exoskeleton robot in joint space.

Figure 5.8 Torques and forces of 4‐DOF exoskeleton robot.

Figure 5.9 Control of 4‐DOF exoskeleton robot in task space.

Chapter 6

Figure 6.1 Position/force control.

Figure 6.2 Robot‐environment interaction

Figure 6.3 Position/force control.

Figure 6.4 Position/force control.

Figure 6.5 Experimental setup

Figure 6.6 Environment estimation

Figure 6.7 Experiment results

Chapter 7

Figure 7.1 Hybrid reinforcement learning.

Figure 7.2 Learning curves.

Figure 7.3 Control results.

Figure 7.4 Hybrid RL in unknown environments.

Figure 7.5 Comparisons of different methods.

Figure 7.6 Learning process of RL.

Chapter 8

Figure 8.1 Pole position.

Figure 8.2 Mean error of RL methods.

Figure 8.3 2‐DOF planar robot.

Figure 8.4 Position regulation.

Figure 8.5 Control actions of the cart‐pole balancing system after 10 second...

Figure 8.6 Pole position.

Figure 8.7 Total cumulative reward curve.

Figure 8.8 Control input.

Figure 8.9 Q‐function learning curves.

Figure 8.10 ISE comparisons.

Figure 8.11 Joint position tracking.

Figure 8.12 Total cumulative reward.

Figure 8.13 Q‐function learning curves.

Chapter 9

Figure 9.1 Control methods of redundant robots.

Figure 9.2 One hidden layer feedforward network.

Figure 9.3 RL control scheme.

Figure 9.4 Position tracking of simulations.

Figure 9.5 MARL Learning curve.

Figure 9.6 Total reward curve.

Figure 9.7 Position tracking of experiments.

Chapter 10

Figure 10.1 Tracking results.

Figure 10.2 Mean squared error.

Figure 10.3 Learning curves.

Figure 10.4 Convergence of kernel matrices.

Figure 10.5 Tracking results.

Figure 10.6 Mean squared error.

Figure 10.7 Learning curves.

Figure 10.8 Convergence of kernel matrices.

Appendix A

Figure A.1 4‐DOF exoskeleton robot.

Figure A.2 2‐DOF pan and tilt robot.

Figure A.3 2‐DOF planar robot.

Figure A.4 Cart‐pole system.

Appendix B

Figure B.1 Control system in the form of Markov decision process.



IEEE Press

445 Hoes Lane, Piscataway, NJ 08854

IEEE Press Editorial Board

Ekram Hossain, Editor in Chief

Jón Atli Benediktsson

Xiaoou Li

Jeffrey Reed

Anjan Bose

Lian Yong

Diomidis Spinellis

David Alan Grier

Andreas Molisch

Sarah Spurgeon

Elya B. Joffe

Saeid Nahavandi

Ahmet Murat Tekalp

Human‐Robot Interaction Control Using Reinforcement Learning

 

Wen Yu, CINVESTAV‐IPN

 

Adolfo Perrusquía, CINVESTAV‐IPN

 

 

 

 

IEEE Press Series on Systems Science and Engineering

MengChu Zhou, Series Editor

Copyright © 2022 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750‐8400, fax (978) 750‐4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762‐2974, outside the United States at (317) 572‐3993 or fax (317) 572‐4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging‐in‐Publication Data is applied for. Hardback: 9781119782742

Cover Design: Wiley
Cover Image: © Westend61/Getty Images

 

 

To Wen Yu's daughters, Huijia and Lisa. To Adolfo Perrusquía's parents, Adolfo and Graciela.

Author Biographies

Wen Yu received the B.S. degree in automatic control from Tsinghua University, Beijing, China, in 1990 and the M.S. and Ph.D. degrees, both in electrical engineering, from Northeastern University, Shenyang, China, in 1992 and 1995, respectively. From 1995 to 1996, he served as a lecturer in the Department of Automatic Control at Northeastern University, Shenyang, China. Since 1996, he has been with CINVESTAV‐IPN (National Polytechnic Institute), Mexico City, Mexico, where he is currently a professor with the Departamento de Control Automatico. From 2002 to 2003, he held research positions with the Instituto Mexicano del Petroleo. He was a Senior Visiting Research Fellow with Queen's University Belfast, Belfast, U.K., from 2006 to 2007, and a Visiting Associate Professor with the University of California, Santa Cruz, from 2009 to 2010. He has also held a visiting professorship at Northeastern University in China since 2006. Dr. Wen Yu serves as an associate editor of IEEE Transactions on Cybernetics, Neurocomputing, and the Journal of Intelligent and Fuzzy Systems. He is a member of the Mexican Academy of Sciences.

Adolfo Perrusquía (Member, IEEE) received the B.Eng. degree in mechatronic engineering from the Interdisciplinary Professional Unit on Engineering and Advanced Technologies of the National Polytechnic Institute (UPIITA‐IPN), Mexico, in 2014, and the M.Sc. and Ph.D. degrees, both in automatic control, from the Automatic Control Department at the Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV‐IPN), Mexico, in 2016 and 2020, respectively. He is currently a research fellow at Cranfield University. He is a member of the IEEE Computational Intelligence Society. His main research interests include robotics, mechanisms, machine learning, reinforcement learning, nonlinear control, system modelling, and system identification.

List of Figures

Figure 1.1 Classic robot control

Figure 1.2 Model compensation control

Figure 1.3 Position/force control

Figure 1.4 Reinforcement learning for control

Figure 2.1 RLC circuit.

Figure 2.2 Mass‐spring‐damper system.

Figure 2.3 Position control.

Figure 2.4 Force control.

Figure 2.5 Second‐order system for environment and robot.

Figure 2.6 Estimation of damping, stiffness and force.

Figure 3.1 Impedance and admittance control.

Figure 3.2 High stiffness environment in task‐space.

Figure 3.3 High stiffness environment in joint space.

Figure 3.4 Low stiffness environment in task space.

Figure 3.5 Low stiffness environment in joint space.

Figure 4.1 Task‐space control using joint‐space dynamics.

Figure 4.2 Model‐free control in high stiffness environment.

Figure 4.3 Position tracking in high stiffness environment.

Figure 4.4 Model‐free control in low stiffness environment.

Figure 4.5 Position tracking in low stiffness environment.

Figure 4.6 Pan and tilt robot with force sensor.

Figure 4.7 Environment for the pan and tilt robot.

Figure 4.8 Tracking results.

Figure 4.9 Pan and tilt robot tracking control.

Figure 4.10 4‐DOF exoskeleton robot with force/torque sensor.

Figure 4.11 Tracking in joint space.

Figure 4.12 Tracking in task space X.

Figure 4.13 Contact force and trajectory tracking.

Figure 5.1 HITL in joint space.

Figure 5.2 HITL in task space.

Figure 5.3 2‐DOF pan and tilt robot.

Figure 5.4 4‐DOF exoskeleton robot.

Figure 5.5 Control of pan and tilt robot in joint space.

Figure 5.6 Control of pan and tilt robot in task space.

Figure 5.7 Control of 4‐DOF exoskeleton robot in joint space.

Figure 5.8 Torques and forces of 4‐DOF exoskeleton robot.

Figure 5.9 Control of 4‐DOF exoskeleton robot in task space.

Figure 6.1 Position/force control.

Figure 6.2 Robot‐environment interaction

Figure 6.3 Position/force control.

Figure 6.4 Position/force control.

Figure 6.5 Experimental setup

Figure 6.6 Environment estimation

Figure 6.7 Experiment results

Figure 7.1 Hybrid reinforcement learning.

Figure 7.2 Learning curves.

Figure 7.3 Control results.

Figure 7.4 Hybrid RL in unknown environments.

Figure 7.5 Comparisons of different methods.

Figure 7.6 Learning process of RL.

Figure 8.1 Pole position.

Figure 8.2 Mean error of RL methods.

Figure 8.3 2‐DOF planar robot.

Figure 8.4 Position regulation.

Figure 8.5 Control actions of the cart‐pole balancing system after 10 seconds.

Figure 8.6 Pole position.

Figure 8.7 Total cumulative reward curve.

Figure 8.8 Control input.

Figure 8.9 Q‐function learning curves.

Figure 8.10 ISE comparisons.

Figure 8.11 Joint position tracking.

Figure 8.12 Total cumulative reward.

Figure 8.13 Q‐function learning curves.

Figure 9.1 Control methods of redundant robots.

Figure 9.2 One hidden layer feedforward network.

Figure 9.3 RL control scheme.

Figure 9.4 Position tracking of simulations.

Figure 9.5 MARL Learning curve.

Figure 9.6 Total reward curve.

Figure 9.7 Position tracking of experiments.

Figure 10.1 Tracking results.

Figure 10.2 Mean squared error.

Figure 10.3 Learning curves.

Figure 10.4 Convergence of kernel matrices.

Figure 10.5 Tracking results.

Figure 10.6 Mean squared error.

Figure 10.7 Learning curves.

Figure 10.8 Convergence of kernel matrices.

List of Tables

Table 4.1 Model‐free controllers gains.

Table 4.2 2‐DOF pan and tilt robot control gains.

Table 4.3 Control gains for the 4‐DOF exoskeleton

Table 5.1 Controller gains for the pan and tilt robot

Table 5.2 Controller gains for the exoskeleton

Table 6.1 Learning parameters

Table 6.2 Controllers gains

Table 8.1 Learning parameters DT RL: cart pole system

Table 8.2 Learning parameters DT RL: 2‐DOF robot

Table 8.3 Learning parameters CT RL: cart pole system

Table 8.4 Learning parameters CT RL: 2‐DOF robot

Table 9.1 PID control gains

Table 10.1 Parameters of the neural RL

Table A.1 Denavit‐Hartenberg parameters of the pan and tilt robot

Table A.2 Kinematic parameters of the exoskeleton

Table A.4 2‐DOF pan and tilt robot kinematic and dynamic parameters

Preface

Robot control is a topic of interest for the development of control theory and its applications. The main theoretical contributions use linear and nonlinear methods so that the robot is capable of performing specific tasks. Robot interaction control is a growing topic for research and industrial applications. The main goal of any robot‐interaction control scheme is to achieve a desired performance between the robot and the environment with safe and precise movements. The environment can be any material or system exogenous to the robot, e.g., a human. The robot‐interaction controller can be designed for position, force, or both.

Recently, reinforcement learning techniques have been applied to optimal and robust control through the use of dynamic programming theory. They do not require the system dynamics and can adapt to internal and external changes.

In 2013, the authors started to study human‐robot interaction control with intelligent techniques, such as neural networks and fuzzy systems. In 2016, the authors turned their attention to solving human‐robot interaction control with reinforcement learning. After four years of work, they present their results on model‐based and model‐free impedance and admittance control, in both joint space and task space. Human‐in‐the‐loop control is analyzed. Model‐free optimal robot‐interaction control and the design of position/force control using reinforcement learning are discussed. Reinforcement learning methods are studied in large discrete‐time spaces and in continuous time. Redundant robot control is solved with multi‐agent reinforcement learning. The convergence property of the reinforcement learning is analyzed. The robust human‐robot interaction control under worst‐case uncertainty is transformed into an optimal control problem, and the optimal controller is designed and realized by reinforcement learning and neural networks.

We assume the readers are familiar with some applications of robot interaction control using classical and advanced controllers. We further develop the systematic analysis of system identification and of model‐based and model‐free robot interaction controllers. The book is aimed at graduate students as well as practicing engineers. The prerequisites for this book are robot control, nonlinear systems analysis (in particular the Lyapunov approach), neural networks, optimization techniques, and machine learning. The book is useful for researchers and engineers interested in robotics and control.

Many people have contributed to shaping this book. The first author wants to thank CONACYT for financial support under Grant CONACyT‐A1‐S‐8216 and CINVESTAV under Grant SEP‐CINVESTAV‐62 and Grant CNR‐CINVESTAV; he also thanks his wife, Xiaoou, for her time and dedication. Without her, this book would not have been possible. The second author would like to express his sincere gratitude to his advisor, Prof. Wen Yu, for the continuous support of his Ph.D. study and research and for his patience, motivation, enthusiasm, and immense knowledge. His guidance helped him throughout his research and the writing of this book. He would also like to thank Prof. Alberto Soria, Prof. Rubén Garrido, and Ing. José de Jesús Meza. Last, but not least, the second author thanks his parents, Adolfo and Graciela, for their time and dedication. Without them, this book would not have been possible.

Mexico

Wen Yu, Adolfo Perrusquía

Part I: Human‐robot Interaction Control

 

1 Introduction

1.1 Human‐Robot Interaction Control

If we know the robot dynamics, we can use them to design model‐based controllers (see Figure 1.1). The most common linear controllers are the proportional‐derivative (PD) controller [1], the linear quadratic regulator (LQR), and the proportional‐integral‐derivative (PID) controller [2]. They use linear system theory, so the robot dynamics must be linearized around some operating point. LQR control [3–5] has been used as a basis for the design of reinforcement learning approaches [6].
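For reference, the standard LQR formulation for a linearized robot model (generic notation, not taken from this book) minimizes a quadratic cost over a linear state‐space model:

\[
\dot{x} = Ax + Bu, \qquad J = \int_{0}^{\infty}\left(x^{\top}Qx + u^{\top}Ru\right)dt, \qquad u = -Kx, \quad K = R^{-1}B^{\top}P,
\]

where \(P\) is the solution of the algebraic Riccati equation \(A^{\top}P + PA - PBR^{-1}B^{\top}P + Q = 0\).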

The classic controllers use complete or partial knowledge of the robot's dynamics. In these cases (without considering disturbances), it is possible to design controllers that guarantee perfect tracking performance. By using compensation or pre‐compensation techniques, the robot dynamics are canceled and replaced by simpler desired dynamics [7–9]. The control schemes with model compensation or pre‐compensation in joint space are shown in Figure 1.2, which includes the desired reference, the robot's joint position, the joint error, the compensator (or pre‐compensator) of the dynamics, the control signal from the controller, and the control torque. A typical model‐compensation control is the proportional‐derivative (PD) controller with gravity compensation, which helps to decrease the steady‐state error caused by the gravity terms of the robot dynamics.
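As a minimal sketch of this idea (standard notation assumed here, not taken from the book), the PD controller with gravity compensation for the rigid robot dynamics \(M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) = \tau\) is

\[
\tau = K_{p}\left(q^{d} - q\right) - K_{d}\dot{q} + g(q),
\]

so the gravity term is canceled by the compensator and the remaining closed loop behaves like a simple spring‐damper system around the desired reference \(q^{d}\).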

When we do not have exact knowledge of the dynamics, it is not possible to design the previous controllers, and we need to use model‐free controllers. Some well‐known examples are PID control [10, 11], sliding mode control [2, 12], and neural control [13]. These controllers are tuned for a specific plant under certain conditions (disturbances, friction, parameters). When new conditions arise, the controllers do not display the same behavior and may even become unstable. Model‐free controllers perform well for different tasks and are relatively easy to tune; however, they cannot guarantee optimal performance and require re‐tuning of the control gains when the robot parameters change or a disturbance is applied.

Figure 1.1 Classic robot control

Figure 1.2 Model compensation control

All the above controllers are designed for position control and do not consider interaction with the environment. There is a great diversity of works related to interaction, such as stiffness control, force control, hybrid control, and impedance control [14]. Force control regulates the interaction force using P (stiffness control), PD, and PID force controllers [15]. Position control can also be combined with force control to perform position and velocity tracking [16, 17] (see Figure 1.3), which shows the desired force, the contact force, the force error, the output of the force controller, and the position error in task space. The force/position control uses the force for compensation [17]. It can also use the full dynamics to linearize the closed‐loop system for perfect tracking [18].

Figure 1.3 Position/force control
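As a generic illustration of the structure in Figure 1.3 (the symbols below are illustrative assumptions, not the book's notation), a position/force controller combines a task‐space position loop with a PI force loop and maps both through the Jacobian transpose:

\[
\tau = J^{\top}(q)\left(K_{p}e_{x} + K_{d}\dot{e}_{x}\right) + J^{\top}(q)\left(K_{f}e_{f} + K_{i}\int e_{f}\,dt\right), \qquad e_{x} = x^{d} - x, \quad e_{f} = f^{d} - f_{e},
\]

where \(e_{x}\) is the task‐space position error and \(e_{f}\) is the error between the desired force \(f^{d}\) and the contact force \(f_{e}\).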

Impedance control [7] addresses the problem of how to move the robot end‐effector when it is in contact with the external environment. It uses a desired dynamic model, also known as a mechanical impedance, to design the control. The simplest impedance control is stiffness control, where the stiffness of the robot and the environment interact proportionally [19].
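The mechanical impedance mentioned above is usually specified as a second‐order target dynamics; in generic notation (an assumption for illustration, not necessarily the book's model):

\[
M_{d}\,\ddot{\tilde{x}} + B_{d}\,\dot{\tilde{x}} + K_{d}\,\tilde{x} = -f_{e}, \qquad \tilde{x} = x - x^{d},
\]

where \(M_{d}\), \(B_{d}\), and \(K_{d}\) are the desired inertia, damping, and stiffness matrices and \(f_{e}\) is the contact force; the controller is designed so that the closed loop reproduces this behavior.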

Traditional impedance control linearizes the system by assuming that the robot model is known exactly [20–22]. These algorithms need the strong assumption that the exact robot dynamics are known [23]. The robustness of the control lies in the compensation of the model.

Most impedance controllers assume that the desired inertia of the impedance model is equal to the robot inertia. Thus, only the stiffness and damping terms remain, which is equivalent to a PD control law [8, 21, 24]. One way to overcome the inaccuracy of dynamic model compensation is through the use of adaptive algorithms, neural networks, or other intelligent methods [9, 25–31].

There are several implementations of impedance control. In [32], the impedance control uses human characteristics to obtain the inertia, damping, and stiffness components of the desired impedance. A PID controller is used for the position control, which allows the model compensation to be omitted. Another way to avoid using the model, or to proceed without its full knowledge, is to take advantage of system characteristics, for example, a high gear‐ratio velocity reduction, which makes the nonlinear elements very small and decouples the system [33].

In mechanical systems, particularly in the haptic field, the admittance is the dynamic mapping from force to motion: the input force "admits" a certain amount of movement [11]. Position control based on impedance or admittance needs the inverse impedance model to obtain the reference position [34–38]. This type of scheme is more complete because there is a double control loop in which the interaction with the environment can be used more directly.
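A rough sketch of this force‐to‐motion mapping is given below: a second‐order admittance model is integrated to turn a measured contact force into a position reference for the inner position loop. The one‐degree‐of‐freedom model, its parameters, and the explicit Euler integration are illustrative assumptions, not the book's implementation.

```python
import numpy as np

def admittance_reference(force, x_d, dt=0.001, M=1.0, B=20.0, K=100.0):
    """Integrate M*xdd + B*xd + K*(x - x_d) = f to obtain a position reference.

    force : array of measured contact forces (1-DOF, one sample per step)
    x_d   : nominal desired position (scalar)
    Returns the admittance-modified reference trajectory.
    """
    x, xd = x_d, 0.0
    reference = []
    for f in force:
        xdd = (f - B * xd - K * (x - x_d)) / M   # admittance dynamics
        xd += xdd * dt                           # explicit Euler integration
        x += xd * dt
        reference.append(x)
    return np.array(reference)

# Example: a constant 5 N push shifts the reference by roughly f/K = 0.05 m
ref = admittance_reference(force=5.0 * np.ones(2000), x_d=0.0)
print(ref[-1])
```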

The applications of impedance/admittance control are quite wide, for example, exoskeletons operated by a human. In order to maintain human safety, low mechanical impedance is required, while tracking control requires high impedance to reject disturbances. Different solutions exist, such as frequency molding and the reduction of mechanical impedance using the poles and zeros of the system [39, 40].

Model‐based impedance/admittance control is sensitive to modeling error. There exist several modifications to the classical impedance/admittance controllers, such as the position‐based impedance control, which improves robustness in the presence of modeling error using an internal position control loop [21].

1.2 Reinforcement Learning for Control

Figure 1.4 shows the control scheme with reinforcement learning. The main difference from the model‐free controller in Figure 1.1 is that the reinforcement learning updates its value function at each step using the tracking error and the control torque.

Reinforcement learning schemes were first designed for discrete‐time systems with a discrete input space [6, 41]. Among the most famous methods are Monte Carlo [42], Q‐learning [43], Sarsa [44], and critic algorithms [45].
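To make the discrete‐time, discrete‐action setting concrete, the following is a minimal tabular Q‐learning loop in its generic textbook form (the `env` interface and the learning parameters are assumptions for illustration, not the book's code):

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Minimal tabular Q-learning; `env` must provide reset() and step(a)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration over the discrete action set
            a = np.random.randint(n_actions) if np.random.rand() < epsilon \
                else int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # temporal-difference update toward r + gamma * max_a' Q(s', a')
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q
```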

If the input space is large or continuous, the classical reinforcement learning algorithms cannot be implemented directly due to the computational cost, and in most cases the algorithm would not converge to a solution [41, 46]. This problem is known as the curse of dimensionality in machine learning. For robot control, the curse of dimensionality is worse because there are several degrees of freedom (DOFs), and each DOF needs its own input space [47, 48]. Disturbances make the dimensionality problem even more acute, because new states and controls must be considered.

To alleviate the curse of dimensionality, model‐based techniques can be applied to reinforcement learning [49–51]. These learning methods are very popular; some of these algorithms are called "policy search" [52–59]. However, they require model knowledge to decrease the dimension of the input space.

There is a wide variety of model‐free algorithms similar to the discrete‐time algorithms. The main idea of these algorithms is to design adequate rewards and approximators, which reduce the computational cost in the presence of a large or continuous input space.

The simplest approximators for reducing the input space are handcrafted methods [60–65]. They speed up the learning time by looking for regions where the reward is minimized/maximized. The methods of [66, 67] learn from input data, similarly to discrete‐time learning algorithms, but the learning time increases. Other techniques are based on previously established actions in a sequential and related way; that is, the actions that must be taken at each time instant are defined so that each one performs a simple task by itself [68–72]. The main problem with these methods is that they require expert knowledge to obtain the best regions and to set the predefined actions.

Figure 1.4 Reinforcement learning for control

A linear combination of approximators learns from input data without expert intervention. The most widely used approximators in robot control are inspired by human morphology [73, 74], neural networks [75–77], local models [74, 78], and Gaussian process regression [79–82]. The success of these approximators is due to the adequate choice of their parameters and hyper‐parameters.
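A minimal sketch of such a linear‐in‐the‐parameters approximator is shown below, using Gaussian radial basis features and a semi‐gradient TD(0) update; the feature centers, widths, and learning rates are illustrative assumptions.

```python
import numpy as np

class RBFValueApproximator:
    """Value function V(x) ~ w.T @ phi(x) with Gaussian radial basis features."""

    def __init__(self, centers, width=0.5, alpha=0.05, gamma=0.99):
        self.centers = np.asarray(centers)   # one row per basis function
        self.width, self.alpha, self.gamma = width, alpha, gamma
        self.w = np.zeros(len(self.centers))

    def features(self, x):
        d = np.linalg.norm(self.centers - np.asarray(x), axis=1)
        return np.exp(-(d / self.width) ** 2)

    def value(self, x):
        return float(self.w @ self.features(x))

    def td_update(self, x, reward, x_next):
        # semi-gradient TD(0): move w toward the one-step bootstrapped target
        target = reward + self.gamma * self.value(x_next)
        self.w += self.alpha * (target - self.value(x)) * self.features(x)
```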

A poor reward design can lead to a long learning time, convergence to wrong solutions, or algorithms that never converge to any solution. On the other hand, a properly designed reward helps the algorithm to find the best solution at each moment in time in a faster way. This problem is known as the "curse of the reward design" [83].

When model‐free methods are used, the reward should be designed in such a way that it adapts to changes in the system and possible errors. This is extremely useful in robust control problems, where the controller is required to compensate for disturbances, or limit their effect, to obtain optimal performance.
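A common example of such a reward (a generic choice, not necessarily the one used later in the book) is the quadratic tracking cost

\[
r_{k} = -\left(e_{k}^{\top}Q\,e_{k} + u_{k}^{\top}R\,u_{k}\right),
\]

where \(e_{k}\) is the tracking error, \(u_{k}\) is the control torque, and \(Q\) and \(R\) are weighting matrices; disturbances that increase the error are penalized automatically, without redesigning the reward.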

1.3 Structure of the Book

The book consists of two principal parts:

The first part relates to the design of human‐robot interaction control in different environments (Chapters 2, 3, 4, and 5).

The second part deals with reinforcement learning for robot interaction control (Chapters 6, 7, 8, 9, and 10).

Part 1

Chapter 2: We address some important concepts for robot interaction control in a mechanical and electrical sense. The concepts of impedance and admittance play important roles in the design of robot interaction control and in environment modeling. The typical environment models and some of the most famous identification techniques for parameter estimation of the environment models are introduced.
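A hedged sketch of one such identification step: if the environment is modeled as a linear spring‐damper, \(f_{e} = B_{e}\dot{x} + K_{e}x\), its parameters can be estimated from sampled force and motion data by ordinary least squares. The model form, variable names, and synthetic data below are assumptions for illustration, not the book's method.

```python
import numpy as np

def estimate_environment(x, x_dot, f_e):
    """Least-squares fit of f_e = B_e * x_dot + K_e * x from sampled data."""
    Phi = np.column_stack((x_dot, x))          # regressor matrix
    theta, *_ = np.linalg.lstsq(Phi, f_e, rcond=None)
    B_e, K_e = theta
    return B_e, K_e

# Synthetic check: data generated with B_e = 8, K_e = 300
t = np.linspace(0, 2, 500)
x = 0.01 * np.sin(2 * np.pi * t)
x_dot = np.gradient(x, t)
f_e = 8.0 * x_dot + 300.0 * x
print(estimate_environment(x, x_dot, f_e))    # approximately (8.0, 300.0)
```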

Chapter 3: We discuss our first robot‐interaction schemes using impedance and admittance controllers. The classical controllers are based on the design of feedback‐linearization control laws. The closed‐loop dynamics is reduced to a desired dynamics based on the proposed impedance model, which is designed as a second‐order linear system. Precision and robustness problems are explained in detail for classical impedance and admittance control. The applicability of these controllers is illustrated by simulations in two different environments.

Chapter 4: We study some model‐free controllers that do not need complete knowledge of the robot dynamics. The model‐free controllers are designed for an admittance control scheme. The interaction is controlled by the admittance model, while the position controller uses adaptive control, PID control, or sliding mode control. Stability of these controllers is established via Lyapunov stability theory. The applicability of these algorithms is proven via simulations and experiments using different environments and robots.

Chapter 5: We give a new robot interaction control scheme known as human‐in‐the‐loop control. Here the environment is the human operator, and the human has no contact with the robot. This method uses the input forces/torques of the human operator and maps them into positions/orientations of the end‐effector via the admittance model. Since the human is in the control loop, she does not know whether the applied force/torque leads to singular positions, which is dangerous in real applications. Therefore, the admittance controllers of the previous chapters are modified to avoid the inverse kinematics, and the Jacobian matrix is modified by using the Euler angles. Experiments illustrate the effectiveness of the approach in both joint and task spaces.

Part 2

Chapter 6: The previous chapters use the desired impedance/admittance model to achieve the desired robot‐environment interaction. In most cases, these interactions do not have optimal performance: they have relatively high contact forces or high position errors because they require the environment and robot dynamics. This chapter deals with the reinforcement learning approach to position/force control in discrete time. The reinforcement learning techniques can achieve a sub‐optimal robot‐environment interaction.