Comprehensive guide to learning automata, introducing two variants that accelerate convergence speed and computational update speed

Learning Automata and Their Applications to Intelligent Systems provides a comprehensive guide to learning automata, covering their principles, algorithms, improvement directions, and applications. The text introduces two variants that accelerate convergence speed and computational update speed, respectively; these two examples demonstrate how to design new learning automata for a specific field at the level of algorithm design, so as to exploit the full advantages of learning automata. Because noisy optimization problems arise widely in intelligent systems, the book elaborates on how to employ learning automata to solve them from the perspectives of both algorithm design and application. The existing and most representative applications of learning automata, including classification, clustering, games, knapsack problems, networks, optimization, ranking, and scheduling, are discussed in detail, and future research directions to promote intelligent systems are suggested. Written by two highly qualified academics with significant experience in the field, Learning Automata and Their Applications to Intelligent Systems covers such topics as:

* Mathematical analysis of the behavior of learning automata, along with suitable learning algorithms
* Two application-oriented learning automata: one to discover and track spatiotemporal event patterns, and the other to solve stochastic searching on a line
* Demonstrations of two pioneering variants of Optimal Computing Budget Allocation (OCBA) methods, and how to combine learning automata with ordinal optimization
* How to achieve significantly faster convergence and higher accuracy than classical pursuit schemes via lower computational complexity of updating the state probability

A timely text in a rapidly developing field, Learning Automata and Their Applications to Intelligent Systems is an essential resource for researchers in machine learning, engineering, operations, and management. The book is also highly suitable for graduate-level courses on machine learning, soft computing, reinforcement learning, and stochastic optimization.
Cover
Table of Contents
Title Page
Copyright
About the Authors
Preface
Acknowledgments
A Guide to Reading this Book
Organization of the Book
1 Introduction
1.1 Ranking and Selection in Noisy Optimization
1.2 Learning Automata and Ordinal Optimization
1.3 Exercises
References
2 Learning Automata
2.1 Environment and Automaton
2.2 Fixed Structure Learning Automata
2.3 Variable Structure Learning Automata
2.4 Summary
2.5 Exercises
References
3 Fast Learning Automata
3.1 Last‐position Elimination‐based Learning Automata
3.2 Fast Discretized Pursuit Learning Automata
3.3 Exercises
References
4 Application‐Oriented Learning Automata
4.1 Discovering and Tracking Spatiotemporal Event Patterns
4.2 Stochastic Searching on the Line
4.3 Fast Adaptive Search on the Line in Dual Environments
4.4 Exercises
References
5 Ordinal Optimization
5.1 Optimal Computing‐Budget Allocation
5.2 Optimal Computing‐Budget Allocation for Selection of Best and Worst Designs
5.3 Optimal Computing‐Budget Allocation for Subset Ranking
5.4 Exercises
References
6 Incorporation of Ordinal Optimization into Learning Automata
6.1 Background and Motivation
6.2 Learning Automata with Optimal Computing Budget Allocation
6.3 Proof of Optimality
6.4 Simulation Studies
6.5 Summary
6.6 Exercises
References
7 Noisy Optimization Applications
7.1 Background and Motivation
7.2 Particle Swarm Optimization
7.3 Resampling for Noisy Optimization Problems
7.4 PSO‐Based LA and OCBA
7.5 Simulations Studies
7.6 Summary
7.7 Exercises
References
8 Applications and Future Research Directions of Learning Automata
8.1 Summary of Existing Applications
8.2 Future Research Directions
8.3 Exercises
References
Index
End User License Agreement
Chapter 3
Table 3.1 Accuracy (number of correct convergences/number of experiments)....
Table 3.2 Accuracy (number of correct convergences/number of experiments)....
Table 3.3 The average number of iterations required for convergence (‘Para’ ...
Table 3.4 Notations.
Table 3.5 Reward probability.
Table 3.6 The average number of iterations required for convergence (‘’ den...
Table 3.7 The average computational time of contenders in seconds ( is the...
Table 3.8 The average computational time of three phases of in seconds (
Table 3.9 Increasing accuracy with decreasing step size.
Chapter 4
Table 4.1 Capability under different environments of all SPL algorithms.
Table 4.2 HSSL decision table and SHSSL informative table to choose next se...
Table 4.3 The deceptive table of SHSSL to choose next search interval in di...
Table 4.4 SHSSL and HSSL's mean estimate value under different conditions o...
Table 4.5 Decision table of original ASS as well as SASSB in informative en...
Table 4.6 The decision table of SASSB in deceptive environment.
Chapter 5
Table 5.1 Numerical evidence on the rate of convergence in Environment 5.1 ...
Table 5.2 of other procedures when OCBAbw reaches 99% in Environment 5.1....
Table 5.3 Numerical evidence on the rate of convergence in Environment 5.2 ...
Table 5.4 of other procedures when OCBAbw reaches 99% in Environment 5.2....
Table 5.5 Numerical evidence on the rate of convergence in Environment 5.3 ...
Table 5.6 Numerical evidence on the rate of convergence in Environment 5.4 ...
Table 5.7 of other procedures when OCBAbw reaches 99% in Environment 5.3....
Table 5.8 Numerical evidence on the rate of convergence in Environment 5.5 ...
Table 5.9 Numerical evidence on the rate of convergence in Environment 5.6 ...
Table 5.10 Numerical evidence on the rate of convergence in Environment 5.7...
Table 5.11 The percentage of the improvement of OCBAbw over the other four ...
Table 5.12 Different numbers of classes in real situations.
Table 5.13 FCS of procedures when that of mp reaches 95%.
Chapter 6
Table 6.1 Reward probability.
Table 6.2 The convergence speed of ...
Table 6.3 The convergence speed of contenders (“” denotes the best paramet...
Table 6.4 Convergence speedup rate of compared with contenders.
Chapter 7
Table 7.1 Parameter values.
Chapter 1
Figure 1.1 3‐D map of a Sphere function with additive and multiplicative noi...
Figure 1.2 3‐D map of a Different Powers function with additive and multipli...
Chapter 2
Figure 2.1 An environment.
Figure 2.2 An automaton.
Figure 2.3 The deterministic state transition graph.
Figure 2.4 The deterministic output graph.
Figure 2.5 The stochastic state transition graph.
Figure 2.6 The stochastic output graph.
Figure 2.7 The state transition graph of Tsetlin LA.
Figure 2.8 The state transition graph of Krinsky LA.
Figure 2.9 The state transition graph of Krylov LA.
Figure 2.10 The state transition graph of IJA LA. (a) represent favorable ...
Figure 2.11 The classification of LA.
Chapter 3
Figure 3.1 Mean of the action probability of the optimal action versus . ...
Figure 3.2 Variance coefficiency ...
Figure 3.3 ...
Figure 3.4 ...
Figure 3.5 ...
Figure 3.6 ...
Figure 3.7 Computational time for each iteration in seconds. (a) ...
Chapter 4
Figure 4.1 State transition map and the output function of an STPLA.
Figure 4.2 State transition map and the output function of an STP‐TFSLA.
Figure 4.3 The state transition principle of ...
Figure 4.4 State transition map and the output function of an AT‐STPLA. Note...
Figure 4.5 The state transition principle of with parameter . R means a...
Figure 4.6 The mean alert probability under different omission error probabi...
Figure 4.7 The mean alert probability under different inclusion error probab...
Figure 4.8 Respective performances of AT‐STPLA and STP‐TFSLA adopting differ...
Figure 4.9 Respective performances of AT‐STPLA and STP‐TFSLA adopting differ...
Figure 4.10 Respective performances of AT‐STPLA and STP‐TFSLA adopting diffe...
Figure 4.11 Respective performances of AT‐STPLA and STP‐TFSLA adopting diffe...
Figure 4.12 Respective performances of AT‐STPLA and STP‐TFSLA adopting diffe...
Figure 4.13 Respective performances of AT‐STPLA and STP‐TFSLA adopting diffe...
Figure 4.14 Original HSSL's search structure is a binary tree. The search sp...
Figure 4.15 Symmetrical HSSL's search structure. The search space when .
Figure 4.16 The learning characteristics under opposite value of ( and )...
Figure 4.17 The learning characteristics under opposite value of ( and )...
Figure 4.18 The learning characteristics under opposite value of ( and )...
Figure 4.19 The learning characteristics under opposite value of ( and )...
Figure 4.20 The learning characteristics under opposite value of ( and )...
Figure 4.21 The learning characteristics under opposite value of ( and )...
Figure 4.22 Eight possibilities in the 3‐bits memory.
Figure 4.23 Search structure of SASSB.
Figure 4.24 Search structure without buffer area.
Figure 4.25 Case of .
Figure 4.26 Case of .
Figure 4.27 Case of ...
Figure 4.28 Case of .
Figure 4.29 Case of .
Figure 4.30 Case of .
Chapter 5
Figure 5.1 All designs except and approximate .
Figure 5.2 All designs except and approximate .
Figure 5.3 versus when five different allocation procedures are used for...
Figure 5.4 versus when five different allocation procedures are used for...
Figure 5.5 versus when five different allocation procedures are used for...
Figure 5.6 versus when five different allocation procedures are used for...
Figure 5.7 versus when five different allocation procedures are used for...
Figure 5.8 versus when five different allocation procedures are used for...
Figure 5.9 versus when five different allocation procedures are used for...
Figure 5.10 FCS versus with different procedures in Environment 5.5.
Figure 5.11 FCS versus with different procedures in Environment 5.6.
Figure 5.12 FCS versus with different procedures in Environment 5.7.
Figure 5.13 FCS versus with different procedures in Environment 5.8.
Figure 5.14 FCS versus with different procedures in Environment 5.9.
Figure 5.15 FCS versus with different procedures in Environment 5.10.
Figure 5.16 FCS versus with different procedures in Environment 5.11.
Figure 5.17 FCS versus with different procedures in Environment 5.12.
Chapter 6
Figure 6.1 LA and illustrated in the framework of the learning model [15, ...
Figure 6.2 Mean of the action probability of the optimal action versus . ...
Figure 6.3 Mean of the versus . (a) , (b) , (c) , (d) , and (e) .
Chapter 7
Figure 7.1 The comparison of fitness values (left axis) of different subset ...
Figure 7.2 The comparison of fitness values (left axis) of different subset ...
Figure 7.3 The comparison of fitness values (left axis) of different subset ...
Figure 7.4 The comparison of fitness values (left axis) of different subset ...
Figure 7.5 The comparison of fitness values (left axis) of different subset ...
Figure 7.6 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO‐...
Figure 7.7 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO‐...
Figure 7.8 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO‐...
Figure 7.9 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO‐...
Figure 7.10 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.11 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.12 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.13 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.14 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.15 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.16 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.17 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.18 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.19 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.20 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.21 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.22 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.23 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.24 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
Figure 7.25 The comparison of fitness values (left axis) of PSO, PSO‐ER, PSO...
IEEE Press
445 Hoes Lane
Piscataway, NJ 08854
IEEE Press Editorial Board
Sarah Spurgeon, Editor in Chief
Jón Atli Benediktsson
Anjan Bose
James Duncan
Amin Moeness
Desineni Subbaram Naidu
Behzad Razavi
Jim Lyke
Hai Li
Brian Johnson
Jeffrey Reed
Diomidis Spinellis
Adam Drobot
Tom Robertazzi
Ahmet Murat Tekalp
JunQi Zhang
Tongji University, Shanghai, China
MengChu Zhou
New Jersey Institute of Technology, New Jersey, USA
Copyright © 2024 by The Institute of Electrical and Electronics Engineers, Inc.
All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750‐8400, fax (978) 750‐4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at http://www.wiley.com/go/permission.
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762‐2974, outside the United States at (317) 572‐3993 or fax (317) 572‐4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging‐in‐Publication Data Applied for:
Hardback ISBN: 9781394188499
Cover Design: Wiley
Cover Image: © Vertigo3d/Getty Images
JunQi Zhang received the PhD degree in computing science from Fudan University, Shanghai, China, in 2007. He became a post‐doctoral research fellow and a lecturer in the Key Laboratory of Machine Perception, Ministry of Education in Computer Science, Peking University, Beijing, China, in 2007. He is currently a full professor in the Department of Computer Science and Technology, Tongji University, Shanghai. His current research interests include intelligent and learning automata, reinforcement learning, particle swarm optimization, fireworks algorithms, big data and high‐dimensional indexing, and multimedia data management. He has authored 20+ IEEE Transactions papers and 60+ conference papers in the above areas. Prof. Zhang was a recipient of the Outstanding Post‐Doctoral Award from Peking University.
MengChu Zhou received his BS degree in control engineering from the Nanjing University of Science and Technology, China, in 1983; MS degree in automatic control from the Beijing Institute of Technology, China, in 1986; and PhD degree in computer and systems engineering from Rensselaer Polytechnic Institute, USA, in 1990. He then joined the New Jersey Institute of Technology in 1990 and has been a distinguished professor since 2013. His interests are in intelligent automation, robotics, Petri nets, and AI. He has over 1100 publications including 14 books, 600+ IEEE Transactions papers, and 31 patents. He is a recipient of Humboldt Research Award for US Senior Scientists from Alexander von Humboldt Foundation; Franklin V. Taylor Memorial Award and Norbert Wiener Award from IEEE Systems, Man, and Cybernetics Society; and Edison Patent Award from the Research & Development Council of New Jersey. He is a fellow of IEEE, IFAC, AAAS, CAA, and NAI.
Stochastic ranking and selection aim to design statistical procedures that select a candidate with the highest mean performance from a finite set of candidates whose performance is uncertain but may be estimated by a learning process based on interactions with a stochastic environment or by simulations. The number of interactions with environments or simulations of candidates is usually limited and needs to be minimized due to limited computational resources. Traditional approaches taken in the literature include frequentist statistics, Bayesian statistics, heuristics, and asymptotic convergence in probability. Novel and recent approaches to stochastic ranking and selection problems are learning automata and ordinal optimization.
Surprisingly, no existing book introduces or studies learning automata and ordinal optimization together, as if these two techniques had no relevance to each other. A learning automaton is a powerful tool for reinforcement learning and aims at learning the optimal action, i.e., the one that maximizes the probability of being rewarded, out of a set of allowable actions through interaction with a stochastic environment. The update scheme of its state probability vector over actions is critical for learning automata. This action probability vector plays two roles: (i) deciding when the automaton converges, which largely determines the total computing budget it consumes, and (ii) allocating the computing budget among actions to identify the optimal one, for which only ordinal comparison is required. Ordinal optimization has emerged as an efficient technique for simulation optimization. Its underlying philosophy is to obtain a good estimate of the optimal action or design, while the accuracy of that estimate need not be high. In many practical situations, selecting an outstanding action or design is much faster if our goal is to find the best one rather than an accurate performance value of the best one. Therefore, learning automata and ordinal optimization share a common objective, and the latter can provide efficient methods to serve as update schemes of the state probability vector of actions in learning automata.
This book introduces and combines learning automata and ordinal optimization to solve stochastic ranking and selection problems for the first time. It may serve as a reference for those in the field and as a guide for those new to it in understanding and applying the main reinforcement learning and intelligent optimization approaches to the problems of their interest. It is both a research monograph for researchers and practitioners, and a main reference book for first‐year graduate courses in the fields of business, engineering, management science, operations management, stochastic control, economics, and computer science. Only basic knowledge of probability and an undergraduate background in the above‐mentioned majors are needed to understand this book's materials.
JunQi Zhang
Shanghai, China
MengChu Zhou
New Jersey, USA and Hangzhou, China
From the first author of this book:
I would like to thank all the people who have contributed to this book and the research team at Tongji University for their full dedication and quality research. In particular, I would like to acknowledge the following individuals.
First, I would like to express my great appreciation to this book's co‐author, Professor MengChu Zhou, for his inspirational advice and insightful suggestions to help strengthen the visions and concepts of this book.
I would like to thank my students Drs. Huan Liu and DuanWei Wu, Mr. Peng Zu, Mr. YeHao Lu, and Mr. YunZhe Wu for their significant help with content and material preparations, as well as the research outcomes.
I would like to thank Wiley‐IEEE Press for providing the opportunity to publish this book, and the esteemed editor and anonymous reviewers for reviewing our work. Special thanks go to Jayashree Saishankar, Managing Editor at Wiley‐IEEE, and Victoria Bradshaw, Senior Editorial Assistant, who kindly and patiently helped us move smoothly through the writing and preparation of this book.
I would like to acknowledge the support from the Innovation Program of Shanghai Municipal Education Commission (202101070007E00098), the Shanghai Industrial Collaborative Science and Technology Innovation Project (2021‐cyxt2‐kj10), the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100), the Fundamental Research Funds for the Central Universities, the National Natural Science Foundation of China (51775385, 61703279, 62073244, 61876218), and the Shanghai Innovation Action Plan under Grant No. 20511100500.
Finally, I truly appreciate the continuous support and endless love from my family, especially from my wife and two children, who always stay with me and make my life colorful and happy.
From the second author of this book:
Numerous collaborations have been behind this book and its related work. It would be impossible to reach this status without the following collaborators, some of whom are already mentioned in the first author's message.
I would like to thank Professors Naiqi Wu (Fellow of IEEE, Macau Institute of Systems Engineering, Macau University of Science and Technology, China), Zhiwu Li (Fellow of IEEE, Macau Institute of Systems Engineering, Macau University of Science and Technology, China), Maria Pia Fanti (Fellow of IEEE, Dipartimento di Elettrotecnica ed Elettronica, Polytechnic of Bari, Italy), Giancarlo Fortino (Fellow of IEEE, Department of Computer Science, Modeling, Electronics and Systems Engineering (DIMES), University of Calabria, Italy), and Keyi Xing (Systems Engineering Institute, Xi'an Jiaotong University, China).
I have enjoyed the great support and love from my family for long. It would be impossible to accomplish this book and many other achievements without their support and love. The work presented in this book was in part supported by FDCT (Fundo para o Desenvolvimento das Ciencias e da Tecnologia) under Grant No. 0047/2021/A1, and Lam Research Corporation through its Unlock Ideas program.
This book is written for senior students, graduate students, researchers, and practitioners in relevant machine learning fields. It can serve as a reference for those studying computer science/engineering, computational intelligence, machine learning, artificial intelligence, systems engineering, industrial engineering, and any field that deals with the optimal design and operation of complex systems subject to noisy disturbances, helping them apply learning automata to their problem solving. Readers are assumed to have a background in computer programming and discrete mathematics, including set theory and predicate logic.
Readers with knowledge of formalization, optimization, and machine learning should find it easy to understand our presented algorithms and problem formalizations. Readers may choose to learn the ideas and concepts of learning automata, or focus on the novel ideas and related algorithms that are useful for their own research and particular applications. Those not interested in ordinal optimization may skip it and focus on learning automata. The following chart should help readers decide what to read based on what they wish to learn from this book.
To give an overview of the contents: this book first introduces the basic concepts of learning automata in Chapter 2. Two pioneering and improved variants of learning automata, addressing convergence and computational cost, respectively, are presented in Chapter 3. Two application‐oriented learning automata are given in Chapter 4: one discovers and tracks spatiotemporal event patterns, and the other solves the problem of stochastic search on a line. Ordinal optimization is another method for solving stochastic ranking and selection problems and is introduced in Chapter 5 through demonstrations of two pioneering variants of optimal computing budget allocation (OCBA). Chapter 6 incorporates ordinal optimization into learning automata to further improve their convergence performance: the role of ordinal optimization is separated from that of the action probability vector in learning automata, and then OCBA, as a pioneering ordinal optimization method, is introduced into learning automata to allocate the computing budget to actions in a way that maximizes the probability of selecting the true optimal action. The differences and relationships between learning automata and ordinal optimization can be well understood through this chapter. Chapter 7 presents an example showing how both learning automata and OCBA can be used in noisy optimization. Finally, Chapter 8 summarizes the existing applications of learning automata and suggests future research directions.
It is an expensive process to rank and select the best one from many complex discrete event dynamic systems that are computationally intensive to simulate. Therefore, learning the optimal system, action, alternative, candidate, or design is a classical problem and has many applications in the areas of intelligent system design, statistics and stochastic simulation, machine learning, and artificial intelligence.
Stochastic ranking and selection of given system designs can be treated as a simulation optimization problem. Its solution requires one to design statistical procedures that can select the one with the highest mean performance from a finite set of systems, alternatives, candidates, or designs. Their mean values are unknown and can be estimated only by statistical sampling, because their reward from environments is not deterministic but stochastic due to unknown noise. It is a classical problem in the areas of statistics and stochastic simulation.
Example 1.1 Consider finding the most effective drug or treatment from many alternatives, where each sample for testing a drug's effectiveness is economically expensive and risky. Another representative example is an archery competition: how can we select the true champion while using the fewest arrows?
Noise comes mainly from three kinds of uncertainties [1]. (i) Environmental uncertainties: operating temperature, pressure, humidity, changing material properties, drift, etc., are a common kind of uncertainty. (ii) Design parameter uncertainties: the design parameters of a product can only be realized to a certain degree of accuracy because high‐precision machinery is expensive. (iii) Evaluation uncertainties: uncertainty arises in evaluating the system output and the system performance, including measuring errors and all kinds of approximation errors when models are used instead of the real physical objects.
Real‐world optimization problems are often subject to uncertainty, as variables can be affected by imprecise measurements or corrupted by other factors such as communication errors. In either case, uncertainty is an inherent characteristic of many such problems and therefore needs to be considered when tailoring metaheuristics to find good solutions. Noise is a class of uncertainty that corrupts the objective values of solutions at each evaluation [2]. Noise has been shown to significantly deteriorate the performance of metaheuristics such as Particle Swarm Optimization (PSO), Genetic Algorithms (GA), Evolution Strategies (ES), and Differential Evolution (DE).
Example 1.2 Ranking and selection are a critical part of PSO. First, each particle has to compare the fitness of its new position with that of its previous best and retain the better one. Second, the overall best solution found so far has to be determined as the global best solution to lead the swarm's flight and refine solution accuracy. Since PSO has a memory that stores the estimated personal best solution of each particle and the estimated global best solution of the swarm, noise renders this memory inaccurate over iterations, and particles eventually fail to separate good solutions from bad ones, which drives the particle swarm in a wrong direction. Concretely, a noisy fitness function induces noisy fitness evaluations and causes two types of undesirable selection behavior [3]: (i) a superior candidate may be erroneously believed to be inferior, causing it to be eliminated; (ii) an inferior candidate may be erroneously believed to be superior, causing it to survive and reproduce. These behaviors in turn cause the following undesirable effects: (i) the learning rate is reduced; (ii) the system does not retain what it has learnt; (iii) exploitation is limited; and (iv) fitness does not monotonically improve with generation, even with elitism.
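To make these selection errors concrete, the following minimal Python sketch (our illustration, not the book's code; the sphere function and noise level are assumptions) simulates how a stored, noise-corrupted best estimate can cause a truly better position to be rejected:

```python
import random

def true_fitness(x):
    # Sphere function: smaller is better.
    return sum(xi * xi for xi in x)

def noisy_fitness(x, sigma=0.5):
    # Each evaluation is corrupted by additive Gaussian noise.
    return true_fitness(x) + random.gauss(0.0, sigma)

random.seed(1)
# A particle's memory stores the *estimated* personal best. Under noise,
# comparing a truly better new position against that stored estimate can
# go wrong, so the better position may be rejected (or a worse one kept).
best_est = noisy_fitness([1.0, 1.0])        # stored personal-best estimate
rejections = 0
trials = 10_000
for _ in range(trials):
    new_est = noisy_fitness([0.9, 0.9])     # truly better position
    if new_est >= best_est:                 # noisy comparison fails
        rejections += 1
print(f"truly better position rejected in {rejections / trials:.0%} of trials")
```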
Uncertainty has to be taken into account in many real‐world optimization problems. For example, in the engineering field, the signal returned from the real world usually includes a significant amount of noise due to measurement error or various uncertainties. Such noise is usually modeled as sampling noise from a Gaussian distribution [4]. Therefore, the noise can be characterized by its standard deviation $\sigma$ and classified into additive and multiplicative noise. Its impact on a fitness function $f$ can be expressed as:

$f_A(x) = f(x) + N(0, \sigma^2)$,

$f_M(x) = f(x) \times (1 + N(0, \sigma^2))$,

where $f(x)$ represents the real fitness value of solution $x$, and $f_A(x)$ and $f_M(x)$ are the illusory fitness values of solution $x$ in additive and multiplicative noisy environments, respectively. It is worth noting that they are identical in the environment where $f(x) = 1$. Multiplicative noise is clearly much more challenging, since it can bring much larger disturbance to fitness values than additive noise.
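As a minimal sketch of the two noise models above (our illustration; the sphere function and $\sigma = 0.3$ are assumed for demonstration), the following Python wrappers corrupt a fitness function with additive and multiplicative Gaussian noise:

```python
import random

def additive_noise(f, sigma):
    # f_A(x) = f(x) + N(0, sigma^2)
    return lambda x: f(x) + random.gauss(0.0, sigma)

def multiplicative_noise(f, sigma):
    # f_M(x) = f(x) * (1 + N(0, sigma^2))
    return lambda x: f(x) * (1.0 + random.gauss(0.0, sigma))

sphere = lambda x: sum(xi * xi for xi in x)
f_add = additive_noise(sphere, 0.3)
f_mul = multiplicative_noise(sphere, 0.3)

# Additive disturbance has the same scale everywhere; multiplicative
# disturbance grows with |f(x)|, so points with large objective values
# are corrupted far more strongly.
print(f_add([0.1, 0.1]), f_mul([0.1, 0.1]))       # near the optimum
print(f_add([10.0, 10.0]), f_mul([10.0, 10.0]))   # far from the optimum
```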
Example 1.3 The 3‐D maps of functions [5] with additive and multiplicative noise of different $\sigma$ are illustrated in Figs. 1.1 and 1.2. The figures show that the additional challenge of multiplicative noise over additive noise is a larger corruption of the objective values, whose magnitude changes across the search space in proportion to the objective values of the solutions. The impact of multiplicative noise depends on the optimization objective and the range of the objective space. Specifically, on minimization problems whose objective space is only positive, the objective values of better solutions (whose fitness values are small) are less affected by multiplicative noise; conversely, in maximization problems, the objective values of better solutions are more affected [2].
Figure 1.1 3‐D map of a Sphere function with additive and multiplicative noise. (a) True. (b) Corrupted by additive noise with . (c) Corrupted by multiplicative noise with .
Figure 1.2 3‐D map of a Different Powers function with additive and multiplicative noise. (a) True. (b) Corrupted by additive noise with . (c) Corrupted by multiplicative noise with .
Therefore, if the problem is subject to noise, the quality of the obtained solutions deteriorates significantly. The basic resampling method re‐evaluates a candidate solution many times to estimate its fitness. Thus, the efficiency of resampling determines the accuracy of ranking and selection of the elite solutions, which are critical to intelligent optimization methods during their evolution toward the optimal solutions. Yet resampling is costly: given a fixed and limited computational budget of function evaluations or environmental interactions, every re‐evaluation spent on estimation leaves fewer evaluations for the search itself. Intelligent allocation of the resampling budget to candidate solutions can save many evaluations and improve the estimation of solution fitness.
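A minimal sketch of the basic resampling method described above (our illustration; the function names and the one-dimensional noisy objective are assumptions):

```python
import random
import statistics

def resampled_fitness(noisy_f, x, k):
    # Average k independent re-evaluations; the standard error of the
    # estimate shrinks as sigma / sqrt(k).
    return statistics.fmean(noisy_f(x) for _ in range(k))

# Each extra sample spent on estimation is a sample not spent on search:
# with a total budget B and k samples per solution, at most B // k
# candidate solutions can be evaluated at all.
noisy = lambda x: x * x + random.gauss(0.0, 1.0)
print(resampled_fitness(noisy, 2.0, k=1))     # crude estimate of 4.0
print(resampled_fitness(noisy, 2.0, k=100))   # much tighter estimate
```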
Ranking and selection procedures were developed in the 1950s for statistical selection problems such as choosing the best treatment for a medical condition [6]. The problem of selecting the best among a finite set of alternatives needs an efficient learning scheme given a noisy environment and limited simulation budget, where the best is defined with respect to the highest mean performance, and where the performance is uncertain but may be estimated via simulation. Approaches presented in the literature include frequentist statistics, Bayesian statistics, heuristics [7], and asymptotic convergence in probability [6]. This book focuses on learning automata and ordinal optimization methods to solve stochastic ranking and selection problems.
Investigation of the Learning Automaton (LA) began in the erstwhile Soviet Union with the work of Tsetlin [8, 9], and the model was popularized as LA in a survey paper in 1974 [10]. These early models were referred to as deterministic and stochastic automata operating in random environments. Systems built with LA have been successfully employed in many difficult learning situations, and reinforcement learning represents a development closely related to the work on LA over the years. This has also led to the concept of LA being generalized in a number of directions to handle various learning problems.
An LA represents an important tool in the area of reinforcement learning and aims at learning the optimal action, the one that maximizes the probability of being rewarded, out of a set of allowable actions through interaction with a random environment. During a cycle, an automaton chooses an action and then receives a stochastic response from the environment that can be either a reward or a penalty. The action probability vector governing the choice of the next action is then updated based on this response. The ability to learn how to choose the optimal action endows LA with high adaptability to the environment. Various LAs and their applications have been reviewed in survey papers [10, 11] and books [12–15].
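As a minimal sketch of this interaction cycle, the following Python code implements a classical variable-structure LA with the well-known linear reward-inaction (L_RI) update scheme; this is a textbook scheme for illustration, not one of this book's improved variants, and the reward probabilities and learning rate are assumptions:

```python
import random

class LinearRewardInaction:
    # Classical L_RI update: on reward, move probability mass toward the
    # chosen action; on penalty, leave the probability vector unchanged.

    def __init__(self, n_actions, learning_rate=0.05):
        self.p = [1.0 / n_actions] * n_actions   # action probability vector
        self.a = learning_rate

    def choose(self):
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, reward):
        if not reward:                            # inaction on penalty
            return
        for i in range(len(self.p)):
            if i == action:
                self.p[i] += self.a * (1.0 - self.p[i])
            else:
                self.p[i] -= self.a * self.p[i]

# Interaction loop with a P-model environment of reward probabilities d_i.
d = [0.4, 0.8, 0.6]                               # action 1 is optimal
la = LinearRewardInaction(len(d))
for _ in range(5000):
    act = la.choose()
    la.update(act, random.random() < d[act])
print(la.p)   # probability mass concentrates on the optimal action
```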
Ordinal Optimization (OO) was introduced by Ho et al. [16]. There are two basic ideas behind it: (i) estimating the order among solutions is much easier than estimating the absolute objective value of each solution; and (ii) softening the optimization goal and accepting good‐enough solutions leads to an exponential reduction in computational burden. Optimal Computing Budget Allocation (OCBA) [17–22] is a famous approach that uses an average‐case analysis rather than a worst‐case bound on the indifference zone. It attempts to sequentially maximize the probability that the best alternative can be correctly identified after the next stage of sampling. Its procedure tends to require much less sampling effort to achieve the same or better empirical performance for correct selection than procedures that are statistically more conservative.
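A minimal sketch of the asymptotic OCBA allocation rule of Chen et al. [17, 18] (the function name and the normalization into budget fractions are our own choices):

```python
import math

def ocba_fractions(means, stds):
    # Asymptotic OCBA allocation for selecting the design with the
    # largest mean:
    #   N_i / N_j = (stds[i] / delta_i)^2 / (stds[j] / delta_j)^2   (i, j != b)
    #   N_b = stds[b] * sqrt(sum over i != b of (N_i / stds[i])^2)
    # where b is the observed best and delta_i = means[b] - means[i].
    b = max(range(len(means)), key=lambda i: means[i])
    alloc = [0.0] * len(means)
    for i in range(len(means)):
        if i != b:
            delta = means[b] - means[i]
            alloc[i] = (stds[i] / delta) ** 2
    alloc[b] = stds[b] * math.sqrt(sum((alloc[i] / stds[i]) ** 2
                                       for i in range(len(means)) if i != b))
    total = sum(alloc)
    return [a / total for a in alloc]   # fraction of the budget per design

# Designs close to the best (small delta) and noisy designs (large std)
# receive more of the budget; clearly inferior designs receive almost none.
print(ocba_fractions(means=[1.0, 2.0, 3.0, 2.9], stds=[1.0, 1.0, 1.0, 1.0]))
```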
Example 1.4 This example shows the objective of ordinal optimization. A system works with one of 10 components, each of which has its own time‐to‐failure. The objective is to decide which component is the worst so as to minimize steady‐state system unavailability under budget constraints. Suppose we have the budget to perform only 50 component tests, each of which reveals whether the tested component leads to a failure. How to allocate this budget among the 10 components so as to identify the worst one is the objective of ordinal optimization.
The next example shows why softening the optimization goal reduces the computational burden.
Example 1.5 This example shows why softening the optimization goal and accepting good‐enough solutions leads to an exponential reduction in computational burden. A class consists of 50 students, and the objective is to find the tallest one. We do not have to measure every student's exact height: a comparison‐based procedure such as bubble sort can identify the tallest student using only pairwise comparisons. In this way, we do not need any student's accurate height. Furthermore, the order of the students other than the tallest is unimportant and may be inaccurate. An ordinal optimization method can likewise be used to identify the tallest student.
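A minimal sketch of this ordinal idea (our illustration; the names and the comparison oracle are hypothetical): the tallest student can be found with pairwise comparisons alone, never measuring a height:

```python
def tallest(students, taller):
    # n - 1 ordinal queries suffice; no height is ever measured.
    best = students[0]
    for s in students[1:]:
        if taller(s, best):   # ask only: "is s taller than best?"
            best = s
    return best

# Hypothetical oracle: compare two students by standing them back to back.
heights = {"Ann": 172, "Bo": 181, "Cy": 169}   # hidden from the algorithm
print(tallest(list(heights), lambda a, b: heights[a] > heights[b]))  # Bo
```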
What is the objective of stochastic ranking and selection?
What are the difficulties caused by noise in solving optimization problems?
What are the advantages and constraints of resampling in noisy optimization?
What is the objective of a learning automaton?
What are the basic ideas of ordinal optimization?
For additive and multiplicative noise, which one is more difficult to handle and why?
In Example 1.5, how can we find the tallest student without their accurate height data?
1. H.-G. Beyer and B. Sendhoff, “Robust optimization – a comprehensive survey,” Computer Methods in Applied Mechanics and Engineering, vol. 196, no. 33-34, pp. 3190–3218, 2007.
2. J. Rada-Vilela, “Population statistics for particle swarm optimization on problems subject to noise,” Ph.D. dissertation, 2014.
3. A. Di Pietro, L. While, and L. Barone, “Applying evolutionary algorithms to problems with noisy, time-consuming fitness functions,” in Proceedings of Congress on Evolutionary Computation, vol. 2. IEEE, 2004, pp. 1254–1261.
4. Y. Jin and J. Branke, “Evolutionary optimization in uncertain environments – a survey,” IEEE Transactions on Evolutionary Computation, vol. 9, no. 3, pp. 303–317, 2005.
5. J. Liang, B. Qu, and P. N. Suganthan, “Problem definitions and evaluation criteria for the CEC 2013 special session on real-parameter optimization,” Zhengzhou University, Zhengzhou, China and Nanyang Technological University, Singapore, Technical Report, vol. 201212, 2013.
6. M. C. Fu, Handbook of Simulation Optimization. Springer, 2015, vol. 216.
7. Y. Jin, H. Wang, and C. Su, Data-Driven Evolutionary Optimization: Integrating Evolutionary Computation, Machine Learning and Data Science. Cham, Switzerland: Springer, 2021.
8. M. Tsetlin, “On the behavior of finite automata in random media,” Automation and Remote Control, pp. 1210–1219, 1961.
9. M. L. Tsetlin, Automaton Theory and the Modeling of Biological Systems. New York: Academic, 1973.
10. K. S. Narendra and M. A. L. Thathachar, “Learning automata: A survey,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 4, pp. 323–334, 1974.
11. M. A. L. Thathachar and P. S. Sastry, “Varieties of learning automata: An overview,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 32, pp. 711–722, 2002.
12. S. Lakshmivarahan, Learning Algorithms Theory and Applications. New York: Springer-Verlag, 1981.
13. K. S. Narendra and M. A. L. Thathachar, Learning Automata: An Introduction. Englewood Cliffs, NJ: Prentice-Hall, 1989.
14. K. Najim and A. S. Poznyak, Learning Automata: Theory and Applications. New York: Pergamon, 1994.
15. A. S. Poznyak and K. Najim, Learning Automata and Stochastic Optimization. New York: Springer, 1997.
16. Y. C. Ho, R. S. Sreenivas, and P. Vakili, “Ordinal optimization of discrete event dynamic systems,” Discrete Event Dynamic Systems (DEDS), vol. 2, no. 2, pp. 61–88, 1992.
17. H.-C. Chen, C.-H. Chen, and E. Yücesan, “Computing efforts allocation for ordinal optimization and discrete event simulation,” IEEE Transactions on Automatic Control, vol. 45, no. 5, pp. 960–964, 2000.
18. C.-H. Chen, J. Lin, E. Yücesan, and S. E. Chick, “Simulation budget allocation for further enhancing the efficiency of ordinal optimization,” Discrete Event Dynamic Systems: Theory and Applications, vol. 10, pp. 251–270, 2000.
19. Y. Ho, Q. Zhao, and Q. Jia, Ordinal Optimization: Soft Optimization for Hard Problems. New York: Springer, 2007.
20. J. Zhang, Z. Li, C. Wang, D. Zang, and M. Zhou, “Approximate simulation budget allocation for subset ranking,” IEEE Transactions on Control Systems Technology, vol. 25, no. 1, pp. 358–365, 2017.
21. J. Zhang, L. Zhang, C. Wang, and M. Zhou, “Approximately optimal computing budget allocation for selection of the best and worst designs,” IEEE Transactions on Automatic Control, vol. 62, no. 7, pp. 3249–3261, 2017.
22. J. Zhang, C. Wang, D. Zang, and M. Zhou, “Incorporation of optimal computing budget allocation for ordinal optimization into learning automata,” IEEE Transactions on Automation Science and Engineering, vol. 13, no. 2, pp. 1008–1017, 2016.
In order to simulate the biological learning process and achieve automatic machine learning, Tsetlin et al. [1] first proposed a novel mathematical model called a Learning Automaton (LA). An LA optimizes its performance by constantly interacting with a random environment, so that the optimal action can be selected from the set of alternatives in the current environment. The optimal action is defined as the action with the highest environmental reward probability in the current environment. LA is a reinforcement learning algorithm with the advantages of simplicity and ease of implementation, fast stochastic optimization, strong anti‐noise ability, and complete convergence. During its development over several decades, the convergence speed and accuracy of LA algorithms have been greatly improved. In addition, LAs have been applied in many fields, such as graph coloring, stochastic shortest paths, distributed computing, wireless network spectrum allocation, image processing, and pattern recognition.
An LA consists of two parts: an automaton and an environment. The latter, shown in Fig. 2.1, is defined mathematically as a triple $\{\alpha, \beta, c\}$, which can be explained as follows.

$\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ is a set of actions ($r \geq 2$). The action selected at instant $t$ is denoted by $\alpha(t)$.

$\beta$ is the output set of possible environmental responses. The environmental response at instant $t$ is denoted by $\beta(t)$. To simplify our discussions, let $\beta = \{0, 1\}$, where “1” and “0” denote the reward and penalty responses, respectively.

$c = \{c_1, c_2, \ldots, c_r\}$ is the set of penalty probabilities characterizing the environment's responses to the actions.

Since each $c_i$ is a conditional probability, the environment is called a random environment. According to the different values of $\beta$, random environments can be divided into three environmental models, i.e., the P‐model, the Q‐model, and the S‐model.

If $\beta$ is a finite set whose elements are discretely distributed over the unit interval [0, 1], the environmental model is referred to as a Q‐model; if $\beta$ has an infinite number of values whose elements are continuously distributed in the unit interval [0, 1], it is referred to as an S‐model; and an environmental model whose response contains only the output values 0 and 1 is referred to as a P‐model.

Q‐ and S‐models are more closely related to biological learning and have more practical application value. However, research on the P‐model is the basis for studying other, more complex environments. Therefore, the P‐model is usually the most attractive to scholars.

Considering a P‐model, if $\beta(t) \in \{0, 1\}$, then we have $\Pr\{\beta(t) = 0\} + \Pr\{\beta(t) = 1\} = 1$. In this case, the penalty probabilities can be written as $c_i = \Pr\{\beta(t) = 0 \mid \alpha(t) = \alpha_i\}$, $i = 1, \ldots, r$, where each $c_i$ corresponds to the element $\alpha_i$ of $\alpha$. If $c_i$ is not a function of time, the environment is known as a stationary environment; otherwise, it is a non‐stationary environment.
Figure 2.1 An environment.
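A minimal sketch of a stationary P-model environment following the definitions above (our illustration; the class name and the penalty probabilities are assumptions):

```python
import random

class PModelEnvironment:
    # Stationary P-model: responses are binary, and each penalty
    # probability c_i is constant over time.

    def __init__(self, penalty_probs):
        self.c = penalty_probs

    def respond(self, action):
        # Returns 0 (penalty) with probability c_i, else 1 (reward).
        return 0 if random.random() < self.c[action] else 1

env = PModelEnvironment([0.8, 0.1, 0.5])    # action 1 is optimal
print([env.respond(1) for _ in range(10)])  # mostly rewards (1s)
```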
An automaton, shown in Fig. 2.2, can be described by a quintuple $\{\alpha, \beta, \Phi, F, G\}$.

$\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ is the set of outputs or actions of the automaton. The action selected at instant $t$ is denoted by $\alpha(t)$.

$\beta$, as the environmental response, is the input set of the automaton. At instant $t$, it is denoted by $\beta(t)$.