A guide to modern optimization applications and techniques in newly emerging areas spanning optimization, data science, machine intelligence, engineering, and computer science.
Optimization Techniques and Applications with Examples introduces the fundamentals of all the commonly used techniques in optimization, encompassing the breadth and diversity of the methods (traditional and new) and algorithms. The author, a noted expert in the field, covers a wide range of topics including mathematical foundations, optimization formulation, optimality conditions, algorithmic complexity, linear programming, convex optimization, and integer programming. In addition, the book discusses artificial neural networks, clustering and classification, constraint-handling, queueing theory, support vector machines, multi-objective optimization, evolutionary computation, nature-inspired algorithms, and many other topics. Designed as a practical resource, all topics are explained in detail with step-by-step examples to show how each method works. The book's exercises test the acquired knowledge, which can then be applied to real problem solving. By taking an informal approach to the subject, the author helps readers rapidly acquire the basic knowledge of optimization, operational research, and applied data mining. This important resource:
* Offers an accessible and state-of-the-art introduction to the main optimization techniques
* Contains both traditional optimization techniques and the most current algorithms and swarm intelligence-based techniques
* Presents a balance of theory, algorithms, and implementation
* Includes more than 100 worked examples with step-by-step explanations
Written for upper undergraduates and graduates in a standard course on optimization, operations research, and data mining, Optimization Techniques and Applications with Examples is a highly accessible guide to understanding the fundamentals of all the commonly used techniques in optimization.
Page count: 480
Publication year: 2018
Cover
Preface
Introduction
Further Reading
Part I: Fundamentals
1 Mathematical Foundations
1.1 Functions and Continuity
1.2 Review of Calculus
1.3 Vectors
1.4 Matrix Algebra
1.5 Eigenvalues and Eigenvectors
1.6 Optimization and Optimality
1.7 General Formulation of Optimization Problems
Exercises
Further Reading
2 Algorithms, Complexity, and Convexity
2.1 What Is an Algorithm?
2.2 Order Notations
2.3 Convergence Rate
2.4 Computational Complexity
2.5 Convexity
2.6 Stochastic Nature in Algorithms
Exercises
Bibliography
Part II: Optimization Techniques and Algorithms
3 Optimization
3.1 Unconstrained Optimization
3.2 Gradient‐Based Methods
3.3 Gradient‐Free Nelder–Mead Method
Exercises
Bibliography
4 Constrained Optimization
4.1 Mathematical Formulation
4.2 Lagrange Multipliers
4.3 Slack Variables
4.4 Generalized Reduced Gradient Method
4.5 KKT Conditions
4.6 Penalty Method
Exercises
Bibliography
5 Optimization Techniques
5.1 BFGS Method
5.2 Trust‐Region Method
5.3 Sequential Quadratic Programming
5.4 Convex Optimization
5.5 Equality Constrained Optimization
5.6 Barrier Functions
5.7 Interior‐Point Methods
5.8 Stochastic and Robust Optimization
Exercises
Bibliography
Part III: Applied Optimization
6 Linear Programming
6.1 Introduction
6.2 Simplex Method
6.3 Worked Example by Simplex Method
6.4 Interior‐Point Method for LP
Exercises
Bibliography
7 Integer Programming
7.1 Integer Linear Programming
7.2 LP Relaxation
7.3 Branch and Bound
7.4 Mixed Integer Programming
7.5 Applications of LP, IP, and MIP
Exercises
Bibliography
8 Regression and Regularization
8.1 Sample Mean and Variance
8.2 Regression Analysis
8.3 Nonlinear Least Squares
8.4 Over‐fitting and Information Criteria
8.5 Regularization and Lasso Method
8.6 Logistic Regression
8.7 Principal Component Analysis
Exercises
Bibliography
9 Machine Learning Algorithms
9.1 Data Mining
9.2 Data Mining for Big Data
9.3 Artificial Neural Networks
9.4 Support Vector Machines
9.5 Deep Learning
Exercises
Bibliography
10 Queueing Theory and Simulation
10.1 Introduction
10.2 Arrival Model
10.3 Service Model
10.4 Basic Queueing Model
10.5 Little’s Law
10.6 Queue Management and Optimization
Exercises
Bibliography
Part IV: Advanced Topics
11 Multiobjective Optimization
11.1 Introduction
11.2 Pareto Front and Pareto Optimality
11.3 Choice and Challenges
11.4 Transformation to Single Objective Optimization
11.5 The ɛ‐Constraint Method
11.6 Evolutionary Approaches
Exercises
Bibliography
12 Constraint‐Handling Techniques
12.1 Introduction and Overview
12.2 Method of Lagrange Multipliers
12.3 Barrier Function Method
12.4 Penalty Method
12.5 Equality Constraints via Tolerance
12.6 Feasibility Criteria
12.7 Stochastic Ranking
12.8 Multiobjective Constraint‐Handling and Ranking
Exercises
Bibliography
Part V: Evolutionary Computation and Nature-Inspired Algorithms
13 Evolutionary Algorithms
13.1 Evolutionary Computation
13.2 Evolutionary Strategy
13.3 Genetic Algorithms
13.4 Simulated Annealing
13.5 Differential Evolution
Exercises
Bibliography
14 Nature‐Inspired Algorithms
14.1 Introduction to SI
14.2 Ant and Bee Algorithms
14.3 Particle Swarm Optimization
14.4 Firefly Algorithm
14.5 Cuckoo Search
14.6 Bat Algorithm
14.7 Flower Pollination Algorithm
14.8 Other Algorithms
Exercises
Bibliography
Appendix A: Notes on Software Packages
A.1 Software Packages
A.2 Matlab Codes in This Book
A.3 Optimization Software
Appendix B: Problem Solutions
Solutions for Chapter 1
Solutions for Chapter 2
Solutions for Chapter 3
Solutions for Chapter 4
Solutions for Chapter 5
Solutions for Chapter 6
Solutions for Chapter 7
Solutions for Chapter 8
Solutions for Chapter 9
Solutions for Chapter 10
Solutions for Chapter 11
Solutions for Chapter 12
Solutions for Chapter 13
Solutions for Chapter 14
Index
End User License Agreement
Xin‐She Yang
Middlesex University London
This edition first published 2018. © 2018 John Wiley & Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Xin‐She Yang to be identified as the author of the material in this work has been asserted in accordance with law.
Registered Office: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office: 111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty: In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Yang, Xin-She, author.
Title: Optimization techniques and applications with examples / Xin-She Yang.
Description: Hoboken, New Jersey : John Wiley & Sons, 2018. | Includes bibliographical references and index.
Identifiers: LCCN 2018019716 (print) | LCCN 2018033280 (ebook) | ISBN 9781119490609 (Adobe PDF) | ISBN 9781119490623 (ePub) | ISBN 9781119490548 (hardcover)
Subjects: LCSH: Mathematical optimization.
Classification: LCC QA402.5 (ebook) | LCC QA402.5 .Y366 2018 (print) | DDC 519.6–dc23
LC record available at https://lccn.loc.gov/2018019716
Cover image: © MirageC/Getty Images
Cover design: Wiley
List of Figures
Figure I.1 Classification of optimization problems
Figure I.2 Classification of optimization algorithms
Figure 1.1 A simple quadratic function
Figure 1.2 Discontinuity of the Heaviside step function at x = 0
Figure 1.3 The gradients of a family of curves f(x) + k (where k is a constant) at any point x are the same
Figure 1.4 Smooth functions (left) and non-smooth (but continuous) functions (right)
Figure 1.5 Expansion and approximations for f(x) near a point x₀
Figure 1.6 Different p-norms (left and right panels)
Figure 1.7 Feasible domain with nonlinear inequality constraints ψ1(x) and ψ2(x) (left) as well as a linear inequality constraint ψ3(x), with an example objective subject to constraints (right)
Figure 1.8 Local optima, weak optima, and global optimality
Figure 2.1 Convexity: (a) non-convex and (b) convex
Figure 2.2 Gradient (left) and subgradients (right) of a function f(x) at x₀
Figure 2.3 Gaussian distributions for different standard deviations
Figure 2.4 Representation of Monte Carlo integration
Figure 3.1 Plot of f(x), where x∗ corresponds to the global minimum
Figure 3.2 Function sin(x)/x has an infinite number of local minima and maxima
Figure 3.3 The concept of a simplex: (a) 1-simplex, (b) 2-simplex, and (c) 3-simplex
Figure 3.4 Simplex manipulations: (a) reflection with fixed volume (area), (b) expansion or contraction along the line of reflection, and (c) reduction
Figure 4.1 Minimization of a function with the two equality constraints
Figure 5.1 The barrier method: (a) log barrier near a boundary and (b) central path
Figure 5.2 Robustness of the optimal solution
Figure 6.1 Schematic representation of linear programming; under the stated constraints, the optimal solution is at B(10, 10)
Figure 7.1 Linear programming
Figure 7.2 Integer programming
Figure 7.3 Integer programming (with an alternative objective)
Figure 7.4 Branch and bound (Example 7.1, Subproblem P3)
Figure 7.5 Branch and bound (Example 7.2, subproblems)
Figure 7.6 Branch and bound tree
Figure 7.7 Branch and bound (Example 7.2, continued)
Figure 7.8 Branch and bound (Example 7.2, alternative branching)
Figure 8.1 Least squares and the best-fit line
Figure 8.2 Best-fit curve for data with 2.5% noise
Figure 8.3 An example of nonlinear least squares
Figure 8.4 Some random data with a trend
Figure 9.1 A simple neuron
Figure 9.2 Schematic representation of a simple feedforward neural network with nᵢ inputs, m hidden nodes, and N outputs
Figure 9.3 Restricted Boltzmann machine with visible and hidden units
Figure 9.4 Hyperplane, maximum margins, and a linear support vector machine
Figure 9.5 Kernel functions by nonlinear transformation: (a) input space and (b) feature space
Figure 10.1 A simple queue representation of M/M/1
Figure 11.1 The non-dominated set, Pareto front, and ideal vectors in a minimization problem with two objectives f1 and f2
Figure 11.2 Three functions reach the global minimum at the same point
Figure 11.3 Weighted sum method for two objectives f1 and f2
Figure 11.4 Finding the Pareto solution with maximum utility in a maximization problem with two objectives
Figure 11.5 Pareto front is the line connecting A(5, 0) and B(0, 5/α); the Pareto solution with maximum utility is at point A
Figure 11.6 Slicing the objective domain in the ε-constraint method
Figure 11.7 True Pareto front and the estimated front when setting f1 as the objective and f2 as the constraint
Figure 11.8 True Pareto front and the estimated front when setting f2 as the objective and f1 as the constraint
Figure 13.1 Diagram of crossover at a random crossover point
Figure 13.2 Schematic representation of mutation at a single site by flipping a randomly selected bit
Figure 13.3 Schematic representation of mutation vectors in differential evolution
Figure 14.1 Schematic representation of the motion of a particle in PSO, moving toward the global best g∗ and the current best x∗ᵢ for each particle i
Figure 14.2 Random walk in 2D with a Gaussian step-size distribution and the path of 100 steps starting at the origin (0, 0) (marked with •)
Figure 14.3 Lévy flights in consecutive 100 steps starting at the origin (0, 0) (marked with •)
List of Tables
Table 7.1 Transport problem with m suppliers and n demands
Table 7.2 Product portfolio of a small factory
Table 7.3 Production scheduling
Table 8.1 Two quantities with measured data
Table 8.2 Goodness of fit in terms of RSS
Table 8.3 AIC as the goodness of fit
Optimization and operations research are part of many university courses because they are important to many disciplines and applications, such as engineering design, business planning, computer science, data mining, machine learning, artificial intelligence, and various industries.
There are many books on the market, but most are highly specialized, targeting audiences in a particular subject such as applied mathematics, business management, computer science, or engineering. Most of these books, though detailed and often more than 500 pages long, focus on the traditional optimization techniques commonly used in operations research. In some sense, courses in the traditional operations research areas have focused too much on these traditional topics, which already have many excellent textbooks. However, optimization techniques have evolved over the last few decades, and some new techniques have started to show their effectiveness and have become an integral part of new mainstream methods. Important traditional topics will still be introduced in this book, but the coverage of some techniques is kept brief to avoid overlapping with other books, and we provide references to the relevant literature where necessary. Therefore, this book tries to address modern topics, as used mostly by researchers working in newly emerging areas spanning optimization, data science, machine intelligence, engineering, and computer science, so that the topics covered are most relevant to current research and syllabus topics in optimization, data mining, machine learning, and operations research.
This book is written at an introductory level, introducing the fundamentals of all the commonly used techniques so that students can see the broadness and diversity of the methods and algorithms and sample the flavor of each method (both traditional and new). The book covers almost all major optimization techniques with many worked examples. Topics include mathematical foundations, optimization formulation, optimality conditions, algorithmic complexity, linear programming, convex optimization, integer programming, artificial neural networks, clustering and classification, regression and least squares, constraint-handling, queueing theory, support vector machines, multiobjective optimization, evolutionary computation, nature-inspired algorithms, and others. Each topic is explained in detail with step-by-step examples to show how each method works, and exercises are provided to test the acquired knowledge and show how it can potentially be applied to real problem solving.
With an informal approach and more than 100 worked examples and exercises, this introductory book is especially suitable for both undergraduates and graduates to rapidly acquire the basic knowledge in optimization, operational research, machine learning, and data mining. This book can also serve as a reference for researchers and professionals in engineering, computer sciences, operations research, data mining, and management science.
Xin‐She Yang
Cambridge and London
March 2018
I would like to thank my students who took my relevant courses and provided some feedback on the contents concerning optimization algorithms and examples. I also thank the participants for their comments at international conferences where I gave tutorials on nature‐inspired algorithms. I would also like to thank the editors, Kathleen Pagliaro, Vishnu Narayanan, Mindy Okura‐Marszycki, and staff at Wiley for their help and professionalism. Last but not least, I thank my family for their support.
Xin‐She Yang
AIC: Akaike information criterion
ANN: Artificial neural network
APSO: Accelerated particle swarm optimization
BA: Bat algorithm
BFGS: Broyden–Fletcher–Goldfarb–Shanno
BFR: Bradley–Fayyad–Reina
BIC: Bayesian information criterion
BPNN: Back propagation neural network
CNN: Convolutional neural network
CPF: Cumulative probability function
CS: Cuckoo search
CURE: Clustering using representatives
DE: Differential evolution
DL: Deep learning
ES: Evolution strategy
FA: Firefly algorithm
FPA: Flower pollination algorithm
GA: Genetic algorithm
IP: Integer programming
KKT: Karush–Kuhn–Tucker
LP: Linear programming
MIP: Mixed integer programming
ML: Machine learning
NSGA: Non-dominated sorting genetic algorithm
NP: Non-deterministic polynomial-time
PCA: Principal component analysis
PDF: Probability density function
PSO: Particle swarm optimization
QP: Quadratic programming
RBM: Restricted Boltzmann machine
SA: Simulated annealing
SGA: Stochastic gradient ascent
SGD: Stochastic gradient descent
SQP: Sequential quadratic programming
SVM: Support vector machine
TSP: Traveling salesman problem
Optimization is everywhere, though it can mean different things from different perspectives. From basic calculus, optimization can simply mean finding the maximum or minimum of a function such as f(x) = x⁴ + x² in the real domain. In this case, we can either use a gradient-based method or simply spot the solution: since x⁴ and x² are always nonnegative and their minimum values are zero, f(x) has a minimum at x = 0. This can be confirmed easily by taking the first derivative of f(x) with respect to x, which gives
f′(x) = 4x³ + 2x = 2x(2x² + 1) = 0,
which has only one solution, x = 0, because x is a real number (so 2x² + 1 > 0). The condition f′(x) = 0 seems to be sufficient to determine the optimal solution in this case. In fact, this function is convex, with only one minimum in the whole real domain.
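As a quick sanity check, the stationarity condition can be reproduced symbolically. This is a minimal Python sketch, not code from the book (whose own listings are in Matlab, per Appendix A); it assumes SymPy is available:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = x**4 + x**2
fprime = sp.diff(f, x)                 # 4*x**3 + 2*x = 2*x*(2*x**2 + 1)
stationary = sp.solveset(sp.Eq(fprime, 0), x, domain=sp.S.Reals)
print(stationary)                      # {0}: the only real stationary point
```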
However, things become more complicated when f(x) is highly nonlinear with multiple optima. For example, if we try to find the maximum value of f(x) = sin(x)/x in the real domain, we can naively use
f′(x) = (x cos x − sin x)/x² = 0, or tan(x) = x,
which has an infinite number of solutions for x ≠ 0. There is no simple formula for these solutions, thus a numerical method has to be used to calculate them. In addition, even with all the effort to find these solutions, care has to be taken because the actual global maximum f = 1 occurs at x = 0. However, this solution can only be found by taking the limit x → 0, and it is not part of the solutions obtained from the above stationarity condition. This highlights the potential difficulty for nonlinear problems with multiple optima or multimodality.
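The nonzero solutions of tan(x) = x can only be found numerically. Here is a minimal sketch with SciPy's bracketing root finder (an illustration, not the book's code); bracketing the better-behaved numerator x cos(x) − sin(x) avoids the poles of tan(x):

```python
import numpy as np
from scipy.optimize import brentq

# Stationary points of f(x) = sin(x)/x satisfy x*cos(x) - sin(x) = 0, i.e. tan(x) = x.
g = lambda x: x * np.cos(x) - np.sin(x)
root = brentq(g, np.pi, 1.5 * np.pi)   # first positive solution, ~4.4934
print(root, np.sin(root) / root)       # f(root) ~ -0.2172, a local minimum
```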
Furthermore, not all functions are smooth. For example, if we try to use f′(x) = 0 to find the minimum of f(x) = |x|,
we will realize that f(x) is not differentiable at x = 0, even though the global minimum occurs at x = 0. In this case, optimization techniques that require the calculation of derivatives will not work.
Problems become more challenging in higher-dimensional spaces. For example, a nonlinear function such as
f(x) = Σᵢ₌₁ⁿ |xᵢ|,
where x = (x₁, …, xₙ) ∈ ℝⁿ, has a minimum at x∗ = (0, …, 0), but this function is not differentiable at x∗.
Therefore, optimization techniques have to be diverse, using gradient information when appropriate and avoiding it when it is not defined or not easily calculated. Though the above nonlinear optimization problems can be challenging to solve, constraints on the search domain and on certain independent variables can make the search domain much more complicated, which can consequently make the problem even harder to solve. In addition, sometimes we have to optimize several objective functions instead of just one, which will in turn make a problem more challenging to solve.
In general, it is possible to write most optimization problems in the general mathematical form:
minimize fᵢ(x), (i = 1, 2, …, M),
subject to ϕⱼ(x) = 0, (j = 1, 2, …, J),
ψₖ(x) ≤ 0, (k = 1, 2, …, K),
where fᵢ(x), ϕⱼ(x), and ψₖ(x) are functions of the design vector
x = (x₁, x₂, …, xₙ)ᵀ.
Here, the components xᵢ of x are called design or decision variables, and they can be real continuous, discrete, or a mixture of the two. The functions fᵢ(x), where i = 1, 2, …, M, are called the objective functions; in the case of M = 1, there is only a single objective, and the problem becomes single-objective optimization. The objective function is sometimes called the cost function, loss function, or energy function in the literature. The space spanned by the decision variables is called the design space or search space ℝⁿ, while the space formed by the objective function values is called the response space or objective landscape. The equalities for ϕⱼ and inequalities for ψₖ are called constraints. It is worth pointing out that we can also write the inequalities the other way, ψₖ(x) ≥ 0, and we can also formulate the objectives as maximization.
If a solution vector x satisfies all the constraints ϕⱼ(x) = 0 and ψₖ(x) ≤ 0, it is called a feasible solution; otherwise, it is infeasible. All the feasible points or vectors form the feasible region in the search space.
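As an illustration of these definitions, a feasibility check can be written directly from the formulation above. This is a hypothetical helper sketched in Python, not code from the book; the tolerance parameter is an assumption needed for floating-point equality:

```python
import numpy as np

def is_feasible(x, equalities=(), inequalities=(), tol=1e-9):
    """Check phi_j(x) = 0 (within tol) and psi_k(x) <= 0 for a candidate x."""
    x = np.asarray(x, dtype=float)
    return (all(abs(phi(x)) <= tol for phi in equalities) and
            all(psi(x) <= tol for psi in inequalities))

# Example: phi(x) = x1 + x2 - 1 = 0 and psi(x) = x1 - 0.8 <= 0.
print(is_feasible([0.5, 0.5],
                  equalities=[lambda x: x[0] + x[1] - 1.0],
                  inequalities=[lambda x: x[0] - 0.8]))   # True
```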
In a rare but extreme case, there is no objective at all and there are only constraints. Such a problem is called a feasibility problem, and any feasible solution is an optimal solution to such a problem.
If all the problem functions are linear, the problem becomes a linear programming (LP) problem. Here, the term "programming" means planning; it has nothing to do with computer programming in this context. Thus, mathematical optimization is also called mathematical programming. In addition, if the variables in LP only take integer values, the LP becomes an integer LP or simply integer programming (IP). If the variables can only take two values (0 or 1), such an IP is called binary IP.
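For a concrete flavor of LP in code, here is a toy instance (invented for illustration, not taken from the book) solved with SciPy's linprog; the 'highs' method assumes a recent SciPy release:

```python
from scipy.optimize import linprog

# Toy LP: maximize 3x + 2y subject to x + y <= 10, x <= 6, x >= 0, y >= 0.
# linprog minimizes, so the objective coefficients are negated.
res = linprog(c=[-3, -2], A_ub=[[1, 1], [1, 0]], b_ub=[10, 6],
              bounds=[(0, None), (0, None)], method='highs')
print(res.x, -res.fun)   # optimal point [6, 4] and maximum value 26
```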
If there are no constraints (i.e. J = 0 and K = 0), the problem becomes an unconstrained optimization problem. If J ≥ 1 but K = 0, it becomes an equality-constrained optimization problem. Conversely, if J = 0 but K ≥ 1, it becomes an inequality-constrained problem.
The classification of optimization problems can also be done by looking at the modality of the objective landscape, which can be divided into multimodal problems and unimodal problems (including convex optimization). In addition, classification can be based on the determinacy of the problem. If there is no randomness in the formulation, the problem is called deterministic, and in fact all the above problems are essentially deterministic. However, if there is uncertainty in the variables or function forms, then optimization involves probability distributions and expectations; such problems are often called stochastic optimization or robust optimization.
Classification of optimization problems and the terminologies used in the optimization literature can be diverse and confusing. We summarize most of these terms in Figure I.1.
Figure I.1 Classification of optimization problems.
Whether an optimization problem is considered easy or hard can depend on many factors, including the actual mathematical formulation. In fact, three factors that make a problem more challenging are: the nonlinearity of the objective function, the high dimensionality of the problem, and the complex shape of the search domain.
Nonlinearity and multimodality: The high nonlinearity of a function to be optimized usually means multimodality, with multiple local modes and multiple (even infinitely many) optima. In most cases, algorithms for such problems are more likely to get trapped in local modes.
High dimensionality: High dimensionality is associated with the so-called curse of dimensionality, where distances become large while the volume that any algorithm can actually search becomes insignificant compared to the vast feasible space (see the numerical sketch after this list).
Constraints: Multiple complex constraints can make the feasible region complicated, with irregular geometry or shapes. In some cases, feasible regions can be split into multiple disconnected regions with isolated islands, which makes it harder for algorithms to search all the feasible regions (thus potentially missing the true optimality).
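To make the curse of dimensionality concrete, the following sketch (an illustration added here, not from the book) computes the fraction of the cube [−1, 1]ⁿ occupied by the inscribed unit ball, which collapses rapidly as n grows:

```python
import math

def ball_fraction(n):
    # Fraction of the cube [-1, 1]^n occupied by the inscribed unit ball:
    # volume of the n-ball, pi^(n/2) / Gamma(n/2 + 1), divided by 2^n.
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) / 2 ** n

print([round(ball_fraction(n), 4) for n in (2, 5, 10)])  # [0.7854, 0.1645, 0.0025]
```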
Other factors, such as the evaluation time of an objective, are also important. In many applications such as protein folding, bioinformatics, aerospace engineering, and deep machine learning (ML), the evaluation of a single objective can take a long time (from a few hours to days or even weeks); therefore, the computational costs can be very high. Thus, the choice of efficient algorithms becomes paramount.
Algorithms for solving optimization problems tend to be iterative, and thus multiple evaluations of objectives are needed, typically hundreds or thousands (or even millions) of evaluations. Different algorithms may have different efficiency and different requirements. For example, Newton's method, which is gradient-based, is very efficient for solving smooth objective functions, but it can get stuck in local modes if the objective is highly multimodal. If the objective is not smooth or has a kink, then the gradient-free Nelder–Mead simplex method can be used, and it can work well even for problems with discontinuities; however, it can become slow and also get stuck in a local mode. Algorithms for solving nonlinear optimization are diverse, including the trust-region method, the interior-point method, and others, but they are mostly local search methods. In a special case when the objective becomes convex, they can be very efficient. Quadratic programming (QP) and sequential quadratic programming (SQP) use such convexity properties to their advantage.
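A minimal sketch of this contrast using SciPy (illustrative only; the test functions are invented here, not the book's examples): a gradient-based method handles the smooth case, while the gradient-free Nelder–Mead method copes with the kink.

```python
from scipy.optimize import minimize

smooth = lambda x: (x[0] - 1.0) ** 2        # differentiable everywhere
kinked = lambda x: abs(x[0])                # kink at the minimum x = 0

print(minimize(smooth, x0=[5.0], method='BFGS').x)         # ~[1.0]
print(minimize(kinked, x0=[5.0], method='Nelder-Mead').x)  # ~[0.0]
```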
The simplex method for solving LP can be efficient in practice, but it requires all the problem functions to be linear. If an LP problem has integer variables, the simplex method will not work directly; it has to be combined with branch and bound to solve IP problems.
As traditional methods are usually local search algorithms, one of the current trends is to use heuristic and metaheuristic algorithms. Here, meta- means "beyond" or "higher level," and such algorithms generally perform better than simple heuristics. In addition, all metaheuristic algorithms use a certain tradeoff between randomization and local search. It is worth pointing out that there are no agreed definitions of heuristics and metaheuristics in the literature; some use the two terms interchangeably. However, the recent trend is to name all stochastic algorithms with randomization and local search as metaheuristic, and we will also use this convention here. Randomization provides a good way to move away from local search toward search on the global scale. Therefore, almost all metaheuristic algorithms are intended to be suitable for global optimization, though global optimality may still be challenging to achieve for most problems in practice.
Most metaheuristic algorithms are nature-inspired, as they have been developed based on some abstraction of nature. Nature has evolved over millions of years and has found perfect solutions to almost all the problems she has met. We can thus learn from the success of problem-solving in nature and develop nature-inspired heuristic and/or metaheuristic algorithms. More specifically, some nature-inspired algorithms are inspired by Darwin's evolutionary theory; consequently, they are said to be biology-inspired or simply bio-inspired.
The basic requirements of this book are the fundamental knowledge of functions, basic calculus, and vector algebra. However, we will review here the most relevant fundamentals of functions, vectors, differentiation, and integration. Then, we will introduce some useful concepts such as eigenvalues, complexity, convexity, probability distributions, and optimality conditions.
Loosely speaking, a function is a quantity (say y) which varies with another independent quantity or variable x in a deterministic way. For example, a simple quadratic function is
y = x².
For any given value of x, there is a unique corresponding value of y. By varying x smoothly, we can vary y in such a manner that the point (x, y) will trace out a curve on the x–y plane (see Figure 1.1). Thus, x is called the independent variable, and y is called the dependent variable or function. Sometimes, in order to emphasize the relationship as a function, we use f(x) to express a generic function, showing that it is a function of x. This can also be written as y = f(x).
Figure 1.1 A simple quadratic function y = x².
The domain of a function is the set of numbers x for which the function f(x) is valid (that is, f(x) gives a valid value for a corresponding value of x). If a function is defined over a range a ≤ x ≤ b, we say its domain is [a, b], which is called a closed interval. If both a and b are not included, we have a < x < b, which is denoted by (a, b), and we call this interval an open interval. If b is included while a is not, we have a < x ≤ b, and we often write this half-open and half-closed interval as (a, b]. Thus, the domain of the function y = x² is the whole set of all real numbers ℝ, so we have
−∞ < x < +∞.
Here, the notation ∞ means infinity. In this case, the domain of x² is all the real numbers, which can be simply written as ℝ, that is, x ∈ ℝ.
All the values that a function can take for a given domain form the range of the function. Thus, the range of y = x² is 0 ≤ y < +∞ or [0, +∞). Here, the closed bracket "[" means that the value (here 0) is included as part of the range, while the open round bracket ")" means that the value (here +∞) is not part of the range.
A function is called continuous if an infinitely small change δ of the independent variable x always leads to an infinitely small change in f(x + δ) − f(x). Alternatively, we can loosely view the graph representing the function as a single, unbroken curve. More formally, we can say that for any small change δ of the independent variable in the domain, there is an ε > 0 (which shrinks as δ shrinks) such that
|f(x + δ) − f(x)| < ε.
This is the continuity condition. Obviously, functions such as x, |x|, and x² are all continuous.
If a function does not satisfy the continuity condition, then the function is called discontinuous. For example, the Heaviside step function
H(x) = 1 if x ≥ 0, and H(x) = 0 if x < 0,
is discontinuous at x = 0, as shown in Figure 1.2, where the solid dot means that x = 0 is included in the right branch (x ≥ 0), and the hollow dot means that it is not included in the left branch. In this case, this function is called a right-continuous function.
Figure 1.2 Discontinuity of the Heaviside step function at x = 0.
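For reference, NumPy implements this step function directly; the sketch below (an illustration, not from the book) uses the right-continuous convention H(0) = 1 described above:

```python
import numpy as np

# numpy's Heaviside takes the value at x = 0 as its second argument;
# passing 1.0 reproduces the right-continuous convention H(0) = 1.
print(np.heaviside([-1.0, 0.0, 2.0], 1.0))   # [0. 1. 1.]
```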
For a given non-empty set S of real numbers, we now introduce some important concepts such as the supremum and infimum. A number U is called an upper bound for S if x ≤ U for all x ∈ S. An upper bound β is said to be the least (or smallest) upper bound for S, or the supremum, if β ≤ U for any upper bound U. This is often written as
β ≡ sup S.
All such notations are widely used in the literature of mathematical analysis. Here, "≡" denotes both equality and definition, which means "is identical to."
On the other hand, a number L is called a lower bound for S if L ≤ x for all x in S (that is, for all x ∈ S, denoted by ∀x ∈ S). A lower bound α is referred to as the greatest (or largest) lower bound, or the infimum, if L ≤ α for any lower bound L, which is written as
α ≡ inf S.
In general, both the supremum β and the infimum α, if they exist, may or may not belong to S.
For example, any number greater than 5, say, 7.2 or 500, is an upper bound for the interval [−5, 5] or −5 ≤ x ≤ 5. However, its smallest upper bound (or sup) is 5. Similarly, numbers such as −7.2 and −500 are lower bounds of the interval, but −5 is the greatest lower bound (or inf). In addition, the interval [15, ∞) has an infimum of 15, but it has no upper bound. That is to say, its supremum does not exist, or sup S = +∞.
There is an important completeness axiom which says that if a non‐empty set of real numbers is bounded above, then it has a supremum. Similarly, if a non‐empty set of real numbers is bounded below, then it has an infimum.
Furthermore, the maximum for S is the largest value of all elements x ∈ S, often written as max(S) or max S, while the minimum, min(S) or min S, is the smallest value among all x ∈ S. For the same interval [−5, 5], the maximum of this interval is 5, which is equal to its supremum, while its minimum −5 is also equal to its infimum. Though the supremum and infimum are not necessarily part of the set S, the maximum and minimum (if they exist) always belong to the set.
However, the concepts of supremum (or infimum) and maximum (or minimum) are not the same, and the maximum or minimum may not always exist.
For example, the interval S = [−5, 5) or −5 ≤ x < 5 has the supremum of 5, but S has no maximum.
Similarly, the interval (−5, 5] does not have a minimum, though its infimum is −5. Furthermore, the open interval (0, 7) has no maximum or minimum; however, its supremum is 7, and its infimum is 0.
It is worth pointing out that the problems we will discuss in this book will always have at least a maximum or a minimum.
The gradient or first derivative of a function f(x) at point x is defined by
f′(x) = df/dx = lim_{h→0} [f(x + h) − f(x)]/h.
From this definition and basic function operations, it is straightforward to show, for example, that the derivative of xⁿ is n xⁿ⁻¹.
In addition, for any two functions f(x) and g(x), we have
[a f(x) + b g(x)]′ = a f′(x) + b g′(x),
where a and b are two real constants. Therefore, it is easy to show that
[f(x) + k]′ = f′(x),
where k is a constant. This means that a family of functions shifted by a different constant k will have the same gradient at the same point x, as shown in Figure 1.3.
Figure 1.3 The gradients of a family of curves f(x) + k (where k is a constant) at any point x are the same.
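This shift-invariance is easy to verify symbolically (a one-line SymPy sketch added for illustration, not from the book):

```python
import sympy as sp

x, k = sp.symbols('x k')
print(sp.diff(x**2 + k, x))   # 2*x: the shift k does not change the gradient
```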
Some useful differentiation rules are the product rule
[f(x) g(x)]′ = f′(x) g(x) + f(x) g′(x), (1.11)
and the chain rule
d f(g(x))/dx = f′(g) · g′(x),
where f(g(x)) is a composite function, which means that f is a function of g, and g is a function of x.
For example, from f(g) = sin(g) and g(x) = x², we have
d sin(x²)/dx = cos(x²) · 2x = 2x cos(x²).
Similarly, we have
d exp(x²)/dx = 2x exp(x²).
From the product rule (1.11), if we replace g(x) by 1/g(x), we have
[f(x)/g(x)]′ = f′(x)/g(x) + f(x) [1/g(x)]′ = f′(x)/g(x) − f(x) g′(x)/g(x)²,
and thus
[f(x)/g(x)]′ = [f′(x) g(x) − f(x) g′(x)]/g(x)²,
which is the well-known quotient rule. For example, we have
d[sin(x)/x]/dx = [x cos(x) − sin(x)]/x².
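The quotient rule can be checked symbolically; the sketch below (illustrative, assuming SymPy is available) verifies it for the sin(x)/x example:

```python
import sympy as sp

x = sp.symbols('x')
f, g = sp.sin(x), x            # the sin(x)/x example above
lhs = sp.diff(f / g, x)
rhs = (sp.diff(f, x) * g - f * sp.diff(g, x)) / g**2
print(sp.simplify(lhs - rhs))  # 0, so both sides agree
```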
It is worth pointing out that a continuous function may not have well-defined derivatives. For example, the absolute or modulus function
f(x) = |x|
does not have a well-defined gradient at x = 0, because the gradient of |x| is +1 if x approaches 0 from the positive side (using the notation x → 0⁺). However, if x approaches 0 from the negative side (using the notation x → 0⁻), the gradient of |x| is −1. That is,
lim_{x→0⁺} d|x|/dx = +1 ≠ lim_{x→0⁻} d|x|/dx = −1.
In this case, we say that |x| is not differentiable at x = 0. However, for x ≠ 0, we can write
d|x|/dx = sign(x) = x/|x| (x ≠ 0).
A natural extension of this is that for a function f(x), we have
d|f(x)|/dx = sign(f(x)) f′(x) (f(x) ≠ 0).
As an example, for f(x) = sin(x), we have
d|sin(x)|/dx = sign(sin(x)) cos(x) (sin(x) ≠ 0).
Higher derivatives of a univariate real function can be defined as
f⁽ⁿ⁺¹⁾(x) = d f⁽ⁿ⁾(x)/dx,
for all positive integers n ≥ 1, with f⁽¹⁾(x) = f′(x).
For example, the first, second, and third derivatives of f(x) = x³ are
f′(x) = 3x², f″(x) = 6x,
and
f‴(x) = 6.
For a continuous function f(x), if its first derivative is well defined at every point in the whole domain, the function is called differentiable, or a continuously differentiable function if that derivative is also continuous. A continuously differentiable function is said to be of class C¹ if its first derivative exists and is continuous. Similarly, a function is said to be of class C² if both its first and second derivatives exist and are continuous. This can be extended to class Cᵏ in a similar manner. If the derivatives of all orders (all positive integers) exist everywhere in its domain, the function is called smooth.
It is straightforward to check that x², sin(x), and exp(x) are all smooth functions, but functions such as |x| are not smooth. Some of these functions are shown in Figure 1.4.
Figure 1.4 Smooth functions (left) and non‐smooth (but continuous) functions (right).
In numerical methods and some mathematical analysis, series expansions make some calculations easier. For example, we can write the exponential function eˣ as a series about x₀ = 0 as
eˣ = a₀ + a₁x + a₂x² + a₃x³ + ⋯. (1.21)
Now let us try to determine these coefficients. At x = 0, we have
e⁰ = 1 = a₀,
which gives a₀ = 1. In order to reduce the power or order of the expansion so that we can determine the next coefficient, we first differentiate both sides of Eq. (1.21) once; we have
eˣ = a₁ + 2a₂x + 3a₃x² + ⋯.
By setting again x = 0, we have
e⁰ = 1 = a₁,
which gives a₁ = 1. Similarly, differentiating it again, we have
eˣ = 2a₂ + 6a₃x + ⋯.
At x = 0, we get
2a₂ = e⁰ = 1,
or a₂ = 1/2!. Here, 2! = 2 × 1 is the factorial of 2. In general, the factorial n! is defined as n! = n × (n − 1) × ⋯ × 2 × 1.
Following the same procedure and differentiating n times, we have
eˣ = n! aₙ + (terms containing x),
and setting x = 0 leads to aₙ = 1/n!. Therefore, the final series expansion can be written as
eˣ = 1 + x + x²/2! + x³/3! + ⋯ = Σₙ₌₀^∞ xⁿ/n!,
which is an infinite series. Obviously, we can follow a similar process to expand other functions. We have seen here the importance of differentiation and derivatives.
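A short numerical sketch (added for illustration, not from the book) shows how quickly the partial sums approach eˣ:

```python
import math

def exp_series(x, n_terms=12):
    # Partial sum of e^x = sum_{n>=0} x^n / n!
    return sum(x**n / math.factorial(n) for n in range(n_terms))

print(exp_series(1.0), math.exp(1.0))   # 2.7182818261..., 2.7182818284...
```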
If we know the value of f(x) at x₀, we can use some approximations in a small interval around x₀ (see Figure 1.5). Following the same idea as Eq. (1.21), we can first write the approximation in the following general form:
f(x) = b₀ + b₁(x − x₀) + b₂(x − x₀)² + b₃(x − x₀)³ + ⋯, (1.29)
Figure 1.5 Expansion and approximations for f(x) near a point x₀.
and then try to figure out the unknown coefficients b₀, b₁, b₂, …. For the above approximation to be valid at x = x₀, we have
f(x₀) = b₀,
so that b₀ = f(x₀).
Now let us first take the first derivative of Eq. (1.29):
f′(x) = b₁ + 2b₂(x − x₀) + 3b₃(x − x₀)² + ⋯.
By setting x = x₀, we have
f′(x₀) = b₁,
which gives
b₁ = f′(x₀).
Similarly, we differentiate Eq. (1.29) twice with respect to x and we have
f″(x) = 2b₂ + 6b₃(x − x₀) + ⋯.
Setting x = x₀, we have
b₂ = f″(x₀)/2!.
Following the same procedure, we have
bₙ = f⁽ⁿ⁾(x₀)/n!.
Thus, we finally obtain
f(x) = f(x₀) + f′(x₀)(x − x₀) + f″(x₀)(x − x₀)²/2! + f‴(x₀)(x − x₀)³/3! + ⋯,
which is the well-known Taylor series.
In a special case when x₀ = 0 and thus x − x₀ = x, the above Taylor series becomes zero-centered, and such expansions are traditionally called Maclaurin series:
f(x) = f(0) + f′(0)x + f″(0)x²/2! + f‴(0)x³/3! + ⋯,
named after the mathematician Colin Maclaurin.
In theory, we can use as many terms as possible, but in practice, the series converges very quickly and only a few terms are sufficient. It is straightforward to verify that the exponential series for ex is identical to the results given earlier. Now let us look at other examples.
Let us expand f(x) = sin(x) about x₀ = 0. We know that the derivatives of sin(x) cycle:
sin′(x) = cos(x), sin″(x) = −sin(x), sin‴(x) = −cos(x), …,
or sin(0) = 0, sin′(0) = 1, sin″(0) = 0, sin‴(0) = −1, …, which means that
sin(x) = x − x³/3! + x⁵/5! − ⋯,
where the angle x is in radians.
For example, we know that sin(30°) = 1/2. We now use the expansion to estimate it for x = π/6:
sin(π/6) ≈ π/6 − (π/6)³/3! + (π/6)⁵/5! ≈ 0.500002,
which is very close to the true value 1/2.
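The same estimate in code (an illustrative Python sketch, not the book's Matlab):

```python
import math

x = math.pi / 6                                   # 30 degrees in radians
approx = x - x**3 / math.factorial(3) + x**5 / math.factorial(5)
print(approx, math.sin(x))                        # ~0.500002 vs exactly 0.5
```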
If we continue the process to infinity, we reach an infinite power series, and the remainder term involving f⁽ⁿ⁾(0)xⁿ/n! becomes negligibly small if the series converges. For example, two common series are
cos(x) = 1 − x²/2! + x⁴/4! − ⋯,
and
1/(1 − x) = 1 + x + x² + x³ + ⋯ (for |x| < 1).
As an exercise, we leave the reader to prove the above series.
For multivariate functions, we can define the partial derivative with respect to one independent variable by assuming that the other independent variables are constants. For example, for a function f(x, y) with two independent variables, we can define
∂f/∂x ≡ lim_{Δx→0} [f(x + Δx, y) − f(x, y)]/Δx,
and
∂f/∂y ≡ lim_{Δy→0} [f(x, y + Δy) − f(x, y)]/Δy.
Similar to ordinary derivatives, partial derivative operations are also linear.
For example, for f(x, y) = x²y + y, its partial derivative with respect to x is
∂f/∂x = 2xy,
where we have treated y as a constant and also used the fact that ∂y/∂x = 0 because x and y are both independent variables. Similarly, we have
∂f/∂y = x² + 1.
We can define higher-order partial derivatives as
∂²f/∂x² = ∂/∂x (∂f/∂x), ∂²f/∂y² = ∂/∂y (∂f/∂y),
and
∂²f/∂x∂y = ∂/∂x (∂f/∂y) = ∂²f/∂y∂x.
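These partial derivatives are easy to check symbolically; the sketch below (illustrative, assuming SymPy) uses the example function f(x, y) = x²y + y from above:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + y
print(sp.diff(f, x))                 # 2*x*y
print(sp.diff(f, y))                 # x**2 + 1
print(sp.diff(f, x, x))              # 2*y   (second partial in x)
print(sp.diff(f, x, y))              # 2*x   (mixed partial)
```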
Let us revisit the previous example. For f(x, y) = x²y + y, we have
∂²f/∂x² = 2y, ∂²f/∂y² = 0, and ∂²f/∂x∂y = ∂²f/∂y∂x = 2x.