Optimization Techniques and Applications with Examples

Xin-She Yang

Description

A guide to modern optimization applications and techniques in newly emerging areas spanning optimization, data science, machine intelligence, engineering, and computer sciences.

Optimization Techniques and Applications with Examples introduces the fundamentals of all the commonly used techniques in optimization that encompass the broadness and diversity of the methods (traditional and new) and algorithms. The author, a noted expert in the field, covers a wide range of topics including mathematical foundations, optimization formulation, optimality conditions, algorithmic complexity, linear programming, convex optimization, and integer programming. In addition, the book discusses artificial neural networks, clustering and classifications, constraint-handling, queueing theory, support vector machines, multi-objective optimization, evolutionary computation, nature-inspired algorithms, and many other topics.

Designed as a practical resource, all topics are explained in detail with step-by-step examples to show how each method works. The book's exercises test the acquired knowledge, which can then be applied to real problem solving. By taking an informal approach to the subject, the author helps readers to rapidly acquire the basic knowledge in optimization, operational research, and applied data mining. This important resource:

* Offers an accessible and state-of-the-art introduction to the main optimization techniques
* Contains both traditional optimization techniques and the most current algorithms and swarm intelligence-based techniques
* Presents a balance of theory, algorithms, and implementation
* Includes more than 100 worked examples with step-by-step explanations

Written for upper undergraduates and graduates in a standard course on optimization, operations research, and data mining, Optimization Techniques and Applications with Examples is a highly accessible guide to understanding the fundamentals of all the commonly used techniques in optimization.




Table of Contents

Cover

Preface

Introduction

Further Reading

Part I: Fundamentals

1 Mathematical Foundations

1.1 Functions and Continuity

1.2 Review of Calculus

1.3 Vectors

1.4 Matrix Algebra

1.5 Eigenvalues and Eigenvectors

1.6 Optimization and Optimality

1.7 General Formulation of Optimization Problems

Exercises

Further Reading

2 Algorithms, Complexity, and Convexity

2.1 What Is an Algorithm?

2.2 Order Notations

2.3 Convergence Rate

2.4 Computational Complexity

2.5 Convexity

2.6 Stochastic Nature in Algorithms

Exercises

Bibliography

Part II: Optimization Techniques and Algorithms

3 Optimization

3.1 Unconstrained Optimization

3.2 Gradient‐Based Methods

3.3 Gradient‐Free Nelder–Mead Method

Exercises

Bibliography

4 Constrained Optimization

4.1 Mathematical Formulation

4.2 Lagrange Multipliers

4.3 Slack Variables

4.4 Generalized Reduced Gradient Method

4.5 KKT Conditions

4.6 Penalty Method

Exercises

Bibliography

5 Optimization Techniques

5.1 BFGS Method

5.2 Trust‐Region Method

5.3 Sequential Quadratic Programming

5.4 Convex Optimization

5.5 Equality Constrained Optimization

5.6 Barrier Functions

5.7 Interior‐Point Methods

5.8 Stochastic and Robust Optimization

Exercises

Bibliography

Part III: Applied Optimization

6 Linear Programming

6.1 Introduction

6.2 Simplex Method

6.3 Worked Example by Simplex Method

6.4 Interior‐Point Method for LP

Exercises

Bibliography

7 Integer Programming

7.1 Integer Linear Programming

7.2 LP Relaxation

7.3 Branch and Bound

7.4 Mixed Integer Programming

7.5 Applications of LP, IP, and MIP

Exercises

Bibliography

8 Regression and Regularization

8.1 Sample Mean and Variance

8.2 Regression Analysis

8.3 Nonlinear Least Squares

8.4 Over‐fitting and Information Criteria

8.5 Regularization and Lasso Method

8.6 Logistic Regression

8.7 Principal Component Analysis

Exercises

Bibliography

9 Machine Learning Algorithms

9.1 Data Mining

9.2 Data Mining for Big Data

9.3 Artificial Neural Networks

9.4 Support Vector Machines

9.5 Deep Learning

Exercises

Bibliography

10 Queueing Theory and Simulation

10.1 Introduction

10.2 Arrival Model

10.3 Service Model

10.4 Basic Queueing Model

10.5 Little’s Law

10.6 Queue Management and Optimization

Exercises

Bibliography

Part IV: Advanced Topics

11 Multiobjective Optimization

11.1 Introduction

11.2 Pareto Front and Pareto Optimality

11.3 Choice and Challenges

11.4 Transformation to Single Objective Optimization

11.5 The ɛ‐Constraint Method

11.6 Evolutionary Approaches

Exercises

Bibliography

12 Constraint‐Handling Techniques

12.1 Introduction and Overview

12.2 Method of Lagrange Multipliers

12.3 Barrier Function Method

12.4 Penalty Method

12.5 Equality Constraints via Tolerance

12.6 Feasibility Criteria

12.7 Stochastic Ranking

12.8 Multiobjective Constraint‐Handling and Ranking

Exercises

Bibliography

Part V: Evolutionary Computation and Nature-Inspired Algorithms

13 Evolutionary Algorithms

13.1 Evolutionary Computation

13.2 Evolutionary Strategy

13.3 Genetic Algorithms

13.4 Simulated Annealing

13.5 Differential Evolution

Exercises

Bibliography

14 Nature‐Inspired Algorithms

14.1 Introduction to SI

14.2 Ant and Bee Algorithms

14.3 Particle Swarm Optimization

14.4 Firefly Algorithm

14.5 Cuckoo Search

14.6 Bat Algorithm

14.7 Flower Pollination Algorithm

14.8 Other Algorithms

Exercises

Bibliography

Appendix A: Notes on Software Packages

A.1 Software Packages

A.2 Matlab Codes in This Book

A.3 Optimization Software

Appendix B: Problem Solutions

Solutions for Chapter 1

Solutions for Chapter 2

Solutions for Chapter 3

Solutions for Chapter 4

Solutions for Chapter 5

Solutions for Chapter 6

Solutions for Chapter 7

Solutions for Chapter 8

Solutions for Chapter 9

Solutions for Chapter 10

Solutions for Chapter 11

Solutions for Chapter 12

Solutions for Chapter 13

Solutions for Chapter 14

Index

End User License Agreement



Optimization Techniques and Applications with Examples

Xin‐She Yang

Middlesex University London

This edition first published 2018
© 2018 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Xin‐She Yang to be identified as the author of the material in this work has been asserted in accordance with law.

Registered Office: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office: 111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data

Names: Yang, Xin‐She, author.
Title: Optimization techniques and applications with examples / Xin‐She Yang.
Description: Hoboken, New Jersey : John Wiley & Sons, 2018. | Includes bibliographical references and index.
Identifiers: LCCN 2018019716 (print) | LCCN 2018033280 (ebook) | ISBN 9781119490609 (Adobe PDF) | ISBN 9781119490623 (ePub) | ISBN 9781119490548 (hardcover)
Subjects: LCSH: Mathematical optimization.
Classification: LCC QA402.5 (ebook) | LCC QA402.5 .Y366 2018 (print) | DDC 519.6–dc23
LC record available at https://lccn.loc.gov/2018019716

Cover image: © MirageC/Getty Images
Cover design: Wiley

List of Figures

Figure I.1 Classification of optimization problems
Figure I.2 Classification of optimization algorithms
Figure 1.1 A simple quadratic function y = x^2
Figure 1.2 Discontinuity of the Heaviside step function at x = 0
Figure 1.3 The gradients of a family of curves f(x) + k (where k is a constant) are the same at any point x
Figure 1.4 Smooth functions (left) and non‐smooth (but continuous) functions (right)
Figure 1.5 Expansion and approximations for f(x) near x_0
Figure 1.6 Different p‐norms (left and right)
Figure 1.7 Feasible domain with nonlinear inequality constraints ψ1(x) and ψ2(x) (left) as well as a linear inequality constraint ψ3(x), with an example objective (right)
Figure 1.8 Local optima, weak optima, and global optimality
Figure 2.1 Convexity: (a) non‐convex and (b) convex
Figure 2.2 Gradient (left) and subgradients (right) of a function f(x) at x_0
Figure 2.3 Gaussian distributions
Figure 2.4 Representation of Monte Carlo integration
Figure 3.1 Plot of f(x), where the marked point corresponds to the global minimum
Figure 3.2 Function sin(x)/x has an infinite number of local minima and maxima
Figure 3.3 The concept of a simplex: (a) 1‐simplex, (b) 2‐simplex, and (c) 3‐simplex
Figure 3.4 Simplex manipulations: (a) reflection with fixed volume (area), (b) expansion or contraction along the line of reflection, and (c) reduction
Figure 4.1 Minimization of a function with two equality constraints
Figure 5.1 The barrier method: (a) log barrier near a boundary and (b) central path
Figure 5.2 Robustness of the optimal solution
Figure 6.1 Schematic representation of linear programming; the optimal solution is at B(10, 10)
Figure 7.1 Linear programming
Figure 7.2 Integer programming
Figure 7.3 Integer programming (with an alternative objective)
Figure 7.4 Branch and bound (Example 7.1, Subproblem P3)
Figure 7.5 Branch and bound (Example 7.2, subproblems)
Figure 7.6 Branch and bound tree
Figure 7.7 Branch and bound (Example 7.2, continued)
Figure 7.8 Branch and bound (Example 7.2, alternative branching)
Figure 8.1 Least squares and the best-fit line
Figure 8.2 Best-fit curve with 2.5% noise
Figure 8.3 An example of nonlinear least squares
Figure 8.4 Some random data with a trend
Figure 9.1 A simple neuron
Figure 9.2 Schematic representation of a simple feedforward neural network with ni inputs, m hidden nodes, and N outputs
Figure 9.3 Restricted Boltzmann machine with visible and hidden units
Figure 9.4 Hyperplane, maximum margins, and a linear support vector machine
Figure 9.5 Kernel functions by nonlinear transformation: (a) input space and (b) feature space
Figure 10.1 A simple queue representation of M/M/1
Figure 11.1 The non‐dominated set, Pareto front, and ideal vectors in a minimization problem with two objectives f1 and f2
Figure 11.2 Three functions reach the global minimum at the same point
Figure 11.3 Weighted sum method for two objectives f1 and f2
Figure 11.4 Finding the Pareto solution with maximum utility in a maximization problem with two objectives
Figure 11.5 The Pareto front is the line connecting A(5, 0) and B(0, 5/α); the Pareto solution with maximum utility is at point A
Figure 11.6 Slicing the objective domain in the ε‐constraint method
Figure 11.7 True Pareto front and the estimated front when setting f1 as the objective and f2 as the constraint
Figure 11.8 True Pareto front and the estimated front when setting f2 as the objective and f1 as the constraint
Figure 13.1 Diagram of crossover at a random crossover point
Figure 13.2 Schematic representation of mutation at a single site by flipping a randomly selected bit
Figure 13.3 Schematic representation of mutation vectors in differential evolution
Figure 14.1 Schematic representation of the motion of a particle in PSO, moving toward the global best g* and the current best for each particle i
Figure 14.2 Random walk in 2D with a Gaussian step‐size distribution and the path of 100 steps starting at the origin (0, 0) (marked with •)
Figure 14.3 Lévy flights in consecutive 100 steps starting at the origin (0, 0) (marked with •)

List of Tables

Table 7.1 Transport problem with m suppliers and n demands
Table 7.2 Product portfolio of a small factory
Table 7.3 Production scheduling
Table 8.1 Two quantities with measured data
Table 8.2 Goodness of fit in terms of RSS
Table 8.3 AIC as the goodness of fit

Preface

Optimization and operations research are part of many university courses because they are important to many disciplines and applications, such as engineering design, business planning, computer science, data mining, machine learning, artificial intelligence, and industry.

There are many books on the market, but most are highly specialized in the sense that they target a specialized audience in a given subject such as applied mathematics, business management, computer science, or engineering. Most of these books, though detailed and often more than 500 pages long, focus on the traditional optimization techniques commonly used in operations research. In some sense, some courses in the traditional operations research areas have focused too much on these traditional topics, which are already served by many excellent textbooks. However, optimization techniques have evolved over the last few decades, and some new techniques have started to show their effectiveness and have become an integral part of new mainstream methods. Important traditional topics will still be introduced in this book, but the coverage of some techniques may be brief to avoid overlapping with other books; we will provide references to the relevant literature when necessary. Therefore, this book tries to address modern topics, as used mostly by researchers working in newly emerging areas spanning optimization, data science, machine intelligence, engineering, and computer sciences, so that the topics covered in this book are most relevant to the current areas of research and syllabus topics in optimization, data mining, machine learning, and operations research.

This book aims at an introductory level, introducing the fundamentals of all the commonly used techniques so that students can see the broadness and diversity of the methods and algorithms and sample the flavor of each method (both traditional and new). The book covers almost all major optimization techniques with many worked examples. Topics include mathematical foundations, optimization formulation, optimality conditions, algorithmic complexity, linear programming, convex optimization, integer programming, artificial neural networks, clustering and classifications, regression and least squares, constraint‐handling, queueing theory, support vector machines, multiobjective optimization, evolutionary computation, nature‐inspired algorithms, and others. Each topic is explained in detail with step‐by‐step examples to show how each method works, and exercises are also provided to test the acquired knowledge and to apply it to real problem solving.

With an informal approach and more than 100 worked examples and exercises, this introductory book is especially suitable for both undergraduates and graduates to rapidly acquire the basic knowledge in optimization, operational research, machine learning, and data mining. This book can also serve as a reference for researchers and professionals in engineering, computer sciences, operations research, data mining, and management science.

Xin‐She Yang

Cambridge and London
March 2018

Acknowledgements

I would like to thank my students who took my relevant courses and provided some feedback on the contents concerning optimization algorithms and examples. I also thank the participants for their comments at international conferences where I gave tutorials on nature‐inspired algorithms. I would also like to thank the editors, Kathleen Pagliaro, Vishnu Narayanan, Mindy Okura‐Marszycki, and staff at Wiley for their help and professionalism. Last but not least, I thank my family for their support.

Xin‐She Yang

Acronyms

AIC: Akaike information criterion
ANN: Artificial neural network
APSO: Accelerated particle swarm optimization
BA: Bat algorithm
BFGS: Broyden–Fletcher–Goldfarb–Shanno
BFR: Bradley–Fayyad–Reina
BIC: Bayesian information criterion
BPNN: Back propagation neural network
CNN: Convolutional neural network
CPF: Cumulative probability function
CS: Cuckoo search
CURE: Clustering using representatives
DE: Differential evolution
DL: Deep learning
ES: Evolution strategy
FA: Firefly algorithm
FPA: Flower pollination algorithm
GA: Genetic algorithm
IP: Integer programming
KKT: Karush–Kuhn–Tucker
LP: Linear programming
MIP: Mixed integer programming
ML: Machine learning
NSGA: Non‐dominated sorting genetic algorithm
NP: Non‐deterministic polynomial‐time
PCA: Principal component analysis
PDF: Probability density function
PSO: Particle swarm optimization
QP: Quadratic programming
RBM: Restricted Boltzmann machine
SA: Simulated annealing
SGA: Stochastic gradient ascent
SGD: Stochastic gradient descent
SQP: Sequential quadratic programming
SVM: Support vector machine
TSP: Traveling salesman problem

Introduction

Optimization is everywhere, though it can mean different things from different perspectives. From basic calculus, optimization can simply be used to find the maximum or minimum of a function such as f(x) = x^4 + x^2 in the real domain x ∈ ℝ. In this case, we can either use a gradient‐based method or simply spot the solution: since x^4 and x^2 are always nonnegative and the minimum values of x^4 and x^2 are zero, f(x) has a minimum at x∗ = 0. This can be confirmed easily by taking the first derivative of f(x) with respect to x; we have

f′(x) = 4x^3 + 2x = 2x(2x^2 + 1) = 0, (I.1)

which has only one solution, x = 0, because x is a real number. The condition f′(x) = 0 seems to be sufficient to determine the optimal solution in this case. In fact, this function is convex with only one minimum in the whole real domain.
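As a quick numerical confirmation (a minimal Python sketch of ours, not one of the book's Matlab listings, and assuming SciPy is available), a gradient‐based solver started away from the origin converges to x = 0:

    from scipy.optimize import minimize

    def f(x):
        # convex test function f(x) = x^4 + x^2 with a single minimum at x = 0
        return x[0]**4 + x[0]**2

    res = minimize(f, x0=[3.0], method="BFGS")   # gradient-based quasi-Newton search
    print(res.x, res.fun)                        # both approximately 0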

However, things become more complicated when f(x) is highly nonlinear with multiple optima. For example, if we try to find the maximum value of f(x) = sin(x)/x in the real domain, we can naively use f′(x) = 0, that is,

f′(x) = cos(x)/x − sin(x)/x^2 = 0, or equivalently tan(x) = x, (I.2)

which has an infinite number of solutions for x ≠ 0. There is no simple formula for these solutions; thus, a numerical method has to be used to calculate them. In addition, even with all the efforts to find these solutions, care has to be taken because the actual global maximum f_max = 1 occurs at x∗ = 0. However, this solution can only be found by taking the limit x → 0, and it is not part of the solutions from the above condition tan(x) = x. This highlights the potential difficulty for nonlinear problems with multiple optima or multi‐modality.
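Since tan(x) = x has no closed‐form solutions, a root finder can locate the stationary points numerically; a minimal sketch (our own, assuming SciPy) brackets the first positive root of g(x) = x cos(x) − sin(x):

    import numpy as np
    from scipy.optimize import brentq

    def g(x):
        # g(x) = 0 is equivalent to tan(x) = x (away from cos(x) = 0)
        return x * np.cos(x) - np.sin(x)

    root = brentq(g, 3.5, 4.6)   # bracketing interval containing one sign change
    print(root)                  # approximately 4.4934, a stationary point of sin(x)/x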

Furthermore, not all functions are smooth. For example, if we try to use f′(x) = 0 to find the minimum of

f(x) = |x|, (I.3)

we will realize that f(x) is not differentiable at x = 0, though the global minimum occurs at x∗ = 0. In this case, optimization techniques that require the calculation of derivatives will not work.

Problems become more challenging in higher‐dimensional spaces. For example, the nonlinear function

f(x) = (|x_1| + |x_2| + ⋯ + |x_n|) exp(−[sin(x_1^2) + sin(x_2^2) + ⋯ + sin(x_n^2)]), (I.4)

where −2π ≤ x_i ≤ 2π for i = 1, 2, …, n, has a minimum f∗ = 0 at x∗ = (0, 0, …, 0), but this function is not differentiable at x∗.
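Gradient‐free methods can still be applied to such non‐differentiable functions; the sketch below (ours, using the form of Eq. (I.4) above with n = 2 and SciPy's Nelder–Mead simplex method) needs no derivatives at all:

    import numpy as np
    from scipy.optimize import minimize

    def f(x):
        # non-smooth function of Eq. (I.4): the |.| terms are not differentiable at 0
        x = np.asarray(x)
        return np.sum(np.abs(x)) * np.exp(-np.sum(np.sin(x**2)))

    res = minimize(f, x0=[1.0, -1.0], method="Nelder-Mead")  # derivative-free search
    print(res.x, res.fun)  # may reach the global minimum (0, 0) or a nearby local minimum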

Therefore, optimization techniques have to be diverse, using gradient information when appropriate and not using it when it is not defined or not easily calculated. Though the above nonlinear optimization problems can be challenging to solve, constraints on the search domain and certain independent variables can make the search domain much more complicated, which can consequently make the problem even harder to solve. In addition, sometimes we have to optimize several objective functions instead of just one, which will in turn make a problem more challenging to solve.

In general, it is possible to write most optimization problems mathematically in the general form

minimize f_i(x), (i = 1, 2, …, M), (I.5)

subject to ϕ_j(x) = 0, (j = 1, 2, …, J), (I.6)

ψ_k(x) ≤ 0, (k = 1, 2, …, K), (I.7)

where f_i(x), ϕ_j(x), and ψ_k(x) are functions of the design vector

x = (x_1, x_2, …, x_n)^T. (I.8)

Here, the components x_i of x are called design or decision variables, and they can be real continuous, discrete, or a mixture of these two. The functions f_i(x), where i = 1, 2, …, M, are called the objective functions; in the case of M = 1, there is only a single objective, and the problem becomes single‐objective optimization. The objective function is sometimes called the cost function, loss function, or energy function in the literature. The space spanned by the decision variables is called the design space or search space ℝ^n, while the space formed by the objective function values is called the response space or objective landscape. The equalities for ϕ_j and inequalities for ψ_k are called constraints. It is worth pointing out that we can also write the inequalities the other way, ψ_k(x) ≥ 0, and we can also formulate the objectives as maximization.
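To make the general form concrete, here is a minimal sketch (an illustrative problem of our own choosing, not one from the book) with a single objective f, one equality constraint ϕ, and one inequality constraint, solved with SciPy's SLSQP; note that SciPy expects inequality constraints written as g(x) ≥ 0 rather than ψ(x) ≤ 0:

    from scipy.optimize import minimize

    f   = lambda x: (x[0] - 1)**2 + (x[1] - 2)**2   # objective f(x)
    phi = lambda x: x[0] + x[1] - 2                 # equality constraint phi(x) = 0
    g   = lambda x: x[0]                            # inequality x_0 >= 0, i.e. -psi(x) >= 0

    cons = [{"type": "eq", "fun": phi},
            {"type": "ineq", "fun": g}]
    res = minimize(f, x0=[0.0, 0.0], method="SLSQP", constraints=cons)
    print(res.x)   # approximately [0.5, 1.5], the feasible minimizer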

If a solution vector x satisfies all the constraints ϕj(x) and ψk(x), it is called a feasible solution; otherwise, it is infeasible. All the feasible points or vectors form the feasible regions in the search space.

In a rare but extreme case where there is no objective at all, there are only constraints. Such a problem is called a feasibility problem and any feasible solution is an optimal solution to such problems.

If all the problem functions are linear, the problem becomes a linear programming (LP) problem. Here, the term "programming" means planning; it has nothing to do with computer programming in this context. Thus, mathematical optimization is also called mathematical programming. In addition, if the variables in an LP can only take integer values, the LP becomes an integer LP, or simply integer programming (IP). If the variables can only take two values (0 or 1), such an IP is called binary IP.

If there are no constraints (i.e. J = 0 and K = 0), the problem becomes an unconstrained optimization problem. If J ≥ 1 but K = 0, it becomes equality‐constrained optimization. Conversely, if K ≥ 1 but J = 0, it becomes an inequality‐constrained problem.

The classification of optimization problems can also be done by looking at the modality of the objective landscape, which can be divided into multimodal problems and unimodal problems (including convex optimization). In addition, classification can also be based on the determinacy of the problem. If there is no randomness in the formulation, the problem is called deterministic, and in fact all the above problems are essentially deterministic. However, if there is uncertainty in the variables or function forms, then optimization involves probability distributions and expectations; such problems are often called stochastic optimization or robust optimization.

Classification of optimization problems and the terminologies used in the optimization literature can be diverse and confusing. We summarize most of these terms in Figure I.1.

Figure I.1 Classification of optimization problems.

Whether an optimization problem is considered easy or hard can depend on many factors, as well as on the actual mathematical formulation. In fact, three factors that make a problem more challenging are: the nonlinearity of the objective function, the high dimensionality of the problem, and the complex shape of the search domain.

Nonlinearity and multimodality: The high nonlinearity of a function to be optimized usually means multimodality, with multiple local modes and multiple (even infinitely many) optima. In most cases, algorithms for solving such problems are more likely to get trapped in local modes.

High dimensionality: High dimensionality is associated with the so‐called curse of dimensionality, where distances become large while the volume that any algorithm can actually search becomes insignificant compared to the vast feasible space.

Constraints: Multiple complex constraints can make the feasible region complicated, with irregular geometry or shapes. In some cases, feasible regions can be split into multiple disconnected regions with isolated islands, which makes it harder for algorithms to search all the feasible regions (thus potentially missing the true optimality).

Other factors, such as the evaluation time of an objective, are also important. In many applications such as protein folding, bioinformatics, aerospace engineering, and deep machine learning (ML), the evaluation of a single objective can take a long time (from a few hours to days or even weeks); therefore, the computational costs can be very high. Thus, the choice of efficient algorithms becomes paramount.

Algorithms for solving optimization problems tend to be iterative, and thus multiple evaluations of objectives are needed, typically hundreds or thousands (or even millions) of evaluations. Different algorithms may have different efficiency and different requirements. For example, Newton's method, which is gradient‐based, is very efficient for solving smooth objective functions, but it can get stuck in local modes if the objective is highly multimodal. If the objective is not smooth or has a kink, then the Nelder–Mead simplex method can be used because it is a gradient‐free method, and it can work well even for problems with discontinuities, but it can become slow and get stuck in a local mode. Algorithms for solving nonlinear optimization are diverse, including the trust‐region method, the interior‐point method, and others, but they are mostly local search methods. In a special case when the objective becomes convex, they can be very efficient. Quadratic programming (QP) and sequential quadratic programming (SQP) use such convexity properties to their advantage.

The simplex method for solving LP can be efficient in practice, but it requires all the problem functions to be linear. However, if an LP problem has integer variables, the simplex method will not work directly; it has to be combined with branch and bound to solve IP problems.
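As an illustration (a small LP of our own choosing, assuming SciPy), linprog solves such linear problems directly; integer variables would additionally require branch and bound as noted above:

    from scipy.optimize import linprog

    # maximize 3x + 2y subject to x + y <= 10 and x <= 6, with x, y >= 0;
    # linprog minimizes, so the objective coefficients are negated
    c = [-3.0, -2.0]
    A_ub = [[1.0, 1.0],
            [1.0, 0.0]]
    b_ub = [10.0, 6.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)   # optimal vertex (6, 4) with maximum value 26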

As traditional methods are usually local search algorithms, one of the current trends is to use heuristic and metaheuristic algorithms. Here, meta‐ means "beyond" or "higher level," and such algorithms generally perform better than simple heuristics. In addition, all metaheuristic algorithms use a certain tradeoff between randomization and local search. It is worth pointing out that there are no agreed definitions of heuristics and metaheuristics in the literature; some use "heuristics" and "metaheuristics" interchangeably. However, the recent trend is to name all stochastic algorithms with randomization and local search as metaheuristic, and we will also use this convention here. Randomization provides a good way to move away from local search toward search on the global scale. Therefore, almost all metaheuristic algorithms are intended to be suitable for global optimization, though global optimality may still be challenging to achieve for most problems in practice.

Most metaheuristic algorithms are nature‐inspired as they have been developed based on some abstraction of nature. Nature has evolved over millions of years and has found perfect solutions to almost all the problems she met. We can thus learn the success of problem‐solving from nature and develop nature‐inspired heuristic and/or metaheuristic algorithms. More specifically, some nature‐inspired algorithms are inspired by Darwin’s evolutionary theory. Consequently, they are said to be biology‐inspired or simply bio‐inspired.

Part IFundamentals

1Mathematical Foundations

The basic requirements of this book are the fundamental knowledge of functions, basic calculus, and vector algebra. However, we will review here the most relevant fundamentals of functions, vectors, differentiation, and integration. Then, we will introduce some useful concepts such as eigenvalues, complexity, convexity, probability distributions, and optimality conditions.

1.1 Functions and Continuity

1.1.1 Functions

Loosely speaking, a function is a quantity (say y) which varies with another independent quantity or variable x in a deterministic way. For example, a simple quadratic function is

y = x^2. (1.1)

For any given value of x, there is a unique corresponding value of y. By varying x smoothly, we can vary y in such a manner that the point (x, y) will trace out a curve on the x–y plane (see Figure 1.1). Thus, x is called the independent variable, and y is called the dependent variable or function. Sometimes, in order to emphasize the relationship as a function, we use f(x) to express a generic function, showing that it is a function of x. This can also be written as y = f(x).

Figure 1.1 A simple quadratic function y = x^2.

The domain of a function is the set of numbers x for which the function f(x) is valid (that is, f(x) gives a valid value for a corresponding value of x). If a function is defined over a range a ≤ x ≤ b, we say its domain is [a, b], which is called a closed interval. If both a and b are not included, we have a < x < b, which is denoted by (a, b), and we call this interval an open interval. If b is included, while a is not, we have a < x ≤ b, and we often write this half‐open and half‐closed interval as (a, b]. Thus, the domain of the function y = x^2 is the whole set of all real numbers ℝ, so we have

−∞ < x < +∞. (1.2)

Here, the notation ∞ means infinity. In this case, the domain of y = x^2 is all the real numbers, which can be simply written as ℝ, that is, x ∈ ℝ.

All the values that a function can take for a given domain form the range of the function. Thus, the range of y = x^2 is 0 ≤ y < +∞ or [0, +∞). Here, the closed bracket "[" means that the value (here 0) is included as part of the range, while the open round bracket ")" means that the value (here +∞) is not part of the range.

1.1.2 Continuity

A function is called continuous if an infinitely small change δ of the independent variable x always leads to an infinitely small change in f(x + δ) − f(x). Alternatively, we can loosely view the graph representing the function as a single, unbroken curve. More formally, we can say that for any small ε > 0, there is a δ > 0 such that

|f(x) − f(x_0)| < ε whenever |x − x_0| < δ, (1.3)

for any x_0 in the domain. This is the continuity condition. Obviously, functions such as x and x^2 are continuous.

If a function does not satisfy the continuity condition, then the function is called discontinuous. For example, the Heaviside step function

H(x) = 1 for x ≥ 0, and H(x) = 0 for x < 0, (1.4)

is discontinuous at x = 0, as shown in Figure 1.2, where the solid dot means that x = 0 is included in the right branch H(x) = 1, and the hollow dot means that it is not included in the left branch. In this case, this function is called a right‐continuous function.

Figure 1.2 Discontinuity of the Heaviside step function at x = 0.

1.1.3 Upper and Lower Bounds

For a given non‐empty set S of real numbers, we now introduce some important concepts such as the supremum and infimum. A number U is called an upper bound for S if x ≤ U for all x ∈ S. An upper bound β is said to be the least (or smallest) upper bound for S, or the supremum, if β ≤ U for any upper bound U. This is often written as

β = sup S ≡ sup_{x∈S} x. (1.5)

All such notations are widely used in the literature of mathematical analysis. Here, "≡" denotes both equality and definition, which means "is identical to."

On the other hand, a number L is called a lower bound for S if x ≥ L for all x in S (that is, ∀x ∈ S). A lower bound α is referred to as the greatest (or largest) lower bound if L ≤ α for any lower bound L, which is written as

α = inf S ≡ inf_{x∈S} x. (1.6)

In general, both the supremum β and the infimum α, if they exist, may or may not belong to S.

Example 1.1

For example, any numbers greater than 5, say, 7.2 and 500, are upper bounds for the interval [1, 5], that is, {x | 1 ≤ x ≤ 5}. However, its smallest upper bound (or sup) is 5. Similarly, numbers such as 0 and −10 are lower bounds of the interval, but 1 is the greatest lower bound (or inf). In addition, the interval [15, ∞) has an infimum of 15, but it has no upper bound. That is to say, its supremum does not exist, or sup S = +∞.

There is an important completeness axiom which says that if a non‐empty set of real numbers is bounded above, then it has a supremum. Similarly, if a non‐empty set of real numbers is bounded below, then it has an infimum.

Furthermore, the maximum for S is the largest value of all elements x ∈ S, often written as max(S) or max S, while the minimum, min(S) or min S, is the smallest value among all x ∈ S. For the same interval [1, 5], the maximum of this interval is 5, which is equal to its supremum, while its minimum 1 is also equal to its infimum. Though the supremum and infimum are not necessarily part of the set S, the maximum and minimum (if they exist) always belong to the set.

However, the concepts of supremum (or infimum) and maximum (or minimum) are not the same, and maximum/minimum may not always exist.

Example 1.2

For example, the interval S = [0, 7), that is, {x | 0 ≤ x < 7}, has a supremum of 7, but S has no maximum.

Similarly, the interval (0, 7] does not have a minimum, though its infimum is 0. Furthermore, the open interval (0, 7) has no maximum or minimum; however, its supremum is 7, and its infimum is 0.

It is worth pointing out that the problems we will discuss in this book will always have at least a maximum or a minimum.

1.2 Review of Calculus

1.2.1 Differentiation

The gradient or first derivative of a function f(x) at point x is defined by

f′(x) = df(x)/dx = lim_{h→0} [f(x + h) − f(x)]/h. (1.7)
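Definition (1.7) also suggests a simple numerical approximation (a sketch of ours): take a small but finite h instead of the limit:

    def derivative(f, x, h=1.0e-6):
        # forward-difference approximation of f'(x) based on definition (1.7)
        return (f(x + h) - f(x)) / h

    print(derivative(lambda x: x**2, 3.0))   # approximately 6.0, since (x^2)' = 2x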

From the definition and basic function operations, it is straightforward to show that

(x^n)′ = n x^(n−1). (1.8)

In addition, for any two functions f(x) and g(x), we have

[a f(x) + b g(x)]′ = a f′(x) + b g′(x), (1.9)

where a and b are two real constants. Therefore, it is easy to show that

[f(x) + k]′ = f′(x), (1.10)

where k is a constant. This means that a family of functions shifted by a different constant k will have the same gradient at the same point x, as shown in Figure 1.3.

Figure 1.3 The gradients of a family of curves f(x) + k (where k is a constant) are the same at any point x.

Some useful differentiation rules are the product rule

[f(x)g(x)]′ = f′(x)g(x) + f(x)g′(x), (1.11)

and the chain rule

[f(g(x))]′ = f′(g(x)) g′(x), (1.12)

where f(g(x)) is a composite function, which means that f is a function of g, and g is a function of x.

Example 1.3

For example, from and , we have

Similarly, we have

From the product rule (1.11), if we replace g(x) by 1/g(x), we have

[f(x)/g(x)]′ = f′(x)/g(x) + f(x)[1/g(x)]′ = f′(x)/g(x) − f(x)g′(x)/g(x)^2, (1.13)

and

[f(x)/g(x)]′ = [f′(x)g(x) − f(x)g′(x)]/g(x)^2, (1.14)

which is the well‐known quotient rule. For example, we have

[sin(x)/x]′ = [x cos(x) − sin(x)]/x^2 = cos(x)/x − sin(x)/x^2. (1.15)

It is worth pointing out that a continuous function may not have well‐defined derivatives. For example, the absolute or modulus function

|x| = x for x ≥ 0, and |x| = −x for x < 0, (1.16)

does not have a well‐defined gradient at x = 0 because the gradient of |x| is +1 if x approaches 0 from x > 0 (using the notation x → 0^+). However, if x approaches 0 from x < 0 (using the notation x → 0^−), the gradient of |x| is −1. That is,

lim_{x→0^+} d|x|/dx = +1 ≠ lim_{x→0^−} d|x|/dx = −1. (1.17)

In this case, we say that |x| is not differentiable at x = 0. However, for x ≠ 0, we can write

d|x|/dx = sign(x) = x/|x|. (1.18)

Example 1.4

A natural extension of this is that for a function f(x), we have

d|f(x)|/dx = f′(x) sign(f(x)) = f′(x) f(x)/|f(x)| (f(x) ≠ 0). (1.19)

As an example, for f(x) = x^2 − 1, we have

d|x^2 − 1|/dx = 2x sign(x^2 − 1), which is valid for all x ≠ ±1.

Higher derivatives of a univariate real function can be defined as

f″(x) = [f′(x)]′, f‴(x) = [f″(x)]′, …, f^(n)(x) = [f^(n−1)(x)]′, (1.20)

for all positive integers n ≥ 2.

Example 1.5

The first, second, and third derivatives of are

and

For a continuous function f(x), if its first derivative is well defined at every point in the whole domain, the function is called differentiable, or a continuously differentiable function. A continuously differentiable function is said to be of class C^1 if its first derivative exists and is continuous. Similarly, a function is said to be of class C^2 if both its first and second derivatives exist and are continuous. This can be extended to class C^k in a similar manner. If the derivatives of all orders (all positive integers) exist everywhere in its domain, the function is called smooth.

It is straightforward to check that x^2, sin(x), and exp(x) are all smooth functions, but functions such as |x| and the Heaviside step function H(x) are not smooth. Some of these functions are shown in Figure 1.4.

Figure 1.4 Smooth functions (left) and non‐smooth (but continuous) functions (right).

1.2.2 Taylor Expansions

In numerical methods and some mathematical analysis, series expansions make some calculations easier. For example, we can write the exponential function e^x as a series about x_0 = 0 as

e^x = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + ⋯ (1.21)

Now let us try to determine these coefficients. At x = 0, we have

e^0 = 1 = a_0, (1.22)

which gives a_0 = 1. In order to reduce the power or order of the expansion so that we can determine the next coefficient, we first differentiate both sides of Eq. (1.21) once; we have

e^x = a_1 + 2a_2 x + 3a_3 x^2 + ⋯ (1.23)

By setting again x = 0, we have

e^0 = 1 = a_1, (1.24)

which gives a_1 = 1. Similarly, differentiating it again, we have

e^x = 2a_2 + 6a_3 x + ⋯ (1.25)

At x = 0, we get

e^0 = 1 = 2a_2, (1.26)

or a_2 = 1/2!. Here, 2! = 2 × 1 is the factorial of 2. In general, the factorial n! is defined as n! = n × (n − 1) × ⋯ × 2 × 1.

Following the same procedure and differentiating it n times, we have

e^x = n! a_n + (n + 1)! a_(n+1) x + ⋯, (1.27)

and setting x = 0 leads to a_n = 1/n!. Therefore, the final series expansion can be written as

e^x = 1 + x + x^2/2! + x^3/3! + ⋯ + x^n/n! + ⋯, (1.28)

which is an infinite series. Obviously, we can follow a similar process to expand other functions. We have seen here the importance of differentiation and derivatives.
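A quick numerical check of the series (1.28) (our own sketch) shows how fast the partial sums approach e^x:

    import math

    def exp_series(x, n_terms=10):
        # partial sum of the series e^x = sum of x^n / n! for n = 0, 1, 2, ...
        return sum(x**n / math.factorial(n) for n in range(n_terms))

    print(exp_series(1.0), math.exp(1.0))   # 2.7182815... versus 2.7182818...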

If we know the value of f(x) at x_0, we can use some approximations in a small interval around x_0 (see Figure 1.5). Following the same idea as Eq. (1.21), we can first write the approximation in the following general form:

f(x) = b_0 + b_1 (x − x_0) + b_2 (x − x_0)^2 + ⋯, (1.29)

Figure 1.5 Expansion and approximations for f(x) near x_0.

and then try to figure out the unknown coefficients b_0, b_1, b_2, …. For the above approximation to be valid at x = x_0, we have

f(x_0) = b_0, (1.30)

so that b_0 = f(x_0).

Now let us first take the first derivative of Eq. (1.29),

f′(x) = b_1 + 2b_2 (x − x_0) + 3b_3 (x − x_0)^2 + ⋯ (1.31)

By setting x = x_0, we have

f′(x_0) = b_1, (1.32)

which gives

b_1 = f′(x_0). (1.33)

Similarly, we differentiate Eq. (1.29) twice with respect to x and we have

f″(x) = 2b_2 + 6b_3 (x − x_0) + ⋯ (1.34)

Setting x = x_0, we have

b_2 = f″(x_0)/2!. (1.35)

Following the same procedure, we have

b_n = f^(n)(x_0)/n!. (1.36)

Thus, we finally obtain

f(x) = f(x_0) + f′(x_0)(x − x_0) + (f″(x_0)/2!)(x − x_0)^2 + ⋯ + (f^(n)(x_0)/n!)(x − x_0)^n + ⋯, (1.37)

which is the well‐known Taylor series.

In a special case when x_0 = 0, the above Taylor series becomes zero‐centered, and such expansions are traditionally called Maclaurin series,

f(x) = f(0) + f′(0)x + (f″(0)/2!)x^2 + ⋯ + (f^(n)(0)/n!)x^n + ⋯, (1.38)

named after the mathematician Colin Maclaurin.

In theory, we can use as many terms as possible, but in practice, the series converges very quickly and only a few terms are sufficient. It is straightforward to verify that the exponential series for ex is identical to the results given earlier. Now let us look at other examples.

Example 1.6

Let us expand f(x) = sin(x) about x_0 = 0. We know that

sin′(x) = cos(x), sin″(x) = −sin(x), sin‴(x) = −cos(x), …,

or sin^(n)(0) = 0 for even n, alternating between +1 and −1 for odd n, which means that

sin(x) = x − x^3/3! + x^5/5! − ⋯,

where the angle x is in radians.

For example, we know that sin(30°) = sin(π/6) = 1/2. We now use the expansion to estimate it for x = π/6 ≈ 0.5236,

sin(π/6) ≈ 0.5236 − (0.5236)^3/3! + (0.5236)^5/5! ≈ 0.5000,

which is very close to the true value 1/2.
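The same three-term estimate can be checked numerically (a sketch of ours):

    import math

    x = math.pi / 6
    approx = x - x**3 / 6 + x**5 / 120   # three-term Maclaurin partial sum for sin(x)
    print(approx, math.sin(x))           # 0.5000021... versus 0.5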

If we continue the process to infinity, we then reach the infinite power series, and the error term f^(n)(0)x^n/n! becomes negligibly small if the series converges. For example, some common series are

sin(x) = x − x^3/3! + x^5/5! − ⋯, (1.39)

cos(x) = 1 − x^2/2! + x^4/4! − ⋯, (1.40)

1/(1 − x) = 1 + x + x^2 + x^3 + ⋯ (|x| < 1), (1.41)

and

ln(1 + x) = x − x^2/2 + x^3/3 − ⋯ (−1 < x ≤ 1). (1.42)

As an exercise, we leave it to the reader to prove the above series.

1.2.3 Partial Derivatives

For multivariate functions, we can define the partial derivative with respect to an independent variable by assuming that the other independent variables are constants. For example, for a function f(x, y) with two independent variables, we can define

∂f/∂x = lim_{Δx→0} [f(x + Δx, y) − f(x, y)]/Δx, (1.43)

and

∂f/∂y = lim_{Δy→0} [f(x, y + Δy) − f(x, y)]/Δy. (1.44)
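Partial derivatives can also be checked symbolically (a sketch assuming SymPy; the bivariate function here is our own illustration, not the book's example):

    import sympy as sp

    x, y = sp.symbols("x y")
    f = x**2 * y + sp.sin(y)      # an illustrative function of two variables
    print(sp.diff(f, x))          # 2*x*y, with y treated as a constant
    print(sp.diff(f, y))          # x**2 + cos(y), with x treated as a constant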

Similar to the ordinary derivatives, partial derivative operations are also linear.

Example 1.7

For , its partial derivatives are

where we have treated y as a constant and also used the fact that ∂y/∂x = 0 because x and y are both independent variables. Similarly, we have

We can define higher‐order partial derivatives as

∂²f/∂x² = ∂(∂f/∂x)/∂x, ∂²f/∂x∂y = ∂(∂f/∂y)/∂x, (1.45)

and

∂²f/∂y² = ∂(∂f/∂y)/∂y. (1.46)

Let us revisit the previous example.

Example 1.8

For , we have