Fully describes the optimization methods that are currently most valuable in solving real-life problems. Since optimization has applications in almost every branch of science and technology, the text emphasizes their practical aspects in conjunction with the heuristics useful in making them perform more reliably and efficiently. To this end, it presents comparative numerical studies to give readers a feel for possible applications and to illustrate the problems in assessing evidence. It also provides the theoretical background that gives insight into how methods are derived. This edition offers revised coverage of basic theory and standard techniques, with updated discussions of line search methods, Newton and quasi-Newton methods, and conjugate direction methods, as well as a comprehensive treatment of restricted step or trust region methods not commonly found in the literature. It also includes recent developments in hybrid methods for nonlinear least squares; an extended discussion of linear programming, with new methods for stable updating of LU factors; and a completely new section on network programming. Chapters include computer subroutines, worked examples, and study questions.
Page count: 942
Year of publication: 2013
Contents
Preface
Table of Notation
PART 1 UNCONSTRAINED OPTIMIZATION
Chapter 1 Introduction
1.1 HISTORY AND APPLICATIONS
1.2 MATHEMATICAL BACKGROUND
QUESTIONS FOR CHAPTER 1
Chapter 2 Structure of Methods
2.1 CONDITIONS FOR LOCAL MINIMA
2.2 AD HOC METHODS
2.3 USEFUL ALGORITHMIC PROPERTIES
2.4 QUADRATIC MODELS
2.5 DESCENT METHODS AND STABILITY
2.6 ALGORITHMS FOR THE LINE SEARCH SUBPROBLEM
QUESTIONS FOR CHAPTER 2
Chapter 3 Newton-like Methods
3.1 NEWTON’S METHOD
3.2 QUASI-NEWTON METHODS
3.3 INVARIANCE, METRICS AND VARIATIONAL PROPERTIES
3.4 THE BROYDEN FAMILY
3.5 NUMERICAL EXPERIMENTS
3.6 OTHER FORMULAE
QUESTIONS FOR CHAPTER 3
Chapter 4 Conjugate Direction Methods
4.1 CONJUGATE GRADIENT METHODS
4.2 DIRECTION SET METHODS
QUESTIONS FOR CHAPTER 4
Chapter 5 Restricted Step Methods
5.1 A PROTOTYPE ALGORITHM
5.2 LEVENBERG-MARQUARDT METHODS
QUESTIONS FOR CHAPTER 5
Chapter 6 Sums of Squares and Nonlinear Equations
6.1 OVER-DETERMINED SYSTEMS
6.2 WELL-DETERMINED SYSTEMS
6.3 NO-DERIVATIVE METHODS
QUESTIONS FOR CHAPTER 6
PART 2 CONSTRAINED OPTIMIZATION
Chapter 7 Introduction
7.1 PREVIEW
7.2 ELIMINATION AND OTHER TRANSFORMATIONS
QUESTIONS FOR CHAPTER 7
Chapter 8 Linear Programming
8.1 STRUCTURE
8.2 THE SIMPLEX METHOD
8.3 OTHER LP TECHNIQUES
8.4 FEASIBLE POINTS FOR LINEAR CONSTRAINTS
8.5 STABLE AND LARGE-SCALE LINEAR PROGRAMMING
8.6 DEGENERACY
8.7 POLYNOMIAL TIME ALGORITHMS
QUESTIONS FOR CHAPTER 8
Chapter 9 The Theory of Constrained Optimization
9.1 LAGRANGE MULTIPLIERS
9.2 FIRST ORDER CONDITIONS
9.3 SECOND ORDER CONDITIONS
9.4 CONVEXITY
9.5 DUALITY
QUESTIONS FOR CHAPTER 9
Chapter 10 Quadratic Programming
10.1 EQUALITY CONSTRAINTS
10.2 LAGRANGIAN METHODS
10.3 ACTIVE SET METHODS
10.4 ADVANCED FEATURES
10.5 SPECIAL QP PROBLEMS
10.6 COMPLEMENTARY PIVOTING AND OTHER METHODS
QUESTIONS FOR CHAPTER 10
Chapter 11 General Linearly Constrained Optimization
11.1 EQUALITY CONSTRAINTS
11.2 INEQUALITY CONSTRAINTS
11.3 ZIGZAGGING
QUESTIONS FOR CHAPTER 11
Chapter 12 Nonlinear Programming
12.1 PENALTY AND BARRIER FUNCTIONS
12.2 MULTIPLIER PENALTY FUNCTIONS
12.3 THE L1 EXACT PENALTY FUNCTION
12.4 THE LAGRANGE-NEWTON METHOD (SQP)
12.5 NONLINEAR ELIMINATION AND FEASIBLE DIRECTION METHODS
12.6 OTHER METHODS
QUESTIONS FOR CHAPTER 12
Chapter 13 Other Optimization Problems
13.1 INTEGER PROGRAMMING
13.2 GEOMETRIC PROGRAMMING
13.3 NETWORK PROGRAMMING
QUESTIONS FOR CHAPTER 13
Chapter 14 Non-Smooth Optimization
14.1 INTRODUCTION
14.2 OPTIMALITY CONDITIONS
14.3 EXACT PENALTY FUNCTIONS
14.4 ALGORITHMS
14.5 A GLOBALLY CONVERGENT PROTOTYPE ALGORITHM
14.6 CONSTRAINED NON-SMOOTH OPTIMIZATION
QUESTIONS FOR CHAPTER 14
References
Subject Index
Copyright © 1987 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England Telephone (+44) 1243 779777
Email (for orders and customer service enquiries): [email protected]. Visit our Home Page on www.wileyeurope.com or www.wiley.co.uk
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770571.
This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN-13: 978 0 471 91547 8 (HB); ISBN-13: 978 0 471 49463 8 (PB)
Preface
The subject of optimization is a fascinating blend of heuristics and rigour, of theory and experiment. It can be studied as a branch of pure mathematics, yet has applications in almost every branch of science and technology. This book aims to present those aspects of optimization methods which are currently of foremost importance in solving real life problems. I strongly believe that it is not possible to do this without a background of practical experience of how methods behave, and I have tried to keep practicality as my central theme. Thus basic methods are described in conjunction with those heuristics which can be valuable in making the methods perform more reliably and efficiently. In fact I have gone so far as to present comparative numerical studies, to give a feel for what is possible, and to show the importance (and difficulty) of assessing such evidence. Yet one cannot exclude the role of theoretical studies in optimization, and the scientist will always be in a better position to use numerical techniques effectively if he understands some of the basic theoretical background. I have tried to present such theory as shows how methods are derived, or gives insight into how they perform, whilst avoiding theory for theory’s sake.
Some people will approach this book looking for a suitable text for undergraduate and postgraduate classes. I have used this material (or a selection from it) at both levels, in introductory engineering courses, in Honours mathematics lectures, and in lecturing to M.Sc. and Ph.D. students. In an attempt to cater for this diversity, I have used a Jekyll and Hyde style in the book, in which the more straightforward material is presented in simple terms, whilst some of the more difficult theoretical material is nonetheless presented rigorously, but can be avoided if need be. I have also tried to present worked examples for most of the basic methods. One observation of my own which I pass on for what it is worth is that the students gain far more from a course if they can be provided with computer subroutines for a few of the standard methods, with which they can perform simple experiments for themselves, to see for example how badly the steepest descent method handles Rosenbrock’s problem, and so on.
In addition to the worked examples, each chapter is terminated by a set of questions which aim to not only illustrate but also extend the material in the text. Many of the questions I have used in tutorial classes or examination papers. The reader may find a calculator (and possibly a programmable calculator) helpful in some cases. A few of the questions are taken from the Dundee Numerical Analysis M.Sc. examination, and are open book questions in the nature of a one day mini research project.
The second edition of the book combines the material in Volumes 1 and 2 of the first edition. Thus unconstrained optimization is the subject of Part 1 and covers the basic theoretical background and standard techniques such as line search methods, Newton and quasi-Newton methods and conjugate direction methods. A feature not common in the literature is a comprehensive treatment of restricted step or trust region methods, which have very strong theoretical properties and are now preferred in a number of situations. The very important field of nonlinear equations and nonlinear least squares (for data fitting applications) is also treated thoroughly. Part 2 covers constrained optimization which overall has a greater degree of complexity on account of the presence of the constraints. I have covered the theory of constrained optimization in a general (albeit standard) way, looking at the effect of first and second order perturbations at the solution. Some books prefer to emphasize the part played by convex analysis and duality in optimization problems. I also describe these features (in what I hope is a straightforward way) but give them lesser priority on account of their lack of generality.
Most finite dimensional problems of a continuous nature have been included in the book but I have generally kept away from problems of a discrete or combinatorial nature since they have an entirely different character and the choice of method can be very specialized. In this case the nearest thing to a general purpose method is the branch and bound method, and since this is a transformation to a sequence of continuous problems of the type covered in this volume, I have included a straightforward description of the technique. A feature of this book which I think is lacking in the literature is a treatment of non-differentiable optimization which is reasonably comprehensive and covers both theoretical and practical aspects adequately. I hope that the final chapter meets this need. The subject of geometric programming is also included in the book because I think that it is potentially valuable, and again I hope that this treatment will turn out to be more straightforward and appealing than others in the literature. The subject of nonlinear programming is covered in some detail but there are difficulties in that this is a very active research area. To some extent therefore the presentation mirrors my assessment and prejudice as to how things will turn out, in the absence of a generally agreed point of view. However, I have also tried to present various alternative approaches and their merits and demerits. Linear constraint programming, on the other hand, is now well developed and here the difficulty is that there are two distinct points of view. One is the traditional approach in which algorithms are presented as generalizations of early linear programming methods which carry out pivoting in a tableau. The other is a more recent approach in terms of active set strategies: I regard this as more intuitive and flexible and have therefore emphasized it, although both methods are presented and their relationship is explored.
This second edition has given me the opportunity to improve the presentation of some parts of the book and to introduce new developments and a certain amount of new material. In Part 1 the description of line searches is improved and some new results are included. The variational properties of the BFGS and DFP methods are now described in some detail. More simple proofs of the properties of trust region methods are given. Recent developments in hybrid methods for nonlinear least squares are described. A thorough treatment of the Dennis–Moré theorem characterizing superlinear convergence in nonlinear systems is given and its significance is discussed. In Part 2 the treatment of linear programming has been extended considerably and includes new methods for stable updating of LU factors and the reliable treatment of degeneracy. Also, important recent developments in polynomial time algorithms are described and discussed, including ellipsoid algorithms and Karmarkar’s method. The treatment of quadratic programming now includes a description of range space and dual active set methods. For general linear constraint programming some new theorems are given, including convergence proofs for a trust region method. The chapter on nonlinear programming now includes an extra section giving a direct treatment of the L1 exact penalty function not requiring any convex analysis. New developments in sequential quadratic programming (SQP) are described, particularly for the case that only the reduced Hessian matrix is used. A completely new section on network programming is given relating numerical linear algebraic and graph theoretic concepts and showing their application in various types of optimization problem. For non-smooth optimization, Osborne’s concept of structure functionals is used to unify the treatment of regularity for second order conditions and to show the equivalence to nonlinear programming. It is also used to demonstrate the second order convergence of a non-smooth SQP algorithm. The Maratos effect and the use of second order corrections are described. Finally a new section giving optimality conditions for constrained composite non-smooth optimization is included. A considerable number of new exercises is also given.
It is a great pleasure to me to acknowledge those many people who have influenced my thinking and contributed to my often inadequate knowledge. Amongst many I must single out the assistance and encouragement given to me by Professor M. J. D. Powell, my former colleague at AERE Harwell, and one whose contributions to the subject are unsurpassed. I am also indebted to Professor A. R. Mitchell and other members of the University of Dundee for providing the stimulating and yet relaxed environment in which this book was prepared. I also wish to thank Professor D. S. Jones for his interest and encouragement in publishing the book, and Drs M. P. Jackson, G. A. Watson and R. S. Womersley for their constructive advice on the contents. I gratefully acknowledge those various people who have taken the trouble to write or otherwise inform me of errors, misconceptions, etc., in text. Whilst this new edition has given me the opportunity to correct previous errors, it has also inevitably enabled me to introduce many new ones for which I apologise in advance. I am also grateful for the invaluable secretarial help that I have received over the years in preparing various drafts of this book.
Last, but foremost, I wish to dedicate this book to my parents and family as some small acknowledgement of their unfailing love and affection.
Dundee, December 1986
R. Fletcher
Table of Notation
Chapter 1 Introduction

1.1 HISTORY AND APPLICATIONS

Optimization might be defined as the science of determining the ‘best’ solutions to certain mathematically defined problems, which are often models of physical reality. It involves the study of optimality criteria for problems, the determination of algorithmic methods of solution, the study of the structure of such methods, and computer experimentation with methods both under trial conditions and on real life problems. There is an extremely diverse range of practical applications. Yet the subject can be studied (not here) as a branch of pure mathematics.
Before 1940 relatively little was known about methods for numerical optimization of functions of many variables. There had been some least squares calculations carried out, and steepest descent type methods had been applied in some physics problems. The Newton method in many variables was known, and more sophisticated methods were being attempted such as the self-consistent field method for variational problems in theoretical chemistry. Nonetheless anything of any complexity demanded armies of assistants operating desk calculating machines. There is no doubt therefore that the advent of the computer was paramount in the development of optimization methods and indeed in the whole of numerical analysis. The 1940s and 1950s saw the introduction and development of the very important branch of the subject known as linear programming. (The term ‘programming’ by the way is synonymous with ‘optimization’ and was originally used to mean optimization in the sense of optimal planning.) All these methods however had a fairly restricted range of application, and again in the post-war period the development of ‘hill-climbing’ methods took place—methods of wide applicability which did not rely on any special structure in the problem. The latter methods were at first very crude and inefficient, but the subject was again revolutionized in 1959 with the publication of a report by W. C. Davidon which led to the introduction of variable metric methods. My friend and colleague M. J. D. Powell describes a meeting he attended in 1961 in which the speakers were telling of the difficulty of minimizing functions of ten variables, whereas he had just programmed a method based on Davidon’s ideas which had solved problems of 100 variables in a short time. Since that time the development of the subject has proceeded apace and has included methods for a wide variety of problems. This book describes these developments in what is hoped will be a systematic and comprehensive way.
The applicability of optimization methods is widespread, reaching into almost every activity in which numerical information is processed (Science, Engineering, Mathematics, Economics, Commerce, etc.). To provide a comprehensive account of all these applications would therefore be unrealistic, but a selection might include:
and applications to other branches of numerical analysis:
Figure 1.1.1 A model distillation column
This book however is not concerned with applications, except insofar as they indicate the different types of optimization problem which arise. It is possible to categorize these into a relatively small number of standard problems and to state algorithms for each one. The user’s task is to discover into what category his problem fits, and then to call up the appropriate optimization subroutine on a computer. This subroutine will specify to the user how the problem data is to be presented, for example nonlinear functions usually have to be programmed in a user-written subroutine in a certain standard format. It is also as well to remember that in practice the solution of an optimization problem is not the only information that the user might need. He will often be interested in the sensitivity of the solution to changes in the parameters, especially so if the mathematical model is not a close approximation to reality, or if he cannot build his design to the same accuracy as the solution. He may indeed be interested in the variation of the solution obtained by varying some parameters over wide ranges, and it is often possible to provide this information without re-solving the problem numerous times.
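To make this calling pattern concrete, the sketch below (an editorial illustration, not part of the original text, which predates such libraries) shows an objective function written as a user subroutine in the standard format expected by a modern library routine; SciPy's minimize is used here, and the function objective, its formula and the starting point x0 are arbitrary choices for the example.

```python
# Illustrative sketch only: supplying a user-written objective routine to a
# library optimizer, in the spirit of the "standard format" described above.
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # user-written subroutine returning f(x); an arbitrary smooth example
    return (x[0] - 3.0) ** 2 + 2.0 * (x[1] + 1.0) ** 2

x0 = np.array([0.0, 0.0])          # user-chosen starting point
result = minimize(objective, x0)   # the library performs the iteration
print(result.x, result.fun)        # estimate of the minimizer and its f value
```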
This book therefore is concerned with some of the various standard optimization problems which can arise. In fact the material is divided into Part 1 and Part 2. Part 1 is devoted to the subject of unconstrained optimization, in which the optimum value is sought of an objective function of many variables, without any constraints. This problem is important in its own right and also as a major tool in solving some constrained problems. Also many of the ideas carry over into constrained optimization. The special case of sums of squares functions, which arise in data fitting problems, is also considered. This also includes the solution of sets of simultaneous nonlinear equations, which is an important problem in its own right, but which is often solved by optimization methods. Part 2 is devoted to constrained optimization in which the additional complication arises of the various types of constraint referred to above. An overview of constrained optimization is given in Chapter 7.
In this book a selection has had to be made amongst the extensive literature about optimization methods. I have been concerned to present practical methods (and associated theory) which have been implemented and for which a body of satisfactory numerical experience exists. I am equally concerned about reliability of algorithms and whether there is proof or good reason to think that convergence to a solution will occur at a reasonably rapid rate. However, I shall also be trying to point out which new ideas in the subject I feel are significant and which might lead to future developments. Many people may read this book seeking a particular algorithm which best solves their specific problem. Such advice is not easy to give, especially in that the decision is not as clear-cut as it may seem. There are many special cases which should be taken into account, for instance the relative ease of computing the function and its derivatives. Similarly, considerations of how best to pose the problem in the first instance are relevant to the choice of method. Finally, and of most importance, the decision is subject to the availability of computer subroutines or packages which implement the methods. However some program libraries now give a decision tree in the documentation to help the user choose his method. Whilst these are valuable, they should only be used as a rough guide, and never as a substitute for common sense or the advice of a specialist in optimization techniques.
1.2 MATHEMATICAL BACKGROUND

The book relies heavily on the concepts and techniques of matrix algebra and numerical linear algebra, which are not set out here (see Broyden, 1975, for example), although brief explanations are given in passing in certain cases. A vector is represented by a lower case bold letter (e.g. a) and usually refers to a column vector. A matrix is referred to by a bold upper case letter (e.g. B).
The ideas of vector spaces are also used, although often only in a simple-minded way. A point x in n-dimensional space (ℝ^n) is the vector (x_1, x_2, …, x_n)^T, where x_1 is the component in the first coordinate direction, and so on. Most of the methods to be described are iterative methods which generate a sequence of points x^(1), x^(2), x^(3), … say, or {x^(k)} (the superscripts denoting iteration number), hopefully converging to a fixed point x* which is the solution of the problem (see Figure 1.2.2). The idea of a line is important; a line is the set of points

x(α) = x′ + αs    (1.2.1)

for all α (sometimes for all α ≥ 0, in which case the set is strictly a half-line), in which x′ is a fixed point on the line and the vector s determines its direction.
Figure 1.2.1 A line in two dimensions
The calculus of any function of many variables, f(x) say, is clearly important. Some pictorial intuition for two variable problems is often gained by drawing contours (surfaces along which f(x) is constant). A well-known test function for optimization methods is Rosenbrock’s function
f(x) = 100(x_2 − x_1^2)^2 + (1 − x_1)^2    (1.2.2)
the contours for which are shown in Figure 1.2.2.

Figure 1.2.2 Contours for Rosenbrock’s function, equation (1.2.2)

Some other contours are illustrated in Figure 6.2.2 in Chapter 6. In general it will be assumed that the problem functions which arise are smooth, that is continuous and continuously (Fréchet) differentiable. Therefore for a function f(x) at any point x there is a vector of first partial derivatives, or gradient vector
g(x) = ∇f(x)    (1.2.3)
where ∇ denotes the gradient operator (∂/∂x_1, …, ∂/∂x_n)^T. If f(x) is twice continuously differentiable (C²) then there exists a matrix of second partial derivatives or Hessian matrix, written ∇²f(x), for which the (i, j)th element is ∂²f/(∂x_i ∂x_j). This matrix is square and symmetric. Since any column (the jth, say) is ∇(∂f/∂x_j), the matrix can strictly be written as ∇(∇f^T). For example, in (1.2.2)
∇f(x) = (−400x_1(x_2 − x_1^2) − 2(1 − x_1),  200(x_2 − x_1^2))^T,
∇²f(x) = [[1200x_1^2 − 400x_2 + 2,  −400x_1], [−400x_1,  200]]    (1.2.4)
and this illustrates that ∇f and ∇²f will in general depend upon x, and vary from point to point. Thus the values of ∇f and ∇²f at any particular point are obtained by substitution into (1.2.4).
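As a check on these expressions, the sketch below (an editorial addition, not from the book) evaluates Rosenbrock's function together with the gradient and Hessian formulae above, and compares the gradient against central finite differences; the starting point (−1.2, 1) is the classical test value.

```python
# Sketch: Rosenbrock's function (1.2.2) with its gradient and Hessian,
# checked against a central finite-difference approximation of the gradient.
import numpy as np

def f(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0] ** 2)])

def hess(x):
    return np.array([[1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

x = np.array([-1.2, 1.0])                      # classical starting point
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(np.allclose(grad(x), fd, atol=1e-4))     # True: formulae agree with differences
print(f(np.array([1.0, 1.0])), grad(np.array([1.0, 1.0])))  # 0.0 and zero gradient at x* = (1, 1)
```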
These expressions can be used to determine the derivatives of f along any line x(α) in (1.2.1). By the chain rule
d/dα = Σ_i (dx_i/dα) ∂/∂x_i = s^T ∇    (1.2.5)
so the slope of f (= f(x(α))) along the line at any point x(α) is
df/dα = s^T ∇f(x(α))    (1.2.6)
Likewise the curvature along the line is
d²f/dα² = s^T ∇²f(x(α)) s    (1.2.7)
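The slope and curvature formulae can be verified numerically. The sketch below (an editorial addition) differentiates f along a line by finite differences in α and compares with s^T∇f and s^T∇²f s; the example function, the point x′ and the direction s are arbitrary choices, not taken from the book.

```python
# Sketch: slope and curvature of f along the line x(alpha) = x' + alpha*s,
# illustrating (1.2.6) and (1.2.7) with an arbitrary smooth example function.
import numpy as np

def f(x):
    return x[0] ** 2 + np.exp(x[0] * x[1])     # example function (not from the book)

def grad(x):
    e = np.exp(x[0] * x[1])
    return np.array([2.0 * x[0] + x[1] * e, x[0] * e])

def hess(x):
    e = np.exp(x[0] * x[1])
    return np.array([[2.0 + x[1] ** 2 * e, (1.0 + x[0] * x[1]) * e],
                     [(1.0 + x[0] * x[1]) * e, x[0] ** 2 * e]])

xp = np.array([0.5, -0.3])                     # a point x' on the line
s = np.array([1.0, 2.0])                       # direction of the line
phi = lambda a: f(xp + a * s)                  # f restricted to the line
h = 1e-5
print(np.isclose(s @ grad(xp), (phi(h) - phi(-h)) / (2 * h)))            # slope, (1.2.6)
print(np.isclose(s @ hess(xp) @ s,
                 (phi(h) - 2 * phi(0.0) + phi(-h)) / h ** 2, rtol=1e-4))  # curvature, (1.2.7)
```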
Special cases of many variable functions include the general linear function which can be written
f(x) = a^T x + b    (1.2.8)
where a and b are constant. (Strictly this should be described as an affine function on account of the existence of the constant b. However, the use of linear to describe a function whose graph is a line (or a hyperplane) is common in optimization. I do not intend to depart from this usage, but apologise to the erudite reader.) If the coordinate vector
(1.2.9)
(1.2.10)
Figure 1.2.3 Properties of the gradient vector
q(x) = ½ x^T G x + b^T x + c    (1.2.11)
where G, b, and c are constant and G is symmetric, or as
(1.2.12)
(1.2.13)
(1.2.14)
g(x) − g(x′) = G(x − x′)    (1.2.15)
that is the Hessian matrix maps differences in position into differences in gradient. This result is used widely.
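For a quadratic this relationship is exact and easy to demonstrate. The sketch below (an editorial addition) uses an arbitrary symmetric G, vector b and scalar c, so the gradient is Gx + b and the difference of gradients at two points is exactly G times the difference in position.

```python
# Sketch: for a quadratic function (1.2.11), differences in gradient are exactly
# the Hessian times differences in position, as in (1.2.15). Data are arbitrary.
import numpy as np

G = np.array([[4.0, 1.0], [1.0, 3.0]])          # constant symmetric Hessian
b = np.array([-1.0, 2.0])
c = 5.0

def q(x):
    return 0.5 * x @ G @ x + b @ x + c          # the quadratic (1.2.11)

def g(x):
    return G @ x + b                            # its gradient

x, xp = np.array([2.0, -1.0]), np.array([0.5, 0.5])
print(np.allclose(g(x) - g(xp), G @ (x - xp)))  # True: exact, with no remainder term
print(q(x), q(xp))                              # function values, for interest
```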
An indispensable technique for handling more general smooth functions of many variables is the Taylor series. For functions of one variable the infinite series is
f(α) = f(0) + αf′(0) + (α²/2!)f″(0) + ⋯ + (α^p/p!)f^(p)(0) + ⋯    (1.2.16)
although the series may be truncated after the term in α^p, replacing f^(p)(0) by f^(p)(ξ) where ξ ∈ [0, α]. An integral form of the remainder can also be used. Now let f(α) = f(x(α)) be the value of a function of many variables along the line x(α) (see (1.2.1)). Then using (1.2.6) and (1.2.7) in (1.2.16)
f(x(α)) = f(x′) + α s^T ∇f(x′) + (α²/2!) s^T ∇²f(x′) s + ⋯    (1.2.17)
f(x′ + h) = f(x′) + h^T ∇f(x′) + ½ h^T ∇²f(x′) h + ⋯    (1.2.18)
These are two forms of the many variable Taylor series. Furthermore, consider applying (1.2.18) to the function ∂f(x)/∂x_i. Since ∇(∂f(x)/∂x_i) is the ith column of the Hessian matrix ∇²f, it follows that
∇f(x′ + h) = ∇f(x′) + ∇²f(x′) h + ⋯    (1.2.19)
which is a Taylor series expansion for the gradient of f. Neglecting the higher order terms in the limit h → 0, this reduces to (1.2.15), showing that a general function behaves like a quadratic function in a sufficiently small neighbourhood of x′.
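The way the neglected terms disappear can be seen numerically. In the sketch below (an editorial addition) the error in the gradient expansion is computed for Rosenbrock's function along a fixed direction; reducing the step size by a factor of ten reduces the error by roughly a factor of a hundred, i.e. the remainder is of second order in ‖h‖, so the quadratic model becomes exact in the limit.

```python
# Sketch: the error in the gradient expansion (1.2.19) for Rosenbrock's function
# shrinks like ||h||**2, so f behaves like a quadratic near any point x'.
import numpy as np

def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0] ** 2)])

def hess(x):
    return np.array([[1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

xp = np.array([-1.2, 1.0])                     # expansion point x'
d = np.array([0.6, -0.8])                      # fixed unit direction
for t in (1e-1, 1e-2, 1e-3):
    h = t * d
    err = np.linalg.norm(grad(xp + h) - grad(xp) - hess(xp) @ h)
    print(t, err)                              # err decreases roughly like t**2
```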
It is hoped that a grasp of simple mathematical concepts such as these will enable the reader to follow most of the developments in the book. In certain places more complicated mathematics is used without detailed explanation. This is usually in an attempt to establish important results rigorously; however, such passages can often be skipped over without losing the thread of the explanation. A summary of the notations used in the book is given immediately following the Preface.
QUESTIONS FOR CHAPTER 1

1.1. Obtain expressions for the gradient vector and Hessian matrix for the functions of n variables:
1.3. Write down the Taylor expansion for the m-vector f(x) about x′, where ∇f^T is denoted by A.
1.5. At a point x′ for which g′ ≠ 0, show that the direction vectors ±g′ are orthogonal to the contour surface and the tangent plane at x′.
(Some other questions which partly refer to the material of Section 1.2 are given at the end of Chapter 2.)
Chapter 2 Structure of Methods

2.1 CONDITIONS FOR LOCAL MINIMA

In the following chapters the problem of finding a local solution to the problem
minimize f(x),   x ∈ ℝ^n    (2.1.1)
is considered. f(x) is referred to as the objective function, and the minimizing point or minimizer is denoted by x*. Optimization problems also exist which are maximization problems and these can be cast in the form of (2.1.1) through the simple transformation
max_x f(x) = −min_x [−f(x)]    (2.1.2)
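As a foretaste of the methods discussed in the following chapters, the sketch below (an editorial addition, not an algorithm from the book) applies a crude steepest descent iteration with a simple backtracking line search to Rosenbrock's function; the acceptance constant 1e-4, the halving factor and the iteration limit are arbitrary choices. Its very slow progress on this problem is the behaviour referred to in the Preface.

```python
# Sketch: crude steepest descent with backtracking on Rosenbrock's function (1.2.2).
# Progress towards the minimizer x* = (1, 1) is typically very slow.
import numpy as np

def f(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def g(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0] ** 2)])

x = np.array([-1.2, 1.0])                      # classical starting point
for k in range(1000):
    s = -g(x)                                  # steepest descent direction
    alpha = 1.0
    while f(x + alpha * s) > f(x) - 1e-4 * alpha * (s @ s):
        alpha *= 0.5                           # backtrack until sufficient decrease
    x = x + alpha * s
print(x, f(x))                                 # typically still short of (1, 1) after 1000 steps
```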
Continue reading in the full edition!
