Optimal Learning

Warren B. Powell
Learn the science of collecting information to make effective decisions.

Everyday decisions are made without the benefit of accurate information. Optimal Learning develops the needed principles for gathering information to make decisions, especially when collecting information is time-consuming and expensive. Designed for readers with an elementary background in probability and statistics, the book presents effective and practical policies illustrated in a wide range of applications, from energy, homeland security, and transportation to engineering, health, and business.

This book covers the fundamental dimensions of a learning problem and presents a simple method for testing and comparing policies for learning. Special attention is given to the knowledge gradient policy and its use with a wide range of belief models, including lookup table and parametric models, for both online and offline problems. Three sections develop ideas with increasing levels of sophistication:

* Fundamentals explores fundamental topics, including adaptive learning, ranking and selection, the knowledge gradient, and bandit problems
* Extensions and Applications features coverage of linear belief models, subset selection models, scalar function optimization, optimal bidding, and stopping problems
* Advanced Topics explores complex methods including simulation optimization, active learning in mathematical programming, and optimal continuous measurements

Each chapter identifies a specific learning problem, presents the related, practical algorithms for implementation, and concludes with numerous exercises. A related website features additional applications and downloadable software, including MATLAB and the Optimal Learning Calculator, a spreadsheet-based package that provides an introduction to learning and a variety of policies for learning.

CONTENTS

PREFACE

ACKNOWLEDGMENTS

CHAPTER 1 THE CHALLENGES OF LEARNING

1.1 LEARNING THE BEST PATH

1.2 AREAS OF APPLICATION

1.3 MAJOR PROBLEM CLASSES

1.4 THE DIFFERENT TYPES OF LEARNING

1.5 LEARNING FROM DIFFERENT COMMUNITIES

1.6 INFORMATION COLLECTION USING DECISION TREES

1.7 WEBSITE AND DOWNLOADABLE SOFTWARE

1.8 GOALS OF THIS BOOK

PROBLEMS

CHAPTER 2 ADAPTIVE LEARNING

2.1 THE FREQUENTIST VIEW

2.2 THE BAYESIAN VIEW

2.3 UPDATING FOR NON-GAUSSIAN PRIORS

2.4 MONTE CARLO SIMULATION

2.5 WHY DOES IT WORK?*

2.6 BIBLIOGRAPHIC NOTES

PROBLEMS

CHAPTER 3 THE ECONOMICS OF INFORMATION

3.1 AN ELEMENTARY INFORMATION PROBLEM

3.2 THE MARGINAL VALUE OF INFORMATION

3.3 AN INFORMATION ACQUISITION PROBLEM

3.4 BIBLIOGRAPHIC NOTES

PROBLEMS

CHAPTER 4 RANKING AND SELECTION

4.1 THE MODEL

4.2 MEASUREMENT POLICIES

4.3 EVALUATING POLICIES

4.4 MORE ADVANCED TOPICS*

4.5 BIBLIOGRAPHIC NOTES

PROBLEMS

CHAPTER 5 THE KNOWLEDGE GRADIENT

5.1 THE KNOWLEDGE GRADIENT FOR INDEPENDENT BELIEFS

5.2 THE VALUE OF INFORMATION AND THE S-CURVE EFFECT

5.3 KNOWLEDGE GRADIENT FOR CORRELATED BELIEFS

5.4 ANTICIPATORY VERSUS EXPERIENTIAL LEARNING

5.5 THE KNOWLEDGE GRADIENT FOR SOME NON-GAUSSIAN DISTRIBUTIONS

5.6 RELATIVES OF THE KNOWLEDGE GRADIENT

5.7 THE PROBLEM OF PRIORS

5.8 DISCUSSION

5.9 WHY DOES IT WORK?*

5.10 BIBLIOGRAPHIC NOTES

PROBLEMS

CHAPTER 6 BANDIT PROBLEMS

6.1 THE THEORY AND PRACTICE OF GITTINS INDICES

6.2 VARIATIONS OF BANDIT PROBLEMS

6.3 UPPER CONFIDENCE BOUNDING

6.4 THE KNOWLEDGE GRADIENT FOR BANDIT PROBLEMS

6.5 BIBLIOGRAPHIC NOTES

PROBLEMS

CHAPTER 7 ELEMENTS OF A LEARNING PROBLEM

7.1 THE STATES OF OUR SYSTEM

7.2 TYPES OF DECISIONS

7.3 EXOGENOUS INFORMATION

7.4 TRANSITION FUNCTIONS

7.5 OBJECTIVE FUNCTIONS

7.6 EVALUATING POLICIES

7.7 DISCUSSION

7.8 BIBLIOGRAPHIC NOTES

PROBLEMS

CHAPTER 8 LINEAR BELIEF MODELS

8.1 APPLICATIONS

8.2 A BRIEF REVIEW OF LINEAR REGRESSION

8.3 THE KNOWLEDGE GRADIENT FOR A LINEAR MODEL

8.4 APPLICATION TO DRUG DISCOVERY

8.5 APPLICATION TO DYNAMIC PRICING

8.6 BIBLIOGRAPHIC NOTES

PROBLEMS

CHAPTER 9 SUBSET SELECTION PROBLEMS

9.1 APPLICATIONS

9.2 CHOOSING A SUBSET USING RANKING AND SELECTION

9.3 LARGER SETS

9.4 VERY LARGE SETS

9.5 BIBLIOGRAPHIC NOTES

PROBLEMS

CHAPTER 10 OPTIMIZING A SCALAR FUNCTION

10.1 DETERMINISTIC MEASUREMENTS

10.2 STOCHASTIC MEASUREMENTS

10.3 BIBLIOGRAPHIC NOTES

PROBLEMS

CHAPTER 11 OPTIMAL BIDDING

11.1 MODELING CUSTOMER DEMAND

11.2 BAYESIAN MODELING FOR DYNAMIC PRICING

11.3 BIDDING STRATEGIES

11.4 WHY DOES IT WORK?*

11.5 BIBLIOGRAPHIC NOTES

PROBLEMS

CHAPTER 12 STOPPING PROBLEMS

12.1 SEQUENTIAL PROBABILITY RATIO TEST

12.2 THE SECRETARY PROBLEM

12.3 BIBLIOGRAPHIC NOTES

PROBLEMS

CHAPTER 13 ACTIVE LEARNING IN STATISTICS

13.1 DETERMINISTIC POLICIES

13.2 SEQUENTIAL POLICIES FOR CLASSIFICATION

13.3 A VARIANCE-MINIMIZING POLICY

13.4 MIXTURES OF GAUSSIANS

13.5 BIBLIOGRAPHIC NOTES

CHAPTER 14 SIMULATION OPTIMIZATION

14.1 INDIFFERENCE ZONE SELECTION

14.2 OPTIMAL COMPUTING BUDGET ALLOCATION

14.3 MODEL-BASED SIMULATED ANNEALING

14.4 OTHER AREAS OF SIMULATION OPTIMIZATION

14.5 BIBLIOGRAPHIC NOTES

CHAPTER 15 LEARNING IN MATHEMATICAL PROGRAMMING

15.1 APPLICATIONS

15.2 LEARNING ON GRAPHS

15.3 ALTERNATIVE EDGE SELECTION POLICIES

15.4 LEARNING COSTS FOR LINEAR PROGRAMS*

15.5 BIBLIOGRAPHIC NOTES

CHAPTER 16 OPTIMIZING OVER CONTINUOUS MEASUREMENTS

16.1 THE BELIEF MODEL

16.2 SEQUENTIAL KRIGING OPTIMIZATION

16.3 THE KNOWLEDGE GRADIENT FOR CONTINUOUS PARAMETERS*

16.4 EFFICIENT GLOBAL OPTIMIZATION

16.5 EXPERIMENTS

16.6 EXTENSION TO HIGHER-DIMENSIONAL PROBLEMS

16.7 BIBLIOGRAPHIC NOTES

CHAPTER 17 LEARNING WITH A PHYSICAL STATE

17.1 INTRODUCTION TO DYNAMIC PROGRAMMING

17.2 SOME HEURISTIC LEARNING POLICIES

17.3 THE LOCAL BANDIT APPROXIMATION

17.4 THE KNOWLEDGE GRADIENT IN DYNAMIC PROGRAMMING

17.5 AN EXPECTED IMPROVEMENT POLICY

17.6 BIBLIOGRAPHIC NOTES

INDEX

WILEY SERIES IN PROBABILITY AND STATISTICS

Copyright © 2012 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Powell, Warren B., 1955–

Optimal learning / Warren B. Powell, Operations Research and Financial Engineering, Princeton University, Ilya O. Ryzhov, Robert H. Smith School of Business, University of Maryland.

pages cm. — (Wiley series in probability and statistics)

Includes bibliographical references and index.

ISBN 978-0-470-59669-2 (hardback)

1. Machine learning. I. Ryzhov, Ilya Olegovich, 1985– II. Title.

Q325.5.P69 2012

006.3′1—dc23

2011047629

10 9 8 7 6 5 4 3 2 1

To our families

PREFACE

This book emerged from a stream of research conducted in CASTLE Laboratory at Princeton University during the period 2006–2011. Initially, the work was motivated by the “exploration vs. exploitation” problem that arises in the design of algorithms for approximate dynamic programming, where it may be necessary to visit a state to learn the value of being in the state. However, we quickly became aware that this basic question had many applications outside of dynamic programming.

The results of this research were made possible by the efforts and contributions of numerous colleagues. The work was conducted under the guidance and supervision of Warren Powell, founder and director of CASTLE Lab. Key contributors include Peter Frazier, Ilya Ryzhov, Warren Scott, and Emre Barut, all graduate students at the time; Martijn Mes, a post-doctoral associate from the University of Twente in the Netherlands; and Diana Negoescu, Gerald van den Berg, and Will Manning, then undergraduate students. The earliest work by Peter Frazier recognized the power of a one-step look-ahead policy, which was named the knowledge gradient, in offline (ranking and selection) problems. The true potential of this idea, however, was realized later with two developments. The first, by Peter Frazier, adapted the knowledge gradient concept to an offline problem with correlated beliefs about different alternatives. This result made it possible to learn about thousands of discrete alternatives with very small measurement budgets.

The second development, by Ilya Ryzhov, made a connection between the knowledge gradient for offline problems and the knowledge gradient for online problems. This relationship links two distinct communities: ranking and selection in statistics and simulation, and the multi-armed bandit problem in applied probability and computer science. This link provides a tractable approach to multi-armed bandit problems with correlated beliefs, a relatively new extension of the well-known bandit model.

This foundation led to a number of additional results. Peter Frazier and Diana Negoescu created a version of the algorithm for problems where the belief structure is given by a linear model, which made it possible to tackle a problem in drug discovery, sorting through 87,000 possible drug combinations with just a few hundred experiments. Martijn Mes adapted the knowledge gradient approach to create an algorithm for a nonparametric model where the value of an alternative was represented through a hierarchical aggregation model. Emre Barut adapted this result for a nonparametric belief model using kernel regression. Warren Scott derived a very difficult but powerful algorithm for problems where the choice of what to measure is a multidimensional vector of continuous parameters, which was applied to calibrate an industrial simulator for airline operations. Ilya Ryzhov then used the knowledge gradient idea to connect optimal learning with classical mathematical programming, making it possible to incorporate learning issues into fundamental optimization models such as linear programs. As of this writing, we are developing (with Gerald van den Berg) a method to handle the exploration issue in approximate dynamic programming – the very problem that we originally set out to solve.

The work inspired an undergraduate course at Princeton University called “Optimal Learning.” Over several years, it has repeatedly attracted talented and enthusiastic students who have produced a creative collection of projects. This book evolved out of lecture notes written the first year that the course was offered. Indeed, the course covers roughly the first seven chapters, with other topics selected from the second half as time permits. The book is designed to be accessible to an advanced undergraduate audience, and presents an overview of the extensive body of research we compiled around the idea of the knowledge gradient. However, we also kept another goal in mind: to recognize the important contributions that have been made by a number of different communities such as economics, computer science, applied probability, simulation optimization, stochastic search, and ranking and selection.

The languages of these different communities have posed a particular challenge. For example, we use the term “online learning” to refer to learning where we have to live with the rewards we receive while also learning to improve decisions in the future, while some use this same term to refer to any sequential learning policy. Different communities are shaped by specific classes of applications whose characteristics guide the choice of algorithms. The application setting is rarely transparent in the mathematics, complicating the evaluation of competing methods that have been designed with specific issues in mind. In the multi-armed bandit literature, an alternative x is always referred to as a “bandit,” even if x is continuous and vector-valued. Even within this literature, the preferred techniques for solving these problems are quite different in applied probability (Gittins indices) and computer science (upper confidence bounding).

Audience

The book is aimed primarily at an advanced undergraduate or masters-level audience with a course in statistics and a full course in probability. The course can easily be taught at a Ph.D. level by putting more emphasis on the derivations and supporting theory, which is quite deep. However, the book was written for students and practitioners who are interested in practical tools for real problems. For this reason, the core of each chapter focuses on a specific learning problem and presents practical, implementable algorithms.

The later chapters cover material that is more advanced, including (a) learning on graphs and linear programs and (b) learning where the alternatives are continuous (and possibly vector-valued). We have provided chapters designed to build bridges to communities such as simulation optimization and machine learning. This material is designed to help Ph.D. students and researchers understand the many communities that have contributed to the general area of optimal learning.

While every effort has been made to make this material as accessible as possible, the theory supporting this field can be quite subtle. More advanced material is indicated with an * in the section title. Some derivations and proofs are provided in sections called “Why does it work?” These sections give more advanced students a more thorough foundation, but they can be skipped without loss of continuity.

Organization of the Book

The book is roughly organized into three parts:

Part I: Fundamentals

Chapter 1 - The Challenges of Learning
Chapter 2 - Adaptive Learning
Chapter 3 - The Economics of Information
Chapter 4 - Ranking and Selection
Chapter 5 - The Knowledge Gradient
Chapter 6 - Bandit Problems
Chapter 7 - Elements of a Learning Problem

Part II: Extensions and Applications

Chapter 8 - Linear Belief Models
Chapter 9 - Subset Selection Problems
Chapter 10 - Optimizing a Scalar Function
Chapter 11 - Optimal Bidding
Chapter 12 - Stopping Problems

Part III: Advanced Topics

Chapter 13 - Active Learning in Statistics
Chapter 14 - Simulation Optimization
Chapter 15 - Learning in Mathematical Programming
Chapter 16 - Optimizing over Continuous Measurements
Chapter 17 - Learning with a Physical State

The book is used as a textbook for an undergraduate course at Princeton. In this setting, Part I covers the foundations of the course. This material is supplemented with traditional weekly problem sets and a midterm exam (two hourly exams would also work well here). Each of these chapters has a relatively large number of exercises to help students develop their understanding of the material.

After this foundational material is covered, students work in teams of two to design a project that involves the efficient collection of information. In the initial problem definition, it is important for students to clearly identify the information that is being collected, the implementation decision (which may differ from the decision of what to measure, but does not always), and the metric used to evaluate the quality of the implementation decision.

While the students work on their projects, the course continues to work through most of the topics in Part II. The material on linear belief models is particularly useful in many of the student projects, as is the subset selection chapter. Sometimes it is useful to prioritize the material being presented based on the topics that the students have chosen. The chapters in Part II have a small number of exercises, many of which require the use of downloadable MATLAB software to help with the implementation of these more difficult algorithms.

Part III of the book is advanced material and is intended primarily for researchers and professionals interested in using the book as a reference volume. These chapters are not accompanied by exercises, since the material here is more difficult and would require the use of fairly sophisticated software packages.

Additional material for the book is available at the website:

http://optimallearning.princeton.edu/

Downloadable software is provided for several of the most important algorithms, and a fairly elaborate implementation, the Optimal Learning Calculator, is available as a spreadsheet interface calling a sophisticated Java library. Further reading, software, sample projects, and additional thoughts about the field will be made available here.

WARREN B. POWELL AND ILYA O. RYZHOV

Princeton University
University of Maryland

October 2011

ACKNOWLEDGMENTS

This book reflects the contributions of a number of students at Princeton University. Special thanks go to Peter Frazier, who developed the initial framework that shaped our research in this area. Warren Scott, Diana Negoescu, Martijn Mes, and Emre Barut all made important contributions, and their enthusiastic participation in all stages of this research is gratefully acknowledged.

The Optimal Learning Calculator was co-written by Gerald van den Berg and Will Manning, working as undergraduate interns. The spreadsheet-based interface has provided valuable insights into the behavior of different learning policies.

We are very grateful to the students of ORF 418, Optimal Learning, who put up with the formative development of these ideas and who provided an extensive list of creative projects to highlight potential applications of information collection.

The research was supported primarily by the Air Force Office of Scientific Research through the discrete mathematics program headed by Don Hearn, with additional support from the Department of Homeland Security through the CCICADA Center at Rutgers under the leadership of Fred Roberts, and from the National Science Foundation. Special thanks also go to SAP, with the enthusiastic support of Bill McDermott and Paul Hofmann, along with the many corporate partners of CASTLE Laboratory and PENSA who have provided the challenging problems that motivated this research.

W. B. P. and I. O. R.

CHAPTER 1

THE CHALLENGES OF LEARNING

We are surrounded by situations where we need to make a decision or solve a problem, but where we do not know some or all of the relevant information perfectly. Will the path recommended by my navigation system get me to my appointment on time? Am I charging the right price for my product, and do I have the best set of features? Will a new material make batteries last longer? Will a molecular compound help reduce a cancer tumor? If I turn my retirement fund over to this investment manager, will I be able to outperform the market? Some decisions have a simple structure (which investment advisor should I use?), while others require complex planning (how do I deploy a team of security agents to assess the safety of a set of food processing plants?). Sometimes we have to learn as we go (observing the sales of a book at a particular price), while in other cases we may have a budget to collect information before making a final decision.

There are some decision problems that are hard even if we have access to perfectly accurate information about our environment: planning routes for aircraft and pilots, optimizing the movements of vehicles to pick up and deliver goods, or scheduling machines to finish a set of jobs on time. This is known as deterministic optimization. Then there are other situations where we have to make decisions under uncertainty, but where we assume we know the probability distributions of the uncertain quantities: How do I allocate investments to minimize risk while maintaining a satisfactory return, or how do I optimize the storage of energy given uncertainties about demands from consumers? This is known as stochastic optimization.

In this book, we introduce problems where the probability distributions themselves are unknown, but where we have the opportunity to collect new information to improve our understanding of what they are. We are primarily interested in problems where the cost of the information is considered “significant,” which is to say that we are willing to spend some time thinking about how to collect the information in an effective way. What this means, however, is highly problem-dependent. We are willing to spend quite a bit before we drill a $10 million hole hoping to find oil, but we may be willing to invest only a small effort before determining the next measurement inside a search algorithm running on a computer.

The modeling of learning problems, which might be described as “learning how to learn,” can be fairly difficult. While expectations are at least well-defined for stochastic optimization problems, they take on subtle interpretations when we are actively changing the underlying probability distributions. For this reason, we tend to work on what might otherwise look like very simple problems. Fortunately, there are very many “simple problems” which would be trivial if we only knew the values of all the parameters, but which pose unexpected challenges when we lack information.

1.1 LEARNING THE BEST PATH

Consider the problem of finding the fastest way to get from your new apartment to your new job in Manhattan. We can find a set of routes from the Internet or from our GPS device, but we do not know anything about traffic congestion or subway delays. The only way we can get data to estimate actual delays on a path is to travel the path. We wish to devise a strategy that governs how we choose paths so that we strike a balance between experimenting with new paths and getting to work on time every day.

Suppose that $x^m_p$ is the travel time we observe the $m$-th time we travel path $p$, and let $N^n_p$ be the number of times we have traveled path $p$ after $n$ iterations. A natural estimate of the mean travel time on path $p$ is the sample average

$$\bar{\theta}^n_p = \frac{1}{N^n_p} \sum_{m=1}^{N^n_p} x^m_p, \qquad (1.1)$$

while the sample variance of the observed travel times is

$$\hat{\sigma}^{2,n}_p = \frac{1}{N^n_p - 1} \sum_{m=1}^{N^n_p} \left( x^m_p - \bar{\theta}^n_p \right)^2. \qquad (1.2)$$

Figure 1.1 A simple shortest path problem, giving the current estimate of the mean and standard deviation (of the estimate) for each path.

Note that $\hat{\sigma}^{2,n}_p$ is our estimate of the variance of the travel time on path $p$ by iteration $n$ (assuming we have visited path $p$ a total of $N^n_p$ times). The variance of our estimate of the mean, $\bar{\sigma}^{2,n}_p$, is given by

$$\bar{\sigma}^{2,n}_p = \frac{\hat{\sigma}^{2,n}_p}{N^n_p}. \qquad (1.3)$$
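To make these formulas concrete, here is a minimal sketch (ours, not from the book; the path names and travel times are hypothetical) that computes the sample mean (1.1), the sample variance (1.2), and the variance of the estimate of the mean (1.3) for each path:

```python
import numpy as np

# Hypothetical observed travel times (in minutes) for each path.
observations = {
    "tunnel": [42.0, 55.0, 48.0],
    "bridge": [60.0, 52.0],
    "train":  [51.0, 50.0, 53.0, 49.0],
}

for path, times in observations.items():
    x = np.array(times)
    N = len(x)                       # N_p^n: number of visits to path p so far
    theta_bar = x.mean()             # (1.1) sample mean of the travel times
    sigma2_hat = x.var(ddof=1)       # (1.2) sample variance (requires N >= 2)
    sigma2_bar = sigma2_hat / N      # (1.3) variance of our estimate of the mean
    print(f"{path}: mean={theta_bar:.1f}, var={sigma2_hat:.1f}, "
          f"var of mean={sigma2_bar:.2f}")
```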

Now we face the challenge: Which path should we try? Let’s start by assuming that you just started a new job and you have been to the Internet to find different paths, but you have not tried any of them. If your job involves commuting from a New Jersey suburb into Manhattan, you have a mixture of options that include driving (various routes) and commuter train, with different transit options once you arrive in Manhattan. But you do have an idea of the length of each path, and you may have heard some stories about delays through the tunnel into Manhattan, as well as a few stories about delayed trains. From this, you construct a rough estimate of the travel time on each path, and we are going to assume that you have at least a rough idea of how far off these estimates may be. We denote these initial estimates using $\bar{\theta}^0_p$ for the mean travel time on path $p$, with $\bar{\sigma}^0_p$ capturing how uncertain we are about this estimate.
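How to choose which path to measure is the subject of the chapters that follow. Purely as an illustrative baseline (this sketch is ours, not a policy from the book, and the priors and “true” travel times are hypothetical), a simple epsilon-greedy rule usually takes the path that currently looks fastest but occasionally explores another path at random, updating the running estimate for whichever path was traveled:

```python
import random

# Hypothetical prior estimates of mean travel time (minutes) per path.
theta = {"tunnel": 45.0, "bridge": 55.0, "train": 50.0}
count = {p: 0 for p in theta}   # N_p: number of times each path has been tried

def observe(path):
    """Stand-in for an actual commute; the true means are unknown to the commuter."""
    true_mean = {"tunnel": 50.0, "bridge": 53.0, "train": 49.0}
    return random.gauss(true_mean[path], 5.0)

epsilon = 0.1                   # fraction of days spent exploring
for day in range(100):
    if random.random() < epsilon:
        path = random.choice(list(theta))    # explore: try a random path
    else:
        path = min(theta, key=theta.get)     # exploit: take the apparent best
    x = observe(path)
    count[path] += 1
    # Incremental update of the sample mean for the path we traveled.
    theta[path] += (x - theta[path]) / count[path]

print({p: round(v, 1) for p, v in theta.items()})
```

Note that with this update, the prior is simply overwritten by the first observation; later chapters of the book show how to combine a prior with observations in a principled (Bayesian) way.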

Read on in the full edition!