A thought-provoking look at statistical learning theory and its role in understanding human learning and inductive reasoning
A joint endeavor from leading researchers in the fields of philosophy and electrical engineering, An Elementary Introduction to Statistical Learning Theory is a comprehensive and accessible primer on the rapidly evolving fields of statistical pattern recognition and statistical learning theory. Explaining these areas at a level and in a way that is not often found in other books on the topic, the authors present the basic theory behind contemporary machine learning and uniquely utilize its foundations as a framework for philosophical thinking about inductive inference.
In keeping with the fundamental goal of statistical learning theory, knowing what is achievable and what is not, this book demonstrates the value of a systematic methodology together with the techniques needed to evaluate the performance of a learning system. First, an introduction to machine learning is presented that includes brief discussions of applications such as image recognition, speech recognition, medical diagnostics, and statistical arbitrage. To enhance accessibility, two chapters on relevant aspects of probability theory are provided. Subsequent chapters cover topics such as the pattern recognition problem, the optimal Bayes decision rule, the nearest neighbor rule, kernel rules, neural networks, support vector machines, and boosting.
Appendices throughout the book explore the relationship between the discussed material and related topics from mathematics, philosophy, psychology, and statistics, drawing insightful connections between problems in these areas and statistical learning theory. All chapters conclude with a summary section, a set of practice questions, and a references section that supplies historical notes and additional resources for further study.
An Elementary Introduction to Statistical Learning Theory is an excellent book for courses on statistical learning theory, pattern recognition, and machine learning at the upper-undergraduate and graduate levels. It also serves as an introductory reference for researchers and practitioners in the fields of engineering, computer science, philosophy, and cognitive science who would like to further their knowledge of the topic.
Page count: 419
Year of publication: 2011
Table of Contents
Series Page
Title Page
Copyright
Preface
Chapter 1: Introduction: Classification, Learning, Features, and Applications
1.1 Scope
1.2 Why Machine Learning?
1.3 Some Applications
1.4 Measurements, Features, and Feature Vectors
1.5 The Need for Probability
1.6 Supervised Learning
1.8 Appendix: Induction
Chapter 2: Probability
2.1 Probability of Some Basic Events
2.2 Probabilities of Compound Events
2.3 Conditional Probability
2.4 Drawing Without Replacement
2.5 A Classic Birthday Problem
2.6 Random Variables
2.7 Expected Value
2.8 Variance
2.10 Appendix: Interpretations of Probability
Chapter 3: Probability Densities
3.1 An Example in Two Dimensions
3.2 Random Numbers in [0,1]
3.3 Density Functions
3.4 Probability Densities in Higher Dimensions
3.5 Joint and Conditional Densities
3.6 Expected Value and Variance
3.7 Laws of Large Numbers
3.9 Appendix: Measurability
Chapter 4: The Pattern Recognition Problem
4.1 A Simple Example
4.2 Decision Rules
4.3 Success Criterion
4.4 The Best Classifier: Bayes Decision Rule
4.5 Continuous Features and Densities
4.7 Appendix: Uncountably Many
Chapter 5: The Optimal Bayes Decision Rule
5.1 Bayes Theorem
5.2 Bayes Decision Rule
5.3 Optimality and Some Comments
5.4 An Example
5.5 Bayes Theorem and Decision Rule with Densities
5.7 Appendix: Defining Conditional Probability
Chapter 6: Learning from Examples
6.1 Lack of Knowledge of Distributions
6.2 Training Data
6.3 Assumptions on the Training Data
6.4 A Brute Force Approach to Learning
6.5 Curse of Dimensionality, Inductive Bias, and No Free Lunch
6.7 Appendix: What Sort of Learning?
Chapter 7: The Nearest Neighbor Rule
7.1 The Nearest Neighbor Rule
7.2 Performance of the Nearest Neighbor Rule
7.3 Intuition and Proof Sketch of Performance*
7.4 Using More Neighbors
7.6 Appendix: When People Use Nearest Neighbor Reasoning
Chapter 8: Kernel Rules
8.1 Motivation
8.2 A Variation on Nearest Neighbor Rules
8.3 Kernel Rules
8.4 Universal Consistency of Kernel Rules
8.5 Potential Functions
8.6 More General Kernels
8.8 Appendix: Kernels, Similarity, and Features
Chapter 9: Neural Networks: Perceptrons
9.1 Multilayer Feedforward Networks
9.2 Neural Networks for Learning and Classification
9.3 Perceptrons
9.4 Learning Rule for Perceptrons
9.5 Representational Capabilities of Perceptrons
9.7 Appendix: Models of Mind
Chapter 10: Multilayer Networks
10.1 Representation Capabilities of Multilayer Networks
10.2 Learning and Sigmoidal Outputs
10.3 Training Error and Weight Space
10.4 Error Minimization by Gradient Descent
10.5 Backpropagation
10.6 Derivation of Backpropagation Equations*
10.8 Appendix: Gradient Descent and Reasoning toward Reflective Equilibrium
Chapter 11: PAC Learning
11.1 Class of Decision Rules
11.2 Best Rule from a Class
11.3 Probably Approximately Correct Criterion
11.4 PAC Learning
11.6 Appendix: Identifying Indiscernibles
Chapter 12: VC Dimension
12.1 Approximation and Estimation Errors
12.2 Shattering
12.3 VC Dimension
12.4 Learning Result
12.5 Some Examples
12.6 Application to Neural Nets
12.8 Appendix: VC Dimension and Popper Dimension
Chapter 13: Infinite VC Dimension
13.1 A Hierarchy of Classes and Modified PAC Criterion
13.2 Misfit Versus Complexity Trade-Off
13.3 Learning Results
13.4 Inductive Bias and Simplicity
13.6 Appendix: Uniform Convergence and Universal Consistency
Chapter 14: The Function Estimation Problem
14.1 Estimation
14.2 Success Criterion
14.3 Best Estimator: Regression Function
14.4 Learning in Function Estimation
14.6 Appendix: Regression Toward the Mean
Chapter 15: Learning Function Estimation
15.1 Review of the Function Estimation/Regression Problem
15.2 Nearest Neighbor Rules
15.3 Kernel Methods
15.4 Neural Network Learning
15.5 Estimation with a Fixed Class of Functions
15.6 Shattering, Pseudo-Dimension, and Learning
15.7 Conclusion
15.8 Appendix: Accuracy, Precision, Bias, and Variance in Estimation
Chapter 16: Simplicity
16.1 Simplicity in Science
16.2 Ordering Hypotheses
16.3 Two Examples
16.4 Simplicity as Simplicity of Representation
16.5 Pragmatic Theory of Simplicity
16.6 Simplicity and Global Indeterminacy
16.8 Appendix: Basic Science and Statistical Learning Theory
Chapter 17: Support Vector Machines
17.1 Mapping the Feature Vectors
17.2 Maximizing the Margin
17.3 Optimization and Support Vectors
17.4 Implementation and Connection to Kernel Methods
17.5 Details of the Optimization Problem*
17.7 Appendix: Computation
Chapter 18: Boosting
18.1 Weak Learning Rules
18.2 Combining Classifiers
18.3 Distribution on the Training Examples
18.4 The AdaBoost Algorithm
18.5 Performance on Training Data
18.6 Generalization Performance
18.8 Appendix: Ensemble Methods
Bibliography
Author Index
Subject Index
Copyright © 2010 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Kulkarni, Sanjeev.
An elementary introduction to statistical learning theory / Sanjeev Kulkarni, Gilbert Harman. p. cm.
Includes index.
ISBN 978-0-470-64183-5 (cloth)
1. Machine learning-Statistical methods. 2. Pattern recognition systems. I. Harman, Gilbert. II. Title.
Q325.5.K85 2011
006.3′1–dc22
2010045223
Preface
This book offers a broad and accessible introduction to the relatively new field of statistical learning theory, a field that has emerged from engineering studies of pattern recognition and machine learning, developments in nonparametric statistics, computer science, the study of language learning in linguistics, developmental and cognitive psychology, the philosophical problem of induction, and the philosophy of science and method.
The book is the product of a very successful introductory course on “Learning Theory and Epistemology” that we have been teaching jointly in electrical engineering and philosophy at Princeton University. The course is open to all students and has no specific prerequisites other than some analytical skills and intellectual curiosity. Although much of the material is technical, we have found that the main points are both accessible to and appreciated by a broad range of students. In each class, our students have included freshmen through seniors, with majors from the sciences, engineering, humanities, and social sciences.
The engineering study of pattern recognition is concerned with developing automated systems to discriminate between various inputs in a useful way. How can the post office develop systems to scan and sort mail on the basis of hand-written addresses? How can a manufacturer design a computerized system to transcribe ordinary conversations? Can computers be used to analyze medical images to make diagnoses?
Machine learning provides an efficient way to approach some pattern recognition problems. It is possible to train a system to recognize handwritten zip codes. Automated systems can interact with users to learn to perform speech recognition. A computer might use machine learning to develop a system that can analyze medical images in the way that experts do.
Machine learning and pattern recognition are also concerned with the general principles involved in learning systems. Rather than develop algorithms from scratch and in an ad hoc manner for each new application, a systematic methodology can be extremely useful. It is also important to have techniques for evaluating the performance of a learning system. Knowing what is achievable and what is not helps to provide a benchmark and often suggests new techniques for practical learning algorithms.
These questions are also related to philosophical questions that arise in epistemology. What can we learn and how can we learn it? What can we learn about other minds and the external world? What can we learn through induction?
The philosophical problem of induction asks how it is possible to learn anything on the basis of inductive reasoning, given that the truth of the premises of inductive reasoning does not guarantee the truth of its conclusion. There is no single solution to this problem, not because there is no solution, but because there are many, depending on what counts as learning. In this book, we explain how various solutions depend on the way the problem of induction is formulated.
Thus, we hope this book will serve as an accessible introduction to statistical learning theory for a broad audience. For those interested in more in-depth studies of learning theory or practical algorithms, we hope the book will provide a helpful starting point. For those interested in epistemology or philosophy in general, we hope the book will help draw connections to relevant ideas from other fields. And for others, we hope the book will help provide an understanding of some deep and fundamental insights from statistical learning theory that are at the heart of advances in artificial intelligence and that shed light on the nature and limits of learning.
We acknowledge with thanks a Curriculum Development Grant from the 250th Anniversary Fund for Innovation in Undergraduate Education at Princeton University. Rajeev Kulkarni gave us extremely useful comments on the whole book, which greatly improved the result. Joel Predd and Maya Gupta also provided valuable comments on various parts. We have also benefited from a careful reading by Joshua Harris. We are grateful to our teaching assistants over the years and to the many students who have discussed the content of the course with us. Thanks!
Chapter 2
Probability
In this and the next chapter, we explain some of the elementary mathematics of probability. This provides the mathematical foundation for dealing with uncertainty and forms the basis for statistical learning theory. In particular, we are interested in learning when there is uncertainty in the underlying objects (feature vectors), the labels (indicating the class to which the objects belong), and the relationship between the class of the object and the feature vector. This uncertainty will be modeled probabilistically.
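To make the probabilistic setup concrete, the following is a minimal sketch, not taken from the book, of how labeled examples might be generated: a class label is drawn from a prior distribution, and a scalar feature is then drawn from a class-conditional Gaussian. The prior, class means, and standard deviation used here are illustrative assumptions only.

import random

# Illustrative sketch (not from the book): a two-class problem modeled
# probabilistically. The label Y is drawn from a prior, and the scalar
# feature X is drawn from a class-conditional Gaussian. All constants
# below are assumed values chosen only for demonstration.

PRIOR_CLASS_1 = 0.3               # P(Y = 1); hence P(Y = 0) = 0.7
CLASS_MEANS = {0: 0.0, 1: 2.0}    # mean of X given each class
STD_DEV = 1.0                     # common standard deviation

def draw_labeled_example():
    """Draw one (feature, label) pair from the joint distribution."""
    label = 1 if random.random() < PRIOR_CLASS_1 else 0
    feature = random.gauss(CLASS_MEANS[label], STD_DEV)
    return feature, label

random.seed(0)
for _ in range(5):
    x, y = draw_labeled_example()
    print(f"x = {x:+.3f}, y = {y}")

Later chapters ask how well a decision rule can classify such draws when the underlying distributions are unknown and only training examples of this kind are available.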
Read on in the full edition!
