An Elementary Introduction to Statistical Learning Theory

Sanjeev Kulkarni

Description

A thought-provoking look at statistical learning theory and its role in understanding human learning and inductive reasoning

A joint endeavor from leading researchers in the fields of philosophy and electrical engineering, An Elementary Introduction to Statistical Learning Theory is a comprehensive and accessible primer on the rapidly evolving fields of statistical pattern recognition and statistical learning theory. Explaining these areas at a level and in a way that is not often found in other books on the topic, the authors present the basic theory behind contemporary machine learning and uniquely utilize its foundations as a framework for philosophical thinking about inductive inference.

Promoting the fundamental goal of statistical learning (knowing what is achievable and what is not), this book demonstrates the value of a systematic methodology when used along with the needed techniques for evaluating the performance of a learning system. First, an introduction to machine learning is presented that includes brief discussions of applications such as image recognition, speech recognition, medical diagnostics, and statistical arbitrage. To enhance accessibility, two chapters on relevant aspects of probability theory are provided. Subsequent chapters feature coverage of topics such as the pattern recognition problem, the optimal Bayes decision rule, the nearest neighbor rule, kernel rules, neural networks, support vector machines, and boosting.

Appendices throughout the book explore the relationship between the discussed material and related topics from mathematics, philosophy, psychology, and statistics, drawing insightful connections between problems in these areas and statistical learning theory. All chapters conclude with a summary section, a set of practice questions, and a reference section that supplies historical notes and additional resources for further study.

An Elementary Introduction to Statistical Learning Theory is an excellent book for courses on statistical learning theory, pattern recognition, and machine learning at the upper-undergraduate and graduate levels. It also serves as an introductory reference for researchers and practitioners in the fields of engineering, computer science, philosophy, and cognitive science who would like to further their knowledge of the topic.


Page count: 419

Year of publication: 2011




Table of Contents

Series Page

Title Page

Copyright

Preface

Chapter 1: Introduction: Classification, Learning, Features, and Applications

1.1 Scope

1.2 Why Machine Learning?

1.3 Some Applications

1.4 Measurements, Features, and Feature Vectors

1.5 The Need for Probability

1.6 Supervised Learning

1.8 Appendix: Induction

Chapter 2: Probability

2.1 Probability of Some Basic Events

2.2 Probabilities of Compound Events

2.3 Conditional Probability

2.4 Drawing Without Replacement

2.5 A Classic Birthday Problem

2.6 Random Variables

2.7 Expected Value

2.8 Variance

2.10 Appendix: Interpretations of Probability

Chapter 3: Probability Densities

3.1 An Example in Two Dimensions

3.2 Random Numbers in [0,1]

3.3 Density Functions

3.4 Probability Densities in Higher Dimensions

3.5 Joint and Conditional Densities

3.6 Expected Value and Variance

3.7 Laws of Large Numbers

3.9 Appendix: Measurability

Chapter 4: The Pattern Recognition Problem

4.1 A Simple Example

4.2 Decision Rules

4.3 Success Criterion

4.4 The Best Classifier: Bayes Decision Rule

4.5 Continuous Features and Densities

4.7 Appendix: Uncountably Many

Chapter 5: The Optimal Bayes Decision Rule

5.1 Bayes Theorem

5.2 Bayes Decision Rule

5.3 Optimality and Some Comments

5.4 An Example

5.5 Bayes Theorem and Decision Rule with Densities

5.7 Appendix: Defining Conditional Probability

Chapter 6: Learning from Examples

6.1 Lack of Knowledge of Distributions

6.2 Training Data

6.3 Assumptions on the Training Data

6.4 A Brute Force Approach to Learning

6.5 Curse of Dimensionality, Inductive Bias, and No Free Lunch

6.7 Appendix: What Sort of Learning?

Chapter 7: The Nearest Neighbor Rule

7.1 The Nearest Neighbor Rule

7.2 Performance of the Nearest Neighbor Rule

7.3 Intuition and Proof Sketch of Performance*

7.4 Using More Neighbors

7.6 Appendix: When People Use Nearest Neighbor Reasoning

Chapter 8: Kernel Rules

8.1 Motivation

8.2 A Variation on Nearest Neighbor Rules

8.3 Kernel Rules

8.4 Universal Consistency of Kernel Rules

8.5 Potential Functions

8.6 More General Kernels

8.8 Appendix: Kernels, Similarity, and Features

Chapter 9: Neural Networks: Perceptrons

9.1 Multilayer Feedforward Networks

9.2 Neural Networks for Learning and Classification

9.3 Perceptrons

9.4 Learning Rule for Perceptrons

9.5 Representational Capabilities of Perceptrons

9.7 Appendix: Models of Mind

Chapter 10: Multilayer Networks

10.1 Representation Capabilities of Multilayer Networks

10.2 Learning and Sigmoidal Outputs

10.3 Training Error and Weight Space

10.4 Error Minimization by Gradient Descent

10.5 Backpropagation

10.6 Derivation of Backpropagation Equations*

10.8 Appendix: Gradient Descent and Reasoning toward Reflective Equilibrium

Chapter 11: PAC Learning

11.1 Class of Decision Rules

11.2 Best Rule from a Class

11.3 Probably Approximately Correct Criterion

11.4 PAC Learning

11.6 Appendix: Identifying Indiscernibles

Chapter 12: VC Dimension

12.1 Approximation and Estimation Errors

12.2 Shattering

12.3 VC Dimension

12.4 Learning Result

12.5 Some Examples

12.6 Application to Neural Nets

12.8 Appendix: VC Dimension and Popper Dimension

Chapter 13: Infinite VC Dimension

13.1 A Hierarchy of Classes and Modified PAC Criterion

13.2 Misfit Versus Complexity Trade-Off

13.3 Learning Results

13.4 Inductive Bias and Simplicity

13.6 Appendix: Uniform Convergence and Universal Consistency

Chapter 14: The Function Estimation Problem

14.1 Estimation

14.2 Success Criterion

14.3 Best Estimator: Regression Function

14.4 Learning in Function Estimation

14.6 Appendix: Regression Toward the Mean

Chapter 15: Learning Function Estimation

15.1 Review of the Function Estimation/Regression Problem

15.2 Nearest Neighbor Rules

15.3 Kernel Methods

15.4 Neural Network Learning

15.5 Estimation with a Fixed Class of Functions

15.6 Shattering, Pseudo-Dimension, and Learning

15.7 Conclusion

15.8 Appendix: Accuracy, Precision, Bias, and Variance in Estimation

Chapter 16: Simplicity

16.1 Simplicity in Science

16.2 Ordering Hypotheses

16.3 Two Examples

16.4 Simplicity as Simplicity of Representation

16.5 Pragmatic Theory of Simplicity

16.6 Simplicity and Global Indeterminacy

16.8 Appendix: Basic Science and Statistical Learning Theory

Chapter 17: Support Vector Machines

17.1 Mapping the Feature Vectors

17.2 Maximizing the Margin

17.3 Optimization and Support Vectors

17.4 Implementation and Connection to Kernel Methods

17.5 Details of the Optimization Problem*

17.7 Appendix: Computation

Chapter 18: Boosting

18.1 Weak Learning Rules

18.2 Combining Classifiers

18.3 Distribution on the Training Examples

18.4 The AdaBoost Algorithm

18.5 Performance on Training Data

18.6 Generalization Performance

18.8 Appendix: Ensemble Methods

Bibliography

Author Index

Subject Index

Copyright © 2010 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Kulkarni, Sanjeev.

An elementary introduction to statistical learning theory / Sanjeev Kulkarni, Gilbert Harman. p. cm.

Includes index.

ISBN 978-0-470-64183-5 (cloth)

1. Machine learning-Statistical methods. 2. Pattern recognition systems. I. Harman, Gilbert. II. Title.

Q325.5.K85 2011

006.3′1–dc22

2010045223

Preface

This book offers a broad and accessible introduction to the relatively new field of statistical learning theory, a field that has emerged from engineering studies of pattern recognition and machine learning, developments in nonparametric statistics, computer science, the study of language learning in linguistics, developmental and cognitive psychology, the philosophical problem of induction, and the philosophy of science and method.

The book is the product of a very successful introductory course on “Learning Theory and Epistemology” that we have been teaching jointly in electrical engineering and philosophy at Princeton University. The course is open to all students and has no specific prerequisites other than some analytical skills and intellectual curiosity. Although much of the material is technical, we have found that the main points are both accessible to and appreciated by a broad range of students. In each class, our students have included freshmen through seniors, with majors from the sciences, engineering, humanities, and social sciences.

The engineering study of pattern recognition is concerned with developing automated systems to discriminate between various inputs in a useful way. How can the post office develop systems to scan and sort mail on the basis of hand-written addresses? How can a manufacturer design a computerized system to transcribe ordinary conversations? Can computers be used to analyze medical images to make diagnoses?

Machine learning provides an efficient way to approach some pattern recognition problems. It is possible to train a system to recognize handwritten zip codes. Automated systems can interact with users to learn to perform speech recognition. A computer might use machine learning to develop a system that can analyze medical images in the way that experts do.

Machine learning and pattern recognition are also concerned with the general principles involved in learning systems. Rather than develop algorithms from scratch and in an ad hoc manner for each new application, a systematic methodology can be extremely useful. It is also important to have techniques for evaluating the performance of a learning system. Knowing what is achievable and what is not helps to provide a benchmark and often suggests new techniques for practical learning algorithms.

These questions are also related to philosophical questions that arise in epistemology. What can we learn and how can we learn it? What can we learn about other minds and the external world? What can we learn through induction?

The philosophical problem of induction asks how it is possible to learn anything on the basis of inductive reasoning, given that the truth of the premises of inductive reasoning does not guarantee the truth of its conclusion. There is no single solution to this problem, not because there is no solution, but because there are many, depending on what counts as learning. In this book, we explain how various solutions depend on the way the problem of induction is formulated.

Thus, we hope this book will serve as an accessible introduction to statistical learning theory for a broad audience. For those interested in more in-depth studies of learning theory or practical algorithms, we hope the book will provide a helpful starting point. For those interested in epistemology or philosophy in general, we hope the book will help draw connections to very relevant ideas from other fields. And for others, we hope the book will help provide an understanding of some deep and fundamental insights from statistical learning theory that are at the heart of advances in artificial intelligence and shed light on the nature and limits of learning.

We acknowledge with thanks a Curriculum Development Grant from the 250th Anniversary Fund for Innovation in Undergraduate Education from Princeton University. Rajeev Kulkarni gave us extremely useful comments on the whole book, which greatly improved the result. Joel Predd and Maya Gupta also provided valuable comments on various parts. We have also benefitted from a careful reading by Joshua Harris. We are also grateful to our teaching assistants over the years and to the many students who have discussed the content of the course with us. Thanks!

Chapter 2

Probability

In this and the next chapter, we explain some of the elementary mathematics of probability. This provides the mathematical foundation for dealing with uncertainty and forms the basis for statistical learning theory. In particular, we are interested in learning when there is uncertainty in the underlying objects (feature vectors), the labels (indicating the class to which the objects belong), and the relationship between the class of the object and the feature vector. This uncertainty will be modeled probabilistically.
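To make the idea of modeling this uncertainty probabilistically concrete, here is a minimal sketch in Python (not taken from the book; the prior, the Gaussian class-conditional densities, and all names are illustrative assumptions). It draws labeled examples by first sampling a class label from a prior and then sampling a feature value from a class-conditional density, which is the kind of joint model of feature vectors and labels developed in the chapters that follow.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_example():
        # Label from an assumed prior: P(Y = 1) = 0.3, P(Y = 0) = 0.7.
        y = 1 if rng.random() < 0.3 else 0
        # Feature from an assumed class-conditional Gaussian P(X | Y):
        # mean 2.0 for class 1, mean 0.0 for class 0, unit variance.
        x = rng.normal(loc=2.0 if y == 1 else 0.0, scale=1.0)
        return x, y

    # A small training set; uncertainty enters through every sampled pair.
    data = [sample_example() for _ in range(5)]
    print(data)

With a different random seed the same model produces a different training set, which is the sense in which the objects, the labels, and the relationship between them are all uncertain to the learner.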

Continue reading in the full edition!
