Combining Pattern Classifiers

Ludmila I. Kuncheva
Description

A unified, coherent treatment of current classifier ensemble methods, from fundamentals of pattern recognition to ensemble feature selection, now in its second edition.

The art and science of combining pattern classifiers has flourished into a prolific discipline since the first edition of Combining Pattern Classifiers was published in 2004. Dr. Kuncheva has plucked from the rich landscape of recent classifier ensemble literature the topics, methods, and algorithms that will guide the reader toward a deeper understanding of the fundamentals, design, and applications of classifier ensemble methods. Thoroughly updated, with MATLAB code and practice data sets throughout, Combining Pattern Classifiers includes:

* Coverage of Bayes decision theory and experimental comparison of classifiers
* Essential ensemble methods such as Bagging, Random forest, AdaBoost, Random subspace, Rotation forest, Random oracle, and Error Correcting Output Code, among others
* Chapters on classifier selection, diversity, and ensemble feature selection

With firm grounding in the fundamentals of pattern recognition, and featuring more than 140 illustrations, Combining Pattern Classifiers, Second Edition is a valuable reference for postgraduate students, researchers, and practitioners in computing and engineering.




COMBINING PATTERN CLASSIFIERS

Methods and Algorithms

Second Edition

LUDMILA I. KUNCHEVA

Copyright © 2014 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

Library of Congress Cataloging-in-Publication Data

Kuncheva, Ludmila I. (Ludmila Ilieva), 1959–
Combining pattern classifiers : methods and algorithms / Ludmila I. Kuncheva. -- Second edition.
pages cm
Includes index.
ISBN 978-1-118-31523-1 (hardback)
1. Pattern recognition systems. 2. Image processing--Digital techniques. I. Title.
TK7882.P3K83 2014
006.4--dc23
2014014214

To Roumen, Diana and Kamelia

CONTENTS

Preface

The Playing Field

Software

Structure and What is New in the Second Edition

Who is This Book For?

Notes

Acknowledgements

1 Fundamentals of Pattern Recognition

1.1 Basic Concepts: Class, Feature, Data Set

1.2 Classifier, Discriminant Functions, Classification Regions

1.3 Classification Error and Classification Accuracy

1.4 Experimental Comparison of Classifiers

1.5 Bayes Decision Theory

1.6 Clustering and Feature Selection

1.7 Challenges of Real-Life Data

Appendix

1.A.1 Data Generation

1.A.2 Comparison of Classifiers

1.A.3 Feature Selection

Notes

2 Base Classifiers

2.1 Linear and Quadratic Classifiers

2.2 Decision Tree Classifiers

2.3 The Naïve Bayes Classifier

2.4 Neural Networks

2.5 Support Vector Machines

2.6 The k-Nearest Neighbor Classifier (k-nn)

2.7 Final Remarks

Appendix

2.A.1 MATLAB Code for the Fish Data

2.A.2 MATLAB Code for Individual Classifiers

Notes

3 An Overview of the Field

3.1 Philosophy

3.2 Two Examples

3.3 Structure of the Area

3.4 Quo Vadis?

Notes

4 Combining Label Outputs

4.1 Types of Classifier Outputs

4.2 A Probabilistic Framework for Combining Label Outputs

4.3 Majority Vote

4.4 Weighted Majority Vote

4.5 Naïve Bayes Combiner

4.6 Multinomial Methods

4.7 Comparison of Combination Methods for Label Outputs

Appendix

4.A.1 Matan’s Proof for the Limits on the Majority Vote Accuracy

4.A.2 Selected MATLAB Code

Notes

5 Combining Continuous-Valued Outputs

5.1 Decision Profile

5.2 How Do We Get Probability Outputs?

5.3 Nontrainable (Fixed) Combination Rules

5.4 The Weighted Average (Linear Combiner)

5.5 A Classifier as a Combiner

5.6 An Example of Nine Combiners for Continuous-Valued Outputs

5.7 To Train or Not to Train?

Appendix

5.A.1 Theoretical Classification Error for the Simple Combiners

5.A.2 Selected MATLAB Code

Notes

6 Ensemble Methods

6.1 Bagging

6.2 Random Forests

6.3 AdaBoost

6.4 Random Subspace Ensembles

6.5 Rotation Forest

6.6 Random Linear Oracle

6.7 Error Correcting Output Codes (ECOC)

Appendix

6.A.1 Bagging

6.A.2 AdaBoost

6.A.3 Random Subspace

6.A.4 Rotation Forest

6.A.5 Random Linear Oracle

6.A.6 ECOC

Notes

7 Classifier Selection

7.1 Preliminaries

7.2 Why Classifier Selection Works

7.3 Estimating Local Competence Dynamically

7.4 Pre-Estimation of the Competence Regions

7.5 Simultaneous Training of Regions and Classifiers

7.6 Cascade Classifiers

Appendix: Selected MATLAB Code

7.A.1 Banana Data

7.A.2 Evolutionary Algorithm for a Selection Ensemble for the Banana Data

8 Diversity in Classifier Ensembles

8.1 What is Diversity?

8.2 Measuring Diversity in Classifier Ensembles

8.3 Relationship Between Diversity and Accuracy

8.4 Using Diversity

8.5 Conclusions: Diversity of Diversity

Appendix

8.A.1 Derivation of Diversity Measures for Oracle Outputs

8.A.2 Diversity Measure Equivalence

8.A.3 Independent Outputs ≠ Independent Errors

8.A.4 Bound on the Kappa-Error Diagram

8.A.5 Calculation of the Pareto Frontier

Notes

9 Ensemble Feature Selection

9.1 Preliminaries

9.2 Ranking by Decision Tree Ensembles

9.3 Ensembles of Rankers

9.4 Random Feature Selection for the Ensemble

9.5 Nonrandom Selection

9.6 A Stability Index

Appendix

9.A.1 MATLAB Code for the Numerical Example of Ensemble Ranking

9.A.2 MATLAB GA Nuggets

9.A.3 MATLAB Code for the Stability Index

10 A Final Thought

References

Index

End User License Agreement



PREFACE

Pattern recognition is everywhere. It is the technology behind automatically identifying fraudulent bank transactions, giving verbal instructions to your mobile phone, predicting oil deposit odds, or segmenting a brain tumour within a magnetic resonance image.

A decade has passed since the first edition of this book. Combining classifiers, also known as “classifier ensembles,” has flourished into a prolific discipline. Viewed from the top, classifier ensembles reside at the intersection of engineering, computing, and mathematics. Zoomed in, classifier ensembles are fuelled by advances in pattern recognition, machine learning and data mining, among others. An ensemble aggregates the “opinions” of several pattern classifiers in the hope that the new opinion will be better than the individual ones. Vox populi, vox Dei.

The interest in classifier ensembles received a welcome boost due to the high-profile Netflix contest, which challenged the world's research creativity with a difficult task and a substantial reward. The problem was to predict whether a person would enjoy a movie based on their past movie preferences. A Grand Prize of $1,000,000 was to be awarded to the team that first achieved a 10% improvement over the classification accuracy of the existing system, Cinematch. The contest was launched in October 2006, and the prize was awarded in September 2009. The winning solution was nothing other than a rather fancy classifier ensemble.

What is wrong with the good old single classifiers? Jokingly, I often put up a slide in presentations, with a multiple-choice question. The question is “Why classifier ensembles?” and the three possible answers are:

(a) because we like to complicate entities beyond necessity (anti-Occam's razor);

(b) because we are lazy and stupid and cannot be bothered to design and train one single sophisticated classifier; and

(c) because democracy is so important to our society, it must be important to classification.

Funnily enough, the real answer hinges on choice (b). Of course, it is not a matter of laziness or stupidity, but the realization that a complex problem can be elegantly solved using simple and manageable tools. Recall the invention of the error backpropagation algorithm, followed by the dramatic resurfacing of neural networks in the 1980s. Neural networks were proved to be universal approximators with unlimited flexibility. They could approximate any classification boundary in any number of dimensions. This capability, however, comes at a price. Large structures with a vast number of parameters have to be trained. The initial excitement cooled down as it transpired that massive structures cannot be easily trained with sufficient guarantees of good generalization performance. Until recently, a typical neural network classifier contained one hidden layer with a dozen neurons, sacrificing the much-acclaimed flexibility but gaining credibility. Enter classifier ensembles! Ensembles of simple neural networks are among the most versatile and successful ensemble methods.

But the story does not end here. Recent studies have rekindled the excitement of using massive neural networks drawing upon hardware advances such as parallel computations using graphics processing units (GPU) [75]. The giant data sets necessary for training such structures are generated by small distortions of the available set. These conceptually different rival approaches to machine learning can be regarded as divide-and-conquer and brute force, respectively. It seems that the jury is still out about their relative merits. In this book we adopt the divide-and-conquer approach.

THE PLAYING FIELD

Writing the first edition of the book felt like the overwhelming task of bringing structure and organization to a hoarder’s attic. The scenery has changed markedly since then. The series of workshops on Multiple Classifier Systems (MCS), run since 2000 by Fabio Roli and Josef Kittler [338], served as a beacon, inspiration, and guidance for experienced and new researchers alike. Excellent surveys shaped the field, among which are the works by Polikar [311], Brown [53], and Valentini and Re [397]. Better still, four recent texts together present accessible, in-depth, comprehensive, and exquisite coverage of the classifier ensemble area: Rokach [335], Zhou [439], Schapire and Freund [351], and Seni and Elder [355]. This gives me the comfort and luxury to be able to skim over topics which are discussed at length and in-depth elsewhere, and pick ones which I believe deserve more exposure or which I just find curious.

As in the first edition, I have no ambition to present an accurate snapshot of the state of the art. Instead, I have chosen to explain and illustrate some methods and algorithms, giving sufficient detail so that the reader can reproduce them in code. Although I venture an opinion based on general consensus and examples in the text, this should not be regarded as a guide for preferring one method to another.

SOFTWARE

A rich set of classifier ensemble methods is implemented in WEKA1 [167], a collection of machine learning algorithms for data-mining tasks. PRTools2 is a MATLAB toolbox for pattern recognition developed by the Pattern Recognition Research Group of the TU Delft, The Netherlands, led by Professor R. P. W. (Bob) Duin. An industry-oriented spin-off toolbox, called “perClass”3 was designed later. Classifier ensembles feature prominently in both packages.

PRTools and perClass are instruments for advanced MATLAB programmers and can also be used by practitioners after a short training. The recent edition of the MATLAB Statistics Toolbox (R2013b) includes a classifier ensemble suite as well.
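As a taster of the latter, a bagged tree ensemble can be trained and applied in a couple of lines. This is only a minimal sketch, assuming the Statistics Toolbox (R2011a or later) is available; the two-class data set below is synthetic and invented purely for the example:

% Synthetic two-class data: 200 points in two dimensions (illustration only)
X = [randn(100,2); randn(100,2) + 2];
y = [ones(100,1); 2*ones(100,1)];

% Bagging ensemble of 50 decision trees (Statistics Toolbox)
ens = fitensemble(X, y, 'Bag', 50, 'Tree', 'Type', 'Classification');

% Label a new object
label = predict(ens, [1 1]);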

Snippets of MATLAB DIY (do-it-yourself) code illustrating methodologies and concepts are given in the chapter appendices. MATLAB was chosen as a suitable language for such illustrations because it often looks like executable pseudo-code. A programming language is like a living creature: it grows, develops, changes, and breeds. The code in the book follows today's versions, styles, and conventions. It does not, by any means, measure up to the richness, elegance, and sophistication of PRTools and perClass. Aimed at simplicity, the code is neither fool-proof nor optimized for time or other efficiency criteria. Its sole purpose is to enable the reader to grasp the ideas and run their own small-scale experiments.
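To give the flavor of such a snippet, here is a minimal, self-contained example of the plurality (majority) vote combiner from Chapter 4, applied to label outputs. The matrix of predictions is hypothetical, made up solely for this illustration:

% Label outputs of L = 5 classifiers (rows) for N = 4 objects (columns)
% (hypothetical predictions, for illustration only)
P = [1 2 2 3;
     1 2 1 3;
     2 2 1 3;
     1 1 1 2;
     1 2 2 3];

% Plurality vote: the most frequent class label in each column
ensemble_labels = mode(P, 1)   % returns [1 2 1 3]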

STRUCTURE AND WHAT IS NEW IN THE SECOND EDITION

The book is organized as follows.

Chapter 1, Fundamentals, gives an introduction to the main concepts in pattern recognition, Bayes decision theory, and experimental comparison of classifiers. A new treatment of the classifier comparison issue is offered (after Demšar [89]). The bias and variance decomposition of the error, which was treated in greater detail in the former Chapter 7 (bagging and boosting), is now briefly introduced and illustrated in Chapter 1.

Chapter 2, Base Classifiers, contains methods and algorithms for designing the individual classifiers. In this edition, a special emphasis is put on the stability of the classifier models. To aid the discussions and illustrations throughout the book, a toy two-dimensional data set, called the fish data, was created. The Naïve Bayes classifier and the support vector machine classifier (SVM) are brought to the fore as they are often used in classifier ensembles. In the final section of this chapter, I introduce the triangle diagram that can enrich the analyses of pattern recognition methods.

Chapter 3, An Overview of the Field, discusses some general questions in combining classifiers. It has undergone a major makeover. The new final section, "Quo Vadis?," asks questions such as "Are we reinventing the wheel?" and "Has the progress thus far been illusory?" It also contains a bibliometric snapshot of the area of classifier ensembles as of January 4, 2013, using Thomson Reuters' Web of Knowledge (WoK).

Chapter 4, Combining Label Outputs, introduces a new theoretical framework which defines the optimality conditions of several fusion rules by progressively relaxing an assumption. The Behavior Knowledge Space method is trimmed down and illustrated better in this edition. The combination method based on singular value decomposition (SVD) has been dropped.

Chapter 5, Combining Continuous-Valued Outputs, summarizes classifier fusion methods such as simple and weighted average, decision templates and a classifier used as a combiner. The division of methods into class-conscious and class-independent in the first edition was regarded as surplus and was therefore abandoned.

Chapter 6, Ensemble Methods, grew out of the former Bagging and Boosting chapter. It now accommodates on an equal footing the reigning classics in classifier ensembles: bagging, random forest, AdaBoost and random subspace, as well as a couple of newcomers: rotation forest and random oracle. The Error Correcting Output Code (ECOC) ensemble method is included here, having been cast as "Miscellanea" in the first edition of the book. Based on the interest in this method, as well as its success, ECOC's rightful place is together with the classics.

Chapter 7, Classifier Selection, explains why this approach works and how classifier competence regions are estimated. The chapter contains new examples and illustrations.

Chapter 8, Diversity, gives a modern view on ensemble diversity, raising at the same time some old questions which still puzzle researchers in spite of the remarkable progress made in the area. There is a frighteningly large number of possible "new" diversity measures, lurking as binary similarity and distance measures (take for example Choi et al.'s study [74] with 76, s-e-v-e-n-t-y s-i-x, such measures). And we have not even touched upon the continuous-valued outputs and the diversity measures possible for those. The message in this chapter is stronger now: we hardly need any more diversity measures; we need to pick a few and learn how to use them. In view of this, I have included a theoretical bound on the kappa-error diagram [243] which shows how much space is still there for new ensemble methods with engineered diversity.
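For concreteness, one of the simplest pairwise measures discussed in the chapter, the disagreement measure, can be computed in a single line from "oracle" outputs (1 if the classifier labels the object correctly, 0 otherwise). The oracle matrix below is invented purely for this sketch:

% Oracle outputs of three classifiers (rows) on ten objects (columns)
% 1 = correct label, 0 = wrong label (hypothetical values)
O = [1 1 0 1 1 0 1 1 1 0;
     1 0 0 1 1 1 1 0 1 1;
     0 1 1 1 0 0 1 1 1 1];

% Disagreement between classifiers 1 and 2: the proportion of objects
% on which exactly one of the two classifiers is correct
D12 = mean(O(1,:) ~= O(2,:))   % returns 0.4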

Chapter 9, Ensemble Feature Selection, considers feature selection by the ensemble and for the ensemble. It was born from a section in the former Chapter 8, Miscellanea. The expansion was deemed necessary because of the surge of interest in ensemble feature selection from a variety of application areas, notably so from bioinformatics [346]. I have included a stability index between feature subsets or between feature rankings [236].

I picked a figure from each chapter to create a small graphical guide to the contents of the book as illustrated in Figure 1.

FIGURE 1 The book chapters at a glance.

The former Theory chapter (Chapter 9) was dissolved; parts of it are now blended with the rest of the content of the book. Lengthier proofs are relegated to the respective chapter appendices. Some of the proofs and derivations were dropped altogether, for example, the theory behind the magic of AdaBoost. Plenty of literature sources can be consulted for the proofs and derivations left out.

The differences between the two editions reflect the fact that classifier ensemble research has made a giant leap; some methods and techniques discussed in the first edition did not withstand the test of time, others were replaced with modern versions. The dramatic expansion of some sub-areas forced me, unfortunately, to drop topics such as cluster ensembles and to stay away from topics such as classifier ensembles for adaptive (on-line) learning, learning in the presence of concept drift, semi-supervised learning, active learning, and handling imbalanced classes and missing values. Each of these sub-areas will likely see a bespoke monograph in the not-so-distant future. I look forward to that.

I am humbled by the enormous volume of literature on the subject, and the ingenious ideas and solutions within. My sincere apology to those authors, whose excellent research into classifier ensembles went without citation in this book because of lack of space or because of unawareness on my part.

WHO IS THIS BOOK FOR?

The book is suitable for postgraduate students and researchers in computing and engineering, as well as practitioners with some technical background. The assumed level of mathematics is minimal and includes a basic understanding of probabilities and simple linear algebra. Beginner’s MATLAB programming knowledge would be beneficial but is not essential.

Ludmila I. Kuncheva

Bangor, Gwynedd, UK

December 2013

NOTES

1.

http://www.cs.waikato.ac.nz/ml/weka/

2.

http://prtools.org/

3.

http://perclass.com/index.php/html/

ACKNOWLEDGEMENTS

I am most sincerely indebted to Gavin Brown, Juan Rodríguez, and Kami Kountcheva for scrutinizing the manuscript and returning to me their invaluable comments, suggestions, and corrections. Many heartfelt thanks go to my family and friends for their constant support and encouragement. Last but not least, thank you, my reader, for picking up this book.
