The confluence of cloud computing, parallelism and advanced machine intelligence approaches has created a world in which the optimum knowledge system will usually be architected from the combination of two or more knowledge-generating systems. There is a need, then, to provide a reusable, broadly applicable set of design patterns to empower the intelligent system architect to take advantage of this opportunity.
This book explains how to design and build intelligent systems that are optimized for changing system requirements (adaptability), optimized for changing system input (robustness), and optimized for one or more other important system parameters (e.g., accuracy, efficiency, cost). It provides an overview of traditional parallel processing, which is shown to consist primarily of task and component parallelism, before introducing meta-algorithmic parallelism, which is based on combining two or more algorithms, classification engines or other systems.
Number of pages: 752
Year of publication: 2013
Contents
Cover
Title Page
Copyright
Acknowledgments
Chapter 1: Introduction and Overview
1.1 Introduction
1.2 Why Is This Book Important?
1.3 Organization of the Book
1.4 Informatics
1.5 Ensemble Learning
1.6 Machine Learning/Intelligence
1.7 Artificial Intelligence
1.8 Data Mining/Knowledge Discovery
1.9 Classification
1.10 Recognition
1.11 System-Based Analysis
1.12 Summary
References
Chapter 2: Parallel Forms of Parallelism
2.1 Introduction
2.2 Parallelism by Task
2.3 Parallelism by Component
2.4 Parallelism by Meta-algorithm
2.5 Summary
References
Chapter 3: Domain Areas: Where Are These Relevant?
3.1 Introduction
3.2 Overview of the Domains
3.3 Primary Domains
3.4 Secondary Domains
3.5 Summary
References
Chapter 4: Applications of Parallelism by Task
4.1 Introduction
4.2 Primary Domains
4.3 Summary
References
Chapter 5: Application of Parallelism by Component
5.1 Introduction
5.2 Primary Domains
5.3 Summary
References
Chapter 6: Introduction to Meta-algorithmics
6.1 Introduction
6.2 First-Order Meta-algorithmics
6.3 Second-Order Meta-algorithmics
6.4 Third-Order Meta-algorithmics
6.5 Summary
References
Chapter 7: First-Order Meta-algorithmics and Their Applications
7.1 Introduction
7.2 First-Order Meta-algorithmics and the “Black Box”
7.3 Primary Domains
7.4 Secondary Domains
7.5 Summary
References
Chapter 8: Second-Order Meta-algorithmics and Their Applications
8.1 Introduction
8.2 Second-Order Meta-algorithmics and Targeting the “Fringes”
8.3 Primary Domains
8.4 Secondary Domains
8.5 Summary
References
Chapter 9: Third-Order Meta-algorithmics and Their Applications
9.1 Introduction
9.2 Third-Order Meta-algorithmic Patterns
9.3 Primary Domains
9.4 Secondary Domains
9.5 Summary
References
Chapter 10: Building More Robust Systems
10.1 Introduction
10.2 Summarization
10.3 Cloud Systems
10.4 Mobile Systems
10.5 Scheduling
10.6 Classification
10.7 Summary
Reference
Chapter 11: The Future
11.1 Recapitulation
11.2 The Pattern of All Patience
11.3 Beyond the Pale
11.4 Coming Soon
11.5 Summary
References
Index
© 2013 John Wiley & Sons, Ltd
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Simske, Steven J.
Meta-algorithmics : patterns for robust, low-cost, high-quality systems / Dr. Steven J. Simske, Hewlett-Packard Labs.
pages cm
ISBN 978-1-118-34336-4 (hardback)
1. Computer algorithms. 2. Parallel algorithms. 3. Heuristic programming. 4. Computer systems–Costs. 5. Computer systems–Quality control. I. Title.
QA76.9.A43S543 2013
005.1–dc23
2013004488
A catalogue record for this book is available from the British Library.
ISBN: 9781118343364
Acknowledgments
The goals of this book were ambitious—perhaps too ambitious—both in breadth (domains addressed) and depth (number and variety of parallel processing and meta-algorithmic patterns). The book represents, or at least builds on, the work of many previous engineers, scientists and knowledge workers. Undoubtedly most, if not all, of the approaches in this book have been elaborated elsewhere, either overtly or in some disguise. One contribution of this book is to bring these disparate approaches together in one place, systematizing the design of intelligent parallel systems. In progressing from the design of parallel systems using traditional by-component and by-task approaches to meta-algorithmic parallelism, the advantages of hybridization for system accuracy, robustness and cost were shown.
I have a lot of people to thank for making this book a reality. First and foremost, I'd like to thank my wonderful family—Tess, Kieran and Dallen—for putting up with a year's worth of weekends and late nights spent designing and running the many “throw away” experiments necessary to illustrate the application of meta-algorithmics (not to mention that little thing of actually writing). Their patience and support made all the difference! Thanks, your presence in my life makes writing this book worthwhile.
I also thank Hewlett Packard (HP), my employer, for the go-ahead to write this book. While my “day job” work load was in no way lightened during the writing of the book (for one example, as I write this I have 150% more direct reports than I did when the contract to write this book was signed nearly a year and a half ago!), HP did streamline the contract and in no way micromanaged the process. Sometimes the biggest help is just getting out of the way, and I appreciate it. Special thanks to Yan Liu, Qin Zhu Lippert, Keith Moore and Eric Hanson for their support along the (HP) way.
The editorial staff at John Wiley & Sons has been tremendous. In particular Alex King, my main editor and the person who convinced me to write this book in the first place, has been a delight. Baljinder Kaur has been tremendous in finding typographical and logical errors during the editorial process. Her sharp eye and wit have both made the process a snap. I'd also like to thank Genna Manaog, Richard Davies, Liz Wingett, and Claire Bailey for their help.
Most of the photos and all of the tables and diagrams in the book were my creation, with one notable exception. There is an excellent scan of a 1996 brochure for the Cheyenne Mountain Zoo which I have used extensively to illustrate mixed-region segmentation. Thanks to Erica Meyer for providing copyright permission, not to mention for helping run one of the country's most unique attractions.
Have you found a mistake or two in reading the book? If not, you, like me, have Jason Aronoff to thank. Jason read each chapter after I finished the first draft, and his excellent and very timely feedback allowed me to send a second draft to Wiley (before deadline! I've heard that “never happens” from my friend and fellow author Bob Ulichney), where otherwise a very sloppy, choppy first draft would have had to suffice. Thanks to Marie Vans for her extra pair of eyes on the proofs.
On the science of meta-algorithmics, big thanks go to Sherif Yacoub, who framed out several of the patterns with me more than a decade ago. His analytical and design expertise greatly affected Chapter 7 in particular. I'd also like to thank Xiaofan Lin for excellent collaboration on various meta-algorithmic experiments (part of speech tagging and OCR, for example), not to mention his great leadership on voting patterns. My friend and colleague Igor Boyko worked with me on early meta-algorithmic search approaches. Yan Xiong also worked on several of the original experiments, and in particular discovered hybrid ways to perform journal splitting. John Burns led the team comprising all these übersmart researchers, and was tremendously supportive of early work.
I would be remiss at best to fail to mention Doug Heins, my friend and confidant, who has the most meta-algorithmic mind of anyone I know. That's right, it has improved accuracy, robustness and cost (yes cost—I owe a lot to him, but to date he has not charged me!). My deep thanks also to Dave Wright, who has extended meta-algorithmics to fantasy football and other areas. In addition to his great insights during the kernel meta-algorithmic text classification work, Dave continues to be a source of wisdom and perspective for me.
I can only begin to thank all my wonderful collaborators in the various domains—imaging to security to biometrics to speech analysis—covered in part in this book. Particular mention, however, goes to Guy Adams, Stephen Pollard, Reed Ayers, Henry Sang, Dave Isaacson, Marv Luttges, David Auter, Dalong Li and Matt Gaubatz. I wish to separately thank Margaret Sturgill, with whom I have collaborated for 18 years in various hybrid system architecture, classification and imaging projects.
Finally, a huge thanks to my many supportive friends, including Dave Barry (the man of positive energy), Jay Veazey (the wise mentor and font of insight), Joost van Der Water, Dave Klaus, Mick Keyes, Helen Balinsky, Gary Dispoto and Ellis Gayles, who have encouraged me throughout the book creation process. I hope this does not disappoint!
If you perform a Tessellation and Recombination pattern on the above paragraphs, the output would be quite obvious. I am a lucky man indeed. Thanks so much!
Steve Simske
17 April 2013
1
Introduction and Overview
Plus ça change, plus c’est la même chose.
–Jean-Baptiste Alphonse Karr (1849)
There's even exponential growth in the rate of exponential growth.
–Ray Kurzweil (2001)
1.1 Introduction
Services, businesses, analytics, and other types of data services have moved from workstations, local area networks (LANs), and in-house IT infrastructure to the Internet, mobile devices, and more recently “the Cloud.” This has broad implications for the privacy, security, versioning, and ultimately the long-term fate of data. These changes, however, provide a welcome opportunity for reconsidering the manner in which intelligent systems are designed, built and tested, deployed, and optimized during deployment. With the advent of powerful machine learning capabilities in the past two decades, it has become clear to the research community that learning algorithms, and systems based in whole or in part on these algorithms, are not only possible but also essential for modern business. However, the combined impact of mobile devices, ubiquitous services, and the cloud constitutes a fundamental change in how systems themselves can, and I argue should, be designed. Services themselves can be transformed into learning systems, adaptive not just in terms of the specific parameters of their applications and algorithms but also in the repertoire (or set) of, and relationships (or architecture) among, the multiple applications and algorithms in the service.
With the nearly unlimited computing and data hosting possibilities now feasible, hitherto processor-bound and memory-bound applications, services, and decision-making (actionable analytics) approaches are now freed from many of their limitations. In fact, cloud-based and graphics processing unit (GPU)-based computation has made parallel processing possible on a grand scale. In recognition of this new reality, this book focuses on the algorithmic, analytic, and system patterns that can be used to better take advantage of this new norm of parallelism, and will help to move the fields of machine learning, analytics, inference, and classification to more squarely align with this new norm.
In this chapter, I overview at an often high, but thematic, level the broad fields of machine intelligence, artificial intelligence, data mining, classification, recognition, and systems-based analysis. Standing on the shoulders of giants who have pioneered these fields before this book, the intent is to highlight the salient differences in multiple approaches to useful solutions in each of these arenas. Through this approach, I intend to engage all interested readers—from the interested newcomer to the field of intelligent systems design to the expert with far deeper experience than myself in one or more of these arenas—in the central themes of this book. In short, these themes are:
1.2 Why Is This Book Important?
Jean-Baptiste Alphonse Karr, right after the 1848 Revolutions rocked Europe, made the famous observation that “the more that things change, the more they stay the same.” This statement anticipated Darwin's treatise on the Origin of Species by a decade, and is germane to this day. In the fast-changing world of the twenty-first century, in which Ray Kurzweil's musing on the rapidly increasing growth in the rate of growth is nearly cliché and Luddite musings on humanity losing control of data are de rigueur, perhaps it may be time to reconsider how large systems are architected. Designing a system to be robust to change—to anticipate change—may also be the right path to designing a system that is optimized for accuracy, cost, and other important performance parameters. One objective of this book is to provide a straightforward means of designing and building intelligent systems that are optimized for changing system requirements (adaptability), optimized for changing system input (robustness), and optimized for one or more other important system parameters (e.g., accuracy, efficiency, and cost). If such an objective can be achieved, then rather than being insensitive to change, the system will benefit from change. This is important because more and more, every system of value is actually an intelligent system.
The vision of this book is to provide a practical, systems-oriented, statistically driven approach to parallel—and specifically meta-algorithmics-driven—machine intelligence, with a particular emphasis on classification and recognition. Three primary types of parallelism will be considered: (1) parallelism by task—that is, the assignment of multiple, usually different tasks to parallel pipelines that would otherwise be performed sequentially by the same processor; (2) parallelism by component—wherein a larger machine intelligence task is assigned to a set of parallel pipelines, each performing the same task but on a different data set; and (3) parallelism by meta-algorithmics. This last topic—parallelism by meta-algorithmics—is in practice far more open to art as it is still both art and science. In this book, I will show how meta-algorithmics extend the more traditional forms of parallelism and, as such, can complement the other forms of parallelism to create better systems.
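The three forms of parallelism described above can be contrasted in a small sketch. This is only an illustration under stated assumptions, not a pattern taken from the book: the toy tasks and classifiers (count_words, count_vowels, classify_by_length, and so on), their thresholds, and the use of a simple majority vote as the combiner are all hypothetical choices made for brevity.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


# Hypothetical toy tasks, used only to make the three forms concrete.
def count_words(text):
    return len(text.split())


def count_vowels(text):
    return sum(ch in "aeiou" for ch in text.lower())


def by_task(text):
    """Parallelism by task: different tasks run in parallel pipelines."""
    with ThreadPoolExecutor() as pool:
        words = pool.submit(count_words, text)
        vowels = pool.submit(count_vowels, text)
        return words.result(), vowels.result()


def by_component(texts):
    """Parallelism by component: the same task runs on different data sets."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(count_words, texts))


# Three hypothetical classifiers with arbitrary thresholds.
def classify_by_length(text):
    return "long" if len(text) > 20 else "short"


def classify_by_words(text):
    return "long" if count_words(text) > 4 else "short"


def classify_by_vowels(text):
    return "long" if count_vowels(text) > 8 else "short"


def by_meta_algorithm(text):
    """Parallelism by meta-algorithm: combine the outputs of several
    classifiers; here the combiner is an assumed simple majority vote."""
    classifiers = (classify_by_length, classify_by_words, classify_by_vowels)
    with ThreadPoolExecutor() as pool:
        labels = list(pool.map(lambda classify: classify(text), classifiers))
    # Three voters and two labels, so a strict majority always exists.
    return Counter(labels).most_common(1)[0][0]
```

In this sketch the first two functions differ only in what is distributed (tasks versus data), whereas the third adds a combination step over the outputs of independent algorithms, which is the part that the first two forms lack.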
1.3 Organization of the Book
The book is organized in 11 chapters. In this first chapter, I provide the aims of the book and connect the material to the long, impressive history of research in other fields salient to intelligent systems. This is accomplished by reviewing this material in light of the book's perspective. In Chapter 2, I provide an overview of parallelism, especially considering the impact of GPUs, multi-core processors, virtualization, and cloud computing on the fundamental approaches for intelligent algorithm, system and service design. I complete the overview chapters of the book in Chapter 3, wherein I review the application domains within which I will be applying the different forms of parallelism in later chapters. This includes primary domains of focus selected to illustrate the depth of the approaches, and secondary domains to illustrate more fully the breadth. The primary domains are (1) document understanding, (2) image understanding, (3) biometrics, and (4) security printing. The secondary domains are (1) image segmentation, (2) speech recognition, (3) medical signal processing, (4) medical imaging, (5) natural language processing (NLP), (6) surveillance, (7) optical character recognition (OCR), and (8) security analytics. Of these primary and secondary domains, I end in each case with the security-related topics, as they provide perhaps the broadest, most interdisciplinary needs, thus affording an excellent opportunity to illustrate the design and development of complex systems.
