Reinforcement and Systemic Machine Learning for Decision Making

There are always difficulties in making machines that learn from experience. Complete information is not always available, or it becomes available only in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm, creating new learning applications and, ultimately, more intelligent machines.

The first book of its kind in this new and growing field, Reinforcement and Systemic Machine Learning for Decision Making focuses on the specialized research area of machine learning and systemic machine learning. It addresses reinforcement learning and its applications, incremental machine learning, repetitive failure-correction mechanisms, and multiperspective decision making. Chapters include:

* Introduction to Reinforcement and Systemic Machine Learning
* Fundamentals of Whole-System, Systemic, and Multiperspective Machine Learning
* Systemic Machine Learning and Model
* Inference and Information Integration
* Adaptive Learning
* Incremental Learning and Knowledge Representation
* Knowledge Augmentation: A Machine Learning Perspective
* Building a Learning System

With the potential of this paradigm to become one of the more utilized in its field, professionals in the area of machine and systemic learning will find this book to be a valuable resource.
Contents
Cover
Series Page I
Title Page
Copyright
Dedication
Preface
Acknowledgments
About the Author
Chapter 1: Introduction to Reinforcement and Systemic Machine Learning
1.1 Introduction
1.2 Supervised, Unsupervised, and Semisupervised Machine Learning
1.3 Traditional Learning Methods and History of Machine Learning
1.4 What Is Machine Learning?
1.5 Machine-Learning Problem
1.6 Learning Paradigms
1.7 Machine-Learning Techniques and Paradigms
1.8 What is Reinforcement Learning?
1.9 Reinforcement Function and Environment Function
1.10 Need for Reinforcement Learning
1.11 Reinforcement Learning and Machine Intelligence
1.12 What is Systemic Learning?
1.13 What Is Systemic Machine Learning?
1.14 Challenges in Systemic Machine Learning
1.15 Reinforcement Machine Learning and Systemic Machine Learning
1.16 Case Study: Problem Detection in a Vehicle
1.17 Summary
Reference
Chapter 2: Fundamentals of Whole-System, Systemic, and Multiperspective Machine Learning
2.1 Introduction
2.2 What Is Systemic Machine Learning?
2.3 Generalized Systemic Machine-Learning Framework
2.4 Multiperspective Decision Making and Multiperspective Learning
2.5 Dynamic and Interactive Decision Making
2.6 The Systemic Learning Framework
2.7 System Analysis
2.8 Case Study: Need for Systemic Learning in the Hospitality Industry
2.9 Summary
References
Chapter 3: Reinforcement Learning
3.1 Introduction
3.2 Learning Agents
3.3 Returns and Reward Calculations
3.4 Reinforcement Learning and Adaptive Control
3.5 Dynamic Systems
3.6 Reinforcement Learning and Control
3.7 Markov Property and Markov Decision Process
3.8 Value Functions
3.9 Learning an Optimal Policy (Model-Based and Model-Free Methods)
3.10 Dynamic Programming
3.11 Adaptive Dynamic Programming
3.12 Example: Reinforcement Learning for Boxing Trainer
3.13 Summary
Reference
Chapter 4: Systemic Machine Learning and Model
4.1 Introduction
4.2 A Framework for Systemic Learning
4.3 Capturing the Systemic View
4.4 Mathematical Representation of System Interactions
4.5 Impact Function
4.6 Decision-Impact Analysis
4.7 Summary
Chapter 5: Inference and Information Integration
5.1 Introduction
5.2 Inference Mechanisms and Need
5.3 Integration of Context and Inference
5.4 Statistical Inference and Induction
5.5 Pure Likelihood Approach
5.6 Bayesian Paradigm and Inference
5.7 Time-Based Inference
5.8 Inference to Build a System View
5.9 Summary
References
Chapter 6: Adaptive Learning
6.1 Introduction
6.2 Adaptive Learning and Adaptive Systems
6.3 What Is Adaptive Machine Learning?
6.4 Adaptation and Learning Method Selection Based on Scenario
6.5 Systemic Learning and Adaptive Learning
6.6 Competitive Learning and Adaptive Learning
6.7 Examples
6.8 Summary
References
Chapter 7: Multiperspective and Whole-System Learning
7.1 Introduction
7.2 Multiperspective Context Building
7.3 Multiperspective Decision Making and Multiperspective Learning
7.4 Whole-System Learning and Multiperspective Approaches
7.5 Case Study Based on Multiperspective Approach
7.6 Limitations to a Multiperspective Approach
7.7 Summary
References
Chapter 8: Incremental Learning and Knowledge Representation
8.1 Introduction
8.2 Why Incremental Learning?
8.3 Learning from What Is Already Learned
8.4 Supervised Incremental Learning
8.5 Incremental Unsupervised Learning and Incremental Clustering
8.6 Semisupervised Incremental Learning
8.7 Incremental and Systemic Learning
8.8 Incremental Closeness Value and Learning Method
8.9 Learning and Decision-Making Model
8.10 Incremental Classification Techniques
8.11 Case Study: Incremental Document Classification
8.12 Summary
Chapter 9: Knowledge Augmentation: A Machine Learning Perspective
9.1 Introduction
9.2 Brief History and Related Work
9.3 Knowledge Augmentation and Knowledge Elicitation
9.4 Life Cycle of Knowledge
9.5 Incremental Knowledge Representation
9.6 Case-Based Learning and Learning with Reference to Knowledge Loss
9.7 Knowledge Augmentation: Techniques and Methods
9.8 Heuristic Learning
9.9 Systemic Machine Learning and Knowledge Augmentation
9.10 Knowledge Augmentation in Complex Learning Scenarios
9.11 Case Studies
9.12 Summary
References
Chapter 10: Building a Learning System
10.1 Introduction
10.2 Systemic Learning System
10.3 Algorithm Selection
10.4 Knowledge Representation
10.5 Designing a Learning System
10.6 Making a System Behave Intelligently
10.7 Example-Based Learning
10.8 Holistic Knowledge Framework and Use of Reinforcement Learning
10.9 Intelligent Agents: Deployment, Knowledge Acquisition, and Reuse
10.10 Case-Based Learning: Human Emotion-Detection System
10.11 Holistic View in Complex Decision Problem
10.12 Knowledge Representation and Data Discovery
10.13 Components
10.14 Future of Learning Systems and Intelligent Systems
10.15 Summary
Appendix A: Statistical Learning Methods
A.1 Probability
A.2 Bayesian Classification
A.3 Regression
A.4 Rough Sets
A.5 Support Vector Machines
References
Appendix B: Markov Processes
B.1 Markov Processes
B.2 Semi-Markov Process
Index
Series Page II
IEEE Press
445 Hoes Lane
Piscataway, NJ 08855
IEEE Press Editorial Board
John B. Anderson, Editor in Chief
R. Abhari, G. W. Arnold, F. Canavero, D. Goldof, B-M. Haemmerli, D. Jacobson, M. Lanzerotti, O. P. Malik, S. Nahavandi, T. Samad, G. Zobrist
Kenneth Moore, Director of IEEE Book and Information Services (BIS)
Copyright © 2012 by the Institute of Electrical and Electronics Engineers, Inc.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. All rights reserved.
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Kulkarni, Parag.
Reinforcement and systemic machine learning for decision making / Parag
Kulkarni.
p. cm. – (IEEE series on systems science and engineering ; 1)
ISBN 978-0-470-91999-6.
1. Reinforcement learning. 2. Machine learning. 3. Decision Making. I. Title.
Q325.6.K85 2012
006.3′1–dc23
2011043300
Dedicated to the late D.B. Joshi and the late Savitri Joshi, who inspired me to think differently
Preface
There has been a movement for years to make machines intelligent, one that began long ago, even before the computer era. In those days, event-based intelligence was incorporated into appliances or ensembles of appliances. This intelligence was heavily guided, and human intervention was mandatory. Even feedback control systems are a rudimentary form of intelligent system; later, adaptive control systems and hybrid control systems added a measure of intelligence. The movement received more attention with the advent of computers, and simple event-based learning with computers quickly became part of many intelligent systems. As expectations from intelligent systems kept increasing, a very well-received learning paradigm emerged: pattern-based learning. It allowed systems to exhibit intelligence in many practical scenarios, including patterns of weather, patterns of occupancy, and other patterns that could help in making decisions. This paradigm evolved into behavioral-pattern-based learning, which is concerned with behavioral patterns rather than simple patterns of a single measured parameter. Behavioral patterns attempted to give a better picture and deeper insight; they helped systems learn and make decisions in networks and business scenarios, and they took intelligent systems to the next level. Learning is a manifestation of intelligence, and making machines learn is a major part of the movement to make machines intelligent.
The complexities of decision scenarios, and the difficulty of making machines learn in complex scenarios, raised many questions about machine intelligence. Learning in isolation is never complete. Human beings learn in groups, develop colonies, and interact to build intelligence; this collective and cooperative learning allows them to achieve supremacy. Furthermore, humans learn in association with the environment: they interact with it and receive feedback in the form of rewards or penalties. Learning in association gives them the power of exploration-based learning, in which exploitation of already learned facts and exploration of new actions both take place. The paradigm of reinforcement learning added a new dimension to learning and could cover many new aspects of learning required for dynamic scenarios.
As Rutherford D. Rogers put it: "We are drowning in information and starving for knowledge." More and more information becomes available at our disposal, and this information is in heterogeneous forms. There are many information sources and numerous learning opportunities. The practical assumptions made while learning can make learning restrictive. In reality there are relationships among different parts of a system, and one of the basic principles of systems thinking states that cause and effect are separated in time and space. The impact of a decision or action can be felt beyond visible boundaries. Failing to consider this systemic aspect and these relationships leads to many limitations while learning, and hence traditional learning paradigms suffer in highly dynamic and complex real-life problems. A holistic view and an understanding of the interdependencies and intradependencies can help us learn many new aspects and understand, analyze, and interpret information in a more realistic way. Learning based on available information, building new information, mapping it to knowledge, and understanding different perspectives can make learning much more effective. Learning is not just gathering more data and arranging that data; it is not even building more information. The purpose of learning is to empower individuals to make better decisions and to improve their ability to create value. In machine learning, there is a need to expand the ability of machines with reference to different information sources and learning opportunities; machine learning, too, is about empowering machines to make better decisions and improving their ability to create value.
This book is an attempt to put forth a new paradigm of systemic machine learning and to present research opportunities with reference to different aspects of machine learning. It tries to build the foundation for systemic machine learning with elaborate case studies. Machine learning and artificial intelligence are interdisciplinary in nature: researchers from statistics, mathematics, psychology, and computer engineering have all contributed to the field, enriching it and achieving better results. Based on these numerous contributions and our own research in the machine-learning field, this book explores the concept of systemic machine learning. Systemic machine learning is holistic, multiperspective, and incremental. While learning, we can learn different things from the same data sets, we can learn from already learned facts, and there can be a number of representations of knowledge. This book attempts to build a framework that makes the best use of all information sources and builds knowledge with reference to the complete system.
In many cases, the problem is not static. It changes with time, depends on the environment, and its solution even depends on the decision context. Context may not be limited to a few parameters; the overall information about a problem builds the context. A general-purpose system without context may not be able to handle context-specific decisions. This book discusses different facets of learning as well as the need for a new paradigm with reference to complex decision problems. The book can be used as a reference for specialized research and can help readers and researchers appreciate new paradigms of machine learning.
This book is organized as depicted in the following figure:
Chapter 1 introduces the concepts of systemic and reinforcement machine learning. It builds a platform for the paradigm of systemic machine learning while highlighting the need for it. Chapter 2 throws more light on the fundamentals of systemic machine learning, whole-system learning, and multiperspective learning. Chapter 3 is about reinforcement learning, while Chapter 4 deals with systemic machine learning and model building. Important aspects of decision making such as inference are covered in Chapter 5. Chapter 6 discusses adaptive machine learning and its various aspects. Chapter 7 discusses the paradigm of multiperspective machine learning and whole-system learning. Chapter 8 addresses the need for incremental machine learning. Chapters 8 and 9 deal with knowledge representation and knowledge augmentation. Chapter 10 discusses building a learning system.
This book tries to cover different facets of learning while introducing a new paradigm of machine learning, and it deals with building knowledge through machine learning. It is for those individuals who plan to contribute to making machines more intelligent by making them learn through new experiments, who are ready to try new ways, and who are open to a new paradigm.
Parag Kulkarni
Acknowledgments
For the past two decades I have been working with various decision-making and AI-based IT product companies. During this period I worked on different machine-learning algorithms and applied them to different applications. This work made me realize the need for a new paradigm for machine learning and for a change in thinking. It built the foundation for this book and started the thought process for systemic machine learning. I am thankful to the different organizations I worked with, including Siemens and IDeaS, and to my colleagues in those organizations. I would also like to acknowledge the support of my friends and coworkers.
I would like to thank my Ph.D. and M.Tech. students—Prachi, Yashodhara, Vinod, Sunita, Pramod, Nitin, Deepak, Preeti, Anagha, Shankar, Shweta, Basawraj, Shashikanth, and others—for their direct and indirect contribution that came through technical brainstorming. They are always ready to work on new ideas and contributed through collective learning. Special thanks to Prachi for her help in drawing diagrams and formatting the text.
I am thankful to Prof. Chande, the late Prof. Ramani, Dr. Sinha, Dr. Bhanu Prasad, Prof. Warnekar, and Prof. Navetia for useful comments and reviews. I am also thankful to institutes such as COEP, PICT, GHRIET, PCCOE, DYP COE, IIM, and Masaryk University for allowing me to interact with students and present my thoughts to them. I am thankful to IASTED, IACSIT, and IEEE for giving me a platform to present my research through various technical conferences, and to the reviewers of my research papers.
I am thankful to my mentor, teacher, and grandfather, the late D.B. Joshi, for motivating me to think differently. I also would like to take the opportunity to thank my mother. Most importantly I would like to thank my wife Mrudula and son Hrishikesh for their support, motivation, and help.
I am also thankful to IEEE/Wiley and the editorial team of IEEE/Wiley for their support and helping me to present my research, thoughts, and experiments in the form of a book.
Parag Kulkarni
About the Author
Parag Kulkarni, Ph.D., D.Sc., is CEO and Chief Scientist at EKLaT Research, Pune. He has more than two decades of experience in knowledge management, e-business, intelligent systems, and machine-learning consultation, research, and product building. An alumnus of IIT Kharagpur and IIM Kolkata, Dr. Kulkarni has been a visiting professor at IIM Indore, a visiting researcher at Masaryk University, Czech Republic, and an adjunct professor at the College of Engineering, Pune. He has headed companies, research labs, and groups at various IT companies, including IDeaS, Siemens Information Systems Ltd., Capilson, Pune, and ReasonEdge, Singapore. He has led many start-up companies to success through strategic innovation and research. The UGSM Monarch Business School, Switzerland, has conferred the higher doctorate of D.Sc. on Dr. Kulkarni. He is a coinventor of three patents and has coauthored more than 100 research papers and several books.
Chapter 1
Introduction to Reinforcement and Systemic Machine Learning
The expectations from intelligent systems are increasing day by day. What an intelligent system was supposed to do a decade ago is now expected from an ordinary system. Whether it is a washing machine or a health care system, we expect it to be more and more intelligent and to demonstrate that behavior while solving complex as well as day-to-day problems. The applications are not limited to a particular domain; they are distributed across literally all domains. Domain-specific intelligence is useful, but users have become demanding, and a truly intelligent, problem-solving system that works irrespective of domain has become a necessary goal. We want systems to drive cars, play games, train players, retrieve information, and help even in complex medical diagnosis. All these applications are beyond the scope of isolated systems and traditional preprogrammed learning. These activities need dynamic intelligence. Dynamic intelligence can be exhibited through learning based not only on available knowledge but also on the exploration of knowledge through interactions with the environment. The use of existing knowledge, learning based on dynamic facts, and acting in the best way in complex scenarios are some of the expected features of intelligent systems.
Learning has many facets, ranging from the simple memorization of facts to complex inference. At any point in time, however, learning is a holistic activity that takes place around the objective of better decision making. Learning results from storing, sorting, mapping, and classifying data. Still, one of the most important aspects of intelligence is learning, and in most cases we expect learning to be a goal-centric activity. Learning results from the inputs of an experienced person, from one's own experience, and from inference based on experiences or past learning. So there are three ways of learning:
* Learning based on expert inputs (supervised learning)
* Learning based on one's own experience
* Learning based on already learned facts

In this chapter, we will discuss the basics of reinforcement learning and its history. We will also look closely at the need for reinforcement learning, discuss its limitations, and introduce the concept of systemic learning. The systemic machine-learning paradigm is discussed along with various concepts and techniques. The chapter also covers an introduction to traditional learning methods and elaborates the relationships among different learning methods with reference to systemic machine learning. In all, the chapter builds the background for systemic machine learning.
Learning that takes place based on a set of classified examples is referred to as supervised learning: learning based on labeled data. In short, while learning, the system has knowledge of a set of labeled data. This is one of the most common and frequently used learning methods. Let us begin by considering the simplest machine-learning task, supervised learning for classification, taking the classification of documents as an example. In this case a learner learns from the available documents and their classes, which together form the labeled data. The program that can map input documents to appropriate classes is called a classifier, because it assigns a class (i.e., a document type) to an object (i.e., a document). The task of supervised learning is to construct a classifier given a set of classified training examples. A typical classification is depicted in Figure 1.1.
Figure 1.1 Supervised learning.
Figure 1.1 represents a hyperplane, generated after learning, that separates two classes, class A and class B, into different parts. Each input point represents an input–output instance from the sample space. In the case of document classification, these points are documents. Learning computes a separating line or hyperplane among the documents, and the type of an unknown document is decided by its position with respect to the separator.
There are a number of challenges in supervised classification, such as generalization, selection of the right data for learning, and dealing with variations. In supervised learning, labeled examples are used for training; the set of labeled examples provided to the learning algorithm is called the training set.
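To make the idea of a training set and a classifier concrete, here is a minimal sketch of supervised document classification in Python. It assumes the scikit-learn library; the sample documents, labels, and test sentence are invented for illustration.

```python
# Minimal supervised text classification sketch (assumes scikit-learn).
# Toy labeled data: each training document comes with its class label.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_docs = [
    "quarterly revenue and profit figures",
    "stock market closed higher today",
    "the team won the championship game",
    "the striker scored twice in the final",
]
train_labels = ["finance", "finance", "sports", "sports"]

# Turn raw text into feature vectors, then fit a classifier on the training set.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)
classifier = LogisticRegression().fit(X_train, train_labels)

# An unseen document is assigned a class by the learned separator.
X_new = vectorizer.transform(["stock prices and profit figures rose today"])
print(classifier.predict(X_new))  # expected: ['finance']
```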
The classifier, and of course the decision-making engine, should minimize false positives and false negatives. Here a false positive is a case wrongly classified into a particular group, while a false negative is a case that should have been accepted into a class but was rejected. For example, an apple not classified as an apple is a false negative, while an orange or some other fruit classified as an apple is a false positive for the apple class. Similarly, if the positive class is "innocent," a guilty person who is acquitted is a false positive, while an innocent person who is convicted is a false negative. Typically, wrongly classified elements are more harmful than unclassified ones.
If a classifier knew that the data consisted of sets or batches, it could achieve higher accuracy by trying to identify the boundary between two adjacent sets; this is true when sets of documents are to be separated from one another. Though it depends on the scenario, false negatives are typically more costly than false positives, so we might want the learning algorithm to prefer classifiers that make fewer false negative errors, even if they make more false positives as a result. This is because a false negative generally takes away the identity of an object that would otherwise be classified correctly: a false positive can be corrected in the next pass, but there is no such scope for a false negative.
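As a small concrete check of these definitions, the sketch below counts false positives and false negatives for the apple example; the actual and predicted labels are invented.

```python
# Counting errors for the positive class "apple".
# False positive: a non-apple classified as an apple.
# False negative: an apple classified as something else.
actual    = ["apple", "apple", "orange", "apple", "orange", "pear"]
predicted = ["apple", "pear",  "apple",  "apple", "orange", "apple"]

false_positives = sum(a != "apple" and p == "apple" for a, p in zip(actual, predicted))
false_negatives = sum(a == "apple" and p != "apple" for a, p in zip(actual, predicted))

print(false_positives, false_negatives)  # 2 false positives, 1 false negative
```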
Supervised learning is not just about classification; it is the overall guided process that maps inputs to the most appropriate decisions.
Unsupervised learning refers to learning from unlabeled data. It is based more on similarities and differences than on anything else. In this type of learning, all similar items are clustered together in a particular class, where the label of the class is not known.

It is not possible to learn in a supervised way in the absence of properly labeled data. In such scenarios there is a need to learn in an unsupervised way, where learning is based on the similarities and differences that are visible. These differences and similarities are represented mathematically in unsupervised learning.
Given a large collection of objects, we often want to be able to understand them and visualize their relationships. As an example based on similarities, a kid can separate birds from other animals, perhaps using some property or similarity such as birds having wings. In the initial stages, the criterion is the most visible aspects of the objects. Linnaeus devoted much of his life to arranging living organisms into a hierarchy of classes, with the goal of placing similar organisms together at all levels of the hierarchy. Many unsupervised learning algorithms create similar hierarchical arrangements based on similarity-based mappings. The task of hierarchical clustering is to arrange a set of objects into a hierarchy such that similar objects are grouped together. Nonhierarchical clustering, by contrast, seeks to partition the data into some number of disjoint clusters. The process of clustering is depicted in Figure 1.2: a learner is fed a set of scattered points and, after learning, generates two clusters with representative centroids. The clusters show that points with similar properties and closeness are grouped together.
Figure 1.2 Unsupervised learning.
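As a hedged illustration of nonhierarchical clustering, the following sketch groups scattered two-dimensional points into two clusters with k-means and reports the representative centroids, much as Figure 1.2 describes. It assumes scikit-learn; the points are toy data.

```python
# Minimal unsupervised (nonhierarchical) clustering sketch (assumes scikit-learn).
from sklearn.cluster import KMeans

points = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],   # one natural group
          [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]]   # another natural group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

print(kmeans.labels_)           # cluster assignment for each point (no labels given)
print(kmeans.cluster_centers_)  # representative centroid of each cluster
```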
In practical scenarios there is always a need to learn from both labeled and unlabeled data. Even while learning in an unsupervised way, there is a need to make the best use of whatever labeled data are available. This is referred to as semisupervised learning. Semisupervised learning combines two paradigms of learning, learning based on similarity and learning based on inputs from a teacher, and tries to get the best of both worlds.
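One simple way to combine the two paradigms is self-training, sketched below under invented data and an invented confidence threshold: a classifier is first fit on the small labeled set, and its most confident predictions on unlabeled points are then adopted as pseudo-labels for retraining.

```python
# Self-training: one simple semisupervised scheme (assumes scikit-learn, NumPy).
import numpy as np
from sklearn.linear_model import LogisticRegression

X_labeled = np.array([[0.0], [0.2], [2.8], [3.0]])   # few labeled points
y_labeled = np.array([0, 0, 1, 1])
X_unlabeled = np.array([[0.1], [2.9], [1.5]])        # unlabeled points

clf = LogisticRegression().fit(X_labeled, y_labeled)

# Adopt the classifier's confident predictions as pseudo-labels
# (teacher's input plus similarity in feature space).
proba = clf.predict_proba(X_unlabeled)
confident = proba.max(axis=1) > 0.8
X_aug = np.vstack([X_labeled, X_unlabeled[confident]])
y_aug = np.concatenate([y_labeled, clf.predict(X_unlabeled)[confident]])

# Retrain on labeled plus pseudo-labeled data.
clf = LogisticRegression().fit(X_aug, y_aug)
```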
Learning is not just knowledge acquisition but rather a combination of knowledge acquisition, knowledge augmentation, and knowledge management. Furthermore, intelligent inference is essential for proper learning. Knowledge deals with the significance of information, and learning deals with building knowledge. How can a machine be made to learn? Researchers have been posing this question for more than six decades, and the outcome of that research has built the platform for this chapter. Learning involves every activity. One example is the following: while going to the office yesterday, Ram found road-repair work in progress on route one, so he followed route two today. It might be that route two is worse; then he may go back to route one or try route three. "Route one is in bad shape due to repair work" is the knowledge he has built, and based on that knowledge he has taken an action, following route two, that is, exploration. The complexity of learning increases as the number of parameters and the time dimension start playing a role in decision making.
These new parameters make his decision much more complex as compared to scenario 1 and scenario 2 discussed above.
In this chapter, we will discuss various learning methods along with examples. The data and information used for learning are very important. The data cannot be used as is for learning: they may contain outliers and information about features that are not relevant to the problem one is trying to solve. The approaches for selecting data for learning vary with the problem. In some cases the most frequent patterns are used for learning; in other cases even outliers are used, and there can be learning based on exceptions. Learning can take place based on similarities as well as differences, and positive as well as negative examples help in effective learning. Various models are built for learning with the objective of exploiting the knowledge.
Learning is a continuous process. New scenarios are observed and new situations arise, and these need to be used for learning. Learning from observation requires constructing meaningful classifications of observed objects and situations; methods of measuring similarity and proximity are employed for this purpose. Learning from observation is the method most commonly used by human beings. While making decisions, we may come across scenarios and objects that we did not encounter during the learning phase. Inference allows us to handle these scenarios. Furthermore, we need to learn in different and new scenarios, and hence the learning continues even while we are making decisions.
There are three fundamental continuously active human-like learning mechanisms:
Traditional machine-learning approaches are susceptible to dynamic, continual changes in the environment. Perceptual learning in humans, however, does not have such restrictions. Learning in humans is selectively incremental, so it does not need a large training set, and at the same time it is not biased by already learned but outdated facts. Learning and knowledge extraction in human beings are dynamic, and the human brain adapts continuously to changes occurring in the environment.
Interestingly, psychologists have played a major role in the development of machine-learning techniques. For more than six decades, computer researchers and psychologists together have driven the movement to make machines intelligent. The application areas are growing, and the research done over those decades has made us believe that making machines learn is one of the most interesting areas of study.
Machine learning is the study of methods for programming computers to learn. It is about making machines behave intelligently and learn from experience like human beings. In some tasks a human expert may not be required; examples include automated manufacturing and repetitive tasks with very few dynamic situations but demanding a very high level of precision. Here a machine-learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist and are required, but the knowledge is present in a tacit form. Speech recognition and language understanding come under this category: virtually all humans exhibit expert-level abilities on these tasks, but the exact method and steps for performing them are not known. A set of inputs and outputs with a mapping is provided in this case, and machine-learning algorithms can thus learn to map the inputs to the outputs.
Third, there are problems where phenomena are changing rapidly. In real life there are many dynamic scenarios. Here the situations and parameters are changing dynamically. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules.
Fourth, there are applications that need to be customized for each computer user separately. A machine-learning system can learn the customer-specific requirements and tune the parameters accordingly to get a customized version for a specific customer.
Machine learning addresses many of its research questions with the aid of statistics, data mining, and psychology, but it is much more than just data mining and statistics. Machine learning (ML) as it stands today is the use of data mining and statistics for inference, to make decisions or to build knowledge that enables better decision making. Statistics is more about understanding data and the patterns within them. Data mining seeks the relevant data, based on patterns, for decision making and analysis. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people. At the end of the day, we want machine learning to empower machines with the learning abilities that humans demonstrate in complex scenarios. Psychological studies of human nature and intelligence also contribute to different methods of machine learning, including concept learning, skill acquisition, strategy change, analytical inference, and bias based on scenarios.
Machine learning is primarily concerned with the timely response, accuracy, and effectiveness of the resulting computer system. It often does not take into account other aspects, such as learning abilities and responding to dynamic situations, which are equally important. A machine-learning approach focuses on complex applications such as building an accurate face recognition and authentication system; statisticians, psychologists, and computer scientists may work together on this front. A data mining approach, by contrast, might look for patterns and variations in image data.
One of the major aspects of learning is the selection of learning data. Not all the information available for learning can be used as it is: it may contain a lot of data that are not relevant or that were captured from a completely different perspective. Not every bit of data can be used with the same importance and priority. Data are prioritized based on scenarios, system significance, and relevance, and determining the relevance of the data is one of the most difficult parts of the process.
There are a number of challenges in making machines learn and make suitable decisions at the right time. The challenges start with the availability of limited learning data, unknown perspectives, and the difficulty of defining decision problems. Let us take a simple example where a machine is expected to prescribe the right medicine to a patient. The learning set may include samples of patients, their histories, their test reports, and the symptoms they reported. The data for learning may also include other information such as family history, habits, and so on. For a new patient, inference must be based on the limited information available, because the manifestation of the same disease may be different in that patient's case. Some key information might be missing, making decision making even more difficult.
When we look at the way a human being learns, we find many interesting aspects. Generally, learning takes place with understanding. It is facilitated when new and existing knowledge is structured around the major concepts and principles of the discipline. During learning, principles that either already exist or are developed in the process serve as guidelines. Learning also needs prior knowledge: learners use what they already know to construct new understandings. This is more like building knowledge. Furthermore, there are different perspectives and metacognition: learning is facilitated through the use of metacognitive strategies that identify, monitor, and regulate cognitive processes.
A general concept of machine learning is depicted in Figure 1.3. Machine learning studies computer algorithms for learning. We might, for instance, be interested in learning to complete a task, to make accurate predictions or reactions in certain situations, or to behave intelligently. The learning is always based on some sort of observations or data, such as examples (the most common case in this book), direct experience, or instruction. So, in general, machine learning is about learning to do better in the future based on what was experienced in the past; it is about making a machine learn from available information and experience and build knowledge.
Figure 1.3 Machine learning and classification.
In the context of the present research, machine learning is the development of programs that allow us to analyze data from various sources, select relevant data, and use those data to predict the behavior of the system in another similar, and if possible different, scenario. Machine learning also classifies objects and behaviors in order to finally impart decisions for new input scenarios. The interesting part is that more learning and intelligence are required to deal with uncertain situations.
It can easily be concluded that all problems that need intelligence to solve come under the category of machine-learning problems. Typical problems are character recognition, face authentication, document classification, spam filtering, speech recognition, fraud detection, weather forecasting, and occupancy forecasting. Many problems that are more complex and involve decision making can be considered machine-learning problems as well. These problems typically involve learning from experiences and data and searching for solutions in known as well as unknown search spaces. They may involve the classification of objects and problems and their mapping to solutions or decisions. Indeed, classification of any type of objects or events is itself a machine-learning problem.
The primary goal of machine learning is to produce learning algorithms with practical value. In the literature and research, machine learning is most often discussed from the perspective of applications and is bound by methods. The goals of ML are described as the development and enhancement of computer algorithms and models to meet the decision-making requirements of practical scenarios, and machine learning has indeed achieved this goal in many applications. From washing machines and microwave ovens to the automated landing of aircraft, machine learning plays a major role in modern applications and appliances. The era of machine learning has introduced methods ranging from simple data analysis and pattern matching to fuzzy logic and inferencing.
In machine learning, most of the inference is data driven. The sources of data are limited, and there is often difficulty in identifying the useful data. A source may contain large piles of data, and the data may contain important relationships and correlations. Machine learning can extract these relationships, which is an area of data-mining applications. The goal of machine learning is to facilitate building intelligent systems (IS) that can be used in solving real-life problems.
The computational power of the computing engine, the sophistication and elegance of algorithms, the amount and quality of information and values, and the efficiency and reliability of the system architecture determine the amount of intelligence. The amount of intelligence can grow through algorithm development, learning, and evolution. Intelligence is the product of natural selection, wherein more successful behavior is passed on to succeeding generations of intelligent systems and less successful behavior dies out. This intelligence helps humans and intelligent systems to learn.
In supervised learning we learn from different scenarios and expected outcomes presented as learning material. The purpose is that if we come across a similar scenario in the future, we should be in a position to make appropriate, or rather the best possible, decisions. This is possible if we can classify a new scenario into one of the known classes or known scenarios; being able to classify the new scenario allows us to select an appropriate action. Learning is possible by imitation, memorization, mapping, and inference. Induction, deduction, and example-based and observation-based learning are other ways in which learning is possible.
Learning is driven by an objective and governed by certain performance elements and their components. Clarity about the performance elements and their components, feedback available for learning the behavior of these components, and a representation of these components are necessary for learning. Agents need to learn, and the components of these agents should be able to map and determine actions, extract and infer information related to the environment, and set goals that describe classes of states. The desired actions with reference to a value or state help the system learn. Learning takes place based on feedback, which comes in the form of penalties or rewards.
An empirical learning method has three different approaches to modeling problems based on observation, data, and partial knowledge about the problem domain. These approaches are specific to problem domains. They are generative modeling, discriminative modeling, and imitative learning.

Each of these models has its own pros and cons. They are best suited to different application areas depending on the training samples and prior knowledge available. Generally, the suitability of a learning model depends on the problem scenario, the available knowledge, and the decision complexities.
In a generative modeling approach, statistics provides a formal method for determining nondeterministic models by estimating the joint probability over the variables of the problem domain. Bayesian networks are used to capture dependencies among domain variables as well as the distributions over them. This partial domain knowledge, combined with observations, refines the probability density function. The generative density function is then used to generate samples of different configurations of the system and to draw inferences about unknown situations. Traditional rule-based expert systems are giving way to statistical generative approaches because modeling the interdependencies among variables yields better prediction than heuristic approaches. Natural language processing, speech recognition, and topic modeling among different speakers are some application areas of generative modeling. This probabilistic approach to learning can also be used in computer vision, motion tracking, object recognition, face recognition, and so on. In a nutshell, learning with generative modeling can be applied in the domains of perception, temporal modeling, and autonomous agents. The model tries to represent interdependencies in order to arrive at better predictions.
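To make the generative idea tangible, the following is a hand-rolled sketch of the simplest case: estimate the class prior P(y) and a Gaussian class-conditional P(x | y) from data, and classify by comparing the joint probability P(y)P(x | y). This is essentially a one-dimensional Gaussian naive Bayes on invented data, offered as an illustration rather than as the book's method.

```python
# Generative classification sketch: model P(y) and P(x | y), compare joints.
import math

samples = {0: [1.0, 1.2, 0.8], 1: [3.0, 3.2, 2.8]}  # class -> observed values

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Estimate class priors and per-class Gaussian parameters from the data.
total = sum(len(xs) for xs in samples.values())
params = {}
for y, xs in samples.items():
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs) + 1e-6  # avoid zero variance
    params[y] = (len(xs) / total, mean, var)                 # (prior, mean, variance)

def classify(x):
    # Pick the class with the largest joint probability P(y) * P(x | y).
    return max(params, key=lambda y: params[y][0] * gaussian_pdf(x, *params[y][1:]))

print(classify(1.1), classify(2.9))  # expected: 0 1
```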
A discriminative approach models the posterior probability or discriminant functions with less domain-specific or prior knowledge. This technique directly optimizes criteria related to the target task. For example, a support vector machine maximizes the margin of a hyperplane between two sets of samples in n dimensions. This approach is widely used for document classification, character recognition, and numerous other areas where interdependency among problem variables plays no role, or a minimal one, in the observed variables. Prediction is thus influenced neither by the inherent problem structure nor by domain knowledge, and the approach may not be very effective when the level of interdependency is very high.
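By contrast, a minimal discriminative sketch: a linear SVM fits the separating hyperplane directly, without modeling how the data were generated. It assumes scikit-learn; the two-dimensional points are toy data.

```python
# Discriminative classification sketch: a linear SVM learns the boundary directly.
from sklearn.svm import LinearSVC

X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # class 0
     [3.0, 3.1], [3.2, 2.9], [2.8, 3.0]]   # class 1
y = [0, 0, 0, 1, 1, 1]

svm = LinearSVC(C=1.0).fit(X, y)

# The learned hyperplane w.x + b = 0 separates the classes; prediction is
# simply the side of the hyperplane a point falls on.
print(svm.coef_, svm.intercept_)
print(svm.predict([[1.1, 0.9], [3.1, 3.0]]))  # expected: [0 1]
```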
The third approach is imitative learning. Autonomous agents that exhibit interactive behavior are trained through an imitative learning model. The objective of imitative learning is to learn an agent's behavior by providing a real example of the agent's interaction with the world and generalizing from it. The two components of this learning model, passively perceiving real-world behavior and learning from it, are depicted in Figure 1.4. Interactive agents perceive the environment using a generative model to regenerate or synthesize virtual characters and interactions, and they use a discriminative approach to temporal learning to focus on the prediction task necessary for action selection. An agent tries to imitate real-world situations with intelligence, so that if the exact behavior is not available in a learned hypothesis, the agent can still take some action based on synthesis. Imitative and observational learning can be used in conjunction with reinforcement learning, where the imitative response can serve as the action for which rewards are received.
Figure 1.4 Reinforcement and imitative learning.
Figure 1.4 depicts imitative learning with reference to a demonstrator and an environment. The demonstration is an action or a series of actions from which an observer learns; the environment is the environment of the observer. Learning takes place through imitation and observation of the demonstration, while the knowledge base and the environment help in inferring the facts needed to complete the learning. Imitative learning can be extended to imitative reinforcement learning, where imitation is based on previously learned knowledge and the rewards are compared with the pure imitative response.
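One simple way to realize the "learn from demonstration" component described above is behavioral cloning: treat the demonstrator's observed state–action pairs as a supervised data set and fit a model that predicts the demonstrator's action for a given state. The states and action labels below are invented for illustration; this is a sketch of the general idea, not the book's specific model.

```python
# Behavioral cloning sketch: imitate a demonstrator by learning a
# state -> action mapping from observed interaction (assumes scikit-learn).
from sklearn.tree import DecisionTreeClassifier

# Demonstrator's trajectory: hypothetical states [distance_to_obstacle, speed]
# and the action the demonstrator took in each state.
demo_states = [[5.0, 2.0], [1.0, 2.0], [0.5, 1.0], [6.0, 1.0]]
demo_actions = ["accelerate", "brake", "brake", "accelerate"]

policy = DecisionTreeClassifier().fit(demo_states, demo_actions)

# The observer generalizes: for a new state it selects the action the
# demonstrator would most plausibly have taken.
print(policy.predict([[0.8, 1.5]]))  # expected: ['brake']
```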
Learning based on experience requires that the inputs and outcomes of the experience be measurable. For any action there is some outcome, and the outcome leads to some sort of amendment of the action. Learning can be data-based, event-based, pattern-based, or system-based, and each of these paradigms has advantages and disadvantages. Knowledge building and learning is a continuous process, and we would like systems to reuse what has been learned, selectively, creatively, and intelligently, in order to achieve the goal state.
Interestingly, when a kid is learning to walk, it uses all types of learning simultaneously: some supervised learning in the form of parents guiding it, some unsupervised learning based on the new data points it comes across, inference for similar scenarios, and feedback from the environment. Learning results from labeled as well as unlabeled data, and it takes place simultaneously. In fact, a kid uses all the learning methods and much more: it not only uses available knowledge and context but also infers information that cannot be derived directly from the available data. Kids use all these methods selectively, together, and based on need and appropriateness, and their learning results from close interaction with the environment. While making systems learn from experience, we need to take all these facts into account. Furthermore, it is more about paradigms than about the methods used for learning. This book is about making a system intelligent, with a focus on reinforcement learning. Reinforcement learning tries to strike a balance between exploitation and exploration, and it takes place through interaction with the environment: rewards from the environment, and then the cumulative value, drive the overall actions. Figure 1.5 depicts the process by which a kid learns. Kids get many inputs from their parents, society, school, and experiences; they perform actions, and for those actions they obtain rewards from these sources and from the environment.
Figure 1.5 Kid learning model.
The learning paradigm has kept changing over the years; the concept of intelligence changed, and even the paradigms of learning and knowledge acquisition changed. A paradigm is, in the philosophy of science, a very general conception of the nature of scientific endeavor within which a given enquiry is undertaken. Learning, as per Peter Senge, is acquiring the information and knowledge that can empower us to get what we would like to get out of life [1].
If we go through the history of machine learning, learning was initially understood largely as memorization: retrieving or reproducing whichever memorized fact is appropriate when required. This can be called a data-centric paradigm. In fact, this paradigm exists in machine learning even today and is used to a great extent in all intelligent programs. Take the example of a simple program for retrieving the ages of employees. A simple database of names and ages is maintained, and when the name of any employee is given, the program retrieves that employee's age. There are many such database-centric applications demonstrating data-centric intelligence. But slowly the expectations from intelligent systems started increasing. As per the Turing test of intelligence, an intelligent system is one that can behave like a human, such that it is difficult to make out whether a response is coming from a machine or a human.
Learning is interdisciplinary and draws on many aspects of psychology, statistics, mathematics, and neurology. Interestingly, not all human behaviors correspond to intelligence, and there are some areas where a computer can behave or respond better than a human. The Turing test applies to the intelligent behavior of computers, yet there are some intelligent activities that humans do not do, or that machines can do better than humans.
Reinforcement learning makes systems get the best of both worlds in the best possible way. But because activities and decision making have systemic behaviors, effective decision making requires understanding the system's behavior and components, and traditional paradigms of machine learning may not exhibit the required intelligent behavior in complex systems. Every activity, action, and decision has some systemic impact. Furthermore, from a systemic perspective, any event may result from some other event or series of events. These relationships are complex and difficult to understand. Systemic machine learning is about exploitation and exploration from a systemic perspective, building knowledge to get what we expect from the system. Learning from experience is the most important part of it: with more and more experience, the behavior is expected to improve.
Two aspects of learning are learning for predictable environment behavior and learning for unpredictable environment behavior. As we expect systems and machines to behave intelligently even in an unpredictable environment, we need to look at learning paradigms and models from the perspective of these new expectations, which make it necessary to learn continuously and from various sources of information.
Representing and adapting knowledge for these systems, and using it effectively, is a necessary part of this. Another important aspect of learning is context: intelligence and decision making should make effective use of context. In the absence of context, deriving the meaning of data is difficult, and decisions may differ according to context. Context is very systemic in nature: it describes the scenario, that is, the circumstances and facts surrounding an event. In the absence of the facts and circumstances surrounding the data, decision making becomes difficult. Context covers various aspects of the environment and the system, such as environmental parameters, interactions with other systems and subsystems, and so on. A doctor, for example, asks patients a number of questions. The information given by the patient, along with the doctor's information about epidemics and other recent health issues and the outcomes of the medical tests conducted, builds the context, and the doctor uses this context to diagnose.
Intelligence is not isolated; it needs to use information from the environment for decision making as well as learning. Learning agents get feedback in the form of a reward or penalty for every action, and they are supposed to learn from experience. To learn, there is a need to acquire more and more information, but in real-life scenarios agents cannot view anything and everything. There are fully observable environments and partially observable environments; practically, all environments are partially observable unless specific constraints are imposed for some focused goal. A limited view limits learning and decision-making abilities. The concept of integrating information is used very effectively in intelligent systems, yet the learning paradigm has remained confined to data-centric approaches: the context considered in past research was data centric and was never at the center of the activity.
There are many nonlinear and complex problems still waiting for solutions, ranging from automated car drivers to next-level security systems. These problems look solvable, but the methods, solutions, and available information are just not enough to provide a graceful solution.
The main objective in solving a machine-learning problem is to produce intelligent programs, or intelligent agents, through the process of learning and adapting to a changing environment. Reinforcement learning is one such machine-learning process. In this approach, learners or software agents learn from direct interaction with the environment, which mimics the way human beings learn. The agent can learn even if a complete model of, or complete information about, the environment is not available. The agent gets feedback about its actions as reward or punishment, and during the learning process situations are mapped to actions in the environment. Reinforcement-learning algorithms maximize the rewards received during interaction with the environment and establish the mapping of states to actions as a decision-making policy. The policy can be decided once, or it can adapt with changes in the environment.
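A minimal Q-learning sketch of this idea follows, using a toy one-dimensional corridor environment invented for illustration: the agent repeatedly updates a table of state–action values from the rewards it receives, and the resulting table yields a policy mapping each state to its best action.

```python
# Q-learning sketch on a toy corridor: states 0..4, reward on reaching state 4.
import random

n_states, actions = 5, [-1, +1]          # actions: move left or move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward from the environment
        # Update the estimate of this action's long-term value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])
        s = s_next

# The learned policy: the best action in each nonterminal state.
policy = {s: max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)}
print(policy)  # expected: every state maps to +1 (move right toward the reward)
```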
Reinforcement learning is different from supervised learning, the most widely used kind of learning. Supervised learning is learning from examples provided by a knowledgeable external supervisor; it is a method for training a parameterized function approximator. But it is not adequate for learning from interaction: it is more like learning from external guidance, where the guidance sits outside the environment or situation. In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent must act. In uncharted territory, where one would expect learning to be most beneficial, an agent must be able to learn from its own experience and from the environment as well. Thus, reinforcement learning combines the fields of dynamic programming and supervised learning to yield a machine-learning system that is very close to the approaches used in human learning.
One of the challenges that arise in reinforcement learning and not in other kinds of learning is the trade-off between exploration and exploitation. To obtain a lot of reward, a reinforcement-learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before. The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future. The dilemma is that neither exploration nor exploitation can be pursued exclusively without failing at the task. The agent must try a variety of actions and progressively favor those that appear to be best. On a stochastic task, each action must be tried many times to gain a reliable estimate of its expected reward. The entire issue of balancing exploration and exploitation does not arise in supervised learning, as it is usually defined. Furthermore, supervised learning never looks into exploration, and the responsibility of exploration is given to experts.
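The trade-off can be seen in miniature in a stochastic multi-armed bandit, sketched below with invented payoff probabilities: each action must be tried many times to estimate its expected reward reliably, and a small exploration rate keeps the agent from locking onto a mediocre action too early.

```python
# Epsilon-greedy action selection on a three-armed stochastic bandit.
import random

true_payoff = [0.3, 0.5, 0.8]   # hidden expected reward of each action
estimates = [0.0, 0.0, 0.0]     # the agent's running estimates
counts = [0, 0, 0]
epsilon = 0.1                   # fraction of steps spent exploring

for t in range(5000):
    # Explore with probability epsilon, otherwise exploit the best estimate.
    if random.random() < epsilon:
        a = random.randrange(3)
    else:
        a = max(range(3), key=lambda i: estimates[i])
    reward = 1.0 if random.random() < true_payoff[a] else 0.0
    counts[a] += 1
    # Incremental average: repeated trials make the estimate reliable.
    estimates[a] += (reward - estimates[a]) / counts[a]

print([round(e, 2) for e in estimates])  # estimates roughly approach [0.3, 0.5, 0.8]
```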
Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment. This is in contrast with many approaches that consider subproblems without addressing how they might fit into a larger picture. For example, much machine-learning research is concerned with supervised learning without explicitly specifying how such ability would finally be useful. Other researchers have developed theories of planning with general goals, but without considering planning's role in real-time decision making or asking where the predictive models necessary for planning would come from. Although these approaches have yielded many useful results, their focus on isolated subproblems is a significant limitation, one that stems from the inability to interact in real-time scenarios and the absence of active learning.
Reinforcement learning differs from the more widely studied problem of supervised learning in several ways. The most important difference is that there is no presentation of input–output pairs. Instead, after choosing an action the agent is told the immediate reward and the subsequent state, but it is not told which action would have been in its best long-term interests. To act optimally, the agent must actively gather useful experience about the possible system states, actions, transitions, and rewards. Another difference from supervised learning is that online performance matters: the evaluation of the system is often concurrent with learning.
