Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Veena Naik
Acquisition Editor: Namrata Patil
Content Development Editor: Snehal Kolte
Technical Editor: Chintan Thakkar, Dinesh Chaudhary
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Priyanaka Dhadke
Graphics: Jisha Chirayil
Production Coordinator: Aparna Bhagat
First published: October 2018
Production reference: 1301018
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78899-106-3
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Patrick D. Smith is the Data Science Lead for Excella in Arlington, Virginia, where he founded the data science and machine learning team. Prior to Excella, Patrick was the lead instructor for the data science program at General Assembly in Washington, DC, as well as a data scientist with Booz Allen Hamilton's Strategic Innovations Group.
He holds a bachelor's degree in International Economics from The George Washington University, and is currently a part-time master's student in software engineering at Harvard University.
David Dindi received an M.Sc. and a B.Sc. in chemical engineering with a focus on artificial intelligence from Stanford University. While at Stanford, David developed deep learning frameworks for predicting patient-specific adverse reactions to drugs at the Stanford Center for Biomedical Informatics. He currently advises a number of early-stage start-ups in Silicon Valley and in New York.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On Artificial Intelligence for Beginners
Packt Upsell
Why subscribe?
Packt.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
The History of AI
The beginnings of AI – 1950–1974
Rebirth – 1980–1987
The modern era takes hold – 1997–2005
Deep learning and the future – 2012–Present
Summary
Machine Learning Basics
Technical requirements
Applied math basics
The building blocks – scalars, vectors, matrices, and tensors
Scalars
Vectors
Matrices
Tensors
Matrix math
Scalar operations
Element-wise operations
Basic statistics and probability theory
The probability space and general theory
Probability distributions
Probability mass functions
Probability density functions
Conditional and joint probability
Chain rule for joint probability
Bayes' rule for conditional probability
Constructing basic machine learning algorithms
Supervised learning algorithms
Random forests
Unsupervised learning algorithms
Basic tuning
Overfitting and underfitting
K-fold cross-validation
Hyperparameter optimization
Summary
Platforms and Other Essentials
Technical requirements
TensorFlow, PyTorch, and Keras
TensorFlow
Basic building blocks
The TensorFlow graph
PyTorch
Basic building blocks
The PyTorch graph
Keras
Basic building blocks
Wrapping up
Cloud computing essentials
AWS basics
EC2 and virtual machines
S3 Storage
AWS Sagemaker
Google Cloud Platform basics
GCP cloud storage
GCP Cloud ML Engine
CPUs, GPUs, and other compute frameworks
Installing GPU libraries and drivers
With Linux (Ubuntu)
With Windows
Basic GPU operations
The future – TPUs and more
Summary
Your First Artificial Neural Networks
Technical requirements
Network building blocks
Network layers
Naming and sizing neural networks
Setting up network parameters in our MNIST example
Activation functions
Historically popular activation functions
Modern approaches to activation functions
Weights and bias factors
Utilizing weights and biases in our MNIST example
Loss functions
Using a loss function for simple regression
Using cross-entropy for binary classification problems
Defining a loss function in our MNIST example
Stochastic gradient descent
Learning rates
Utilizing the Adam optimizer in our MNIST example
Regularization
The training process
Putting it all together
Forward propagation
Backpropagation
Forwardprop and backprop with MNIST
Managing a TensorFlow model
Saving model checkpoints
Summary
Convolutional Neural Networks
Overview of CNNs
Convolutional layers
Layer parameters and structure
Pooling layers
Fully connected layers
The training process
CNNs for image tagging
Summary
Recurrent Neural Networks
Technical requirements
The building blocks of RNNs
Basic structure
Vanilla recurrent neural networks
One-to-many
Many-to-one
Many-to-many
Backpropagation through time
Memory units – LSTMs and GRUs
LSTM
GRUs
Sequence processing with RNNs
Neural machine translation
Attention mechanisms
Generating image captions
Extensions of RNNs
Bidirectional RNNs
Neural Turing machines
Summary
Generative Models
Technical requirements
Getting to AI – generative models
Autoencoders
Network architecture
Building an autoencoder
Variational autoencoders
Structure
Encoder
Decoder
Training and optimizing VAEs
Utilizing a VAE
Generative adversarial networks
Discriminator network
Generator network
Training GANs
Other forms of generative models
Fully visible belief nets
Hidden Markov models
Boltzmann machines
Summary
References
Reinforcement Learning
Technical requirements
Principles of reinforcement learning
Markov processes
Rewards
Policies
Value functions
The Bellman equation
Q-learning
Policy optimization
Extensions on policy optimization
Summary
Deep Learning for Intelligent Agents
Technical requirements
Word embeddings
Word2vec
Training Word2vec models
GloVe
Constructing a basic agent
Summary
Deep Learning for Game Playing
Technical requirements
Introduction
Networks for board games
Understanding game trees
AlphaGo and intelligent game-playing AIs
AlphaGo policy network
AlphaGo value network
AlphaGo in action
Networks for video games
Constructing a Deep Q-network
Utilizing a target network
Experience replay buffer
Choosing action
Training methods
Training the network
Running the network
Summary
Deep Learning for Finance
Requirements
Introduction to AI in finance
Deep learning in trading
Building a trading platform
Basic trading functions
Creating an artificial trader
Managing market data
Price prediction utilizing LSTMs
Backtesting your algorithm
Event-driven trading platforms
Gathering stock price data
Generating word embeddings
Neural Tensor Networks for event embeddings
Predicting events with a convolutional neural network
Deep learning in asset management
Summary
Deep Learning for Robotics
Technical requirements
Introduction
Setting up your environment
MuJoCo physics engine
Downloading the MuJoCo binary files
Signing up for a free trial of MuJoCo
Configuring your MuJoCo files
Installing the MuJoCo Python package
Setting up a deep deterministic policy gradients model
Experience replay buffer
Hindsight experience replay
The actor–critic network
The actor
The critic
Deep Deterministic Policy Gradients
Implementation of DDPG
Summary
References
Deploying and Maintaining AI Applications
Technical requirements
Introduction
Deploying your applications
Deploying models with TensorFlow Serving
Utilizing docker
Building a TensorFlow client
Training and deploying with the Google Cloud Platform
Training on GCP
Deploying for online learning on GCP
Using an API to Predict
Scaling your applications
Scaling out with distributed TensorFlow
Testing and maintaining your applications
Testing deep learning algorithms
Summary
References
Other Books You May Enjoy
Leave a review - let other readers know what you think
Virtual assistants such as Alexa and Siri process our requests, Google's cars have started to read addresses, and Amazon's prices and Netflix's recommended videos are decided by AI. AI is one of the most exciting technologies, and is becoming increasingly significant in the modern world.
Hands-On Artificial Intelligence for Beginners will teach you what AI is and how to design and build intelligent applications. This book will teach you to harness packages such as TensorFlow to create powerful AI systems. You will begin by reviewing the recent changes in AI and learning how artificial neural networks (ANNs) have enabled more intelligent AI. You'll explore feedforward, recurrent, convolutional, and generative neural networks (FFNNs, RNNs, CNNs, and GNNs), as well as reinforcement learning methods. In the concluding chapters, you'll learn how to implement these methods for a variety of tasks, such as generating text for chatbots, directing self-driving cars, and playing board and video games.
By the end of this book, you will be able to understand exactly what you need to consider when optimizing ANNs and how to deploy and maintain AI applications.
This book is designed for beginners in AI, aspiring AI developers, and machine learning enthusiasts with an interest in leveraging various algorithms to build powerful AI applications.
Chapter 1, The History of AI, begins by discussing the mathematical basis of AI and how certain theorems evolved. Then, we'll look at the research done in the 1980s and 90s to improve ANNs, cover the AI winter, and finish off with how we arrived at where we are today.
Chapter 2, Machine Learning Basics, introduces the fundamentals of machine learning and AI. Here, we will cover essential probability theory, linear algebra, and other elements that will lay the groundwork for the future chapters.
Chapter 3, Platforms and Other Essentials, introduces the deep learning libraries of Keras and TensorFlow and moves on to an introduction of basic AWS terminology and concepts that are useful for deploying your networks in production. We'll also introduce CPUs and GPUs, as well as other forms of compute architecture that you should be familiar with when building deep learning solutions.
Chapter 4, Your First Artificial Neural Networks, explains how to build our first artificial neural network. Then, we will learn about the core elements of ANNs and construct a simple single-layer network in both Keras and TensorFlow so that you understand how the two libraries work. With this simple network, we will do a basic classification task, such as the MNIST OCR task.
Chapter 5, Convolutional Neural Networks, introduces the convolutional neural network and explains its inner workings. We'll touch upon the basic building blocks of convolutions, pooling layers, and other elements. Lastly, we'll construct a Convolutional Neural Network for image tagging.
Chapter 6, Recurrent Neural Networks, introduces one of the workhorses of deep learning and AI—the recurrent neural network. We'll first introduce the conceptual underpinnings of recurrent neural networks, with a specific focus on utilizing them for natural language processing tasks. We'll show how one can generate text utilizing these networks and see how they can be utilized for predictive financial models.
Chapter 7, Generative Models, covers generative models, primarily through the lens of GANs, and we'll look at the generative tasks that can be accomplished with them.
Chapter 8, Reinforcement Learning, introduces the principles of reinforcement learning. We'll cover Markov processes, rewards, policies, and value functions, before moving on to the Bellman equation, Q-learning, and policy optimization methods.
Chapter 9, Deep Learning for Intelligent Agents, focuses on utilizing our knowledge of various forms of neural networks from the previous section to make an intelligent assistant, along the lines of Amazon's Alexa or Apple's Siri. We'll learn about and utilize word embeddings, recurrent neural networks, and decoders.
Chapter 10, Deep Learning for Game Playing, explains how to construct game-playing algorithms with reinforcement learning. We'll look at several different forms of games, from simple Atari-style games to more advanced board games. We'll touch upon the methods that Google DeepMind utilized to build AlphaGo.
Chapter 11, Deep Learning for Finance, shows how to create an advanced market prediction system in TensorFlow utilizing RNNs.
Chapter 12, Deep Learning for Robotics, uses deep learning to teach a robot to move objects. We will first train the neural network in simulated environments and then move on to real mechanical parts with images acquired from a camera.
Chapter 13, Deploying and Maintaining AI Applications, introduces methods for creating and scaling training pipelines and deployment architectures for AI systems.
The code in this book can be executed directly using Jupyter and Python. The code files for the book are available at the GitHub link provided in the following sections.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Artificial-Intelligence-for-Beginners. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
The term Artificial Intelligence (AI) carries a great deal of weight. AI has benefited from over 70 years of research and development. The history of AI is varied and winding, but one ground truth remains – tireless researchers have worked through funding growths and lapses, promise and doubt, to push us toward achieving ever more realistic AI.
Before we begin, let's weed through the buzzwords and marketing and establish what AI really is. For the purposes of this book, we will rely on this definition:
AI is a system or algorithm that allows computers to perform tasks without explicitly being programmed to do so.
AI is an interdisciplinary field. While we'll focus largely on utilizing deep learning in this book, the field also encompasses elements of robotics and IoT, and has a strong overlap (if it hasn't consumed it yet) with generalized natural language processing research. It's also intrinsically linked with fields such as Human-Computer Interaction (HCI) as it becomes increasingly important to integrate AI with our lives and the modern world around us.
AI goes through waves, and is bound to go through another (perhaps smaller) wave in the future. Each time, we push the limits of AI with the computational power that is available to us, until progress stalls and research and development slows. This day and age may be different, as we benefit from the confluence of increasingly large and efficient data stores, fast and cheap computing power, and the funding of some of the most profitable companies in the world. To understand how we ended up here, let's start at the beginning.
In this chapter, we will cover the following topics:
The beginnings of AI – 1950–1974
Rebirth – 1980–1987
The modern era takes hold – 1997–2005
Deep learning and the future – 2012–Present
AI has been a long-sought-after concept since the time of the earliest mathematicians and thinkers. The ancient Greeks developed myths of the automata, a form of robot that would complete tasks for the Gods that they considered menial, and throughout early history thinkers pondered what it meant to be human, and whether the notion of human intelligence could be replicated. While it's impossible to pinpoint an exact beginning for AI as a field of research, its development parallels the early advances of computer science. One could argue that computer science as a field developed out of this early desire to create self-thinking machines.
During the Second World War, British mathematician and code breaker Alan Turing developed some of the first computers, conceived with the vision of AI in mind. Turing wanted to create a machine that would mimic human comprehension, utilizing all available information to reason and make decisions. In 1950, he published Computing Machinery and Intelligence, which introduced what we now call the Turing test of AI. The Turing test, which is a benchmark by which to measure the aptitude of a machine to mimic human interaction, states that to pass the test, the machine must be able to sufficiently fool a discerning judge as to whether it is a human or not. This might sound simple, but think about how many complex problems would have to be solved to reach this point. The machine would have to be able to comprehend, store information on, and respond to natural language, all the while retaining knowledge and responding to situations with what we deem common sense.
Turing could not move far beyond his initial developments; in his day, utilizing a computer for research cost almost $200,000 per month and computers could not store commands. His research and devotion to the field, however, have earned him accolades. Today, he is widely considered the father of AI and the academic study of computer science.
It was in the summer of 1956, however, that the field was truly born. Just a few months before, researchers at the RAND Corporation had developed the Logic Theorist – considered the world's first AI program – which proved 38 theorems of the Principia Mathematica. Spurred on by this development and others, John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon hosted the now famous Dartmouth Summer Research Project on AI, coining the term Artificial Intelligence itself and providing the groundwork for the field. With funding from the Rockefeller Foundation, these four friends brought together some of the most preeminent researchers in AI over the course of the summer to brainstorm and effectively attempt to provide a roadmap for the field. They came from the institutions and companies that were on the leading edge of the computing revolution at the time: Harvard, Dartmouth, MIT, IBM, Bell Labs, and the RAND Corporation. Their topics of discussion were fairly forward-thinking for the time – they could have easily been those of an AI conference today – Artificial Neural Networks (ANNs), natural language processing (NLP), theories of computation, and general computing frameworks. The Summer Research Project was seminal in creating the field of AI as we know it today, and many of its discussion topics spurred the growth of AI research and development through the 1950s and 1960s.
After 1956, innovation kept up a rapid pace. Two years later, in 1958, a researcher at the Cornell Aeronautical Laboratory named Frank Rosenblatt invented one of the founding algorithms of AI, the Perceptron. The following diagram shows the Perceptron algorithm:
Perceptrons are simple, single-layer networks that work as linear classifiers. They consist of four main architectural aspects, which are as follows:
The input layer: The initial layer for reading in data
Weight and bias vectors: Weights learn appropriate values for the connections between neurons during training, while biases help shift the activation function to fit the desired output
A summation function: A simple summation of the weighted input
An activation function: A simple mapping of the summed weighted input to the output
As you can see, these networks rely on only basic mathematical operations. They failed to live up to the hype, however, and the vast disappointment they created significantly contributed to the first AI winter.
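To make this concrete, here is a minimal sketch of a perceptron in Python with NumPy. The toy data (the logical OR function), learning rate, and number of epochs are arbitrary illustrations rather than anything from the book:

import numpy as np

# Predict with a perceptron: weighted sum of inputs plus a bias,
# passed through a step activation to produce a class label.
def predict(x, weights, bias):
    summed = np.dot(x, weights) + bias   # summation function
    return 1 if summed > 0 else 0        # step activation function

# Toy training data: the logical OR function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

weights = np.zeros(2)
bias = 0.0
learning_rate = 0.1

# Perceptron learning rule: nudge weights and bias by the prediction error.
for epoch in range(10):
    for xi, target in zip(X, y):
        error = target - predict(xi, weights, bias)
        weights = weights + learning_rate * error * xi
        bias = bias + learning_rate * error

print([predict(xi, weights, bias) for xi in X])  # [0, 1, 1, 1] once it converges

Because the error is computed from the thresholded class label, the weights only change when a prediction is wrong.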
Another important development of this early era of research was Adaline. Adaline attempted to improve upon the perceptron by utilizing continuous predicted values to learn the coefficients, unlike the perceptron, which utilizes class labels. The following diagram shows the Adaline algorithm:
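As a rough sketch of that difference (again using toy data and an assumed learning rate, not the book's diagram), an Adaline-style update measures the error on the continuous weighted sum before any thresholding:

import numpy as np

# Adaline-style learning: the error is measured on the continuous output
# (the raw weighted sum), not on the thresholded class label.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)

weights = np.zeros(2)
bias = 0.0
learning_rate = 0.1

for epoch in range(100):
    net_input = X.dot(weights) + bias            # continuous predicted values
    errors = y - net_input                       # error before thresholding
    weights += learning_rate * X.T.dot(errors)   # gradient-descent step on squared error
    bias += learning_rate * errors.sum()

# The class label is only produced at prediction time, by thresholding.
print(np.where(X.dot(weights) + bias > 0.5, 1, 0))  # [0 1 1 1]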
These golden years also brought us early advances such as the STUDENT program, which solved high school algebra problems, and the ELIZA chatbot. By 1963, the advances in the field convinced the newly formed Advanced Research Projects Agency (DARPA) to begin funding AI research at MIT.
By the late 1960s, funding in the US and the UK began to dry up. In 1969, a book named Perceptrons by MIT's Marvin Minsky and Seymour Papert (https://archive.org/details/Perceptrons) proved that these networks could only mathematically compute extremely basic functions; most famously, a single-layer perceptron cannot compute the XOR function. In fact, they went so far as to suggest that Rosenblatt had greatly exaggerated his findings and the importance of the perceptron. The perceptron came to be seen as having limited usefulness to the field, and research into network structures effectively halted.
With both governments releasing reports that significantly criticized the usefulness of AI, the field was shuttled into what has become known as the AI winter. AI research continued throughout the late 1960s and 1970s, mostly under different terminology. The terms machine learning, knowledge-based system, and pattern recognition all come from this period, when researchers had to think up creative names for their work in order to receive funding. Around this time, however, a student at the University of Cambridge named Geoffrey Hinton began exploring ANNs and how we could utilize them to mimic the brain's memory functions. We'll talk a lot more about Hinton in the following sections and throughout this book, as he has become one of the most important figures in AI today.
The 1980s saw the birth of deep learning, the branch of AI that has become the focus of most modern AI research. With the revival of neural network research by John Hopfield and David Rumelhart, and several funding initiatives in Japan, the United States, and the United Kingdom, AI research was back on track.
In the early 1980s, while the United States was still reeling from the effects of the AI winter, Japan was funding the Fifth Generation Computer Systems project to advance AI research. In the US, DARPA once again ramped up funding for AI research, with business regaining interest in AI applications. IBM's T.J. Watson Research Center published a statistical approach to language translation (https://aclanthology.info/pdf/J/J90/J90-2002.pdf), which replaced traditional rule-based NLP models with probabilistic models, ushering in the modern era of NLP.
Hinton, the student from the University of Cambridge who persisted in his research, would make a name for himself by coining the term deep learning. He joined forces with Rumelhart to become one of the first researchers to introduce the backpropagation algorithm for training ANNs, which is the backbone of all modern deep learning. Hinton, like many others before him, was limited by computational power, and it would take another 26 years before the weight of his discovery was really felt.
By the late 1980s, the personal computing revolution and missed expectations threatened the field. Commercial development all but came to a halt, as mainframe computer manufacturers stopped producing hardware that could handle AI-oriented languages, and AI-oriented mainframe manufacturers went bankrupt. It had seemed as if all had come to a standstill.
AI further entered the public discourse in 1997 when IBM's Deep Blue system beat world champion chess grandmaster Garry Kasparov. Within a year, a former student of Geoffrey Hinton's, Yann LeCun, developed the Convolutional Neural Network at Bell Labs, which was enabled by the backpropagation algorithm and years of research into computer vision tasks. Hochreiter and Schmidhuber invented the first memory unit, the long short-term memory unit (LSTM), which is still used today for sequence modeling.
ANNs still had a way to go. Computing and storage limitations prevented these networks from scaling, and other methods such as support vector machines (SVMs) were developed as alternatives.
AI has made more strides in the past several years than in the 60-odd years since its birth. Its popularity has been further fueled by the increasingly public nature of its benefits – self-driving cars, personal assistants, and its ever-ubiquitous use in social media and advertising. For most of its history, AI was a field with little interaction with the average populace, but now it's come to the forefront of international discourse.
Today's age of AI has been the result of three trends:
The increasing amount of data and computing power available to AI researchers and practitioners
Ongoing research by Geoffrey Hinton and his lab at the University of Toronto into deep neural networks
Increasingly public applications of AI that have driven adoption and further acceptance into mainstream technology culture
Today, companies, governments, and other organizations have benefited from the big data revolution of the mid 2000s, which has brought us a plethora of data stores. At last, AI applications have the requisite data to train. Computational power is cheap and only getting cheaper.
On the research front, in 2012, Hinton and two of his students were finally able to show that deep neural networks were able to outperform all other methods in image recognition in the ImageNet Large Scale Visual Recognition Challenge. The modern era of AI was born.
Interestingly enough, Hinton's team's work on computer vision also introduced the idea of utilizing Graphics Processing Units (GPUs) to train deep networks. It also introduced dropout and ReLU, which have become cornerstones of deep learning. We'll discuss these in the coming chapters. Today, Hinton is the most cited AI researcher on the planet. He is a lead data scientist at Google Brain and has been tied to many major developments in AI in the modern era.
AI was further thrown into the public sphere when, in 2011, IBM Watson defeated the world Jeopardy champions, and in 2016 Google's AlphaGo defeated the world grand champion at one of the most challenging games known to man: Go.
Today, we are closer than ever to having machines that can pass the Turing test. Networks are able to generate ever more realistic imitations of speech, images, and writing. Reinforcement learning methods and Ian Goodfellow's GANs have made incredible strides. Recently, there has been emerging research that is working to demystify the inner workings of deep neural networks. As the field progresses, however, we should all be mindful of overpromising. For most of its history, companies have often overpromised regarding what AI can do, and in turn, we've seen consistent disappointment in its abilities. Focusing the abilities of AI on only certain applications, and continuing to view research in the field from a biological perspective, will only hurt its advancement going forward. In this book, however, we'll see that today's practical applications are directed and realistic, and that the field is making more strides toward true AI than ever before.
Since its beginnings in the 1940s and 1950s, AI has come a long way. Many of the technologies and ideas that we are utilizing today are directly based on these early discoveries. Over the course of the latter half of the 20th century, pioneers such as Geoffrey Hinton have pushed AI forward through peaks and busts. Today, we are on track to achieve sustained AI development for the foreseeable future.
The development of AI technology has been closely aligned with the development of new hardware and increasingly large data sources. As we'll see throughout this book, great AI applications are built with data constraints and hardware optimization in mind. The next chapter will introduce you to the fundamentals of machine learning and AI. We will also cover probability theory, linear algebra, and other elements that will lay the groundwork for the future chapters.
Artificial Intelligence (AI) is rooted in mathematics and statistics. When creating an Artificial Neural Network (ANN), we're conducting mathematical operations on data represented in linear space; it is, by nature, applied mathematics and statistics. Machine learning algorithms are nothing but function approximations; they try to find a mapping between an input and a correct corresponding output. We use algebraic methods to create algorithms that learn these mappings.
Almost all machine learning can be expressed as a fairly straightforward formula: bringing together a dataset and a model, along with a loss function and an optimization technique that are applicable to them. This section is intended as a review of the basic mathematical tools and techniques that are essential to understanding what's under the hood in AI.
In this chapter, we'll review linear algebra and probability, and then move on to the construction of basic and fundamental machine learning algorithms and systems, before touching upon optimization techniques that can be used for all of your methods going forward. While we will utilize mathematical notation and expressions in this chapter and the following chapters, we will focus on translating each of these concepts into Python code. In general, Python is easier to read and comprehend than mathematical expressions, and allows readers to get off the ground quicker.
We will be covering the following topics in this chapter:
Applied math basics
Probability theory
Constructing basic machine learning algorithms
In this chapter, we will be working in Python 3 with the scikit-learn scientific computing package. To install the package, run pip install scikit-learn in your terminal or command line.
In the following section, we'll introduce the fundamental types of linear algebra objects that are used throughout AI applications: scalars, vectors, matrices, and tensors.
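As a quick illustrative sketch in Python with NumPy (the values are arbitrary), here is how these four objects typically look in code:

import numpy as np

scalar = 3.0                                   # a single number
vector = np.array([1.0, 2.0, 3.0])             # a 1-D array of numbers
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])                # a 2-D array with rows and columns
tensor = np.zeros((2, 3, 4))                   # an n-dimensional array (here, 3-D)

print(vector.shape, matrix.shape, tensor.shape)  # (3,) (2, 2) (2, 3, 4)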
The basic operations of an ANN are based on matrix math. In this section, we'll be reviewing the basic operations that you need to know to understand the mechanics of ANNs.
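As a brief illustration with arbitrary values (not the book's examples), the scalar and element-wise operations covered in the following subsections look like this in NumPy:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

scalar_multiple = 2 * A       # scalar operation: every entry of A is doubled
elementwise_sum = A + B       # element-wise operation: entries added pairwise
elementwise_product = A * B   # element-wise (Hadamard) product
matrix_product = A @ B        # matrix multiplication: rows of A dot columns of B

print(matrix_product)         # [[19. 22.]
                              #  [43. 50.]]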
Probability, the mathematical method for modeling uncertain scenarios, underpins the algorithms that make AI intelligent, helping to tell us how our systems should reason. So, what is probability? We'll define it as follows:
Simply said, probability is the mathematical study of uncertainty. In this section, we'll cover the basics of probability space and probability distributions, as well as helpful tools for solving simple problems.
When probability is discussed, it's often referred to in terms of the probability of a certain event happening. Is it going to rain? Will the price of apples go up or down? In the context of machine learning, probabilities tell us the likelihood of events such as a comment being classified as positive vs. negative, or whether a fraudulent transaction will happen on a credit card. We measure probability by defining what we refer to as the probability space. A probability space describes how the probabilities of certain events are measured. Probability spaces are defined by three characteristics:
The sample space, which tells us the possible outcomes of a situation
A defined set of events, such as two fraudulent credit card transactions
The measure of probability of each of these events
While probability spaces are a subject worthy of studying in their own right, for our own understanding, we'll stick to this basic definition.
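As a small, hedged illustration of those three characteristics (a fair six-sided die, not an example from the book):

from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}                                 # possible outcomes
event = {2, 4, 6}                                                 # a defined event: the roll is even
measure = {outcome: Fraction(1, 6) for outcome in sample_space}   # probability of each outcome

probability_of_event = sum(measure[outcome] for outcome in event)
print(probability_of_event)  # 1/2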
In probability theory, the idea of independence is essential. Independence is a state in which one random variable does not change based on the value of another random variable. This is an important assumption in deep learning, as non-independent features can often intertwine and affect the predictive power of our models.
In statistical terms, a collection of data about an event is a sample, which is drawn from a theoretical superset of data called a population that represents everything that is known about a grouping or event. For instance, if we were to poll people on the street about whether they believe in Political View A or Political View B, we would be generating a random sample from the population, which would be the entire population of the city, state, or country where we are polling.
Now let's say we wanted to use this sample to predict the likelihood of a person having one of the two political views, but we mostly polled people who were at an event supporting Political View A. In this case, we may have a biased sample. When sampling, it is important to take a random sample to decrease bias, otherwise any statistical analysis or modeling that we do with the sample will be biased as well.
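The following sketch, using made-up numbers rather than real polling data, shows how a biased sample distorts an estimate while a random sample does not:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 55% hold Political View A (coded 1), 45% hold View B (coded 0).
population = rng.binomial(1, 0.55, size=100_000)

# A random sample keeps the estimate close to the population proportion.
random_sample = rng.choice(population, size=1_000, replace=False)

# A biased sample drawn mostly at a View A event overstates support for View A.
view_a = population[population == 1]
view_b = population[population == 0]
biased_sample = np.concatenate([rng.choice(view_a, 900), rng.choice(view_b, 100)])

print(population.mean(), random_sample.mean(), biased_sample.mean())
# roughly 0.55, roughly 0.55, and 0.9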
You've probably seen a chart such as the following one; it's showing us the values that appear in a dataset, and how many times those values appear. This is called a distribution of a variable. In this particular case, we're displaying the distribution with the help of a histogram, which shows the frequency of the variables:
In this section, we're interested in a particular type of distribution, called a probability distribution. When we talk about probability distributions, we're talking about the likelihood of a random variable taking on a certain value, and we create one by dividing the frequencies in the preceding histogram by the total number of samples in the distribution, in a process called normalization. There are two primary forms of probability distributions: probability mass functions for discrete variables and probability density functions for continuous variables. Cumulative distribution functions, which apply to any random variable, also exist.
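As a small sketch with made-up data, normalization is just a division of counts by the total:

import numpy as np

# Made-up observations of a discrete variable.
data = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4, 2])

values, counts = np.unique(data, return_counts=True)
probabilities = counts / counts.sum()     # normalization: frequencies become probabilities

for value, p in zip(values, probabilities):
    print(value, p)
print(probabilities.sum())                # 1.0 -- the hallmark of a probability distribution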
Probability mass functions (PMFs) are discrete distributions. The random variables of the distribution can take on a finite number of values:
PMFs look a bit different from our typical view of a distribution, and that is because of their finite nature.
Probability density functions (PDFs) are continuous distributions; the random variable can take on infinitely many values. For example, take the following image:
You've probably seen something like this before; it's a probability density function of a standard normal, or Gaussian distribution.
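As a brief sketch (not the book's figure), the standard normal density can be evaluated directly from its formula:

import numpy as np

def standard_normal_pdf(x):
    # Density of the standard normal distribution: exp(-x^2 / 2) / sqrt(2 * pi)
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

x = np.linspace(-3, 3, 7)          # a few points along the x axis
print(standard_normal_pdf(x))      # peaks at about 0.3989 when x = 0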
Conditional probability is the probability that one event happens, given that another event happens. It's one of the key tools for reasoning about uncertainty in probability theory. Let's say we are talking about your winning the lottery, given that it's a sunny day. Maybe you're feeling lucky! How would we write that in a probability statement? It would be the probability of your lottery win, A, given that it is sunny, B, written P(A|B).
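Formally, for two events A and B with P(B) greater than zero, conditional probability is defined as P(A|B) = P(A and B) / P(B); that is, the probability of A and B happening together, divided by the probability of B.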
Joint probability is the probability of two things happening simultaneously: what is the probability of you winning the lottery and it being a sunny day?
Joint probability is important in the AI space; it's what underlies the mechanics of generative models.
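As a tiny sketch with made-up counts for the sunny-day and lottery example:

# Made-up counts over 1,000 hypothetical days.
total_days = 1000
sunny_days = 600
sunny_and_win_days = 3                               # sunny AND the lottery was won

p_sunny = sunny_days / total_days                    # P(B)
p_sunny_and_win = sunny_and_win_days / total_days    # joint probability P(A and B)
p_win_given_sunny = p_sunny_and_win / p_sunny        # conditional probability P(A|B)

print(p_sunny_and_win, p_win_given_sunny)            # 0.003 0.005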
