Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Veena Naik
Acquisition Editor: Namrata Patil
Content Development Editor: Snehal Kolte
Technical Editor: Chintan Thakkar, Dinesh Chaudhary
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Priyanaka Dhadke
Graphics: Jisha Chirayil
Production Coordinator: Aparna Bhagat
First published: October 2018
Production reference: 1301018
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78899-106-3
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Patrick D. Smith is the Data Science Lead for Excella in Arlington, Virginia, where he founded the data science and machine learning team. Prior to Excella, Patrick was the lead instructor for the data science program at General Assembly in Washington, DC, as well as a data scientist with Booz Allen Hamilton's Strategic Innovations Group.
He holds a bachelor's degree in International Economics from The George Washington University, and is currently a part-time master's student in software engineering at Harvard University.
David Dindi received an M.Sc. and a B.Sc. in chemical engineering with a focus on artificial intelligence from Stanford University. While at Stanford, David developed deep learning frameworks for predicting patient-specific adverse reactions to drugs at the Stanford Center for Biomedical Informatics. He currently advises a number of early-stage start-ups in Silicon Valley and in New York.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On Artificial Intelligence for Beginners
Packt Upsell
Why subscribe?
Packt.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
The History of AI
The beginnings of AI – 1950–1974
Rebirth – 1980–1987
The modern era takes hold – 1997–2005
Deep learning and the future – 2012–Present
Summary
Machine Learning Basics
Technical requirements
Applied math basics
The building blocks – scalars, vectors, matrices, and tensors
Scalars
Vectors
Matrices
Tensors
Matrix math
Scalar operations
Element-wise operations
Basic statistics and probability theory
The probability space and general theory
Probability distributions
Probability mass functions
Probability density functions
Conditional and joint probability
Chain rule for joint probability
Bayes' rule for conditional probability
Constructing basic machine learning algorithms
Supervised learning algorithms
Random forests
Unsupervised learning algorithms
Basic tuning
Overfitting and underfitting
K-fold cross-validation
Hyperparameter optimization
Summary
Platforms and Other Essentials
Technical requirements
TensorFlow, PyTorch, and Keras
TensorFlow
Basic building blocks
The TensorFlow graph
PyTorch
Basic building blocks
The PyTorch graph
Keras
Basic building blocks
Wrapping up
Cloud computing essentials
AWS basics
EC2 and virtual machines
S3 Storage
AWS Sagemaker
Google Cloud Platform basics
GCP cloud storage
GCP Cloud ML Engine
CPUs, GPUs, and other compute frameworks
Installing GPU libraries and drivers
With Linux (Ubuntu)
With Windows
Basic GPU operations
The future – TPUs and more
Summary
Your First Artificial Neural Networks
Technical requirements
Network building blocks
Network layers
Naming and sizing neural networks
Setting up network parameters in our MNIST example
Activation functions
Historically popular activation functions
Modern approaches to activation functions
Weights and bias factors
Utilizing weights and biases in our MNIST example
Loss functions
Using a loss function for simple regression
Using cross-entropy for binary classification problems
Defining a loss function in our MNIST example
Stochastic gradient descent
Learning rates
Utilizing the Adam optimizer in our MNIST example
Regularization
The training process
Putting it all together
Forward propagation
Backpropagation
Forwardprop and backprop with MNIST
Managing a TensorFlow model
Saving model checkpoints
Summary
Convolutional Neural Networks
Overview of CNNs
Convolutional layers
Layer parameters and structure
Pooling layers
Fully connected layers
The training process
CNNs for image tagging
Summary
Recurrent Neural Networks
Technical requirements
The building blocks of RNNs
Basic structure
Vanilla recurrent neural networks
One-to-many
Many-to-one
Many-to-many
Backpropagation through time
Memory units – LSTMs and GRUs
LSTM
GRUs
Sequence processing with RNNs
Neural machine translation
Attention mechanisms
Generating image captions
Extensions of RNNs
Bidirectional RNNs
Neural Turing machines
Summary
Generative Models
Technical requirements
Getting to AI – generative models
Autoencoders
Network architecture
Building an autoencoder
Variational autoencoders
Structure
Encoder
Decoder
Training and optimizing VAEs
Utilizing a VAE
Generative adversarial networks
Discriminator network
Generator network
Training GANs
Other forms of generative models
Fully visible belief nets
Hidden Markov models
Boltzmann machines
Summary
References
Reinforcement Learning
Technical requirements
Principles of reinforcement learning
Markov processes
Rewards
Policies
Value functions
The Bellman equation
Q-learning
Policy optimization
Extensions on policy optimization
Summary
Deep Learning for Intelligent Agents
Technical requirements
Word embeddings
Word2vec
Training Word2vec models
GloVe
Constructing a basic agent
Summary
Deep Learning for Game Playing
Technical requirements
Introduction
Networks for board games
Understanding game trees
AlphaGo and intelligent game-playing AIs
AlphaGo policy network
AlphaGo value network
AlphaGo in action
Networks for video games
Constructing a Deep Q-network
Utilizing a target network
Experience replay buffer
Choosing action
Training methods
Training the network
Running the network
Summary
Deep Learning for Finance
Requirements
Introduction to AI in finance
Deep learning in trading
Building a trading platform
Basic trading functions
Creating an artificial trader
Managing market data
Price prediction utilizing LSTMs
Backtesting your algorithm
Event-driven trading platforms
Gathering stock price data
Generating word embeddings
Neural Tensor Networks for event embeddings
Predicting events with a convolutional neural network
Deep learning in asset management
Summary
Deep Learning for Robotics
Technical requirements
Introduction
Setting up your environment
MuJoCo physics engine
Downloading the MuJoCo binary files
Signing up for a free trial of MuJoCo
Configuring your MuJoCo files
Installing the MuJoCo Python package
Setting up a deep deterministic policy gradients model
Experience replay buffer
Hindsight experience replay
The actor–critic network
The actor
The critic
Deep Deterministic Policy Gradients
Implementation of DDPG
Summary
References
Deploying and Maintaining AI Applications
Technical requirements
Introduction
Deploying your applications
Deploying models with TensorFlow Serving
Utilizing docker
Building a TensorFlow client
Training and deploying with the Google Cloud Platform
Training on GCP
Deploying for online learning on GCP
Using an API to Predict
Scaling your applications
Scaling out with distributed TensorFlow
Testing and maintaining your applications
Testing deep learning algorithms
Summary
References
Other Books You May Enjoy
Leave a review - let other readers know what you think
Virtual assistants such as Alexa and Siri process our requests, Google's cars have started to read addresses, and Amazon's prices and Netflix's recommended videos are decided by AI. AI is one of the most exciting technologies, and is becoming increasingly significant in the modern world.
Hands-On Artificial Intelligence for Beginners will teach you what AI is and how to design and build intelligent applications. This book will teach you to harness packages such as TensorFlow to create powerful AI systems. You will begin by reviewing the recent changes in AI and learning how artificial neural networks (ANNs) have enabled more intelligent AI. You'll explore feedforward, recurrent, convolutional, and generative neural networks (FFNNs, RNNs, CNNs, and GNNs), as well as reinforcement learning methods. In the concluding chapters, you'll learn how to implement these methods for a variety of tasks, such as generating text for chatbots, directing self-driving cars, and playing board and video games.
By the end of this book, you will be able to understand exactly what you need to consider when optimizing ANNs and how to deploy and maintain AI applications.
This book is designed for beginners in AI, aspiring AI developers, and machine learning enthusiasts with an interest in leveraging various algorithms to build powerful AI applications.
Chapter 1, The History of AI, begins by discussing the mathematical basis of AI and how certain theorems evolved. Then, we'll look at the research done in the 1980s and 90s to improve ANNs, cover the AI winter, and finish off with how we arrived at where we are today.
Chapter 2, Machine Learning Basics, introduces the fundamentals of machine learning and AI. Here, we will cover essential probability theory, linear algebra, and other elements that will lay the groundwork for the future chapters.
Chapter 3, Platforms and Other Essentials, introduces the deep learning libraries of Keras and TensorFlow and moves on to an introduction of basic AWS terminology and concepts that are useful for deploying your networks in production. We'll also introduce CPUs and GPUs, as well as other forms of compute architecture that you should be familiar with when building deep learning solutions.
Chapter 4, Your First Artificial Neural Networks, explains how to build our first artificial neural network. Then, we will learn about the core elements of ANNs and construct a simple single-layer network in both Keras and TensorFlow so that you understand how the two libraries work. With this simple network, we will do a basic classification task, such as the MNIST OCR task.
Chapter 5, Convolutional Neural Networks, introduces the convolutional neural network and explains its inner workings. We'll touch upon the basic building blocks of convolutions, pooling layers, and other elements. Lastly, we'll construct a Convolutional Neural Network for image tagging.
Chapter 6, Recurrent Neural Networks, introduces one of the workhorses of deep learning and AI—the recurrent neural network. We'll first introduce the conceptual underpinnings of recurrent neural networks, with a specific focus on utilizing them for natural language processing tasks. We'll show how one can generate text utilizing these networks and see how they can be utilized for predictive financial models.
Chapter 7, Generative Models, covers generative models, primarily through the lens of GANs, and we'll look at the generative tasks that can be accomplished with them.
Chapter 8, Reinforcement Learning, introduces the principles of reinforcement learning. We'll cover Markov processes, rewards, policies, and value functions, before moving on to the Bellman equation, Q-learning, and policy optimization methods.
Chapter 9, Deep Learning for Intelligent Agents, focuses on utilizing our knowledge of various forms of neural networks from the previous section to make an intelligent assistant, along the lines of Amazon's Alexa or Apple's Siri. We'll learn about and utilize word embeddings, recurrent neural networks, and decoders.
Chapter 10, Deep Learning for Game Playing, explains how to construct game-playing algorithms with reinforcement learning. We'll look at several different forms of games, from simple Atari-style games to more advanced board games. We'll touch upon the methods that Google DeepMind utilized to build AlphaGo.
Chapter 11, Deep Learning for Finance, shows how to create an advanced market prediction system in TensorFlow utilizing RNNs.
Chapter 12, Deep Learning for Robotics, uses deep learning to teach a robot to move objects. We will first train the neural network in simulated environments and then move on to real mechanical parts with images acquired from a camera.
Chapter 13, Deploying and Maintaining AI Applications, introduces methods for creating and scaling training pipelines and deployment architectures for AI systems.
The code in this book can be executed directly using Jupyter and Python. The code files for the book are available at the GitHub link provided in the following sections.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Artificial-Intelligence-for-Beginners. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
The term Artificial Intelligence (AI) carries a great deal of weight. AI has benefited from over 70 years of research and development. The history of AI is varied and winding, but one ground truth remains – tireless researchers have worked through funding growths and lapses, promise and doubt, to push us toward achieving ever more realistic AI.
Before we begin, let's weed through the buzzwords and marketing and establish what AI really is. For the purposes of this book, we will rely on this definition:
AI is a system or algorithm that allows computers to perform tasks without explicitly being programmed to do so.
AI is an interdisciplinary field. While we'll focus largely on utilizing deep learning in this book, the field also encompasses elements of robotics and IoT, and has a strong overlap (if it hasn't consumed it yet) with generalized natural language processing research. It's also intrinsically linked with fields such as Human-Computer Interaction (HCI) as it becomes increasingly important to integrate AI with our lives and the modern world around us.
AI goes through waves, and is bound to go through another (perhaps smaller) wave in the future. Each time, we push the limits of AI with the computational power that is available to us, until progress stalls and research and development slows. This day and age may be different, as we benefit from the confluence of increasingly large and efficient data stores, fast and cheap computing power, and the funding of some of the most profitable companies in the world. To understand how we ended up here, let's start at the beginning.
In this chapter, we will cover the following topics:
The beginnings of AI – 1950–1974
Rebirth – 1980–1987
The modern era takes hold – 1997–2005
Deep learning and the future – 2012–Present
AI has been a long-sought-after concept since the time of the earliest mathematicians and thinkers. The ancient Greeks developed myths of the automata, a form of robot that would complete tasks for the Gods that they considered menial, and throughout early history thinkers pondered what it meant to be human, and whether the notion of human intelligence could be replicated. While it's impossible to pinpoint an exact beginning for AI as a field of research, its development parallels the early advances of computer science. One could argue that computer science as a field developed out of this early desire to create self-thinking machines.
During the Second World War, British mathematician and code breaker Alan Turing developed some of the first computers, conceived with the vision of AI in mind. Turing wanted to create a machine that would mimic human comprehension, utilizing all available information to reason and make decisions. In 1950, he published Computing Machinery and Intelligence, which introduced what we now call the Turing test of AI. The Turing test, which is a benchmark by which to measure the aptitude of a machine to mimic human interaction, states that to pass the test, the machine must be able to sufficiently fool a discerning judge as to whether it is a human or not. This might sound simple, but think about how many complex problems would have to be solved to reach this point. The machine would have to be able to comprehend, store information on, and respond to natural language, all the while retaining knowledge and responding to situations with what we deem common sense.
Turing could not move far beyond his initial developments; in his day, utilizing a computer for research cost almost $200,000 per month and computers could not store commands. His research and devotion to the field, however, have earned him accolades. Today, he is widely considered the father of AI and the academic study of computer science.
It was in the summer of 1956, however, that the field was truly born. Just a few months before, researchers at the RAND Corporation had developed the Logic Theorist – considered the world's first AI program – which proved 38 theorems of the Principia Mathematica. Spurred on by this development and others, John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon hosted the now famous Dartmouth Summer Research Project on AI, coining the term Artificial Intelligence itself and providing the groundwork for the field. With funding from the Rockefeller Foundation, these four friends brought together some of the most preeminent researchers in AI over the course of the summer to brainstorm and effectively attempt to provide a roadmap for the field. They came from the institutions and companies that were on the leading edge of the computing revolution at the time: Harvard, Dartmouth, MIT, IBM, Bell Labs, and the RAND Corporation. Their topics of discussion were fairly forward-thinking for the time – they could have easily been those of an AI conference today – Artificial Neural Networks (ANNs), natural language processing (NLP), theories of computation, and general computing frameworks. The Summer Research Project was seminal in creating the field of AI as we know it today, and many of its discussion topics spurred the growth of AI research and development through the 1950s and 1960s.
After 1956, innovation kept up a rapid pace. Two years later, in 1958, a researcher at the Cornell Aeronautical Laboratory named Frank Rosenblatt invented one of the founding algorithms of AI, the Perceptron. The following diagram shows the Perceptron algorithm:
Perceptrons are simple, single-layer networks that work as linear classifiers. They consist of four main architectural aspects, which are as follows:
The input layer: The initial layer for reading in data
Weight and bias vectors: Weights learn appropriate values for the connections between neurons during training, while biases help shift the activation function to fit the desired output
A summation function: A simple summation of the weighted input
An activation function: A simple mapping of the summed weighted input to the output
As you can see, these networks rely on only basic mathematical operations. They failed to live up to the hype, however, and the vast disappointment they created significantly contributed to the first AI winter.
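To make this concrete, here is a minimal sketch of a perceptron in Python with NumPy. The toy data (the logical OR function), learning rate, and number of epochs are arbitrary illustrations rather than anything from the book:

import numpy as np

# Predict with a perceptron: weighted sum of inputs plus a bias,
# passed through a step activation to produce a class label.
def predict(x, weights, bias):
    summed = np.dot(x, weights) + bias   # summation function
    return 1 if summed > 0 else 0        # step activation function

# Toy training data: the logical OR function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

weights = np.zeros(2)
bias = 0.0
learning_rate = 0.1

# Perceptron learning rule: nudge weights and bias by the prediction error.
for epoch in range(10):
    for xi, target in zip(X, y):
        error = target - predict(xi, weights, bias)
        weights = weights + learning_rate * error * xi
        bias = bias + learning_rate * error

print([predict(xi, weights, bias) for xi in X])  # [0, 1, 1, 1] once it converges

Because the error is computed from the thresholded class label, the weights only change when a prediction is wrong.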
Another important development of this early era of research was Adaline. Adaline attempted to improve upon the perceptron by utilizing continuous predicted values to learn the coefficients, unlike the perceptron, which utilizes class labels. The following diagram shows the Adaline algorithm:
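As a rough sketch of that difference (again using toy data and an assumed learning rate, not the book's diagram), an Adaline-style update measures the error on the continuous weighted sum before any thresholding:

import numpy as np

# Adaline-style learning: the error is measured on the continuous output
# (the raw weighted sum), not on the thresholded class label.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)

weights = np.zeros(2)
bias = 0.0
learning_rate = 0.1

for epoch in range(100):
    net_input = X.dot(weights) + bias            # continuous predicted values
    errors = y - net_input                       # error before thresholding
    weights += learning_rate * X.T.dot(errors)   # gradient-descent step on squared error
    bias += learning_rate * errors.sum()

# The class label is only produced at prediction time, by thresholding.
print(np.where(X.dot(weights) + bias > 0.5, 1, 0))  # [0 1 1 1]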
These golden years also brought us early advances such as the STUDENT program, which solved high school algebra problems, and the ELIZA chatbot. By 1963, the advances in the field convinced the newly formed Advanced Research Projects Agency (DARPA) to begin funding AI research at MIT.
By the late 1960s, funding in the US and the UK began to dry up. In 1969, a book named Perceptrons by MIT's Marvin Minsky and Seymour Papert (https://archive.org/details/Perceptrons) proved that these networks could only mathematically compute extremely basic functions; most famously, a single-layer perceptron cannot compute the XOR function. In fact, they went so far as to suggest that Rosenblatt had greatly exaggerated his findings and the importance of the perceptron. The perceptron came to be seen as having limited usefulness to the field, and research into network structures effectively halted.
With both governments releasing reports that significantly criticized the usefulness of AI, the field was shuttled into what has become known as the AI winter. AI research continued throughout the late 1960s and 1970s, mostly under different terminology. The terms machine learning, knowledge-based system, and pattern recognition all come from this period, when researchers had to think up creative names for their work in order to receive funding. Around this time, however, a student at the University of Cambridge named Geoffrey Hinton began exploring ANNs and how we could utilize them to mimic the brain's memory functions. We'll talk a lot more about Hinton in the following sections and throughout this book, as he has become one of the most important figures in AI today.
The 1980s saw the birth of deep learning, the branch of AI that has become the focus of most modern AI research. With the revival of neural network research by John Hopfield and David Rumelhart, and several funding initiatives in Japan, the United States, and the United Kingdom, AI research was back on track.
In the early 1980s, while the United States was still reeling from the effects of the AI winter, Japan was funding the Fifth Generation Computer Systems project to advance AI research. In the US, DARPA once again ramped up funding for AI research, with business regaining interest in AI applications. IBM's T.J. Watson Research Center published a statistical approach to language translation (https://aclanthology.info/pdf/J/J90/J90-2002.pdf), which replaced traditional rule-based NLP models with probabilistic models, ushering in the modern era of NLP.
Hinton, the student from the University of Cambridge who persisted in his research, would make a name for himself by coining the term deep learning. He joined forces with Rumelhart to become one of the first researchers to introduce the backpropagation algorithm for training ANNs, which is the backbone of all modern deep learning. Hinton, like many others before him, was limited by computational power, and it would take another 26 years before the weight of his discovery was really felt.
By the late 1980s, the personal computing revolution and missed expectations threatened the field. Commercial development all but came to a halt, as mainframe computer manufacturers stopped producing hardware that could handle AI-oriented languages, and AI-oriented mainframe manufacturers went bankrupt. It had seemed as if all had come to a standstill.
AI further entered the public discourse in 1997 when IBM's Deep Blue system beat world champion chess grandmaster Garry Kasparov. Within a year, a former student of Geoffrey Hinton's, Yann LeCun, developed the Convolutional Neural Network at Bell Labs, which was enabled by the backpropagation algorithm and years of research into computer vision tasks. Hochreiter and Schmidhuber invented the first memory unit, the long short-term memory unit (LSTM), which is still used today for sequence modeling.
ANNs still had a way to go. Computing and storage limitations prevented these networks from scaling, and other methods such as support vector machines (SVMs) were developed as alternatives.
AI has made more strides in the past several years than in the 60-odd years since its birth. Its popularity has been further fueled by the increasingly public nature of its benefits – self-driving cars, personal assistants, and its ever-ubiquitous use in social media and advertising. For most of its history, AI was a field with little interaction with the average populace, but now it's come to the forefront of international discourse.
Today's age of AI has been the result of three trends:
The increasing amount of data and computing power available to AI researchers and practitioners
Ongoing research by Geoffrey Hinton and his lab at the University of Toronto into deep neural networks
Increasingly public applications of AI that have driven adoption and further acceptance into mainstream technology culture
Today, companies, governments, and other organizations have benefited from the big data revolution of the mid 2000s, which has brought us a plethora of data stores. At last, AI applications have the requisite data to train. Computational power is cheap and only getting cheaper.
On the research front, in 2012, Hinton and two of his students were finally able to show that deep neural networks were able to outperform all other methods in image recognition in the ImageNet Large Scale Visual Recognition Challenge. The modern era of AI was born.
Interestingly enough, Hinton's team's work on computer vision also introduced the idea of utilizing Graphics Processing Units (GPUs) to train deep networks. It also introduced dropout and ReLU, which have become cornerstones of deep learning. We'll discuss these in the coming chapters. Today, Hinton is the most cited AI researcher on the planet. He is a lead data scientist at Google Brain and has been tied to many major developments in AI in the modern era.
AI was further thrown into the public sphere when, in 2011, IBM Watson defeated the world Jeopardy champions, and in 2016 Google's AlphaGo defeated the world grand champion at one of the most challenging games known to man: Go.
Today, we are closer than ever to having machines that can pass the Turing test. Networks are able to generate ever more realistic imitations of speech, images, and writing. Reinforcement learning methods and Ian Goodfellow's GANs have made incredible strides. Recently, there has been emerging research that is working to demystify the inner workings of deep neural networks. As the field progresses, however, we should all be mindful of overpromising. For most of its history, companies have often overpromised regarding what AI can do, and in turn, we've seen consistent disappointment in its abilities. Focusing the abilities of AI on only certain applications, and continuing to view research in the field from a biological perspective, will only hurt its advancement going forward. In this book, however, we'll see that today's practical applications are directed and realistic, and that the field is making more strides toward true AI than ever before.
Since its beginnings in the 1940s and 1950s, AI has come a long way. Many of the technologies and ideas that we are utilizing today are directly based on these early discoveries. Over the course of the latter half of the 20th century, pioneers such as Geoffrey Hinton have pushed AI forward through peaks and busts. Today, we are on track to achieve sustained AI development for the foreseeable future.
The development of AI technology has been closely aligned with the development of new hardware and increasingly large data sources. As we'll see throughout this book, great AI applications are built with data constraints and hardware optimization in mind. The next chapter will introduce you to the fundamentals of machine learning and AI. We will also cover probability theory, linear algebra, and other elements that will lay the groundwork for the future chapters.
Artificial Intelligence (AI) is rooted in mathematics and statistics. When creating an Artificial Neural Network (ANN), we're conducting mathematical operations on data represented in linear space; it is, by nature, applied mathematics and statistics. Machine learning algorithms are nothing but function approximations; they try to find a mapping between an input and a correct corresponding output. We use algebraic methods to create algorithms that learn these mappings.
Almost all machine learning can be expressed as a fairly straightforward formula: bringing together a dataset and a model, along with a loss function and an optimization technique that are applicable to them. This section is intended as a review of the basic mathematical tools and techniques that are essential to understanding what's under the hood in AI.
In this chapter, we'll review linear algebra and probability, and then move on to the construction of basic and fundamental machine learning algorithms and systems, before touching upon optimization techniques that can be used for all of your methods going forward. While we will utilize mathematical notation and expressions in this chapter and the following chapters, we will focus on translating each of these concepts into Python code. In general, Python is easier to read and comprehend than mathematical expressions, and allows readers to get off the ground quicker.
We will be covering the following topics in this chapter:
Applied math basics
Probability theory
Constructing basic machine learning algorithms
In this chapter, we will be working in Python 3 with the scikit-learn scientific computing package. To install the package, run pip install scikit-learn in your terminal or command line.
In the following section, we'll introduce the fundamental types of linear algebra objects that are used throughout AI applications: scalars, vectors, matrices, and tensors.
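As a quick illustrative sketch in Python with NumPy (the values are arbitrary), here is how these four objects typically look in code:

import numpy as np

scalar = 3.0                                   # a single number
vector = np.array([1.0, 2.0, 3.0])             # a 1-D array of numbers
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])                # a 2-D array with rows and columns
tensor = np.zeros((2, 3, 4))                   # an n-dimensional array (here, 3-D)

print(vector.shape, matrix.shape, tensor.shape)  # (3,) (2, 2) (2, 3, 4)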
The basic operations of an ANN are based on matrix math. In this section, we'll be reviewing the basic operations that you need to know to understand the mechanics of ANNs.
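As a brief illustration with arbitrary values (not the book's examples), the scalar and element-wise operations covered in the following subsections look like this in NumPy:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

scalar_multiple = 2 * A       # scalar operation: every entry of A is doubled
elementwise_sum = A + B       # element-wise operation: entries added pairwise
elementwise_product = A * B   # element-wise (Hadamard) product
matrix_product = A @ B        # matrix multiplication: rows of A dot columns of B

print(matrix_product)         # [[19. 22.]
                              #  [43. 50.]]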
Probability, the mathematical method for modeling uncertain scenarios, underpins the algorithms that make AI intelligent, helping to tell us how our systems should reason. So, what is probability? We'll define it as follows:
Simply said, probability is the mathematical study of uncertainty. In this section, we'll cover the basics of probability space and probability distributions, as well as helpful tools for solving simple problems.
When probability is discussed, it's often referred to in terms of the probability of a certain event happening. Is it going to rain? Will the price of apples go up or down? In the context of machine learning, probabilities tell us the likelihood of events such as a comment being classified as positive vs. negative, or whether a fraudulent transaction will happen on a credit card. We measure probability by defining what we refer to as the probability space. A probability space describes how the probabilities of certain events are measured. Probability spaces are defined by three characteristics:
The sample space, which tells us the possible outcomes of a situation
A defined set of events, such as two fraudulent credit card transactions
The measure of probability of each of these events
While probability spaces are a subject worthy of studying in their own right, for our own understanding, we'll stick to this basic definition.
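As a small, hedged illustration of those three characteristics (a fair six-sided die, not an example from the book):

from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}                                 # possible outcomes
event = {2, 4, 6}                                                 # a defined event: the roll is even
measure = {outcome: Fraction(1, 6) for outcome in sample_space}   # probability of each outcome

probability_of_event = sum(measure[outcome] for outcome in event)
print(probability_of_event)  # 1/2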
In probability theory, the idea of independence is essential. Independence is a state in which one random variable does not change based on the value of another random variable. This is an important assumption in deep learning, as non-independent features can often intertwine and affect the predictive power of our models.
In statistical terms, a collection of data about an event is a sample, which is drawn from a theoretical superset of data called a population that represents everything that is known about a grouping or event. For instance, if we were to poll people on the street about whether they believe in Political View A or Political View B, we would be generating a random sample from the population, which would be the entire population of the city, state, or country where we are polling.
Now let's say we wanted to use this sample to predict the likelihood of a person having one of the two political views, but we mostly polled people who were at an event supporting Political View A. In this case, we may have a biased sample. When sampling, it is important to take a random sample to decrease bias, otherwise any statistical analysis or modeling that we do with the sample will be biased as well.
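The following sketch, using made-up numbers rather than real polling data, shows how a biased sample distorts an estimate while a random sample does not:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 55% hold Political View A (coded 1), 45% hold View B (coded 0).
population = rng.binomial(1, 0.55, size=100_000)

# A random sample keeps the estimate close to the population proportion.
random_sample = rng.choice(population, size=1_000, replace=False)

# A biased sample drawn mostly at a View A event overstates support for View A.
view_a = population[population == 1]
view_b = population[population == 0]
biased_sample = np.concatenate([rng.choice(view_a, 900), rng.choice(view_b, 100)])

print(population.mean(), random_sample.mean(), biased_sample.mean())
# roughly 0.55, roughly 0.55, and 0.9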
You've probably seen a chart such as the following one; it's showing us the values that appear in a dataset, and how many times those values appear. This is called a distribution of a variable. In this particular case, we're displaying the distribution with the help of a histogram, which shows the frequency of the variables:
In this section, we're interested in a particular type of distribution, called a probability distribution. When we talk about probability distributions, we're talking about the likelihood of a random variable taking on a certain value, and we create one by dividing the frequencies in the preceding histogram by the total number of samples in the distribution, in a process called normalization. There are two primary forms of probability distributions: probability mass functions for discrete variables and probability density functions for continuous variables. Cumulative distribution functions, which apply to any random variable, also exist.
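As a small sketch with made-up data, normalization is just a division of counts by the total:

import numpy as np

# Made-up observations of a discrete variable.
data = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4, 2])

values, counts = np.unique(data, return_counts=True)
probabilities = counts / counts.sum()     # normalization: frequencies become probabilities

for value, p in zip(values, probabilities):
    print(value, p)
print(probabilities.sum())                # 1.0 -- the hallmark of a probability distribution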
Probability mass functions (PMFs) are discrete distributions. The random variables of the distribution can take on a finite number of values:
PMFs look a bit different from our typical view of a distribution, and that is because of their finite nature.
Probability density functions (PDFs) are continuous distributions; the random variable can take on infinitely many values. For example, take the following image:
You've probably seen something like this before; it's a probability density function of a standard normal, or Gaussian distribution.
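As a brief sketch (not the book's figure), the standard normal density can be evaluated directly from its formula:

import numpy as np

def standard_normal_pdf(x):
    # Density of the standard normal distribution: exp(-x^2 / 2) / sqrt(2 * pi)
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

x = np.linspace(-3, 3, 7)          # a few points along the x axis
print(standard_normal_pdf(x))      # peaks at about 0.3989 when x = 0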
Conditional probability is the probability that one event happens, given that another event happens. It's one of the key tools for reasoning about uncertainty in probability theory. Let's say we are talking about your winning the lottery, given that it's a sunny day. Maybe you're feeling lucky! How would we write that in a probability statement? It would be the probability of your lottery win, A, given that it is sunny, B, written P(A|B).
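Formally, for two events A and B with P(B) greater than zero, conditional probability is defined as P(A|B) = P(A and B) / P(B); that is, the probability of A and B happening together, divided by the probability of B.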
Joint probability is the probability of two things happening simultaneously: what is the probability of you winning the lottery and it being a sunny day?
Joint probability is important in the AI space; it's what underlies the mechanics of generative models.
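As a tiny sketch with made-up counts for the sunny-day and lottery example:

# Made-up counts over 1,000 hypothetical days.
total_days = 1000
sunny_days = 600
sunny_and_win_days = 3                               # sunny AND the lottery was won

p_sunny = sunny_days / total_days                    # P(B)
p_sunny_and_win = sunny_and_win_days / total_days    # joint probability P(A and B)
p_win_given_sunny = p_sunny_and_win / p_sunny        # conditional probability P(A|B)

print(p_sunny_and_win, p_win_given_sunny)            # 0.003 0.005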
