Machine Learning with Core ML - Joshua Newnham - E-Book


Description

Core ML is a popular framework by Apple, with APIs designed to support various machine learning tasks. It allows you to take trained machine learning models and integrate them into your iOS apps.
Machine Learning with Core ML is a fun and practical guide that not only demystifies Core ML but also sheds light on machine learning. In this book, you'll walk through realistic and interesting examples of machine learning in the context of mobile platforms (specifically iOS). You'll learn to implement Core ML for visual-based applications using the principles of transfer learning and neural networks. Having got to grips with the basics, you'll discover a series of seven examples, each providing a new use case that uncovers how machine learning can be applied along with the related concepts.
By the end of the book, you will have the skills required to put machine learning to work in your own applications, using the Core ML APIs.

Page count: 403

Publication year: 2018




Machine Learning with Core ML


An iOS developer's guide to implementing machine learning in mobile apps


Joshua Newnham


BIRMINGHAM - MUMBAI

Machine Learning with Core ML

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Tushar Gupta
Content Development Editor: Karan Thakkar
Technical Editor: Sagar Sawant
Copy Editor: Safis Editing
Project Coordinator: Nidhi Joshi
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Tania Dutta
Production Coordinator: Arvindkumar Gupta

First published: June 2018

Production reference: 1260618

Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.

ISBN 978-1-78883-829-0

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Joshua Newnham is a technology lead at a global design firm, Method, focusing on the intersection of design and artificial intelligence (AI), specifically in the areas of computational design and human computer interaction.

Prior to this, he was a technical director at Masters of Pie, a virtual reality (VR) and augmented reality (AR) studio focused on building collaborative tools for engineers and creatives.

First and foremost, I would like to give thanks to my incredible wife and son for their extraordinary support, encouragement, and inspiration throughout this book and life in general. Thank you both. Writing a book is no small undertaking, and without the team at Packt continuously refining the work, you would likely be reading 400+ pages of late-night ramblings. So, a big thanks to the team for helping me make this happen.


About the reviewer

Shilpa Karkeraa is a leading solution expert and the founder and CEO of Myraa Technologies, an artificial intelligence solutions company. From independent entrepreneur to hands-on developer, she has been an innovator with cutting-edge technologies. Prior to Myraa, she led the data engineering group of a Bay Area start-up, worked at a top corporate financial services firm in India, and was an architect at a Singaporean B2C company. She is an active global technology speaker. She hopes to commercialize research and innovation to touch human lives effectively!

Reviewing this book was a real page-turner of an experience, and quite the keyboard workout. Thanks to Packt Publishing for an amazing experience with the book. Cheers to the author for his writing streak and his eye for the beauty of design and detail, to my dear friend Tanvi Bhatt, and to my technologists at Myraa Technologies for the hands-on implementations.


Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Machine Learning with Core ML

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Introduction to Machine Learning

What is machine learning?

A brief tour of ML algorithms

Netflix – making recommendations 

Shadow draw – real-time user guidance for freehand drawing

Shutterstock – image search based on composition

iOS keyboard prediction – next letter prediction

A typical ML workflow 

Summary

Introduction to Apple Core ML

Difference between training and inference

Inference on the edge

A brief introduction to Core ML

Workflow 

Learning algorithms 

Auto insurance in Sweden

Supported learning algorithms

Considerations 

Summary

Recognizing Objects in the World

Understanding images

Recognizing objects in the world

Capturing data 

Preprocessing the data

Performing inference 

Summary 

Emotion Detection with CNNs

Facial expressions

Input data and preprocessing 

Bringing it all together

Summary 

Locating Objects in the World

Object localization and object detection 

Converting Keras Tiny YOLO to Core ML

Making it easier to find photos

Optimizing with batches

Summary

Creating Art with Style Transfer

Transferring style from one image to another 

A faster way to transfer style

Converting a Keras model to Core ML

Building custom layers in Swift

Accelerating our layers 

Taking advantage of the GPU 

Reducing your model's weight

Summary

Assisted Drawing with CNNs

Towards intelligent interfaces 

Drawing

Recognizing the user's sketch

Reviewing the training data and model

Classifying sketches 

Sorting by visual similarity

Summary 

Assisted Drawing with RNNs

Assisted drawing 

Recurrent Neural Networks for drawing classification

Input data and preprocessing 

Bringing it all together

Summary 

Object Segmentation Using CNNs

Classifying pixels 

Data to drive the desired effect – action shots

Building the photo effects application

Working with probabilistic results

Improving the model

Designing in constraints 

Embedding heuristics

Post-processing and ensemble techniques

Human assistance

Summary

An Introduction to Create ML

A typical workflow 

Preparing the data

Creating and training a model

Model parameters

Model metadata

Alternative workflow (graphical) 

Closing thoughts

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

We are living on the verge of a new era of computing, an era where computers are becoming more of a companion than a tool. The devices we carry in our pockets will soon understand our world, and us, a lot better, and this will have a profound impact on how we interact with and use them.

But right now, a lot of these exciting advancements are stuck in the labs of researchers rather than in the hands of the designers and developers who could make them usable and accessible to users. This is not because the details are locked away; on the contrary, in most cases they are freely available.

This gap is somewhat due to our contentment with sticking to what we know, having the user do all the work, making them tap on the buttons. If nothing else, I hope this book makes you curious about what is out there and how it can be used to create new experiences, or improve existing ones. 

Within the pages of this book, you will find a series of examples to help you build an understanding of how deep neural networks work and how they can be applied.

This book focuses on a set of models for a better understanding of images and photos, specifically looking at how they can be adapted and applied on the iOS platform. This narrow focus of image-based models and the iOS platform is intentional; I find that the visual nature of images makes the concepts easier to, well, visualize, and the iPhone provides the perfect candidate and environment for experimentation.

So, as you go through this book, I encourage you to start thinking about new ways of how these models can be used and what new experiences you could create. With that being said, let's get started! 

Who this book is for

This book will appeal to three broad groups of people. The first is intermediate iOS developers who are interested in learning and applying machine learning (ML); some exposure to ML concepts may be beneficial but is not essential, as this book covers the intuition behind the concepts and models used throughout it.

The second group is those who have experience in ML but not in iOS development and are looking for a resource to help them get to grips with Core ML; for this group, it is recommended to complement this book with one that covers the fundamentals of iOS development.

The last group are experienced iOS developers and ML practitioners who are curious to see how various models have been applied in the context of the iOS platform. 

What this book covers

Chapter 1, Introduction to Machine Learning, provides a brief introduction to ML, including some explanation of the core concepts, the types of problems, algorithms, and the general workflow of creating and using ML models. The chapter concludes by exploring some examples where ML is being applied.

Chapter 2, Introduction to Apple Core ML, introduces Core ML, discussing what it is, what it is not, and the general workflow for using it.

Chapter 3, Recognizing Objects in the World, walks through building a Core ML application from start to finish. By the end of the chapter, we will have been through the whole process of obtaining a model, importing it into the project, and making use of it.

Chapter 4, Emotion Detection with CNNs, explores the possibilities of computers understanding us better, specifically our mood. We start by building our intuition of how ML can learn to infer mood, and then put this into practice by building an application that does just that. We also use this as an opportunity to introduce the Vision framework and see how it complements Core ML.

Chapter 5, Locating Objects in the World, goes beyond recognizing a single object to recognizing and locating multiple objects within a single image through object detection. After building our understanding of how it works, we move on to applying it to a visual search application that filters not only by object but also by the composition of objects. In this chapter, we'll also get an opportunity to extend Core ML by implementing custom layers.

Chapter 6, Creating Art with Style Transfer, uncovers the secrets behind the popular photo effects application, Prisma. We start by discussing how a model can be taught to differentiate between the style and content of an image, and then go on to build a version of  Prisma that applies a style from one image to another. We wrap up this chapter by looking at ways to optimize the model. 

Chapter 7, Assisted Drawing with CNNs, walks through building an application that can recognize a user's sketch using the same concepts introduced in previous chapters. Once what the user is trying to sketch has been recognized, we look at how we can find similar substitutes using the feature vectors from a CNN.

Chapter 8, Assisted Drawing with RNNs, builds on the previous chapter and explores replacing the convolutional neural network (CNN) with a recurrent neural network (RNN) for sketch classification, thus introducing RNNs and showing how they can be applied to images. Along with a discussion on learning sequences, we will also delve into the details of how to download and compile Core ML models remotely.

Chapter 9, Object Segmentation Using CNNs, walks through building an ActionShot photography application. And in doing so, we introduce another model and accompanying concepts, and get some hands-on experience of preparing and processing data.

Chapter 10, An Introduction to Create ML, is the last chapter. We introduce Create ML, a framework for creating and training Core ML models within Xcode using Swift. By the end of this chapter, you will know how to quickly create, train, and deploy a custom model.

To get the most out of this book

To be able to follow through the examples in this book, you will need the following software:

macOS 10.13 or higher 

Xcode 9.2 or higher 

iOS 11.0 or higher (device and simulator) 

For the examples that are dependent on Core ML 2, you will need the following software:

macOS 10.14 

Xcode 10.0 beta 

iOS 12 (device and simulator)

It's recommended that you use https://notebooks.azure.com (or some other Jupyter notebook service provider) to follow the examples using the Core ML Tools Python package, but those wanting to run locally or train their model will need the following software:

Python 2.7 

Jupyter Notebooks 1.0

TensorFlow 1.0.0 or higher

NumPy 1.12.1 or higher

Core ML Tools 0.9 (and 2.0 for Core ML 2 examples)  

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at

www.packtpub.com

.

Select the

SUPPORT

tab.

Click on

Code Downloads & Errata

.

Enter the name of the book in the

Search

box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-with-Core-ML. In case there's an update to the code, it will be updated on the existing GitHub repository.

 We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/MachineLearningwithCoreML_ColorImages.pdf.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Introduction to Machine Learning

Let's begin our journey by peering into the future and envisioning how we'll see ourselves interacting with computers. Unlike today's computers, where we are required to continuously type in our emails and passwords to access information, the computers of the future will easily be able to recognize us by our face, voice, or activity. Unlike today's computers, which require step-by-step instructions to perform an action, the computer of the future will anticipate our intent and provide a natural way for us to converse with it, similar to how we engage with other people, and then proceed to help us achieve our goal. Our computer will not only assist us but also be our friend, our doctor, and so on. It could deliver our groceries to the door and be our interface with an increasingly complex and information-rich physical world.

What is exciting about this vision is that it is no longer in the realm of science fiction but an emergent reality. One of the major drivers of this is the progress and adoption of machine learning (ML) techniques, a discipline that gives computers the perceptual power of humans, thus giving them the ability to see, hear, and make sense of the world—physical and digital.

But despite all the great progress over the last 3-4 years, most of the ideas and potential remain locked away in research projects and papers rather than being in the hands of the user. So it is the aim of this book to help developers understand these concepts better and to enable you to put them into practice, so that we can arrive at this future: a future where computers augment us, rather than enslave us through their inability to understand our world.

Because Core ML is constrained to performing inference only, this book differs from other ML books in that its core focus is on the application of ML. Specifically, we'll focus on computer vision applications rather than on the details of ML itself. But in order to better enable you to take full advantage of ML, we will spend some time introducing the associated concepts with each example.

And before jumping into the hands-on examples, let's start from the beginning and build an appreciation for what ML is and how it can be applied. In this chapter we will:

Start by introducing ML. We'll learn how it differs from classical programming and why you might choose it.

Look at some examples of how ML is being used today, along with the type of data and ML algorithm being used.

Finally, present the typical workflow for ML projects.

Let's kick off by first discussing what ML is and why everyone is talking about it. 

What is machine learning?

ML is a subfield of Artificial Intelligence (AI), a topic of computer science born in the 1950s with the goal of trying to get computers to think or provide a level of automated intelligence similar to that of us humans. 

Early success in AI was achieved by using an extensive set of defined rules, known as symbolic AI, allowing expert decision making to be mimicked by computers. This approach worked well for many domains but had a big shortfall: in order to create an expert system, you needed an expert. Not only that, but their expertise needed to be digitized somehow, which normally required explicit programming.

ML provides an alternative; instead of having to handcraft rules, it learns from examples and experience. It also differs from classical programming in that it is probabilistic as opposed to being discrete. That is, it is able to handle fuzziness or uncertainty much better than its counterpart, which will likely fail when given an ambiguous input that wasn't explicitly identified and handled. 

I am going to borrow an example used by Google engineer Josh Gordon in an introductory video on ML to better highlight the differences and value of ML.

Suppose you were given the task of classifying apples and oranges. Let's first approach this using what we will call classical programming:

Our input is an array of pixels for each image, and for each input, we will need to explicitly define some rules that will be able to distinguish an apple from an orange. Using the preceding examples, you can solve this by simply counting the number of orange and green pixels. Those with a higher ratio of green pixels would be classified as an apple, while those with a higher ratio of orange pixels would be classified as an orange. This works well with these examples but breaks if our input becomes more complex:
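To make this rule concrete, here is a minimal sketch of the counting approach, assuming (purely for illustration) that each pixel has already been bucketed into a coarse color category rather than read from raw UIImage pixel data:

enum PixelColor { case green, orange, other }

func classify(pixels: [PixelColor]) -> String {
    // Count the coarse colors and let the dominant one decide the class.
    let greens = pixels.filter { $0 == .green }.count
    let oranges = pixels.filter { $0 == .orange }.count
    if greens > oranges { return "apple" }
    if oranges > greens { return "orange" }
    return "unknown"
}

// A mostly green image is classified as an apple.
print(classify(pixels: [.green, .green, .green, .orange])) // "apple"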

The introduction of new images means our simple color-counting function can no longer sufficiently differentiate our apples from our oranges, or even classify apples. We are required to reimplement the function to handle the new nuances introduced. As a result, our function grows in complexity, becomes more tightly coupled to the inputs, and is less able to generalize to other inputs. Our functions might resemble something like the following:

func countColors(_ image: UIImage) -> [(color: UIColor, count: Int)] {
    // lots of code
}

func detectEdges(_ image: UIImage) -> [(x1: Int, y1: Int, x2: Int, y2: Int)] {
    // lots of code
}

func analyseTexture(_ image: UIImage) -> [String] {
    // lots of code
}

func fitBoundingBox(_ image: UIImage) -> [(x: Int, y: Int, w: Int, h: Int)] {
    // lots of code
}

These functions can be considered our model, which models the relationship of the inputs with respect to their labels (apple or orange), as illustrated in the following diagram:

The alternative, and the approach we're interested in, is having this model created automatically from examples; this, in essence, is what ML is all about. It provides us with an effective tool for modeling complex tasks that would otherwise be nearly impossible to define with rules.

The creation phase of an ML model is called training, and it is determined by the type of ML algorithm selected and the data being fed in. Once the model is trained, that is, once it has learned, we can use it to make inferences from the data, as illustrated in the following diagram:

The example we have presented here, classifying oranges and apples, uses a specific type of ML algorithm called a classifier, or, more specifically, a multi-class classifier. The model was trained through supervision; that is, we fed in examples of inputs with their associated labels (or classes). It is useful to understand the types of ML algorithms that exist along with the types of training, which is the topic of the next section.

A brief tour of ML algorithms

In this section, we will look at some examples of how ML is used, and with each example, we'll speculate about the type of data, learning style, and ML algorithm used. I hope that by the end of this section, you will be inspired by what is possible with ML and gain some appreciation for the types of data, algorithms, and learning styles that exist. 

In this section, we will be presenting some real-life examples in the context of introducing types of data, algorithms, and learning styles. It is not our intention to show accurate data representations or implementations for each example, but rather to use the examples as a way of making the ideas more tangible.

Shadow draw – real-time user guidance for freehand drawing

To highlight the synergies between man and machine, AI is sometimes referred to as Augmented Intelligence (AI), putting the emphasis on systems that augment our abilities rather than replace us altogether.

One area that is becoming increasingly popular—and of particular interest to myself—is assisted creation systems, an area that sits at the intersection of the fields of human-computer interaction (HCI) and ML. These are systems created to assist in some creative tasks such as drawing, writing, video, and music. 

The example we will discuss in this section is shadow draw, a research project undertaken at Microsoft in 2011 by Y.J. Lee, L. Zitnick, and M. Cohen. Shadow draw is a system that assists the user in drawing by matching and aligning a reference image from an existing dataset of objects and then lightly rendering shadows in the background to be used as guidelines for the user. For example, if the user is predicted to be drawing a bicycle, then the system would render guidelines under the user's pen to assist them in drawing the object, as illustrated in this diagram:

As we did before, let's walk through how we might approach this, focusing specifically on classifying the sketch; that is, we'll predict what object the user is drawing. This will give us the opportunity to see new types of data, algorithms, and applications of ML.

The dataset used in this project consisted of 30,000 natural images collected from the internet via 40 category queries such as face, car, and bicycle, with each category stored in its own directory; the following diagram shows some examples of these images:

After obtaining the raw data, the next step, and typical of any ML project, is to perform data preprocessing and feature engineering. The following diagram shows the preprocessing steps, which consist of:

Rescaling each image

Desaturating (turning black and white)

Edge detection

Our next step is to abstract our data into something more meaningful and useful for our ML algorithm to work with; this is known as feature engineering, and is a critical step in a typical ML workflow. 

One approach, and the approach we will describe, is creating something known as a visual bag of words. This is essentially a histogram of features (visual words) used to describe each image, and collectively to describe each category. What constitutes a feature is dependent on the data and ML algorithm; for example, we can extract and count the colors of each image, where the colors become our features and collectively describe our image, as shown in the following diagram: 

But because we are dealing with sketches, we want something fairly coarse: something that can capture the general stroke directions that encapsulate the overall structure of the image. For example, if we were to describe a square and a circle, the square would consist of horizontal and vertical strokes, while the circle would consist mostly of diagonal strokes. To extract these features, we can use a computer vision algorithm called the histogram of oriented gradients (HOG); after processing an image, you are returned a histogram of gradient orientations for localized portions of the image. Exactly what we want! To help illustrate the concept, this process is summarized for a single image here:

 
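To make the idea more concrete, the following is a simplified sketch of the core of HOG: a single histogram of gradient orientations, weighted by gradient magnitude. The full algorithm also divides the image into cells and normalizes over blocks, which we omit here, and the image is assumed to be a grayscale, row-major array of brightness values:

import Foundation

// Build a histogram of gradient orientations, weighted by gradient strength.
// This is the core idea behind HOG, minus cell/block normalization.
func orientationHistogram(image: [[Double]], bins: Int = 8) -> [Double] {
    var histogram = [Double](repeating: 0, count: bins)
    let height = image.count
    let width = image[0].count
    for y in 1..<height - 1 {
        for x in 1..<width - 1 {
            // Finite-difference gradients in x and y.
            let gx = image[y][x + 1] - image[y][x - 1]
            let gy = image[y + 1][x] - image[y - 1][x]
            let magnitude = (gx * gx + gy * gy).squareRoot()
            // Map the angle (-pi...pi) onto a bin index and add the weight.
            let normalized = (atan2(gy, gx) + .pi) / (2 * .pi) // 0...1
            let bin = min(bins - 1, Int(normalized * Double(bins)))
            histogram[bin] += magnitude
        }
    }
    return histogram
}

A square would produce strong peaks in the horizontal and vertical bins, while a circle would spread its weight across the diagonal bins, which is exactly the kind of coarse structural signature we are after.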

After processing all the images in our dataset, our next step is to find a histogram (or histograms) that can be used to identify each category; we can use an unsupervised learning clustering technique called K-means, where each category histogram is the centroid for that cluster. The following diagram describes this process; we first extract features for each image and then cluster these using K-means, where the distance is calculated using the histogram of gradients. Once our images have been clustered into their groups, we extract the center (mean) histogram of each of these groups to act as our category descriptor: 
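The following is a minimal sketch of the K-means loop just described: assign each feature histogram to its nearest centroid, then move each centroid to the mean of its cluster, and repeat. For simplicity, it uses plain Euclidean distance and naive initialization; the actual project computed distances over the histograms of gradients:

// Euclidean distance between two feature histograms.
func euclidean(_ a: [Double], _ b: [Double]) -> Double {
    var sum = 0.0
    for (x, y) in zip(a, b) { sum += (x - y) * (x - y) }
    return sum.squareRoot()
}

func kMeans(histograms: [[Double]], k: Int, iterations: Int = 20) -> [[Double]] {
    // Naive initialization: seed the centroids with the first k histograms.
    var centroids = Array(histograms.prefix(k))
    for _ in 0..<iterations {
        // Assignment step: group each histogram with its nearest centroid.
        var clusters = [[[Double]]](repeating: [], count: k)
        for h in histograms {
            var nearest = 0
            for c in 1..<k where euclidean(h, centroids[c]) < euclidean(h, centroids[nearest]) {
                nearest = c
            }
            clusters[nearest].append(h)
        }
        // Update step: move each centroid to the mean of its cluster.
        for i in 0..<k where !clusters[i].isEmpty {
            var mean = [Double](repeating: 0, count: centroids[i].count)
            for h in clusters[i] {
                for d in mean.indices { mean[d] += h[d] }
            }
            centroids[i] = mean.map { $0 / Double(clusters[i].count) }
        }
    }
    return centroids // one "category descriptor" histogram per cluster
}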

Once we have obtained a histogram for each category (a codebook), we can train a classifier using each image's extracted features (visual words) and the associated category (label). One popular and effective classifier is the support vector machine (SVM). What an SVM tries to find is a hyperplane that best separates the categories; here, best refers to the plane with the largest margin between the members of each category. The term hyper is used because the vectors are transformed into a higher-dimensional space in which the categories can be separated with a linear plane (a hyperplane being the generalization of a plane to many dimensions). The following diagram shows how this may look for two categories in a two-dimensional space:
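Whatever happens during training, inference with a trained linear SVM is cheap: the sign of w · x + b tells us which side of the separating hyperplane a feature vector falls on. Here is a sketch of that inference step, with hypothetical weight values standing in for the result of training:

// Hypothetical trained weights and bias for a two-category classifier.
let w: [Double] = [0.8, -0.5, 0.3]
let b = -0.1

func predict(_ x: [Double]) -> String {
    // The sign of w . x + b gives the side of the hyperplane.
    var score = b
    for (wi, xi) in zip(w, x) { score += wi * xi }
    return score >= 0 ? "bicycle" : "not a bicycle"
}

For more than two categories, a common approach is to train one such classifier per category (one-vs-rest) and pick the category whose classifier returns the largest score.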

With our model now trained, we can perform real-time classification of the image as the user is drawing, thus allowing us to assist the user by providing them with guidelines for the object they want to draw (or, at the very least, suggesting the object we predict them to be drawing). This is perfectly suited to touch interfaces such as your iPhone or iPad! And it assists not just in drawing applications, but any time sketch input is required from the user, such as in image-based searching or note taking.

In this example, we showed how feature engineering and unsupervised learning are used to augment data, making it easier for our model to sufficiently perform classification using the supervised learning algorithm SVM. Prior to deep neural networks, feature engineering was a critical step in ML and sometimes a limiting factor for these reasons:

It required special skills and sometimes domain expertise

It was at the mercy of a human being able to find and extract meaningful features

It required that the features extracted would generalize across the population, that is, be expressive enough to be applied to all examples

In the next example, we introduce a type of neural network called a convolutional neural network (CNN or ConvNet), which takes care of a lot of the feature engineering itself. 

The paper describing the actual project and approach can be found here: http://vision.cs.utexas.edu/projects/shadowdraw/shadowdraw.html.

Shutterstock – image search based on composition

Over the past 10 years, we have seen explosive growth in the visual content created and consumed on the web, but before the success of CNNs, images were found by performing simple keyword searches on manually assigned tags. All this changed around 2012, when A. Krizhevsky, I. Sutskever, and G. E. Hinton published their paper ImageNet Classification with Deep Convolutional Neural Networks. The paper described the architecture they used to win the 2012 ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), a competition that is something like the Olympics of computer vision, where teams compete across a range of CV tasks such as classification, detection, and object localization. That was the first year a CNN gained the top position, with a test error rate of 15.4% (the next best entry achieved a test error rate of 26.2%). Ever since then, CNNs have become the de facto approach for computer vision tasks, including becoming the new approach to performing visual search. They have been adopted by the likes of Google, Facebook, and Pinterest, making it easier than ever to find the right image.

Recently (October 2017), Shutterstock announced one of the more novel uses of CNNs, introducing the ability for their users to search not only for multiple items in an image, but also for the composition of those items. The following screenshot shows an example search for a kitten and a computer, with the kitten on the left of the computer:

So what are CNNs? As previously mentioned, CNNs are a type of neural network well suited to visual content due to their ability to retain spatial information. They are somewhat similar to the previous example, where we explicitly defined a filter to extract localized features from an image. A CNN performs a similar operation, but unlike in our previous example, the filters are not explicitly defined. They are learned through training, and they are not confined to a single layer but are built up across many layers. Each layer builds upon the previous one, and each becomes increasingly abstract (abstract here meaning a higher-order representation, that is, from pixels to shapes) in what it represents.

To help illustrate this, the following diagram visualizes how a network might build up its understanding of a cat. The first layer's filters extract simple features, such as edges and corners. The next layer builds on top of these with its own filters, resulting in higher-level concepts being extracted, such as shapes or parts of the cat. These high-level concepts are then combined for classification purposes:
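Underneath, each filter is just a small grid of weights slid across the image, producing a weighted sum at every position. The following sketch applies a hand-written 3 x 3 vertical-edge filter; in a CNN, the filter weights would be learned during training rather than written by hand:

// Slide a small filter over a grayscale image, computing a weighted sum at
// each position (a single convolution, with no padding or stride).
func convolve(image: [[Double]], filter: [[Double]]) -> [[Double]] {
    let fh = filter.count
    let fw = filter[0].count
    let oh = image.count - fh + 1
    let ow = image[0].count - fw + 1
    var output = [[Double]](repeating: [Double](repeating: 0, count: ow), count: oh)
    for y in 0..<oh {
        for x in 0..<ow {
            var sum = 0.0
            for fy in 0..<fh {
                for fx in 0..<fw {
                    sum += image[y + fy][x + fx] * filter[fy][fx]
                }
            }
            output[y][x] = sum
        }
    }
    return output
}

// A hand-written vertical-edge detector; strong responses mark vertical edges.
let verticalEdge: [[Double]] = [[-1, 0, 1],
                                [-1, 0, 1],
                                [-1, 0, 1]]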

This ability to get a deeper understanding of the data and reduce the dependency on manual feature engineering has made deep neural networks one of the most popular ML algorithms over the past few years.

To train the model, we feed the network examples using images as inputs and labels as the expected outputs. Given enough examples, the model will build an internal representation for each label, which can be sufficiently used for classification; this, of course, is a type of supervised learning. 

Our last task is to find the location of the item or items; to achieve this, we can inspect the weights of the network to find out which pixels activated a particular class, and then create a bounding box around the inputs with the largest weights. 

We have now identified the items and their locations within the image. With this information, we can preprocess our repository of images and cache it as metadata to make it accessible via search queries. We will revisit this idea later in the book when you will get a chance to implement a version of this to assist the user in finding images in their photo album. 

In this section, we saw how ML can be used to improve user experience and briefly introduced the intuition behind CNNs, a neural network well suited for visual contexts, where retaining proximity of features and building higher levels of abstraction is important. In the next section, we will continue our exploration of ML applications by introducing another example that improves the user experience and a new type of neural network that is well suited for sequential data such as text. 

iOS keyboard prediction – next letter prediction

Quoting usability expert Jared Spool: "Good design, when done well, should be invisible." This holds true for ML as well. The application of ML need not be apparent to the user, and sometimes (more often than not) more subtle uses of ML can prove just as impactful.

A good example of this is an iOS feature called dynamic target resizing; it is at work every time you type on the iOS keyboard, actively trying to predict the word you're typing:

Using this prediction, the iOS keyboard dynamically changes the touch area of the key (illustrated here by the red circles) for the most likely next character, based on what has already been typed before it.

For example, in the preceding diagram, the user has entered "Hell"; now it would be reasonable to assume that the most likely next character the user wants to tap is "o". This is intuitive given our knowledge of the English language, but how do we teach a machine to know this?

This is where recurrent neural networks (RNNs) come in; an RNN is a type of neural network that persists state over time. You can think of this persisted state as a form of memory, making RNNs suitable for sequential data such as text (any data where the inputs and outputs are dependent on each other). This state is created by using a feedback loop from the output of the cell, as shown in the following diagram:

The preceding diagram shows a single RNN cell. If we unroll this over time, we would get something that looks like the following:

Using hello as our example, the preceding diagram shows an RNN unrolled over five time steps; at each time step, the RNN predicts the next likely character. This prediction is determined by its internal representation of the language (from training) and the subsequent inputs. This internal representation is built by training the network on samples of text where the expected output is the input shifted forward by one time step (as illustrated earlier). Once trained, inference follows a similar path, except that we feed the predicted character from the output back into the network to get the next output (thus generating a sequence, that is, words).
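The feedback loop boils down to a single recurrence: the new hidden state is a function of the current input and the previous state. Here is a deliberately tiny sketch with a single scalar state and hypothetical weight values; real RNN cells use weight matrices and vector states:

import Foundation

// One step of a minimal RNN cell: the new state depends on both the current
// input and the previous state (the feedback loop in the diagram).
struct TinyRNNCell {
    var wInput = 0.5  // hypothetical trained weight for the input
    var wState = 0.9  // hypothetical trained weight for the previous state
    var bias = 0.0

    func step(input: Double, previousState: Double) -> Double {
        return tanh(wInput * input + wState * previousState + bias)
    }
}

// Unrolling over time: feed each input in turn, carrying the state forward.
let cell = TinyRNNCell()
var state = 0.0
for input in [1.0, 0.0, 1.0, 1.0] {
    state = cell.step(input: input, previousState: state)
}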

Neural networks and most ML algorithms require their inputs to be numbers, so we need to convert our characters to numbers, and back again. When dealing with text (characters and words), there are generally two approaches: one-hot encoding and embeddings. Let's quickly cover each of these to get some intuition of how to handle text.

Text (characters and words) is considered categorical, meaning that we cannot use a single number to represent it, because there is no inherent relationship between the text and the value; that is, assigning "the" the value 10 and "cat" the value 20 would imply that "cat" has a greater value than "the". Instead, we need to encode them in a way that introduces no such bias. One solution is one-hot encoding, which uses a vector of the size of your vocabulary (the number of characters, in our case), with the index of the specific character set to 1 and the rest set to 0. The following diagram illustrates the encoding process for the corpus "hello":

In the preceding diagram, we show some of the steps required when encoding characters: we start off by splitting the corpus into individual characters (called tokens; the process itself is called tokenization), then we create a set that acts as our vocabulary, and finally we encode the text, with each character being assigned a vector.
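Here is a minimal sketch of those steps for the corpus "hello": tokenize into characters, build a vocabulary from the set of tokens, and encode each character as a one-hot vector. The decode function shows the reverse lookup used on the output side, picking the index with the greatest value:

// Tokenize a corpus into characters and build a (sorted) vocabulary.
let corpus = "hello"
let vocabulary = Array(Set(corpus)).sorted() // e.g. ["e", "h", "l", "o"]

// Encode a character as a one-hot vector over the vocabulary.
func oneHot(_ character: Character) -> [Double] {
    var vector = [Double](repeating: 0, count: vocabulary.count)
    if let index = vocabulary.firstIndex(of: character) {
        vector[index] = 1
    }
    return vector
}

// Decode a network output by taking the index with the greatest value.
func decode(_ output: [Double]) -> Character {
    let best = output.indices.max { output[$0] < output[$1] }!
    return vocabulary[best]
}

let encoded = corpus.map(oneHot) // one vector per character in "hello"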

Here, we'll only present some of the steps required for preparing text before passing it to our ML algorithm.

Once our inputs are encoded, we can feed them into our network. Outputs are also represented in this format, with the most likely character being the index with the greatest value. For example, if 'e' is predicted, then the output may resemble something like [0.95, 0.2, 0.2, 0.1].

But there are two problems with one-hot encoding. The first is that for a large vocabulary, we end up with a very sparse data structure. This is not only an inefficient use of memory, but also requires additional calculations for training and inference. The second problem, which is more obvious when operating on words, is that we lose any contextual meaning after they have been encoded. For example, if we were to encode the words dog and dogs, we would lose any relationship between these words after encoding. 

An alternative, and something that addresses these two problems, is using an embedding. These are generally weights from a trained network that use a dense vector representation for each token, one that preserves some contextual meaning. This book focuses on computer vision tasks, so we won't be going into the details here. Just remember that we need to encode our text (characters) into something our ML algorithm will accept. 

We train the model using weak supervision, similar to supervised learning, but inferring the label without it having been explicitly labelled. Once trained, we can predict the next character using multi-class classification, as described earlier. 

Over the past couple of years, we have seen the evolution of assistive writing; one example is Google's Smart Reply, which provides an end-to-end method for automatically generating short email responses. Exciting times!

This concludes our brief tour of introducing types of ML problems along with the associated data types, algorithms, and learning style. We have only scratched the surface of each, but as you make your way through this book, you will be introduced to more data types, algorithms, and learning styles. 

In the next section, we will take a step back and review the overall workflow for training and inference before wrapping up this chapter. 

A typical ML workflow 

If we analyze each of the examples presented so far, we see that each follows a similar pattern. First is the definition of the problem or desired functionality. Once we have established what we want to do, we then identify the available data and/or what data is required. With the data in hand, our next step is to create our ML model and prepare the data for training.

After training comes something we haven't discussed here: validating our ML model, that is, testing that it satisfactorily achieves what we require of it, such as being able to make accurate predictions. Once we have a trained and validated model, we can make use of it by feeding in real data, that is, data outside our training set. In the following diagram, we see these steps summarized for training and inference:

We will spend most of our time using trained models in this book, but understanding how we arrive at these models will prove helpful as you start creating your own intelligent apps. This will also help you identify opportunities to apply ML to existing data, or inspire you to seek out new data sources.