TensorFlow Deep Learning Projects E-Book

Alexey Grigorev

Description

Leverage the power of TensorFlow to design deep learning systems for a variety of real-world scenarios

Key Features

  • Build efficient deep learning pipelines using the popular TensorFlow framework
  • Train neural networks such as ConvNets, generative models, and LSTMs
  • Includes projects related to computer vision, stock prediction, chatbots, and more

Book Description

TensorFlow is one of the most popular frameworks used for machine learning and, more recently, deep learning. It provides a fast and efficient framework for training different kinds of deep learning models, with very high accuracy. This book is your guide to master deep learning with TensorFlow with the help of 10 real-world projects.

TensorFlow Deep Learning Projects starts with setting up the right TensorFlow environment for deep learning. Learn to train different types of deep learning models using TensorFlow, including Convolutional Neural Networks, Recurrent Neural Networks, LSTMs, and Generative Adversarial Networks. While doing so, you will build end-to-end deep learning solutions to tackle different real-world problems in image processing, recommendation systems, stock prediction, and building chatbots, to name a few. You will also develop systems that perform machine translation, and use reinforcement learning techniques to play games.

By the end of this book, you will have mastered all the concepts of deep learning and their implementation with TensorFlow, and will be able to build and train your own deep learning models with TensorFlow confidently.

What you will learn

  • Set up the TensorFlow environment for deep learning
  • Construct your own ConvNets for effective image processing
  • Use LSTMs for image caption generation
  • Forecast stock prices accurately with an LSTM architecture
  • Learn what semantic matching is by detecting duplicate Quora questions
  • Set up an AWS instance with TensorFlow to train GANs
  • Train and set up a chatbot to understand and interpret human input
  • Build an AI capable of playing a video game by itself, and winning it!

Who this book is for

This book is for data scientists, machine learning developers, and deep learning practitioners who want to build interesting deep learning projects that leverage the power of TensorFlow. Some understanding of machine learning and deep learning, and familiarity with the TensorFlow framework, are all you need to get started with this book.


Page count: 348

Year of publication: 2018

TensorFlow Deep Learning Projects

10 real-world projects on computer vision, machine translation, chatbots, and reinforcement learning


Luca Massaron
Alberto Boschetti
Alexey Grigorev
Abhishek Thakur
Rajalingappaa Shanmugamani


BIRMINGHAM - MUMBAI

TensorFlow Deep Learning Projects

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editors: Amey Varangaonkar
Acquisition Editor: Viraj Madhav
Content Development Editors: Snehal Kolte
Technical Editor: Dharmendra Yadav
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexers: Rekha Nair
Graphics: Tania Dutta
Production Coordinator: Shraddha Falebhai

First published: March 2018

Production reference: 1270318

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78839-806-0

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the authors

Luca Massaron is a data scientist and marketing research director specialized in multivariate statistical analysis, machine learning, and customer insight, with 10+ years experience of solving real-world problems and generating value for stakeholders using reasoning, statistics, data mining, and algorithms. Passionate about everything on data analysis and demonstrating the potentiality of data-driven knowledge discovery to both experts and non-experts, he believes that a lot can be achieved by understanding in simple terms and practicing the essentials of any discipline.

I would like to thank Yukiko and Amelia for their continued support, help, and loving patience.

Alberto Boschetti is a data scientist with strong expertise in signal processing and statistics. He holds a PhD in telecommunication engineering and lives and works in London. In his work, he faces daily challenges spanning natural language processing, machine learning, and distributed processing. He is very passionate about his job and always tries to stay up to date on the latest development in data science technologies, attending meetups, conferences, and other events.

 

Alexey Grigorev is a skilled data scientist, machine learning engineer, and software developer with more than 8 years of professional experience. He started his career as a Java developer working at a number of large and small companies, but after a while he switched to data science. Right now, Alexey works as a data scientist at Simplaex, where, in his day-to-day job, he actively uses Java and Python for data cleaning, data analysis, and modeling. His areas of expertise are machine learning and text mining.

I would like to thank my wife, Larisa, and my son, Arkadij, for their patience and support while I was working on the book.

Abhishek Thakur is a data scientist. His focus is mainly on applied machine learning and deep learning, rather than theoretical aspects. He completed his master's in computer science at the University of Bonn in early 2014. Since then, he has worked in various industries, with a research focus on automatic machine learning. He likes taking part in machine learning competitions and has attained a third place in the worldwide rankings on the popular website Kaggle.

 

Rajalingappaa Shanmugamani is currently a deep learning lead at SAP, Singapore. Previously, he worked and consulted at various startups, developing computer vision products. He has a master's from IIT Madras, his thesis having been based on the applications of computer vision in manufacturing. He has published articles in peer-reviewed journals, and spoken at conferences, and applied for a few patents in machine learning. In his spare time, he coaches programming and machine learning to school students and engineers.

I thank my spouse Ezhil, family and friends for their immense support. I thank all the teachers, colleagues, managers and mentors from whom I have learned a lot.

 

About the reviewer

Marvin Bertin is an online course author and technical book editor focused on deep learning, computer vision, and NLP with TensorFlow. He holds a bachelor's in mechanical engineering and a master's in data science. He has worked as an ML engineer and data scientist in the Bay Area, focusing on recommender systems, NLP, and biotech applications. He currently works at a start-up that develops deep learning (AI) algorithms for early cancer detection.

 

 

 

 

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

TensorFlow Deep Learning Projects

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the authors

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Reviews

Recognizing traffic signs using Convnets

The dataset

The CNN network

Image preprocessing

Train the model and make predictions

Follow-up questions

Summary

Annotating Images with Object Detection API

The Microsoft common objects in context

The TensorFlow object detection API

Grasping the basics of R-CNN, R-FCN, and SSD models

Presenting our project plan

Setting up an environment suitable for the project

Protobuf compilation

Windows installation

Unix installation

Provisioning of the project code

Some simple applications

Real-time webcam detection

Acknowledgements

Summary

Caption Generation for Images

What is caption generation?

Exploring image captioning datasets

Downloading the dataset

Converting words into embeddings

Image captioning approaches

Conditional random field

Recurrent neural network on convolution neural network

Caption ranking

Dense captioning

RNN captioning

Multimodal captioning

Attention-based captioning

Implementing a caption generation model

Summary

Building GANs for Conditional Image Creation

Introducing GANs

The key is in the adversarial approach

A Cambrian explosion

DCGANs

Conditional GANs

The project

Dataset class

CGAN class

Putting CGAN to work on some examples

MNIST

Zalando MNIST

EMNIST

Reusing the trained CGANs

Resorting to Amazon Web Service

Acknowledgements

Summary

Stock Price Prediction with LSTM

Input datasets – cosine and stock price

Format the dataset

Using regression to predict the future prices of a stock

Long short-term memory – LSTM 101

Stock price prediction with LSTM

Possible follow-up questions

Summary

Create and Train Machine Translation Systems

A walkthrough of the architecture

Preprocessing of the corpora

Training the machine translator

Test and translate

Home assignments

Summary

Train and Set up a Chatbot, Able to Discuss Like a Human

Introduction to the project

The input corpus

Creating the training dataset

Training the chatbot

Chatbox API

Home assignments

Summary

Detecting Duplicate Quora Questions

Presenting the dataset

Starting with basic feature engineering

Creating fuzzy features

Resorting to TF-IDF and SVD features

Mapping with Word2vec embeddings

Testing machine learning models

Building a TensorFlow model

Processing before deep neural networks

Deep neural networks building blocks

Designing the learning architecture

Summary

Building a TensorFlow Recommender System

Recommender systems

Matrix factorization for recommender systems

Dataset preparation and baseline

Matrix factorization

Implicit feedback datasets

SGD-based matrix factorization

Bayesian personalized ranking

RNN for recommender systems

Data preparation and baseline

RNN recommender system in TensorFlow

Summary

Video Games by Reinforcement Learning

The game legacy

The OpenAI version

Installing OpenAI on Linux (Ubuntu 14.04 or 16.04)

Lunar Lander in OpenAI Gym

Exploring reinforcement learning through deep learning

Tricks and tips for deep Q-learning

Understanding the limitations of deep Q-learning

Starting the project

Defining the AI brain

Creating memory for experience replay

Creating the agent

Specifying the environment

Running the reinforcement learning process

Acknowledgements

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

TensorFlow is one of the most popular frameworks used for machine learning and, more recently, deep learning. It provides a fast and efficient framework for training different kinds of deep learning models with very high accuracy. This book is your guide to mastering deep learning with TensorFlow with the help of 10 real-world projects.

TensorFlow Deep Learning Projects starts with setting up the right TensorFlow environment for deep learning. You'll learn to train different types of deep learning models using TensorFlow, including CNNs, RNNs, LSTMs, and generative adversarial networks. While doing so, you will build end-to-end deep learning solutions to tackle different real-world problems in image processing, enterprise AI, and natural language processing, to name a few. You'll train high-performance models to generate captions for images automatically, predict the performance of stocks, and create intelligent chatbots. Some advanced aspects, such as recommender systems and reinforcement learning, are also covered in this book.

By the end of this book, you will have mastered all the concepts of deep learning and their implementation with TensorFlow, and will be able to build and train your own deep learning models with TensorFlow to tackle any kind of problem.

Who this book is for

This book is for data scientists, machine learning and deep learning practitioners, and AI enthusiasts who want a go-to guide to test their knowledge and expertise in building real-world intelligent systems. If you want to master the different deep learning concepts and algorithms associated with it by implementing practical projects in TensorFlow, this book is what you need!

What this book covers

Chapter 1, Recognizing traffic signs using Convnets, shows how to extract the proper features from images with all the necessary preprocessing steps, and how to train a convolutional neural network to classify the traffic signs of the German Traffic Sign Recognition Benchmark (GTSRB) dataset.

Chapter 2, Annotating Images with Object Detection API, details the building of a real-time object detection application that can annotate images, videos, and webcam captures using TensorFlow's new object detection API (with its selection of pretrained convolutional networks, the so-called TensorFlow detection model zoo) and OpenCV.

Chapter 3, Caption Generation for Images, enables readers to learn caption generation with or without pretrained models.

Chapter 4, Building GANs for Conditional Image Creation, guides you step by step through building a selective GAN to produce new images of the desired kind. The datasets that the GANs will reproduce consist of handwritten characters (both numbers and letters in Chars74K).

Chapter 5, Stock Price Prediction with LSTM, explores how to predict the future of a mono-dimensional signal, a stock price. Given its past, we will learn how to forecast its future with an LSTM architecture, and how we can make our predictions more and more accurate.

Chapter 6, Create and Train Machine Translation Systems, shows how to create and train a bleeding-edge machine translation system with TensorFlow.

Chapter 7, Train and Set up a Chatbot, Able to Discuss Like a Human, tells you how to build an intelligent chatbot from scratch and how to converse with it.

Chapter 8, Detecting Duplicate Quora Questions, discusses methods that can be used to detect duplicate questions using the Quora dataset. Of course, these methods can be used for other similar datasets.

Chapter 9, Building a TensorFlow Recommender System, covers large-scale applications with practical examples. We'll learn how to implement cloud GPU computing capabilities on AWS with very clear instructions. We'll also utilize H2O's wonderful API for deep networks on a large scale.

Chapter 10, Video Games by Reinforcement Learning, details a project where you build an AI capable of playing Lunar Lander by itself. The project revolves around the existing OpenAI Gym project and integrates it using TensorFlow. OpenAI Gym is a project that provides different gaming environments to explore how to use AI agents that can be powered by, among other algorithms, TensorFlow neural models. 

To get the most out of this book

The examples covered in this book can be run with Windows, Ubuntu, or Mac. All the installation instructions are covered. You will need basic knowledge of Python, machine learning and deep learning, and familiarity with TensorFlow.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packtpub.com.

2. Select the SUPPORT tab.

3. Click on Code Downloads & Errata.

4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/TensorFlow-Deep-Learning-Projects. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The class TqdmUpTo is just a tqdm wrapper that enables the use of the progress display also for downloads."

A block of code is set as follows:

import numpy as np
import urllib.request
import tarfile
import os
import zipfile
import gzip
from glob import glob
from tqdm import tqdm

Any command-line input or output is written as follows:

epoch 01: precision: 0.064
epoch 02: precision: 0.086
epoch 03: precision: 0.106
epoch 04: precision: 0.127
epoch 05: precision: 0.138
epoch 06: precision: 0.145
epoch 07: precision: 0.150
epoch 08: precision: 0.149
epoch 09: precision: 0.151
epoch 10: precision: 0.152

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Recognizing traffic signs using Convnets

As the first project of the book, we'll work on a simple model for a task where deep learning performs very well: traffic sign recognition. Briefly, given a color image of a traffic sign, the model should recognize which sign it is. We will explore the following areas:

How the dataset is composed

Which deep network to use

How to pre-process the images in the dataset

How to train and make predictions with an eye on performance

The dataset

Since we'll try to predict some traffic signs using their images, we will use a dataset built for the same purpose. Fortunately, researchers at the Institut für Neuroinformatik, Germany, created a dataset containing almost 40,000 images, all different and related to 43 traffic signs. The dataset we will use is part of a competition named German Traffic Sign Recognition Benchmark (GTSRB), which attempted to score the performance of multiple models for the same goal. The dataset is pretty old (it dates back to 2011), but it looks like a nice and well-organized dataset to start our project from.

The dataset used in this project is freely available at http://benchmark.ini.rub.de/Dataset/GTSRB_Final_Training_Images.zip. Before you start running the code, please download the file and unpack it in the same directory as the code. After decompressing the archive, you'll have a new folder, named GTSRB, containing the dataset. The authors of the book would like to thank those who worked on the dataset and made it open source. Also, refer to http://cs231n.github.io/convolutional-networks/ to learn more about CNNs.
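The download-and-unpack step described above can be sketched with the Python standard library alone. This is a minimal sketch rather than the book's own code: the helper names download_archive and unpack_archive are hypothetical, and the URL is simply the one quoted above, which may no longer be reachable.

```python
import os
import urllib.request
import zipfile

# URL as given in the text above; availability is not guaranteed.
GTSRB_URL = "http://benchmark.ini.rub.de/Dataset/GTSRB_Final_Training_Images.zip"


def download_archive(url, archive_path):
    """Download url to archive_path unless the file is already there."""
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)
    return archive_path


def unpack_archive(archive_path, target_dir="."):
    """Extract the whole zip archive into target_dir and return target_dir."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    return target_dir
```

Calling unpack_archive(download_archive(GTSRB_URL, "GTSRB_Final_Training_Images.zip")) from the directory that holds the code should then leave the GTSRB folder next to it, as described above.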

Let's now see some examples:

"Speed limit 20 km/h":

"go straight or turn right":

"roundabout":

As you can see, the signs don't have a uniform brightness (some are very dark and some others are very bright), they're different in size, the perspective is different, they have different backgrounds, and they may contain pieces of other traffic signs.

The dataset is organized in this way: all the images of the same label are inside the same folder. For example, inside the path GTSRB/Final_Training/Images/00040/, all the images have the same label, 40. For the images with another label, 5, open the folder GTSRB/Final_Training/Images/00005/. Note also that all the images are in PPM format, a lossless, uncompressed image format with many open source decoders/encoders.
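The folder layout just described maps directly onto a small loader that pairs every image path with its label. The following is a minimal sketch using only the standard library; the function name list_images_by_label is hypothetical, and decoding the PPM files themselves is left out.

```python
import os
from glob import glob


def list_images_by_label(images_root):
    """Return (image_path, label) pairs, reading the label from the folder name."""
    samples = []
    for class_dir in sorted(glob(os.path.join(images_root, "*"))):
        folder_name = os.path.basename(class_dir)
        # Only zero-padded numeric folders such as "00040" are class folders.
        if not (os.path.isdir(class_dir) and folder_name.isdigit()):
            continue
        label = int(folder_name)  # "00040" -> 40
        for image_path in sorted(glob(os.path.join(class_dir, "*.ppm"))):
            samples.append((image_path, label))
    return samples
```

With the archive unpacked as described, list_images_by_label("GTSRB/Final_Training/Images") would return one pair per PPM file, ignoring any other files stored alongside the images.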

The CNN network

For our project, we will use a pretty simple network with the following architecture:

In this architecture, we still have the choice of:

The number of filters and kernel size in the 2D convolution

The kernel size in the Max pool

The number of units in the Fully Connected layer

The batch size, optimization algorithm, learning rate (and, possibly, its decay rate), activation function of each layer, and number of epochs
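To get a feeling for how these choices interact, the sketch below counts the trainable parameters of a network with one 2D convolution, one max pool, one fully connected layer, and an output layer. The layer order, the 32x32 input size, the "same" padding, and every hyperparameter value in the example call are assumptions made for illustration, not values taken from the book; only the 3 color channels and the 43 classes come from the dataset description.

```python
def pooled_size(size, pool_kernel):
    """Spatial size after max pooling with stride equal to the kernel, no padding."""
    return (size - pool_kernel) // pool_kernel + 1


def count_parameters(input_size, channels, filters, conv_kernel,
                     pool_kernel, fc_units, n_classes):
    """Count weights and biases for conv -> max pool -> fully connected -> output."""
    # Each filter has conv_kernel x conv_kernel x channels weights plus one bias.
    conv_params = (conv_kernel * conv_kernel * channels + 1) * filters
    # Assumption: "same" padding with stride 1, so the convolution keeps the size.
    side = pooled_size(input_size, pool_kernel)
    flattened = side * side * filters
    fc_params = (flattened + 1) * fc_units
    output_params = (fc_units + 1) * n_classes
    return {
        "conv": conv_params,
        "fully_connected": fc_params,
        "output": output_params,
        "total": conv_params + fc_params + output_params,
    }


# Hypothetical setting: 32x32 color images, 32 filters of size 5x5,
# 2x2 max pool, 1,024 fully connected units, 43 traffic sign classes.
print(count_parameters(32, 3, 32, 5, 2, 1024, 43))
```

In a setting like this one, the fully connected layer holds almost all of the parameters, which is one reason why its number of units and the pooling kernel size are worth tuning together.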

Follow-up questions

Try adding/removing some CNN layers and/or fully connected layers. How does the performance change?

This simple project shows that dropout is necessary for regularization. Change the dropout rate and check for overfitting or underfitting in the output.

Now, take a picture of multiple traffic signs in your city, and test the trained model in real life!
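For the dropout question above, it may help to see what the dropout rate actually controls. The following is a minimal pure-Python sketch of inverted dropout on a list of activations; it is not the book's TensorFlow code, and the function name inverted_dropout is hypothetical.

```python
import random


def inverted_dropout(activations, drop_rate, rng):
    """Zero each activation with probability drop_rate and rescale the survivors."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep_prob = 1.0 - drop_rate
    # Dividing by keep_prob keeps the expected value of each activation unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed so repeated runs give the same mask
print(inverted_dropout([1.0] * 10, 0.5, rng))
```

A higher drop_rate zeroes more units during training, which regularizes more strongly but can lead to underfitting; a rate of 0 leaves the activations untouched.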

Summary

In this chapter, we saw how to recognize traffic signs using a convolutional neural network, or CNN. In the next chapter, we'll see something more complex that can be done with CNNs.

Annotating Images with Object Detection API

Computer vision has made great leaps forward in recent years because of deep learning, giving computers a much better understanding of visual scenes. The potential of deep learning in vision tasks is great: allowing a computer to visually perceive and understand its surroundings is a capability that opens the door to new artificial intelligence applications in both mobility (for instance, self-driving cars can detect if an appearing obstacle is a pedestrian, an animal, or another vehicle from the camera mounted on the car and decide the correct course of action) and human-machine interaction in everyday-life contexts (for instance, allowing a robot to perceive surrounding objects and successfully interact with them).

After presenting ConvNets and how they operate in the first chapter, we now intend to create a quick, easy project that will help you to use a computer to understand images taken from cameras and mobile phones, using images collected from the Internet or directly from your computer's webcam. The goal of the project is to find the exact location and the type of the objects in an image.

In order to achieve such classification and localization, we will leverage the new TensorFlow object detection API, a Google project that is part of the larger TensorFlow models project which makes a series of pre-trained neural networks available off-the-shelf for you to wrap up in your own custom applications.

In this chapter, we are going to illustrate the following:

The advantages of using the right data for your project

A brief presentation of the TensorFlow object detection API

How to annotate stored images for further use

How to visually annotate a video using moviepy

How to go real-time by annotating images from a webcam

The Microsoft common objects in context

Advances in the application of deep learning to computer vision are often highly focused on the kind of classification problems that can be summarized by challenges such as ImageNet (but also, for instance, PASCAL VOC - http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) and the ConvNets suitable to crack them (Xception, VGG16, VGG19, ResNet50, InceptionV3, and MobileNet, just to quote the ones available in the well-known package Keras: https://keras.io/applications/).

Though deep learning networks based on ImageNet data are the current state of the art, such networks can experience difficulties when faced with real-world applications. In fact, in practical applications, we have to process images that are quite different from the examples provided by ImageNet. In ImageNet, the element to be classified is clearly the only prominent element in the image, ideally set in an unobstructed way near the center of a neatly composed photo. In images taken from the field, objects are randomly scattered around, often in large numbers. All these objects are also quite different from each other, sometimes creating confusing settings. In addition, objects of interest often cannot be clearly and directly perceived because they are visually obstructed by other potentially interesting objects.

Please refer to the figure from the following mentioned reference:

Figure 1: A sample of images from ImageNet: they are arranged in a hierarchical structure, allowing working with both general or more specific classes.

SOURCE: DENG, Jia, et al. Imagenet: A large-scale hierarchical image database. In: Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009. p. 248-255.

Realistic images contain multiple objects that sometimes can hardly be distinguished from a noisy background. Often you cannot create interesting projects just by labeling an image with a single tag that names the object recognized with the highest confidence.

In a real-world application, you really need to be able to do the following:

Object classification of single and multiple instances when recognizing various objects, often of the same class

Image localization, that is, understanding where the objects are in the image

Image segmentation, by marking each pixel in the images with a label: the type of object or background, in order to be able to cut off interesting parts from the background.

The necessity to train a ConvNet to be able to achieve some or all of the previously mentioned objectives led to the creation of the Microsoft common objects in context (MS COCO) dataset, as described in the paper: LIN, Tsung-Yi, et al. Microsoft coco: common objects in context. In: European conference on computer vision. Springer, Cham, 2014. p. 740-755. (You can read the original paper at the following link: https://arxiv.org/abs/1405.0312.) This dataset is made up of 91 common object categories, hierarchically ordered, with 82 of them having more than 5,000 labeled instances. The dataset totals 2,500,000 labeled objects distributed in 328,000 images.

Here are the classes that can be recognized in the MS COCO dataset:

{1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorcycle', 5: 'airplane', 6: 'bus', 7: 'train', 8: 'truck', 9: 'boat', 10: 'traffic light', 11: 'fire hydrant', 13: 'stop sign', 14: 'parking meter', 15: 'bench', 16: 'bird', 17: 'cat', 18: 'dog', 19: 'horse', 20: 'sheep', 21: 'cow', 22: 'elephant', 23: 'bear', 24: 'zebra', 25: 'giraffe', 27: 'backpack', 28: 'umbrella', 31: 'handbag', 32: 'tie', 33: 'suitcase', 34: 'frisbee', 35: 'skis', 36: 'snowboard', 37: 'sports ball', 38: 'kite', 39: 'baseball bat', 40: 'baseball glove', 41: 'skateboard', 42: 'surfboard', 43: 'tennis racket', 44: 'bottle', 46: 'wine glass', 47: 'cup', 48: 'fork', 49: 'knife', 50: 'spoon', 51: 'bowl', 52: 'banana', 53: 'apple', 54: 'sandwich', 55: 'orange', 56: 'broccoli', 57: 'carrot', 58: 'hot dog', 59: 'pizza', 60: 'donut', 61: 'cake', 62: 'chair', 63: 'couch', 64: 'potted plant', 65: 'bed', 67: 'dining table', 70: 'toilet', 72: 'tv', 73: 'laptop', 74: 'mouse', 75: 'remote', 76: 'keyboard', 77: 'cell phone', 78: 'microwave', 79: 'oven', 80: 'toaster', 81: 'sink', 82: 'refrigerator', 84: 'book', 85: 'clock', 86: 'vase', 87: 'scissors', 88: 'teddy bear', 89: 'hair drier', 90: 'toothbrush'}
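Note that the class ids in the preceding listing are not contiguous: some numbers, such as 12 and 26, are skipped, so the listing holds 80 names although the ids run up to 90. When a model needs dense class indices, the ids can be remapped. The sketch below shows one way to do that on a short excerpt of the listing; the helper names missing_ids and build_contiguous_index are hypothetical.

```python
def missing_ids(category_names):
    """Return the ids skipped between the smallest and largest id present."""
    present = set(category_names)
    return [i for i in range(min(present), max(present) + 1) if i not in present]


def build_contiguous_index(category_names):
    """Map sparse category ids to dense indices 0..N-1, ordered by id."""
    sorted_ids = sorted(category_names)
    id_to_index = {category_id: index for index, category_id in enumerate(sorted_ids)}
    index_to_name = [category_names[category_id] for category_id in sorted_ids]
    return id_to_index, index_to_name


# Excerpt of the listing above; the full dictionary works the same way.
excerpt = {10: 'traffic light', 11: 'fire hydrant', 13: 'stop sign', 14: 'parking meter'}
id_to_index, index_to_name = build_contiguous_index(excerpt)
print(missing_ids(excerpt))
print(id_to_index)
```

Keeping both directions of the mapping makes it easy to translate a dense prediction index back to the original id and its readable name.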

Though the ImageNet dataset presents 1,000 object classes (as described at https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a) distributed in 14,197,122 images, MS COCO offers the peculiar feature of multiple objects distributed in a smaller number of images (the dataset has been gathered using Amazon Mechanical Turk, a somewhat more costly approach but one shared by ImageNet, too). Given such premises, the MS COCO images can be considered very good examples of contextual relationships and non-iconic object views, since objects are arranged in realistic positions and settings. This can be verified from this comparative example taken from the MS COCO paper previously mentioned:

Figure 2: Examples of iconic and non-iconic images. SOURCE: LIN, Tsung-Yi, et al. Microsoft coco: common objects in context. In: European conference on computer vision. Springer, Cham, 2014. p. 740-755.