Hands-On Machine Learning on Google Cloud Platform

Giuseppe Ciaburro, V Kishore Ayyadevara, Alexis Perrier

Description

Unleash Google's Cloud Platform to build, train and optimize machine learning models

Key Features

  • Get well versed in GCP's pre-existing services to build your own smart models
  • A comprehensive guide covering everything from data processing and analysis to building and training ML models
  • A practical approach to producing your trained ML models and porting them to your mobile device for easy access

Book Description

Google Cloud Machine Learning Engine combines the services of Google Cloud Platform with the power and flexibility of TensorFlow. With this book, you will not only learn to build and train different complexities of machine learning models at scale but also host them in the cloud to make predictions.

This book is focused on making the most of the Google Machine Learning Platform for large datasets and complex problems. You will learn from scratch how to create powerful machine-learning-based applications for a wide variety of problems by leveraging different data services from the Google Cloud Platform. Applications include NLP, speech-to-text, reinforcement learning, time series, recommender systems, image classification, video content inference, and many others. We will implement a wide variety of deep learning use cases and also make extensive use of data-related services comprising the Google Cloud Platform ecosystem, such as Firebase, Storage APIs, Datalab, and so forth. This will enable you to integrate machine learning and data processing features into your web and mobile applications.

By the end of this book, you will know the main difficulties that you may encounter, and the appropriate strategies to overcome them and build efficient systems.

What you will learn

  • Use Google Cloud Platform to build data-based applications for dashboards, web, and mobile
  • Create, train and optimize deep learning models for various data science problems on big data
  • Learn how to leverage BigQuery to explore big datasets
  • Use Google’s pre-trained TensorFlow models for NLP, image, video and much more
  • Create models and architectures for Time series, Reinforcement Learning, and generative models
  • Create, evaluate, and optimize TensorFlow and Keras models for a wide range of applications

Who this book is for

This book is for data scientists, machine learning developers, and AI developers who want to learn Google Cloud Platform services to build machine learning applications. Since interaction with the Google ML platform is mostly done via the command line, the reader should have some familiarity with the bash shell and Python scripting. Some understanding of machine learning and data science concepts will also be handy.


Page count: 496

Publication year: 2018




Hands-On Machine Learning on Google Cloud Platform

Implementing smart and efficient analytics using Cloud ML Engine

Giuseppe Ciaburro
V Kishore Ayyadevara
Alexis Perrier

BIRMINGHAM - MUMBAI

Hands-On Machine Learning on Google Cloud Platform

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Sunith Shetty
Acquisition Editor: Tushar Gupta
Content Development Editor: Cheryl Dsa
Technical Editor: Dinesh Pawar
Copy Editor: Vikrant Phadkay
Project Coordinator: Nidhi Joshi
Proofreader: Safis Editing
Indexer: Mariammal Chettiyar
Graphics: Tania Dutta
Production Coordinator: Arvindkumar Gupta

First published: April 2018

Production reference: 1260418

Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.

ISBN 978-1-78839-348-5

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

  • Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
  • Improve your learning with Skill Plans built especially for you
  • Get a free eBook or video every month
  • Mapt is fully searchable
  • Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the authors

Giuseppe Ciaburro holds a PhD in environmental technical physics and two master's degrees. His research is on machine learning applications in the study of urban sound environments. He works at Built Environment Control Laboratory, Università degli Studi della Campania Luigi Vanvitelli (Italy). He has over 15 years' experience in programming Python, R, and MATLAB, first in the field of combustion, and then in acoustics and noise control. He has several publications to his credit.

V Kishore Ayyadevara has over 9 years' experience of using analytics to solve business problems and setting up analytical work streams through his work at American Express, Amazon, and, more recently, a retail analytics consulting startup. He has an MBA from IIM Calcutta and is also an electronics and communications engineer. He has worked in credit risk analytics, supply chain analytics, and consulting for multiple FMCG companies to identify ways to improve their profitability.

Alexis Perrier is a data science consultant with experience in signal processing and stochastic algorithms. He holds a master's in mathematics from Université Pierre et Marie Curie Paris VI and a PhD in signal processing from Télécom ParisTech. He is actively involved in the DC data science community. He is also an avid book lover and proud owner of a real chalk blackboard, where he regularly shares his fascination of mathematical equations with his kids.

About the reviewers

Mikhail Berlyant is a data warehousing veteran. He has been a data developer since the late 1970s. Since 2000, he has led data systems, data mining, and data warehouse teams at Yahoo! and Myspace.

He is a Google Cloud expert and senior VP of Technology at Viant Inc., a people-based advertising tech company that enables marketers to plan, execute, and measure their digital media investments through a cloud-based platform. At Viant, he led the migration of a petabyte-sized data warehouse to Google Cloud. He is currently focusing on self-serve/productivity tools for BigQuery/GCP.

I'd like to say thanks to my beautiful wife, Svetlana, for supporting me in all my endeavors.

Sanket Thodge is an entrepreneur based in Pune, India. He is the author of Cloud Analytics with Google Cloud Platform and founded Pi R Square Digital Solutions. With expertise as a Hadoop developer, he has explored the cloud, IoT, machine learning, and blockchain. He has also applied for a patent in IoT and has worked with numerous startups and MNCs, providing consultancy, architecture building, development, and corporate training across the globe.

Antonio Gulli is a transformational software executive and business leader with a passion for establishing and managing global technological talent for innovation and execution. He is an expert in search engines, online services, machine learning, and so on. Currently, he is a site lead and director of cloud at Google Warsaw, driving European efforts for serverless, Kubernetes, and Google Cloud UX. Antonio has filed for 20+ patents, published multiple academic papers, and served as a senior PC member in multiple international conferences.

Chirag Nayyar helps organizations migrate their workloads from on-premises environments to the public cloud. He has experience in web app migration, SAP workloads on the cloud, and EDW. He is currently working at Cloud Kinetics Technology Solutions. He holds a wide range of certifications from all major public cloud platforms. He also runs meetups and is a regular speaker at various cloud events.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Hands-On Machine Learning on Google Cloud Platform

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the authors

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Introducing the Google Cloud Platform

ML and the cloud

The nature of the cloud

Public cloud

Managed cloud versus unmanaged cloud

IaaS versus PaaS versus SaaS

Costs and pricing

ML

Introducing the GCP

Mapping the GCP

Getting started with GCP

Project-based organization

Creating your first project

Roles and permissions

Further reading

Summary

Google Compute Engine

Google Compute Engine

VMs, disks, images, and snapshots

Creating a VM

Google Shell

Google Cloud Platform SDK

Gcloud

Gcloud config

Accessing your instance with gcloud

Transferring files with gcloud

Managing the VM

IPs

Setting up a data science stack on the VM

Box: the IPython console

Troubleshooting

Adding GPUs to instances

Startup scripts and stop scripts

Resources and further reading

Summary

Google Cloud Storage

Google Cloud Storage

Box: storage versus drive

Access control lists

Access and management through the web console

gsutil

gsutil cheatsheet

Advanced gsutil

Signed URLs

Creating a bucket in Google Cloud Storage

Google Storage namespace

Naming a bucket

Naming an object

Creating a bucket

Google Cloud Storage console

Google Cloud Storage gsutil

Life cycle management

Google Cloud SQL

Databases supported

Google Cloud SQL performance and scalability

Google Cloud SQL security and architecture

Creating Google Cloud SQL instances

Summary

Querying Your Data with BigQuery

Approaching big data

Data structuring

Querying the database

SQL basics

Google BigQuery

BigQuery basics

Using a graphical web UI

Visualizing data with Google Data Studio

Creating reports in Data Studio

Summary

Transforming Your Data

How to clean and prepare the data

Google Cloud Dataprep

Exploring Dataprep console

Removing empty cells

Replacing incorrect values

Mismatched values

Finding outliers in the data

Visual functionality

Statistical information

Removing outliers

Run Job

Scale of features

Min–max normalization

z-score standardization

Google Cloud Dataflow

Summary

Essential Machine Learning

Applications of machine learning

Financial services

Retail industry

Telecom industry

Supervised and unsupervised machine learning

Overview of machine learning techniques

Objective function in regression

Linear regression

Decision tree

Random forest

Gradient boosting

Neural network

Logistic regression

Objective function in classification

Data splitting

Measuring the accuracy of a model

Absolute error

Root mean square error

The difference between machine learning and deep learning

Applications of deep learning

Summary

Google Machine Learning APIs

Vision API

Enabling the API

Opening an instance

Creating an instance using Cloud Shell

Label detection

Text detection

Logo detection

Landmark detection

Cloud Translation API

Enabling the API

Natural Language API

Speech-to-text API

Video Intelligence API

Summary

Creating ML Applications with Firebase

Features of Firebase

Building a web application

Building a mobile application

Summary

Neural Networks with TensorFlow and Keras

Overview of a neural network

Setting up Google Cloud Datalab

Installing and importing the required packages

Working details of a simple neural network

Backpropagation

Implementing a simple neural network in Keras

Understanding the various loss functions

Softmax activation

Building a more complex network in Keras

Activation functions

Optimizers

Increasing the depth of network

Impact of change in batch size

Implementing neural networks in TensorFlow

Using premade estimators

Creating custom estimators

Summary

Evaluating Results with TensorBoard

Setting up TensorBoard

Overview of summary operations

Ways to debug the code

Setting up TensorBoard from TensorFlow

Summaries from custom estimator

Summary

Optimizing the Model through Hyperparameter Tuning

The intuition of hyperparameter tuning

Overview of hyperparameter tuning

Hyperparameter tuning in Google Cloud

The model file

Configuration file

Setup file

The __init__ file

Summary

Preventing Overfitting with Regularization

Intuition of over/underfitting

Reducing overfitting

Implementing L2 regularization

Implementing L1 regularization

Implementing dropout

Reducing underfitting

Summary

Beyond Feedforward Networks – CNN and RNN

Convolutional neural networks

Convolution layer

Rectified Linear Units

Pooling layers

Fully connected layer

Structure of a CNN

TensorFlow overview

Handwriting Recognition using CNN and TensorFlow

Run Python code on Google Cloud Shell

Recurrent neural network

Fully recurrent neural networks

Recursive neural networks

Hopfield recurrent neural networks

Elman neural networks

Long short-term memory networks

Handwriting Recognition using RNN and TensorFlow

LSTM on Google Cloud Shell

Summary

Time Series with LSTMs

Introducing time series 

Classical approach to time series

Estimation of the trend component

Estimating the seasonality component

Time series models

Autoregressive models

Moving average models

Autoregressive moving average model 

Autoregressive integrated moving average models

Removing seasonality from a time series

Analyzing a time series dataset

Identifying a trend in a time series

Time series decomposition

Additive method

Multiplicative method

LSTM for time series analysis

Overview of the time series dataset

Data scaling

Data splitting

Building the model

Making predictions

Summary

Reinforcement Learning

Reinforcement learning introduction

Agent-Environment interface

Markov Decision Process

Discounted cumulative reward

Exploration versus exploitation

Reinforcement learning techniques

Q-learning

Temporal difference learning

Dynamic Programming

Monte Carlo methods

Deep Q-Network

OpenAI Gym

Cart-Pole system

Learning phase

Testing phase

Summary

Generative Neural Networks

Unsupervised learning

Generative models

Restricted Boltzmann machine

Boltzmann machine architecture

Boltzmann machine disadvantages

Deep Boltzmann machines

Autoencoder

Variational autoencoder

Generative adversarial network

Adversarial autoencoder 

Feature extraction using RBM

Breast cancer dataset

Data preparation

Model fitting

Autoencoder with Keras

Load data

Keras model overview

Sequential model

Keras functional API

Define model architecture

Magenta

The NSynth dataset

Summary

Chatbots

Chatbots fundamentals

Chatbot history

The imitation game

Eliza

Parry

Jabberwacky

Dr. Sbaitso

ALICE

SmarterChild

IBM Watson

Building a bot

Intents

Entities

Context

Chatbots

Essential requirements

The importance of the text

Word transposition

Checking a value against a pattern

Maintaining context

Chatbots architecture

Natural language processing

Natural language understanding

Google Cloud Dialogflow

Dialogflow overview

Basic Dialogflow elements

Agents

Intent

Entity

Action

Context

Building a chatbot with Dialogflow

Agent creation

Intent definition

Summary

Preface

Google Cloud ML Engine combines the services of Google Cloud Platform with the power and flexibility of TensorFlow. With this book, you will not only learn how to build and train different complexities of machine learning models at scale, but also to host them in the cloud to make predictions.

This book is focused on making the most of the Google Machine Learning Platform for large datasets and complex problems. You will learn how to create powerful machine-learning-based applications from scratch for a wide variety of problems by leveraging different data services from the Google Cloud Platform. Applications include NLP, speech-to-text, reinforcement learning, time series, recommender systems, image classification, video content inference, and many others. We will implement a wide variety of deep learning use cases and will also make extensive use of data-related services comprising the Google Cloud Platform ecosystem, such as Firebase, Storage APIs, Datalab, and so forth. This will enable you to integrate machine learning and data processing features into your web and mobile applications.

By the end of this book, you will be aware of the main difficulties that you may encounter, and be familiar with appropriate strategies to overcome these difficulties and build efficient systems.

Who this book is for

This book is for data scientists, machine learning developers, and AI developers who want to learn Google Cloud Platform services to build machine learning applications. Since interaction with the Google ML platform is mostly done via the command line, the reader should have some familiarity with the bash shell and Python scripting. Some understanding of machine learning and data science concepts will also be handy.

What this book covers

Chapter 1, Introducing the Google Cloud Platform, explores different services that may be useful to build a machine learning pipeline based on GCP.

Chapter 2, Google Compute Engine, helps you create and fully manage your VM via both the online console and command-line tools, and shows how to implement a data science workflow and a Jupyter Notebook workspace.

Chapter 3, Google Cloud Storage, shows how to upload data and manage it using the services provided by the Google Cloud Platform.

Chapter 4, Querying Your Data with BigQuery, shows you how to query data from Google Storage and visualize it with Google Data Studio.

Chapter 5, Transforming Your Data, presents Dataprep, a service useful for preprocessing data, extracting features, and cleaning up records. We also look at Dataflow, a service used to implement streaming and batch processing.

Chapter 6, Essential Machine Learning, starts our journey into machine learning and deep learning; we learn when to apply each one.

Chapter 7, Google Machine Learning APIs, teaches us how to use Google Cloud machine learning APIs for image analysis, text and speech processing, translation, and video inference.

Chapter 8, Creating ML Applications with Firebase, shows how to integrate different GCP services to build a seamless machine-learning-based application, mobile or web-based.

Chapter 9, Neural Networks with TensorFlow and Keras, gives a good understanding of the structure and key elements of a feedforward network, how to architect one, and how to tinker and experiment with different parameters.

Chapter 10, Evaluating Results with TensorBoard, shows how the choice of different parameters and functions impacts the performance of the model.

Chapter 11, Optimizing the Model through Hyperparameter Tuning, teaches us how to use hyperparameter tuning in TensorFlow application code and interpret the results to select the best-performing model.

Chapter 12, Preventing Overfitting with Regularization, shows how to identify overfitting and make our models more robust to previously unseen data by setting the right parameters and defining the proper architectures.

Chapter 13, Beyond Feedforward Networks – CNN and RNNs, teaches which type of neural network to apply to different problems, and how to define and implement them on GCP.

Chapter 14, Time Series with LSTMs, shows how to create LSTMs and apply them to time series predictions. We will also understand when LSTMs outperform more standard approaches.

Chapter 15, Reinforcement Learning, introduces the power of reinforcement learning and shows how to implement a simple use case on GCP.

Chapter 16, Generative Neural Networks, teaches us how to extract the content generated within the neural net, covering different types of content: text, images, and sounds.

Chapter 17, Chatbots, shows how to train a contextual chatbot while implementing it in a real mobile application.

To get the most out of this book

In this book, machine learning algorithms are implemented on the Google Cloud Platform. To reproduce the many examples in this book, you need a working account on GCP. We have used Python 2.7 and above to build the various applications, and we have tried to keep all of the code as friendly and readable as possible. We feel that this will enable our readers to easily understand the code and readily use it in different scenarios.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Machine-Learning-on-Google-Cloud-Platform. In case there's an update to the code, it will be updated on the existing GitHub repository.
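If you prefer working from the command line, you can also clone the repository directly; this assumes you have Git installed:

$ git clone https://github.com/PacktPublishing/Hands-On-Machine-Learning-on-Google-Cloud-Platform.git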

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/HandsOnMachineLearningonGoogleCloudPlatform_ColorImages.pdf.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Introducing the Google Cloud Platform

The goal of this first introductory chapter is to give you an overview of the Google Cloud Platform (GCP). We start by explaining why machine learning (ML) and cloud computing go hand in hand as the demand for ever more hungry computing resources grows for today's ML applications. We then proceed with a 360° presentation of the platform's data-related services. Account and project creation as well as role allocation close the chapter.

A data science project follows a regular set of steps: extracting the data, exploring and cleaning it, extracting information, training and assessing models, and finally building machine-learning-enabled applications. For each step of the data science flow, there are one or several adequate services in the GCP.

But, before we present the overall mapping of the GCP data-related services, it is important to understand why ML and cloud computing are truly made for each other.

In this chapter, we will cover the following topics:

  • ML and the cloud
  • Introducing the GCP
  • Data services of the Google platform

ML and the cloud

In short, artificial intelligence (AI) requires a lot of computing resources. Cloud computing addresses those concerns.

ML is a new type of microscope and telescope, allowing each of us to push the boundaries of human knowledge and human activities. With ever more powerful ML platforms and open tools, we are able to conquer new realms of knowledge and grow new types of businesses. From the comfort of our laptops, at home, or at the office, we can better understand and predict human behavior in a wide range of domains. Think health care, transportation, energy, financial markets, human communication, human-machine interaction, social network dynamics, economic behavior, and nature (astronomy, global warming, or seismic activity). The list of domains affected by the explosion of AI is truly unlimited. The impact on society? Astounding.

With so many resources available to anyone with an online connection, the barrier to joining the AI revolution has never been lower than it is now. Books, tutorials, MOOCs, and meet-ups, as well as open source libraries in a myriad of languages, are freely available to both the seasoned and the beginner data scientist.

As veteran data scientists know well, data science is always hungry for more computational resources. Classification on the Iris or the MNIST image datasets or predictive modeling on Titanic passengers does not reflect real-world data. Real-world data is by essence dirty, incomplete, noisy, multi-sourced, and, more often than not, in large volumes. Exploiting these large datasets requires computational power, storage, CPUs, GPUs, and fast I/O.

However, more powerful machines are not sufficient to build meaningful ML applications. Grounded in science, data science requires a scientific mindset with concepts such as reproducibility and reviewing. Both aspects are made easier by working with online accessible resources. Sharing datasets and models and exposing results is always more difficult when the data lives on one person's computer. Reproducing results and maintaining models with new data also requires easy accessibility to assets. And as we work on ever more personalized and critical data (for instance in healthcare), privacy and security concerns become all the more important to the project stakeholders.

This is where the cloud comes in, by offering scalability and accessibility while providing an adequate level of security.

Before diving into GCP, let's learn a bit more about the cloud.

The nature of the cloud

ML projects are resource intensive. From storage to computational power, training models sometimes require resources that cannot be found on a simple standalone computer. Physical limitations in terms of storage have shrunk in recent years. As we now enjoy reliable terabyte storage accessible at reduced prices, storage is no longer an issue for most data projects that are not in the realm of big data. Computing power has also increased so much that what required expensive workstations a few years ago can now run on laptops.

However, despite all this amazingly rapid evolution, the power of the standalone PC is finite. There is an upper limit to the volume of data you can store on your machine and to the time you're willing to wait to get your model trained. New frontiers in AI, with speech-to-text, video captioning in real time, self-driving cars, music generation, or chatbots that can fool a human being and pass the Turing test, require ever larger resources. This is especially true of deep learning models, which are too slow on standard CPUs and require GPU-based machines to train in a reasonable amount of time.

ML in the cloud does not face these limitations. What you get with cloud computing is direct access to high-performance computing (HPC). Before the cloud (roughly before AWS launched its Elastic Compute Cloud (EC2) service in 2006), HPC was only available via supercomputers, such as the Cray computers. Cray is a US company that has built some of the most powerful supercomputers since the 1960s. China's Tianhe-2 is now the most powerful supercomputer in the world, with a capacity of 100 petaflops (that's 10² × 10¹⁵, or 10¹⁷ floating-point operations per second!).

A supercomputer not only costs millions of US dollars but also requires its own physical infrastructure and has huge maintenance costs. It is also out of reach for individuals and for most companies. Engineers and researchers, hungry for HPC, now turn to on-demand cloud infrastructures. Cloud service offers are democratizing access to HPC.

Computing in the cloud is built on a distributed architecture. The processors are distributed across different servers instead of being aggregated in one single machine. With a few clicks or command lines, anyone can spin up massively complex banks of servers in a matter of minutes. The amount of power at your command can be mind-blowing.

Cloud computing can not only handle the most demanding optimization tasks but also carry out a simple regression on a tiny dataset. Cloud computing is extremely flexible.

To recap, cloud computing offers:

  • Instantaneity: Resources can be made available in a matter of minutes.
  • On-demand: Instances can be put on standby or decommissioned when no longer needed.
  • Diversity: The wide range of operating systems, storage, and database solutions allows the architect to create project-focused architectures, from simple mobile applications to ML APIs.
  • Unlimited resources: If not infinite yet, the volume of resources you can assemble for storage, computing, and networking is mind-blowing.
  • GPUs: Most PCs are based on CPUs (with the exception of machines optimized for gaming). Deep learning requires GPUs to train models in a reasonable amount of time. Cloud computing makes GPUs available at a fraction of the cost needed to buy GPU machines.
  • Controlled accessibility and security: With granular role definitions, service compartmentalization, encrypted connections, and user-based access control, cloud platforms greatly reduce the risk of intrusion and data loss.

Apart from these, there are several other types of cloud platforms and offers on the market.

Public cloud

There are two main types of cloud models depending on the needs of the customers: public versus private and multi-tenant versus single-tenant. These different cloud types offer different levels of management, security, and pricing.

A public cloud consists of resources that are located off-site, over the internet. In a public cloud, the infrastructure is typically multi-tenant: multiple customers can share the same underlying hardware or server, and resources such as networking, storage, power, cooling, and computing are all shared. The customer usually has no visibility of where this infrastructure is hosted, except for choosing a geographic region. The pricing model of a public cloud service is based on the volume of data, the computing power used, and other infrastructure-management-related services: more precisely, a mix of RAM, vCPUs, disk, and bandwidth.

In a private cloud, the resources are dedicated to a single customer; the architecture is single-tenant instead of multi-tenant. The servers are located on premise or in a remote data center. Customers own (or rent) the infrastructure and are responsible for maintaining it. Private cloud infrastructures are more expensive to operate as they require dedicated hardware to be secured for a single tenant. Customers of the private cloud have more control over their infrastructure, and therefore they can achieve their compliance and security requirements.

Hybrid clouds are composed of a mix of public clouds and private ones.

The GCP is a public multi-tenant cloud platform. You share the servers you use with other customers and let Google handle the support, the data centers, and the infrastructure.

Managed cloud versus unmanaged cloud

The cloud market has also diversified into two large segments—managed cloud versus unmanaged cloud.

In an unmanaged cloud platform, the infrastructure is self-served. In case of failure, it is the responsibility of the customer to have some mechanisms in place to restore the operations. Unmanaged cloud requires the customer to have the qualified expertise and resources to build, manage, and maintain cloud instances and infrastructures. Focused on self-serving applications, unmanaged cloud offers do not include support with their basic tiers.

In a managed cloud platform, the provider will support the underlying infrastructure by offering monitoring, troubleshooting, and around-the-clock customer service. Managed cloud brings along qualified expertise and resources to the team right away. For many companies, having a service provider to handle their public cloud can be easier and more cost-effective than hiring their own staff to operate their clouds.

The GCP is a public, multi-tenant, and unmanaged cloud service. So are AWS and Azure. Rackspace, on the other hand, is an example of a managed cloud service company. As an example, Rackspace just started offering managed services for GCP in March 2017.

IaaS versus PaaS versus SaaS

Another important distinction is to be made with respect to the amount of work done by the user or by the cloud platform provider. Let us take a look at this distinction with the help of the following service levels:

  • Infrastructure as a Service (IaaS): At the minimum level, IaaS, the cloud provider handles the machines, their virtualization, and the required networking. The user is responsible for everything else: OS, middleware, data, and application software. The provider is the host of the resources on which the user builds the infrastructure. Google Compute Engine, SQL, DNS, and load balancing are examples of IaaS services within the GCP.
  • Platform as a Service (PaaS): In a PaaS offering, the user is only responsible for the software and the data; everything else is handled by the cloud provider. The provider builds the infrastructure while the user deploys the software. The main advantage of PaaS over IaaS, besides the reduced workload and need for sysadmin resources, is automatic scaling for web applications: the appropriate number of resources is automatically allocated as demand fluctuates. Examples of PaaS services include Heroku and the Google App Engine.
  • Software as a Service (SaaS): In SaaS, the provider is a software company offering services online, while the user consumes the services that are provided. Think Uber, Facebook, or Gmail.

While being mostly an IaaS provider, the GCP also has some PaaS offerings, such as the Google App Engine, and its ML APIs (text, speech, video, and image) can be considered SaaS.

Costs and pricing

Pricing of cloud services is complicated and varies across vendors. The basic cost structure of a cloud service can be broken down into:

  • Computing costs: The duration of running VMs, per number of vCPUs and per GB of RAM
  • Storage costs: Disks, files, and databases, per GB
  • Networking costs: Internal and external, inbound and outbound traffic

Google's preemptible VMs (the equivalent of AWS Spot Instances) are VMs built on leftover, unused capacity and priced three to four times lower than normal on-demand VMs. However, Compute Engine may terminate (preempt) these instances if it requires access to those resources for other tasks. Preemptible instances are adapted to batch processing jobs or workflows that can withstand sudden interruptions; they may also not always be available. In the next chapter, we learn how to launch preemptible instances from the command line.
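As a preview, here is a minimal sketch of such a launch with the gcloud command-line tool; the instance name and zone are illustrative placeholders:

$ gcloud compute instances create batch-worker --zone us-central1-a --preemptible

The --preemptible flag is the only difference from a normal instance creation; everything else about the VM is configured as usual.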

Google Cloud also recently introduced price reductions for committed use: you get a discount when you reserve instances for a long period of time, typically committing to a usage term of one or three years.

The argument of cost cutting when moving to the cloud holds when your infrastructure is evolving quickly and requires scalability and rapid modifications. If your applications are very static with stable load, the cloud may not result in lower costs. In the end, as the cloud offers much more flexibility and opens the way to implementing new projects quickly, the overall cost is higher than with a fixed infrastructure. But this flexibility is the true benefit of cloud computing.

See https://cloud.google.com/compute/pricing for the current Google Compute Engine pricing.

Price war: The costs of cloud services have dwindled in the past several years. The major public cloud actors have gone through successive phases of price reductions since 2012, when AWS drastically reduced its storage prices to undermine the competition. The four main cloud providers reduced their prices 22 times in 2012 and 26 times in 2013, with reductions ranging from 6% to 30% and touching all types of services: computing, storage, bandwidth, and databases. As of January 2014, Amazon had reduced the prices of its offerings over 40 times, and these reductions have been matched or exceeded by the other main cloud service providers. Recently, the three main actors have further reduced their prices on storage, possibly reigniting the price war. According to a recent study of cloud computing prices by 451 Research, there isn't much data suggesting that cloud is anywhere near being a commodity yet; the firm further predicts that relational databases are likely to be the next price-war battleground.

ML

So, near-instant availability, low cost, flexible architecture, and near-unlimited resources are the advantages of cloud computing, at the expense of extra overhead and recurring costs.

In the global landscape of cloud computing, the GCP is a public, unmanaged IaaS cloud offering with some PaaS and SaaS services. Although AWS and GCP are directly comparable for standard cloud services such as computing (EC2, Compute Engine, and so on), databases (Redshift, BigQuery, and so on), networking, and so forth, the Google Cloud approach to ML is quite different from Amazon's or Azure's.

In short, AWS offers either all-in-one services for very specific applications (face recognition and Alexa-related applications) or a predictive analytics platform based on classic (not deep learning) models, called Amazon ML. Microsoft's offer is more PaaS-centered, with its Cortana Intelligence Suite. Microsoft's ML service is quite similar to AWS's, with more available models.

The GCP ML offer is based on TensorFlow, Google's deep learning library. Google offers a wide range of ML APIs based on pre-trained TensorFlow models for NLP, speech-to-text, translation, and image and video processing. It also offers a platform where you can train your own TensorFlow models and evaluate them (with TensorBoard).

Introducing the GCP

The first cloud computing service dates back 15 years, to July 2002, when Amazon launched the AWS platform to expose technology and product data from Amazon and its affiliates, enabling developers to build innovative and entrepreneurial applications on their own. In 2006, AWS was relaunched with the EC2 service.

The early start of AWS gave Amazon a lead in cloud computing, one that has never faltered since. Competitors were slow to counteract and launch their own offers. The first alternative to the AWS cloud services from a major company came with the Google App Engine launched in April 2008 as a PaaS service for developing and hosting web applications. The GCP was thus born. Microsoft and IBM followed, with the Windows Azure platform launched in February 2010 and LotusLive in January 2009.

Google didn't enter the IaaS market until much later: in 2013, Google released Compute Engine to the general public, with enterprise service-level agreements (SLAs).

Mapping the GCP

With over 40 different IaaS, PaaS, and SaaS services, the GCP ecosystem is rich and complex. These services can be grouped into six different categories:

  • Hosting and computation
  • Storage and databases
  • Networking
  • ML
  • Identity and security
  • Resource management and monitoring

In the following section, we learn how to set up and manage a single VM instance on Google Compute Engine. But, before that, we need to create our account.

Getting started with GCP

Getting started on the GCP is pretty straightforward. All you really need is a Google account. Go to https://cloud.google.com/, log in with your Google account, and follow the instructions. Add your billing information as needed. This gives you access to the web-based UI of the GCP. We'll cover command-line and shell accessibility, and the related SSH key creation, in the next chapter.

Free trials: At the time of writing, Google has a pretty generous free trial offer, with a 12-month period and a credit of $300 for new accounts. There are, however, limitations on some services. For instance, you cannot launch Google Compute Engine VM instances with more than eight CPUs, and you are limited in the number of projects you can create, though you can request more than your allocated quota. There is no SLA, and using Google Cloud services for activities such as bitcoin mining is not allowed. Once you upgrade your account, these limitations no longer apply, and the money left of the initial $300 is credited to your account. More information on the free trial offer is available at https://cloud.google.com/free/docs/frequently-asked-questions.

Project-based organization

One key aspect of the GCP is its project-centered organization. All billing, permissions, resources, and settings are grouped within a user-defined project, which basically acts as a global namespace. It is simply not possible to launch a resource without first specifying the project it belongs to.

Each one of these projects has:

  • A project name, which you choose.
  • A project ID, suggested by GCP but editable. The project ID is used by API calls and resources within the project.
  • A project number, which is provided by the GCP.

Both the project ID and project number are unique across all GCP projects. The project organization has several straightforward benefits:

  • As resources are dedicated to a single project, budget allocation and billing are simplified
  • As the resources allocated to a project are subject to the same regions-and-zones rules and share the same metadata, operations and communications between them work seamlessly
  • Similarly, access management is coherent across a single project, limiting the overall complexity of access control

Project-based organization greatly simplifies the management of your resources and is a key aspect of what makes the GCP quite easy to work with.

Creating your first project

To create a new project:

1. Go to the resource management page, https://console.cloud.google.com/cloud-resource-manager.
2. Click on CREATE PROJECT.
3. Write down your project title and notice how Google generates a project ID on the fly. Edit it as needed.
4. Click on Create.

You are redirected to the Role section of the IAM service.
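The same result can be obtained with the command-line tools covered in the next chapter. A minimal sketch, assuming a hypothetical, globally unique project ID:

$ gcloud projects create packt-gcp-demo --name "Packt GCP demo"
$ gcloud config set project packt-gcp-demo

The second command makes the new project the default for all subsequent gcloud commands.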

Roles and permissions

By default, when you create a new project, your Google account is set as the owner of the project with full permissions and access across all the project's resources and billing. In the roles section of the IAM page, https://console.cloud.google.com/iam-admin/roles/, you can add people to your project and define the role for that person. You can also create new custom roles on a service-by-service basis or allocate predefined roles organized by the services.

Go to the IAM page, https://console.cloud.google.com/iam-admin/iam/project, and select the project you just created if it's not already selected. You should see your Google account email as the owner of the project.

To add a new person to the project:

1. Click on + ADD.
2. Input the person's Google account email (it has to correspond to an active Google account).
3. Select all the roles for that person, as shown in the following screenshot:

The role menu is organized by services and administrative domain (billing, logging, and monitoring), and for each service, by level of access. Although this differs depending on the service, you can roughly choose between four types of roles:

  • Admin: Full control over the resources
  • Client: Connectivity access
  • Editor/creator: Full control except for user management, SSL certificates, and deleting instances
  • Viewer: Read-only access

You can also create new custom roles from the roles IAM page at https://console.cloud.google.com/iam-admin/roles/project?project=packt-gcp.

As you allocate new resources to your project, the platform creates the adequate and required roles and permissions between the services. You can view and manage these access permissions and associated roles from the info panel on the right of the manage resources page or the IAM page for the given project. Google does a great job of generating the right access levels, which makes the platform user's life easier.
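Role bindings can also be granted from the command line. A minimal sketch, assuming a hypothetical collaborator address and our example project ID:

$ gcloud projects add-iam-policy-binding packt-gcp --member user:collaborator@gmail.com --role roles/viewer

Here, roles/viewer grants read-only access; swap in another predefined role, such as roles/editor, as needed.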

Our Google Cloud project: For this book, I created the packt-gcp project. Since the name was unique across all other GCP projects, the project ID is also packt-gcp. All the resources are created in the us-central1 zone.

Further reading

Throughout the book, I will conclude each chapter with a list of online resources that recap or go beyond what was discussed in the chapter:

  • Many excellent articles on using the GCP for big data can be found on the Google big data blog at https://cloud.google.com/blog/big-data/.
  • What are the GCP services? Reto Meier, a software engineer at Google, describes the different Google Cloud services in a simple way (see https://hackernoon.com/what-are-the-google-cloud-platform-gcp-services-285f1988957a). This is very useful for grasping the diversity of the GCP services.
  • An Annotated History of Google's Cloud Platform is another post by Reto Meier, on the history of the GCP: https://medium.com/@retomeier/an-annotated-history-of-googles-cloud-platform-90b90f948920. It starts with the bullet point "Pre 2008 — Computers invented. Google Founded...". A much more detailed timeline of the GCP is available on Crunchbase at https://www.crunchbase.com/organization/google-cloud-platform/timeline#/timeline/index.
  • The evolution of computing power, also known as Moore's law, is charted at http://www.cs.columbia.edu/~sedwards/classes/2012/3827-spring/advanced-arch-2011.pdf, and a more recent version, where the seven most recent data points are all NVIDIA GPUs, is available at https://en.wikipedia.org/wiki/Moore%27s_law#/media/File:Moore%27s_Law_over_120_Years.png.
  • For more on the pricing war between the three main cloud platforms, see Cloud Pricing Trends: Get the White Paper, RightScale, 2013, at https://www.rightscale.com/lp/cloud-pricing-trends-white-paper.
  • A good article on Supercomputing vs. Cloud Computing by David Stepania can be found at https://www.linkedin.com/pulse/supercomputing-vs-cloud-computing-david-stepania/.

Summary

In this introductory chapter, we looked at the nature of the GCP and explored its services architecture. We created a new project and learned about role creation and allocation. Although a relatively new entrant in the cloud computing market, the GCP offers a complete set of services for a wide range of applications. We study these services in depth in the rest of this book.

We are now ready to get started with data science on the Google platform. In the next chapter, we'll create a VM instance on Google Compute Engine and install a data science Python stack with the Anaconda distribution. We'll explore the web UI and learn how to manage instances through the command line and the Google Shell.

Google Compute Engine

The core service of Google Cloud Platform (GCP) is Google Compute Engine (GCE). GCE allows you to spin up virtual machines (VMs) with the right operating system, size, RAM, and appropriate number of CPUs or GPUs for your needs. It is the equivalent of AWS EC2. With GCE, we dive into the core of GCP.

In this chapter, you will learn how to:

  • Create VM instances on GCE that are adapted to your projects
  • Use Google's command-line tools to manage your VMs
  • Set up a Python data science stack on a GCE VM with conda and scikit-learn
  • Access your VM via a password-protected Jupyter Notebook

We'll also cover more advanced topics related to images, snapshots, preemptible VMs, startup scripts, and IPs.

By the end of this chapter, you will be able to create and fully manage your VM both via the online console and the command-line tools, as well as implement a data science workflow and a Jupyter Notebook workspace.

Google Compute Engine

Simply put, GCE is a service that lets you create and run VMs on Google infrastructure. GCE allows you to spin up VMs with the right operating system, size, RAM, and the appropriate number of CPUs or GPUs for your needs. It is the equivalent of AWS EC2.

The GCE was announced on June 28, 2012, at Google I/O 2012 and made available to the general public on May 15, 2013. Compared to AWS EC2, an equivalent product, the GCE is a rather new service.

The following extracts from the release notes timeline illustrate the rapid evolution of the GCE service from a simple contender to a fully fledged player in the Cloud computing domain:

  • May 15, 2013: GCE is available for everyone.
  • August 6, 2013: GCE launches load balancing.
  • December 3, 2013: GCE is announced as being production ready. Users can now feel confident using Compute Engine to support mission-critical workloads with 24/7 support and a 99.95% monthly SLA.
  • June 25, 2014: Solid-State Drive (SSD) persistent disks are now generally available and open to all users and projects.
  • September 08, 2015: Preemptible instances are now generally available to all users and projects.
  • March 30, 2016: Persistent disks larger than 10 TB are generally available.
  • July 1, 2016: Shutdown scripts are now generally available to use with Compute Engine instances.
  • September 21, 2017: NVIDIA Tesla K80 GPUs are now generally available.
  • September 26, 2017: Billing increments for GCE VM instances are reduced from per-minute increments to per-second increments.

The most recent news at the time of writing is the beta launch of staggering 96-vCPU machine types.

In the past four years, Google has been steadily improving and developing its GCE offer at a rapid pace by:

  • Expanding regions
  • Adding more powerful machines and Intel CPU platforms
  • Adding roles and features
  • Steadily releasing new public images for Windows, Suse, CentOS, Debian, Ubuntu, RHEL, and CoreOS

As the timeline illustrates, the GCE service is a young and dynamic service that embraces the evolution of its customers' needs and anticipates them with bold new offers. It reflects Google's drive to become a leader in the cloud computing business and potentially offset Amazon's lead in cloud computing.

Before we launch our first GCE VM, let's cover a few important concepts.

VMs, disks, images, and snapshots

A VM is an on-demand virtual server that you spin up for your needs. It is geographically located in one of Google's data centers, but you only choose the region and zone, not the precise location. Although you share some of the infrastructure resources with other users, this sharing is transparent to you.

A VM requires a persistent disk to run on and an operating system, such as a Windows or Linux distribution, to boot from. Although very much abstracted in a cloud computing context, a GCE disk corresponds to the physical drive that the computer boots from.

An image exists on top of a persistent disk, and includes the operating system necessary to launch the instance. A typical use of an image is to enable sharing a VM setup across many different VMs. An image consists of an operating system and boot loader and can be used to boot an instance.

A snapshot is a reflection of the content of a VM at a given time. A snapshot is mostly used for instant backups. Snapshots are stored as diffs, relative to the previous one, while images are not.

Images and snapshots are quite similar. It's possible to activate an instance using a snapshot or an image.

When you launch a new instance, GCE starts by attaching a persistent disk to your VM. This provides the disk space and gives the instance the root filesystem it needs to boot up. The disk uses the image you have chosen and installs the OS associated with that image. Public images are provided by Google with specific OS while private images are your own images.

By taking snapshots of an image, you can copy data from existing persistent disks to new persistent disks. Snapshots are meant for creating instant backups.
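Both operations can also be performed from the command line with the gcloud tool, introduced later in this chapter. A minimal sketch, assuming a hypothetical boot disk named sparrow in the us-central1-a zone:

$ gcloud compute disks snapshot sparrow --snapshot-names sparrow-backup --zone us-central1-a
$ gcloud compute images create sparrow-image --source-disk sparrow --source-disk-zone us-central1-a

The first command takes an instant backup of the disk; the second turns the disk into a reusable image from which new instances can boot.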


Creating a VM

Let's now create our first VM instance using the web console.

Go to the GCE console, https://console.cloud.google.com/. Select the project we created in the previous chapter (or create one if you don't have one yet), and in the menu on the left, click on Compute Engine. Since you don't have a VM yet, you are greeted by the following message. Click on Create as shown in the following screenshot:

For this first VM, we will choose a small one and resize it as we go along.

There are several things you need to decide on at this point:

  • The name of your instance: I will call mine sparrow. This name does not have to be unique across GCP, so feel free to name yours as you like.
  • The region and the zone: It's often better to choose the zone closest to you to reduce latency. However, GCP services often open in the US first and become available only after a while in other parts of the world. Different zones may also have different rules and regulations; for instance, Europe offers stronger data-related privacy laws than the US. Choose the zone as you see fit; it will always be possible to change it later.
  • The machine type: Selecting the right machine type is important. At the time of writing this book, the machines are grouped into the categories small, standard, high-memory, and high-CPU:
    • Small: Shared CPUs and limited RAM
    • Standard VMs: 3.75 GB of RAM
    • High-memory VMs: 13 GB of RAM
    • High-CPU VMs: 1.8 GB of RAM

The small category is perfect to get started with and to build some hands-on experience with the platform. For more intense projects, you may want more computational power or more memory.

Note that free-trial accounts are limited to eight CPUs.

It is also possible to customize the machine you need by setting the number of CPUs or memory per CPU you want. This is also where you choose the number of GPUs to have on your machine, as shown in the following screenshot:

Finally, you need to choose the OS for your VM. The Debian Linux distribution is offered by default. You have a choice among several OSes: Windows, CentOS, Suse, CoreOS, and Ubuntu. Although Ubuntu is often the most popular choice, there is actually little difference between Debian and Ubuntu, and we will go with the default Debian distribution. If you're more familiar with the Ubuntu distribution, go for it; it should not cause any problems in this chapter.

Ubuntu or Debian? Debian is one of the first Linux distributions, with its first stable release in 1996. Ubuntu started in 2004 as a fork, a branched-out version of Debian. The two distributions are very similar, with Ubuntu being more user-friendly and having a better desktop/UI experience. Debian is usually preferred for servers; it has a massive package library and a strong focus on stability and open-licensed software. A stable version of Debian is released approximately every two years, while the Ubuntu release cycle is six months. Ubuntu takes the unstable branch of Debian, makes customizations especially in terms of the UI, and releases it. For our work, there should be close to no difference between the two distributions, and we will use Debian for our VMs.

Leave the rest of the parameters at their default values. We will come back to HTTPS traffic, disks, networking, and SSH keys in a few pages.

One very useful feature of the web console that lowers the learning curve to mastering the GCP is the pair of links at the bottom of the VM creation page, Equivalent REST or command line, as shown in the following image. The command line link exists on multiple pages of the web console. It is a very useful feature for quickly learning the right syntax and parameters of the GCP command-line tools.
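For reference, the equivalent command for a small Debian VM like ours looks roughly as follows; the zone, machine type, and image family here are illustrative assumptions rather than the exact output of that link:

$ gcloud compute instances create sparrow --zone us-central1-a --machine-type f1-micro --image-family debian-9 --image-project debian-cloud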

Our VM is now created, up and running!

Now that we have a brand new shiny VM, how do we access it? That nicely leads us to the Google Shell.

Google Shell

The Google Shell is Google's smart way of giving you a standalone terminal in your browser to access and manage your resources.

You activate the Google Shell by clicking on the >_ icon in the upper right part of the console page:

The browser window splits into half and the lower part is now a shell terminal:

This terminal runs on an f1-micro GCE VM with a Debian operating system. It is created on a per-user, per-session basis. It persists while your Cloud Shell session is active and is deleted after 20 minutes of inactivity. The instance runs on a persistent disk with 5 GB of storage. Both the disk and the image are available at no extra cost. Although the instance is not persistent across sessions, its associated disk is. Everything you create via the Google Shell will be available as you left it at the beginning of your next session, including all the files you store, the software you install, and the configuration files you edit (.bashrc and .vimrc, for instance). This disk is private and cannot be accessed by other users. Finally, the Google Shell instance comes pre-installed with the Google Cloud SDK and other popular developer tools, such as Vim.

Some of the commands you run via the web console will be memorized in your Google Shell VM. For instance, the SQL queries you run on a Google SQL instance, will show up in a .mysql_history file in your user's $HOME folder. More info on the Google Shell can be found in the README-cloudshell.txt in your $HOME folder.

From the Google Shell, you can access and manage all your resources and files. For example, let's list all our existing instances by typing:

$ gcloud compute instances list

We see our newly created sparrow instance:

To access the VM you just created, type in:

$ gcloud compute ssh sparrow

This will run through the creation of the necessary SSH keys. You are now no longer on Google's Cloud Shell VM instance but on the sparrow VM. To check which OS and version we're running on the sparrow instance, we run:

$ lsb_release -d

On the sparrow machine, I have Debian GNU/Linux 9 (stretch), while on the Google Shell VM it's Debian GNU/Linux 8 (jessie). This tells me that the Google Shell is not yet on the most recent version of the Debian distribution. You may, of course, see different results.
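File transfer to and from the VM works in a similar way. A minimal sketch, assuming a local data.csv file and a reasonably recent version of the Cloud SDK that includes the scp command:

$ gcloud compute scp ./data.csv sparrow:~/data.csv

This copies the local file into the home directory on the sparrow instance; reversing the two arguments downloads instead.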

Google Cloud Platform SDK

GCP offers several standalone command-line interfaces (CLIs) to manage and interact with your GCP resources, gcloud being the main one. All secondary command-line tools are installed via gcloud. At the time of writing, the command-line tools are:

gcloud: The main CLI to manage your GCP resources and projects: authentication, local configuration, developer workflow, and interactions with the GCP APIs. The following services can be handled via the gcloud