Journey to Become a Google Cloud Machine Learning Engineer - Dr. Logan Song - E-Book


Dr. Logan Song

Description

This book aims to provide a study guide to learn and master machine learning in Google Cloud: to build a broad and strong knowledge base, train hands-on skills, and get certified as a Google Cloud Machine Learning Engineer.

The book is for readers who have basic Google Cloud Platform (GCP) knowledge and skills, plus basic Python programming skills, and who want to learn machine learning in GCP as the next step toward becoming a Google Cloud Certified Machine Learning professional.

The book starts by laying the foundations of Google Cloud Platform and Python programming, followed by the building blocks of machine learning, then focuses on machine learning in Google Cloud, and finally prepares you for the Google Cloud Machine Learning certification by integrating all the knowledge and skills together.

The book is based on the graduate courses the author has been teaching at the University of Texas at Dallas. As you go through the chapters, you are expected to study the concepts, complete the exercises, understand and practice the labs in the appendices, and study each exam question thoroughly. At the end of the learning journey, you can expect to harvest the knowledge, the skills, and a certificate.

You can read this e-book in the Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 303

Publication year: 2022




Journey to Become a Google Cloud Machine Learning Engineer

Build the mind and hand of a Google Certified ML professional

Dr. Logan Song

BIRMINGHAM—MUMBAI

Journey to Become a Google Cloud Machine Learning Engineer

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Dhruv Jagdish Kataria

Content Development Editor: Sean Lobo

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Pratik Shirodkar

Production Designer: Prashant Ghare

Marketing Coordinators: Shifa Ansari and Abeer Riyaz Dawe

First published: September 2022

Production reference: 1300822

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80323-372-7

www.packt.com

To Grandpa and Grandma, your love made this book possible.

To Dad and Mom, your DNA is in the book.

To Peiying, Peihua, Peixing, and Hengping, your encouragement is visible on each page of the book.

To Nancy, Neil, and Nicole, you are the driving force of the book.

To Tracey, thank you for allowing me to spend so much time on machine learning in the cloud.

- Logan Song

Contributors

About the author

Dr. Logan Song is the enterprise cloud director and chief cloud architect at Dito (www.ditoweb.com). With 25+ years of professional experience, Dr. Song is highly skilled in enterprise information technologies, specializing in cloud computing and machine learning. He is a Google Cloud-certified professional solution architect and machine learning engineer, an AWS-certified professional solution architect and machine learning specialist, and a Microsoft-certified Azure solution architect expert. Dr. Song holds a Ph.D. in industrial engineering, an MS in computer science, and an ME in management engineering. Currently, he is also an adjunct professor at the University of Texas at Dallas, teaching cloud computing and machine learning courses.

From Thanksgiving of 2021 to August of 2022, it took nine months to complete this book. I want to thank God for the amazing grace that made this book possible.

This book would not have been possible without the great support and collaboration from the Packt team: Sean Lobo, Prashant Ghare, Aparna Nair, Dhruv J. Kataria, the technical reviewer, the copy editors, and the whole Packt production team. It’s been such a great pleasure working with the team!

I also want to thank my friend Farukh Khalilov, and the graduate student assistants at the University of Texas at Dallas, for their tremendous help in developing and verifying the Google Cloud practices and Python labs in this book. My gratitude is beyond words.

About the reviewer

Vijender Singh is a certified multi-cloud expert with 5+ years of experience, currently working with the Amazon Alexa AI team on the effective use of AI in Alexa. He completed an MSc with Distinction at Liverpool John Moores University, with research work on keyphrase extraction. He holds the GCP Professional Machine Learning Engineer certification, five Azure certifications, two AWS certifications, and the TensorFlow certification. Vijender mentors and teaches colleagues machine learning and TensorFlow, a fundamental tool for the ML journey. He believes in working toward a better tomorrow.

Before starting the Google Cloud Machine Learning journey, you need to ask yourself a serious question: am I committed to taking this path?

In the summer of 2015, I was facing the same question: do I really want to get out of my comfort zone and pursue a new career called “cloud computing and machine learning”? At that time, I had been working in the traditional IT industry for over 20 years and was very comfortable with my professional life. Starting a new journey meant that I would have to learn from scratch!

For the whole summer, I was thinking about this question, along with another fundamental question: what do I really want to do in my life?

And one day, I came across Steve Jobs’ famous commencement speech at Stanford University in 2005 (https://news.stanford.edu/2005/06/14/jobs-061505/), and suddenly I heard a voice: “Stay hungry, stay foolish!”

At that moment, I made up my mind.

Today, I am so thankful to God for what happened in that summer and on that day!

Now, if you are determined as I was in 2015, let’s march on our journey, together.

- Logan Song

Table of Contents

Preface

Part 1: Starting with GCP and Python

1

Comprehending Google Cloud Services

Understanding the GCP global infrastructure

Getting started with GCP

Creating a free-tier GCP account

Provisioning our first computer in Google Cloud

Provisioning our first storage in Google Cloud

Managing resources using GCP Cloud Shell

GCP networking – virtual private clouds

GCP organization structure

The GCP resource hierarchy

GCP projects

GCP Identity and Access Management

Authentication

Authorization

Auditing or accounting

Service account

GCP compute services

GCE virtual machines

Load balancers and managed instance groups

Containers and Google Kubernetes Engine

GCP Cloud Run

GCP Cloud Functions

GCP storage and database service spectrum

GCP storage

Google Cloud SQL

Google Cloud Spanner

Cloud Firestore

Google Cloud Bigtable

GCP big data and analytics services

Google Cloud Dataproc

Google Cloud Dataflow

Google Cloud BigQuery

Google Cloud Pub/Sub

GCP artificial intelligence services

Google Vertex AI

Google Cloud ML APIs

Summary

Further reading

2

Mastering Python Programming

Technical requirements

The basics of Python

Basic Python variables and operations

Basic Python data structure

Python conditions and loops

Python functions

Opening and closing files in Python

An interesting problem

Python data libraries and packages

NumPy

Pandas

Matplotlib

Seaborn

Summary

Further reading

Part 2: Introducing Machine Learning

3

Preparing for ML Development

Starting from business requirements

Defining ML problems

Is ML the best solution?

ML problem categories

ML model inputs and outputs

Measuring ML solutions and data readiness

ML model performance measurement

Data readiness

Collecting data

Data engineering

Data sampling and balancing

Numerical value transformation

Categorical value transformation

Missing value handling

Outlier processing

Feature engineering

Feature selection

Feature synthesis

Summary

Further reading

4

Developing and Deploying ML Models

Splitting the dataset

Preparing the platform

Training the model

Linear regression

Binary classification

Support vector machine

Decision tree and random forest

Validating the model

Model validation

Confusion matrix

ROC curve and AUC

More classification metrics

Tuning the model

Overfitting and underfitting

Regularization

Hyperparameter tuning

Testing and deploying the model

Practicing model development with scikit-learn

Summary

Further reading

5

Understanding Neural Networks and Deep Learning

Neural networks and DL

The cost function

The optimizer algorithm

The activation functions

Convolutional Neural Networks

The convolutional layer

The pooling layer

The fully connected layer

Recurrent Neural Networks

Long Short-Term Memory Networks

Generative Adversarial networks

Summary

Further reading

Part 3: Mastering ML in GCP

6

Learning BQ/BQML, TensorFlow, and Keras

GCP BQ

GCP BQML

Introduction to TensorFlow

Understanding the concept of tensors

How tensors flow

Introduction to Keras

Summary

Further reading

7

Exploring Google Cloud Vertex AI

Vertex AI data labeling and datasets

Vertex AI Feature Store

Vertex AI Workbench and notebooks

Vertex AI Training

Vertex AI AutoML

The Vertex AI platform

Vertex AI Models and Predictions

Vertex AI endpoint prediction

Vertex AI batch prediction

Vertex AI Pipelines

Vertex AI Metadata

Vertex AI experiments and TensorBoard

Summary

Further reading

8

Discovering Google Cloud ML API

Google Cloud Sight API

The Cloud Vision API

The Cloud Video API

The Google Cloud Language API

The Google Cloud Conversation API

Summary

Further reading

9

Using Google Cloud ML Best Practices

ML environment setup

ML data storage and processing

ML model training

ML model deployment

ML workflow orchestration

ML model continuous monitoring

Summary

Further reading

Part 4: Accomplishing GCP ML Certification

10

Achieving the GCP ML Certification

GCP ML exam practice questions

Summary

Part 5: Appendices

Appendix 1

Practicing with Basic GCP Services

Practicing using GCP services with the Cloud console

Creating network VPCs using the GCP console

Creating a public VM, vm1, within vpc1/subnet1 using the GCP console

Creating a private VM, vm2, within vpc1/subnet2 using the GCP console

Creating a private VM, vm8, within vpc2/subnet8 using the GCP console

Creating peering between vpc1 and vpc2 using the GCP console

Creating a GCS bucket from the GCP console

Provisioning GCP resources using Google Cloud Shell

Summary

Appendix 2

Practicing Using the Python Data Libraries

NumPy

Generating NumPy arrays

Operating NumPy arrays

Pandas

Series

DataFrames

Missing data handling

GroupBy

Operations

Matplotlib

Seaborn

Summary

Appendix 3

Practicing with Scikit-Learn

Data preparation

Regression

Simple linear regression

Multiple linear regression

Polynomial/non-linear regression

Classification

Summary

Appendix 4

Practicing with Google Vertex AI

Vertex AI – enabling its API

Vertex AI – datasets

Vertex AI – labeling tasks

Vertex AI – training

Vertex AI – predictions (Vertex AI Endpoint)

Deploying the model via Models

Deploying the model via Endpoints

Vertex AI – predictions (Batch Prediction)

Vertex AI – Workbench

Vertex AI – Feature Store

Vertex AI – pipelines and metadata

Vertex AI – model monitoring

Summary

Appendix 5

Practicing with Google Cloud ML API

Google Cloud Vision API

Google Cloud NLP API

Google Cloud Speech-to-Text API

Google Cloud Text-To-Speech API

Google Cloud Translation API

Google Cloud Dialogflow API

Summary

Index

Other Books You May Enjoy

Preface

Since ENIAC, the first programmable digital computer, came into the world in 1946, computers have become so widely used that they are an integral part of our lives. It's impossible to imagine a world without them.

Entering the 21st century, the so-called ABC Triangle stands out in the computer world. Its three vertices represent today's most advanced computer technologies: A for Artificial Intelligence, B for Big Data, and C for Cloud Computing. These technologies are reshaping our world and changing our lives every day.

It is very interesting to look at these advanced computer technologies from a historical point of view, to understand what they are and how they have developed with each other:

Artificial intelligence (AI) is a technology that enables a machine (computer) to simulate human behavior. Machine learning (ML) is a subset of AI that lets a machine automatically learn from past data and make predictions based on it. AI was introduced to the world around 1956, shortly after the invention of ENIAC, but in recent years it has gained momentum because of the accumulation of big data and the development of cloud computing.

Big data refers to the steadily and exponentially increasing amount of data generated and stored over the past years. In 2018, the total amount of data created and consumed worldwide was about 33 zettabytes (1 ZB = 8,000,000,000,000,000,000,000 bits). This number grew to 59 ZB in 2020 and is predicted to reach a mind-boggling 175 ZB by 2025. Processing these big data sets requires huge amounts of computing power. It is inconceivable to process them on commodity computers, not to mention the time it takes for a company to deploy traditional data centers to house such computers. Big data processing calls for new ways to provision computing power.

Cloud computing came into our world in 2006, about half a century after the idea of AI. Cloud computing provides computing power featuring elastic, self-provisioning, on-demand services. In the traditional computing model, infrastructure is conceived as hardware. Hardware solutions are physical: they require space, staff, planning, physical security, and capital expenditure, and thus a long procurement cycle of acquiring, provisioning, and maintaining. The cloud computing model treats infrastructure as software: choose the cloud services that best match your business needs, provision and terminate those resources on demand, scale them up and down elastically and automatically based on demand, deploy the infrastructure as immutable code managed with version control, and pay for what you use.

With the cloud computing model, computing resources are treated as temporary and disposable: they can be used much more quickly, easily, and cost-effectively. The cloud computing model made AI computing feasible.

AI, big data, and cloud computing work with each other and thrive – more data results in more AI/ML applications, more applications demand more cloud computing power, and more applications will generate more data.

Famous for its innovation-led mindset and industry-trend-setting products, Google is a leader in the ABC Triangle technologies. As an ML pioneer, Google developed AlphaGo, the first computer program to defeat a professional human Go world champion. AlphaGo was trained on thousands of human amateur and professional games to learn how to play Go. Its successor, AlphaGo Zero, skips this step and learns by playing against itself: it quickly surpassed the human level of play and defeated AlphaGo by 100 games to 0. In addition to the legendary AlphaGo and AlphaGo Zero, Google has developed numerous ML models and applications in many areas, including vision, voice, and language processing. In the cloud computing arena, Google is one of the biggest cloud service providers in the world. Google Cloud Platform (GCP) provides best-in-class cloud services, especially in the areas of big data and ML. Many companies are keen to use Google Cloud and leverage the GCP ML services for their business use cases. And this is the purpose of our book: to learn about and master the best of the best, ML in Google Cloud.

Who this book is for

This book is for anyone who wants to better understand ML in the cloud, as well as readers who already have a decent grasp of it and want to dive deep to become a professional Google-certified cloud ML engineer.

What this book covers

Chapter 1, Comprehending Google Cloud Services, provides an overview of the GCP services with the practice examples detailed in Appendix 1.

Chapter 2, Learning Python Programming, delves into the Python basic knowledge and programming skills. The Python data science libraries are explored, with practice examples detailed in Appendix 2.

Chapter 3, Preparing for ML Development, covers preparations for the ML process, including ML problem definition and data preparation.

Chapter 4, Developing and Deploying ML Models, dives into the ML process, including platform preparation, dataset splitting, model training, validation, testing, and deployment with the practice examples detailed in Appendix 3.

Chapter 5, Understanding Neural Networks and Deep Learning, introduces the modern AI methods of deep learning with neural network modeling.

Chapter 6, Learning BQML, TensorFlow, and Keras, covers Google’s BigQuery machine learning for structured data, and Google’s ML frameworks, TensorFlow and Keras.

Chapter 7, Exploring Google Cloud Vertex AI, examines Google’s end-to-end ML suite of Vertex AI and its ML services with the practice examples detailed in Appendix 4.

Chapter 8, Discovering Google Cloud ML API, looks at how you can leverage Google’s pre-trained model APIs for ML development with the practice examples detailed in Appendix 5.

Chapter 9, Using Google Cloud ML Best Practices, summarizes the best practices in ML development in Google Cloud.

Chapter 10, Achieving the GCP ML Certification, studies GCP ML certification exam questions by integrating the knowledge and skills learned from previous chapters.

Appendix 1, Practicing with Basic GCP Services, provides examples for provisioning basic GCP services.

Appendix 2, Practicing Using the Python Data Libraries, provides examples of Python data library practices, including NumPy, Pandas, Matplotlib, and Seaborn.

Appendix 3, Practicing with Scikit-Learn, provides examples of scikit-learn library practices.

Appendix 4, Practicing with Google Vertex AI, provides examples for practicing Google Cloud Vertex AI services.

Appendix 5, Practicing with Google Cloud ML API, provides examples for Google Cloud ML API practicing.

To get the most out of this book

The best way to learn machine learning in Google Cloud is twofold: study the chapters to master the basic concepts, and learn by doing all the lab examples in the chapters, especially the labs in the appendices.

While some basic computer technical knowledge is expected to start, being a cloud developer or cloud engineer is not a necessity. You can read this book from beginning to end, or you can jump to the chapters that seem most relevant to you.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Journey-to-a-Google-Cloud-Professional-Machine-Learning-Engineer. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/ugTOg.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.”

A block of code is set as follows:

html, body, #map { height: 100%; margin: 0; padding: 0 }

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

$ mkdir css

$ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Select System info from the Administration panel.”

Tips or Important Notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Journey to Become a Google Cloud Machine Learning Engineer, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Part 1: Starting with GCP and Python

This part provides a general background of Google Cloud Platform (GCP) and the Python programming language. We introduce the concept of cloud computing and GCP, and briefly look at the basic GCP services including compute, storage, networking, database, big data, and machine learning. We go through an overview of the Python language basics, programming structures, and control flows. We then discuss the Python data science libraries, including NumPy, Pandas, Matplotlib, and Seaborn, to better understand their functions and use cases.

This part comprises the following chapters:

Chapter 1, Comprehending Google Cloud Services

Chapter 2, Learning Python Programming

1

Comprehending Google Cloud Services

In Part 1 of this book, we will be building a foundation by focusing on Google Cloud and Python, the essential platform and tool for our learning journey, respectively.

In this chapter, we will dive into Google Cloud Platform (GCP) and discuss the Google Cloud services that are closely related to Google Cloud Machine Learning. Mastering these services will provide us with a solid background.

The following topics will be covered in this chapter:

Understanding the GCP global infrastructure
Getting started with GCP
GCP organization structure
GCP Identity and Access Management
GCP compute services
GCP storage and database services
GCP big data and analytics services
GCP artificial intelligence services

Let’s get started.

Understanding the GCP global infrastructure

Google is one of the biggest cloud service providers in the world. With the physical computing infrastructures such as computers, hard disk drives, routers, and switches in Google’s worldwide data centers, which are connected by Google’s global backbone network, Google provides a full spectrum of cloud services in GCP, including compute, network, database, security, and advanced services such as big data, machine learning (ML), and many, many more.

Within Google’s global cloud infrastructure, there are many data center groups. Each data center group is called a GCP region. These regions are located worldwide: in Asia, Australia, Europe, North America, and South America. They are connected by Google’s global backbone network for performance optimization and resiliency. Each GCP region is a collection of zones that are isolated from each other. Each zone has one or more data centers and is identified by a name that combines a letter identifier with the region’s name. For example, the zone us-central1-a is a zone in the us-central1 region, which is physically located in Council Bluffs, Iowa, in the United States of America. The GCP global infrastructure also includes many edge locations, or points of presence (POPs), where Google’s global network connects to the internet. More details about GCP regions, zones, and edge locations can be found at https://cloud.google.com/about/locations.
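The zone naming scheme is mechanical: a zone name is its region name plus a trailing letter, so the region can always be recovered from the zone. A minimal sketch (the zone names are illustrative):

```python
def region_of(zone: str) -> str:
    """Return the region of a GCP zone name, e.g. "us-central1-a" -> "us-central1"."""
    # A zone name is "<region>-<letter>", so drop the last dash-separated part.
    return zone.rsplit("-", 1)[0]

print(region_of("us-central1-a"))      # us-central1
print(region_of("asia-northeast1-b"))  # asia-northeast1
```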

GCP provides on-demand cloud resources at a global scale. These resources can be used together to build solutions that help meet business goals and satisfy technology requirements. For example, if a company needs 1,000 TB of storage in Tokyo, its IT professional can log into the company’s GCP account console and provision the storage in the asia-northeast1 region at any time. Similarly, a 3,000 TB database can be provisioned in Sydney and a 4,000-node cluster in Frankfurt at any time, with just a few clicks. And finally, if a company wants to set up a global website, such as zeebestbuy.com, with the lowest latencies for its global users, it can build three web servers in the London, Virginia, and Singapore regions and utilize Google’s global DNS service to distribute the web traffic across these three web servers. Depending on the location of the user’s web browser, DNS will route the traffic to the nearest web server.
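The latency-based routing idea above boils down to picking the region with the lowest round-trip time for each user. A minimal sketch, assuming hypothetical latency measurements (the region names match the London, Virginia, and Singapore example; the millisecond figures are made up for illustration):

```python
def nearest_region(latency_ms: dict) -> str:
    """Return the region with the lowest measured round-trip time."""
    return min(latency_ms, key=latency_ms.get)

# Hypothetical RTTs from a user in Europe to the three web server regions.
rtt = {"europe-west2": 14.0, "us-east4": 92.0, "asia-southeast1": 181.0}
print(nearest_region(rtt))  # europe-west2
```

A real DNS-based load balancer makes this decision from the resolver's location rather than live per-user measurements, but the selection logic is the same shape.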

Getting started with GCP

Now that we have learned about Google’s global cloud infrastructure and the on-demand resource provisioning concept of cloud computing, we can’t wait to dive into Google Cloud and provision resources in the cloud!

In this section, we will build cloud resources by doing the following:

Creating a free-tier GCP account
Provisioning a virtual computer instance in Google Cloud
Provisioning our first storage in Google Cloud

Let’s go through each of these steps in detail.

Creating a free-tier GCP account

Google provides a free-tier account type for us to get started on GCP. More details can be found at https://cloud.google.com/free/docs/gcp-free-tier.

Once you have signed up for a GCP free-tier account, it’s time to plan our first resources in Google Cloud – a computer and a storage folder in the cloud. We will provision them as needed. How exciting!

Provisioning our first computer in Google Cloud

We will start with the simplest idea: provisioning a computer in the cloud. Think about a home computer for a moment. It has a Central Processing Unit (CPU), Random Access Memory (RAM), hard disk drives (HDDs), and a network interface card (NIC) for connecting to the relevant Internet Service Provider (ISP) equipment (such as cable modems and routers). It also has an operating system (Windows or Linux), and it may have a database such as MySQL for some family data management, or Microsoft Office for home office usage.

To provision a computer in Google Cloud, we will need to do the same planning for its hardware, such as the number of CPUs, RAM, and the size of HDDs, as well as for its software, such as the operating system (Linux or Windows) and database (MySQL). We may also need to plan the network for the computer, such as an external IP address, and whether the IP address needs to be static or dynamic. For example, if we plan to provision a web server, then our computer will need a static external IP address. And from a security point of view, we will need to set up the network firewalls so that only specific computers at home or work may access our computer in the cloud.

GCP offers a cloud service for consumers to provision a computer in the cloud: Google Compute Engine (GCE). With the GCE service, we can build flexible, self-managed virtual machines (VMs) in the Google Cloud. GCE offers different hardware and software options based on consumers’ needs, so you can use customized VM types and select the appropriate operating system for the VM instances.

Following the instructions at https://cloud.google.com/compute/docs/instances/create-start-instance, you can create a VM in GCP. Let’s pause here and go to the GCP console to provision our first computer.
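Provisioning can also be scripted. As a sketch, the helper below assembles (but does not run) the equivalent gcloud command line; the zone, machine type, and image values are illustrative defaults chosen for this example, not recommendations:

```python
import shlex

def gce_create_cmd(name, zone="us-central1-a", machine_type="e2-micro",
                   image_family="debian-12", image_project="debian-cloud"):
    """Build the gcloud command that would create a small VM."""
    return shlex.join([
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--image-family={image_family}",
        f"--image-project={image_project}",
    ])

print(gce_create_cmd("my-first-vm"))
```

Running the printed command in Cloud Shell (with a project selected and billing enabled) creates the VM.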

How do we access the computer? If the VM has a Windows operating system, you can use Remote Desktop to access it. For a Linux VM, you can use Secure Shell (SSH) to log in. More details are available at https://cloud.google.com/compute.

Provisioning our first storage in Google Cloud

When we open the computer case and look inside our home computer, we can see its hardware components: its CPU, RAM, HDD, and NIC. The hard disks within a PC are limited in size and performance. EMC, a company founded in 1979 by Richard Egan and Roger Marino, moved PC hard disks outside the PC case to a separate computer network storage platform called Symmetrix in 1990. Symmetrix has its own CPU and RAM, provides huge storage capacities, and is connected to the computer through fiber cables, serving as the computer's storage array. On the other hand, SanDisk, founded in 1988 by Eli Harari, Sanjay Mehrotra, and Jack Yuan, produced the first Flash-based solid-state drive (SSD) in a 2.5-inch hard drive, called Cruzer, in 2000. Cruzer provides portable storage via a USB connection to a computer. With Symmetrix and Cruzer, EMC and SanDisk each extended the hard disk concept out of the box, literally and figuratively. These are great examples of start-up ideas!

And then comes the great idea of cloud computing – the concept of storage is further extended to cloud-block storage, cloud network-attached storage (NAS), and cloud object storage. Let’s look at these in more detail:

Cloud block storage is a form of software-based storage that can be attached to a VM in the cloud, just like a hard disk is attached to our PC at home. In Google Cloud, cloud block storage is called persistent disks (PD). Instead of buying a physical hard disk and installing it on the PC to use it, PDs can be created instantly and attached to a VM in the cloud, with only a couple of clicks.

Cloud network-attached storage (cloud NAS) is a form of software-based storage that can be shared among many cloud VMs through a virtual cloud network. In GCP, cloud NAS is called Filestore. Instead of buying a physical file server, installing it on a network, and sharing it with multiple PCs at home, a Filestore instance can be created instantly and shared by many cloud VMs, with only a couple of clicks.

Cloud object storage is a form of software-based storage that can be used to store objects (files, images, and so on) in the cloud. In GCP, cloud object storage is called Google Cloud Storage (GCS). Different from PD, which is a cloud block storage type that's used by a VM (it can be shared in read-only mode among multiple VMs), and Filestore, which is a cloud NAS type shared by many VMs, GCS is a cloud object type used for storing immutable objects. Objects are stored in GCS buckets. In GCP, bucket creation and deletion, as well as object uploading, downloading, and deletion, can all be done from the GCP console, with just a couple of clicks!

GCS provides different storage classes based on the object accessing patterns. More details can be found at https://cloud.google.com/storage.

Following the instructions at https://cloud.google.com/storage/docs/creating-buckets, you can create a storage folder/bucket and upload objects into it. Let’s pause here and go to the GCP console to provision our first storage bucket and upload some objects into it.

Managing resources using GCP Cloud Shell

So far, we have discussed provisioning VMs and buckets/objects in the cloud from the GCP console. There is another tool that can help us create, manage, and delete resources: GCP Cloud Shell. Cloud Shell is a command-line interface that can easily be accessed from your browser. After you click the Cloud Shell button in the GCP console, you will get a Cloud Shell session – a command-line user interface running on a VM, inside your web browser, with all the cloud resource management commands already installed.

The following tools are provided by Google for customers to create and manage cloud resources using the command line:

The gcloud tool is the main command-line interface for GCP products and services such as GCE.
The gsutil tool is for GCS services.
The bq tool is for BigQuery services.
The kubectl tool is for Kubernetes services.

Please refer to https://cloud.google.com/shell/docs/using-cloudshell-command for more information about GCP Cloud Shell and commands, as well as how to create a VM and a storage bucket using Cloud Shell commands.
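As a minimal sketch of the Cloud Shell workflow described above – the project ID, zone, VM name, and bucket name below are placeholders you would replace with your own (GCS bucket names must be globally unique):

```shell
# Sketch only – "my-project", "my-vm", and the bucket name are placeholders.
# Run these from Cloud Shell, where gcloud, gsutil, and bq are preinstalled.

# Set the default project for subsequent commands
gcloud config set project my-project

# Create a VM (GCE) with the gcloud tool
gcloud compute instances create my-vm \
    --zone=us-central1-a \
    --machine-type=e2-micro

# Create a GCS bucket and upload an object with the gsutil tool
gsutil mb -l us-central1 gs://my-unique-bucket-name/
gsutil cp ./hello.txt gs://my-unique-bucket-name/

# Run a trivial query with the bq tool
bq query --use_legacy_sql=false 'SELECT 1 AS ok'
```

These commands require an authenticated session and an active project with billing enabled, which Cloud Shell provides automatically for your logged-in account.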

GCP networking – virtual private clouds

Think about home computers again – they are all connected via a network, wired or wireless, so that they can connect to the internet. Without networking, a computer is almost useless. Within GCP, a cloud network unit is called a virtual private cloud (VPC). A VPC is a software-based logical network resource. Within a GCP project, a limited number of VPCs can be provisioned. After launching VMs in the cloud, you can connect them within a VPC, or isolate them from each other in separate VPCs. Since GCP VPCs are global and can span multiple regions in the world, you can provision a VPC, as well as the resources within it, anywhere in the world. Within a VPC, a public subnet has VMs with external IP addresses that are accessible from the internet and can access the internet; a private subnet contains VMs that do not have external IP addresses. VPCs can be peered with each other, within a GCP project, or outside a GCP project.

VPCs can be provisioned using the GCP console or GCP Cloud Shell. Please refer to https://cloud.google.com/vpc/ for details. Let’s pause here and go to the GCP console to provision our VPC and subnets, and then launch some VMs into those subnets.
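The Cloud Shell route can be sketched as follows – the network, subnet, region, and IP range below are hypothetical placeholders, not values from the text:

```shell
# Sketch only – names, region, and IP range are placeholders.

# Create a custom-mode VPC (no subnets are auto-created)
gcloud compute networks create my-vpc --subnet-mode=custom

# Add a subnet in one region
gcloud compute networks subnets create my-subnet \
    --network=my-vpc \
    --region=us-central1 \
    --range=10.0.1.0/24

# Launch a VM into that subnet; --no-address omits the external IP,
# making this a private-subnet-style VM as described above
gcloud compute instances create my-private-vm \
    --zone=us-central1-a \
    --network=my-vpc \
    --subnet=my-subnet \
    --no-address
```

Note how the public/private distinction from the text maps to a single flag: a VM created without `--no-address` gets an external IP and is internet-reachable (subject to firewall rules), while one created with it is not.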

GCP organization structure

Before we discuss the GCP cloud services further, we need to spend some time talking about the GCP organization structure, which is quite different from that of the Amazon Web Services (AWS) cloud and the Microsoft Azure cloud.

The GCP resource hierarchy

As shown in the following diagram, within a GCP cloud domain, at the top is the GCP organization, followed by folders, then projects. As a common practice, we can map a company’s organizational hierarchy to a GCP structure: a company maps to a GCP organization, its departments (sales, engineering, and more) are mapped to folders, and the functional projects from the departments are mapped to projects under the folders. Cloud resources such as VMs, databases (DBs), and so on are under the projects.

In a GCP organization hierarchy, each project is a separate compartment, and each resource belongs to exactly one project. Projects can have multiple owners and users. They are managed and billed separately, although multiple projects may be associated with the same billing account:

Figure 1.1 – Sample GCP organization structure

In the preceding diagram, there are two organizations: one for production and one for testing (sandbox). Under each organization, there are multiple layers of folders (note that the number of folder layers and the number of folders at each layer may be limited), and under each folder, there are multiple projects, each of which contains multiple resources.

GCP projects

GCP projects are the logical separations of GCP resources. Projects are used to fully isolate resources based on Google Cloud’s Identity and Access Management (IAM) permissions:

Billing isolation: Use different projects to separate spending units
Quotas and limits: Set at the project level and separated by workloads
Administrative complexity: Set at the project level for access separation
Blast radius: Misconfiguration issues are limited within a project
Separation of duties: Business units and data sensitivity are separate
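Projects can also be managed from the command line. As a sketch – the project ID, folder ID, and billing account ID below are placeholders (project IDs must be globally unique):

```shell
# Sketch only – all IDs below are placeholders.

# Create a project, optionally placing it under a folder in the hierarchy
gcloud projects create my-sandbox-project-1234 --folder=123456789012

# List the projects your account can see
gcloud projects list

# Associate the project with a billing account
gcloud billing projects link my-sandbox-project-1234 \
    --billing-account=000000-AAAAAA-BBBBBB
```

Linking several projects to the same billing account, as shown in the last command, is how multiple separately managed projects can still be billed together.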

In summary, the GCP organization structure provides a hierarchy for managing Google Cloud resources, with projects being the logical isolation and separation. In the next section, we will discuss resource permissions within the GCP organization by looking at IAM.

GCP Identity and Access Management

Now that we have reviewed the GCP organization structure and the GCP resources of VMs, storage, and networks, we can look at the access management of these resources within the GCP organization: IAM. GCP IAM manages cloud identities using the AAA model: authentication, authorization, and auditing (or accounting).

Authentication

The first A in the AAA model is authentication, which involves verifying the cloud identity that is trying to access the cloud. Instead of the traditional way of just asking for a username and password, multi-factor authentication (MFA) is used, an authentication method that requires users to verify their identity using multiple independent methods. For security reasons, all user authentications, including GCP console access and any other single sign-on (SSO) implementations, must be done while enforcing MFA. Usernames and passwords are simply ineffective in protecting user access these days.

Authorization

Authorization is represented by the second A in the AAA model. It is the process of granting or denying a user access to cloud resources once the user has been authenticated into the cloud account. The amount of information and the number of services the user can access depend on the user’s authorization level. Once a user’s identity has been verified and the user has been authenticated into GCP, the user must pass the authorization rules to access the cloud resources and data. Authorization determines the resources that the user can and cannot access.

Authorization defines who can do what on which resource. The following diagram shows the authorization concept in GCP. As you can see, there are three parties in the authorization process. The first layer in the figure is the identity – the who, which can be a user account, a group of users, or an application (Service Account). The third layer specifies the which – cloud resources such as GCS buckets, GCE VMs, VPCs, service accounts, or other GCP resources. A Service Account can be an identity as well as a resource:

Figure 1.2 – GCP IAM authentication

The middle layer is IAM Role, also known as the what, which refers to the specific privileges or actions that the identity has on the resources. For example, when a group is granted the Compute Viewer role, the group has read-only access to get and list GCE resources, without being able to write to or change them. GCP supports three types of IAM roles: primitive (basic), predefined, and custom. Let's take a look:
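The Compute Viewer example can be sketched as a single policy binding – the project ID and group address below are placeholders:

```shell
# Sketch only – project ID and group address are placeholders.

# Grant a group the predefined Compute Viewer role on a project:
# its members gain read-only (get/list) access to GCE resources.
gcloud projects add-iam-policy-binding my-project \
    --member="group:ml-team@example.com" \
    --role="roles/compute.viewer"

# Inspect the project's IAM policy to verify the binding
gcloud projects get-iam-policy my-project
```

The binding ties together the three layers discussed above: the identity (`--member`), the what (`--role`), and the resource (the project itself).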

Primitive (basic) roles include the Owner, Editor, and Viewer roles, which existed in GCP before the introduction