Expert guidance for the Google Cloud Machine Learning certification exam.

In Google Cloud Certified Professional Machine Learning Study Guide, a team of accomplished artificial intelligence (AI) and machine learning (ML) specialists delivers an expert roadmap to AI and ML on the Google Cloud Platform based on the new exam curriculum. With Sybex, you'll prepare faster and smarter for the Google Cloud Certified Professional Machine Learning Engineer exam and get ready to hit the ground running on your first day at your new job as an ML engineer.

The book walks readers through the machine learning process from start to finish: data, feature engineering, model training, and deployment on Google Cloud. It also discusses best practices for choosing a custom model versus AutoML or pretrained models with the Vertex AI platform. Technologies such as TensorFlow, Kubeflow, and Vertex AI are presented through real-world scenarios to help you apply the theory to practical examples and to show you how IT professionals design, build, and operate secure ML cloud environments.

The book also shows you how to:

* Frame ML problems and architect ML solutions from scratch
* Banish test anxiety by verifying and checking your progress with built-in self-assessments and other practical tools
* Use the Sybex online practice environment, complete with practice questions and explanations, a glossary, objective maps, and flash cards

A can't-miss resource for everyone preparing for the Google Cloud Certified Professional Machine Learning certification exam, or for a new career in ML powered by the Google Cloud Platform, this Sybex Study Guide has everything you need to take the next step in your career.
Page count: 538
Year of publication: 2023
Cover
Table of Contents
Title Page
Copyright
Dedication
Acknowledgments
About the Author
About the Technical Editors
About the Technical Proofreader
Google Technical Reviewer
Introduction
Google Cloud Professional Machine Learning Engineer Certification
Who Should Buy This Book
How This Book Is Organized
Bonus Digital Contents
Conventions Used in This Book
Google Cloud Professional ML Engineer Objective Map
How to Contact the Publisher
Assessment Test
Answers to Assessment Test
Chapter 1: Framing ML Problems
Translating Business Use Cases
Machine Learning Approaches
ML Success Metrics
Responsible AI Practices
Summary
Exam Essentials
Review Questions
Chapter 2: Exploring Data and Building Data Pipelines
Visualization
Statistics Fundamentals
Data Quality and Reliability
Establishing Data Constraints
Running TFDV on Google Cloud Platform
Organizing and Optimizing Training Datasets
Handling Missing Data
Data Leakage
Summary
Exam Essentials
Review Questions
Chapter 3: Feature Engineering
Consistent Data Preprocessing
Encoding Structured Data Types
Class Imbalance
Feature Crosses
TensorFlow Transform
GCP Data and ETL Tools
Summary
Exam Essentials
Review Questions
Chapter 4: Choosing the Right ML Infrastructure
Pretrained vs. AutoML vs. Custom Models
Pretrained Models
AutoML
Custom Training
Provisioning for Predictions
Summary
Exam Essentials
Review Questions
Chapter 5: Architecting ML Solutions
Designing Reliable, Scalable, and Highly Available ML Solutions
Choosing an Appropriate ML Service
Data Collection and Data Management
Automation and Orchestration
Serving
Summary
Exam Essentials
Review Questions
Chapter 6: Building Secure ML Pipelines
Building Secure ML Systems
Identity and Access Management
Privacy Implications of Data Usage and Collection
Summary
Exam Essentials
Review Questions
Chapter 7: Model Building
Choice of Framework and Model Parallelism
Modeling Techniques
Transfer Learning
Semi‐supervised Learning
Data Augmentation
Model Generalization and Strategies to Handle Overfitting and Underfitting
Summary
Exam Essentials
Review Questions
Chapter 8: Model Training and Hyperparameter Tuning
Ingestion of Various File Types into Training
Developing Models in Vertex AI Workbench by Using Common Frameworks
Training a Model as a Job in Different Environments
Hyperparameter Tuning
Tracking Metrics During Training
Retraining/Redeployment Evaluation
Unit Testing for Model Training and Serving
Summary
Exam Essentials
Review Questions
Chapter 9: Model Explainability on Vertex AI
Model Explainability on Vertex AI
Summary
Exam Essentials
Review Questions
Chapter 10: Scaling Models in Production
Scaling Prediction Service
Serving (Online, Batch, and Caching)
Google Cloud Serving Options
Hosting Third‐Party Pipelines (MLflow) on Google Cloud
Testing for Target Performance
Configuring Triggers and Pipeline Schedules
Summary
Exam Essentials
Review Questions
Chapter 11: Designing ML Training Pipelines
Orchestration Frameworks
Identification of Components, Parameters, Triggers, and Compute Needs
System Design with Kubeflow/TFX
Hybrid or Multicloud Strategies
Summary
Exam Essentials
Review Questions
Chapter 12: Model Monitoring, Tracking, and Auditing Metadata
Model Monitoring
Model Monitoring on Vertex AI
Logging Strategy
Model and Dataset Lineage
Vertex AI Experiments
Vertex AI Debugging
Summary
Exam Essentials
Review Questions
Chapter 13: Maintaining ML Solutions
MLOps Maturity
Retraining and Versioning Models
Feature Store
Vertex AI Permissions Model
Common Training and Serving Errors
Summary
Exam Essentials
Review Questions
Chapter 14: BigQuery ML
BigQuery – Data Access
BigQuery ML Algorithms
Explainability in BigQuery ML
BigQuery ML vs. Vertex AI Tables
Interoperability with Vertex AI
BigQuery Design Patterns
Summary
Exam Essentials
Review Questions
Appendix: Answers to Review Questions
Chapter 1: Framing ML Problems
Chapter 2: Exploring Data and Building Data Pipelines
Chapter 3: Feature Engineering
Chapter 4: Choosing the Right ML Infrastructure
Chapter 5: Architecting ML Solutions
Chapter 6: Building Secure ML Pipelines
Chapter 7: Model Building
Chapter 8: Model Training and Hyperparameter Tuning
Chapter 9: Model Explainability on Vertex AI
Chapter 10: Scaling Models in Production
Chapter 11: Designing ML Training Pipelines
Chapter 12: Model Monitoring, Tracking, and Auditing Metadata
Chapter 13: Maintaining ML Solutions
Chapter 14: BigQuery ML
Index
End User License Agreement
Chapter 1
TABLE 1.1 ML problem types
TABLE 1.2 Structured data
TABLE 1.3 Time‐Series Data
TABLE 1.4 Confusion matrix for a binary classification example
TABLE 1.5 Summary of metrics
Chapter 2
TABLE 2.1 Mean, median, and mode for outlier detection
Chapter 3
TABLE 3.1 One‐hot encoding example
TABLE 3.2 Run a TFX pipeline on GCP
Chapter 4
TABLE 4.1 Vertex AI AutoML Tables algorithms
TABLE 4.2 AutoML algorithms
TABLE 4.3 Problems solved using AutoML
TABLE 4.4 Summary of the recommendation types available in Retail AI
Chapter 5
TABLE 5.1 ML workflow to GCP services mapping
TABLE 5.2 When to use BigQuery ML vs. AutoML vs. a custom model
TABLE 5.3 Google Cloud tools to read BigQuery data
TABLE 5.4 NoSQL data store options
Chapter 6
TABLE 6.1 Difference between server‐side and client‐side encryption
TABLE 6.2 Strategies for handling sensitive data
TABLE 6.3 Techniques to handle sensitive fields in data
Chapter 7
TABLE 7.1 Distributed training strategies using TensorFlow
TABLE 7.2 Summary of loss functions based on ML problems
TABLE 7.3 Differences between L1 and L2 regularization
Chapter 8
TABLE 8.1 Dataproc connectors
TABLE 8.2 Data storage guidance on GCP for machine learning
TABLE 8.3 Differences between managed and user‐managed notebooks
TABLE 8.4 Worker pool tasks in distributed training
TABLE 8.5 Search algorithm options for hyperparameter tuning on GCP
TABLE 8.6 Tools to track metric or profile training metrics
TABLE 8.7 Retraining strategies
Chapter 9
TABLE 9.1 Explainable techniques used by Vertex AI
Chapter 10
TABLE 10.1 Static vs. dynamic features
TABLE 10.2 Input data options for batch training in Vertex AI
TABLE 10.3 ML orchestration options
Chapter 11
TABLE 11.1 Kubeflow Pipelines vs. Vertex AI Pipelines vs. Cloud Composer
Chapter 13
TABLE 13.1 Table of baseball batters
Chapter 14
TABLE 14.1 Models available on BigQuery ML
TABLE 14.2 Model types
Chapter 1
FIGURE 1.1 Business case to ML problem
FIGURE 1.2 AUC
FIGURE 1.3 AUC PR
Chapter 2
FIGURE 2.1 Box plot showing quartiles
FIGURE 2.2 Line plot
FIGURE 2.3 Bar plot
FIGURE 2.4 Data skew
FIGURE 2.5 TensorFlow Data Validation
FIGURE 2.6 Dataset representation
FIGURE 2.7 Credit card data representation
FIGURE 2.8 Downsampling credit card data
Chapter 3
FIGURE 3.1 Difficult to separate by line or a linear method
FIGURE 3.2 Difficult to separate classes by line
FIGURE 3.3 Summary of feature columns. Source: Google Cloud via Coursera, www.coursera...
FIGURE 3.4 TensorFlow Transform
Chapter 4
FIGURE 4.1 Pretrained, AutoML, and custom models
FIGURE 4.2 Analyzing a photo using Vision AI
FIGURE 4.3 Vertex AI AutoML, providing a “budget”
FIGURE 4.4 Choosing the size of model in Vertex AI
FIGURE 4.5 TPU system architecture
Chapter 5
FIGURE 5.1 Google AI/ML stack
FIGURE 5.2 Kubeflow Pipelines and Google Cloud managed services
FIGURE 5.3 Google Cloud architecture for performing offline batch prediction...
FIGURE 5.4 Google Cloud architecture for online prediction
FIGURE 5.5 Push notification architecture for online prediction
Chapter 6
FIGURE 6.1 Creating a user‐managed Vertex AI Workbench notebook
FIGURE 6.2 Managed Vertex AI Workbench notebook
FIGURE 6.3 Permissions for a managed Vertex AI Workbench notebook
FIGURE 6.4 Creating a private endpoint in the Vertex AI console
FIGURE 6.5 Architecture for de‐identification of PII on large datasets using...
Chapter 7
FIGURE 7.1 Asynchronous data parallelism
FIGURE 7.2 Model parallelism
FIGURE 7.3 Training strategy with TensorFlow
FIGURE 7.4 Artificial or feedforward neural network
FIGURE 7.5 Deep neural network
Chapter 8
FIGURE 8.1 Google Cloud data and analytics overview
FIGURE 8.2 Cloud Dataflow source and sink
FIGURE 8.3 Summary of processing tools on GCP
FIGURE 8.4 Creating a managed notebook
FIGURE 8.5 Opening the managed notebook
FIGURE 8.6 Exploring frameworks available in a managed notebook
FIGURE 8.7 Data integration with Google Cloud Storage within a managed noteb...
FIGURE 8.8 Data Integration with BigQuery within a managed notebook
FIGURE 8.9 Scaling up the hardware from a managed notebook
FIGURE 8.10 Git integration within a managed notebook
FIGURE 8.11 Scheduling or executing code in the notebook
FIGURE 8.12 Submitting the notebook for execution
FIGURE 8.13 Scheduling the notebook for execution
FIGURE 8.14 Choosing TensorFlow framework to create a user‐managed notebook...
FIGURE 8.15 Create a user‐managed TensorFlow notebook
FIGURE 8.16 Exploring the network
FIGURE 8.17 Training in the Vertex AI console
FIGURE 8.18 Vertex AI training architecture for a prebuilt container
FIGURE 8.19 Vertex AI training console for pre‐built containers. Source: Google LLC.
FIGURE 8.20 Vertex AI training architecture for custom containers
FIGURE 8.21 ML model parameter and hyperparameter
FIGURE 8.22 Configure hyperparameter tuning in the training pipeline UI. Source: Google LLC.
FIGURE 8.23 Enabling an interactive shell in the Vertex AI console. Source: Google LLC.
FIGURE 8.24 Web terminal to access an interactive shell. Source: Google LLC.
Chapter 9
FIGURE 9.1 SHAP model explainability
FIGURE 9.2 Feature attribution using integrated gradients for cat image
Chapter 10
FIGURE 10.1 TF model serving options
FIGURE 10.2 Static reference architecture
FIGURE 10.3 Dynamic reference architecture
FIGURE 10.4 Caching architecture
FIGURE 10.5 Deploying to an endpoint
FIGURE 10.6 Sample prediction request
FIGURE 10.7 Batch prediction job in Console
Chapter 11
FIGURE 11.1 Relation between model data and ML code for MLOps
FIGURE 11.2 End‐to‐end ML development workflow
FIGURE 11.3 Kubeflow architecture
FIGURE 11.4 Kubeflow components and pods
FIGURE 11.5 Vertex AI Pipelines
FIGURE 11.6 Vertex AI Pipelines condition for deployment
FIGURE 11.7 Lineage tracking with Vertex AI Pipelines
FIGURE 11.8 Lineage tracking in Vertex AI Metadata store
FIGURE 11.9 Continuous training and CI/CD
FIGURE 11.10 CI/CD with Kubeflow Pipelines
FIGURE 11.11 Kubeflow Pipelines on GCP
FIGURE 11.12 TFX pipelines, libraries, and components
Chapter 12
FIGURE 12.1 Categorical features
FIGURE 12.2 Numerical values
FIGURE 12.3 Vertex Metadata data model
FIGURE 12.4 Vertex AI Pipelines showing lineage
Chapter 13
FIGURE 13.1 Steps in MLOps level 0
FIGURE 13.2 MLOps Level 1 or strategic phase
FIGURE 13.3 MLOps level 2, the transformational phase
Chapter 14
FIGURE 14.1 Running a SQL query in the web console
FIGURE 14.2 Running the same SQL query through a Jupyter Notebook on Vertex ...
FIGURE 14.3 SQL options for DNN_CLASSIFIER and DNN_REGRESSOR
FIGURE 14.4 Query showing results of model evaluation
FIGURE 14.5 Query results showing only the predictions
FIGURE 14.6 Global feature importance returned for our model
FIGURE 14.7 Prediction result
FIGURE 14.8 Top feature attributions for the prediction
Mona Mona and Pratap Ramamurthy
Copyright © 2024 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada and the United Kingdom.
ISBNs: 9781119944461 (paperback), 9781119981848 (ePDF), 9781119981565 (ePub)
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750‐8400, fax (978) 750‐4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at www.wiley.com/go/permission.
Trademarks: WILEY and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. Google Cloud is a trademark of Google, Inc. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and authors have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762‐2974, outside the United States at (317) 572‐3993 or fax (317) 572‐4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our website at www.wiley.com.
Library of Congress Control Number: 2023931675
Cover image: © Getty Images Inc./Jeremy Woodhouse
Cover design: Wiley
To my late father, grandparents, mom, and husband (Pratyush Ranjan); my mentor (Mark Smith); and my friends. Also to anyone studying for this exam: I hope this book helps you pass with flying colors!
—Mona Mona
To my parents, wonderful wife (Swetha), and two fantastic children: Rishab and Riya.
—Pratap Ramamurthy
Although this book bears my name as author, many other people contributed to its creation. Without their help, this book wouldn't exist, or at best would exist in a lesser form. Pratap Ramamurthy, as my co‐author, contributed a third of the content of this book. Kim Wimpsett, the development editor; Christine O'Connor, the managing editor; and Saravanan Dakshinamurthy, the production specialist, oversaw the book as it progressed through all its stages. Arielle Guy was the book's proofreader, and Judy Flynn was the copyeditor. Last but not least, thanks to Hitesh Hinduja for being an amazing reviewer throughout the book‐writing process.
I'd also like to thank Jim Minatel and Melissa Burlock at Wiley, and Dan Sullivan, who helped connect me with Wiley to write this book.
—Mona Mona
This book is the product of hard work by many people, and it was wonderful to see everyone come together as a team, starting with Jim Minatel and Melissa Burlock from Wiley and including Kim Wimpsett, Christine O'Connor, Saravanan Dakshinamurthy, Judy Flynn, Arielle Guy, and the reviewers.
Most importantly, I would like to thank Mona for spearheading this huge effort. Her knowledge from her previous writing experience and her leadership from start to finish were crucial to bringing this book to completion.
—Pratap Ramamurthy
Mona Mona is an AI/ML specialist at Google Public Sector. She is the author of the book Natural Language Processing with AWS AI Services and a speaker. She was a senior AI/ML specialist solutions architect at AWS before joining Google. She has 14 certifications and has created courses for AWS AI/ML Certification Specialty Exam readiness. She has authored 17 blogs on AI/ML and co‐authored a research paper, "AWS CORD‐19 Search: A neural search engine for COVID‐19 literature," which won an award at the Association for the Advancement of Artificial Intelligence (AAAI) conference. She can be reached at [email protected].
Pratap Ramamurthy loves to solve problems using machine learning. Currently he is an AI/ML specialist at Google Public Sector. Previously he worked at AWS as a partner solution architect where he helped build the partner ecosystem for Amazon SageMaker. Later he was a principal solution architect at H2O.ai, a company that works on machine learning algorithms for structured data and natural language. Prior to that he was a developer and a researcher. To his credit he has several research papers in networking, server profiling technology, genetic algorithms, and optoelectronics. He holds three patents related to cloud technologies. In his spare time, he likes to teach AI using modern board games. He can be reached at [email protected].
Hitesh Hinduja is an ardent artificial intelligence (AI) and data platforms enthusiast currently working as a senior manager in Azure Data and AI at Microsoft. He worked as a senior manager in AI at Ola Electric, where he led a team of 30+ people in the areas of machine learning, statistics, computer vision, deep learning, natural language processing, and reinforcement learning. He has filed 14+ patents in India and the United States and has numerous research publications under his name. Hitesh has been associated in research roles at India's top B‐schools: Indian School of Business, Hyderabad, and the Indian Institute of Management, Ahmedabad. He is also actively involved in training and mentoring and has been invited as a guest speaker by various corporations and associations across the globe. He is an avid learner and enjoys reading books in his free time.
Kanchana Patlolla is an AI innovation program leader at Google Cloud. Previously she worked as an AI/ML specialist in Google Cloud Platform. She has architected solutions with major public cloud providers in financial services industries on their quest to the cloud, particularly in their Big Data and machine learning journey. In her spare time, she loves to try different cuisines and relax with her kids.
Adam Vincent is an experienced educator with a passion for spreading knowledge and helping people expand their skill sets. He is multi‐certified in Google Cloud, is a Google Cloud Authorized Trainer, and has created multiple courses about machine learning. Adam also loves playing with data and automating everything. When he is not behind a screen, he enjoys playing tabletop games with friends and family, reading sci‐fi and fantasy novels, and hiking.
Wiley and the authors wish to thank the Google Technical Reviewer Emma Freeman for her thorough review of the proofs for this book.
When customers have a business problem, say, detecting objects in an image, it can sometimes be solved very well using machine learning. Google Cloud Platform (GCP) provides an extensive set of tools for building a model that accomplishes this and deploying it for production usage. This book covers many different use cases, such as using sales data to forecast the next quarter, identifying objects in images or videos, and even extracting information from text documents. This book helps an engineer build a secure, scalable, resilient machine learning application and automate the whole process using the latest technologies.
The purpose of this book is to help you pass the latest version of the Google Cloud Professional ML Engineer (PMLE) exam. Even after you've taken and passed the PMLE exam, this book should remain a useful reference as it covers the basics of machine learning, BigQuery ML, the Vertex AI platform, and MLOps.
A Professional Machine Learning Engineer designs, builds, and productionizes ML models to solve business challenges using Google Cloud technologies and knowledge of proven ML models and techniques. The ML engineer considers responsible AI throughout the ML development process and collaborates closely with other job roles to ensure the long‐term success of models. The ML engineer should be proficient in all aspects of model architecture, data pipeline interaction, and metrics interpretation. The ML engineer needs familiarity with foundational concepts of application development, infrastructure management, data engineering, and data governance. Through an understanding of training, retraining, deploying, scheduling, monitoring, and improving models, the ML engineer designs and creates scalable solutions for optimal performance.
There are several good reasons to get your PMLE certification.
Provides proof of professional achievement
Certifications are quickly becoming status symbols in the computer service industry, and organizations throughout that industry are recognizing the benefits of certification.
Increases your marketability
According to Forbes (www.forbes.com/sites/louiscolumbus/2020/02/10/15-top-paying-it-certifications-in-2020/?sh=12f63aa8358e), jobs that require GCP certifications are the highest‐paying jobs for the second year in a row, paying an average salary of $175,761/year. So, there is a demand from many engineers to get certified. Of the many certifications that GCP offers, the AI/ML certified engineer is a new certification and is still evolving.
Provides an opportunity for advancement
IDC's research (www.idc.com/getdoc.jsp?containerId=IDC_P40729) indicates that while AI/ML adoption is on the rise, the cost, lack of expertise, and lack of life cycle management tools are among the top three inhibitors to realizing AI and ML at scale.
This book is the first in the market to talk about Google Cloud AI/ML tools and the technology covering the latest Professional ML Engineer certification guidelines released on February 22, 2022.
Recognizes Google as a leader in open source and AI
Google is the main contributor to many of the path‐breaking open source software projects that dramatically changed the landscape of AI/ML, including TensorFlow, Kubeflow, Word2vec, BERT, and T5. Although these algorithms are in the open source domain, Google has the distinct ability to bring these open source projects to market through the Google Cloud Platform (GCP). In this regard, the other cloud providers are frequently seen as trailing Google's offering.
Raises customer confidence
As the IT community, users, small business owners, and the like become more familiar with the PMLE certified professional, more of them will realize that the PMLE professional is more qualified to architect secure, cost‐effective, and scalable ML solutions on the Google Cloud environment than a noncertified individual.
You do not have to work for a particular company. It's not a secret society. There is no prerequisite to take this exam. However, there is a recommendation to have 3+ years of industry experience, including one or more years designing and managing solutions using Google Cloud.
This exam is 2 hours and has 50–60 multiple‐choice questions.
You can register for this exam in two ways:
Take the online‐proctored exam from anywhere, including at home. You can review the online testing requirements at www.webassessor.com/wa.do?page=certInfo&branding=GOOGLECLOUD&tabs=13.
Take the on‐site, proctored exam at a testing center.
We usually prefer to go with the on‐site option as we like the focus time in a proctored environment. We have taken all our certifications in a test center. You can find and locate a test center near you at www.kryterion.com/Locate-Test-Center.
This book is intended to help students, developers, data scientists, IT professionals, and ML engineers gain expertise in the ML technology on the Google Cloud Platform and take the Professional Machine Learning Engineer exam. This book intends to take readers through the machine learning process starting from data and moving on through feature engineering, model training, and deployment on the Google Cloud. It also walks readers through best practices for when to pick custom models versus AutoML or pretrained models. Google Cloud AI/ML technologies are presented through real‐world scenarios to illustrate how IT professionals can design, build, and operate secure ML cloud environments to modernize and automate applications.
Anybody who wants to pass the Professional ML Engineer exam may benefit from this book. If you're new to Google Cloud, this book covers the updated machine learning exam course material, including the Google Cloud Vertex AI platform, MLOps, and BigQuery ML. This is the only book on the market to cover the complete Vertex AI platform, from bringing your data to training, tuning, and deploying your models.
Since it's a professional‐level study guide, this book is written with the assumption that you know the basics of the Google Cloud Platform, such as compute, storage, networking, databases, and identity and access management (IAM) or have taken the Google Cloud Associate‐level certification exam. Moreover, this book assumes you understand the basics of machine learning and data science in general. In case you do not understand a term or concept, we have included a glossary for your reference.
This book consists of 14 chapters plus supplementary information: a glossary, this introduction, and the assessment test after the introduction. The chapters are organized as follows:
Chapter 1: Framing ML Problems
This chapter covers how you can translate business challenges into ML use cases.
Chapter 2: Exploring Data and Building Data Pipelines
This chapter covers visualization, statistical fundamentals at scale, evaluation of data quality and feasibility, establishing data constraints (e.g., TFDV), organizing and optimizing training datasets, data validation, handling missing data, handling outliers, and data leakage.
Chapter 3: Feature Engineering
This chapter covers topics such as encoding structured data types, feature selection, class imbalance, feature crosses, and transformations (TensorFlow Transform).
Chapter 4: Choosing the Right ML Infrastructure
This chapter covers topics such as evaluation of compute and accelerator options (e.g., CPU, GPU, TPU, edge devices) and choosing appropriate Google Cloud hardware components. It also covers choosing the best solution (ML vs. non‐ML, custom vs. pre‐packaged [e.g., AutoML, Vision API]) based on the business requirements. It talks about how defining the model output should be used to solve the business problem. It also covers deciding how incorrect results should be handled and identifying data sources (available vs. ideal). It talks about AI solutions such as CCAI, DocAI, and Recommendations AI.
Chapter 5: Architecting ML Solutions
This chapter explains how to design reliable, scalable, and highly available ML solutions. Other topics include how you can choose appropriate ML services for a use case (e.g., Cloud Build, Kubeflow), component types (e.g., data collection, data management), automation, orchestration, and serving in machine learning.
Chapter 6: Building Secure ML Pipelines
This chapter describes how to build secure ML systems (e.g., protecting against unintentional exploitation of data/model, hacking). It also covers the privacy implications of data usage and/or collection (e.g., handling sensitive data such as personally identifiable information [PII] and protected health information [PHI]).
Chapter 7: Model Building
This chapter describes the choice of framework and model parallelism. It also covers modeling techniques given interpretability requirements, transfer learning, data augmentation, semi‐supervised learning, model generalization, and strategies to handle overfitting and underfitting.
Chapter 8: Model Training and Hyperparameter Tuning
This chapter focuses on the ingestion of various file types into training (e.g., CSV, JSON, IMG, parquet or databases, Hadoop/Spark). It covers training a model as a job in different environments. It also talks about unit tests for model training and serving and hyperparameter tuning. Moreover, it discusses ways to track metrics during training and retraining/redeployment evaluation.
Chapter 9: Model Explainability on Vertex AI
This chapter covers approaches to model explainability on Vertex AI.
Chapter 10: Scaling Models in Production
This chapter covers scaling prediction service (e.g., Vertex AI Prediction, containerized serving), serving (online, batch, caching), Google Cloud serving options, testing for target performance, and configuring trigger and pipeline schedules.
Chapter 11: Designing ML Training Pipelines
This chapter covers identification of components, parameters, triggers, and compute needs (e.g., Cloud Build, Cloud Run). It also talks about orchestration framework (e.g., Kubeflow Pipelines/Vertex AI Pipelines, Cloud Composer/Apache Airflow), hybrid or multicloud strategies, and system design with TFX components/Kubeflow DSL.
Chapter 12: Model Monitoring, Tracking, and Auditing Metadata
This chapter covers the performance and business quality of ML model predictions, logging strategies, organizing and tracking experiments, and pipeline runs. It also talks about dataset versioning and model/dataset lineage.
Chapter 13: Maintaining ML Solutions
This chapter covers establishing continuous evaluation metrics (e.g., evaluation of drift or bias), understanding the Google Cloud permission model, and identification of appropriate retraining policies. It also covers common training and serving errors (TensorFlow), ML model failure, and resulting biases. Finally, it talks about how you can tune the performance of ML solutions for training and serving in production.
Chapter 14: BigQuery ML
This chapter covers BigQuery ML algorithms, when to use BigQuery ML versus Vertex AI, and interoperability with Vertex AI.
Each chapter begins with a list of the objectives that are covered in the chapter. The book doesn't cover the objectives in order, so don't be alarmed by the seemingly odd ordering of objectives within the book.
At the end of each chapter, you'll find several elements you can use to prepare for the exam.
Exam Essentials
This section summarizes important information that was covered in the chapter. You should be able to perform each of the tasks or convey the information requested.
Review Questions
Each chapter concludes with 8+ review questions. You should answer these questions and check your answers against the ones provided after the questions. If you can't answer at least 80 percent of these questions correctly, go back and review the chapter, or at least those sections that seem to be giving you difficulty.
The review questions, assessment test, and other testing elements included in this book are not derived from the PMLE exam questions, so don't memorize the answers to these questions and assume that doing so will enable you to pass the exam. You should learn the underlying topic, as described in the text of the book. This will let you answer the questions provided with this book and pass the exam. Learning the underlying topic is also the approach that will serve you best in the workplace.
To get the most out of this book, you should read each chapter from start to finish and then check your memory and understanding with the chapter‐end elements. Even if you're already familiar with a topic, you should skim the chapter; machine learning is complex enough that there are often multiple ways to accomplish a task, so you may learn something even if you're already competent in an area.
Like all exams, the Google Cloud certification from Google is updated periodically and may eventually be retired or replaced. At some point after Google is no longer offering this exam, the old editions of our books and online tools will be retired. If you have purchased this book after the exam was retired, or are attempting to register in the Sybex online learning environment after the exam was retired, please know that we make no guarantees that this exam’s online Sybex tools will be available once the exam is no longer available.
This book is accompanied by an online learning environment that provides several additional elements. The following items are available among these companion files:
Practice tests
All of the questions in this book appear in our proprietary digital test engine—including the 30‐question assessment test at the end of this introduction and the 100+ questions that make up the review question sections at the end of each chapter. In addition, there are two 50‐question bonus exams.
Electronic “flash cards”
The digital companion files include 50+ questions in flash card format (a question followed by a single correct answer). You can use these to review your knowledge of the exam objectives.
Glossary
The key terms from this book, and their definitions, are available as a fully searchable PDF.
You can access all these resources at www.wiley.com/go/sybextestprep.
This book uses certain typographic styles in order to help you quickly identify important information and to avoid confusion over the meaning of words such as on‐screen prompts. In particular, look for the following styles:
Italicized text indicates key terms that are described at length for the first time in a chapter. These words probably appear in the searchable online glossary. (Italics are also used for emphasis.)
A monospaced font indicates the contents of configuration files, messages displayed at a text‐mode Google Cloud shell prompt, filenames, text‐mode command names, and Internet URLs.
In addition to these text conventions, which can apply to individual words or entire paragraphs, a few conventions highlight segments of text:
A note indicates information that's useful or interesting but that's somewhat peripheral to the main text. A note might be relevant to a small number of networks, for instance, or it may refer to an outdated feature.
A tip provides information that can save you time or frustration and that may not be entirely obvious. A tip might describe how to get around a limitation or how to use a feature to perform an unusual task.
Here is where to find the objectives covered in this book. The chapter(s) covering each objective appear in parentheses.

Section 1: Architecting low‐code ML solutions

1.1 Developing ML models by using BigQuery ML. Considerations include: (Chapter 14)
Building the appropriate BigQuery ML model (e.g., linear and binary classification, regression, time‐series, matrix factorization, boosted trees, autoencoders) based on the business problem (Chapter 14)
Feature engineering or selection by using BigQuery ML (Chapter 14)
Generating predictions by using BigQuery ML (Chapter 14)

1.2 Building AI solutions by using ML APIs. Considerations include: (Chapter 4)
Building applications by using ML APIs (e.g., Cloud Vision API, Natural Language API, Cloud Speech API, Translation) (Chapter 4)
Building applications by using industry‐specific APIs (e.g., Document AI API, Retail API) (Chapter 4)

1.3 Training models by using AutoML. Considerations include: (Chapter 4)
Preparing data for AutoML (e.g., feature selection, data labeling, Tabular Workflows on AutoML) (Chapter 4)
Using available data (e.g., tabular, text, speech, images, videos) to train custom models (Chapter 4)
Using AutoML for tabular data (Chapter 4)
Creating forecasting models using AutoML (Chapter 4)
Configuring and debugging trained models (Chapter 4)

Section 2: Collaborating within and across teams to manage data and models

2.1 Exploring and preprocessing organization‐wide data (e.g., Cloud Storage, BigQuery, Cloud Spanner, Cloud SQL, Apache Spark, Apache Hadoop). Considerations include: (Chapters 3, 5, 6, 8, 13)
Organizing different types of data (e.g., tabular, text, speech, images, videos) for efficient training (Chapter 8)
Managing datasets in Vertex AI (Chapter 8)
Data preprocessing (e.g., Dataflow, TensorFlow Extended [TFX], BigQuery) (Chapters 3, 5)
Creating and consolidating features in Vertex AI Feature Store (Chapter 13)
Privacy implications of data usage and/or collection (e.g., handling sensitive data such as personally identifiable information [PII] and protected health information [PHI]) (Chapter 6)

2.2 Model prototyping using Jupyter notebooks. Considerations include: (Chapters 6, 8)
Choosing the appropriate Jupyter backend on Google Cloud (e.g., Vertex AI Workbench notebooks, notebooks on Dataproc) (Chapter 8)
Applying security best practices in Vertex AI Workbench (Chapter 6)
Using Spark kernels (Chapter 8)
Integration with code source repositories (Chapter 8)
Developing models in Vertex AI Workbench by using common frameworks (e.g., TensorFlow, PyTorch, sklearn, Spark, JAX) (Chapter 8)

2.3 Tracking and running ML experiments. Considerations include: (Chapters 5, 12)
Choosing the appropriate Google Cloud environment for development and experimentation (e.g., Vertex AI Experiments, Kubeflow Pipelines, Vertex AI TensorBoard with TensorFlow and PyTorch) given the framework (Chapters 5, 12)

Section 3: Scaling prototypes into ML models

3.1 Building models. Considerations include: (Chapter 7)
Choosing ML framework and model architecture (Chapter 7)
Modeling techniques given interpretability requirements (Chapter 7)

3.2 Training models. Considerations include: (Chapters 7, 8)
Organizing training data (e.g., tabular, text, speech, images, videos) on Google Cloud (e.g., Cloud Storage, BigQuery) (Chapter 8)
Ingestion of various file types (e.g., CSV, JSON, images, Hadoop, databases) into training (Chapter 8)
Training using different SDKs (e.g., Vertex AI custom training, Kubeflow on Google Kubernetes Engine, AutoML, tabular workflows) (Chapter 8)
Using distributed training to organize reliable pipelines (Chapters 7, 8)
Hyperparameter tuning (Chapter 8)
Troubleshooting ML model training failures (Chapter 8)

3.3 Choosing appropriate hardware for training. Considerations include: (Chapters 4, 8)
Evaluation of compute and accelerator options (e.g., CPU, GPU, TPU, edge devices) (Chapter 4)
Distributed training with TPUs and GPUs (e.g., Reduction Server on Vertex AI, Horovod) (Chapter 8)

Section 4: Serving and scaling models

4.1 Serving models. Considerations include: (Chapters 5, 10)
Batch and online inference (e.g., Vertex AI, Dataflow, BigQuery ML, Dataproc) (Chapters 5, 10)
Using different frameworks (e.g., PyTorch, XGBoost) to serve models (Chapter 10)
Organizing a model registry (Chapter 10)
A/B testing different versions of a model (Chapter 10)

4.2 Scaling online model serving. Considerations include: (Chapters 4, 5, 6, 10, 13)
Vertex AI Feature Store (Chapter 13)
Vertex AI public and private endpoints (Chapter 6)
Choosing appropriate hardware (e.g., CPU, GPU, TPU, edge) (Chapter 4)
Scaling the serving backend based on the throughput (e.g., Vertex AI Prediction, containerized serving) (Chapter 10)
Tuning ML models for training and serving in production (e.g., simplification techniques, optimizing the ML solution for increased performance, latency, memory, throughput) (Chapters 5, 10)

Section 5: Automating and orchestrating ML pipelines

5.1 Developing end‐to‐end ML pipelines. Considerations include: (Chapters 2, 3, 10, 11)
Data and model validation (Chapters 2, 3)
Ensuring consistent data pre‐processing between training and serving (Chapter 3)
Hosting third‐party pipelines on Google Cloud (e.g., MLFlow) (Chapter 10)
Identifying components, parameters, triggers, and compute needs (e.g., Cloud Build, Cloud Run) (Chapter 11)
Orchestration framework (e.g., Kubeflow Pipelines, Vertex AI Managed Pipelines, Cloud Composer) (Chapter 11)
Hybrid or multicloud strategies (Chapter 11)
System design with TFX components or Kubeflow DSL (e.g., Dataflow) (Chapter 11)

5.2 Automating model retraining. Considerations include: (Chapter 13)
Determining an appropriate retraining policy (Chapter 13)
Continuous integration and continuous delivery (CI/CD) model deployment (e.g., Cloud Build, Jenkins) (Chapter 13)

5.3 Tracking and auditing metadata. Considerations include: (Chapter 12)
Tracking and comparing model artifacts and versions (e.g., Vertex AI Experiments, Vertex ML Metadata) (Chapter 12)
Hooking into model and dataset versioning (Chapter 12)
Model and data lineage (Chapter 12)

Section 6: Monitoring ML solutions

6.1 Identifying risks to ML solutions. Considerations include: (Chapters 6, 9)
Building secure ML systems (e.g., protecting against unintentional exploitation of data or models, hacking) (Chapters 6, 9)
Aligning with Google's Responsible AI practices (e.g., biases) (Chapter 9)
Assessing ML solution readiness (e.g., data bias, fairness) (Chapter 9)
Model explainability on Vertex AI (e.g., Vertex AI Prediction) (Chapter 9)

6.2 Monitoring, testing, and troubleshooting ML solutions. Considerations include: (Chapters 12, 13)
Establishing continuous evaluation metrics (e.g., Vertex AI Model Monitoring, Explainable AI) (Chapters 12, 13)
Monitoring for training‐serving skew (Chapter 12)
Monitoring for feature attribution drift (Chapter 12)
Monitoring model performance against baselines, simpler models, and across the time dimension (Chapter 12)
Common training and serving errors (Chapter 13)
Exam domains and objectives are subject to change at any time without prior notice and at Google's sole discretion. Please visit its website (https://cloud.google.com/certification/machine-learning-engineer) for the most current information.
If you believe you have found a mistake in this book, please bring it to our attention. At John Wiley & Sons, we understand how important it is to provide our customers with accurate content, but even with our best efforts an error may occur.
In order to submit your possible errata, please email it to our Customer Service Team at [email protected] with the subject line “Possible Book Errata Submission.”
How would you split the data to predict a user lifetime value (LTV) over the next 30 days in an online recommendation system to avoid data and label leakage? (Choose three.)
Perform data collection for 30 days.
Create a training set for data from day 1 to day 29.
Create a validation set for data for day 30.
Create random data split into training, validation, and test sets.
You have a highly imbalanced dataset and you want to focus on the positive class in the classification problem. Which metrics would you choose?
Area under the precision‐recall curve (AUC PR)
Area under the curve ROC (AUC ROC)
Recall
Precision
A feature cross is created by ________________ two or more features.
Swapping
Multiplying
Adding
Dividing
You can use Cloud Pub/Sub to stream data in GCP and use Cloud Dataflow to transform the data.
True
False
You have training data, and you are writing the model training code. You have a team of data engineers who prefer to code in SQL. Which service would you recommend?
BigQuery ML
Vertex AI custom training
Vertex AI AutoML
Vertex AI pretrained APIs
What are the benefits of using a Vertex AI managed dataset? (Choose three.)
Integrated data labeling for unlabeled, unstructured data such as video, text, and images using Vertex data labeling.
Track lineage to models for governance and iterative development.
Automatically splitting data into training, test, and validation sets.
Manual splitting of data into training, test, and validation sets.
Masking, encrypting, and bucketing are de‐identification techniques to obscure PII data using the Cloud Data Loss Prevention API.
True
False
Which strategy would you choose to handle the sensitive data that exists within images, videos, audio, and unstructured free‐form data?
Use NLP API, Cloud Speech API, Vision AI, and Video Intelligence AI to identify sensitive data such as email and location out of the box, and then redact or remove it.
Use Cloud DLP to address this type of data.
Use Healthcare API to hide sensitive data.
Create a view that doesn't provide access to the columns in question. The data engineers cannot view the data, but at the same time the data is live and doesn't require human intervention to de‐identify it for continuous training.
You would use __________________ when you are trying to reduce features while trying to solve an overfitting problem with large models.
L1 regularization
L2 regularization
Both A and B
Vanishing gradient
If the weights in a network are very large, then the gradients for the lower layers involve products of many large terms, leading to exploding gradients that grow too large to converge. What are some of the ways this can be avoided? (Choose two.)
Batch normalization
Lower learning rate
The ReLU activation function
Sigmoid activation function
You have a Spark and Hadoop environment on‐premises, and you are planning to move your data to Google Cloud. Your ingestion pipeline is both real time and batch. Your ML customer engineer recommended a scalable way to move your data using Cloud Dataproc to BigQuery. Which of the following Dataproc connectors would you not recommend?
Pub/Sub Lite Spark connector
BigQuery Spark connector
BigQuery connector
Cloud Storage connector
You have moved your Spark and Hadoop environment and your data is in Google Cloud Storage. Your ingestion pipeline is both real time and batch. Your ML customer engineer recommended a scalable way to run Apache Hadoop or Apache Spark jobs directly on data in Google Cloud Storage. Which of the following Dataproc connectors would you recommend?
Pub/Sub Lite Spark connector
BigQuery Spark connector
BigQuery connector
Cloud Storage connector
Which of the following is not a technique to speed up hyperparameter optimization?
Parallelize the problem across multiple machines by using distributed training with hyperparameter optimization.
Avoid redundant computations by pre‐computing or cache the results of computations that can be reused for subsequent model fits.
Use grid search rather than random search.
If you have a large dataset, use a simple validation set instead of cross‐validation.
Vertex AI Vizier is an independent service for optimizing complex models with many parameters. It can be used only for non‐ML use cases.
True
False
Which of the following is not a tool to track metrics when training a neural network?
Vertex AI interactive shell
What‐If Tool
Vertex AI TensorBoard Profiler
Vertex AI hyperparameter tuning
You are a data scientist working to select features with structured datasets. Which of the following techniques will help?
Sampled Shapley
Integrated gradient
XRAI (eXplanation with Ranked Area Integrals)
Gradient descent
Variable selection and avoiding target leakage are the benefits of feature importance.
True
False
A TensorFlow SavedModel is what you get when you call __________________. Saved models are stored as a directory on disk. The file within that directory, saved_model.pb, is a protocol buffer describing the functional tf.Graph.
tf.saved_model.save()
tf.Variables
tf.predict()
Tf.keras.models.load_model
What steps would you recommend a data engineer trying to deploy a TensorFlow model trained locally to set up real‐time prediction using Vertex AI? (Choose three.)
Import the model to Model Registry.
Deploy the model.
Create an endpoint for deployed model.
Create a model in Model Registry.
You are an MLOps engineer and you deployed a Kubeflow pipeline on Vertex AI pipelines. Which Google Cloud feature will help you track lineage with your Vertex AI pipelines?
Vertex AI Model Registry
Vertex AI Artifact Registry
Vertex AI ML metadata
Vertex AI Model Monitoring
What is not a recommended way to invoke a Kubeflow pipeline?
Using Cloud Scheduler
Responding to an event, using Pub/Sub and Cloud Functions
Cloud Composer and Cloud Build
Directly using BigQuery
You are a software engineer working at a start‐up that works on organizing personal photos and pet photos. You have been asked to use machine learning to identify and tag which photos have pets and also identify public landmarks in the photos. These features are not available today and you have a week to create a solution for this. What is the best approach?
Find the best cat/dog dataset and train a custom model on Vertex AI using the latest algorithm available. Do the same for identifying landmarks.
Find a pretrained cat/dog dataset (available) and train a custom model on Vertex AI using the latest deep neural network TensorFlow algorithm.
Use the cat/dog dataset to train a Vertex AI AutoML image classification model on Vertex AI. Do the same for identifying landmarks.
Vision AI already identifies pets and landmarks. Use that to see if it meets the requirements. If not, use the Vertex AI AutoML model.
You are building a product that will accurately throw a ball into the basketball net. This should work no matter where it is placed on the court. You have created a very large TensorFlow model (size more than 90 GB) based on thousands of hours of video. The model uses custom operations, and it has optimized the training loop to not have any I/O operations. What are your hardware options to train this model?
Use a TPU slice because the model is very large and has been optimized to not have any I/O operations.
Use a TPU pod because the model size is larger than 50 GB.
Use a GPU‐only instance.
Use a CPU‐only instance to build your model.
You work in the fishing industry and have been asked to use machine learning to predict the age of lobster based on size and color. You have thousands of images of lobster from Arctic fishing boats, from which you have extracted the size of the lobster that is passed to the model, and you have built a regression model for predicting age. Your model has performed very well in your test and validation data. Users want to use this model from their boats. What are your next steps? (Choose three.)
Deploy the model on Vertex AI, expose a REST endpoint.
Enable monitoring on the endpoint and see if there is any training‐serving skew and drift detection. The original dataset was only from Arctic boats.
Also port this model to BigQuery for batch prediction.
Enable Vertex AI logging and analyze the data in BigQuery.
You have built a custom model and deployed it in Vertex AI. You are not sure if the predictions are being served fast enough (low latency). You want to measure this by enabling Vertex AI logging. Which type of logging will give you information like time stamp and latency for each request?
Container logging
Time stamp logging
Access logging
Request‐response logging
You are part of a growing ML team in your company that has started to use machine learning to improve your business. You were initially building models using Vertex AI AutoML and providing the trained models to the deployment teams. How should you scale this?
Create a Python script to train multiple models using Vertex AI.
You are now in level 0, and your organization needs level 1 MLOps maturity. Automate the training using Vertex AI Pipelines.
You are in the growth phase of the organization, so it is important to grow the team to leverage more ML engineers.
Move to Vertex AI custom models to match the MLOps maturity level.
What is not a reason to use Vertex AI Feature Store?
It is a managed service.
It extracts features from images and videos and stores them.
All data is a time‐series, so you can track when the features values change over time.
The features created by the feature engineering teams are available during training time but not during serving time. So this helps in bridging that.
You are a data analyst in an organization that has thousands of insurance agents, and you have been asked to predict the revenue by each agent for the next quarter. You have the historical data for the last 10 years. You are familiar with all AI services on Google Cloud. What is the most efficient way to do this?
Build a Vertex AI AutoML forecast, deploy the model, and make predictions using REST API.
Build a Vertex AI AutoML forecast model, import the model into BigQuery, and make predictions using BigQuery ML.
Build a BigQuery ML ARIMA+ model using data in BigQuery, and make predictions in BigQuery.
Build a BigQuery ML forecast model, export the model to Vertex AI, and run a batch prediction in Vertex AI.
You are an expert in Vertex AI Pipelines, Vertex AI training, and Vertex AI deployment and monitoring. A data analyst team has built a highly accurate model, and this has been brought to you. Your manager wants you to make predictions using the model and use those predictions. What do you do?
Retrain the model on Vertex AI with the same data and deploy the model on Vertex AI as part of your CD.
Run predictions on BigQuery ML and export the predictions into GCS and then load into your pipeline.
Export the model from BigQuery into the Vertex AI model repository and run predictions in Vertex AI.
Download the BigQuery model, and package into a Vertex AI custom container and deploy it in Vertex AI.
Which of the following statements about Vertex AI and BigQuery ML is incorrect?
BigQueryML supports both unsupervised and supervised models.
BigQuery ML is very portable. Vertex AI supports all models trained on BigQuery ML.
Vertex AI model monitoring and logs data is stored in BigQuery tables.
BigQuery ML also has algorithms to predict recommendations for users.
A, B, C. In the case of time‐series data, the best way to perform a split is a time‐based split rather than a random split, to avoid data or label leakage. For more information, see Chapter 2.
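To make the time‐based split concrete, here is a minimal Python sketch, assuming 30 days of collected data in a pandas DataFrame (the column names and values are hypothetical):

    import pandas as pd

    # 30 days of collected events (made-up data).
    df = pd.DataFrame({
        "event_date": pd.date_range("2023-01-01", periods=30, freq="D"),
        "feature": range(30),
    })

    cutoff = df["event_date"].max()           # day 30
    train = df[df["event_date"] < cutoff]     # days 1-29 go to training
    valid = df[df["event_date"] == cutoff]    # day 30 is held out for validation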
A. In the case of an imbalanced class, precision‐recall curves (PR curves) are recommended for highly skewed domains. For more information, see Chapter 3.
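As an illustration, the following sketch (assuming scikit‐learn; the labels and scores are made up) compares the two metrics on a highly skewed label set, where AUC ROC can look deceptively strong:

    from sklearn.metrics import average_precision_score, roc_auc_score

    y_true = [0] * 95 + [1] * 5                                    # 5% positive class
    y_score = [0.1] * 90 + [0.55] * 5 + [0.4, 0.6, 0.7, 0.8, 0.9]  # model scores

    # average_precision_score summarizes the precision-recall curve (AUC PR).
    print("AUC PR :", average_precision_score(y_true, y_score))
    print("AUC ROC:", roc_auc_score(y_true, y_score))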
B. A feature cross, or synthetic feature, is created by multiplying (crossing) two or more features. It can be multiplying the same feature by itself [A * A] or multiplying the values of multiple features such as [A * B * C]. In machine learning, feature crosses are usually performed on one‐hot encoded features, for example, binned_latitude × binned_longitude. For more information, see Chapter 3.
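A quick sketch of the idea using pandas (the bin names are hypothetical): crossing two binned features yields one synthetic categorical feature, which is then one‐hot encoded:

    import pandas as pd

    df = pd.DataFrame({"binned_latitude": ["north", "south", "north"],
                       "binned_longitude": ["east", "east", "west"]})

    # Cross the two bins into one synthetic feature, then one-hot encode it.
    df["lat_x_lon"] = df["binned_latitude"] + "_" + df["binned_longitude"]
    crossed = pd.get_dummies(df["lat_x_lon"], prefix="cross")
    print(crossed)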
A, True. Cloud Pub/Sub creates a pipeline for streaming the data and Cloud Dataflow is used for data transformation. For more information, see Chapter 5.
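A minimal sketch of that pattern using the Apache Beam Python SDK, which Cloud Dataflow runs (the project and topic names are hypothetical):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (p
         | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
         | "Transform" >> beam.Map(lambda msg: msg.decode("utf-8").upper())  # example transform
         | "Print" >> beam.Map(print))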
A. If you want to perform ML using SQL, BigQuery ML is the right approach. For more information, see Chapter 5.
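For example, a SQL‐oriented data engineer could train a model entirely through the BigQuery client; a hedged sketch, where the dataset, table, and label column names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()
    client.query("""
        CREATE OR REPLACE MODEL `mydataset.churn_model`
        OPTIONS (model_type = 'logistic_reg',
                 input_label_cols = ['churned']) AS
        SELECT * FROM `mydataset.customers`
    """).result()  # blocks until the training job completes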
A, B, C. As stated in options A, B, and C, the advantages of using a managed dataset are integrated data labeling, lineage tracking, and automatic data splitting. For more information, see Chapter 5.
A. Cloud DLP uses all the mentioned techniques to obscure the PII data. For more information, see Chapter 6.
A. Cloud DLP only applies to data with a defined pattern for masking. If you have image data and a pattern of masking is not defined (for example, you want to redact faces from images), you would use Vision AI to identify the image and then redact the bounding box of the image using Python code. For more information, see Chapter 6.
A. You will use L1 when you are trying to reduce features and L2 when you are looking for a stable model. With vanishing gradients, the gradients for the lower layers (closer to the input) can become very small; when the gradients vanish toward 0 for the lower layers, these layers train very slowly or do not train at all. For more information, see Chapter 7.
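In Keras terms, the choice looks like the following sketch: L1 pushes weights to exactly zero (effectively removing features), while L2 only shrinks them toward zero for stability:

    import tensorflow as tf

    # L1: sparsifies weights, useful when reducing features in a large model.
    l1_layer = tf.keras.layers.Dense(
        64, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l1(0.01))

    # L2: shrinks weights smoothly, useful when you want a stable model.
    l2_layer = tf.keras.layers.Dense(
        64, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(0.01))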
A, B. Batch normalization and lower learning rate can help prevent exploding gradients. The ReLU activation function can help prevent vanishing gradients. For more information, see Chapter 7.
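A small Keras sketch of those mitigations: batch normalization between layers and a lower learning rate guard against exploding gradients (ReLU, shown as well, targets vanishing gradients):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128),
        tf.keras.layers.BatchNormalization(),  # re-centers and re-scales activations
        tf.keras.layers.Activation("relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # lower learning rate
        loss="mse")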
D. You will not use the Cloud Storage connector as the data is on premises. You would need a connector to move data directly to BigQuery. For more information, see Chapter 8.
D. The premise of the question is that you've moved the data to Cloud Storage for use. The Cloud Storage connector will allow you to use that data in your Hadoop/Spark jobs without it having to be moved onto the machines in the cluster. For more information, see Chapter 8.
C. You can improve performance by using a random search algorithm since it uses fewer trials. Options A, B, and D are all correct ways to improve optimization. For more information, see Chapter 8.
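The speedup comes from sampling a fixed number of trials instead of exhausting every grid combination; a scikit‐learn sketch (the parameter range is arbitrary):

    from scipy.stats import uniform
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(n_samples=200, random_state=0)
    search = RandomizedSearchCV(
        LogisticRegression(max_iter=1000),
        {"C": uniform(0.01, 10)},  # sampled distribution, not an enumerated grid
        n_iter=10, cv=3, random_state=0)
    search.fit(X, y)
    print(search.best_params_)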
B. Vertex AI Vizier is an independent service for optimizing complex models with many parameters. It can be used for both ML and non‐ML use cases. For more information, see Chapter 8.
D. Vertex AI hyperparameter tuning is not a tool to track metrics when training a neural network; rather, it is used for tuning hyperparameters. For more information, see Chapter 8.
A. Sampled Shapley is the only listed Explainable AI method that can help explain tabular or structured datasets. For more information, see Chapter 9.
A. Feature importance is a technique that explains the features that make up the training data using a score (importance). It indicates how useful or valuable a feature is relative to other features. For more information, see Chapter 9.
A. After TensorFlow model training, you get a SavedModel. A SavedModel contains a complete TensorFlow program, including trained parameters (i.e., tf.Variables) and computation. For more information, see Chapter 10.
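A minimal sketch of producing a SavedModel; the module and output path are hypothetical, but tf.saved_model.save() is the call the question refers to:

    import tensorflow as tf

    class Doubler(tf.Module):
        @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
        def __call__(self, x):
            return 2.0 * x  # trivial computation captured in the tf.Graph

    # Writes a directory containing saved_model.pb plus a variables/ subfolder.
    tf.saved_model.save(Doubler(), "/tmp/doubler_saved_model")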
A, B, C. You need to import your model into the Model Registry in Vertex AI, create an endpoint, and then deploy the model to that endpoint. For more information, see Chapter 10.
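Those steps map onto the Vertex AI SDK roughly as follows; a hedged sketch assuming the google-cloud-aiplatform library, with hypothetical project, bucket, and container values:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    # Import the trained model into the Vertex AI Model Registry.
    model = aiplatform.Model.upload(
        display_name="my-tf-model",
        artifact_uri="gs://my-bucket/model/",
        serving_container_image_uri=(
            "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest"))

    # Deploy, which creates (or reuses) an endpoint for real-time prediction.
    endpoint = model.deploy(machine_type="n1-standard-2")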
C. Vertex ML Metadata lets you record the metadata and artifacts produced by your ML system and query that metadata to help analyze, debug, and audit the performance of your ML system or the artifacts that it produces. For more information, see Chapter 10.
D. You cannot invoke a Kubeflow pipeline using BigQuery, as it is used for ETL workloads and not for MLOps. For more information, see Chapter 11.
D. The easiest approach is to use Vision AI as it is pretrained and already available. Options A, B, and C are all valid but they are unnecessarily complex given that Vision AI already achieves that. The key point to note is that you only have a week to do this task, so choose the fastest option. For more information, see Chapter 4.
C. TPUs cannot be used in this case because the model has custom TensorFlow operations, so options A and B are not valid. Option C is the best option because it is a large model, and using CPUs only is going to be very slow. For more information, see Chapter 4.
A, B, D. There is no need to port into BigQuery for batch processing. Based on the question, batch is not a requirement; only online prediction is a requirement. The other options, deploying the model on Vertex AI, creating an endpoint, and monitoring and logging, are valid. For more information, see Chapter 12.
C. Container logging gives you stderr and stdout from the container. Request‐response logging logs a sample of online predictions. There is no such thing as time stamp logging. Access logging is the correct answer. For more information, see Chapter 12.