Succeed on the AWS Machine Learning exam or in your next job as a machine learning specialist on the AWS Cloud platform with this hands-on guide.
As the most popular cloud service in the world today, Amazon Web Services offers a wide range of opportunities for those interested in the development and deployment of artificial intelligence and machine learning business solutions. The AWS Certified Machine Learning Study Guide: Specialty (MLS-C01) Exam delivers hyper-focused, authoritative instruction for anyone considering the pursuit of the prestigious Amazon Web Services Machine Learning certification or a new career as a machine learning specialist working within the AWS architecture. From exam to interview to your first day on the job, this study guide provides the domain-by-domain specific knowledge you need to build, train, tune, and deploy machine learning models with the AWS Cloud. And with the practice exams and assessments, electronic flashcards, and supplementary online resources that accompany this Study Guide, you'll be prepared for success in every subject area covered by the exam.
You'll also find:
* An intuitive and organized layout perfect for anyone taking the exam for the first time or seasoned professionals seeking a refresher on machine learning on the AWS Cloud
* Authoritative instruction on a widely recognized certification that unlocks countless career opportunities in machine learning and data science
* Access to the Sybex online learning resources and test bank, with chapter review questions, a full-length practice exam, hundreds of electronic flashcards, and a glossary of key terms
AWS Certified Machine Learning Study Guide: Specialty (MLS-C01) Exam is an indispensable guide for anyone seeking to prepare themselves for success on the AWS Certified Machine Learning Specialty exam or for a job interview in the field of machine learning, or who wishes to improve their skills in the field as they pursue a career in AWS machine learning.
Page count: 543
Year of publication: 2021
Cover
Title Page
Copyright
Dedication
Acknowledgments
About the Authors
About the Technical Editor
Introduction
The AWS Certified Machine Learning Specialty Exam
Who Should Buy This Book
Study Guide Features
AWS Certified Machine Learning Specialty Exam Objectives
Assessment Test
Answers to Assessment Test
PART I: Introduction
Chapter 1: AWS AI ML Stack
Amazon Rekognition
Amazon Textract
Amazon Transcribe
Amazon Translate
Amazon Polly
Amazon Lex
Amazon Kendra
Amazon Personalize
Amazon Forecast
Amazon Comprehend
Amazon CodeGuru
Amazon Augmented AI
Amazon SageMaker
AWS Machine Learning Devices
Summary
Exam Essentials
Review Questions
Chapter 2: Supporting Services from the AWS Stack
Storage
Amazon VPC
AWS Lambda
AWS Step Functions
AWS RoboMaker
Summary
Exam Essentials
Review Questions
PART II: Phases of Machine Learning Workloads
Chapter 3: Business Understanding
Phases of ML Workloads
Business Problem Identification
Summary
Exam Essentials
Review Questions
Chapter 4: Framing a Machine Learning Problem
ML Problem Framing
Recommended Practices
Summary
Exam Essentials
Review Questions
Chapter 5: Data Collection
Basic Data Concepts
Data Repositories
Data Migration to AWS
Summary
Exam Essentials
Review Questions
Chapter 6: Data Preparation
Data Preparation Tools
Summary
Exam Essentials
Review Questions
Chapter 7: Feature Engineering
Feature Engineering Concepts
Feature Engineering Tools on AWS
Summary
Exam Essentials
Review Questions
Chapter 8: Model Training
Common ML Algorithms
Local Training and Testing
Remote Training
Distributed Training
Monitoring Training Jobs
Debugging Training Jobs
Hyperparameter Optimization
Summary
Exam Essentials
Review Questions
Chapter 9: Model Evaluation
Experiment Management
Metrics and Visualization
Summary
Exam Essentials
Review Questions
Chapter 10: Model Deployment and Inference
Deployment for AI Services
Deployment for Amazon SageMaker
Advanced Deployment Topics
Summary
Exam Essentials
Review Questions
Chapter 11: Application Integration
Integration with On-Premises Systems
Integration with Cloud Systems
Integration with Front-End Systems
Summary
Exam Essentials
Review Questions
PART III: Machine Learning Well-Architected Lens
Chapter 12: Operational Excellence Pillar for ML
Operational Excellence on AWS
Summary
Exam Essentials
Review Questions
Chapter 13: Security Pillar
Security and AWS
Secure SageMaker Environments
AI Services Security
Summary
Exam Essentials
Review Questions
Chapter 14: Reliability Pillar
Reliability on AWS
Change Management for ML
Failure Management for ML
Summary
Exam Essentials
Review Questions
Chapter 15: Performance Efficiency Pillar for ML
Performance Efficiency for ML on AWS
Summary
Exam Essentials
Review Questions
Chapter 16: Cost Optimization Pillar for ML
Common Design Principles
Cost Optimization for ML Workloads
Summary
Exam Essentials
Review Questions
Chapter 17: Recent Updates in the AWS AI/ML Stack
New Services and Features Related to AI Services
New Features Related to Amazon SageMaker
Summary
Exam Essentials
Appendix Answers to the Review Questions
Chapter 1: AWS AI ML Stack
Chapter 2: Supporting Services from the AWS Stack
Chapter 3: Business Understanding
Chapter 4: Framing a Machine Learning Problem
Chapter 5: Data Collection
Chapter 6: Data Preparation
Chapter 7: Feature Engineering
Chapter 8: Model Training
Chapter 9: Model Evaluation
Chapter 10: Model Deployment and Inference
Chapter 11: Application Integration
Chapter 12: Operational Excellence Pillar for ML
Chapter 13: Security Pillar
Chapter 14: Reliability Pillar
Chapter 15: Performance Efficiency Pillar for ML
Chapter 16: Cost Optimization Pillar for ML
Index
End User License Agreement
Chapter 1
TABLE 1.1 Various features of SageMaker corresponding to the different phase...
Chapter 2
TABLE 2.1 AWS Lambda limits
Chapter 5
TABLE 5.1 Table of housing data
Chapter 8
TABLE 8.1 Services relevant to an end-to-end machine learning workflow that ...
Chapter 1
FIGURE 1.1 Document analysis with human review flow
FIGURE 1.2 Flow showing how to translate customer service calls followed by ...
FIGURE 1.3 The AppointmentBot can be built using Amazon Lex and backend ...
FIGURE 1.4 The end-to-end flow with Amazon Personalize (text on top) and how...
Chapter 2
FIGURE 2.1 Pattern for using FSx for Lustre with Amazon SageMaker for traini...
FIGURE 2.2 Architecture showing the use of VPC endpoints to connect to vario...
FIGURE 2.3 An example ML pipeline constructed using step functions that orch...
Chapter 3
FIGURE 3.1 Diagram showing the phases of the machine learning lifecycle
Chapter 5
FIGURE 5.1 Various data sources you can use with AWS Data Pipeline to land d...
FIGURE 5.2 Various data sources you can use with AWS DMS to land data in S3...
FIGURE 5.3 Conceptual diagram of Kinesis Data Streams
FIGURE 5.4 Conceptual diagram of Kinesis Data Firehose showing how data can ...
FIGURE 5.5 Diagram showing streaming data flow pattern for Kinesis Data Anal...
Chapter 6
FIGURE 6.1 Diagram showing SageMaker Ground Truth data labeling tool
FIGURE 6.2 Diagram showing AWS Glue as an ETL tool
Chapter 7
FIGURE 7.1 Diagram showing how you can deal with skewed distributions using ...
FIGURE 7.2 Diagram showing how you backtest on time series data
Chapter 8
FIGURE 8.1 Linear regression example. The error terms shown by the vertical ...
FIGURE 8.2 Example showing lack of constant variance. The error terms shown ...
FIGURE 8.3 Example showing violation of linearity assumption
FIGURE 8.4 SVM conceptual example showing the separatrix by the solid line a...
FIGURE 8.5 You can use a kernel SVM to separate the points. Although they cl...
FIGURE 8.6 Sequential learning of XGBoost to combine many weak learners into...
FIGURE 8.7 Possible output of a clustering analysis that splits data into th...
FIGURE 8.8 Elbow curve analysis of PCA to determine optimal number of cluste...
FIGURE 8.9 Values of N1 and N2 explored with grid search
FIGURE 8.10 Values of N1 and N2 explored with random search
Chapter 9
FIGURE 9.1 Data from a toy two-class classification problem
FIGURE 9.2 Example SVM hyperplane separating the two classes of data
FIGURE 9.3 Example ROC curve
FIGURE 9.4 Example showing comparison of two ROC curves by calculating the A...
FIGURE 9.5 Example precision vs. recall curve
Chapter 10
FIGURE 10.1 SageMaker real-time endpoints under the hood
FIGURE 10.2 SageMaker Batch transform under the hood
FIGURE 10.3 Re-create strategy showing how to stop endpoint A and start endp...
FIGURE 10.4 Ramped strategy showing how to gradually shift from endpoint A t...
Chapter 11
FIGURE 11.1 Typical architecture for connecting Amazon API Gateway to AWS La...
Chapter 12
FIGURE 12.1 Diagram showing the ML workflow with different ways you can use ...
FIGURE 12.2 Diagram showing a typical CI/CD workflow that can be used as par...
Chapter 13
FIGURE 13.1 Diagram showing the AWS shared responsibility model. Understand ...
FIGURE 13.2 Diagram showing tag-based controls that can be applied using att...
FIGURE 13.3 Authentication flow using SAML 2.0 to access the AWS console
FIGURE 13.4 Different IAM roles applicable to SageMaker
FIGURE 13.5 Private network traffic to SageMaker Studio
Shreyas Subramanian
Stefan Natu
Copyright © 2022 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada and the United Kingdom.
ISBN: 978-1-119-82100-7
ISBN: 978-1-119-82102-1 (ebk.)
ISBN: 978-1-119-82101-4 (ebk.)
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware the Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Control Number: 2021944004
Trademarks: WILEY, the Wiley logo, Sybex, and the Sybex logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. Amazon Web Services and AWS are trademarks of Amazon, Inc. or its affiliates in the United States and/or other countries. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
Cover image: © Jeremy Woodhouse/Getty Images
Cover design: Wiley
To our parents.
Although this book bears our names as authors, many other people contributed to its creation. Without their help, this book wouldn't exist, or at best would exist in a lesser form. Kenyon Brown was the acquisitions editor and so helped get the book started. Christine O'Connor, the managing editor, and Caroline Define, project manager, oversaw the book as it progressed through all its stages. Sonam Mishra was the technical editor, who checked the text for technical errors and omissions—but any mistakes that remain are our own. We would also like to thank Matt Wagner from FreshBooks, who helped connect us with Wiley to write this book. Finally, we would like to thank our wives for their patience as we spent many weekend hours researching the content and writing this book over the past 8 months.
Shreyas Subramanian has a PhD in multilevel systems optimization and the application of machine learning to large-scale optimization. He is currently a principal machine learning specialist at Amazon Web Services, and he has worked with several large-scale companies on their business-critical machine learning and optimization problems. Subramanian is passionate about simplifying difficult concepts within optimization, and he holds two patents in areas connected to aviation-related tools and techniques for improving efficiency and security of the airspace. He has also published over 20 conference and journal papers on the topics of aircraft design, evolutionary optimization, distributed optimization, and multilevel systems or systems optimization. He has several years of experience building machine learning and optimization models for customers ranging from large enterprises to small startups, while taking part in and winning hackathons on the side. Subramanian is passionate about teaching practical machine learning to citizen data scientists. He has trained hundreds of customers in private, hands-on environments and has helped customers build proofs-of-concept that are now in production, providing millions of dollars’ worth of revenue to the AWS business as well as customers.
Stefan Natu is a principal machine learning (ML) architect at Alexa AI, where he is building an ML platform for Alexa scientists and engineers. Prior to that, Natu was the lead ML architect at Amazon Web Services, where he focused on financial services and helped major investment banking, asset management, and insurance customers build and operationalize ML use cases on AWS, with an emphasis on security, enterprise data, and model governance. Natu has developed and evangelized common ML architecture and infrastructure patterns globally across AWS's highly regulated customers, leading to numerous production ML deployments and millions of dollars in AWS cloud revenue. He has authored over 25 AWS machine learning blogs, code samples, and whitepapers, and is a frequent speaker at conferences such as AWS re:Invent. He completed his PhD in atomic and condensed matter physics at Cornell University, and he worked as a research physicist at ExxonMobil, submitting two patents and over 25 peer-reviewed publications. Natu is passionate about mentorship and has served as a technical adviser at Insight Data Science, where he guided students in their transition from careers in academia to industry.
Sonam Mishra is an IT consultant with several years of experience in diverse roles ranging from software development, to application testing, to technical content creation. She is passionate about new and emerging technologies, particularly in the area of cloud computing. She lives in the United Kingdom with her family.
Machine learning (ML) is one of the most popular and rapidly growing fields in the technology industry today, with far-reaching business implications. The market for ML solutions and products is expected to grow annually by tens of billions of dollars, and with it, the demand for professionals who understand how to analyze data and build ML solutions is expected to grow as well.
ML is a highly technical field, and successful ML professionals need a foundation in mathematics, statistics, and data analysis. They must be able to code and have a fundamental understanding of infrastructure and software development best practices. In the past, the practitioners of machine learning were academics and PhDs, but the industry demand for ML is much larger than the supply of new PhDs emerging from academic institutions.
The purpose of this book is for you to understand the concepts and principles behind ML, with the practical goal of passing the AWS Certified Machine Learning Specialty exam. As practicing ML solution architects, we go well beyond the scope of the test in this book and incorporate architecture patterns and best practices that we have seen employed in the industry today. Reading this book will also give you an understanding of what is required to be a successful machine learning architect.
This is not a book on ML foundations. That is simply too vast a field for us to do it justice in this book and also is not our intention. There are a number of excellent textbooks and online resources you can use to develop a foundation on ML algorithms, deep learning, and similar topics. However, we will cover the concepts that you will need for the test.
Finally, one of our favorite leadership principles here at Amazon that widely applies to the solution architect role is learn and be curious. We have found that the best way to learn a topic is to get hands-on, and we highly recommend that you go beyond this book and get hands-on experience in ML. Download and explore some public datasets, and train some simple predictive models. Build a neural network from scratch using TensorFlow/PyTorch or just native Python. Explore AWS services such as Amazon SageMaker by running some of the sample Jupyter Notebooks. We highly recommend getting some hands-on knowledge before taking the test. Check out the AWS Training and Certification web page for helpful courses: www.aws.training.
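As one possible starting point for the hands-on practice suggested above, here is a minimal sketch in native Python of the simplest case of a neural network: a single sigmoid neuron (equivalent to logistic regression) trained with full-batch gradient descent. The toy AND-gate dataset, the function names, and the learning rate and epoch count are illustrative assumptions, not material from the exam or from AWS.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, epochs=5000, lr=1.0):
    """Train one sigmoid neuron with full-batch gradient descent.

    samples is a list of (features, label) pairs with labels 0 or 1.
    The loss is the mean cross-entropy; its gradient with respect to
    each weight is the mean of (prediction - label) * feature.
    """
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in samples:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        # Step against the averaged gradient.
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the neuron's output is at least 0.5, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical toy dataset: the logical AND of two binary inputs.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    weights, bias = train_neuron(AND_GATE)
    for x, y in AND_GATE:
        print(x, "->", predict(weights, bias, x), "(expected", y, ")")
```

A natural next exercise is to add a hidden layer, or to rebuild the same model in TensorFlow or PyTorch and compare the results.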
Don't just study the questions and answers! The questions on the actual exam will be different from the practice questions included in this book. The exam is designed to test your knowledge of a concept or objective, so use this book to learn the objectives behind the questions.
The ML space is maturing and growing very quickly; what this means is that our book is just a snapshot in time of our understanding of the industry and certification requirements. We highly recommend that you read the SageMaker home page to review the latest releases that may appear on the test.
The AWS Certified Machine Learning Specialty exam is intended for professionals who perform a data science or machine learning engineering role. The official details of the test can be found here: https://aws.amazon.com/certification/certified-machine-learning-specialty.
The test validates your understanding of foundational ML concepts, foundations of statistics, data analysis, exploration, feature engineering, and common ML algorithms. This is required knowledge for anyone performing this role in industry today. In addition, the certification focuses on your ability to deploy those solutions on AWS and to architect an end-to-end solution on AWS, from data ingestion to model deployment and monitoring, using a host of relevant AWS services for a given business use case.
There are several good reasons to get your AWS Certified Machine Learning certification:
It provides proof of professional achievement.
Certifications are quickly becoming status symbols in the computer service industry. Organizations, including members of the computer service industry, are recognizing the benefits of cloud certification such as the AWS Solution Architect Professional, Certified Security, and Advanced Networking Specialty. As ML becomes increasingly popular, these certifications provide proof of your understanding of ML and your ability to practically deploy ML solutions on AWS.
It provides an opportunity for advancement.
The solution architect role is one of the most coveted roles in the tech industry today due to the breadth and depth of the knowledge you gain, while having an outsized impact on customers’ business. The Machine Learning Specialty Certification could provide you with an opportunity to specialize in ML and become a practicing ML architect, a unique role that many employers are looking to hire.
It helps you develop an industry understanding of ML.
ML education is rapidly becoming a crowded space, with blogs, textbooks, and online courses that cover the foundations of ML, statistics and data science, and even ML tooling. However, there is no substitute for experience, and there isn't much material on actual industry use cases with solutions and best practices (with the exception of some fantastic tech blogs published by companies like Uber, Google, Netflix, Lyft, Airbnb, and many others). This book aims to fill some of that gap by providing you with a practical understanding of building real-life ML solutions on AWS.
It will satisfy your curiosity.
As technologists and technology enthusiasts, we are constantly learning new areas and expanding our knowledge. One of the best and most fulfilling reasons to take this certification is simply to satiate your curiosity to learn how to build ML solutions on AWS.
The AWS Certified Machine Learning Specialty exam is available to anyone and does not require other AWS certifications as prerequisites. It is recommended, however, that you have 1–2 years of experience developing and architecting ML and deep learning workloads on AWS prior to taking the test. Because it is a specialty certification, it also assumes prior foundational understanding of AWS services for storage, networking, security, databases, and so forth; however, these are not tested in detail.
The exam is administered by Pearson VUE and PSI. To take the test with PSI, register online at https://awsavailability.psiexams.com. To take the test with Pearson VUE, register online at https://home.pearsonvue.com/Clients/Amazon-Web-Services.aspx.
Exam policies can change from time to time. We highly recommend that you check both the PSI and Pearson VUE sites for the most up-to-date information when you begin preparing, when you register, and again a few days before your scheduled exam date.
Anybody who wants to pass the AWS Certified Machine Learning Specialty exam may benefit from this book. This book is also helpful for business and IT professionals who want to learn how ML is practically used in the industry and pivot their careers toward an ML-centric role such as a data scientist or ML engineer working on AWS. We include a number of practical case studies, industry best practices, and architecture patterns that we have seen used in industry today from our engagements with hundreds of AWS customers. This book is also essential for data scientists, engineers, and other data professionals who are curious about how you can build, train, and deploy models at scale on AWS.
This book assumes some familiarity with ML and with AWS. If you are completely new to machine learning, we recommend that you first learn some basic ML concepts since this book is mainly focused on the practical aspects of building ML solutions. There are several great resources that cover ML foundations, particularly for building statistical models and for deep learning. Two of our favorites are Aurélien Géron's Hands-on Machine Learning with Scikit-learn and TensorFlow (O'Reilly Publishing) and François Chollet's Deep Learning with Python (Manning, 2017). There are also several awesome blogs on Medium.com and TowardsDataScience.com. Finally, we also recommend a number of industry blogs from leading tech companies like Uber, Google, Facebook, Amazon, Airbnb, and others on how they deploy large-scale ML solutions, so that you gain a holistic understanding of the industry landscape in this space.
As a practical matter, you'll need a laptop or desktop with which to practice and learn in a hands-on way. This book does not cover labs, and there is no substitute for hands-on experience. Go get familiar with AWS ML services such as SageMaker, as well as the AI services, before taking the test. We also recommend that you explore some public datasets, engineer features, and train simple models as well as some deep learning models.
This study guide uses a number of common elements to help you prepare. These include the following:
Summaries
The summary section of each chapter briefly explains the chapter, allowing you to easily understand what it covers.
Exam Essentials
The Exam Essentials focus on major exam topics and critical knowledge that you should take into the test. They focus on the exam objectives provided by AWS.
Chapter Review Questions
A set of questions at the end of each chapter will help you assess your knowledge and whether you are ready to take the exam based on your knowledge of that chapter's topics.
The review questions, assessment test, and other testing elements included in this book are not derived from the actual exam questions, so don't memorize the answers to these questions and assume that doing so will enable you to pass the exam. You should learn the underlying topic, as described in the text of the book. This will let you answer the questions provided with this book and pass the exam. Learning the underlying topic is also the approach that will serve you best in the workplace—the ultimate goal of a certification.
We’ve worked hard to provide some really great tools to help you with your certification process. The interactive online learning environment that accompanies the AWS Certified Machine Learning Study Guide: Specialty (MLS-C01) Exam provides a test bank with study tools to help you prepare for the certification exam—and increase your chances of passing it the first time! The test bank includes the following:
Sample Tests: All the questions in this book are provided, including the assessment test at the end of this introduction and the review questions at the end of each chapter. In addition, there is a practice exam with 76 questions. Use these questions to test your knowledge of the study guide material. The online test bank runs on multiple devices.
Flashcards: The online test bank includes flashcards specifically written to challenge you, so don’t get discouraged if you don’t ace your way through them at first. They’re there to ensure that you’re really ready for the exam. And no worries: armed with the book, reference material, review questions, practice exams, and flashcards, you’ll be more than prepared when exam day comes. Questions are provided in digital flashcard format (a question followed by a single correct answer). You can use the flashcards to reinforce your learning and provide last-minute test prep before the exam.
Glossary: A glossary of key terms from this book is available as a fully searchable PDF.
Go to www.wiley.com/go/sybextestprep, register your book to receive your unique PIN, and then once you have the PIN, return to www.wiley.com/go/sybextestprep and register a new account or add this book to an existing account.
This book uses certain typographic styles in order to help you quickly identify important information and to avoid confusion over the meaning of words such as on-screen prompts. In particular, look for the following styles:
Italicized text indicates key terms that are described at length for the first time in a chapter. (Italics are also used for emphasis.)
A monospaced font indicates the contents of configuration files, messages displayed at a text-mode Linux shell prompt, filenames, text-mode command names, and Internet URLs.
Italicized monospaced text indicates a variable—information that differs from one system or command run to another, such as the name of a client computer or a process ID number.
Bold monospaced text is information that you're to type into the computer, such as at a shell prompt. This text can also be italicized to indicate that you should substitute an appropriate value for your system.
In addition to these text conventions, which can apply to individual words or entire paragraphs, a few conventions highlight segments of text:
A note indicates information that's useful or interesting but that's somewhat peripheral to the main text. A note might be relevant to a small number of networks, for instance, or it may refer to an outdated feature.
A tip provides information that can save you time or frustration and that may not be entirely obvious. A tip might describe how to get around a limitation or how to use a feature to perform an unusual task.
Warnings describe potential pitfalls or dangers. If you fail to heed a warning, you may end up spending a lot of time recovering from a bug, or you may even end up restoring your entire system from scratch.
A real-world scenario is a type of sidebar that describes a task or an example that's particularly grounded in the real world. This may be a situation we or somebody we know has encountered, or it may be advice on how to work around problems that are common in real, working ML environments.
AWS Certified Machine Learning Study Guide has been written to cover every AWS exam objective at a level appropriate to its exam weighting. The following table provides a breakdown of this book's exam coverage, showing you the weight of each section and the chapter where each objective or subobjective is covered:
Subject Area | % of Exam
Domain 1: Data Engineering | 20%
Domain 2: Exploratory Data Analysis | 24%
Domain 3: Modeling | 36%
Domain 4: Machine Learning Implementation and Operations | 20%
Total | 100%
Exam Objective
Chapter
1.1-1. Create data repositories for machine learning
5
Identify data sources
5
Determine storage mediums
2
,
5
Exam Objective | Chapter
1.2-1. Data job styles/types (batch load/streaming) | 6, 7
1.2-2. Data ingestion pipelines | 7
Kinesis | 7
Kinesis Analytics | 7
Kinesis Firehose | 7
EMR | 7
Glue | 7
1.2-3. Job scheduling | 7, 15
Exam Objective | Chapter
1.3-1. Transforming data transit (ETL: Glue, EMR, AWS Batch) | 6
1.3-2. Handle ML-specific data using map reduce (Hadoop, Spark, Hive) | 6, 7
Exam Objective | Chapter
2.1-1. Identify and handle missing data, corrupt data, stop words, etc. | 6
2.1-2. Formatting, normalizing, augmenting, and scaling data | 6
2.1-3. Labeled data (recognizing when you have enough labeled data and identifying mitigation strategies [Data labeling tools (Mechanical Turk, manual labor)]) | 1, 5, 6
Exam Objective | Chapter
2.2-1. Identify and extract features from datasets, including from data sources such as text, speech, image, public datasets, etc. | 7
2.2-2. Analyze/evaluate feature engineering concepts (binning, tokenization, outliers, synthetic features, One-hot encoding, reducing dimensionality of data) | 7
Exam Objective | Chapter
2.3-1. Graphing (scatter plot, time series, histogram, box plot) | 9
2.3-2. Interpreting descriptive statistics (correlation, summary statistics, p value) | 9
2.3-3. Clustering (hierarchical, diagnosing, elbow plot, cluster size) | 9
Exam Objective | Chapter
3.1-1. Determine when to use/when not to use ML | 3
3.1-2. Know the difference between supervised and unsupervised learning | 4
3.1-3. Selecting from among classification, regression, forecasting, clustering, recommendation, etc. | 4
Exam Objective | Chapter
3.2-1. XGBoost, logistic regression, K-means, linear regression, decision trees, random forests, RNN, CNN, Ensemble, Transfer learning | 8
3.2-2. Express intuition behind models | 8
Exam Objective | Chapter
3.3-1. Train validation test split, cross-validation | 6
3.3-2. Optimizer, gradient descent, loss functions, local minima, convergence, batches, probability, etc. | 8
3.3-3. Compute choice (GPU vs. CPU, distributed vs. non-distributed, platform [Spark vs. non-Spark]) | 12, 16
3.3-4. Model updates and retraining | 8, 12
Exam Objective | Chapter
3.4-1. Regularization | 8
3.4-2. Cross validation | 9
3.4-3. Model initialization | 8
3.4-4. Neural network architecture (layers/nodes), learning rate, activation functions | 8
3.4-5. Tree-based models (# of trees, # of levels) | 8
3.4-6. Linear models (learning rate) | 8
Exam Objective | Chapter
3.5-1. Avoid overfitting/underfitting (detect and handle bias and variance) | 9
3.5-2. Metrics (AUC-ROC, accuracy, precision, recall, RMSE, F1 score) | 9
3.5-3. Confusion matrix | 9
3.5-4. Offline and online model evaluation, A/B testing | 9
3.5-5. Compare models using metrics (time to train a model, quality of model, engineering costs) | 9
3.5-6. Cross validation | 9
Exam Objective | Chapter
4.1-1. AWS environment logging and monitoring | 8
CloudTrail and CloudWatch | 8
Build Error Monitoring | 8
4.1-2. Multiple regions, Multiple AZs | 14
4.1-3. Docker containers | 8
4.1-4. Auto Scaling groups | 10
4.1-5. Rightsizing | 8, 10, 12, 15
4.1-6. Load balancing | 10, 15
4.1-7. AWS best practices | 12, 13, 14, 15, 16
Exam Objective | Chapter
4.2-1. ML on AWS (application services) | 1
4.2-2. AWS service limits | 1, 2
4.2-3. Build your own model vs. SageMaker built-in algorithms | 8
4.2-4. Infrastructure: Instances types for ML and cost considerations | 16
Exam Objective | Chapter
4.3-1. IAM | 2, 13
4.3-2. S3 Bucket Policies | 2, 13
4.3-3. Security groups | 2, 13
4.3-4. VPC | 2, 13
4.3-5. Encryption/anonymization | 13
Exam Objective | Chapter
4.4-1. Exposing endpoints and interacting with them | 10, 11
4.4-2. ML model versioning | 8, 12
4.4-3. A/B testing | 10
4.4-4. Retrain pipelines | 15
4.4-5. ML debugging/troubleshooting | 12
Detect and mitigate drop in performance | 15
Monitor performance of the model | 15
Exam domains and objectives are subject to change at any time without prior notice and at AWS's sole discretion. Please visit their website (https://aws.amazon.com/certification/certified-machine-learning-specialty) for the most current information.
THE AWS CERTIFIED MACHINE LEARNING (ML) SPECIALTY EXAM OBJECTIVES COVERED IN THIS CHAPTER INCLUDE BUT ARE NOT LIMITED TO THE FOLLOWING:
Domain 3.0: Modeling
3.1 Frame business problems as machine learning problems
3.2 Select the appropriate model(s) for a given machine learning problem
Sample ML architectures for common business workflows such as video analysis, text mining, and others
Details about some common algorithms used in solving complex problems involving unstructured data like text and image
Domain 4.0: Machine Learning Implementation and Operations
4.2 Recommend and implement the appropriate machine learning services and features for a given problem
Details about algorithms for different ML use cases
Details about when to use the proper AWS AI/ML Service
In this chapter, you will learn about the AWS services for machine learning, starting with the artificial intelligence (AI) services for common machine learning (ML) tasks such as image and video analysis, natural language processing, text-to-speech and speech-to-text conversion, recommendation systems, and time-series forecasting. These services make it easy for you to build ML-powered applications without machine learning experience. You will then learn about Amazon SageMaker, a fully managed service that data scientists and machine learning developers use to build, train, and deploy ML models in the AWS Cloud for various business applications. For reference, the exam guide can be found at https://d1.awsstatic.com/training-and-certification/docs-ml/AWS-Certified-Machine-Learning-Specialty_Exam-Guide.pdf.
Amazon Rekognition is an AI service that makes it easy for users to add image or video analysis workflows to their applications. Amazon Rekognition draws on Amazon's vast experience in using deep learning for image-based workloads such as image classification, object detection, detection of text in images, facial recognition, sentiment detection, and, most recently, public safety.
Although there is a vast amount of deep learning research behind developing models for image analytics, training these deep learning models is often computationally expensive and can take several cycles of data scientist or developer time. That's where Amazon Rekognition comes in. With Amazon Rekognition, developers can simply leverage pretrained models or train custom machine learning models without having to worry about writing the algorithm code, or about setting up or managing the infrastructure to train and deploy a deep learning model. More importantly, you don't require any prior machine learning or deep learning knowledge to use this service.
Before diving into Amazon Rekognition, let's quickly survey the lay of the land on the subject of images and videos in deep learning. Image recognition typically relies on convolutional neural network (CNN) architectures. CNNs are deep learning models in which convolutional layers, which apply various filters to the input data to capture different information at different scales, alternate with pooling layers, which reduce the spatial size of the representation and with it the number of parameters in the network. The initial layers capture low-level features like edges and curves, whereas later layers build up to higher-level features and eventually identify the object. There are many popular CNN architectures, such as ResNet or Inception V4, but what matters here is understanding the basic concept.
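To make the convolution and pooling steps concrete, here is a minimal sketch in plain Python. It is a toy illustration only: the 5x5 image and the hand-picked vertical-edge filter are assumptions made for this example, whereas a real CNN learns its filters during training and runs on an optimized deep learning framework.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 5x5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked filter that responds where brightness increases from left to right.
vertical_edge = [[-1, 1],
                 [-1, 1]]

edges = conv2d(image, vertical_edge)   # 4x4 feature map, large only at the edge
pooled = max_pool(edges, size=2)       # 2x2 map: smaller, but the edge response survives

print(edges)   # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
print(pooled)  # [[2, 0], [2, 0]]
```

The filter produces a strong response only in the column where the edge sits, and pooling halves each spatial dimension while keeping that response. This is the low-level feature detection that the initial layers of a CNN perform.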
It is also useful to understand the concept of transfer learning. Transfer learning refers to taking a model that was pretrained on one dataset, freezing the initial layers, and retraining only the last few layers of the model on a different dataset. The benefits of this are that:
It is computationally less expensive than training a full neural network from scratch.
When you don't have a lot of data or data labeling is expensive, using a pretrained model can provide better model performance than training a model from scratch.
Both Inception V4 and ResNet models are popular algorithms for transfer learning in the image classification space. Transfer learning can be used in many deep learning applications—not just image or video data use cases, but also in natural language processing (NLP).
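The freeze-and-retrain idea can be sketched without any deep learning framework. In the toy example below, everything is an assumption made for illustration: the "pretrained" layer is a fixed pair of weights standing in for the early layers of a network such as ResNet, the trainable head is a simple logistic regression, and the four-point dataset is invented. In practice you would load real pretrained weights in a framework of your choice.

```python
import math

# Hypothetical "pretrained" layer; imagine these weights came from another dataset.
FROZEN_WEIGHTS = [[1.0, 1.0], [1.0, -1.0]]


def extract_features(x):
    """Frozen layer: linear map followed by ReLU. Never updated during fine-tuning."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def predict(head, x):
    """Trainable head: logistic regression on top of the frozen features."""
    feats = extract_features(x)
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], feats))
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(head, data, lr=0.1, epochs=200):
    """Stochastic gradient descent on log loss; only the head's parameters change."""
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x)
            error = predict(head, x) - y  # gradient of log loss w.r.t. the logit
            head["weights"] = [w - lr * error * f
                               for w, f in zip(head["weights"], feats)]
            head["bias"] -= lr * error
    return head


# Small, invented labeled dataset for the new task: label is 1 when x[0] > x[1].
new_data = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]

frozen_before = [row[:] for row in FROZEN_WEIGHTS]
head = fine_tune({"weights": [0.0, 0.0], "bias": 0.0}, new_data)
predictions = [int(predict(head, x) >= 0.5) for x, _ in new_data]

print(predictions)                      # predicted labels for the four training points
print(FROZEN_WEIGHTS == frozen_before)  # the frozen layer is untouched
```

Only three numbers (two head weights and a bias) are trained here, which is why transfer learning is cheaper and needs less labeled data than training every layer from scratch.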
For object detection, the fundamental architecture is similar, but instead of only assigning a fixed label to the whole image, such as cat versus dog, the model also predicts a bounding box that encloses each object of interest. Common algorithms used include single-shot detector (SSD), R-CNN or Faster R-CNN, and YOLO v4.
Finally, semantic segmentation segments the object of interest in an image by classifying whether or not each pixel belongs to the object. An example is detecting a tumor in human tissue. To be useful for doctors, it is not sufficient to draw a bounding box; you also need to accurately isolate the tumor from healthy tissue.
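The difference between a bounding box and a segmentation mask is easy to see on a small grid. The sketch below uses an invented 6x6 mask (1 marks a tumor pixel, 0 marks healthy tissue) and computes the tightest box around it; the mask values and function name are assumptions for illustration, not the output of any model.

```python
# Hypothetical per-pixel segmentation output: 1 = tumor, 0 = healthy tissue.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def bounding_box(mask):
    """Smallest axis-aligned box (top, left, bottom, right), inclusive, around all 1-pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


top, left, bottom, right = bounding_box(mask)
box_pixels = (bottom - top + 1) * (right - left + 1)
tumor_pixels = sum(sum(row) for row in mask)
healthy_inside_box = box_pixels - tumor_pixels

print((top, left, bottom, right))  # (1, 1, 4, 4)
print(box_pixels, tumor_pixels, healthy_inside_box)  # 16 8 8
```

Half of the pixels inside the tightest possible box are healthy tissue in this example, which is why a per-pixel mask is needed when the exact outline matters.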
You can use Amazon Rekognition with the following key use cases:
Image Labeling
This refers to labeling whether an image contains certain objects (popular objects in nature), events (party, graduation, etc.), concepts (landscape, nature, evening), or activities.
Custom Image Labeling
Imagine that you are a manufacturer and you need to detect whether or not parts on your assembly line are defective. Since your parts do not correspond to common objects found in nature, you may need to train a custom model. We will discuss this in more detail later, but Amazon Rekognition allows you to train a custom model for use cases of this kind.
Face Detection and Search
Amazon Rekognition can not only detect faces in images but also search for faces from an existing collection. Imagine you are a company that wants to implement face detection for your employees to access your corporate buildings. You can store pictures of your employees in a collection, and call Amazon Rekognition APIs to recognize employees from that collection.
People Paths
Amazon Rekognition can track the movement of people in a video. For example, you may want to track the movement of players on a field during a game for postprocessing, stats, and analytics for fans.
Text Detection
Amazon Rekognition can detect text in images and convert it to machine-readable text that you can use for downstream actions.
Celebrity Detection
Amazon Rekognition recognizes celebrities from images and stored videos.
Personal Protective Equipment (PPE)
Amazon Rekognition can now detect PPE on persons in an image.
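As a rough sketch of how two of these use cases look in code, the helpers below call the Rekognition `detect_labels` and `search_faces_by_image` operations through the AWS SDK for Python (boto3). The function names, default thresholds, region, bucket, key, and collection names are all assumptions for illustration; the face search also assumes that you have already created a collection and indexed your employees' faces into it.

```python
def make_client(region_name="us-east-1"):
    """Create a Rekognition client. Needs boto3 and AWS credentials; the region is an assumption."""
    import boto3  # imported lazily so the helpers below can be tried without AWS access
    return boto3.client("rekognition", region_name=region_name)


def s3_image(bucket, key):
    """Build the Image parameter for a picture stored in Amazon S3."""
    return {"S3Object": {"Bucket": bucket, "Name": key}}


def label_image(client, bucket, key, min_confidence=80.0, max_labels=10):
    """Image labeling with the pretrained model: returns (label name, confidence) pairs."""
    response = client.detect_labels(
        Image=s3_image(bucket, key),
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]


def find_employee(client, collection_id, bucket, key, threshold=90.0):
    """Face search: returns the ExternalImageId of the best match in the collection, or None."""
    response = client.search_faces_by_image(
        CollectionId=collection_id,
        Image=s3_image(bucket, key),
        FaceMatchThreshold=threshold,
        MaxFaces=1,
    )
    matches = response["FaceMatches"]
    return matches[0]["Face"].get("ExternalImageId") if matches else None


# Example usage (bucket, key, and collection names are placeholders):
# client = make_client()
# print(label_image(client, "my-bucket", "photos/party.jpg"))
# print(find_employee(client, "employees", "my-bucket", "door-camera/frame.jpg"))
```

Neither helper involves choosing an algorithm, training a model, or provisioning infrastructure, which is the point of the AI services. Note that the face search call raises an error if the input image contains no detectable face, so production code would need to handle that case.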
Look out for key phrases like “without any prior machine learning/deep learning knowledge” or “cost effective,” or for any of the use cases just described; these point to Amazon Rekognition as the solution.
If the question contains a phrase like “custom model,” unless it has to do with image labeling, usually Amazon Rekognition is not the answer.