Move your career forward with AWS certification! Prepare for the AWS Certified Data Analytics Specialty exam with this thorough study guide.
This comprehensive study guide will help you assess your technical skills and prepare for the updated AWS Certified Data Analytics exam. Earning this AWS certification will confirm your expertise in designing and implementing AWS services to derive value from data. The AWS Certified Data Analytics Study Guide: Specialty (DAS-C01) Exam is designed for business analysts and IT professionals who perform complex big data analyses. This AWS Specialty exam guide gets you ready for certification testing with expert content, real-world knowledge, key exam concepts, and topic reviews. Gain confidence by studying the subject areas and working through the practice questions.
Big data concepts covered in the guide include:
* Collection
* Storage
* Processing
* Analysis
* Visualization
* Data security
AWS certifications allow professionals to demonstrate skills related to leading Amazon Web Services technology. The AWS Certified Data Analytics Specialty (DAS-C01) exam specifically evaluates your ability to design and maintain big data solutions, leverage tools to automate data analysis, and implement AWS big data services according to architectural best practices. An exam study guide can help you feel more prepared to take an AWS certification test and advance your professional career. In addition to the guide's content, you'll have access to an online learning environment and test bank that offers practice exams, a glossary, and electronic flashcards.
Page count: 657
Year of publication: 2020
Cover
Title Page
Copyright
Dedication
Acknowledgments
About the Author
About the Technical Editor
Introduction
What Does This Book Cover?
Preparing for the Exam
Registering for the Exam
Studying for the Exam
The Night before the Exam
During the Exam
Interactive Online Learning Environment and Test Bank
Exam Objectives
Assessment Test
Answers to the Assessment Test
Chapter 1: History of Analytics and Big Data
Evolution of Analytics Architecture Over the Years
The New World Order
Analytics Pipeline
The Big Data Reference Architecture
Data Lakes and Their Relevance in Analytics
Building a Data Lake on AWS
Using Lake Formation to Build a Data Lake on AWS
Exam Objectives
Assessment Test
References
Chapter 2: Data Collection
Exam Objectives
AWS IoT
Amazon Kinesis
AWS Glue
Amazon SQS
AWS Database Migration Service
AWS Data Pipeline
Large-Scale Data Transfer Solutions
Summary
Review Questions
References
Exercises & Workshops
Chapter 3: Data Storage
Introduction
Amazon S3
Amazon S3 Glacier
Amazon DynamoDB
Amazon DocumentDB
Graph Databases and Amazon Neptune
Storage Gateway
Amazon EFS
Amazon FSx for Lustre
AWS Transfer for SFTP
Summary
Exercises
Review Questions
Further Reading
References
Chapter 4: Data Processing and Analysis
Introduction
Types of Analytical Workloads
Amazon Athena
Amazon EMR
Amazon Elasticsearch Service
Amazon Redshift
Kinesis Data Analytics
Comparing Batch Processing Services
Comparing Orchestration Options on AWS
Summary
Exam Essentials
Exercises
Review Questions
References
Chapter 5: Data Visualization
Introduction
Data Consumers
Data Visualization Options
Amazon QuickSight
Other Visualization Options
Predictive Analytics
Summary
Exam Essentials
Exercises
Review Questions
References
Additional Reading Material
Chapter 6: Data Security
Introduction
Shared Responsibility Model
Security Services on AWS
AWS IAM Overview
Amazon EMR Security
Amazon S3 Security
Amazon Athena Security
Amazon Redshift Security
Amazon Elasticsearch Security
Amazon Kinesis Security
Amazon QuickSight Security
Amazon DynamoDB Security
Summary
Exam Essentials
Exercises/Workshops
Review Questions
References and Further Reading
Appendix: Answers to Review Questions
Chapter 1: History of Analytics and Big Data
Chapter 2: Data Collection
Chapter 3: Data Storage
Chapter 4: Data Processing and Analysis
Chapter 5: Data Visualization
Chapter 6: Data Security
Index
Online Test Bank
Register and Access the Online Test Bank
End User License Agreement
Chapter 2
TABLE 2.1 ML algorithms with Amazon Kinesis
TABLE 2.2 Transformations available within AWS Glue
TABLE 2.3 Transferring 100 TB of data over an Internet connection
Chapter 3
TABLE 3.1 Same-region replication (SRR) vs. cross-region replication (CRR)
TABLE 3.2 Compression algorithms and relevant use cases
TABLE 3.3 Scalar data types in DynamoDB
TABLE 3.4 Document data types in DynamoDB
TABLE 3.5 Common data categories and use cases
Chapter 4
TABLE 4.1 Performance improvements with columnar formats
TABLE 4.2 Performance improvements with partitioned data
TABLE 4.3 Amazon EMR instance types
TABLE 4.4 Apache Hive Pros and Cons
TABLE 4.5 Apache Pig – simple data types
TABLE 4.6 Apache Pig – complex data types
TABLE 4.7 Apache Presto Pros and Cons
TABLE 4.8 Apache Spark Benefits
TABLE 4.9 Running Apache Spark on Amazon EMR
TABLE 4.10 Choosing the right analytics tool
TABLE 4.11 Comparison of Elasticsearch with an RDBMS
TABLE 4.12 Redshift instance types
TABLE 4.13 Redshift parameter options
TABLE 4.14 Redshift data types
TABLE 4.15 Network transmission in query processing
TABLE 4.16 Comparing batch services
TABLE 4.17 Comparing orchestration options
Chapter 5
TABLE 5.1 Key SPICE limits
TABLE 5.2 Amazon QuickSight visualization types
TABLE 5.3 Differences between Amazon QuickSight Standard and Enterprise editions
Chapter 6
TABLE 6.1 AWS security, identity, and compliance services
TABLE 6.2 AWS security credentials for accessing an AWS account
TABLE 6.3 Types of roles/terminology in AWS
TABLE 6.4 Managed security groups – Amazon EMR
TABLE 6.5 Encryption options – EMR
TABLE 6.6 S3 data encryption options
TABLE 6.7 Security options in Amazon QuickSight
Chapter 1
FIGURE 1.1 Traditional data warehousing setup in early to mid-2000s
FIGURE 1.2 Overview of an analytics pipeline
FIGURE 1.3 Business analytics spectrum
FIGURE 1.4 AWS analytics architecture
FIGURE 1.5 Data characteristics for hot, warm, and cold data
FIGURE 1.6 Typical steps in building a data lake
FIGURE 1.7 Building a data lake on AWS
FIGURE 1.8 Moving data into S3
FIGURE 1.9 AWS Glue
Chapter 2
FIGURE 2.1 AWS IoT device software services
FIGURE 2.2 AWS IoT control services
FIGURE 2.3 AWS IoT data services
FIGURE 2.4 AWS IoT - How it Works
FIGURE 2.5 Information half-life in decision-making
FIGURE 2.6 Kinesis Data Streams overview
FIGURE 2.7 Kinesis Data Streams data flow
FIGURE 2.8 Aggregation and collection with KPL
FIGURE 2.9 Data flow - S3 destination
FIGURE 2.10 Amazon Redshift as a destination for Kinesis Firehose
FIGURE 2.11 Amazon Elasticsearch Service as a destination for Kinesis Firehose
FIGURE 2.12 Splunk as a destination for Kinesis Firehose
FIGURE 2.13 Amazon Kinesis Data Analytics
FIGURE 2.14 Kinesis application creation via Console
FIGURE 2.15 Kinesis Data Analytics application
FIGURE 2.16 Connecting to a streaming source
FIGURE 2.17 Author Kinesis Data Analytics using SQL
FIGURE 2.18 Kinesis Data Analytics authoring screen
FIGURE 2.19 Kinesis Data Analytics - Flink Interface
FIGURE 2.20 Scaling your Flink applications with Kinesis
FIGURE 2.21 Working with Kinesis Video Analytics
FIGURE 2.22 Kinesis Video Streams
FIGURE 2.23 Glue flow
FIGURE 2.24 Tables defined in a sample catalog
FIGURE 2.25 Table details
FIGURE 2.26 Glue crawlers
FIGURE 2.27 Authoring jobs in AWS Glue
FIGURE 2.28 Amazon Simple Queue Service
FIGURE 2.29 AWS Database Migration Service
FIGURE 2.30 AWS Data Pipeline
FIGURE 2.31 AWS Snowball
FIGURE 2.32 AWS Direct Connect
Chapter 3
FIGURE 3.1 Importance of data storage in an analytics pipeline
FIGURE 3.2 Types of storage solutions provided by AWS
FIGURE 3.3 Types of AWS S3 storage classes provided by AWS
FIGURE 3.4 Amazon S3 Glacier – vault creation
FIGURE 3.5 DynamoDB global tables
FIGURE 3.6 DynamoDB Accelerator
FIGURE 3.7 Amazon DynamoDB Streams – Kinesis Adapter
FIGURE 3.8 Amazon DocumentDB architecture
FIGURE 3.9 File Storage Deployment – storage gateway
FIGURE 3.10 Amazon EFS use cases
FIGURE 3.11 Amazon EFS use cases
FIGURE 3.12 AWS Transfer for SFTP
Chapter 4
FIGURE 4.1 Apache Presto architecture
FIGURE 4.2 Amazon Athena federated query
FIGURE 4.3 HDFS architecture
FIGURE 4.4 Anatomy of a MapReduce job
FIGURE 4.5 YARN applications
FIGURE 4.6 Security configurations in EMR (Encryption)
FIGURE 4.7 Security configurations in EMR (Authentication and IAM Role for E...
FIGURE 4.8 Creating an EMR notebook
FIGURE 4.9 Apache Spark Framework overview
FIGURE 4.10 Apache Spark architecture
FIGURE 4.11 Apache Spark DStreams Overview
FIGURE 4.12 An undirected graph with seven vertices/nodes and nine edges
FIGURE 4.13 Setting up HBase using the Quick Options creation mode
FIGURE 4.14 Setting up HBase using Advanced Options
FIGURE 4.15 Apache Flink overview
FIGURE 4.16 Working of Amazon Elasticsearch service
FIGURE 4.17 Amazon Redshift architecture
FIGURE 4.18 Amazon Redshift AQUA architecture
FIGURE 4.19 Amazon Redshift workload management
FIGURE 4.20 Life of a query
FIGURE 4.21 Relationship between streams, segments, and steps
FIGURE 4.22 Redshift cluster VPCs
FIGURE 4.23 Four-tier hierarchy—encryption keys in Redshift
FIGURE 4.24 Working of Kinesis Data Analytics
FIGURE 4.25 Working of Kinesis Data Analytics
Chapter 5
FIGURE 5.1 Data and analytics consumption patterns
FIGURE 5.2 Amazon QuickSight overview
FIGURE 5.3 Frequency per opportunity stage
FIGURE 5.4 Amazon QuickSight supported data sources
FIGURE 5.5 Editing a dataset – Amazon QuickSight
FIGURE 5.6 Anomaly detection in Amazon QuickSight
FIGURE 5.7 Sharing dashboards in Amazon QuickSight
FIGURE 5.8 The Kibana dashboard on Amazon Elasticsearch
FIGURE 5.9 Kibana dashboard with sample e-commerce data
FIGURE 5.10 AWS Machine Learning Stack
Chapter 6
FIGURE 6.1 AWS shared responsibility model
FIGURE 6.2 Amazon EMR inside a public subnet
FIGURE 6.3 Amazon EMR inside a private subnet
FIGURE 6.4 Security options within Amazon EMR Console
FIGURE 6.5 Security Configurations Amazon EMR- Encryption options
FIGURE 6.6 EMR security configuration – Quick Options mode
FIGURE 6.7 Security options in Amazon EMR – advanced cluster setup
FIGURE 6.8 S3 Block Public Access settings
FIGURE 6.9 S3 encryption options
FIGURE 6.10 Federated access for Amazon Athena
FIGURE 6.11 Key hierarchy in Amazon Redshift
FIGURE 6.12 Amazon Elasticsearch network configuration
FIGURE 6.13 Fine-grained access control within a VPC sample workflow
FIGURE 6.14 Fine-grained access control with a public domain sample workflow
FIGURE 6.15 Web identity federation with Amazon DynamoDB
Asif Abbasi
Copyright © 2021 by John Wiley & Sons, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-1-119-64947-2
ISBN: 978-1-119-64944-1 (ebk.)
ISBN: 978-1-119-64945-8 (ebk.)
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Web site may provide or recommendations it may make. Further, readers should be aware that Internet Web sites listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services or to obtain technical support, please contact our Customer Care Department within the U.S. at (877) 762-2974, outside the U.S. at (317) 572-3993 or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2020938557
TRADEMARKS: Wiley, the Wiley logo, and the Sybex logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. AWS is a registered trademark of Amazon Technologies, Inc. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
To all my teachers, family members, and great friends who are constant sources of learning, joy, and a boost to happiness!
Writing acknowledgments is the hardest part of writing a book, because so many people and organizations have directly and indirectly influenced the writing process, and the last thing you want to do is fail to give credit where it is due. Here is my feeble attempt to recognize everyone who inspired and helped during the writing of this book. I sincerely apologize to anyone I have missed.
I would like to first acknowledge the great folks at AWS, who work super hard not only to produce great technology but also to create great content in the form of blogs, AWS re:Invent videos, and supporting guides that are a constant source of inspiration and learning. This book would not have been possible without tapping into some of the great resources produced by my extended AWS team. You guys rock! I owe it to every single employee within AWS; you are all continually raising the bar. I would have loved to name everyone here, but I have been told acknowledgments cannot be in the form of a book.
I would also like to thank John Streit, who was super supportive throughout the writing of the book. I would like to thank my specialist team across EMEA who offered support whenever required. You are some of the most gifted people I have worked with during my entire career.
I would also like to thank Wiley's great team, who were patient with me during the entire process, including Kenyon Brown, David Clark, Todd Montgomery, Saravanan Dakshinamurthy, Christine O'Connor, and Judy Flynn, in addition to the great content editing and production team.
Asif Abbasi is a specialist solutions architect at AWS focusing on data and analytics, currently working with customers across Europe, the Middle East, and Africa. Asif joined AWS in 2008 and has since been helping customers build, migrate, and optimize their analytics pipelines on AWS.
Asif has been working in the IT industry for over 20 years, with a core focus on data, and has worked with industry leaders in this space like Teradata, Cisco, and SAS prior to joining AWS. Asif authored a book on Apache Spark in 2017 and has been a regular reviewer of AWS data and analytics blogs.
Asif has a master's degree in computer science (Software Engineering) and business administration. Asif is currently living in Dubai, United Arab Emirates, with his wife, Hifza, and his children Fatima, Hassan, Hussain, and Aisha. When not working with customers, Asif spends most of his time with family and mentoring students in the area of data and analytics.
Todd Montgomery (Austin, Texas) is a senior data center networking engineer for a large international consulting company where he is involved in network design, security, and implementation of emerging data center and cloud-based technologies. He holds six AWS certifications, including the Data Analytics specialty certification. Todd holds a degree in Electronics Engineering and multiple certifications from Cisco Systems, Juniper Networks, and CompTIA. Todd also leads the Austin AWS certification meetup group. In his spare time, Todd likes motorsports, live music, and traveling.
Studying for any certification exam can seem daunting. AWS Certified Data Analytics Study Guide: Specialty (DAS-C01) Exam was designed and developed with relevant topics, questions, and exercises so that a cloud practitioner can focus their precious study time and effort on the right set of topics, at the right level of abstraction, and confidently take the AWS Certified Data Analytics – Specialty (DAS-C01) exam.
This study guide presents a set of topics around the data and analytics pipeline, covering data collection, data transformation, data storage and processing, data analytics, data visualization, and the security elements that encompass the pipeline. It also includes reference material, additional reading, and hands-on workshops that are highly recommended and will aid your overall learning experience.
This book covers topics you need to know to prepare for the AWS Certified Data Analytics – Specialty (DAS-C01) exam:
Chapter 1
: History of Analytics and Big Data
This chapter begins with a history of big data and its evolution over the years before discussing the analytics pipeline and the big data reference architecture. It also covers key architectural principles for an analytics pipeline and introduces the concept of data lakes, followed by AWS Lake Formation as a way to build them.
Chapter 2
: Data Collection
Data collection is typically the first step in an analytics pipeline. This chapter discusses the various services involved in data collection, ranging from services related to streaming data ingestion like Amazon Kinesis and Amazon SQS to mini-batch and large-scale batch transfers like AWS Glue, AWS Data Pipeline, and the AWS Snow family.
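To give a flavor of the hands-on work this chapter targets, here is a minimal boto3 sketch that writes a single record to a Kinesis data stream. The region, stream name, and payload are placeholders chosen purely for illustration, not values from the book.
```python
import json
import boto3

# Placeholders for illustration: region and stream name are assumptions.
kinesis = boto3.client("kinesis", region_name="us-east-1")

event = {"user_id": "u-123", "page": "/home"}

response = kinesis.put_record(
    StreamName="clickstream-events",        # hypothetical stream name
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["user_id"],          # records with the same key land on the same shard
)

print(response["ShardId"], response["SequenceNumber"])
```
The partition key determines which shard a record lands on, which matters for both ordering and throughput; the chapter covers these trade-offs in depth.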
Chapter 3
: Data Storage
Chapter 3
discusses the various storage options available on Amazon Web Services, including Amazon S3, Amazon S3 Glacier, Amazon DynamoDB, Amazon DocumentDB, Amazon Neptune, AWS Storage Gateway, Amazon EFS, Amazon FSx for Lustre, and AWS Transfer for SFTP. I discuss not only the different options but also the use cases for which each is suitable and when to choose one over another.
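As a small illustration of two of the storage services listed above, the following sketch stores an object in Amazon S3 and records its metadata in an Amazon DynamoDB table using boto3. The bucket name, table name, and attribute names are assumptions made only for this example.
```python
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")

BUCKET = "my-data-lake-bucket"               # hypothetical bucket
table = dynamodb.Table("media-metadata")     # hypothetical table with partition key "media_id"

# Store the raw object in S3...
with open("episode-001.mp4", "rb") as f:
    s3.put_object(Bucket=BUCKET, Key="raw/interviews/episode-001.mp4", Body=f)

# ...and keep queryable metadata about it in DynamoDB.
table.put_item(
    Item={
        "media_id": "episode-001",
        "title": "Guest interview 001",
        "s3_uri": f"s3://{BUCKET}/raw/interviews/episode-001.mp4",
    }
)
```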
Chapter 4
: Data Processing and Analysis
In
Chapter 4
, we will cover data processing and analysis technologies on the AWS stack, including Amazon Athena, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, and Amazon Kinesis Data Analytics, before wrapping up with a discussion of orchestration tools like AWS Step Functions, Apache Airflow, and AWS Glue workflow management. I'll also compare the processing technologies across common use cases and explain when to use which technology.
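For a taste of the serverless analysis covered in the chapter, here is a minimal sketch that runs a SQL query through Amazon Athena with boto3 and waits for it to finish. The database, table, and output location are assumptions used only for illustration.
```python
import time
import boto3

athena = boto3.client("athena")

# Hypothetical database/table and results bucket.
query = athena.start_query_execution(
    QueryString="SELECT page, COUNT(*) AS hits FROM clickstream GROUP BY page",
    QueryExecutionContext={"Database": "weblogs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
execution_id = query["QueryExecutionId"]

# Poll until the query leaves the QUEUED/RUNNING states.
state = "QUEUED"
while state in ("QUEUED", "RUNNING"):
    time.sleep(1)
    state = athena.get_query_execution(QueryExecutionId=execution_id)[
        "QueryExecution"]["Status"]["State"]

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
    print(f"Returned {len(rows) - 1} data rows")  # the first row holds the column headers
```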
Chapter 5
: Data Visualization
Chapter 5
will discuss the visualization options like Amazon QuickSight and other visualization options available on AWS Marketplace. I'll briefly touch on the AWS ML stack as that is also a natural consumer of analytics on the AWS stack.
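As one programmatic example of the visualization options discussed, the sketch below requests an embeddable URL for an existing Amazon QuickSight dashboard via boto3. The account ID and dashboard ID are placeholders, and the call assumes a dashboard has already been published and shared with the caller.
```python
import boto3

quicksight = boto3.client("quicksight")

# Both identifiers below are placeholders for illustration.
embed = quicksight.get_dashboard_embed_url(
    AwsAccountId="123456789012",
    DashboardId="sales-pipeline-dashboard",
    IdentityType="IAM",
    SessionLifetimeInMinutes=600,
)

# The returned URL is short-lived and can be placed in an iframe on a portal page.
print(embed["EmbedUrl"])
```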
Chapter 6
: Data Security
A major section of the exam is security considerations for the analytics pipeline, and hence I have dedicated a complete chapter to security, discussing IAM and security for each service available on the Analytics stack.
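To preview the kind of hands-on security configuration the chapter works through, here is a minimal boto3 sketch that enables default encryption and blocks public access for an S3 bucket. The bucket name is a placeholder, and many organizations would prefer SSE-KMS with a customer managed key instead of the SSE-S3 shown here.
```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-data-lake-bucket"  # hypothetical bucket name

# Default-encrypt new objects with SSE-S3 (AES-256).
s3.put_bucket_encryption(
    Bucket=BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)

# Block all forms of public access at the bucket level.
s3.put_public_access_block(
    Bucket=BUCKET,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```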
AWS offers multiple levels of certification for the AWS platform. The entry point is the foundational level, which consists of the AWS Certified Cloud Practitioner exam.
We then have the associate-level exams, which require at least one year of hands-on experience with the AWS platform. At the time of this writing, AWS offers three associate-level exams:
AWS Certified Solutions Architect Associate
AWS Certified SysOps Administrator Associate
AWS Certified Developer Associate
AWS then offers professional-level exams, which require the candidates to have at least two years of experience with designing, operating, and troubleshooting the solutions using the AWS cloud. At the time of this writing, AWS offers two professional exams:
AWS Certified Solutions Architect Professional
AWS Certified DevOps Engineer Professional
AWS also offers specialty exams, which are considered to be professional-level exams and require deep technical expertise in the area being tested. At the time of this writing, AWS offers six specialty exams:
AWS Certified Advanced Networking Specialty
AWS Certified Security Specialty
AWS Certified Alexa Skill Builder Specialty
AWS Certified Database Specialty
AWS Certified Data Analytics Specialty
AWS Certified Machine Learning Specialty
You are preparing for the AWS Certified Data Analytics Specialty exam, which covers the services discussed in this book. However, this book is not the "bible" on the exam; this is a professional-level exam, which means you will have to bring your A game if you are looking to pass. You will need hands-on experience with data analytics in general and AWS analytics services in particular. In this introduction, we will look at what you need to do to prepare for the exam and how to sit for the actual exam, and then we'll provide a sample exam that you can attempt before taking the real one.
Let's get started.
You can schedule any AWS exam by following this link bit.ly/PrepareAWSExam. If you don't have an AWS certification account, you can sign up for the account during the exam registration process.
You can choose an appropriate test delivery vendor like Pearson VUE or PSI, or opt for online proctoring. Search for the exam code DAS-C01 to register for the exam.
At the time of this writing, the exam costs $300, with the practice exam costing $40. The cost of the exam is subject to change.
While this book covers the data analytics landscape and the technologies tested in the exam, it alone is not enough for you to pass; you need the required practical knowledge to go with it. As a recommended practice, you should complement the material in each chapter with the practical exercises provided at the end of the chapter and the tutorials in the AWS documentation. Professional-level exams require hands-on knowledge of the concepts and tools that you are being tested on.
You should work through the following workshops before attempting the AWS Certified Data Analytics Specialty exam. At the time of this writing, they were available to the general public, and each provides good technical depth on the relevant technologies:
AWS DynamoDB Labs –
amazon-dynamodb-labs.com
Amazon Elasticsearch workshops –
deh4m73phis7u.cloudfront.net/log-analytics/mainlab
Amazon Redshift Modernization Workshop –
github.com/aws-samples/amazon-redshift-modernize-dw
Amazon Database Migration Workshop –
github.com/aws-samples/amazon-aurora-database-migration-workshop-reinvent2019
AWS DMS Workshop –
dms-immersionday.workshop.aws
AWS Glue Workshop –
aws-glue.analytics.workshops.aws.dev/en
Amazon Redshift Immersion Day –
redshift-immersion.workshop.aws
Amazon EMR with Service Catalog –
s3.amazonaws.com/kenwalshtestad/cfn/public/sc/bootcamp/emrloft.html
Amazon QuickSight Workshop –
d3akduqkn9yexq.cloudfront.net
Amazon Athena Workshop –
athena-in-action.workshop.aws
AWS Lake Formation Workshop –
lakeformation.aworkshop.io
Data Engineering 2.0 Workshop –
aws-dataengineering-day.workshop.aws/en
Data Ingestion and Processing Workshop –
dataprocessing.wildrydes.com
Incremental data processing on Amazon EMR –
incremental-data-processing-on-amazonemr.workshop.aws/en
Realtime Analytics and serverless datalake demos –
demostore.cloud
Serverless datalake workshop –
github.com/aws-samples/amazon-serverless-datalake-workshop
Voice-powered analytics –
github.com/awslabs/voice-powered-analytics
Amazon Managed Streaming for Kafka Workshop –
github.com/awslabs/voice-powered-analytics
AWS IoT Analytics Workshop –
s3.amazonaws.com/iotareinvent18/Workshop.html
Opendistro for Elasticsearch Workshop –
reinvent.aesworkshops.com/opn302
Data Migration (AWS Storage Gateway, AWS snowball, AWS DataSync) –
reinvent2019-data-workshop.s3-website-us-east-1.amazonaws.com
AWS Identity – Using Amazon Cognito for serverless consumer apps –
serverless-idm.awssecworkshops.com
Serverless data prep with AWS Glue –
s3.amazonaws.com/ant313/ANT313.html
AWS Step Functions –
step-functions-workshop.go-aws.com
S3 Security Settings and Controls –
github.com/aws-samples/amazon-s3-security-settings-and-controls
Data Sync and File gateway –
github.com/aws-samples/aws-datasync-migration-workshop
AWS Hybrid Storage Workshop –
github.com/aws-samples/aws-hybrid-storage-workshop
AWS also offers digital exam readiness training for the Data Analytics exam that can be attended online free of charge. The training is available at www.aws.training/Details/eLearning?id=46612. This 3.5-hour digital training course will help you with the following aspects of the exam:
Navigating the logistics of the examination process
Understanding the exam structure and question types
Identifying how questions relate to AWS data analytics concepts
Interpreting the concepts being tested by exam questions
Developing a personalized study plan to prepare for the exam
This is a good way to not only ensure that you have covered all important material for the exam, but also to develop a personalized plan to prepare for the exam.
Once you have studied for the exam, it's time to run through some mock questions. While the AWS exam readiness training will help you prepare, there is nothing better than sitting a mock exam and testing yourself under conditions similar to the real exam. AWS offers a practice exam, which I recommend you take at least a week before the actual exam to judge your readiness. Based on discussions with other test takers, if you score around 80 percent on the practice exam, you should feel fairly confident about taking the actual exam. Before the practice exam, however, make sure you work through the other tests available. We have included a couple of practice tests with this book, which should give you some indication of your readiness for the exam. Take each test in one complete sitting rather than over multiple days. Once you have done that, review every question, including the ones you answered correctly, and make sure you understand why each answer is correct. You may have answered a question correctly without understanding the concept it was testing, or you may have missed details that could have changed the answer.
You need to read through the reference material for each test to ensure that you've covered the necessary aspects required to pass the exam.
An AWS professional-level exam requires you to be on top of your game, and just like any professional player, you need to be well rested before the exam. I recommend getting eight hours of sleep the night before the exam. Regarding scheduling the exam, I am often asked what the best time is to take a certification exam. I personally like doing it early in the morning; however, you need to identify the time in the day when you feel most energetic. Some people are full of energy early in the morning, while others ease into the day and are at full throttle by midafternoon.
You should be well hydrated before you take the exam.
You have 170 minutes (2 hours, 50 minutes) to answer 68±3 questions, depending on how many unscored test questions you get during the exam. These test questions are used to improve the exam: new questions are introduced on a regular basis, and their pass rate determines whether they become valid scored questions. You have roughly two and a half minutes per question on average, with the majority of the questions running two to three paragraphs (almost one page) and offering at least four plausible choices. "Plausible" means that to a less experienced candidate all four choices will seem correct; however, there will be guidance in the question that makes one choice more correct than the others. This also means that you will spend most of the exam reading questions, occasionally twice, and if your reading speed is slow, you will find it hard to complete the entire exam.
Remember that while the exam does test your knowledge, I believe that it is also an examination of your patience and your focus.
You need to make sure that you go through not only the core material but also the reference material discussed in the book and that you run through the examples and workshops.
All the best with the exam!
I've worked hard to provide some really great tools to help you with your certification process. The interactive online learning environment that accompanies the AWS Certified Data Analytics Study Guide: Specialty (DAS-C01) Exam provides a test bank with study tools to help you prepare for the certification exam—and increase your chances of passing it the first time! The test bank includes the following:
Sample Tests
All the questions in this book are provided, including the assessment test at the end of this introduction and the review questions at the end of each chapter. In addition, there are two practice exams with 65 questions each. Use these questions to test your knowledge of the study guide material. The online test bank runs on multiple devices.
Flashcards
The online test bank includes more than 150 flashcards specifically written to hit you hard, so don't get discouraged if you don't ace your way through them at first. They're there to ensure that you're really ready for the exam. And no worries—armed with the reading material, reference material, review questions, practice exams, and flashcards, you'll be more than prepared when exam day comes. Questions are provided in digital flashcard format (a question followed by a single correct answer). You can use the flashcards to reinforce your learning and provide last-minute test prep before the exam.
Glossary
A glossary of key terms from this book is available as a fully searchable PDF.
Go to www.wiley.com/go/sybextestprep to register and gain access to this interactive online learning environment and test bank with study tools.
The AWS Certified Data Analytics—Specialty (DAS-C01) exam is intended for people who are performing a data analytics–focused role. This exam validates an examinee's comprehensive understanding of using AWS services to design, build, secure, and maintain analytics solutions that provide insight from data.
It validates an examinee's ability in the following areas:
Designing, developing, and deploying cloud-based solutions using AWS
Designing and developing analytical projects on AWS using the AWS technology stack
Designing and developing data pipelines
Designing and developing data collection architectures
Understanding the operational characteristics of collection systems
Selecting a collection system that handles the frequency, volume, and source of the data
Understanding the different approaches to data collection and how they differ in terms of data format, ordering, and compression
Designing optimal storage and data management systems to cater for data volume, variety, and velocity
Understanding the operational characteristics of analytics storage solutions
Understanding the access and retrieval patterns of data
Understanding appropriate data layout, schema, structure, and format
Understanding the data lifecycle based on usage patterns and business requirements
Determining the appropriate system for the cataloging of data and metadata
Identifying the most appropriate data processing solution based on business SLAs, data volumes, and cost
Designing a solution for transformation of data and preparing for further analysis
Automating appropriate data visualization solutions for a given scenario
Identifying appropriate authentication and authorization mechanisms
Applying data protection and encryption techniques
Applying data governance and compliance controls
Recommended AWS Knowledge
A minimum of 5 years of experience with common data analytics technologies
At least 2 years of hands-on experience working on AWS
Experience and expertise working with AWS services to design, build, secure, and maintain analytics solutions
The following list shows each domain and its weighting in the exam, along with the chapters in the book where that domain's objectives and subobjectives are covered.
Domain 1.0: Data Collection – 18% of exam – Chapters 1, 2, 3
1.1 – Determine the operational characteristics of the collection system.
1.2 – Select a collection system that handles the frequency, volume, and source of data.
1.3 – Select a collection system that addresses the key properties of the data, such as order, format, and compression.
Domain 2.0: Storage and Data Management – 22% of exam – Chapters 3, 4
2.1 – Determine the operational characteristics of the analytics storage solution.
2.2 – Determine data access and retrieval patterns.
2.3 – Select appropriate data layout, schema, structure, and format.
2.4 – Define data lifecycle based on usage patterns and business requirements.
2.5 – Determine the appropriate system for cataloguing data and managing metadata.
Domain 3.0: Processing – 24% of exam – Chapters 3, 4
3.1 – Determine appropriate data processing solution requirements.
3.2 – Design a solution for transforming and preparing data for analysis.
3.3 – Automate and operationalize data processing solutions.
Domain 4.0: Analysis and Visualization – 18% of exam – Chapters 3, 4, 5
4.1 – Determine the operational characteristics of the analysis and visualization layer.
4.2 – Select the appropriate data analysis solution for a given scenario.
4.3 – Select the appropriate data visualization solution for a given scenario.
Domain 5.0: Security – 18% of exam – Chapters 2, 3, 4, 5, 6
5.1 – Select appropriate authentication and authorization mechanisms.
5.2 – Apply data protection and encryption techniques.
5.3 – Apply data governance and compliance controls.
You have been hired as a solution architect for a large media conglomerate that wants a cost-effective way to store a large collection of recorded guest interviews as MP4 files, along with a data warehouse system to capture data from across the enterprise and provide access via BI tools. Which of the following is the most cost-effective solution for this requirement?
Store large media files in Amazon Redshift and metadata in Amazon DynamoDB. Use Amazon DynamoDB and Redshift to provide decision-making with BI tools.
Store large media files in Amazon S3 and metadata in Amazon Redshift. Use Amazon Redshift to provide decision-making with BI tools.
Store large media files in Amazon S3, and store media metadata in Amazon EMR. Use Spark on EMR to provide decision-making with BI tools.
Store media files in Amazon S3, and store media metadata in Amazon DynamoDB. Use DynamoDB to provide decision-making with BI tools.
Which of the following is a distributed data processing option on Apache Hadoop and was the main processing engine until Hadoop 2.0?
MapReduce
YARN
Hive
ZooKeeper
You are working as an enterprise architect for a large fashion retailer based out of Madrid, Spain. The team is looking to build ETL and has large datasets that need to be transformed. Data is arriving from a number of sources and hence deduplication is also an important factor. Which of the following is the simplest way to process data on AWS?
Load data into Amazon Redshift, and build transformations using SQL. Build custom deduplication script.
Use AWS Glue to transform the data using the built-in FindMatches ML Transform.
Load data into Amazon EMR, build Spark SQL scripts, and use custom deduplication script.
Use Amazon Athena for transformation and deduplication.
Which of these statements are true about AWS Glue crawlers? (Choose three.)
AWS Glue crawlers provide built-in classifiers that can be used to classify any type of data.
AWS Glue crawlers can connect to Amazon S3, Amazon RDS, Amazon Redshift, Amazon DynamoDB, and any JDBC sources.
AWS Glue crawlers provide custom classifiers, which provide the option to classify data that cannot be classified by built-in classifiers.
AWS Glue crawlers write metadata to AWS Glue Data Catalog.
You are working as an enterprise architect for a large player within the entertainment industry that has grown organically and by acquisition of other media players. The team is looking to build a central catalog of information that is spread across multiple databases (all of which have a JDBC interface), Amazon S3, Amazon Redshift, Amazon RDS, and Amazon DynamoDB tables. Which of the following is the most cost-effective way to achieve this on AWS?
Build scripts to extract the metadata from the different databases using native APIs and load them into Amazon Redshift. Build appropriate indexes and UI to support searching.
Build scripts to extract the metadata from the different databases using native APIs and load them into Amazon DynamoDB. Build appropriate indexes and UI to support searching.
Build scripts to extract the metadata from the different databases using native APIs and load them into an RDS database. Build appropriate indexes and UI to support searching.
Use AWS Glue crawlers to crawl the data sources and build a central catalog. Use the AWS Glue UI to support metadata searching.
You are working as a data architect for a large financial institution that has built its data platform on AWS. It is looking to implement fraud detection by identifying duplicate customer accounts and looking at when a newly created account matches one for a previously fraudulent user. The company wants to achieve this quickly and is looking to reduce the amount of custom code that might be needed to build this. Which of the following is the most cost-effective way to achieve this on AWS?
Build a custom deduplication script using Spark on Amazon EMR. Use PySpark to compare dataframes representing the new customers and fraudulent customers to identify matches.
Load the data to Amazon Redshift and use SQL to build deduplication.
Load the data to Amazon S3, which forms the basis of your data lake. Use Amazon Athena to build a deduplication script.
Load data to Amazon S3. Use AWS Glue FindMatches Transform to implement this.
Where is the metadata definition stored in the AWS Glue service?
Table
Configuration files
Schema
Items
AWS Glue provides an interface to Amazon SageMaker notebooks and Apache Zeppelin notebook servers. You can also open a SageMaker notebook from the AWS Glue console directly.
True
False
AWS Glue provides support for which of the following languages? (Choose two.)
SQL
Java
Scala
Python
You work for a large ad-tech company that routinely displays a set of predefined ads. Due to the popularity of your products, your website is becoming popular, garnering the attention of a diverse set of visitors. You are currently placing dynamic ads based on user click data, but you have discovered that your processing time cannot keep up with displaying new ads, since a user's stay on the website is short-lived (a few seconds) compared to your turnaround time for delivering a new ad (less than a minute). You have been asked to evaluate AWS platform services for a possible solution to analyze the problem and reduce overall ad-serving time. What is your recommendation?
Push the clickstream data to an Amazon SQS queue. Have your application subscribe to the SQS queue and write data to an Amazon RDS instance. Perform analysis using SQL.
Move the website to be hosted in AWS and use AWS Kinesis to dynamically process the user clickstream in real time.
Push web clicks to Amazon Kinesis Firehose and analyze with Kinesis Analytics or Kinesis Client Library.
Push web clicks to Amazon Kinesis Stream and analyze with Kinesis Analytics or Kinesis Client Library (KCL).
You work for a new startup that is building satellite navigation systems competing with the likes of Garmin, TomTom, Google Maps, and Waze. The company's key selling point is its ability to personalize the travel experience based on your profile and use your data to get you discounted rates at various merchants. The application is having huge success, and the company now needs to load some of the streaming data from other applications onto AWS, in addition to providing a secure and private connection from its on-premises data centers to AWS. Which of the following options will satisfy the requirements? (Choose two.)
AWS IoT Core
AWS IoT Device Management
Amazon Kinesis
Direct Connect
You work for a toy manufacturer whose assembly line contains GPS devices that track the movement of the toys on the conveyer belt and identify the real-time production status. Which of the following tools will you use on the AWS platform to ingest this data?
Amazon Redshift
Amazon Pinpoint
Amazon Kinesis
Amazon SQS
Which of the following refers to performing a single action on multiple items instead of repeatedly performing the action on each individual item in a Kinesis stream?
Batching
Collection
Aggregation
Compression
What is the term given to a sequence of data records in a stream in AWS Kinesis?
Batch
Group Stream
Consumer
Shard
You are working for a large telecom provider that has chosen the AWS platform for its data and analytics needs. The company has agreed to use a data lake, with S3 as the platform of choice for the data lake. It is receiving data generated by DPI (deep packet inspection) probes in near real time and is looking to ingest it into S3 in batches of 100 MB or 2 minutes, whichever comes first. Which of the following is an ideal choice for this use case without any additional custom implementation?
Amazon Kinesis Data Analytics
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Redshift
You are working for a car manufacturer that is using Apache Kafka for its streaming needs. Its core challenges are the scalability and manageability of its current Kafka infrastructure, which is hosted on premises, along with the escalating cost of the human resources required to manage the application. The company is looking to migrate its analytics platform to AWS. Which of the following is an ideal choice on the AWS platform for this migration?
Amazon Kinesis Data Streams
Apache Kafka on EC2 instances
Amazon Managed Streaming for Kafka
Apache Flink on EC2 instances
You are working for a large semiconductor manufacturer based out of Taiwan that is using Apache Kafka for its streaming needs. It is looking to migrate its analytics platform to AWS and Amazon Managed Streaming for Kafka and needs your help to right-size the cluster. Which of the following will be the best way to size your Kafka cluster? (Choose two.)
Lift and shift your on-premises cluster.
Use your on-premises cluster as a guideline.
Perform a deep analysis of usage, patterns, and workloads before coming up with a recommendation.
Use the MSK calculator for pricing and sizing.
You are running an MSK cluster that is running out of disk space. What can you do to mitigate the issue and avoid running out of space in the future? (Choose four.)
Create a CloudWatch alarm that watches the KafkaDataLogsDiskUsed metric.
Create a CloudWatch alarm that watches the KafkaDiskUsed metric.
Reduce message retention period.
Delete unused shards.
Delete unused topics.
Increase broker storage.
Which of the following services can act as sources for Amazon Kinesis Data Firehose?
Amazon Managed Streaming for Kafka
Amazon Kinesis Data Streams
AWS Lambda
AWS IoT
How does Kinesis Data Streams distribute data to different shards?
ShardId
Row hash key
Record sequence number
Partition key
How can you write data to a Kinesis Data Stream? (Choose three.)
Kinesis Producer Library
Kinesis Agent
Kinesis SDK
Kinesis Consumer Library
You are working for an up-and-coming e-commerce retailer that has seen its sales quadruple during the pandemic. It is looking to understand more about customer purchase behavior on its website and believes that analyzing clickstream data might provide insight into the time customers spend on the website. The clickstream data is being ingested in a streaming fashion with Kinesis Data Streams. The analysts are looking to rely on their advanced SQL skills, while management wants a serverless model that reduces TCO rather than requiring an upfront investment. What is the best solution?
Spark streaming on Amazon EMR
Amazon Redshift
AWS Lambda with Kinesis Data Streams
Kinesis Data Analytics
Which of the following writes data to a Kinesis stream?
Consumers
Producers
Amazon MSK
Shards
Which of the following statements are true about KPL (Kinesis Producer Library)? (Choose three.)
Writes to one or more Kinesis Data Streams with an automatic and configurable retry mechanism.
Aggregates user records to increase payload size.
Submits CloudWatch metrics on your behalf to provide visibility into producer performance.
Forces the caller application to block and wait for a confirmation.
KPL does not incur any processing delay and hence is useful for all applications writing data to a Kinesis stream.
RecordMaxBufferedTime within the library is set to 1 millisecond and not changeable.
Which of the following is true about Kinesis Client Library? (Choose three.)
KCL is a Java library and does not support other languages.
KCL connects to the data stream and enumerates the shards within the data stream.
KCL pulls data records from the data stream.
KCL does not provide a checkpointing mechanism.
KCL instantiates a record processor for each stream.
KCL pushes the records to the corresponding record processor.
Which of the following metrics are sent by the Amazon Kinesis Data Streams agent to Amazon CloudWatch? (Choose three.)
MBs Sent
RecordSendAttempts
RecordSendErrors
RecordSendFailures
ServiceErrors
ServiceFailures
You are working as a data engineer for a gaming startup, and the operations team notified you that they are receiving a ReadProvisionedThroughputExceeded error. They are asking you to help identify the reason for the issue and assist in the resolution. Which of the following statements will help? (Choose two.)
The GetRecords calls are being throttled by KinesisDataStreams over a duration of time.
The GetShardIterator is unable to get a new shard over a duration of time.
Reshard your stream to increase the number of shards.
Redesign your stream to increase the time between checks for the provisioned throughput to avoid the errors.
You are working as a data engineer for a microblogging website that is using Kinesis for streaming weblog data. The operations team notified you that they are experiencing an increase in latency when fetching records from the stream. They are asking you to help identify the reason for the issue and assist in the resolution. Which of the following statements will help? (Choose three.)
There is an increase in record count resulting in an increase in latency.
There is an increase in the size of the record for each GET request.
There is an increase in the shard iterator's latency resulting in an increase in record fetch latency.
Increase the number of shards in your stream.
Decrease the stream retention period to catch up with the data backlog.
Move the processing to MSK to reduce latency.
Which of the following is true about rate limiting features on Amazon Kinesis? (Choose two.)
Rate limiting is not possible within Amazon Kinesis and you need MSK to implement rate limiting.
Rate limiting is only possible through Kinesis Producer Library.
Rate limiting is implemented using tokens and buckets within Amazon Kinesis.
Rate limiting uses standard counter implementation.
Rate limiting threshold is set to 50 percent and is not configurable.
What is the default data retention period for a Kinesis stream?
12 hours
168 hours
30 days
365 days
Which of the following options help improve efficiency with Kinesis Producer Library? (Choose two.)
Aggregation
Collection
Increasing number of shards
Reducing overall encryption
Which of the following services are valid destinations for Amazon Kinesis Firehose? (Choose three.)
Amazon S3
Amazon SageMaker
Amazon Elasticsearch
Amazon Redshift
Amazon QuickSight
AWS Glue
Which of the following is a valid mechanism to do data transformations from Amazon Kinesis Firehose?
AWS Glue
Amazon SageMaker
Amazon Elasticsearch
AWS Lambda
Which of the following are valid target formats for record format conversion in the Amazon Kinesis Data Firehose console? (Choose two.)
Apache Parquet
Apache ORC
Apache Avro
Apache Pig
You are working as a data engineer for a mid-sized boating company that is capturing data in real time from all of its boats connected via a 3G/4G connection. The boats typically sail in areas with good connectivity, and data loss between the IoT devices on the boats and the Kinesis stream is not possible. You are monitoring the data arriving from the stream and have realized that some of the records are being missed. What could be the underlying cause of the skipped records?
The connectivity from the boat to AWS is the reason for missed records.
processRecords() is throwing exceptions that are not being handled and hence the missed records.
The shard is already full and hence the data is being missed.
The record length is more than expected.
How does Kinesis Data Firehose handle server-side encryption? (Choose three.)
Kinesis Data Firehose does not support server-side encryption.
Kinesis Data Firehose server-side encryption depends on the data source.
Kinesis Data Firehose does not store the unencrypted data at rest when the data source is a Kinesis Data stream encrypted by AWS KMS.
Kinesis Data Firehose stores the unencrypted data to S3 when the data source is a Kinesis Data stream encrypted by AWS KMS.
When data is delivered using Direct PUT, you can start encryption by using StartDeliveryStreamEncryption.
When data is delivered using Direct PUT, you can start encryption by using StartKinesisFirehoseEncryption.
How can you start an AWS Glue job automatically after the completion of a crawler? (Choose two.)
Use AWS Glue triggers to start a job when the crawler run completes.
Create an AWS Lambda function using an Amazon CloudWatch Events rule.
Use AWS Glue workflows.
This is not possible. You have to run it manually.
You are working as a consultant for an advertising agency that has hired a number of data scientists who are working to improve the company's online and offline campaigns and are using AWS Glue for most of their data engineering workloads. The data scientists have broad experience with ad-tech workloads and, before joining the team, developed Python libraries that they would like to use in AWS Glue. How can they use the external Python libraries in an AWS Glue job? (Choose two.)
Package the libraries in a .tar file, and upload to Amazon S3.
Package the libraries in a .zip file, and upload to Amazon S3.
Use the library in a job or job run.
Unzip the compressed file programmatically before using the library in the job or job run.
You are working as a consultant for a large conglomerate that has recently acquired another company. It is looking to integrate the applications using a messaging system and it would like the applications to remain decoupled but still be able to send messages. Which of the following is the most cost-effective and scalable service to achieve the objective?
Apache Flink on Amazon EMR
Amazon Kinesis
Amazon SQS
AWS Glue streaming.
What types of queues does Amazon SQS support? (Choose two.)
Standard queue
FIFO queue
LIFO queue
Advanced queue
You are working as a data engineer for a telecommunications operator that is using DynamoDB for its operational data store. The company is looking to use AWS Data Pipeline for workflow orchestration and needs to send some SNS notifications as soon as an order is placed and a record is available in the DynamoDB table. What is the best way to handle this?
Configure a lambda function to keep scanning the DynamoDB table. Send an SNS notification once you see a new record.
Configure Amazon DynamoDB streams to orchestrate AWS Data Pipeline kickoff.
Configure an AWS Glue job that reads the DynamoDB table to trigger an AWS Data Pipeline job.
Use the preconditions available in AWS Data Pipeline.
You have been consulting on the AWS analytics platform for some years now. One of your top customers has reached out to you to understand the best way to export data from its DynamoDB table to its data lake on S3. The customer is looking to keep the cost to a minimum and ideally not involve consulting expertise at this moment. What is the easiest way to handle this?
Export the data from Amazon DynamoDB to Amazon S3 using EMR custom scripts.
Build a custom lambda function that scans the data from DynamoDB and writes it to S3.
Use AWS Glue to read the DynamoDB table and use AWS Glue script generation to generate the script for you.
Use AWS Data Pipeline to copy data from DynamoDB to Amazon S3.
You have built your organization's data lake on Amazon S3. You are looking to capture and track all requests made to an Amazon S3 bucket. What is the simplest way to enable this?
Use Amazon Macie.
Use Amazon CloudWatch.
Use AWS CloudTrail.
Use Amazon S3 server access logging.
Your customer has recently received multiple 503 Slow Down errors during the Black Friday sale while ingesting data to Amazon S3. What could be the reason for this error?
Amazon S3 is unable to scale to the needs of your data ingestion patterns.
This is an application-specific error originating from your web application and has nothing to do with Amazon S3.
You are writing lots of objects per prefix. Amazon S3 is scaling in the background to handle the spike in traffic.
You are writing large objects resulting in this error from Amazon S3.
Which of the following is a fully managed NoSQL service?
Amazon Redshift
Amazon Elasticsearch
Amazon DynamoDB
Amazon DocumentDB
Your customer is using Amazon DynamoDB for its operational use cases. One of its engineers has accidentally deleted 10 records. Which of the following is a valid statement when it comes to recovering Amazon DynamoDB data?
Use backups from Amazon S3 to re-create the tables.
Use backups from Amazon Redshift to re-create the tables.
Use data from a different region.
Use Amazon DynamoDB PITR to recover the deleted data.
Which of the following scenarios suit a provisioned scaling mode for DynamoDB? (Choose two.)
You have predictable application traffic.
You are running applications whose traffic is consistent or ramps up gradually.
You are unable to forecast your capacity requirements.
You prefer a pay-as-you-go pricing model.
Which of the following statements are true about primary keys in DynamoDB? (Choose two.)
A table's primary key can be defined after the table creation.
DynamoDB supports two types of primary keys only.
A composite primary key is the same as a combination of partition key and sort key.
DynamoDB uses a sort key as an input to an internal hash function, the output of which determines the partition where the item is stored.
You are working as a data engineer for a large corporation that is using DynamoDB to power its low-latency application requests. The application is based on a customer orders table that is used to provide information about customer orders for a specific customer ID. A new requirement has recently arisen to identify customers based on a specific product ID, which you decided to implement as a secondary index. The application engineering team has recently complained about the performance they are getting from the secondary index. Which of the following is the most common reason for the performance degradation of a secondary index in DynamoDB?
The application engineering team is querying data for projected attributes.
The application engineering team is querying data not projected in the secondary index.
The application engineering team is querying a partition key that is not part of the local secondary index.
The application engineering team is querying data for a different sort key value.
Your customer is looking to reduce its on-premises storage spend while ensuring low latency for the application, which depends on a subset of the entire dataset. The customer is happy with the characteristics of Amazon S3. Which of the following would you recommend?
Cached volumes
Stored volumes
File gateway
Tape gateway
Your customer is looking to reduce its on-premises storage spend while ensuring low latency for the application, which depends on the entire dataset. The customer is happy with the characteristics of Amazon S3. Which of the following would you recommend?
Cached volumes
Stored volumes
File gateway
Tape gateway
You are working as a consultant for a telecommunications company. The data scientists have requested direct access to the data so they can dive deep into its structure and build models, and they have good knowledge of SQL. Which of the following tools will you choose to give them direct access to the data on Amazon S3 while minimizing infrastructure and maintenance overhead?
Amazon S3 Select
Amazon Athena
Amazon Redshift
Apache Presto on Amazon EMR
Which of the following file formats are supported by Amazon Athena? (Choose three.)
Apache Parquet
CSV
DAT
Apache ORC
Apache AVRO
TIFF
Your EMR cluster is facing performance issues. You are looking to investigate the errors and understand the potential performance problems on the nodes. Which of the following node types can you skip during your investigation?
Master node
Core node
Task node
Leader node
Which of the following statements are true about Redshift leader nodes? (Choose three.)
Redshift clusters can have a single leader node.
Redshift clusters can have more than one leader node.
Redshift Leader nodes should have more memory than the compute nodes.
Redshift Leader nodes have the exact same specifications as the compute nodes.
You can choose your own Leader node sizing, and it is priced separately.
Redshift leader node is chosen automatically and is free to the users.
