
AWS Certified Data Analytics Study Guide E-Book

Asif Abbasi

Description

Move your career forward with AWS certification! Prepare for the AWS Certified Data Analytics Specialty exam with this thorough study guide.

This comprehensive study guide will help you assess your technical skills and prepare for the updated AWS Certified Data Analytics exam. Earning this AWS certification will confirm your expertise in designing and implementing AWS services to derive value from data. The AWS Certified Data Analytics Study Guide: Specialty (DAS-C01) Exam is designed for business analysts and IT professionals who perform complex Big Data analyses. This AWS Specialty exam guide gets you ready for certification testing with expert content, real-world knowledge, key exam concepts, and topic reviews. Gain confidence by studying the subject areas and working through the practice questions.

Big data concepts covered in the guide include:

* Collection
* Storage
* Processing
* Analysis
* Visualization
* Data security

AWS certifications allow professionals to demonstrate skills related to leading Amazon Web Services technology. The AWS Certified Data Analytics Specialty (DAS-C01) exam specifically evaluates your ability to design and maintain Big Data solutions, leverage tools to automate data analysis, and implement AWS Big Data services according to architectural best practices. An exam study guide can help you feel more prepared to take an AWS certification test and advance your professional career. In addition to the guide's content, you'll have access to an online learning environment and test bank that offers practice exams, a glossary, and electronic flashcards.


Page count: 657

Year of publication: 2020




Table of Contents

Cover

Title Page

Copyright

Dedication

Acknowledgments

About the Author

About the Technical Editor

Introduction

What Does This Book Cover?

Preparing for the Exam

Registering for the Exam

Studying for the Exam

The Night before the Exam

During the Exam

Interactive Online Learning Environment and Test Bank

Exam Objectives

Assessment Test

Answers to the Assessment Test

Chapter 1: History of Analytics and Big Data

Evolution of Analytics Architecture Over the Years

The New World Order

Analytics Pipeline

The Big Data Reference Architecture

Data Lakes and Their Relevance in Analytics

Building a Data Lake on AWS

Using Lake Formation to Build a Data Lake on AWS

Exam Objectives

Assessment Test

References

Chapter 2: Data Collection

Exam Objectives

AWS IoT

Amazon Kinesis

AWS Glue

Amazon SQS

Amazon Data Migration Service

AWS Data Pipeline

Large-Scale Data Transfer Solutions

Summary

Review Questions

References

Exercises & Workshops

Chapter 3: Data Storage

Introduction

Amazon S3

Amazon S3 Glacier

Amazon DynamoDB

Amazon DocumentDB

Graph Databases and Amazon Neptune

Storage Gateway

Amazon EFS

Amazon FSx for Lustre

AWS Transfer for SFTP

Summary

Exercises

Review Questions

Further Reading

References

Chapter 4: Data Processing and Analysis

Introduction

Types of Analytical Workloads

Amazon Athena

Amazon EMR

Amazon Elasticsearch Service

Amazon Redshift

Kinesis Data Analytics

Comparing Batch Processing Services

Comparing Orchestration Options on AWS

Summary

Exam Essentials

Exercises

Review Questions

References

Chapter 5: Data Visualization

Introduction

Data Consumers

Data Visualization Options

Amazon QuickSight

Other Visualization Options

Predictive Analytics

Summary

Exam Essentials

Exercises

Review Questions

References

Additional Reading Material

Chapter 6: Data Security

Introduction

Shared Responsibility Model

Security Services on AWS

AWS IAM Overview

Amazon EMR Security

Amazon S3 Security

Amazon Athena Security

Amazon Redshift Security

Amazon Elasticsearch Security

Amazon Kinesis Security

Amazon QuickSight Security

Amazon DynamoDB Security

Summary

Exam Essentials

Exercises/Workshops

Review Questions

References and Further Reading

Appendix: Answers to Review Questions

Chapter 1: History of Analytics and Big Data

Chapter 2: Data Collection

Chapter 3: Data Storage

Chapter 4: Data Processing and Analysis

Chapter 5: Data Visualization

Chapter 6: Data Security

Index

Online Test Bank

Register and Access the Online Test Bank

End User License Agreement

List of Tables

Chapter 2

TABLE 2.1 ML algorithms with Amazon Kinesis

TABLE 2.2 Transformations available within AWS Glue

TABLE 2.3 Transferring 100 TB of data over an Internet connection

Chapter 3

TABLE 3.1 Same-region replication (SRR) vs. cross-region replication (CRR)

TABLE 3.2 Compression algorithms and relevant use cases

TABLE 3.3 Scalar data types in DynamoDB

TABLE 3.4 Document data types in DynamoDB

TABLE 3.5 Common data categories and use cases

Chapter 4

TABLE 4.1 Performance improvements with columnar formats

TABLE 4.2 Performance improvements with partitioned data

TABLE 4.3 Amazon EMR instance types

TABLE 4.4 Apache Hive Pros and Cons

TABLE 4.5 Apache Pig – simple data types

TABLE 4.6 Apache Pig – complex data types

TABLE 4.7 Apache Presto Pros and Cons

TABLE 4.8 Apache Spark Benefits

TABLE 4.9 Running Apache Spark on Amazon EMR

TABLE 4.10 Choosing the right analytics tool

TABLE 4.11 Comparison of Elasticsearch with an RDBMS

TABLE 4.12 Redshift instance types

TABLE 4.13 Redshift parameter options

TABLE 4.14 Redshift data types

TABLE 4.15 Network transmission in query processing

TABLE 4.16 Comparing batch services

TABLE 4.17 Comparing orchestration options

Chapter 5

TABLE 5.1 Key SPICE limits

TABLE 5.2 Amazon QuickSight visualization types

TABLE 5.3 Differences between Amazon QuickSight Standard and Enterprise editions

Chapter 6

TABLE 6.1 AWS security, identity, and compliance services

TABLE 6.2 AWS security credentials for accessing an AWS account

TABLE 6.3 Types of roles/terminology in AWS

TABLE 6.4 Managed security groups – Amazon EMR

TABLE 6.5 Encryption options – EMR

TABLE 6.6 S3 data encryption options

TABLE 6.7 Security options in Amazon QuickSight

List of Illustrations

Chapter 1

FIGURE 1.1 Traditional data warehousing setup in early to mid-2000s

FIGURE 1.2 Overview of an analytics pipeline

FIGURE 1.3 Business analytics spectrum

FIGURE 1.4 AWS analytics architecture

FIGURE 1.5 Data characteristics for hot, warm, and cold data

FIGURE 1.6 Typical steps in building a data lake

FIGURE 1.7 Building a data lake on AWS

FIGURE 1.8 Moving data into S3

FIGURE 1.9 AWS Glue

Chapter 2

FIGURE 2.1 AWS IoT device software services

FIGURE 2.2 AWS IoT control services

FIGURE 2.3 AWS IoT data services

FIGURE 2.4 AWS IoT - How it Works

FIGURE 2.5 Information half-life in decision-making

FIGURE 2.6 Kinesis Data Streams overview

FIGURE 2.7 Kinesis Data Streams data flow

FIGURE 2.8 Aggregation and collection with KPL

FIGURE 2.9 Data flow - S3 destination

FIGURE 2.10 Amazon Redshift as a destination for Kinesis Firehose

FIGURE 2.11 Amazon Elasticsearch Service as a destination for Kinesis Firehose

FIGURE 2.12 Splunk as a destination for Kinesis Firehose

FIGURE 2.13 Amazon Kinesis Data Analytics

FIGURE 2.14 Kinesis application creation via Console

FIGURE 2.15 Kinesis Data Analytics application

FIGURE 2.16 Connecting to a streaming source

FIGURE 2.17 Author Kinesis Data Analytics using SQL

FIGURE 2.18 Kinesis Data Analytics authoring screen

FIGURE 2.19 Kinesis Data Analytics - Flink Interface

FIGURE 2.20 Scaling your Flink applications with Kinesis

FIGURE 2.21 Working with Kinesis Video Analytics

FIGURE 2.22 Kinesis Video Streams

FIGURE 2.23 Glue flow

FIGURE 2.24 Tables defined in a sample catalog

FIGURE 2.25 Table details

FIGURE 2.26 Glue crawlers

FIGURE 2.27 Authoring jobs in AWS Glue

FIGURE 2.28 Amazon Simple Queue Service

FIGURE 2.29 AWS Database Migration Service

FIGURE 2.30 AWS Data Pipeline

FIGURE 2.31 AWS Snowball

FIGURE 2.32 AWS Direct Connect

Chapter 3

FIGURE 3.1 Importance of data storage in an analytics pipeline

FIGURE 3.2 Types of storage solutions provided by AWS

FIGURE 3.3 Types of AWS S3 storage classes provided by AWS

FIGURE 3.4 Amazon S3 Glacier – vault creation

FIGURE 3.5 DynamoDB global tables

FIGURE 3.6 DynamoDB Accelerator

FIGURE 3.7 Amazon DynamoDB Streams – Kinesis Adapter

FIGURE 3.8 Amazon DocumentDB architecture

FIGURE 3.9 File Storage Deployment – storage gateway

FIGURE 3.10 Amazon EFS use cases

FIGURE 3.11 Amazon EFS use cases

FIGURE 3.12 AWS Transfer for SFTP

Chapter 4

FIGURE 4.1 Apache Presto architecture

FIGURE 4.2 Amazon Athena federated query

FIGURE 4.3 HDFS architecture

FIGURE 4.4 Anatomy of a MapReduce job

FIGURE 4.5 YARN applications

FIGURE 4.6 Security configurations in EMR (Encryption)

FIGURE 4.7 Security configurations in EMR (Authentication and IAM Role for E...

FIGURE 4.8 Creating an EMR notebook

FIGURE 4.9 Apache Spark Framework overview

FIGURE 4.10 Apache Spark architecture

FIGURE 4.11 Apache Spark DStreams Overview

FIGURE 4.12 An undirected graph with seven vertices/nodes and nine edges

FIGURE 4.13 Setting up HBase using the Quick Options creation mode

FIGURE 4.14 Setting up HBase using Advanced Options

FIGURE 4.15 Apache Flink overview

FIGURE 4.16 Working of Amazon Elasticsearch service

FIGURE 4.17 Amazon Redshift architecture

FIGURE 4.18 Amazon Redshift AQUA architecture

FIGURE 4.19 Amazon Redshift workload management

FIGURE 4.20 Life of a query

FIGURE 4.21 Relationship between streams, segments, and steps

FIGURE 4.22 Redshift cluster VPCs

FIGURE 4.23 Four-tier hierarchy—encryption keys in Redshift

FIGURE 4.24 Working of Kinesis Data Analytics

FIGURE 4.25 Working of Kinesis Data Analytics

Chapter 5

FIGURE 5.1 Data and analytics consumption patterns

FIGURE 5.2 Amazon QuickSight overview

FIGURE 5.3 Frequency per opportunity stage

FIGURE 5.4 Amazon QuickSight supported data sources

FIGURE 5.5 Editing a dataset – Amazon QuickSight

FIGURE 5.6 Anomaly detection in Amazon QuickSight

FIGURE 5.7 Sharing dashboards in Amazon QuickSight

FIGURE 5.8 The Kibana dashboard on Amazon Elasticsearch

FIGURE 5.9 Kibana dashboard with sample e-commerce data

FIGURE 5.10 AWS Machine Learning Stack

Chapter 6

FIGURE 6.1 AWS shared responsibility model

FIGURE 6.2 Amazon EMR inside a public subnet

FIGURE 6.3 Amazon EMR inside a private subnet

FIGURE 6.4 Security options within Amazon EMR Console

FIGURE 6.5 Security Configurations Amazon EMR- Encryption options

FIGURE 6.6 EMR security configuration – Quick Options mode

FIGURE 6.7 Security options in Amazon EMR – advanced cluster setup

FIGURE 6.8 S3 Block Public Access settings

FIGURE 6.9 S3 encryption options

FIGURE 6.10 Federated access for Amazon Athena

FIGURE 6.11 Key hierarchy in Amazon Redshift

FIGURE 6.12 Amazon Elasticsearch network configuration

FIGURE 6.13 Fine-grained access control within a VPC sample workflow

FIGURE 6.14 Fine-grained access control with a public domain sample workflow

FIGURE 6.15 Web identity federation with Amazon DynamoDB



AWS Certified Data Analytics

Study Guide: Specialty (DAS-C01) Exam

Asif Abbasi

 

Copyright © 2021 by John Wiley & Sons, Inc., Indianapolis, Indiana

Published simultaneously in Canada

ISBN: 978-1-119-64947-2

ISBN: 978-1-119-64944-1 (ebk.)

ISBN: 978-1-119-64945-8 (ebk.)

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Web site may provide or recommendations it may make. Further, readers should be aware that Internet Web sites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services or to obtain technical support, please contact our Customer Care Department within the U.S. at (877) 762-2974, outside the U.S. at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2020938557

TRADEMARKS: Wiley, the Wiley logo, and the Sybex logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. AWS is a registered trademark of Amazon Technologies, Inc. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

To all my teachers, family members, and great friends who are constant sources of learning, joy, and a boost to happiness!

Acknowledgments

Writing acknowledgments is the hardest part of book writing, because there are a number of people and organizations who have directly and indirectly influenced the writing process. The last thing you ever want to do is to miss giving credit to folks where it is due. Here is my feeble attempt to ensure I recognize everyone who inspired and helped during the writing of this book. I apologize sincerely to anyone I have missed.

I would like to first of all acknowledge the great folks at AWS, who work super hard not only to produce great technology but also to create great content in the form of blogs, AWS re:Invent videos, and supporting guides that are a great source of inspiration and learning. This book would not have been possible without tapping into some of the resources produced by my extended AWS team. You guys rock! I owe it to every single employee within AWS; you are all continually raising the bar. I would have loved to name all the people here, but I have been told acknowledgments cannot be in the form of a book.

I would also like to thank John Streit, who was super supportive throughout the writing of the book. I would like to thank my specialist team across EMEA who offered support whenever required. You are some of the most gifted people I have worked with during my entire career.

I would also like to thank Wiley's great team, who were patient with me during the entire process, including Kenyon Brown, David Clark, Todd Montgomery, Saravanan Dakshinamurthy, Christine O'Connor, and Judy Flynn, in addition to the great content editing and production team.

About the Author

Asif Abbasi is currently working as a specialist solutions architect for AWS, focusing on data and analytics, and works with customers across Europe, the Middle East, and Africa. Asif joined AWS in 2008 and has since been helping customers with building, migrating, and optimizing their analytics pipelines on AWS.

Asif has been working in the IT industry for over 20 years, with a core focus on data, and has worked with industry leaders in this space like Teradata, Cisco, and SAS prior to joining AWS. Asif authored a book on Apache Spark in 2017 and has been a regular reviewer of AWS data and analytics blogs.

Asif has a master's degree in computer science (Software Engineering) and business administration. Asif is currently living in Dubai, United Arab Emirates, with his wife, Hifza, and his children Fatima, Hassan, Hussain, and Aisha. When not working with customers, Asif spends most of his time with family and mentoring students in the area of data and analytics.

About the Technical Editor

Todd Montgomery (Austin, Texas) is a senior data center networking engineer for a large international consulting company where he is involved in network design, security, and implementation of emerging data center and cloud-based technologies. He holds six AWS certifications, including the Data Analytics specialty certification. Todd holds a degree in Electronics Engineering and multiple certifications from Cisco Systems, Juniper Networks, and CompTIA. Todd also leads the Austin AWS certification meetup group. In his spare time, Todd likes motorsports, live music, and traveling.

Introduction

Studying for any certification exam can seem daunting. AWS Certified Data Analytics Study Guide: Specialty (DAS-C01) Exam was designed and developed with relevant topics, questions, and exercises to enable a cloud practitioner to focus their precious study time and effort on the germane set of topics targeted at the right level of abstraction so they can confidently take the AWS Certified Data Analytics – Specialty (DAS-C01) exam.

This study guide presents a set of topics around the data and analytics pipeline, including data collection, data transformation, data storage and processing, data analytics, data visualization, and the security elements that encompass the pipeline. The study guide also includes reference material, additional reading, and hands-on workshops that are highly recommended and will aid in your overall learning experience.

What Does This Book Cover?

This book covers topics you need to know to prepare for the AWS Certified Data Analytics – Specialty (DAS-C01) exam:

Chapter 1

: History of Analytics and Big Data

   This chapter begins with a history of big data and its evolution over the years before discussing the analytics pipeline and the big data reference architecture. It also talks about some key architectural principles for an analytics pipeline and introduces the concept of data lakes before introducing AWS Lake Formation to build the data lakes.

Chapter 2

: Data Collection

   Data collection is typically the first step in an analytics pipeline. This chapter discusses the various services involved in data collection, ranging from services related to streaming data ingestion like Amazon Kinesis and Amazon SQS to mini-batch and large-scale batch transfers like AWS Glue, AWS Data Pipeline, and the AWS Snow family.

Chapter 3

: Data Storage

   

Chapter 3

discusses various storage options available on Amazon Web Services, including Amazon S3, Amazon S3 Glacier, Amazon DynamoDB, Amazon DocumentDB, Amazon Neptune, AWS Storage Gateway, Amazon EFS, Amazon FSx for Lustre, and AWS Transfer for SFTP. I discuss not only the different options but also the use cases for which each of them is suitable and when to choose one over another.

Chapter 4

: Data Processing and Analysis

   In

Chapter 4

, we will cover data processing and analysis technologies on the AWS stack, including Amazon Athena, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, and Amazon Kinesis Data Analytics, before wrapping up the chapter with a discussion of orchestration tools like AWS Step Functions, Apache Airflow, and AWS Glue workflow management. I'll also compare some of the processing technologies across common use cases and discuss when to use which technology.

Chapter 5

: Data Visualization

   

Chapter 5

will discuss visualization options like Amazon QuickSight as well as other visualization tools available on AWS Marketplace. I'll briefly touch on the AWS ML stack, as it is also a natural consumer of analytics on the AWS stack.

Chapter 6

: Data Security

   A major section of the exam covers security considerations for the analytics pipeline, and hence I have dedicated a complete chapter to security, discussing IAM and the security features of each service in the analytics stack.

Preparing for the Exam

AWS offers multiple levels of certification for the AWS platform. The entry point is the foundational level, which consists of the AWS Certified Cloud Practitioner exam.

We then have the associate-level exams, which require at least one year of hands-on experience with the AWS platform. At the time of this writing, AWS offers three associate-level exams:

AWS Certified Solutions Architect Associate

AWS Certified SysOps Administrator Associate

AWS Certified Developer Associate

AWS then offers professional-level exams, which require candidates to have at least two years of experience designing, operating, and troubleshooting solutions on the AWS cloud. At the time of this writing, AWS offers two professional exams:

AWS Certified Solutions Architect Professional

AWS Certified DevOps Engineer Professional

AWS also offers specialty exams, which are considered to be professional-level exams and require deep technical expertise in the area being tested. At the time of this writing, AWS offers six specialty exams:

AWS Certified Advanced Networking Specialty

AWS Certified Security Specialty

AWS Certified Alexa Skill Builder Specialty

AWS Certified Database Specialty

AWS Certified Data Analytics Specialty

AWS Certified Machine Learning Specialty

You are preparing for the AWS Certified Data Analytics Specialty exam, which covers the services discussed in this book. However, this book is not the "bible" on the exam; this is a professional-level exam, which means you will have to bring your A game to the table if you are looking to pass. You will need hands-on experience with data analytics in general and AWS analytics services in particular. In this introduction, we will look at what you need to do to prepare for the exam and how to sit for the actual exam, and then provide a sample exam that you can attempt before you take the actual exam.

Let's get started.

Registering for the Exam

You can schedule any AWS exam by following this link bit.ly/PrepareAWSExam. If you don't have an AWS certification account, you can sign up for the account during the exam registration process.

You can choose an appropriate test delivery vendor like Pearson VUE or PSI, or have the exam proctored online. Search for the exam code DAS-C01 to register for the exam.

At the time of this writing, the exam costs $300, with the practice exam costing $40. The cost of the exam is subject to change.

Studying for the Exam

While this book covers the data analytics landscape and the technologies included in the exam, it alone is not enough for you to pass; you also need the practical knowledge to go with it. As a recommended practice, you should complement the material in each chapter with the practical exercises provided at the end of the chapter and the tutorials in the AWS documentation. Professional-level exams require hands-on knowledge of the concepts and tools that you are being tested on.

You should go through the following workshops before you attempt the AWS Certified Data Analytics Specialty exam. At the time of this writing, these workshops were available to the general public, and each provides really good technical depth around the technologies:

AWS DynamoDB Labs –

amazon-dynamodb-labs.com

Amazon Elasticsearch workshops –

deh4m73phis7u.cloudfront.net/log-analytics/mainlab

Amazon Redshift Modernization Workshop –

github.com/aws-samples/amazon-redshift-modernize-dw

Amazon Database Migration Workshop –

github.com/aws-samples/amazon-aurora-database-migration-workshop-reinvent2019

AWS DMS Workshop –

dms-immersionday.workshop.aws

AWS Glue Workshop –

aws-glue.analytics.workshops.aws.dev/en

Amazon Redshift Immersion Day –

redshift-immersion.workshop.aws

Amazon EMR with Service Catalog –

s3.amazonaws.com/kenwalshtestad/cfn/public/sc/bootcamp/emrloft.html

Amazon QuickSight Workshop –

d3akduqkn9yexq.cloudfront.net

Amazon Athena Workshop –

athena-in-action.workshop.aws

AWS Lake Formation Workshop –

lakeformation.aworkshop.io

Data Engineering 2.0 Workshop –

aws-dataengineering-day.workshop.aws/en

Data Ingestion and Processing Workshop –

dataprocessing.wildrydes.com

Incremental data processing on Amazon EMR –

incremental-data-processing-on-amazonemr.workshop.aws/en

Realtime Analytics and serverless datalake demos –

demostore.cloud

Serverless datalake workshop –

github.com/aws-samples/amazon-serverless-datalake-workshop

Voice-powered analytics –

github.com/awslabs/voice-powered-analytics

Amazon Managed Streaming for Kafka Workshop –

github.com/awslabs/voice-powered-analytics

AWS IOT Analytics Workshop –

s3.amazonaws.com/iotareinvent18/Workshop.html

Opendistro for Elasticsearch Workshop –

reinvent.aesworkshops.com/opn302

Data Migration (AWS Storage Gateway, AWS snowball, AWS DataSync) –

reinvent2019-data-workshop.s3-website-us-east-1.amazonaws.com

AWS Identity – Using Amazon Cognito for serverless consumer apps –

serverless-idm.awssecworkshops.com

Serverless data prep with AWS Glue –

s3.amazonaws.com/ant313/ANT313.html

AWS Step Functions –

step-functions-workshop.go-aws.com

S3 Security Settings and Controls –

github.com/aws-samples/amazon-s3-security-settings-and-controls

Data Sync and File gateway –

github.com/aws-samples/aws-datasync-migration-workshop

AWS Hybrid Storage Workshop –

github.com/aws-samples/aws-hybrid-storage-workshop

AWS also offers free digital exam readiness training for the Data Analytics exam that can be taken online. The training is available at www.aws.training/Details/eLearning?id=46612. This 3.5-hour digital training course will help you with the following aspects of the exam:

Navigating the logistics of the examination process

Understanding the exam structure and question types

Identifying how questions relate to AWS data analytics concepts

Interpreting the concepts being tested by exam questions

Developing a personalized study plan to prepare for the exam

This is a good way to not only ensure that you have covered all important material for the exam, but also to develop a personalized plan to prepare for the exam.

Once you have studied for the exam, it's time to run through some mock questions. While the AWS exam readiness training will help you prepare, there is nothing better than sitting for a mock exam and testing yourself under conditions similar to the real thing. AWS offers a practice exam, which I recommend you take at least a week before the actual exam to judge your readiness. Based on discussions with other test takers, if you score around 80 percent on the practice exam, you should be fairly confident about taking the actual exam. Before the practice exam, however, make sure you work through the other tests available. We have included a couple of practice tests with this book, which should give you some indication of your readiness for the exam. Make sure you take each test in one complete sitting rather than over multiple days. Once you have done that, review all the questions you answered correctly and why the answers were correct. There may be cases in which you answered a question correctly but didn't understand the concept it was testing, or missed certain details that could potentially change the answer.

You need to read through the reference material for each test to ensure that you've covered the necessary aspects required to pass the exam.

The Night before the Exam

An AWS professional-level exam requires you to be on top of your game, and just like any professional player, you need to be well rested before the exam. I recommend getting eight hours of sleep the night before the exam. Regarding scheduling the exam, I am often asked what the best time is to take a certification exam. I personally like doing it early in the morning; however, you need to identify the time in the day when you feel most energetic. Some people are full of energy early in the morning, while others ease into the day and are at full throttle by midafternoon.

During the Exam

You should be well hydrated before you take the exam.

You have 170 minutes (2 hours, 50 minutes) to answer 68±3 questions, depending on how many unscored test questions you get during the exam. These test questions are used to improve the exam: new questions are introduced on a regular basis, and the pass rate indicates whether a question is valid for the exam. You have roughly two and a half minutes per question on average, with the majority of the questions running two to three paragraphs (almost one page) and offering at least four plausible choices. Plausible means that to a less experienced candidate all four choices will seem correct; however, there is guidance in the question that makes one choice more correct than the others. This also means that you will spend most of the exam reading questions, occasionally twice, and if your reading speed is not good, you will find it hard to complete the entire exam.

Remember that while the exam does test your knowledge, I believe that it is also an examination of your patience and your focus.

You need to make sure that you go through not only the core material but also the reference material discussed in the book and that you run through the examples and workshops.

All the best with the exam!

Interactive Online Learning Environment and Test Bank

I've worked hard to provide some really great tools to help you with your certification process. The interactive online learning environment that accompanies the AWS Certified Data Analytics Study Guide: Specialty (DAS-C01) Exam provides a test bank with study tools to help you prepare for the certification exam—and increase your chances of passing it the first time! The test bank includes the following:

Sample Tests

   All the questions in this book are provided, including the assessment test at the end of this introduction and the review questions at the end of each chapter. In addition, there are two practice exams with 65 questions each. Use these questions to test your knowledge of the study guide material. The online test bank runs on multiple devices.

Flashcards

   The online test bank includes more than 150 flashcards specifically written to hit you hard, so don't get discouraged if you don't ace your way through them at first. They're there to ensure that you're really ready for the exam. And no worries: armed with the reading material, reference material, review questions, practice exams, and flashcards, you'll be more than prepared when exam day comes. Questions are provided in digital flashcard format (a question followed by a single correct answer). You can use the flashcards to reinforce your learning and provide last-minute test prep before the exam.

Glossary

   A glossary of key terms from this book is available as a fully searchable PDF.

Go to www.wiley.com/go/sybextestprep to register and gain access to this interactive online learning environment and test bank with study tools.

Exam Objectives

The AWS Certified Data Analytics—Specialty (DAS-C01) exam is intended for people who are performing a data analytics–focused role. This exam validates an examinee's comprehensive understanding of using AWS services to design, build, secure, and maintain analytics solutions that provide insight from data.

It validates an examinee's ability in the following areas:

Designing, developing, and deploying cloud-based solutions using AWS

Designing and developing analytical projects on AWS using the AWS technology stack

Designing and developing data pipelines

Designing and developing data collection architectures

An understanding of the operational characteristics of the collection systems

Selection of collection systems that handle frequency, volume, and the source of the data

Understanding the different approaches to data collection and how the approaches differ from each other in data format, ordering, and compression

Designing optimal storage and data management systems to cater for the volume, variety, and velocity

Understanding the operational characteristics of analytics storage solutions

Understanding of the access and retrieval patterns of data

Understanding of appropriate data layout, schema, structure, and format

Understanding of the data lifecycle based on the usage patterns and business requirements

Determining the appropriate system for the cataloging of data and metadata

Identifying the most appropriate data processing solution based on business SLAs, data volumes, and cost

Designing a solution for transformation of data and preparing for further analysis

Automating appropriate data visualization solutions for a given scenario

Identifying appropriate authentication and authorization mechanisms

Applying data protection and encryption techniques

Applying data governance and compliance controls

Recommended AWS Knowledge

A minimum of 5 years of experience with common data analytics technologies

At least 2 years of hands-on experience working on AWS

Experience and expertise working with AWS services to design, build, secure, and maintain analytics solutions

Objective Map

The following table lists each domain and its weighting in the exam, along with the chapters in the book where that domain's objectives and subobjectives are covered.

Domain 1.0: Data Collection (18% of exam; Chapters 1, 2, 3)

1.1 – Determine the operational characteristics of the collection system.

1.2 – Select a collection system that handles the frequency, volume and source of data.

1.3 – Select a collection system that addresses the key properties of the data, such as order, format and compression.

Domain 2.0: Storage and Data Management (22% of exam; Chapters 3, 4)

2.1 – Determine the operational characteristics of the analytics storage solution.

2.2 – Determine data access and retrieval patterns.

2.3 – Select appropriate data layout, schema, structure and format.

2.4 – Define data lifecycle based on usage patterns and business requirements.

2.5 – Determine the appropriate system for cataloguing data and managing metadata.

Domain 3.0: Processing (24% of exam; Chapters 3, 4)

3.1 – Determine appropriate data processing solution requirements.

3.2 – Design a solution for transforming and preparing data for analysis.

3.3 – Automate and operationalize data processing solution.

Domain 4.0: Analysis and Visualization (18% of exam; Chapters 3, 4, 5)

4.1 – Determine the operational characteristics of the analysis and visualization layer.

4.2 – Select the appropriate data analysis solution for a given scenario.

4.3 – Select the appropriate data visualization solution for a given scenario.

Domain 5.0: Security (18% of exam; Chapters 2, 3, 4, 5, 6)

5.1 – Select appropriate authentication and authorization mechanisms.

5.2 – Apply data protection and encryption techniques.

5.3 – Apply data governance and compliance controls.

Assessment Test

You have been hired as a solution architect for a large media conglomerate that wants a cost-effective way to store a large collection of recorded interviews with guests, collected as MP4 files, and a data warehouse system to capture the data across the enterprise and provide access via BI tools. Which of the following is the most cost-effective solution for this requirement?

Store large media files in Amazon Redshift and metadata in Amazon DynamoDB. Use Amazon DynamoDB and Redshift to provide decision-making with BI tools.

Store large media files in Amazon S3 and metadata in Amazon Redshift. Use Amazon Redshift to provide decision-making with BI tools.

Store large media files in Amazon S3, and store media metadata in Amazon EMR. Use Spark on EMR to provide decision-making with BI tools.

Store media files in Amazon S3, and store media metadata in Amazon DynamoDB. Use DynamoDB to provide decision-making with BI tools.

Which of the following is a distributed data processing option on Apache Hadoop and was the main processing engine until Hadoop 2.0?

MapReduce

YARN

Hive

ZooKeeper

You are working as an enterprise architect for a large fashion retailer based out of Madrid, Spain. The team is looking to build ETL and has large datasets that need to be transformed. Data is arriving from a number of sources and hence deduplication is also an important factor. Which of the following is the simplest way to process data on AWS?

Load data into Amazon Redshift, and build transformations using SQL. Build custom deduplication script.

Use AWS Glue to transform the data using the built-in FindMatches ML Transform.

Load data into Amazon EMR, build Spark SQL scripts, and use custom deduplication script.

Use Amazon Athena for transformation and deduplication.

Which of these statements are true about AWS Glue crawlers? (Choose three.)

AWS Glue crawlers provide built-in classifiers that can be used to classify any type of data.

AWS Glue crawlers can connect to Amazon S3, Amazon RDS, Amazon Redshift, Amazon DynamoDB, and any JDBC sources.

AWS Glue crawlers provide custom classifiers, which provide the option to classify data that cannot be classified by built-in classifiers.

AWS Glue crawlers write metadata to AWS Glue Data Catalog.

You are working as an enterprise architect for a large player within the entertainment industry that has grown organically and by acquisition of other media players. The team is looking to build a central catalog of information that is spread across multiple databases (all of which have a JDBC interface), Amazon S3, Amazon Redshift, Amazon RDS, and Amazon DynamoDB tables. Which of the following is the most cost-effective way to achieve this on AWS?

Build scripts to extract the metadata from the different databases using native APIs and load them into Amazon Redshift. Build appropriate indexes and UI to support searching.

Build scripts to extract the metadata from the different databases using native APIs and load them into Amazon DynamoDB. Build appropriate indexes and UI to support searching.

Build scripts to extract the metadata from the different databases using native APIs and load them into an RDS database. Build appropriate indexes and UI to support searching.

Use AWS Glue crawlers to crawl the data sources to build a central catalog. Use the AWS Glue UI to support metadata searching.

You are working as a data architect for a large financial institution that has built its data platform on AWS. It is looking to implement fraud detection by identifying duplicate customer accounts and looking at when a newly created account matches one for a previously fraudulent user. The company wants to achieve this quickly and is looking to reduce the amount of custom code that might be needed to build this. Which of the following is the most cost-effective way to achieve this on AWS?

Build a custom deduplication script using Spark on Amazon EMR. Use PySpark to compare dataframes representing the new customers and fraudulent customers to identify matches.

Load the data to Amazon Redshift and use SQL to build deduplication.

Load the data to Amazon S3, which forms the basis of your data lake. Use Amazon Athena to build a deduplication script.

Load data to Amazon S3. Use AWS Glue FindMatches Transform to implement this.

Where is the metadata definition store in the AWS Glue service?

Table

Configuration files

Schema

Items

AWS Glue provides an interface to Amazon SageMaker notebooks and Apache Zeppelin notebook servers. You can also open a SageMaker notebook from the AWS Glue console directly.

True

False

AWS Glue provides support for which of the following languages? (Choose two.)

SQL

Java

Scala

Python

You work for a large ad-tech company that has a set of predefined ads displayed routinely. Due to the popularity of your products, your website is getting popular, garnering the attention of a diverse set of visitors. You are currently placing dynamic ads based on user click data, but you have discovered that the processing time is not keeping up with displaying new ads, since a user's stay on the website is short-lived (a few seconds) compared to your turnaround time for delivering a new ad (less than a minute). You have been asked to evaluate AWS platform services for a possible solution to analyze the problem and reduce the overall ad serving time. What is your recommendation?

Push the clickstream data to an Amazon SQS queue. Have your application subscribe to the SQS queue and write data to an Amazon RDS instance. Perform analysis using SQL.

Move the website to be hosted in AWS and use AWS Kinesis to dynamically process the user clickstream in real time.

Push web clicks to Amazon Kinesis Firehose and analyze with Kinesis Analytics or Kinesis Client Library.

Push web clicks to Amazon Kinesis Stream and analyze with Kinesis Analytics or Kinesis Client Library (KCL).

You work for a new startup that is building satellite navigation systems competing with the likes of Garmin, TomTom, Google Maps, and Waze. The company's key selling point is its ability to personalize the travel experience based on your profile and use your data to get you discounted rates at various merchants. Its application is having huge success and the company now needs to load some of the streaming data from other applications onto AWS in addition to providing a secure and private connection from its on-premises data centers to AWS. Which of the following options will satisfy the requirement? (Choose two.)

AWS IOT Core

AWS IOT Device Management

Amazon Kinesis

Direct Connect

You work for a toy manufacturer whose assembly line contains GPS devices that track the movement of the toys on the conveyer belt and identify the real-time production status. Which of the following tools will you use on the AWS platform to ingest this data?

Amazon Redshift

Amazon Pinpoint

Amazon Kinesis

Amazon SQS

Which of the following refers to performing a single action on multiple items instead of repeatedly performing the action on each individual item in a Kinesis stream?

Batching

Collection

Aggregation

Compression

What is the term given to a sequence of data records in a stream in AWS Kinesis?

Batch

Group Stream

Consumer

Shard

You are working for a large telecom provider that has chosen the AWS platform for its data and analytics needs. It has agreed to use a data lake with S3 as the platform of choice. The company is getting data generated from DPI (deep packet inspection) probes in near real time and is looking to ingest it into S3 in batches of 100 MB or 2 minutes, whichever comes first. Which of the following is an ideal choice for the use case without any additional custom implementation?

Amazon Kinesis Data Analytics

Amazon Kinesis Data Firehose

Amazon Kinesis Data Streams

Amazon Redshift

You are working for a car manufacturer that is using Apache Kafka for its streaming needs. Its core challenges are the scalability and manageability of its current Kafka infrastructure, which is hosted on premises, along with the escalating cost of the human resources required to manage the application. The company is looking to migrate its analytics platform to AWS. Which of the following is an ideal choice on the AWS platform for this migration?

Amazon Kinesis Data Streams

Apache Kafka on EC2 instances

Amazon Managed Streaming for Kafka

Apache Flink on EC2 instances

You are working for a large semiconductor manufacturer based out of Taiwan that is using Apache Kafka for its streaming needs. It is looking to migrate its analytics platform to AWS and Amazon Managed Streaming for Kafka and needs your help to right-size the cluster. Which of the following will be the best way to size your Kafka cluster? (Choose two.)

Lift and shift your on-premises cluster.

Use your on-premises cluster as a guideline.

Perform a deep analysis of usage, patterns, and workloads before coming up with a recommendation.

Use the MSK calculator for pricing and sizing.

You are running an MSK cluster that is running out of disk space. What can you do to mitigate the issue and avoid running out of space in the future? (Choose four.)

Create a CloudWatch alarm that watches the KafkaDataLogsDiskUsed metric.

Create a CloudWatch alarm that watches the KafkaDiskUsed metric.

Reduce message retention period.

Delete unused shards.

Delete unused topics.

Increase broker storage.

Which of the following services can act as sources for Amazon Kinesis Data Firehose?

Amazon Managed Streaming for Kafka

Amazon Kinesis Data Streams

AWS Lambda

AWS IOT

How does Kinesis Data Streams distribute data to different shards?

ShardId

Row hash key

Record sequence number

Partition key

How can you write data to a Kinesis Data Stream? (Choose three.)

Kinesis Producer Library

Kinesis Agent

Kinesis SDK

Kinesis Consumer Library

You are working for an up-and-coming e-commerce retailer that has seen its sales quadruple during the pandemic. It is looking to understand more about customer purchase behavior on its website and believes that analyzing clickstream data might provide insight into the customers' time spent on the website. The clickstream data is being ingested in a streaming fashion with Kinesis Data Streams. The analysts are looking to rely on their advanced SQL skills, while management is looking for a serverless model to reduce TCO rather than make an upfront investment. What is the best solution?

Spark streaming on Amazon EMR

Amazon Redshift

AWS Lambda with Kinesis Data Streams

Kinesis Data Analytics

Which of the following writes data to a Kinesis stream?

Consumers

Producers

Amazon MSK

Shards

Which of the following statements are true about KPL (Kinesis Producer Library)? (Choose three.)

Writes to one or more Kinesis Data Streams with an automatic and configurable retry mechanism.

Aggregates user records to increase payload size.

Submits CloudWatch metrics on your behalf to provide visibility into producer performance.

Forces the caller application to block and wait for a confirmation.

KPL does not incur any processing delay and hence is useful for all applications writing data to a Kinesis stream.

RecordMaxBufferedTime within the library is set to 1 millisecond and not changeable.

Which of the following is true about Kinesis Client Library? (Choose three.)

KCL is a Java library and does not support other languages.

KCL connects to the data stream and enumerates the shards within the data stream.

KCL pulls data records from the data stream.

KCL does not provide a checkpointing mechanism.

KCL instantiates a record processor for each stream.

KCL pushes the records to the corresponding record processor.

Which of the following metrics are sent by the Amazon Kinesis Data Streams agent to Amazon CloudWatch? (Choose three.)

MBs Sent

RecordSendAttempts

RecordSendErrors

RecordSendFailures

ServiceErrors

ServiceFailures

You are working as a data engineer for a gaming startup, and the operations team notified you that they are receiving a ReadProvisionedThroughputExceeded error. They are asking you to help identify the reason for the issue and assist in the resolution. Which of the following statements will help? (Choose two.)

The GetRecords calls are being throttled by KinesisDataStreams over a duration of time.

The GetShardIterator is unable to get a new shard over a duration of time.

Reshard your stream to increase the number of shards.

Redesign your stream to increase the time between checks for the provision throughput to avoid the errors.

You are working as a data engineer for a microblogging website that is using Kinesis for streaming weblog data. The operations team notified you that they are experiencing an increase in latency when fetching records from the stream. They are asking you to help identify the reason for the issue and assist in the resolution. Which of the following statements will help? (Choose three.)

There is an increase in record count resulting in an increase in latency.

There is an increase in the size of the record for each GET request.

There is an increase in the shard iterator's latency resulting in an increase in record fetch latency.

Increase the number of shards in your stream.

Decrease the stream retention period to catch up with the data backlog.

Move the processing to MSK to reduce latency.

Which of the following is true about rate limiting features on Amazon Kinesis? (Choose two.)

Rate limiting is not possible within Amazon Kinesis and you need MSK to implement rate limiting.

Rate limiting is only possible through Kinesis Producer Library.

Rate limiting is implemented using tokens and buckets within Amazon Kinesis.

Rate limiting uses standard counter implementation.

Rate limiting threshold is set to 50 percent and is not configurable.

What is the default data retention period for a Kinesis stream?

12 hours

168 hours

30 days

365 days

Which of the following options help improve efficiency with Kinesis Producer Library? (Choose two.)

Aggregation

Collection

Increasing number of shards

Reducing overall encryption

Which of the following services are valid destinations for Amazon Kinesis Firehose? (Choose three.)

Amazon S3

Amazon SageMaker

Amazon Elasticsearch

Amazon Redshift

Amazon QuickSight

AWS Glue

Which of the following is a valid mechanism to do data transformations from Amazon Kinesis Firehose?

AWS Glue

Amazon SageMaker

Amazon Elasticsearch

AWS Lambda

Which of the following is a valid mechanism to perform record conversions from Amazon Kinesis Firehose AWS Console? (Choose two.)

Apache Parquet

Apache ORC

Apache Avro

Apache Pig

You are working as a data engineer for a mid-sized boating company that is capturing data in real time for all of its boats connected via a 3G/4G connection. The boats typically sail in areas with good connectivity, and data loss from the IoT devices on the boat to a Kinesis stream is not possible. You are monitoring the data arriving from the stream and have realized that some of the records are being missed. What can be the underlying issue for records being skipped?

The connectivity from the boat to AWS is the reason for missed records.

processRecords() is throwing exceptions that are not being handled and hence the missed records.

The shard is already full and hence the data is being missed.

The record length is more than expected.


How does Kinesis Data Firehose handle server-side encryption? (Choose three.)

Kinesis Data Firehose does not support server-side encryption.

Kinesis Data Firehose server-side encryption depends on the data source.

Kinesis Data Firehose does not store the unencrypted data at rest when the data source is a Kinesis Data stream encrypted by AWS KMS.

Kinesis Data Firehose stores the unencrypted data to S3 when the data source is a Kinesis Data stream encrypted by AWS KMS.

When data is delivered using Direct PUT, you can start encryption by using StartDeliveryStreamEncryption.

When data is delivered using Direct PUT, you can start encryption by using StartKinesisFirhoseEncryption.

How can you start an AWS Glue job automatically after the completion of a crawler? (Choose two.)

Use AWS Glue triggers to start a job when the crawler run completes.

Create an AWS Lambda function using Amazon CloudWatch events rule.

Use AWS Glue workflows.

This is not possible. You have to run it manually.

You are working as a consultant for an advertising agency that has hired a number of data scientists who are working to improve the online and offline campaigns for the company and using AWS Glue for most of their data engineering workloads. The data scientists have broad experience with adtech workloads and before joining the team have developed Python libraries that they would like to use in AWS Glue. How can they use the external Python libraries in an AWS Glue job? (Choose two.)

Package the libraries in a

.tar

file, and upload to Amazon S3.

Package the libraries in a

.zip

file, and upload to Amazon S3.

Use the library in a job or job run.

Unzip the compressed file programmatically before using the library in the job or job run.

You are working as a consultant for a large conglomerate that has recently acquired another company. It is looking to integrate the applications using a messaging system and it would like the applications to remain decoupled but still be able to send messages. Which of the following is the most cost-effective and scalable service to achieve the objective?

Apache Flink on Amazon EMR

Amazon Kinesis

Amazon SQS

AWS Glue streaming.

What types of queues does Amazon SQS support? (Choose two.)

Standard queue

FIFO queue

LIFO queue

Advanced queue

You are working as a data engineer for a telecommunications operator that is using DynamoDB for its operational data store. The company is looking to use AWS Data Pipeline for workflow orchestration and needs to send some SNS notifications as soon as an order is placed and a record is available in the DynamoDB table. What is the best way to handle this?

Configure a lambda function to keep scanning the DynamoDB table. Send an SNS notification once you see a new record.

Configure Amazon DynamoDB streams to orchestrate AWS Data Pipeline kickoff.

Configure an AWS Glue job that reads the DynamoDB table to trigger an AWS Data Pipeline job.

Use the preconditions available in AWS Data Pipeline.

You have been consulting on the AWS analytics platform for some years now. One of your top customers has reached out to you to understand the best way to export data from its DynamoDB table to its data lake on S3. The customer is looking to keep the cost to a minimum and ideally not involve consulting expertise at this moment. What is the easiest way to handle this?

Export the data from Amazon DynamoDB to Amazon S3 using EMR custom scripts.

Build a custom lambda function that scans the data from DynamoDB and writes it to S3.

Use AWS Glue to read the DynamoDB table and use AWS Glue script generation to generate the script for you.

Use AWS Data Pipeline to copy data from DynamoDB to Amazon S3.

You have built your organization's data lake on Amazon S3. You are looking to capture and track all requests made to an Amazon S3 bucket. What is the simplest way to enable this?

Use Amazon Macie.

Use Amazon CloudWatch.

Use AWS CloudTrail.

Use Amazon S3 server access logging.

Your customer has recently received multiple 503 Slow Down errors during the Black Friday sale while ingesting data to Amazon S3. What could be the reason for this error?

Amazon S3 is unable to scale to the needs of your data ingestion patterns.

This is an application-specific error originating from your web application and has nothing to do with Amazon S3.

You are writing lots of objects per prefix. Amazon S3 is scaling in the background to handle the spike in traffic.

You are writing large objects resulting in this error from Amazon S3.

Which of the following is a fully managed NoSQL service?

Amazon Redshift

Amazon Elasticsearch

Amazon DynamoDB

Amazon DocumentDB

Your customer is using Amazon DynamoDB for its operational use cases. One of its engineers has accidentally deleted 10 records. Which of the following is a valid statement when it comes to recovering Amazon DynamoDB data?

Use backups from Amazon S3 to re-create the tables.

Use backups from Amazon Redshift to re-create the tables.

Use data from a different region.

Use Amazon DynamoDB PITR to recover the deleted data.

Which of the following scenarios suit a provisioned scaling mode for DynamoDB? (Choose two.)

You have predictable application traffic.

You are running applications whose traffic is consistent or ramps up gradually.

You are unable to forecast your capacity requirements.

You prefer a pay-as-you-go pricing model.

Which of the following statements are true about primary keys in DynamoDB? (Choose two.)

A table's primary key can be defined after the table creation.

DynamoDB supports two types of primary keys only.

A composite primary key is the same as a combination of partition key and sort key.

DynamoDB uses a sort key as an input to internal hash function, the output of which determines the partition where the item is stored.

You are working as a data engineer for a large corporation that is using DynamoDB to power its low-latency application requests. The application is based on a customer orders table that is used to provide information about customer orders based on a specific customer ID. A new requirement has recently arisen to identify customers based on a specific product ID. You decided to implement it as a secondary index. The application engineering team members have recently complained about the performance they are getting from the secondary index. Which of the following is the most common reason for the performance degradation of a secondary index in DynamoDB?

The application engineering team is querying data for projected attributes.

The application engineering team is querying data not projected in the secondary index.

The application engineering team is querying a partition key that is not part of the local secondary index.

The application engineering team is querying data for a different sort key value.

Your customer is looking to reduce the spend on its on-premises storage while ensuring low latency for the application, which depends on a subset of the entire dataset. The customer is happy with the characteristics of Amazon S3. Which of the following would you recommend?

Cached volumes

Stored volumes

File gateway

Tape gateway

Your customer is looking to reduce the spend on its on-premises storage while ensuring low latency for the application, which depends on the entire dataset. The customer is happy with the characteristics of Amazon S3. Which of the following would you recommend?

Cached volumes

Stored volumes

File gateway

Tape gateway

You are working as a consultant for a telecommunications company. The data scientists have requested direct access to the data so they can dive deep into its structure and build models. They have good knowledge of SQL. Which of the following tools will you choose to provide them with direct access to the data on Amazon S3 while reducing infrastructure and maintenance overhead?

Amazon S3 Select

Amazon Athena

Amazon Redshift

Apache Presto on Amazon EMR

Which of the following file formats are supported by Amazon Athena? (Choose three.)

Apache Parquet

CSV

DAT

Apache ORC

Apache AVRO

TIFF

Your EMR cluster is facing performance issues. You are looking to investigate the errors and understand the potential performance problems on the nodes. Which of the following nodes can you skip during your test?

Master node

Core node

Task node

Leader node

Which of the following statements are true about Redshift leader nodes? (Choose three.)

Redshift clusters can have a single leader node.

Redshift clusters can have more than one leader node.

Redshift Leader nodes should have more memory than the compute nodes.

Redshift Leader nodes have the exact same specifications as the compute nodes.

You can choose your own Leader node sizing, and it is priced separately.

Redshift leader node is chosen automatically and is free to the users.

Answers to the Assessment Test