The AWS Certified Database – Specialty certification is one of the most challenging AWS certifications. It validates your comprehensive understanding of databases, including the concepts of design, migration, deployment, access, maintenance, automation, monitoring, security, and troubleshooting. With this guide, you'll understand how to use various AWS databases, such as Aurora Serverless and Global Database, and even services such as Redshift and Neptune.
You’ll start with an introduction to the AWS databases, and then delve into workload-specific database design. As you advance through the chapters, you'll learn about migrating and deploying the databases, along with database security techniques such as encryption, auditing, and access controls. This AWS book will also cover monitoring, troubleshooting, and disaster recovery techniques, before testing all the knowledge you've gained throughout the book with the help of mock tests.
By the end of this book, you'll have covered everything you need to pass the DBS-C01 AWS certification exam and have a handy, on-the-job desk reference guide.
A comprehensive guide to becoming an AWS Certified Database specialist
Kate Gawron
BIRMINGHAM—MUMBAI
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Heramb Bhavsar
Senior Editor: Nazia Shaikh
Content Development Editor: Sean Lobo
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Project Coordinator: Aparna Ravikumar Nair
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Production Designer: Nilesh Mohite
Marketing Coordinator: Nivedita Singh
First published: May 2022
Production reference: 1220422
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80324-310-8
www.packt.com
Kate Gawron is a full-time senior database consultant and part-time future racing driver. She was a competitor in Formula Woman, and she aspires to become a professional Gran Turismo (GT) racing driver. Away from the racetrack, Kate has worked with Oracle databases for 18 years and AWS for five years. She holds four AWS certifications, including the AWS Certified Database – Specialty certification as well as two professional Oracle qualifications. Kate currently works as a senior database architect, where she works with customers to migrate and refactor their databases to work optimally within the AWS cloud.
Divaker Goel started his career in Bengaluru, India, a city that he loves. He has 18+ years' experience in various roles, including database administrator, team leader, and database manager. He has six AWS certificates and experience in multiple relational and NoSQL databases. Currently based out of Austin, Texas, he works for AWS as a database consultant. His role at AWS includes working with customers to identify their data store requirements and suggest one that best fits their use case, using his vast experience and knowledge. He also helps customers to deploy and configure databases on the cloud, migrate their on-premises data, set up monitoring, and optimize cloud usage. In his free time, he enjoys reading technical books and watching movies.
Amit Upadhyay is a certified AWS database specialist and database modernization consultant at Amazon Web Services, Inc. He received a Master's degree and PhD in computer science. He focuses on developing a cost-effective cloud modernization framework program that helps industry leaders make important financial decisions during their digital transformation journey. Amit has delivered strategies, solutions, designs, proofs of concept, and best practices for AWS high-tech, financial, and insurance customers, helping them refactor and modernize large-scale, complex, mission-critical commercial databases into AWS cloud databases such as Amazon RDS, Aurora, Redshift, DynamoDB, ElastiCache, MemoryDB for Redis, DocumentDB, and Keyspaces.
Shirin Ali Kanchwala is a database consultant for AWS who lives in Katy, Texas, with her family. Originally Canadian, Shirin brings unique experiences and approaches to every problem and solution. She works as a database migration specialist to help Amazon customers migrate their on-premises database workloads to the AWS cloud.
"To my daughter Zahabia. You are my rock. Thank you."
Since 2009, when Amazon Web Services launched the first fully managed cloud database solution, the Relational Database Service (RDS), the demand for Database Administrators (DBAs) with a cloud skillset has grown rapidly. Today, the AWS Certified Database Specialty certification is one of the most sought-after.
Many DBAs and other IT professionals are looking to expand their skills in cloud computing, and cloud databases in particular. The differences between on-premises databases and the cloud are vast, and as such, many DBAs find they have numerous barriers to overcome in order to adapt. This book is designed to help fill that gap. I have written this book in a practical style that allows you both to understand the theory of the subject and to experiment with cloud technology through hands-on workshops and labs.
By the end of this book, you will be able to comfortably explain the major concepts in cloud databases, from the basics to advanced performance tuning and troubleshooting. You will also be equipped with the practical skills to continue your learning beyond this book to develop additional skills to help progress your career. Ultimately, this book will provide you with the knowledge and skills to pass the AWS DBS-C01 exam confidently.
This book is for anyone with a background working with databases who is looking to expand into the cloud or to develop their existing skills. However, this book assumes only a basic level of knowledge of databases and of AWS in general, and so dedicates the opening chapters to fundamental skills.
Chapter 1, AWS Certified Database – Specialty Exam Overview, introduces the exam topics and format and offers hints and tips to help you excel in the exam.
Chapter 2, Understanding Database Fundamentals, gives an overview of database technologies and explains the different types of databases for readers who are not yet proficient with them.
Chapter 3, Understanding AWS Infrastructure, offers a high-level view of AWS as a whole as well as a deeper dive into some of the AWS services you need to know.
Chapter 4, Relational Database Service, introduces the first of the AWS fully-managed database offerings.
Chapter 5, Amazon Aurora, explores a custom database service developed by AWS.
Chapter 6, Amazon DynamoDB, introduces our first NoSQL database offered by AWS.
Chapter 7, Redshift and DocumentDB, looks at two specialized database solutions, one for analytics and the other for storing data held within documents.
Chapter 8, Neptune, Quantum Ledger Database, and Timestream, explores three different database technologies: graphs, ledgers, and time series.
Chapter 9, Amazon ElastiCache, looks at using a caching database to improve database and application performance.
Chapter 10, The AWS Schema Conversion Tool and AWS Database Migration Service, introduces knowledge around database migrations to AWS and changing the database engine.
Chapter 11, Database Task Automation, covers how to use automation skills and services to reduce manual work and enforce standards.
Chapter 12, AWS Database Security, dives deep into database security processes and procedures.
Chapter 13, CloudWatch and Logging, looks into how to monitor your databases and how to use CloudWatch to find anomalies and errors.
Chapter 14, Backup and Restore, offers a comprehensive explanation of AWS backup and restore techniques as well as theory on RTO and RPO.
Chapter 15, Troubleshooting Tools and Techniques, introduces troubleshooting techniques to help find and resolve common database errors.
Chapter 16, Exam Practice, provides an opportunity to test your new skills in a practice exam with questions very similar to the ones you will see in the exam.
Chapter 17, Answers, provides you with all the answers to the questions at the end of every chapter and their explanations.
To complete the hands-on sections of this book, you will need an AWS account complete with root access. Chapter 1, AWS Certified Database – Specialty Exam Overview, contains instructions on how to set up your account in the best way. You will also need some software, such as SQL Developer or MySQL Workbench, to allow you to connect to databases.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Each chapter may have different requirements so please review the Technical requirements section at the start of each chapter.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/AWS-Certified-Database---Specialty-DBS-C01-Certification. If there's an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781803243108_ColorImages.pdf.
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."
A block of code is set as follows:
html, body, #map {
height: 100%;
margin: 0;
padding: 0
}
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)
Any command-line input or output is written as follows:
$ mkdir css
$ cd css
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "Select System info from the Administration panel."
Tips or Important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you've read AWS Certified Database - Specialty (DBS-C01) Certification Guide, we'd love to hear your thoughts! Scan the QR code below to go straight to the Amazon review page for this book and share your feedback.
https://packt.link/r/1-803-24310-4
Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
This section provides an overview of the AWS Database Specialty exam. It also introduces database fundamentals and ensures you have a good base level of database knowledge before delving deeper into AWS-specific concepts.
This section includes the following chapters:
Chapter 1, AWS Certified Database – Specialty Exam Overview
Chapter 2, Understanding Database Fundamentals
Chapter 3, Understanding AWS Infrastructure
The AWS certifications are some of the most prized in the IT industry. The specialty-level exams demonstrate a high level of knowledge about the chosen subject, and obtaining such a certification can lead to increased salaries, better career options, and recognition. The first step on this path is to begin learning all the skills and techniques that will be covered in the exam, and this book aims to be a comprehensive guide that will aid you greatly during your preparation.
Before we start studying the technical areas covered in the exam, it is worthwhile discussing the exam format and what types of questions will be asked. You can use this information to aid your learning as you progress through this book. You will learn how the exam works, how long you have to complete it, and how many questions there will be. You'll then learn the specific domains in the exam and the likely number of questions on each domain before learning some useful exam tips to guide you in your approach during the exam to maximize your marks.
In this chapter, we are going to cover the following topics:
Exam format
Exam domains
Database security
Exam tips
First, let's look at the exam format itself so you know what to expect, from when you book the AWS Certified Database – Specialty exam to when you earn that pass!
All AWS exams are taken electronically either at a test center or remotely via an online proctoring session.
The exam lasts 180 minutes and there will be 65 questions.
The pass mark will vary slightly between each exam, but the minimum will always be 750 out of 1000. This variation is due to some questions being rated as more or less difficult than the default, so they are weighted for fairness. As a rough guide, a pass would be obtained by answering 50 questions correctly.
You are not penalized for incorrect answers, so you should attempt to answer all questions even if you do not know the answer.
The exam starts with a short opening section where you need to confirm your details and the exam you are taking, and accept the copyright terms stating that you may not share the exam questions. Once this is done, you will be given a brief overview of the exam and how to navigate through the screens.
The majority of the questions are situational style, requiring you to be able to interpret the question to work out the correct answer.
The questions are all multiple choice with two different styles:
Multiple choice: Has one correct answer and three incorrect answers.
Multiple answer: Has two or more correct answers out of five or more options. The question will state how many answers are expected.
You can mark each question for review at the end.
At the end of the exam, there is a survey about the exam and your preparation for it. You must complete this before receiving your exam result.
You will receive your pass or fail result immediately once you complete the survey, but you will not receive your full results and the score achieved until it has been verified. This verification normally takes 3 working days. Once the verification has been completed, you will receive an email to your registered address and you will be able to obtain your full score report, showing how well you performed on each domain. This is especially useful if you do not meet the passing grade as you will be given areas to focus your studies on for the next attempt.
Now that you understand what happens after you book and take the exam, let's look at what you'll need to know to obtain a passing grade. In the next section, we are going to look at the exam domains or areas that will need to be covered.
The AWS Certified Database – Specialty (DBS-C01) exam is split into five high-level topics covering a wide range of subjects. These are broken down as follows:
Figure 1.1 – Table showing the percentage weighting for the five domains in the exam
The percentage indicates the likely proportion of questions that will be asked from each domain in the exam. You can expect a breakdown similar to the following:
Figure 1.2 – Table showing the approximate number of questions for each domain in the exam
AWS offers a high-level description of each domain, but it doesn't fully explain all the technologies, solutions, and services you'll need to know in order to pass the exam. In the next few sections, we are going to look in more depth at what each domain really means and the key topics within it. This can be used to help guide you while you study and prepare for the exam. Let's now look at each domain in detail.
Workload-specific database design can also sometimes be referred to as purpose-built databases. It means that before choosing which database to use, you look closely at the type of data that will be stored and how the application or users access it.
In order to succeed in this domain, you will need to:
Know how to build a scalable and resilient database using the right database engine for the use case, and understand how to use AWS infrastructure to protect it from failures.
Know the features that can be used to increase the performance of your databases, and which work for each different database type and use case.
Understand and calculate the costs of different database solutions and how you can help to optimize costs while meeting the performance and resilience requirements of the use case.
Under this domain, the following topics will be covered:
Selecting appropriate database services for specific types of data and workloads
Determining strategies for disaster recovery and high availability
Designing database solutions for performance, compliance, and scalability
Comparing the costs of database solutions
In the following sections, we will cover every topic in detail.
This area focuses on the different database engines AWS supports and the options available for them.
You will need to know about all of the database engines and versions that AWS offers, what features are available for each, and how to apply those to a specific use case. We will spend time studying these in depth later in this book, as knowing them thoroughly will be the best way to pass the exam.
Disaster recovery and high availability are critical topics in the exam. You will need to understand terms such as Multi-AZ, read replicas, cross-region, and backup and restore techniques, all of which will be covered in depth throughout this book.
You will need to know how to use these tools correctly for different database engines to meet the use case.
Each different database offers different ways to scale. You will need to understand horizontal versus vertical scaling and how that affects database design.
You will need to know different compliance standards and how AWS uses tools and services to meet them.
For this domain, you will need to understand the different pricing models AWS uses for each database type, and you will learn the types of instances available as well as how pricing plans work for serverless databases.
You will need to be able to work out which database type is most cost-effective for your given use case.
The second domain you will be tested on is deployment and migration. This will cover creating different databases both via the console and the command line and the options available, such as cloning. For the migration, you will need to know AWS best practices on moving data to AWS and how to use the tools offered for migrations.
Under this domain, the following topics will be covered:
Automating database solution deployments
Determining data preparation and migration strategies
Executing and validating data migration
We will begin by looking more closely at each topic.
Automation is the key to this area, and you will need to know the AWS automation tools and how to use them.
You will also need to know the common commands to create databases programmatically.
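As a hedged illustration of what this looks like in practice, the following sketch uses the AWS SDK for Python (boto3) to create a small RDS for MySQL instance; the identifier, credentials, and sizing are placeholder values rather than recommendations, and the equivalent aws rds create-db-instance CLI command exposes the same options.

import boto3

# Minimal sketch: create a small RDS for MySQL instance with placeholder values.
rds = boto3.client("rds", region_name="eu-west-1")

response = rds.create_db_instance(
    DBInstanceIdentifier="demo-mysql-db",    # placeholder name
    Engine="mysql",
    DBInstanceClass="db.t3.micro",
    AllocatedStorage=20,                     # GiB
    MasterUsername="admin",
    MasterUserPassword="ChangeMe12345!",     # store real credentials in a secrets service
)
print(response["DBInstance"]["DBInstanceStatus"])   # 'creating'

Higher-level automation services such as AWS CloudFormation wrap the same underlying API in a declarative template.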
This section focuses on understanding and assessing the current database and producing a strategy for the migration.
You will need to understand and explain the different methods of migrating data to AWS and the options available.
This area covers the migration steps and tools available to use on AWS. You will be tested on your understanding of data validation techniques and how AWS manages this.
You will need to understand how to monitor data migration and how to tune and optimize it.
Once you have migrated your data to AWS, you now need to know how to manage and operate the databases. You will need to understand backup and restore technologies, maintenance tasks and windows, and how to work with database options.
The topics covered in this domain are the following:
Determining maintenance tasks and processes
Determining backup and restore strategies
Managing the operational environment of a database solution
Next, we will study these topics in detail.
AWS uses maintenance windows to allow their teams to carry out work in the background. You will need to understand how this works, what can happen during them, and how to use them for your own maintenance strategies.
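For example, the maintenance window is simply a modifiable property of an RDS instance; a minimal boto3 sketch (with a placeholder identifier and window) for aligning it with your own maintenance strategy might look like this:

import boto3

rds = boto3.client("rds")

# Move the weekly maintenance window to early Sunday morning (UTC).
rds.modify_db_instance(
    DBInstanceIdentifier="demo-mysql-db",               # placeholder
    PreferredMaintenanceWindow="sun:03:00-sun:04:00",    # format ddd:hh24:mi-ddd:hh24:mi
)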
This area focuses on backup strategies rather than the underlying technology. You will need to understand and describe terms such as RTO and RPO and know how to utilize different AWS services to meet the needs of the use case. You will need to know the fastest and most efficient options for recovering from the failure of different databases.
The operational environment includes areas such as parameters and options that can be enabled.
You will need to understand how to use these, as well as best practices for keeping them consistent across large database estates.
If your database starts to underperform or to give errors, you need to be able to identify the problem and understand common troubleshooting techniques. This domain focuses on database metrics, diagnosing faults, and how to resolve them quickly, as well as configuring alerting.
The topics covered in this domain are the following:
Determining monitoring and alerting strategies
Troubleshooting and resolving common database issues
Optimizing database performance
Next, we will study these topics in detail.
Working with AWS tools to create an alerting strategy that meets the needs of the use case is critical for all database deployments.
You will need to know the standard metrics that AWS uses and how to configure these for alerting purposes.
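As a sketch of what such an alarm might look like, the following boto3 call creates a CloudWatch alarm on the standard CPUUtilization metric for an RDS instance; the instance identifier, threshold, and SNS topic ARN are placeholder assumptions.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when average CPU stays above 80% for two consecutive 5-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName="demo-mysql-db-high-cpu",
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "demo-mysql-db"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:111122223333:dba-alerts"],   # placeholder topic
)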
AWS databases write to multiple logs. You will need to know the different logs and which one to use for each problem.
You will need to understand how to use the information in the logs to resolve common database problems, such as space issues or error messages.
In this area, you will need to learn how to read AWS graphs showing database metrics and what actions to take if there are issues.
You will need to understand how the different tools can offer deeper insights into the performance of the database to help analyze the problem.
The final domain will test your understanding of database security covering all aspects, from access and audit controls to patching for security fixes. This domain also covers encryption techniques, both of the stored data and in transit.
The topics covered in this domain are the following:
Encrypting data at rest and in transit
Evaluating auditing solutions
Determining access control and authentication mechanisms
Recognizing potential security vulnerabilities within database solutions
Now, let's begin to study these topics.
Encryption is used to make it harder for anyone unauthorized to see the data stored or in transit. You will need to know how to work with encryption at the database layer and how to encrypt connections between the application and the database.
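As a hedged sketch of both sides of this, storage encryption for an RDS instance is requested when the instance is created, while encryption in transit is handled by the client connecting over SSL/TLS; the identifier, credentials, and KMS key below are placeholders.

import boto3

rds = boto3.client("rds")

# Encryption at rest: must be chosen at creation time for RDS instances.
rds.create_db_instance(
    DBInstanceIdentifier="demo-encrypted-db",   # placeholder
    Engine="mysql",
    DBInstanceClass="db.t3.micro",
    AllocatedStorage=20,
    MasterUsername="admin",
    MasterUserPassword="ChangeMe12345!",        # placeholder
    StorageEncrypted=True,
    KmsKeyId="alias/aws/rds",                   # default RDS key; a customer-managed key can be used instead
)
# Encryption in transit: have the application connect with SSL/TLS using the
# RDS certificate bundle, so data is also protected between client and database.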
Auditing is used to keep a record of actions made within a database, but it can cause performance issues if not configured correctly.
You will need to understand different auditing techniques and the tools AWS provides to assist.
Databases in AWS have multiple methods for access that differ depending on the database. AWS also has its own built-in identity management service that can be used to restrict or grant database access.
You will need to know which methods work with which databases and how to configure and administrate logins using different methods.
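One example of these methods is IAM database authentication; the sketch below uses boto3 to generate a short-lived authentication token that a MySQL client can present instead of a password (the endpoint and database user are placeholders).

import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# The token is signed with your IAM credentials and is valid for 15 minutes.
token = rds.generate_db_auth_token(
    DBHostname="demo-mysql-db.abcdefghijkl.eu-west-1.rds.amazonaws.com",   # placeholder endpoint
    Port=3306,
    DBUsername="iam_db_user",   # a database user set up for IAM authentication
    Region="eu-west-1",
)
# Pass the token as the password when connecting over SSL/TLS.
print(token[:60], "...")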
This area focuses on patching and why this is done. It also expects you to understand what your responsibilities are in terms of securing your own databases and what areas are the responsibility of AWS.
You will need to understand the AWS shared responsibility model as well as understand the patching strategies offered by AWS.
We've covered the format of the exam (the duration of the exam, how many questions there are) and we've looked in depth at the domains and topics you will be tested on, so you should be confident in what you need to study to be able to pass the exam.
However, knowing some tips around how to tackle the exam can make a difference in terms of a pass or fail, so it's worth taking some time to think about a strategy for the exam so that you can maximize your chances of earning that pass.
Let's take a look at a sample question:
A company's online application stores order transactions in an Amazon RDS for MySQL database. The database has run out of available storage and the application is currently unable to take orders.
Which action should a database specialist take to resolve the issue in the shortest amount of time?
1. Configure a read replica in a different AZ with more storage space.
2. Create a new DB instance with more storage space from the latest backup.
3. Change the DB instance status from STORAGE_FULL to AVAILABLE.
4. Add additional storage space to the DB instance using the ModifyDBInstance action.
Follow this approach to answer the question:
To begin, you should seek to understand what the question is really asking you and which tools and services are in scope. For this question, these are the topics you need to consider:
Amazon RDS for MySQL
Storage
Now you can clearly see the database technology in use, which will help you work out the correct answer, but first, let's eliminate the wrong ones.
For a large number of questions, at least one of the answers is obviously incorrect. If you can quickly identify the ones that cannot be correct and then make an educated guess between the remaining ones, you are likely to get a few more questions right, which could make the difference between a pass and a fail.
Once you've worked through this book, you should quickly be able to discount answers 1 and 3. Answer 3 is wrong because you cannot modify an instance status in that way, and answer 1 because adding a read replica does not fix the full storage on the existing instance.
That now leaves us with just two possible correct answers. If, at this point, you are not certain of the answer, you have still doubled your odds of guessing correctly.
In general, once you have removed the obviously incorrect answers, the ones you are left with would both work, but one meets the use case of the question more closely than the other.
The final step is to search the question for important keywords such as:
Shortest time
Fastest
Most cost-efficient
Simplest
Our question uses the phrase 'shortest amount of time'. We can use this additional information to help us decide on the correct answer. Creating a new database instance will take a lot longer than adding more storage to our existing one, and therefore the correct answer is 4.
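Purely as an illustration of what answer 4 describes, the ModifyDBInstance action maps to a call such as the following in boto3; the identifier and new storage size are placeholders. Storage can generally be increased while the instance stays available, which is why this resolves the outage faster than restoring to a new instance.

import boto3

rds = boto3.client("rds")

# Grow the existing instance's storage in place and apply the change now.
rds.modify_db_instance(
    DBInstanceIdentifier="orders-mysql-db",   # placeholder
    AllocatedStorage=200,                     # new size in GiB
    ApplyImmediately=True,
)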
You may find many questions you can simply answer without using this technique, but following this strategy on the questions you do not know will really make a difference to your final score.
To summarize the steps you should take for answering questions during the exam:
Identify the technologies and solutions being referred to in the question.
Remove answers that cannot be correct based on those technologies.
Identify the keywords in the question to help you work out the best answer from among the ones remaining.
Now that you've learned the best techniques and tips to take into the exam, you should have a lot more confidence and a greater chance of success.
In this chapter, you have learned about the format the exam will take, including how many questions there will be and how long you have to answer them. We've also looked at how the exam is graded and what you need to do to earn a pass grade.
You should now know all the different topics that will be covered in the exam, which will be used to help guide your learning through this book.
And finally, we took some time to go over exam tips and techniques to help you maximize your success when you take the exam.
This information is critical for the exam, both in terms of knowing what areas to study, and knowing how best to tackle the exam. When you are unsure of an answer, the exam tips given in this chapter will be very valuable in helping you work out the most likely options.
The coming chapters will look at database fundamentals and then at the AWS infrastructure in which you will incorporate and develop databases, and the exam will include questions that expect you to know the fundamentals of AWS beyond databases.
Before we start looking at specific AWS database technologies and services, it's important to understand the different types of databases that are available and what type of workloads you should consider putting into each database. We are doing this so that when we start learning about the various AWS services, you will understand how and why there are so many different types and options.
We will be studying how databases differ between running them on-premises and in the cloud in terms of access, administration, and maintenance. These topics will appear in the exam, so being able to define the differences is important.
If you already have a database background and are comfortable with the different database types and how they work, please feel free to skip this chapter and go straight to Chapter 3, Understanding AWS Infrastructure. But if you want to go ahead and learn about the differences, then stick around until the end of this chapter. In this chapter, you will learn how to describe the key differences between on-premises and cloud databases and describe the different types of databases and when to use them. You will also learn how to define the benefits of a cloud database compared to an on-premises database and understand the compromises cloud databases entail.
In particular, we will be covering the following main topics:
On-premises versus cloud databases
SQL databases versus NoSQL
Relational database management systems
Key-value and document databases
Graph and ledger databases
Now, let's begin by comparing on-premises databases to cloud databases and learn about some of the key differences that we will need to know for the AWS Certified Database – Specialty (DBS-C01) exam.
An on-premises database could be defined as follows:
"A database that is owned, operated, and maintained by the customer within a location that they control and have full autonomy over the database and underlying servers and networking components. This can be a server room in their office or a rented cabinet within a shared data center that the customer has full and direct access to."
With an on-premises database, everything is done internally, from installation and implementation to running the database every day. Maintenance, security, and updates also need to be taken care of in-house. You will need to purchase the software and arrange for it to be installed on your servers. The customer assumes complete ownership and control, even if this is via a management company or service provider.
A cloud database could be defined as follows:
"A database that is owned, operated, and maintained by the customer within a location that they do not control and do not have full autonomy over the database and the underlying servers and networking components. This will often be a virtual machine running on a group of shared servers that the customer will have no direct access to."
Cloud computing is the ability to provision, run, and maintain an on-demand computer system, including its servers, infrastructure, databases, and other applications. These usually require no or minimal day-to-day management of the underlying servers or network and allow the customer to focus on their applications. You can create a new database from scratch in minutes with no previous database knowledge. There are options available for utilizing a subscription model so that, in exchange for a monthly or yearly fee, the chosen cloud provider maintains servers, networks, and software for you. There are options for a dedicated private cloud that allows customers to use the platform completely, with no shared resources, and allows for additional customization, backup controls, and upgrades. With a shared cloud, multiple tenants share the underlying servers, but with strict controls over security and the privacy of data. As such, far fewer customizations can be made, but the costs are typically lower.
There are five main areas you will need to be aware of for the exam:
Scalability: How can a database grow and shrink to handle the load expected of it?
Costs: How do the costs of running a database in the cloud differ from an on-premises database?
Security and Access: What security considerations need to be made when you're using a cloud database and how do you access it?
Compliance: How can you stay compliant with any legal obligations when you're using a cloud database? Is this different for an on-premises database?
Performance and Reliability: How does a cloud database maintain reliability compared to an on-premises database?
In this section, you will learn how to do the following:
Describe the key differences between an on-premises database and the cloud.
Explain the terms scalability, security, reliability, and compliance from a database perspective.
Describe the key benefits of a cloud database and the key benefits of an on-premises database.
Let's begin by looking at our first topic – scalability – and learning how to describe the different methods that are used by cloud databases compared to on-premises databases.
The ability to scale up and add resources quickly and easily as your requirements change is one of the biggest advantages of a cloud-based solution.
A cloud-based database would be able to add and remove resources such as CPU or memory rapidly, even programmatically to react to the changing requirements of the databases. On-premises, this would be extremely difficult, if not impossible, to achieve. The cloud also offers much faster addition of storage, so there is less of a requirement to plan growth patterns over long periods as you can simply attach further disk space as required. Again, this can even be set to grow automatically at certain usage thresholds. In contrast, on-premises, you often need to provide growth metrics for several years to ensure the database has sufficient disk space to allow it to operate for a long time, which can be inefficient as well as expensive.
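As a hedged example of that automatic growth on AWS, RDS storage autoscaling can be enabled on an existing instance simply by setting a maximum storage ceiling (the identifier and sizes below are placeholders):

import boto3

rds = boto3.client("rds")

# With MaxAllocatedStorage set, RDS grows the volume automatically as free
# space runs low, up to the ceiling given here.
rds.modify_db_instance(
    DBInstanceIdentifier="demo-mysql-db",   # placeholder
    MaxAllocatedStorage=500,                # ceiling in GiB
)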
The key differences between cloud databases and on-premises for scalability are as follows:
Cloud databases can scale both up and down rapidly, allowing for changing requirements.
Cloud databases do not need to be equipped with any greater storage than is required at the current time as more can easily be added.
On-premises databases can share a server more easily to optimize resources.
Next, we will learn about how the costs differ between the cloud and on-premises.
One of the main differences between cloud and on-premises databases is cost management. The cloud is not always a cheaper option, but the costs are easier to recognize and account for compared to on-premises. For example, when you're running a database on an on-premises server, the true cost of that system includes things such as the running costs for the data center, which can be hard to evaluate. On the cloud, all of those costs are included in the price that's paid. Costs are also often lower on the cloud as a result of the scalability provided. As we mentioned previously, on-premises databases are often over-resourced at the start of their life cycle as it can be difficult and time-consuming to change them later on. On the cloud, you would more accurately resource the database for its current workload and performance requirements with the knowledge that you can easily change this later on, thereby saving money on both resources and, potentially, database licensing. The cloud also offers a pay-as-you-go model, where you can provision resources only when you need them.
The key differences between cloud databases and on-premises for costs are as follows:
Cloud databases have clearly defined costs, including day-to-day running costs, which can be hidden when you're using on-premises databases.
On-premises databases typically have fixed costs, which can make financial planning easier and more predictable as opposed to cloud databases, whose costs can change from month to month.
Now, let's look at the differences between on-premises and the cloud in terms of security and access.
One of the biggest areas of concern around migrating to the cloud is security. When moving to a cloud-based database, you need to consider whether the security and data separation you are being offered meets the needs of the data you will be storing. Cloud systems are made up of a mesh of servers and as a result, you will be sharing an underlying server with other databases. There are strict controls in place to ensure that your section of the server cannot be affected or accessed by another party, but for some, this concern is too high. Security in AWS follows a shared responsibility model where AWS takes responsibility for the cloud but the customer takes responsibility for the data in the cloud.
Given that you are using shared resources, the access you have to the server itself is also restricted. For example, you cannot have full root access to any of the managed services offered and you cannot have access to any virtual machine management tools within the cloud. This lack of control and inability to change certain settings on your servers or databases can cause compatibility issues, as well as further security concerns. For fully managed databases, you have no access to the servers running the database; all access must be through the database itself. This can cause significant changes to how you maintain and run the databases.
To date, there have been no breaches on any of the major cloud providers where a customer's data has been exposed due to the cloud provider's setup; in other words, there have been no data security breaches where the cloud provider was at fault, but there have been many where the customer was. One of the largest breaches to date was at Capital One (a large credit card company), which had a database containing the financial data of over 106 million customers fraudulently accessed by hackers. The root cause of the hack was a misconfigured firewall running in the cloud.
The key differences between cloud databases and on-premises for security and access are as follows:
On-premises databases give you full control over the servers and network, allowing you to ensure your security standards are met.
Cloud databases rely on the cloud provider maintaining their security standards on the servers and network, which can free the customer to focus solely on database security and can simplify deployments.
Cloud databases can be encrypted by default and can use a password and key management service to store database credentials securely without exposing them in plain text.
Now that we've learned about security and access, let's look at the next key area where there are big differences between on-premises and the cloud: compliance.
There are regulatory controls that most companies need to abide by, whether they are legal requirements such as General Data Protection Regulations (GDPR) or industry-specific requirements such as Health Insurance Portability and Accountability Act (HIPAA) and Provisional FedRAMP. To meet these government and industry regulations, it is imperative that companies remain compliant and can demonstrate to an auditor that all their databases meet these rules. This can easily be achieved if all the data is maintained in-house, where you have full control over the data, servers, and network. However, with a cloud database, where you cannot show complete control of all the servers and networks, this can be more challenging.
If you decide to deploy a cloud-based solution, you must ensure that the cloud provider meets the requirements of the regulatory body. This information is publicly available for all the major cloud providers so that you can check and ensure the solution meets the requirements. Given the growth of cloud-based applications and databases, the majority of recognized compliance regulations now accept cloud-based solutions as compliant, so long as specific rules on deployment and configuration are followed. So, the issue of compliance for cloud-based databases is diminishing.
The key differences between cloud databases and on-premises for compliance are as follows:
On-premises databases are fully under the customer's control, so you, as the customer, can ensure they meet all the required compliance standards.Cloud databases now meet most common regulation and compliance standards by default, allowing the customer to focus on their database rather than the compliance of the server and the network.Now, let's learn about how performance and reliability compare between on-premises databases and cloud databases.
The reliability and availability of your data are critical. With on-premises, while you have full control over the servers and the storage, you are likely going to be limited when it comes to how many copies of the data you can keep or how many spare servers you have available in case one stops working. Cloud providers will offer an almost unlimited number of servers for your application and database, so if one were to fail, it would be moved automatically, almost immediately reducing any downtime. You can also ensure your data is saved in multiple locations within the cloud, again reducing the likelihood of a catastrophic data loss occurring to almost zero, even if a cloud data center were to go offline.
With on-premises systems, you can decide what maintenance windows you have. You can also decide if you patch or upgrade your systems and when that is done. On the cloud, these are determined for you, and you cannot always opt out of upgrades once your system version is no longer supported.
So, on the cloud, you will find that you have greater reliability in terms of your database and likely faster recovery times if you do suffer an outage, but that you have less control over patching or maintenance windows as a consequence.
The key differences between cloud databases and on-premises for performance and reliability are as follows:
On-premises databases give you full control over maintenance windows and patching so that you can fully control any planned outages.
Cloud databases can recover very quickly from any failures, thus reducing any unplanned downtime.
Cloud databases offer multiple locations to save your data, minimizing the risk of any data loss, which, in turn, can reduce RPO and RTO and improve your business continuity planning.
With that, we've learned about the key differences between on-premises databases and cloud databases, which means you can now compare their benefits in terms of the following:
Scalability
Costs
Security and access
Compliance
Performance and reliability
Now, let's look at the different types of databases that cloud providers typically support, starting with SQL databases versus NoSQL.
One of the largest decisions to make when planning a new database deployment is whether to use a Structured Query Language (SQL) or Not only SQL (NoSQL) database. These two types of databases differ greatly and making the wrong choice can compromise the performance and the ability of your application to function.
First, let's discuss the key features of both database types before doing a deep comparison of both so that you can decide between them.
SQL databases are designed to excel in storing structured data. They can carry out complex querying and they commonly store the minimum data possible by reducing any duplication of the data in a table in a process known as normalization. Normalized data means that accessing it often requires complex joins of different tables.
Normalized data would look similar to this:
Figure 2.1 – RDBMS table structure
These tables only contain the specific columns that apply to them, so users only contains columns about the user and nothing about movies or ratings. The movies table is similar and only contains data about movies. The other two tables (tags and ratings) contain pointers back to the users and movies tables. If you wanted to get the first_name parameter of a user who gave a 5-star rating to a specific movie, you would have to join the three tables to retrieve this information as it is not stored in one place. For large datasets, you can see that the performance would be poor as the database has to not only obtain the data from three different locations but also combine that data before passing the results back to the user. This technique optimizes storage at the cost of performance on larger databases.
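To make that join concrete, here is a small self-contained sketch using SQLite from Python. The exact columns in Figure 2.1 are not reproduced in the text, so the schema below is an assumption that keeps only the table names and the first_name and rating idea from the example.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A simplified normalized schema: users, movies, and ratings link rows by ID.
cur.executescript("""
CREATE TABLE users   (user_id INTEGER PRIMARY KEY, first_name TEXT, age INTEGER);
CREATE TABLE movies  (movie_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER);
INSERT INTO users   VALUES (1, 'Alice', 34), (2, 'Bob', 41);
INSERT INTO movies  VALUES (10, 'The Big Race');
INSERT INTO ratings VALUES (1, 10, 5), (2, 10, 3);
""")

# Finding the first_name of users who gave a 5-star rating to a specific movie
# means joining all three tables, because the data lives in three places.
cur.execute("""
    SELECT u.first_name
    FROM users u
    JOIN ratings r ON r.user_id = u.user_id
    JOIN movies  m ON m.movie_id = r.movie_id
    WHERE m.title = 'The Big Race' AND r.rating = 5;
""")
print(cur.fetchall())   # [('Alice',)]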
SQL databases comply with atomicity, consistency, isolation, and durability (ACID) guarantees and are a good choice for transactional data, where data loss must be minimized and accuracy must be maintained. SQL databases are typically based on a single-node design, where adding additional nodes to scale horizontally is complex and expensive.
SQL databases sacrifice the speed and performance of large datasets in favor of consistency and durability. As a result, it is common to see performance issues and slowdowns when you're working with data in the range of millions of records.
There are two main types of SQL database that we will learn about in more detail later in this chapter:
Row-oriented (relational or online transaction processing (OLTP))
Column-orientated (analytical or online analytic processing (OLAP))
These two types of SQL databases are shown in the following diagram:
Figure 2.2 – Relational database versus an analytical or columnar database
Now, let's look at NoSQL databases to see how they differ from SQL databases.
NoSQL databases are designed for storing semi-structured or unstructured data as they don't enforce a concrete schema for tables. You can add data attributes when needed without changing the structure of the entire table. Since no particular structure is enforced by the database, they are not good at join queries. NoSQL databases allow data to be stored in the same format in which it will most commonly be accessed. The database structure is often defined by looking at the application code that will access it and mimicking the same structure.
NoSQL databases can scale horizontally with ease, and they are designed to handle partitioning and sharding. NoSQL databases are commonly used when you need extremely fast response times for large amounts of data. NoSQL databases achieve this speed by compromising consistency and referential integrity with most NoSQL databases using an eventual consistency model. What this means is that there is a greater risk of data loss or inconsistent results being returned using a NoSQL database.
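As a hedged illustration using DynamoDB (which we cover later in this book), the consistency model is a per-request choice; the table name and key below are placeholders.

import boto3

dynamodb = boto3.client("dynamodb")

# Default read: eventually consistent. Cheaper and faster, but it may briefly
# return stale data shortly after a write.
dynamodb.get_item(
    TableName="demo-orders",                   # placeholder table
    Key={"order_id": {"S": "order-12345"}},
)

# Strongly consistent read: reflects all prior successful writes, at roughly
# twice the read cost and slightly higher latency.
dynamodb.get_item(
    TableName="demo-orders",
    Key={"order_id": {"S": "order-12345"}},
    ConsistentRead=True,
)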
There are multiple types of NoSQL databases, each of which is designed for a different use case:
Key-value databases
Document databases
Column-oriented and analytics databases
Graph databases
The following diagram provides an overview of the different types of NoSQL databases:
Figure 2.3 – The four main NoSQL database types
Now that we've learned about the two main database models – NoSQL and SQL – let's take a closer look at the different types within those categories, starting with relational database management systems.
SQL or RDBMS databases have two main types, which describe the way data is stored on disk:
Row-orientated
Column-orientated
The different methods of storing the data and how it is arranged will offer very different performance patterns (that is, fast at some things but slow at others), and knowing about the right type to use can greatly improve the performance of your application. While both database types may appear very similar on the surface, they are quite different under the hood.
In the exam, there may be questions describing a customer use case and asking which database would be the best fit.
First, let's look at row-orientated, which is the more common database system.
In a row-orientated database, the data is stored in tables in normalized form (we discussed this in the SQL databases section) with links or keys between them.
Row-orientated databases store the data in continuous blocks, with each row following on from the next. Let's look at an example table:
Figure 2.4 – A table for storing user data
Now, let's look at how this would be stored on disk:
Figure 2.5 – Table rows stored in a line
As you can see, the data is stored in complete rows, one after another. If we were to add a new row to this table, it would be very fast for the database to do so as it simply finds the last row on the disk and adds the new data. Similarly, for reads, where you want to return the majority of the columns from each row, this will be returned efficiently. However, problems could occur if you wanted to only return one column from a table but for a large number of rows. For example, let's say you wanted to calculate the average age of all users. Here, you'd need to read a large number of blocks of data as the information you need is stored scattered across many areas of the disk. The following diagram shows that you would need to read from three different areas of the disk for just one query:
Figure 2.6 – Disk reads for a single column query
So, to summarize, we know the following about row-orientated databases:
They are quick at writing data.
They are efficient at retrieving data when you need the majority of the row data to be returned.
They are inefficient when you want single columns.
Due to this behavior, a row-orientated database is commonly used for applications where there is a similar number of reads and writes and where you will use a lot of the data in each row rather than just a single column. These databases are often called online transactional processing databases or OLTP databases.
For a use case where you will be working with mostly single columns of data, you can look at column-orientated or online analytic processing (OLAP) systems.
Just like the row-orientated databases, these databases use tables that are often normalized and have links or keys between them. The difference is how the data is stored on disk.
Let's consider the same table that was shown in the preceding section – that is, Figure 2.4. Here, we can see the same data being stored on disk but in a column-oriented database:
Figure 2.7 – Table rows stored in a line
As you can see, this time, the data is in column groups, with each entry from each column being placed together rather than each row. If we were to add a new row to our table, it would become much more complex. Now, instead of finding the end of the last row and adding the new data there, our database needs to either find a gap on the disk in the same area as the other columns, or it needs to move everything around to make it fit. This is going to slow down adding new data considerably. If you wanted to read most of the columns from the table, you are now going to have to jump to multiple areas of the disk – a different one for each column – which, again, can cause performance problems.
However, column-orientated databases are very efficient at returning aggregate data from one or two columns, such as getting the average age of our users. Now, it just needs to go to one area of the disk and retrieve that. The following diagram shows how quick and efficient a column-orientated database is at retrieving single columns:
Figure 2.8 – Disk reads for a single column query
To summarize, we know the following about column-orientated databases:
They are slow at writing data.
They are inefficient at retrieving data when you need the majority of the row data to be returned.