This book will help you master Amazon DynamoDB, the fully managed, serverless, NoSQL database service designed for high performance at any scale. Authored by Aman Dhingra, senior DynamoDB specialist solutions architect at AWS, and Mike Mackay, former senior NoSQL specialist solutions architect at AWS, this guide draws on their expertise to equip you with the knowledge and skills needed to harness DynamoDB's full potential.
This book not only introduces you to DynamoDB's core features and real-world applications, but also provides in-depth guidance on transitioning from traditional relational databases to the NoSQL world. You'll learn essential data modeling techniques, such as vertical partitioning, and explore the nuances of DynamoDB's indexing capabilities, capacity modes, and consistency models. The chapters also help you gain a solid understanding of advanced topics such as enhanced analytical patterns, implementing caching with DynamoDB Accelerator (DAX), and integrating DynamoDB with other AWS services to optimize your data strategies.
By the end of this book, you’ll be able to design, build, and deliver low-latency, high-throughput DynamoDB solutions, driving new levels of efficiency and performance for your applications.
Explore enterprise-ready, serverless NoSQL with predictable, scalable performance
Aman Dhingra
Mike Mackay
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Apeksha Shetty
Publishing Product Manager: Nilesh Kowadkar
Book Project Manager: Hemangi Lotlikar
Senior Editor: Tazeen Shaikh
Technical Editor: Seemanjay Ameriya
Copy Editor: Safis Editing
Proofreader: Tazeen Shaikh
Indexer: Hemangini Bari
Production Designer: Jyoti Kadam
DevRel Marketing Coordinator: Nivedita Singh
First published: August 2024
Production reference: 1160824
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-80324-689-5
www.packtpub.com
Every word in this book was written with Mike in mind. I am honored to contribute to Mike’s legacy as a co-author. This work is dedicated to Oliver and the entire Mackay family.
Many thanks to my DynamoDB colleagues at AWS—engineering, product, and Worldwide Specialists alike. Some are friends first and colleagues later. I am grateful to those who have contributed to this remarkable technology. My fascination with distributed systems fuels my passion for each new DynamoDB project, and I am thrilled to contribute to its evolution!
Finally, a huge thank you to my family. Balancing the day job with authoring this book left little time for family and friends. I am deeply grateful to Divya, Dad, Mum, Cheets, and Janki for their support.
- Aman Dhingra
Aman Dhingra is a senior DynamoDB specialist solutions architect at AWS, where he assists organizations in leveraging AWS to its fullest potential. He focuses on designing cost-efficient solutions and maximizing the agility and elasticity offered by the cloud and AWS. With a specialization in Amazon DynamoDB and NoSQL databases, Aman is also well-versed in AWS' big data suite of technologies, many of which are tightly integrated with open source projects. Aman is based in Dublin, Ireland.
You can connect with the author on LinkedIn at amdhing.
Mike Mackay was a senior NoSQL specialist solutions architect at AWS, known as one of the early AWS architects in Europe specializing in NoSQL technologies. Before AWS, he was technical director at Digital Annexe / Your Favourite Story and head of development for Outside Line.
Mike’s freelance work includes collaborations with Warner Brothers (UK), Heineken Music, and Alicia Keys. He has also written PHP tutorials for .net magazine and Linux Format. His projects with Outside Line featured prominent clients such as The Hoosiers, Oasis, and Paul McCartney.
Outside of work, Mike loved to chat about F1 and indulge in DJing. Mike was based in London/Essex.
Check out Mike's LinkedIn at mikemackay82.
Saumil Hukerikar is a distinguished software engineer with over 12 years of experience. Holding a Master’s degree in computer science, Saumil specializes in backend distributed systems and database technologies, with a particular interest in NoSQL databases. As a former DynamoDB engineer, he played a pivotal role in developing its Global Admission Control system. This expertise ensures his reviews are both insightful and analytical. Beyond his professional achievements, Saumil is dedicated to giving back to the tech community. He actively participates in book reviews, serves as a judge in local and international competitions, and mentors the next generation of software engineers, sharing his knowledge and passion for technology.
Brandon Tarr is a seasoned IT professional with a diverse background encompassing infrastructure, software development, cloud technologies, and multi-SaaS product solutions. Currently, he specializes in architecting collaborative and SaaS solutions, with a deep focus on platforms like Microsoft 365, Atlassian, and Miro, and a particular emphasis on Slack Enterprise Grid. Brandon excels in translating business needs into technical solutions, driving digital transformation initiatives, and crafting strategic roadmaps. He is passionate about fostering collaboration and innovation within organizations.
I'd like to thank my wife, best friend, and love of my life, Samantha. Also, my twin girls, who mean the world to me. Lastly, I'm grateful to everyone who has given me the opportunity to work with them. It's always an honor.
This part introduces Amazon DynamoDB, highlighting its core features and real-world applications. It guides you through the AWS Management Console and SDKs, which are essential tools for interacting with DynamoDB. Additionally, the NoSQL Workbench is introduced, offering a user-friendly interface for modeling, visualizing, and querying data. These chapters lay a solid foundation for understanding and using DynamoDB effectively.
Part 1 has the following chapters:
Chapter 1, Amazon DynamoDB in Action
Chapter 2, The AWS Management Console and SDKs
Chapter 3, NoSQL Workbench for DynamoDB

Amazon DynamoDB is a fully-managed NoSQL database service from Amazon Web Services (AWS). It offers single-digit millisecond performance and virtually unlimited data storage, and it can easily be tuned to support any throughput required for your workload, at any scale. Since its launch in January 2012, DynamoDB has come a remarkably long way and consistently evolves to support additional features, enabling more businesses and enterprises to leverage the scale of this NoSQL offering.
Welcome to the first chapter of Amazon DynamoDB – The Definitive Guide, where Mike and I, Aman, dive into all things NoSQL, focusing on DynamoDB. We’ll explore its backstory, how it can supercharge your applications, and how it frees you from the hassles of managing clusters and instances. Learning NoSQL and DynamoDB isn’t too tough, but our aim with this book is to share everything we’ve learned to make your journey smoother. That’s why we’ve titled it The Definitive Guide – it’s your go-to resource for mastering DynamoDB.
The book is for those of you who are in roles such as software architects, developers, engineering managers, data engineers, product owners, or traditional database administrators. The book itself will provide you with all the skills and knowledge needed to make the most of DynamoDB for your application. This means benefiting from predictable performance at scale, while also being highly available, fault-tolerant, and requiring minimal ongoing management. I could continue listing the service’s benefits and why you should choose it for your applications, but it might be more valuable if you read through examples and success stories of other customers later in this chapter. Before we get started, let’s learn about Mike’s relationship with DynamoDB, in his own words.
My (Mike's) journey into the world of DynamoDB has been a tale of two paths. The first path was one of overall enlightenment, which was not as simple as I thought it might be – and that is, in part, what led to this book. I wanted to share my knowledge, the way I approached data modeling, and how I had to unlearn some core concepts from the relational database world that then gave me that lightbulb moment when working with DynamoDB.
The second path was knowing where DynamoDB fits in my toolset, how to take advantage of the astonishing number of features that come baked into the service, and lastly, knowing when not to try and make workloads fit when there are more practical technologies out there. While DynamoDB can work for the majority of your database workloads, we need to be realistic and understand what it is not suited for, and that is where being armed with the right knowledge lets us make better, informed decisions about our database choices.
In this chapter, we will dive into DynamoDB’s position in today’s database market and trace its evolution in time. Subsequently, we will explore real-world use cases showcasing the impressive scale and volume regularly achieved by DynamoDB, demonstrating its battle-hardened capabilities beyond routine customer traffic.
We will then evaluate which workloads DynamoDB is suited for and how it can easily become one of the most used tools in your set of database technologies once you qualify DynamoDB as the right fit for the use case. There are many additional database technologies that do their own job extremely well, so knowing when to offload duties to the right technology is key in our decision making.
By the end of this chapter, you will have enough background knowledge and plenty of excitement to start setting up a working environment and looking at some useful tools we can leverage in preparation for getting hands-on with DynamoDB.
In this chapter, we will cover the following topics:
NoSQL and DynamoDB in the current database market
DynamoDB case studies
Workloads not suited for DynamoDB

Let’s take a look at how DynamoDB came to be – where it sits in the current NoSQL market, and how NoSQL compares to relational databases. Here, we cover a couple of core differences, both functional and technical, and learn of the very beginnings of the DynamoDB service.
My IT career started more than 20 years ago, in the early 2000s, when relational databases were the default choice for any application that required a data persistence layer. I had built up a fairly solid knowledge of leveraging MySQL in my builds and applications, as was the de facto standard for many similar developers and open source enthusiasts at the time. Like many before (and after) me, I had been at the mercy of maxing out CPU and connection limits of single-instance database servers, having headaches over performance, trying to manage uptime and reliability, and, of course, ensuring everything stays as secure as possible.
I remember sitting in the keynote session at the AWS London Summit at ExCeL, an exhibition center. An AWS customer was on stage talking about their use of DynamoDB. They were in the ad-space sector and bidding on real-time advertising impressions. Speed was crucial to their success (as is the case for many other industry verticals), and their database technology of choice was DynamoDB. I recall a slide where they discussed that their peak requests per second (RPS) rate was frequently hitting 1 million RPS and that DynamoDB was easily handling this (and more when having to burst above that rate). I sat there thinking, “If there’s a database technology that lets me do 1M RPS without breaking a sweat or spending hours (if not days) having to configure it for that level, I want to be a part of it.” It was from there that I became engrossed in DynamoDB, and quite frankly, I have not looked back since.
Over the past few years, NoSQL database technologies have gained more and more popularity, and their increasing ease of use and low cost have contributed to this rise. That said, the relational database management system (RDBMS) is certainly not going anywhere anytime soon. There is, and always will be, a market for both NoSQL and RDBMSs. A key differentiator between the two is that NoSQL tables are typically designed for specific, well-known, and defined access patterns and online transaction processing (OLTP) traffic, whereas an RDBMS allows for ad-hoc queries and/or online analytical processing (OLAP) traffic.
Let’s take a moment to step back. Relational databases have a rich history spanning many decades, excelling in their functionality. Whether open source or commercial, RDBMSs are widely used globally, with the choice of vendor often hinging on the application’s complexity. While there are numerous distinctions between relational database offerings and NoSQL alternatives, we’ll focus on one aspect here. In relational database design, a fundamental principle is normalization—a practice that involves organizing data and establishing relationships between tables to enhance data flexibility, ensure protection, and eliminate redundant data. This normalization is expressed in various normal forms (e.g., First Normal Form (1NF) to Fifth Normal Form (5NF)), typically achieving Third Normal Form (3NF) in most cases. In contrast, NoSQL embraces data duplication to facilitate storage and presentation in a denormalized and read-optimized fashion.
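To make this concrete, here is a small illustrative sketch contrasting the two approaches; the entity and attribute names are invented for this example and are not from the book.

```python
# Relational (normalized): customer details live in one table and are
# joined to orders at read time via customer_id.
customers_row = {"customer_id": 42, "name": "Jane Doe", "city": "Dublin"}
orders_row = {"order_id": 1001, "customer_id": 42, "total": 59.99}

# DynamoDB (denormalized): the fields an access pattern needs are
# duplicated into the item so it can be read in a single request,
# trading extra storage for read efficiency.
order_item = {
    "pk": "CUSTOMER#42",          # partition key
    "sk": "ORDER#1001",           # sort key
    "customer_name": "Jane Doe",  # duplicated to avoid a runtime join
    "total": 59.99,
}
```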
So why is normalization so crucial to relational databases? If we look back over the course of relational history, we can note that storage cost and disk size/availability were significant limiting factors when using a relational database technology. Reducing the duplication of data in the database results in more storage available on the attached disk. More work was offloaded to the CPU to join data together via relationships and help with the overall integrity of the data. With today’s technological advances, storage costs are fundamentally cheaper and storage space is almost limitless, so we can take advantage of these changes to optimize the load on the CPU (typically the most expensive part of the database layer) when retrieving our data. While these advances haven’t substantially changed the way relational databases work, this is a core value proposition of most NoSQL technologies.
Additionally, one of the primary issues that NoSQL database technologies aim to address is that of extreme scale. What does extreme scale mean in this case? Over time and with the evolution of cloud services and the global nature of the internet, traffic to applications continues to grow exponentially, as do the requests to any underlying databases. While maintaining capacity and availability at this scale, databases need to remain highly performant and deliver results in ever-demanding low-latency environments – milliseconds and microseconds are considered normal, with the latter often being sought after. This can be achieved by horizontally scaling the data (and therefore the requests) across multiple storage devices, or nodes—a task that some relational databases can do but don’t necessarily perform well at scale. To scale a relational database, you would typically scale vertically by upgrading the hardware on the server itself, although this has its limits.
The high-level differences between RDBMSs and NoSQL databases can be summarized in the following table:

| Characteristic | RDBMS | NoSQL |
| --- | --- | --- |
| Data organization | Encourages normalization | Encourages denormalization |
| Scalability | Typically scales vertically; has limitations | Typically scales horizontally; can be extremely scalable |
| Schema support | Strict schema, enforced on write | Typically schema-less, flexible |
| Integrity constraints | Enforced through relationships and constraints | Depends on the database; not enforced for DynamoDB |
| Storage/compute optimization | Optimized for storage | Optimized for compute |
| Read consistency support | Strong consistency by design; eventual consistency on read replicas | Depends on the database; strong and eventual supported by DynamoDB |
| Atomicity, Consistency, Isolation, and Durability (ACID) | Most RDBMSs are designed for ACID | Depends on the database; limited ACID transactions supported for DynamoDB |
| Query language | Typically SQL | Depends on the database; proprietary for DynamoDB |
| Examples | MySQL, Oracle, and Postgres | DynamoDB, MongoDB, Neo4j, and Redis |

Table 1.1 – RDBMS and NoSQL high-level comparison
It is important to realize that NoSQL is a bigger umbrella of different database technologies that differ from each other in terms of the data structures they support, their query languages, and the nature of the stored data itself. NoSQL broadly covers the following categories:
Key-value
Document
Wide column
Graph

DynamoDB is of the key-value kind, as we have established already, but other categories may support vastly different data models and have their own query languages.
Without digging into all of those, let us next understand how DynamoDB came about.
DynamoDB was simply born out of frustration and a need to solve problems that were directly impacting Amazon’s customers. Toward the end of 2004, the Amazon.com retail platform suffered from several outages, attributed to database scalability, availability, and performance challenges with their relational database (1).
A group of database and distributed systems experts at Amazon wanted something better: a database that could support their business needs for the long term, scale as they needed, perform consistently, and offer the high availability that the retail platform demanded. A review of their current database usage revealed that ~70% of the operations they performed were of the key-value kind (2), and they were often not using the relational capabilities offered by their current database. Each of these operations only dealt with a single row at any given time.
Moreover, about 20% of the operations returned multiple rows, but all from the same table (2). Avoiding core relational querying capabilities such as expensive runtime JOINs in the early 2000s was a testament to the engineering efforts at Amazon. Armed with this insight, a team assembled internally and designed a horizontally scalable distributed database that banked on the simple key-value access requirements of typically smaller-sized rows (less than 1 MB), known back then as Dynamo.
The early internal results of Dynamo were extremely promising. Over a year of running and operating their platform backed by this new database system, they were able to handle huge peaks, including holiday seasons, much more easily. This included a shopping cart service that served tens of millions of requests and resulted in about three million checkouts in a single day, as well as managing the session state of hundreds of thousands of concurrently active users shopping for themselves and their nearest and dearest. Most of this was using the simple key-value access pattern.
This success led the team to go on to write Amazon’s Dynamo whitepaper (3), which was shared at the 2007 ACM Symposium on Operating Systems Principles (SOSP) conference so industry colleagues and peers could benefit from it. This paper, in turn, helped spark the category and development of distributed database technologies, commonly known as NoSQL. It was so influential that it inspired several NoSQL database technologies, most notably Apache Cassandra (4).
As with many services developed internally by Amazon, and with AWS being a continually growing and evolving business, the team soon realized that AWS customers might find Dynamo as useful and supportive as they had. Therefore, they furthered the design requirements to ensure it was easy to manage and operate, which is a key requirement for the mass adoption of almost any technology. In January 2012, the team launched Amazon DynamoDB (5), the cloud-based, fully-managed NoSQL database offering designed from launch to support extreme scale.
In terms of the general key-value database landscape, where does DynamoDB sit? According to the current (at the time of writing) DB-Engines Ranking (6A) list, Amazon’s DynamoDB is the second most popular key-value database behind Redis (7). Over the years, it has steadily and consistently increased in use and popularity and held its place in the chart firmly (6B).
The rest, as they say, is history.
Important note
We’ve only covered a very small fraction of NoSQL’s history in a nutshell. For a greater explanation of the rise and popularity of the NoSQL movement, I recommend reading Getting Started with NoSQL by Gaurav Vaish (8).
We’ve learned about the advantages that NoSQL and DynamoDB offer, but what does that mean in terms of usage, and how are people utilizing DynamoDB’s features? In the next section, we’ll examine some high-profile customer use cases that show just how powerful DynamoDB can be.
DynamoDB is currently used by more than a million customers globally (9). These range from small applications performing a few requests here and there to enterprise-grade systems that continually require millions of requests per day; all of these and more depend on DynamoDB to power mission-critical services.
There is an incredibly high chance that you have indirectly been a DynamoDB user if you have spent any time on the internet. From retail to media to motorsports and hospitality, DynamoDB powers a sizable number of online applications. One of the best ways to find out about the incredible workloads and systems that not only DynamoDB but also AWS helps to power is to hear it through the voice of their customers.
What’s key to think about overall is that while these case studies shine a light on some of the large-scale DynamoDB solutions in use today, they may not completely represent what you or I want or need to use DynamoDB for right now. Importantly, the takeaway here is that you and I and every developer, start-up, or enterprise have the same power at our fingertips with DynamoDB as the following case studies. Every feature these customers have access to within the service, you do too, and that’s where the power of not only DynamoDB but also AWS comes in.
The upcoming case studies illustrate how DynamoDB users adopted the service either to overcome specific challenges or to leverage their trust in the platform based on prior positive experiences with the technology.
As we briefly covered in the previous section, Amazon.com, the retail platform, heavily relied on its existing relational databases but was struggling to scale its system to keep pace with the rapid growth and demand of the platform. The Amazon Herd team decided to move over to DynamoDB (10).
Migrating from a relational database to a NoSQL offering takes time, planning, and understanding of the workload that is going to be migrated. The internal teams (as we’ve highlighted) were able to identify that ~70% of the current database queries were that of the key-value type (the lookup was performed by a single primary key with which a single row was returned), which aligns well with NoSQL.
Once fully implemented, one of the many benefits the Herd team was able to make use of was being able to reduce their planning and the time required to scale the system for large events by 90%. This allowed the team to spend more time in other areas, such as creating new value-adding features for the Amazon retail platform.
On the back of this success, DynamoDB powers many of the mission-critical systems used within Amazon (including multiple high-traffic Amazon properties and systems). To further exhibit just how powerful and instrumental the move to DynamoDB was, consider the multi-day Amazon Prime Day event in 2023. In an official post by AWS (11), the following is stated:
“DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfilment centers. Over the course of Prime Day, these sources made trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 126 million requests per second.”
– Jeff Barr, Chief Evangelist for Amazon Web Services
Let’s just think about that for a second – 126 million requests per second?! For any service to be able to deliver this level of throughput is an incredible feat. What’s more, this was delivered on top of all other AWS customers’ workloads across the globe, without breaking a sweat. Whatever you throw at DynamoDB, the service can take it in its stride. To this day (and every year when I read the latest Amazon Prime Day stats), I’m in awe of the tremendous power that DynamoDB offers. This is one of the many reasons why so many customers choose DynamoDB to power their workloads.
What is Amazon Herd?
Amazon Herd is an internal orchestration system that enables over 1,300 workflows within the Amazon retail system. Herd is used to manage services ranging from order processing to certain fulfillment center activities, as well as powering parts of Amazon Alexa in the cloud.
In April 2021, the Walt Disney Company outlined in a press release (12) how they are using AWS to support their global expansion of Disney+, one of the largest online streaming video services in the world. Disney+ scales part of its feature set globally on DynamoDB.
If, like me, you have used Disney+ and have added a video to your watchlist, or perhaps you have started watching any of their video content, paused it, and have come back later, or even watched the rest of it on a different device, these features have data that persisted under the hood with DynamoDB. Globally, these events (and more) amount to billions of customer actions each day.
Disney+ takes advantage of AWS’ global footprint to deploy its services in multiple regions (13), and with DynamoDB, they are making use of a feature called Global Tables (14) (see Chapter 13, Global Tables). In a nutshell, Global Tables provides a fully-managed solution for multi-region, multi-active tables within DynamoDB. You simply enable the feature and decide which regions you want to be “active” in, and DynamoDB takes care of regional provisioning, any data back-filling, and then all ongoing replication for you. It’s incredibly powerful and requires no maintenance from you. It’s just one of the mechanisms in place that we can refer to when thinking about DynamoDB’s “click-button” scaling.
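To give a feel for how little ceremony is involved, here is a minimal boto3 sketch of adding a replica region to an existing table. The table name and regions are hypothetical, and the call assumes the table already meets the Global Tables prerequisites (for example, DynamoDB Streams enabled with new and old images).

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Turn an existing regional table into a global table by creating an
# active replica in another region; DynamoDB handles the back-filling
# and ongoing replication.
dynamodb.update_table(
    TableName="WatchlistService",  # hypothetical table name
    ReplicaUpdates=[
        {"Create": {"RegionName": "eu-west-1"}},
    ],
)
```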
With Global Tables in place, many of the Disney+ APIs are configured to read and write content within the same region where the data is located, reducing overall latencies to and from the database. Global Tables also allows their services to failover geographically from one entire region to another, from both a technical failure perspective as well as being able to perform maintenance in a given region without incurring user downtime or affecting the availability and usability of the overall Disney+ service.
I highly recommend watching an AWS re:Invent video titled How Disney+ scales globally on Amazon DynamoDB (15). Next, let us review the characteristics of workloads that may not be suited for DynamoDB.
Let’s quickly cover a couple of areas that aren’t as well supported by DynamoDB, such as search and analytics. While not technically impossible to implement, the requirements are often strong enough to warrant a dedicated technology to which to offload these tasks. Just because we are using a NoSQL database doesn’t mean we should shoehorn every workload into it. As I called out earlier, it’s better to know when not to use a technology than to force it to work and end up with an overly and unnecessarily complex data model as a result.
While DynamoDB does have a very basic ASCII search operator available, this is only available on a sort key attribute or by using filtering on additional attributes. It can only perform left-to-right or exact string matching and does not offer any level of fuzzy matching or in-depth or regular expression matching. Unless your string matching operates at this level, you will need to utilize a dedicated technology for this. An approach often seen, and well documented, is to support search queries with Amazon OpenSearch Service (16, 17A, 17B), or similar.
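As a concrete illustration of that left-to-right matching, here is a boto3 sketch using the begins_with condition on a sort key; the table, key, and attribute values are hypothetical.

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("ProductCatalog")  # hypothetical

# Prefix matching on the sort key is as far as native string search goes;
# there is no fuzzy or regular-expression matching.
response = table.query(
    KeyConditionExpression=(
        Key("pk").eq("CATEGORY#books")
        & Key("sk").begins_with("TITLE#dynamo")
    )
)
items = response["Items"]
```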
Unlike some other NoSQL technologies, such as MongoDB (18), DynamoDB does not have any kind of aggregation framework built into it. This means you cannot easily perform analytical queries across results or datasets within your table. If you need to generate sums of data (for dashboarding, for example), this must be done within your application and the results written back into DynamoDB for later retrieval. Although an advanced design pattern using DynamoDB Streams and AWS Lambda could be leveraged to perform some kinds of aggregations, on a holistic level, the database engine itself does not support native operators for aggregation queries.
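As a rough sketch of that Streams-plus-Lambda pattern, the following Lambda handler maintains a running total in a summary item; the table, key, and attribute names are hypothetical.

```python
import boto3

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical

def handler(event, context):
    """Invoked by a DynamoDB stream; aggregates newly inserted orders."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        new_image = record["dynamodb"]["NewImage"]
        amount = int(new_image["amount"]["N"])
        # ADD atomically increments the counter on the summary item,
        # so the pre-computed total can be read back cheaply later.
        table.update_item(
            Key={"pk": "DASHBOARD", "sk": "DAILY_TOTAL"},
            UpdateExpression="ADD order_total :amt",
            ExpressionAttributeValues={":amt": amount},
        )
```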
There is a misconception with NoSQL that once you have created your model, you cannot change it or work with newer and evolving access patterns – that’s untrue. It’s always worth pointing out that with many production applications, it is common for these access patterns to change organically over time. There could be several reasons for this: a change in business model, or a new team or service is introduced to the company that has an alternative access pattern that needs fulfilling. With DynamoDB, you are not strictly “stuck” with your original data model design. The DynamoDB team has got you covered with support for additional indices on your table by using what’s called a Global Secondary Index (GSI) (19).
We will cover data modeling and design in a lot more depth over the coming chapters, giving plenty of opportunity to explore how we build data models and accommodate changes with GSIs – don’t worry if this does not make sense right now, just know that you can support additional access patterns on the same table, without necessarily having to rebuild and restructure the entire table itself.
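As a taster of what’s to come, adding a GSI to an existing table is a single API call. The following boto3 sketch uses hypothetical table, attribute, and index names, and assumes the table uses on-demand capacity (a provisioned table would also need a ProvisionedThroughput block for the index).

```python
import boto3

client = boto3.client("dynamodb")

# Create a new index keyed on an existing item attribute, enabling a new
# access pattern without restructuring the base table.
client.update_table(
    TableName="MyTable",  # hypothetical
    AttributeDefinitions=[
        {"AttributeName": "status", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[
        {
            "Create": {
                "IndexName": "status-index",
                "KeySchema": [{"AttributeName": "status", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
            }
        }
    ],
)
```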
In this chapter, we learned about the beginnings of the service from Dynamo through to DynamoDB, and how some of the biggest companies in the world are leveraging DynamoDB to power their mission-critical workloads. We covered where DynamoDB sits within the NoSQL market, and briefly recapped how NoSQL technologies fit within the overall database landscape.
We read about some incredible examples of how well DynamoDB performs and operates at scale, and I encourage you to explore further case studies available on the DynamoDB product page (20).
We touched on the fact that the workloads best suited to DynamoDB are those that have well-known and defined access patterns and workloads that are predominantly, although not strictly limited to, OLTP. Search and analytical workloads typically aren’t well supported, so knowing when to offload this need is important.
In the next chapter, we will explore how we interact with DynamoDB in order for us to prepare a working environment (either locally or remotely in the cloud) so we can start data modeling and working with some of the incredible features that DynamoDB has to offer.
Before we begin designing, building, and working with DynamoDB, we should understand what tools and options are available for us to interact and work with. Unlike some NoSQL databases that you connect directly to the primary server or host with (Redis or MongoDB, for example), DynamoDB is different—we interact with it through a series of stateless application programming interfaces (APIs).
The underlying service of those APIs corresponds to fleets of request routers and storage nodes, all of which take care of sending our requests to the correct storage nodes, managing any authentication and authorization alongside interfacing with any additional service features, such as point-in-time recovery (PITR) and read/write capacity modes.
Having most tasks abstracted away into an API means we can focus more on our application and data modeling. While, at first, querying a database through API calls may seem strange, it can really help with the simplification of performing queries and ensuring those queries are well-formed and validated before being sent to the database.
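For example, a single read is just one stateless, signed HTTPS call made through an SDK. The following boto3 sketch shows the shape of such a request; the table name, region, and key are hypothetical.

```python
import boto3

# No connection pool or primary server to manage: the SDK signs the
# request and a request router forwards it to the right storage node.
client = boto3.client("dynamodb", region_name="eu-west-2")

response = client.get_item(
    TableName="MyTable",
    Key={"id": {"S": "foo"}},
)
print(response.get("Item"))  # absent if no item matches the key
```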
In this chapter, we will start off by taking a look through the AWS Management Console to familiarize ourselves with the layout and options that we have to work with in DynamoDB without needing to know any programming languages or install any software to get up and running. In fact, we can get fully started for free, as AWS offers a certain amount of use of DynamoDB under the AWS Free Tier.
We will then look at the AWS software development kits (SDKs) available that enable us to work directly with the DynamoDB APIs. Finally, we’ll look at how to run our code natively in the cloud with AWS Lambda, rounding off with how we can make use of DynamoDB local to develop our application offline.
By the end of this chapter, you will understand how to access and work with DynamoDB to suit your preferred working environment and development needs.
In this chapter, we’re going to cover the following main topics:
Working with the AWS Management Console
Navigating and working with items
The AWS SDKs
Using AWS Lambda and installing DynamoDB local

To get started with DynamoDB in the AWS Management Console, the first thing you will need is an AWS account. If you don’t already have one, you can sign up for free and use a set number of services at no cost under the AWS Free Tier. There are three ways that the AWS Free Tier works—some services are offered as a free trial for a short term, others are free to use for 12 months, and lastly, some remain free forever and do not expire.
For DynamoDB, the AWS Free Tier offers the following:
25 GB of table storage
25 write capacity units (WCU) of provisioned capacity
25 read capacity units (RCU) of provisioned capacity
25 replicated write capacity units (rWCU)
2.5 million DynamoDB Streams read requests
1 GB data transfer out (15 GB for the first 12 months)

The benefits listed are given under the Standard table class.
Not only does this allow you to perform up to 200 million requests per month, but it also remains free forever and does not expire after 12 months.
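One back-of-the-envelope way to arrive at that figure, assuming a 30-day month and eventually consistent reads (one RCU supports up to two such reads per second, and one WCU supports one write per second), is sketched below.

```python
# Approximate monthly request capacity of the always-free tier.
seconds_per_month = 30 * 24 * 60 * 60       # 2,592,000
monthly_writes = 25 * seconds_per_month     # 25 WCU -> ~64.8 million writes
monthly_reads = 25 * 2 * seconds_per_month  # 25 RCU -> ~129.6 million reads
print(monthly_writes + monthly_reads)       # ~194.4 million, i.e. ~200M
```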
To get started with a new account, and to take advantage of the AWS Free Tier, please visit the following URL: https://aws.amazon.com/free
Important note
As stated, some services in the AWS Free Tier are only valid for 12 months from sign-up, so ensure you fully understand what is and what is not covered during that period to avoid any excess charges during your usage period or at the end of it.
If creating an AWS account isn’t something you’re ready to do right now, but you still wish to carry on working with DynamoDB, then skip ahead to the Using AWS Lambda and installing DynamoDB local section, where you will find out how to download a version of DynamoDB that you can run locally instead.
Now that you have a new account (or an existing one to use), let’s log in to the console. I will assume that you have followed the login process and are sitting at the AWS Management Console landing page:
Figure 2.1 – Overview of the AWS Management Console landing page
When you first log in, you may be directed to the US East (N. Virginia) region. You are free to continue working in this region; however, I prefer to work in the region geographically closest to me (London). You can select your closest region from the drop-down list.
Feel free to explore the console in more depth and see what is available to you. When you’re ready, simply enter DynamoDB into the main search box at the top of the page as shown in the following figure, and click on the title under the Services heading to be taken to the DynamoDB console:
Figure 2.2 – DynamoDB in the Services search bar
Following this, you should be directed to the DynamoDB console. Next, let us use the console to create a DynamoDB table and play around.
As we’re not yet building an application that connects to DynamoDB through the SDKs, but merely want to check out how DynamoDB works at a basic level and how we can enable and work with its supported features, this can all be done from within the DynamoDB console. From here, we can issue requests into DynamoDB and see the responses alongside being able to add, edit, and delete items in our table(s).
In this part of the chapter, we will start by creating a new table and then we’ll add some items to it. We won’t be issuing any advanced queries here, but we will be able to see how to quickly manage data in DynamoDB straight from the AWS Management Console.
Let’s take our first jump into the world of DynamoDB and create our first table:
To start with, we’re going to leave the majority of options in their default state when creating our table. Don’t worry if you don’t fully understand what each term or option means; we will cover these in depth later:

Figure 2.3 – DynamoDB console Tables landing page
Click on Create table and you will be taken to a relatively simple screen (Figure 2.4):

Figure 2.4 – Creating a new DynamoDB table
Here, we only need to enter information into two fields to create our DynamoDB table, Table name and Partition key. We are going to leave Sort key empty (it is optional anyway), and we will use Default settings for the rest of this table. For now, we do not need to add any tags to the table.
Once you have entered the table name (I’m using DefinitiveGuide01) and partition key value (I am using id), click on Create table. DynamoDB will now go ahead and make this table ready for you and will provision the necessary resources behind the scenes so that you are ready to go as soon as the table status changes from Creating to Active (this typically only takes a few seconds). The following figure shows the table in the Creating status:
Figure 2.5 – Table in Creating status
As previously mentioned, the table should transition from the Creating to Active status within a few seconds. The following figure shows my table in the Active status and is ready to use:
Figure 2.6 – Table in Active status
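For comparison, the following boto3 sketch creates roughly the same table programmatically. It only approximates the console flow—the console’s Default settings also configure extras such as auto scaling, which this sketch omits—so treat it as an illustration rather than an exact equivalent.

```python
import boto3

client = boto3.client("dynamodb")

client.create_table(
    TableName="DefinitiveGuide01",
    AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],  # partition key
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)

# Block until the table moves from CREATING to ACTIVE
client.get_waiter("table_exists").wait(TableName="DefinitiveGuide01")
```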
I’ve kept this initial table simple for now, including the partition key value. We will explore more advanced concepts in naming and how to use the sort key to our advantage in the next part of this book, Core Data Modeling.
Once the table is ready, the name of the table changes to a clickable link. Click on it and you will see a new screen that shows information about the table, including details such as Capacity Mode along with a section titled Items summary, which shows details about your table’s data.
With the information screen about our newly created table open, clicking on the View items button at the top-right of the screen takes us to our item detail screen (Figure 2.7). At this point, our table is empty and you should see the The query did not return any results message onscreen, as shown in the following figure:
Figure 2.7 – Empty table results screen
This message is the result of the console running a scan on the table. A scan simply calls the DynamoDB API and effectively says give me all the items that are stored in this table. This can be useful at times, but as we will discover later, it is not always beneficial and can often end up costing a fair amount of RCUs when our table grows. For what we are doing presently in the console, it is fine and will not exceed our capacity throughput.
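Under the hood, the console is doing something like the following boto3 sketch, which also shows the pagination a real scan needs to handle once results exceed 1 MB per call.

```python
import boto3

table = boto3.resource("dynamodb").Table("DefinitiveGuide01")

# Scan reads every item in the table, a page (up to 1 MB) at a time,
# which is why it consumes more and more RCUs as the table grows.
items = []
response = table.scan()
items.extend(response["Items"])
while "LastEvaluatedKey" in response:
    response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
    items.extend(response["Items"])
print(f"{len(items)} item(s) in the table")
```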
Our table is now ready, and we can start creating items in it. To summarize, we logged into the AWS Management Console and navigated our way through to the DynamoDB-specific console. From there, we created our first DynamoDB table and saw how the DynamoDB console can quickly show us the contents of our table. Next, we shall navigate through the console further and interact with our new DynamoDB table.
You have seen how easy it is to create a table in DynamoDB. Now, we are going to populate that table with some items (an item in NoSQL is what we would call a row in a relational database), and I am pleased to tell you that it is just as easy.
On the right of the Items returned section, click on the Create item button and you will be taken to a screen that lets you create a new DynamoDB item (Figure 2.8). The first thing to notice here is that the only required attribute to create an item is the partition key. DynamoDB does not need anything further. Of course, not adding other attributes (in relational speak, an attribute would otherwise be known as a column) will mean a fairly empty and questionable database. For this example, we will add one attribute or column.
An important part to remember about DynamoDB is that no matter what, we must always have a partition key value—this is what forms the core concept of the key-value principle. It is this key that gets supplied to an internal hashing function, and from there, DynamoDB determines which partition (or storage node) the item sits on. Whenever you perform a get_item or query operation, you have to supply the partition key, regardless of whether you’re using a sort key or not. The only time you do not need to know the item’s partition key value is when you are performing a Scan operation on the table. So, it is vital that the partition key forms a solid and useful value and structure.
Sticking with keeping it simple, I have set id, the partition key, to a value of foo and added a String attribute (found under the Add new attribute drop-down menu) named title, with a value of bar:
Figure 2.8 – Creating a new DynamoDB item
Once that is entered, click on Create item and that’s it—your newly created item is now stored in DynamoDB: in the cloud, on three storage nodes, across multiple Availability Zones, and ready for retrieval with millisecond latency. How simple yet fantastic is that?!
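The same item creation can also be done with a couple of SDK calls. Here is a boto3 sketch that writes the item used above and reads it straight back.

```python
import boto3

table = boto3.resource("dynamodb").Table("DefinitiveGuide01")

# Write the item; only the partition key is mandatory, and any other
# attributes are optional and schema-less.
table.put_item(Item={"id": "foo", "title": "bar"})

# Read it back by its partition key.
response = table.get_item(Key={"id": "foo"})
print(response["Item"])  # {'id': 'foo', 'title': 'bar'}
```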
Our Items returned list has now been updated (to 1) and shows the newly created item along with its attribute names and values. The following figure shows this:
Figure 2.9 – A new DynamoDB item has been created
To edit this item, we can either click on the partition key value (shown as a blue hyperlink) or we can select the checkbox next to it, then from the Actions drop-down menu, select Edit