Database Design and Modeling with Google Cloud - Abirami Sukumaran - E-Book

Database Design and Modeling with Google Cloud E-Book

Abirami Sukumaran

0,0
24,99 €

oder
-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.
Beschreibung

In the age of lightning-speed delivery, customers want everything developed, built, and delivered at high speed and at scale. Knowledge, design, and choice of database is critical in that journey, but there is no one-size-fits-all solution. This book serves as a comprehensive and practical guide for data professionals who want to design and model their databases efficiently.
The book begins by taking you through business, technical, and design considerations for databases. Next, it takes you on an immersive structured database deep dive for both transactional and analytical real-world use cases using Cloud SQL, Spanner, and BigQuery. As you progress, you’ll explore semi-structured and unstructured database considerations with practical applications using Firestore, cloud storage, and more. You’ll also find insights into operational considerations for databases and the database design journey for taking your data to AI with Vertex AI APIs and generative AI examples.
By the end of this book, you will be well-versed in designing and modeling data and databases for your applications using Google Cloud.

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB

Veröffentlichungsjahr: 2023

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



Database Design and Modeling with Google Cloud

Learn database design and development to take your data to applications, analytics, and AI

Abirami Sukumaran

BIRMINGHAM—MUMBAI

Database Design and Modeling with Google Cloud

Copyright © 2023 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Reshma Raman

Publishing Product Manager: Heramb Bhavsar

Book Project Manager: Kirti Pisat

Content Development Editor: Manikandan Kurup, Joseph Sunil

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Proofreader: Safis Editing

Indexer: Subalakshmi Govindhan

Production Designer: Jyoti Kadam

Senior DevRel Marketing Coordinator: Nivedita Singh

First published: December 2023

Production reference: 1121223

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-80461-145-6

www.packtpub.com

To my mother, Janaki, for her unconditional love To the memory of my father, Sukumaran, for inspiring me to be who I am To the memory of my grandparents Lakshmi, Srinivasan, and family to whom I owe everything To my beloved country and friends for nurturing my soul And to God for giving me resilience

Forewords

Databases are essential to the modern enterprise, as we all know. They store and manage our data, which is the lifeblood of our businesses. However, designing and modeling a database is not straightforward. It requires a deep understanding of the business, technology, and data.

Database Design and Modeling with Google Cloud, by Abirami Sukumaran, provides a comprehensive and practical guide to cloud database design and modeling. It serves as a compass for data professionals aiming to design and model databases that not only meet but exceed the demands of the modern, fast-paced digital era.

Abirami is a seasoned database practitioner and leader with almost two decades of experience in the databases and analytics field. She has worked with a wide range of businesses, from startups to large enterprises, and has helped them design and implement successful database applications. In this book, Abirami shares her wealth of experience by providing an organized approach to database design, and she shows you how to use Google Cloud to build scalable, reliable, and secure database applications.

One of the book’s strengths lies in its inclusivity and relevance, addressing structured, semi-structured, and unstructured data considerations. The practical insights coupled with examples showcase the versatility of Google Cloud services in handling diverse data types, empowering readers to make informed decisions in their database design journey.

The hands-on approach is evident as the book leverages Google Cloud services, such as Cloud SQL, Spanner, BigQuery, Firestore, and Cloud Storage, to illustrate concepts in real-world applications. It goes beyond the traditional boundaries of database design, extending into the integration of databases with artificial intelligence through Vertex AI APIs and generative AI examples. This forward-thinking approach positions you to smoothly transition from data management to leveraging AI in their applications.

Whether you are a database novice or a seasoned professional, this book will help you take your database skills to the next level. I highly recommend this book to anyone who wants to learn more about cloud database design and modeling. It is a valuable resource that will help you design your databases thoughtfully and improve the performance of your applications centered around data.

Priyanka Vergadia

Head of North America Cloud Developer Advocacy

Google

It is an honor to write the foreword for this book, Database Design and Modeling with Google Cloud, by Abirami Sukumaran. She has poured her heart and soul into this work, a testament to her passion and dedication to empowering developers through technology.

In the age of AI and cloud, data is the new currency. A well-designed and futuristically modeled database can provide a solid foundation for speed, agility, efficiency, and scalability. However, navigating this rapidly evolving data and technology landscape can be complicated and daunting. Abirami Sukumaran’s book emerges as a lighthouse, guiding practitioners through a holistic and pragmatic approach to designing databases that meet the demands of the digital age and empower organizations to thrive in a data-centric environment.

This book is a journey of discovery, learning, and growth. The journey commences with a strategic exploration of business and technical considerations that set the stage for effective database design. It equips you with the ability to ask the right questions, make informed decisions, and navigate the endless possibilities of cloud-based databases. It leads you through an immersive, hands-on experience, navigating the complexities of different data formats and real-world use cases. You will particularly love how the flow effortlessly takes them into designing databases for analytical and AI-centric applications. The book positions you on a path to creating intelligent, data-driven solutions.

Congratulations on getting started on this journey! This book is a strategic companion and hands-on guide for engineers, architects, and developers practicing the art and science of cloud database design.

Bagirathi Narayanan

SVP, Chief Architect, Teladoc Health

Forbes Member Leader

Contributors

About the author

Abirami Sukumaran is a lead developer advocate at Google, focusing on databases and data to AI journey with Google Cloud. She has over 17 years of experience in data management, data governance, and analytics across several industries in various roles from engineering to leadership, and has 3 patents filed in the data area. She believes in driving social and business impact with technology.

She is also an international keynote, tech panel, and motivational speaker, including key events like Google I/O, Cloud NEXT, MLDS, GDS, Huddle Global, India Startup Festival, Women Developers Academy, and so on. She founded Code Vipassana, an award-winning, non-profit, tech-enablement program powered by Google and she run with the support of Google Developer Communities GDG Cloud Kochi, Chennai, Mumbai, and a few developer leads. She is pursuing her doctoral research in business administration with artificial intelligence, is a certified Yoga instructor, practitioner, and an Indian above everything else.

I want to thank Packt and its team along with the tech reviewers for their effort in ensuring that the content is wholesome. My sincere thanks to Martin Paynter (Customer Engineering Manager, Google) and Chris Taylor (VP, Engineering Fellow at Google) for going out of their way to complete the review and approval process smoothly.

About the reviewers

Alfons Muñoz, a dedicated technophile and cloud computing expert, is the Communities Director at C2C, the Google Cloud Customer community. His role involves enriching the GCP professional network through collaboration with Googlers and customers.

Alfons started his Google journey as a partner, specializing in Google Workspace solutions for diverse companies. He holds three Google Cloud certifications, including Google Workspace administrator and Professional Cloud Architect, backed by a B.Sc. in Computer Science. Renowned in the C2C community, Alfons actively engages in global events and conferences, sharing valuable insights into the dynamic world of Google Cloud.

Bitthal Khaitan is a Senior Cloud Engineer at CVS Health, USA. CVS Health is a prominent Fortune #4 healthcare solutions company. With more than 13 years of industry experience, Bitthal has played a pivotal role in driving cloud data-driven initiatives using a versatile technology stack encompassing Google Cloud, Teradata, Big Data, Hadoop, Spark, Scala, and Python. Before his tenure at CVS Health, Bitthal spent eight years at Tata Consultancy Services, Asia's largest IT company as a Data Engineer where he collaborated with Fortune 500 companies like Walmart, Cigna Healthcare, DaVita Kidney Care, and Kaiser Foundation Hospitals on critical projects involving enterprise data warehousing, big data lake development, and cloud migration.

I would like to thank my spouse and my 2 little daughters for being an amazing support system!

Table of Contents

Preface

Part 1: Database Model: Business and Technical Design Considerations

1

Data, Databases, and Design

Data

Databases

A teeny-tiny bit about the evolution of databases

DBMS

Database design

Data modeling

Database modeling

Considerations for a good database design

Business aspect

Ingestion

Technical aspect

Choosing the right database

Relational database

NoSQL database

Summary

2

Handling Data on the Cloud

Types of cloud services

Use case categories

The benefits of cloud computing

Data applications on cloud

Storage

Backup and disaster recovery

Analytics and insights

Application development

User experience and personalization

Managed, unmanaged, and database as a service

Managed databases

Unmanaged databases

Database as a service

Cloud database considerations

A quick follow-up

Summary

Part 2: Structured Data

3

Database Modeling for Structured Data

Structured data

Rows and columns

Transactional applications

Analytical applications

Using an RDBMS for structured data

Atomicity

Consistency

Isolation

Durability

Considerations for your RDBMS

Structured query language

Sample SQL queries

Summary

4

Setting Up a Fully Managed RDBMS

Fully managed databases

Fully managed RDBMS

Cloud SQL

Setting up and configuring a fully managed RDBMS

Creating a Cloud SQL instance for MySQL

Connecting to the instance

Creating a database

Creating a table

Inserting values

Querying values

Creating an application with the Cloud database

Configuring the Cloud Functions service account

Creating a Cloud Function

Operational aspects of cloud relational databases

Migration

Monitoring

Query Insights

Security

Summary

5

Designing an Analytical Data Warehouse

Understanding how data warehouses are different from databases

Significance of ETL in data warehouse

Learning about BigQuery

Features of BigQuery

Setting up and configuring a fully managed data warehouse with BigQuery

Enabling BigQuery from the console

Creating a BigQuery dataset

Using an existing public dataset

Creating a table in the dataset

Performing simple analytics

Summary of operational aspects and design considerations

Summary

Part 3: Semi-Structured, Unstructured Data, and NoSQL Design

6

Designing for Semi-Structured Data

Semi-structured data

Pros and cons of semi-structured data

Use cases of semi-structured data

NoSQL for semi-structured data

Data structures supported by NoSQL databases

Firestore and its features

Setting up Firestore

Collection

Document

Subcollection

Security

Client libraries and APIs

Indexing

Single-field index

Composite index

Collection group query

Data model considerations

Hierarchical format

Denormalized format

Easy querying with RunQuery API

API endpoint and method

The parent parameter

JSON body format

StructuredQuery

The from clause

The where clause

Putting the pieces together

Implementing RunQuery API programmatically

Summary

7

Unstructured Data Management

Use cases

Processing unstructured data

Storage options in Google Cloud

Cloud Storage, classes, and features

Unstructured data storage with BigQuery

External sources

External connections

Unstructured data analytics with BigQuery

Summary

Part 4: DevOps and Databases

8

DevOps and Databases

Upgrades, updates, and patching

Security, privacy, and encryption

Replication and availability

Scalability

Performance and throughput

SLA, SLI, and SLO

Data federation

Continuous integration/continuous delivery (CI/CD)

Migrating to cloud databases

Database Migration Service

System, query, and performance insights

Summary

Part 5: Data to AI

9

Data to AI – Modeling Your Databases for Analytics and ML

Modeling considerations for analytics, AI, and ML

Data to AI

Google Cloud ETL services

Google Cloud Dataflow at a glance

Real-world use cases for Google Cloud Dataflow

Step-by-step guide to Google Cloud Dataflow

Taking your data to AI

Summary

10

Looking Ahead – Designing for LLM Applications

Capturing the evolution of LLMs

Getting started with LLMs

Understanding the underlying principles of LLMs

Comparing real-world applications of LLMs and traditional analytics

Understanding the differences in data modeling for traditional analytics and LLMs

Data model design considerations for applications that use LLMs

Learning about data modeling principles and techniques

Ethical and responsible practices

Hands-on time – building an LLM application

Step 1 – create a table

Step 2 – insert data into the table

Step 3 – create an external connection for BigQuery to access the Vertex AI model

Step 4 – grant permissions to the service account to access the Vertex AI service

Step 5 – create the remote model in BigQuery

Step 6 – query the dataset

Step 7 – generate text (create an LLM application) using only SQL

Vector databases

Summary

Onward and upward!

Index

Other Books You May Enjoy

Preface

This is the age of lightning-speed delivery. Whether it is the physical world of consumer products or the technological world that enables digitization, we want everything developed, built, and delivered at speed and at scale. Yes, I am talking about the software products and applications that are powering several million businesses these days. We all know that the knowledge, design, and choice of the database are critical in that journey. Is there a cookie-cutter template solution that gets you there quickly and correctly? The answer is no. But there are ways to get your idea into a product relatively faster and efficiently with the right design, and technical and business knowledge of your data. That is what I have addressed in this book.

This book focuses on taking the readers on a journey of taking idea through product, with a structured approach to guide you into asking the right questions and knowledge about your own data and business. It will also have a deep-dive discussion on each type and structure of data, and an immersive walk along the entire lifecycle of data in your application. You will also learn about database design best practices, considerations for data security, privacy and compliance with real-world examples and solutions with Google Cloud Database options for each category.

Who this book is for

This book is targeted towards database developers, data engineers, data architects, and data analysts who are looking to build solutions and insights using databases and Google Cloud services tailored to their organizational needs.

What this book covers

Chapter 1, Data, Databases, and Design, will help us explore all the basics related to data, database, and modeling. You will learn all the general considerations you need to have while working with them.

Chapter 2, Handling Data on the Cloud, will help us dive into the details of cloud computing, and its different types, and explore the use cases and applications. By the end of this chapter, you’ll have a clear understanding of cloud computing, its types, use cases, benefits, applications, and considerations.

Chapter 3, Database Modeling for Structured Data, discusses structured data, its properties, types, use cases, key considerations, data modeling best practices, SQL basics, and some hands-on data modeling and query experiments.

Chapter 4, Setting up a Fully Managed RDBMS, takes the structured database design to hands-on learning with a fully managed cloud relational database. You will learn how to set up and configure your instance, how to create databases and objects in the database, and how to programmatically connect to the database and access data.

Chapter 5, Designing an Analytical Data Warehouse, will move on to designing for analytical data and take it to hands-on learning with a fully managed cloud data warehouse. You will learn how to set up and configure, create datasets and objects, query, and perform sample analytics on the data.

Chapter 6, Designing for Semi-structured Data, will show you the fundamentals of semi-structured data with examples, real-world use cases, characteristics of semi-structured data, design considerations, and components of a document database.

Chapter 7, Unstructured Data Management, will show you the fundamentals of unstructured data with examples, real-world use cases, how to store, manage, and perform analytics and with unstructured data.

Chapter 8, DevOps and Databases, discusses DevOps and operational attributes of database management like upgrades, security, monitoring, scalability, performance, SLA and SLOs, data federation, CI/CD, migration, and so on. We will also discuss how Google Cloud simplifies the design decisions for these operational considerations.

Chapter 9, Data to AI – Modeling Your Databases for Analytics and ML, explores some key considerations and best practices while designing for analytics, ML, and AI with cloud databases, covering topics like modeling considerations for analytics and ML, analytics, ETL, and the journey of data to AI.

Chapter 10, Looking Ahead – Designing for LLM Applications, will set the stage for data modeling for LLM applications by covering the evolution and basics of LLM, the difference between ML and generative AI applications, the ethical and responsible practices and considerations, and finally the real-world use cases and hands-on implementation to extend your database application to include LLM insights.

To get the most out of this book

Basic understanding of data journey, types, databases, role in the product development cycle, cloud computing, and analytics basics would be useful in getting started with database design and modeling concepts and hands-on implementations.

Software/hardware covered in the book

Operating system requirements

SQL (BigQuery)

Windows, macOS, or Linux

Java 11

Google Cloud services (on browser): Spanner, Cloud SQL, Firestore, BigQuery, Cloud Storage, Cloud Functions, Cloud Run, Cloud Shell

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

If you manage to complete all the exercises in the book, head over to https://codevipassana.dev for more hands-on resources and references.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Database-Design-and-Modeling-with-Google-Cloud. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Enter bigquery-public-data in the Type to Search field again and once it appears, Expandthis project.”

A block of code is set as follows:

SELECT country_name, new_confirmed, date, cumulative_confirmed, population FROM `bigquery-public-data.covid19_open_data.covid19_open_data` LIMIT 10

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

SELECT country_name, MAX(cumulative_confirmed) AS TOTAL_CONFIRMED_COVID19 FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE cumulative_confirmed > 0 GROUP BY country_name ORDER BY MAX(cumulative_confirmed) DESC LIMIT 5;

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “To start, click Activate Cloud Shell from Google Cloud Console in the top-right corner.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Database Design and Modeling with Google Cloud, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781804611456

Submit your proof of purchaseThat’s it! We’ll send your free PDF and other benefits to your email directly

Part 1:Database Model: Business and Technical Design Considerations

This part will help you understand the basics of data, database, data modeling, and design. You will learn to use an objective approach to asking the right Business Design and Technical Design Questions. You will also explore a sample application to demonstrate the business and technical aspects.

This part has the following chapters:

Chapter 1, Data, Databases, and DesignChapter 2, Handling Data on the Cloud

1

Data, Databases, and Design

Data is the foundation of almost all web and mobile applications, and databases are required to handle the essence of your business. There are several database options to choose from, depending on the nature of your business, as well as its type, format, and structure of data, and other design considerations and dependencies of your business and data, on the cloud or otherwise.

Starting with the foundations of databases and design, this book covers fundamentals of database design, data's types, structure, and applications on the cloud, and designing the cloud database model for your application. This will involve us demonstrating database modeling with real-world use cases and examples and integration with other layers of the tech stack like applications, runtimes, analytics and other services such as monitoring, security, access control, analytics, machine learning, and generative AI. The cloud databases and services we will be using to exercise these concepts are all from Google Cloud.

By the end of this book, you'll have learned the fundamentals of cloud database design, taken the data to AI design journey and will have experimented with hands-on applications using Cloud databases and storage options like Cloud SQL, Spanner, Firestore, BigQuery and Cloud Storage. In summary, this book prepares you for designing the data, databases, and data to AI stacks of your product or application with practical examples and hands-on design considerations.

In this chapter, you’ll learn about the considerations that are critical in choosing a database at different stages in the life cycle of your data. We will cover the different stages, types, formats, structures and categories of data, types of databases, and the business and technical aspects of database design.

We’ll cover the following topics in this chapter:

Basics of data modeling and designTypes of data and applicationsBusiness aspects of dataTechnical considerationsTypes of databasesChoosing the right database

Data

Raw data can be omnipresent, indefinite, and ubiquitous. Yes – I wake up every day to the smell of freshly ground cardamom as if it were my alarm (my alarm, on the other hand, fails the one job it was designated to do). Anyway, I get ready and drive to work using Google Maps for directions (I like to keep an eye out for traffic, so I choose my routes wisely), log the day’s ongoing work for my reference later, check the lunch menu at work in the app, connect with colleagues in Chat (and in person sometimes), work out with my favorite YouTube videos, order food/groceries online, post and connect via social media, and play sleep stories from my meditative collection as I get through and sway my day effortlessly into the non-REM (Rapid Eye Movement, the fourth stage of the sleep cycle) part of my sleep cycle while my subconscious brain starts to pick up where I left off. This cycle repeats as I wake up the next day to the smell of tea and freshly ground cardamom, wondering how nice it would be with some buttery soft sliced bread!

Hmm... if only it weren’t for my gluten allergy.

Anyway, the one thing that is half as old as sliced bread but twice as good as that is databases! Imagine the volume and variety of data I share, consume, and involuntarily indulge in throughout my life while undertaking the routine that just I shared (and perhaps I need to get a life)! What goes into accommodating those activities and what resources do I need to accomplish them?

Databases

That’s right! The set of application programs that store, access, manage, and update this data while dealing with structure, recovery, security, privacy, concurrency, and more, and attribute comprehensively to getting the day in the life of a modern-day human done right, is the database. It is also called the database management systemor DBMS.

A teeny-tiny bit about the evolution of databases

Long before the term data was even coined, humans used the Ishango bone (what is assumed to be a notched baboon bone) as a tally stick speculated to have some mathematical engravings or even something of astrological relevance. Dating to 20,000 years before the present, it is regarded as the oldest mathematical “database” (logging numerical information for future use) tool for humankind, with the possible exception of the approximately 40,000-year-old Lebombo bone from southern Africa. Then, we have Acharya Pingala from the third to second century BC who first described the binary number system that lives on today – forming the foundations of any computing there is, including database systems. Slowly and steadily, we progressed into advanced computing, databases, and technology in general with calculators, computers, automation, wartime wonders, relational database management, the internet, Google Search (yes, it has come to be identified as an important event in the evolution of technology), artificial intelligence (AI), machine learning (ML), and big data.

Isn’t it fascinating how everything important dates back to monkeys or monkey bones, just like computers and homo sapiens themselves?

DBMS

Exactly 52 years ago, E.F. Codd, the father of DBMSs, propounded and formalized the 12 commandments, of which there are 13 (starting from 0. I know, right?), that make up a DBMS. You can read about it here: https://en.wikipedia.org/wiki/Codd%27s_12_rules. We have evolved since the 1960s, when we used one database to store and secure information, to modern times, where we use one database per stage in the data life cycle – that is, one database per data stage, type, and structure in most cases. We will dive deeper into each of these categories throughout this book with examples and exercises, so don’t panic if this is a jargon overdose at this point.

In this chapter, we are going to discuss the business attributes, technical aspects, design questions, and considerations to keep in mind while designing a database model.

Database design

Database design or modeling refers to the activity of designing a database and modeling the data and structure that stores, transforms, manages, and extracts data. Why is it important to design a database so elaborately, you ask? Here are some reasons:

Data is any organization’s most valuable asset and to leverage it to derive the most benefit for the company, we need to ensure it is thoroughly planned and thought throughAs the current generation databases are easy to set up and use, the most common side effect is that business users tend to dive into creating databases with flawed and much-simplified structures without understanding the components of designWith poorly designed databases, the systems end up having inaccurate results and difficult-to-trace errors, which leads to inconsistent decision-making in the business

If that does not serve your purpose, then consider this: the database is the most critical lifeline of your entire technical architecture as it runs across and connects all the components of your design. It starts with user interfaces, operational systems, messaging services, monitoring systems, analytics solutions, AI/ML applications, and even executive dashboards in the visualization layer that you use for business decision-making. If that lifeline is not well-thought-out, then you are starting your business and technology journey on a path full of surprises, twists, and turns. I understand those are good on paper but very inconvenient when it comes to real-world applications. Imagine having to redesign the foundational component of your understanding 6 months into rolling out your product!

Data modeling

Data modeling is the process of organizing the elements of your data and establishing their relationship with each other and external systems. Within the chosen database environment, a data model represents the structure, attributes (characteristics), relationships, transformations, business rules, and exceptions. There are a lot of data modeling frameworks and tools available in the market, so pick the one that works with your database model, structure of data, business rules, operations, and components of the data.

Database modeling

Database modeling refers to the process of determining the choice and logical structure of your database and designing the way you store, organize, and transform data. I would like to think of a database model as something that needs to be well-vetted out, so that should be the case with the data model. But in a sense, a data model can go through many rounds of trial and error and can evolve as you build compared to a database model, which needs to be designed for scale and also to acclimatize the data model it contains.

Several questions and considerations go into the design and model of a database. That is exactly what we will focus on as a framework in the rest of this chapter.

Considerations for a good database design

Why is it important to take a lot of possible scenarios and probabilities into account while making design decisions? Let me tell you a story that I read in a book called How Not to Be Wrong, by Jordan Ellenberg, an interesting read that talks about real-life applications of mathematics.

Like many other things, our story dates back to World War II, when things happened frequently. In this case, the US fighter planes entered combat with loaded machine guns. They didn’t want their fighter planes to get shot down by the enemy fighters. So, a statistical research group (SRG) of extraordinary statisticians was organized to aid the war effort, where equations were developed (and not explosives) to find a solution. Here’s the question that was posed to them: “You don’t want the fighter planes to get shot down, so you need to armor them, but armor makes the plane heavier, and heavier planes are less maneuverable and consume more fuel. Not too much, not too little, but we need to armor the planes somewhere at the optimum level.”

The military gave some useful data to the SRG: when American planes came back from a mission, they were covered in bullet holes. However, the damage wasn’t uniformly distributed across the aircraft. There were more bullet holes in the fuselage than in the engines. The bullet-holes-per-square-foot distribution was as follows:

1.11 in the engine1.73 in the fuselage1.55 in the fuel system1.8 on the rest of the plane

The officers saw an opportunity to quickly conclude that the armor concentration needed to be in the area with the most bullet holes per square foot. One specific answer from the smartest statistician in that room contradicted this popular opinion, yet it was very insightful. Abraham Wald stated that the armor doesn’t go where the bullet holes are the most concentrated, but where they aren’t. Wald’s insight was simply to ask: where are the missing holes? The missing holes were the ones that would have been all over the engine casing if the bullet hole distribution had been spread equally all over the plane. Wald was sure that the missing holes were on the planes that had been shot down. The officers simply observed the planes that had returned from the mission but the ones that got hit in the engine had not returned at all!

I am sure by now you have already connected the dots as to why it is important to assess all the “missing holes” before you settle on the choice of database and the database model – it is because you don’t want to armor the wrong part of your business and end up spending more effort in the long run. If you’re wondering whether there is a quick and dirty solution, the short answer is no. However, a good design comes with a set of considerations around business attributes and technical aspects while designing the database model.

Business aspect

Business requirements are the starting point for your application and also for choosing your database system. There are four stages in the life cycle of data in its business application that help determine the choice of database system:

Data ingestionStorageProcessVisualize

The following diagram represents the attributes in the four stages of data and the categories of questions in each stage in the life cycle of your data:

Figure 1.1 – Representation of the four stages of data and the categories of questions in each stage

Let’s look at some of these attributes in detail. Some of them are in the business attributes category, while others are technical.

Ingestion

This is the first stage in the data life cycle and it is all about acquiring (bringing in) data from different sources in one place into your system. In this stage, the questions that arise are bucketed into three categories:

What type of data are you bringing in?What is the purpose of this data?What is the structure of your data?