A Definitive Guide to Apache ShardingSphere E-Book

Trista Pan

Publisher: Packt Publishing
Category: Technical literature
Language: English

Description

Apache ShardingSphere is a new open source ecosystem for distributed data infrastructures based on pluggability and cloud-native principles that helps enhance your database.
This book begins with a quick overview of the main challenges faced by database management systems (DBMSs) in production environments, followed by a brief introduction to the software's kernel concept. After that, using real-world examples of distributed database solutions, elastic scaling, DistSQL, synthetic monitoring, database gateways, and SQL authority and user authentication, you’ll fully understand ShardingSphere's architectural components, how they’re configured and can be plugged into your existing infrastructure, and how to manage your data and applications. You’ll also explore ShardingSphere-JDBC and ShardingSphere-Proxy, the ecosystem’s clients, and how they can work either concurrently or independently to address your needs. You’ll then learn how to customize the plugin platform to define personalized user strategies and manage multiple configurations seamlessly. Finally, the book enables you to get up and running with functional and performance tests for all scenarios.
By the end of this book, you’ll be able to build and deploy a customized version of ShardingSphere, addressing the key pain points encountered in your data management infrastructure.

Details

Formats: EPUB, MOBI

Pages: 426

Year of publication: 2022

A Definitive Guide to Apache ShardingSphere

Transform any DBMS into a distributed database with sharding, scaling, encryption features, and more

Trista Pan

Zhang Liang

Yacine Si Tayeb, PhD

BIRMINGHAM—MUMBAI

A Definitive Guide to Apache ShardingSphere

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Devika Battike

Senior Editors: Roshan Kumar and Tazeen Shaikh

Content Development Editor: Shreya Moharir

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Rekha Nair

Production Designer: Shyam Sundar Korumilli

Marketing Coordinator: Nivedita Singh

First published: July 2022

Production reference: 1300622

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80323-942-2

www.packt.com

Contributors

About the authors

Trista Pan is the co-founder and CTO of SphereEx, an Apache Member and Incubator Mentor, Apache ShardingSphere PMC, AWS Data Hero, China Mulan open source community mentor, and Tencent Cloud TVP. Trista used to be responsible for the design and development of the intelligent database platform of JD Digital Science and Technology. She now focuses on the distributed database and middleware ecosystem, and the open source community. She was the recipient of the 2020 China Open-Source Pioneer, 2021 OSCAR Top Open Source Pioneer, and 2021 CSDN IT Leading Personality awards. Her paper, Apache ShardingSphere: A Holistic and Pluggable Platform for Data Sharding, was published at ICDE in 2022.

Zhang Liang is the founder and CEO of SphereEx, an Apache Member, the founder of Apache ShardingSphere ElasticJob, the PMC Chair, Tencent Cloud TVP, and Microsoft MVP. Zhang is an open source enthusiast and thought leader in Java-based distributed architectures. Currently, he focuses on turning Apache ShardingSphere into an industry-leading distributed database solution. His 2019 book, Future Architecture: From Service to Cloud Native, was well received by both critics and the community. His 2022 paper, Apache ShardingSphere: A Holistic and Pluggable Platform for Data Sharding, was published at ICDE. Zhang was named one of the Top Ten Distributed Database Pioneers of 2021 by CSDN and one of the 33 China Open Source Pioneers of 2021 by SegmentFault.

Yacine Si Tayeb, PhD, is the Head of International Operations at SphereEx and one of the core contributors and community builders at Apache ShardingSphere. Passionate about technology and innovation, Yacine moved to Beijing to pursue his PhD in enterprise management and was in awe of the local startup and tech scene. His career path and research have so far been shaped by opportunities at the intersection of technology and business. He is a published scholar, and his passion for technology led him to research the impact of corporate governance and financial performance on corporate innovation outcomes, and to take a keen interest in the development of the Apache ShardingSphere big data ecosystem and open source community building.

About the reviewers

Longtao Jiang is an Apache ShardingSphere committer. He has been active in the community for a long time and has contributed many useful functions to ShardingSphere. Before becoming a committer, he was also a skilled user of ShardingSphere, applying ShardingSphere to multiple financial-level data sharding scenarios. Now, Longtao Jiang is mainly responsible for DistSQL-related innovations and practices in the ShardingSphere community.

Zhengqiang Duan is a committer of the Apache ShardingSphere community and is also a senior middleware engineer at SphereEx. He has been in contact with the Apache ShardingSphere project since 2018 and has led data sharding projects with massive amounts of data within the company and has rich practical experience. He loves open source very much and is willing to share and communicate. He is currently focusing on the development of Apache ShardingSphere kernel modules and strives to provide more powerful and easy-to-use features for the Apache ShardingSphere community.

Nianjun Sun has 15 years of coding experience as a Java developer and is interested in cloud-native and distributed database-related technology. He used to be the architect of Bizseer and was responsible for the design and development of their AIOps platform. He currently works in the core team of Apache ShardingSphere, which founded and built distributed data infrastructures while delivering a SaaS experience through the cloud. He published the paper, Apache ShardingSphere: A Holistic and Pluggable Platform for Data Sharding, at ICDE in May 2022.

Hongsheng Zhong is an Apache ShardingSphere committer and is passionate about open source and database ecosystems. Currently, he works for SphereEx as a senior Java engineer, focusing on the development of the Apache ShardingSphere database middleware ecosystem and open source community building. Previously, he worked on R&D of cloud database products at JD Technology, and has experience of multiple replicas on Raft.

Preface

Section 1: Introducing Apache ShardingSphere

Chapter 1: The Evolution of DBMSs, DBAs, and the Role of Apache ShardingSphere

The evolution of DBMSs

Industry pain points

The new industry needs are creating new opportunities for DBMSs

The evolving role of the DBAs

Overwhelming traffic load increase

Microservice architecture for frontend services

Cloud-native disrupts delivery and stale deployment practices

The opportunities and future directions for DBMSs

Database safety

SQL, NoSQL, and NewSQL

New architecture

Embracing a transparent sharding middleware

Database-as-a-Service

AI database management platform

Database migration

Understanding Apache ShardingSphere

Connect

Enhance

Pluggable

Summary

Chapter 2: Architectural Overview of Apache ShardingSphere

What is a distributed database architecture?

The SQL-based load-balancing layer

Sidecar improves performance and availability

Database Mesh innovates the cloud-native database development path

Apache ShardingSphere and Database Mesh

Solving database pain points with Database Plus

An architecture inspired by the Database Plus concept

Feature architecture

Introduction to the feature layer

Deployment architecture

Plugin platform

Microkernel ecosystem

Simple Push Down Engine

SQL Federation Engine

Summary

Section 2: Apache ShardingSphere Architecture, Installation, and Configuration

Chapter 3: Key Features and Use Cases – Your Distributed Database Essentials

Distributed database solutions

Understanding data sharding

Understanding vertical sharding

Understanding horizontal sharding

Data sharding key points

Why you need sharding

Understanding SQL optimization

SQL optimization definition

Overview and characteristics of distributed transactions

Distributed transactions

ShardingSphere's support for transactions

Transaction modes comparison

An introduction to elastic scaling

Mastering elastic scaling

The workflow to implement elastic scaling

Elastic scaling key points

How to leverage this technology to solve real-world issues

Read/write splitting

Read/write splitting definition

Key points regarding the read/write splitting function

How it works

Application scenarios

Summary

Chapter 4: Key Features and Use Cases – Focusing on Performance and Security

Understanding High Availability

Database HA

ShardingSphere HA

Application scenarios

Introducing data encryption and decryption

What are data encryption and decryption?

Key components

Workflow

Application scenarios

User authentication

Authentication of DBMS versus distributed database

User ID storage

Mechanism

Workflow

Configuration

SQL Authority

Defining SQL Authority

Mechanism

Planned development

Application scenarios

Database and app online tracing

How it works

A total synthetic monitoring solution

Database gateway

Understanding the database gateway

Distributed SQL

Introduction to DistSQL

Application scenarios

Additional notes for DistSQL

Implications for ShardingSphere

Understanding cluster mode

Cluster mode definition

Kernel concepts

Compatibility with other ShardingSphere features

Cluster management

Computing nodes

Storage nodes

Observability

Clarifying the concept of observability

Applying observability to your system

Mechanisms

Application scenarios

Summary

Chapter 5: Exploring ShardingSphere Adaptors

Technical requirements

Differences between ShardingSphere-JDBC and ShardingSphere-Proxy

ShardingSphere-JDBC

The ShardingSphere-JDBC development mechanism

Deployment and user quick start guide

ShardingSphere-Proxy

The ShardingSphere-Proxy development mechanism

Applicability and target users of ShardingSphere-Proxy

Deployment and user quick start guide

Downloading from the official website

Architecture introduction

Applicability and target users

Deployment and user quick start guide

Summary

Chapter 6: ShardingSphere-Proxy Installation and Startup

Technical requirements

Installing with the binary package

Installing with Docker

Introduction to Distributed SQL

Configuration – sharding

DistSQL – the SQL syntax

YAML

Configuration – read/write splitting

YAML

Configuration – encryption

DistSQL

YAML configuration items

Configuration – shadow database

DistSQL

YAML

Configuration – mode

Configuration – scaling

DistSQL for job management

YAML – configuration items

Configuration – multiple features, server properties

DistSQL

YAML

Mixed – encryption + read/write splitting + cluster

Configuration – server

Authority

Transaction

Props configuration

Summary

Chapter 7: ShardingSphere-JDBC Installation and Start-Up

Technical requirements

Setup and configuration

Introducing the preliminary requirements

Introducing the configuration method

Sharding configurations

Java configuration items

YAML configuration items

Spring Boot configuration items

SpringNameSpace configuration items

Understanding read/write splitting configuration

Java configuration items

YAML configuration items

Spring Boot configuration items

SpringNameSpace configuration items

Understanding data encryption configuration

Java configuration items

YAML configuration items

Spring Boot configuration items

SpringNameSpace configuration items

Configuring a shadow database

Java configuration items

YAML configuration items

A Spring Boot example

SpringNameSpace configuration items

Configuring ShardingSphere's modes

Java configuration items

YAML configuration items

Spring Boot configuration items

A SpringNameSpace example

Props configuration for JDBC

Java configuration items

YAML configuration items

Spring Boot configuration items

SpringNameSpace configuration items

Configuration – miscellaneous

Sharding, read/write splitting, and cluster configuration items

Configuring sharding, encryption, and cluster mode

Summary

Section 3: Apache ShardingSphere Real-World Examples, Performance, and Scenario Tests

Chapter 8: Apache ShardingSphere Advanced Usage – Database Plus and Plugin Platform

Technical requirements

Introducing Database Plus

ShardingSphere's pursuit of Database Plus

Connect – building upper-level standards for databases

Enhance – database computing enhancement engine

Pluggable – building a database-oriented functional ecosystem

Plugin platform introduction and SPI

The pluggable architecture of Apache ShardingSphere

Extensible algorithms and interfaces

User-defined functions and strategies – SQL parser, sharding, read/write splitting, distributed transactions

Customizing your SQL parser

Customizing the data sharding feature

Read/write splitting

Distributed transactions

User-defined functions and strategies – encryption, SQL authority, user authentication, shadow DB, distributed governance

Data encryption

User authentication

SQL authority

Shadow DB

Distributed governance

Scaling

ShardingSphere-Proxy – tuning properties and user scenarios

Properties introduction

Extensible algorithms

Summary

Chapter 9: Baseline and Performance Test System Introduction

Technical requirements

Baseline

Benchmarking tools

BenchmarkSQL

A good-to-know alternative benchmarking tool

Databases

ShardingSphere

Performance testing

Test preparation

Summary

Chapter 10: Testing Frequently Encountered Application Scenarios

Technical requirements

Testing distributed database scenarios

Preparing to test your distributed system

Deployment and configuration

How to run your testing on a distributed system

Analyzing a ShardingSphere-Proxy data display – the sharding feature

Scenario-based testing for database security

Preparing to test your database security

Deployment and configuration

How to run your testing on database security

Report analysis

Synthetic monitoring

Preparing to test synthetic monitoring

Deployment and configuration

How to run your testing on synthetic monitoring

Report analysis

Database gateway

Preparation to test the database gateway

Deployment and configuration

How to run your testing on a database gateway

Report analysis

Summary

Chapter 11: Exploring the Best Use Cases for ShardingSphere

Technical requirements

Two clients to choose from

Your DBMS

Sharding strategy

Distributed transaction

HA and the read/write splitting strategy

Elastic scaling

Distributed governance

Implementing ShardingSphere for database security

Two clients to choose from

Applying a data security solution to your DBMS

Data encryption/data masking

Data migration with encryption

Authentication

SQL authority/privilege checking

Flow gateway

Application performance monitoring and Cyborg Agent

Database shield

Overview and architecture

Database management

Read/write splitting

Summary

Chapter 12: Applying Theory to Practical Real-World Examples

Technical requirements

Distributed database solution

Case 1 – ShardingSphere-Proxy + ShardingSphere-JDBC + PostgreSQL + distributed transaction + cluster mode + the MOD sharding algorithm

Case 2 – ShardingSphere-Proxy + MySQL + read/write splitting + cluster mode + HA + RANGE sharding algorithm + scaling

Database security

Case 3 – ShardingSphere-Proxy + ShardingSphere-JDBC + PostgreSQL + data encryption

Case 4 – ShardingSphere-Proxy + MySQL + data masking + authentication + checking privileges

Synthetic monitoring

Case 5 – Synthetic monitoring

The deployment architecture

Database gateway

The deployment architecture

The example configuration

The recommended cloud/on-premises server

Start and test it!

Summary

Appendix: and the Evolution of the Apache ShardingSphere Open Source Community

How to leverage the documentation to find answers to your questions

Example project introduction

How to use the example project section

Scenarios and examples

Source code, license, and version

shardingsphere-kernel

shardingsphere-infra

shardingsphere-jdbc

shardingsphere-db-protocol

shardingsphere-proxy

shardingsphere-mode

shardingsphere-features

License introduction

Version introduction

Open source community

Open source contribution

Website and documentation

Websites

Channels

Concluding note

Other Books You May Enjoy

Preface

Apache ShardingSphere is a new open source ecosystem for distributed data infrastructures based on pluggability and cloud-native principles.

This book begins with a quick overview of the main challenges faced by DBMSs today in production environments, followed by a brief introduction to the ShardingSphere software's kernel concept. Thereafter, through real-world examples, including distributed database solutions, elastic scaling, DistSQL, synthetic monitoring, SQL authorization and user authentication, and database gateway, you will gain a full understanding of ShardingSphere's architectural components and how they are configured and can be plugged into your existing infrastructure to manage your data and applications. Moving ahead, you will get well versed with ShardingSphere-JDBC and ShardingSphere-Proxy, the ecosystem's clients, and how they can work either concurrently or independently according to your needs. Then, you will learn how to customize the plugin platform to define your personalized user strategies and manage multiple configurations seamlessly. Lastly, you will get up and running with functional and performance tests for all scenarios.

By the end of this book, you will be able to build and deploy your own customized version of ShardingSphere, addressing the key pain points encountered in your data management infrastructure.

Who this book is for

This book is for database administrators (DBAs) working with distributed database solutions who are looking to explore the capabilities of Apache ShardingSphere. DBAs looking for more capable, flexible, and cost-effective alternatives to the solutions they're currently utilizing will also find this book helpful. To get started with this book, a basic understanding of, or even an interest in, databases, relational databases, SQL, cloud computing, and data management in general is needed.

What this book covers

Chapter 1, The Evolution of DBMSs, DBAs, and the Role of Apache ShardingSphere, introduces the main challenges faced by DBMSs today in production environments, the evolving role of the DBAs, and the opportunities and future directions for DBMSs. This chapter lays the foundation for the rest of this book by including a brief introduction to the Apache ShardingSphere ecosystem, the context of the project development, and the need filled by the software solution.

Chapter 2, Architectural Overview of Apache ShardingSphere, provides a professionally focused description of the software's architecture. The Database Plus driving development concept is introduced, together with the deployment architecture and the plugin platform.

Chapter 3, Key Features and Use Cases – Your Distributed Database Essentials, gives an overview of the potential use cases of ShardingSphere in professional and enterprise environments divided by industry (fintech, media, e-commerce, etc.) and introduces the solution's features that are necessary for a distributed database.

Chapter 4, Key Features and Use Cases – Focusing on Performance and Security, expands your knowledge on the potential use cases of ShardingSphere in a professional/enterprise environment by focusing on the ecosystem's features that'll allow you to monitor and improve performance and enhance security.

Chapter 5, Exploring ShardingSphere Adaptors, includes a description of the ecosystem's main clients, their differences, and how they can work either concurrently or independently, depending on your needs.

Chapter 6, ShardingSphere-Proxy Installation and Startup, introduces ShardingSphere-Proxy, how to use it directly as MySQL and PostgreSQL servers, and how to apply it to any kind of terminal.

Chapter 7, ShardingSphere-JDBC Installation and Startup, explains the client end, how it connects to databases, its third-party database connection pool, and how to successfully install it.

Chapter 8, Apache ShardingSphere Advanced Usage – Database Plus and Plugin Platform, illustrates how to customize the plugin platform on an ad hoc basis to get the most out of your system and introduces cloud-native principles.

Chapter 9, Baseline and Performance Test System Introduction, introduces the built-in baseline and performance testing system and how to use it – from preparing your test to report analysis.

Chapter 10, Testing Frequently Encountered Application Scenarios, offers tests for frequently encountered application scenarios, including distributed databases, read/write splitting, and shadow databases.

Chapter 11, Exploring the Best Use Cases for ShardingSphere, showcases the best use cases for each scenario, as well as a series of real-world examples such as distributed database solutions, database security, synthetic monitoring, and database gateway.

Chapter 12, Applying Theory to Practical Real-World Examples, builds on the knowledge accumulated in Chapter 11, Exploring the Best Use Cases for ShardingSphere, and provides methodologies to turn theory into practice.

Appendix and the Evolution of the Apache ShardingSphere Open Source Community, offers a guide to leverage the ecosystem's documentation, the example projects in the GitHub repository, more information on ShardingSphere's source code and license, and how to join the project's open source community.

To get the most out of this book

To get the most out of your system and Apache ShardingSphere while working through this book, you will need a few simple tools.

We have compiled a list of the software you may need, depending on the features or ShardingSphere clients you are interested in.

In terms of the operating system, you may use any of the mainstream choices available: Windows, macOS, or Linux. All code examples have been tested on all three operating systems and should work with future versions too.

If you encounter any difficulties during installation and cannot find a fix in this book, for example because of a particular setup you are using, you can reach out to the community in the Issues or Discussions sections of Apache ShardingSphere's GitHub repository.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

ShardingSphere uses the Apache Software Foundation's Apache License 2.0. You can find the complete details here: https://www.apache.org/licenses/LICENSE-2.0.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/A-Definitive-Guide-to-Apache-ShardingSphere. If there's an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

You can also connect with the community to find more events and use cases and, if you would like to try, to become an open source developer.

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/VUBd8.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

html, body, #map { height: 100%; margin: 0; padding: 0}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

$ mkdir css

$ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "Select System info from the Administration panel."

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you've read A Definitive Guide to Apache ShardingSphere, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

Section 1: Introducing Apache ShardingSphere

In this part, you will gain an overview of Apache ShardingSphere, its architecture, concepts, and clients. You will get up to speed on the latest challenges affecting databases today and future developments, and be able to conceptualize ShardingSphere's position in the current database landscape.

This section comprises the following chapters:

Chapter 1, The Evolution of DBMSs, DBAs, and the Role of Apache ShardingSphere

Chapter 2, Architectural Overview of Apache ShardingSphere

Chapter 1: The Evolution of DBMSs, DBAs, and the Role of Apache ShardingSphere

Today, data is recognized as the most valuable property available. As the so-called warehouses for this most valuable property, databases were not always given the enviable amount of attention they have been getting as of late. The hyper-growth of the internet, as well as its related and non-related industries (think traditional sectors affected by the positive externalities of increased connectivity, such as transportation and retail), the emergence of cloud-native, the development of the database industry, and distributed technology, have brought up new requirements and renewed pressure on businesses and their infrastructure.

Additionally, changes in societies at large, coupled with changes to people's lifestyles, have also raised new issues, concerns, and requirements for any modern company. Accordingly, companies must review their products, services, and architectures for their end users and consider upgrading and innovating from the frontend to the backend. Ultimately, they must consider the database and data as the most vital parts of this evolutionary process.

Simply put, data drives businesses. Stakeholders from C-suite executives, such as CIOs, to database managers are aware of the important role that data plays in transforming their businesses, satisfying users, and allowing them to maintain or create new competitive advantages.

Such recognition created a focus on three key areas all related to data – data collection, data storage, and data security – all of which will be discussed in detail in this book. The absence of databases from this list is by no means a lack of appreciation toward their integral role within organizations, but only the omission of an obvious fact.

Overlooking databases can create inefficiencies that can quickly snowball and become seriously threatening problems, such as a poor database experience for employees and customers, cost overruns, and poor workload optimization. At the same time, enterprises also need capable experts to leverage their databases and manage and efficiently utilize the data. Hence, the data, database, and database administrator (DBA) form a system that allows enterprises to efficiently store, protect, and leverage their assets.

In this chapter, we will cover the following topics:

The evolution of DBMSs

The evolving role of the DBA

The opportunities and future directions for DBMSs

Understanding Apache ShardingSphere

By the end of this chapter, you will have developed a comprehensive understanding of the current challenges for DBMSs. For those of you that are already familiar with the ongoing evolution of the database industry, this chapter will serve either as a refresher of the most pressing challenges or as a reference that organizes these challenges for you into one place.

Understanding these challenges will be followed by an introduction to the Apache ShardingSphere ecosystem and its driving concepts. Finally, you will be able to answer how ShardingSphere can help you solve the most pressing DBMS challenges and support you well into the future evolution of the database industry.

The evolution of DBMSs

With the rapid adoption of the cloud, SaaS delivery models, and open source repositories that are driving innovation, the volume of data has exploded in the past 10 years. These large datasets have made it mandatory for organizations that want an optimal customer experience to deploy effective and reliable database management systems (DBMSs). This renewed focus on DBMSs and their requirements has created not only multiple opportunities for new technologies and new players in the industry but also numerous challenges. If you are reading this book, you are probably looking to upskill and to improve or expand your knowledge of how to effectively manage DBMSs.

Databases exist to store and access information. As a result, organizations now find it crucial to understand the latest techniques, technologies, and best practices to store and retrieve extensive data and the resulting traffic. The shift to cloud-based storage has also led to the expanded use of data clusters, and the related data science around data storing strategies. Data use for apps goes up and down throughout a typical day.

Reliable and scalable databases are required to help collect and process data by breaking large datasets into smaller ones. This need gave rise to concepts such as database sharding and partitioning, both of which split extensive datasets into smaller ones while preserving performance and uptime. These concepts will be discussed in Chapter 3, Key Features and Use Cases – Your Distributed Database Essentials, in the Understanding data sharding section, and Chapter 10, Testing Frequently Encountered Application Scenarios.

Let's summarize what open source means according to The Open Source Definition (https://opensource.org/osd) – when we talk about open source, we refer to software that's released under a license where the copyright holder gives you and any other user the rights to use or change and distribute the software, even its source code, to anyone for any purpose deemed fit.

When it comes to databases, the role of open source is significant, and its extent may come as a surprise to many. As of June 2021, over 50% of database management systems worldwide use an open source license (DB-Engines, Statista 2021). If we consider the recent developments of open source database software, we'll notice the proliferation of initiatives and communities dedicated to cloud-native database software.

Cloud-native databases have become increasingly important with the ushering-in of the cloud computing era. Their benefits include elasticity and the ability to meet on-demand application usage needs. Such a development creates the need for cloud migration capabilities and skills as businesses migrate workloads to different cloud platforms.

Currently, hybrid and multi-cloud environments are the norm, with nearly 75% of organizations reporting usage of a multi-cloud environment (https://www.lunavi.com/blog/multi-cloud-survey-72-using-multiple-cloud-providers-but-56-have-no-multi-cloud-strategy). The data that remains stored on-premises is, more often than not, composed of sensitive information that organizations are wary of migrating, or data that is connected to legacy applications or environments that make it too challenging to migrate.

This changed the concept of databases as we used to understand them, creating a new concept that includes data that is on-premises and in the cloud, with workloads running across various environments. The next big thing in terms of databases and infrastructure is the distributed cloud, which can be defined as an architecture where multiple clouds are used concurrently and managed centrally from a public cloud. It brings cloud-based services to organizations and blurs the lines between the cloud and on-premises systems.

The next section will introduce you to the challenges that are currently considered to be significant pain points in the industry. You may be familiar with some or all of them – if you are not, that is OK, and you will find that they are all explained in the next section.

These pain points will then be followed by equally important needs that currently haven't been met or are currently creating new opportunities in the industry.

Industry pain points

Because of the ever-expanding number of database types, engineers have to dedicate more of their time to learning SDKs and SQL dialects, and less time to developing. For an enterprise, technology selection is hard because of more complex tech stacks and the need to match their application frameworks, which can cause an oversized architecture.

The next few sections will introduce you to the most notable industry pain points, followed by new industry needs that are creating new opportunities for DBMSs.

Low-efficiency database management

Database administrators (DBAs) need to dedicate much of their time to surveying and using new databases to identify the differences in cooperation and monitoring methods, as well as to understand how to optimize performance.

The peripheral services and experience of a certain database are not universal or replicable. In production, the usage and maintenance cost of databases rises. The more database types a company deploys, the more investment will be required. If an enterprise adopts new databases suitable for new scenarios without a second thought, the investment is doomed to exponentially grow sooner or later.

New demands and increasingly frequent iteration

Different code is required to meet what seem to be similar demands, with the only difference being the database type and the code that it supports. At the time of writing, iteration frequency is expected to rise sharply, yet developers' capacity to respond shrinks as the number of database types grows. The combined growth of common demands and database types slows down iteration significantly: the larger the number of databases, the slower the iteration pace and the lower the iteration performance.

If, for example, the desired outcome is to encrypt all sensitive data at once, but this cannot be done in one place for many databases, the only possible solution is to modify the code on the business application side. Large firms frequently operate dozens or even hundreds of systems, which makes encrypting the data of every system a great challenge for developers. Data encryption is only one example of the challenges of this kind that developers may face; other common demands, such as permissions control and auditing, are also frequently encountered in heterogeneous databases.

Lack of database inter-compatibility

We know for a fact that heterogeneous databases currently co-exist and will continue to do so for a long time, but without a common standard, we cannot use databases collaboratively. By common standard, we mean a technology reference that is accepted universally, or at least by a majority, as the USB 2.0 or USB-C standard is for external hardware peripherals. If you are looking for a software example, look no further than the SDKs that have been released to make apps for iOS or Android.

For databases, as you will learn throughout this book, we at the Apache ShardingSphere community are proponents of what we call Database Plus – which in simple terms means software that allows you to manage and improve any type of database, even to integrate different database types into the same system.

In terms of data computing, demands for a collaborative query engine and transaction management plans across heterogeneous databases are increasing. Nevertheless, at the moment, developers can only contribute to the development on the application side, making it difficult for their contribution to be developed into an infrastructure.

The new industry needs are creating new opportunities for DBMSs

The changing landscape within which enterprises operate is bound to affect their business decisions and operating procedures. This can be traced back to the expanding amounts of data and the internet argument mentioned in the Industry pain points section.

This section will give you an insight into what enterprises are looking to get from their database management systems across different industrial sectors. After that, we will look at the evolving role of a DBA, which some of you will be expected to step into.

Querying and storing enormous chunks of data

A large volume of data can crash standalone databases. We need more storage and servers to house the current enormous amount of data, which will only increase in the future. A single database is unable to accommodate this wealth of data.

Achieving prompt query data response time

Even though a DBMS has to accommodate enormous amounts of data, the experience and response time expected by customers and users leave no room for DBMS downtime in which to organize the data little by little. How to retrieve the requested data quickly from such a vast pool of data will be one of the top issues.

Querying and storing fragmented data types

Furthermore, relational data is now only one of many data types. Documents, JSON, graphs, and key-value pairs are all attracting people's attention. This is reasonable since all of them come from varying business scenarios that involve keeping the world moving smoothly and efficiently.

All these changes and requirements will bring new challenges and needs to databases and their operation and maintenance.

You may have been aware of or even already encountered some of these expectations in your professional experience. If you are just stepping into the professional world, you are bound to encounter these expectations, no matter your future industry. This is because the role of the database administrator has changed. More precisely, it has evolved, and the next section will tell you how.

The evolving role of the DBAs

These changes in industry needs have reshaped the role of the DBA as we know it. While the role of DBAs is crucial within any organization, whether it is a technology business or not, its importance has been growing at a speed that is directly correlated to the digital technology adoption rate. They are constantly looking for ways to optimize their database management systems and are the primary strategy designers to counter data spikes and ensure data safety and data availability.

They've long been considered to be key guardians of the vital strategic asset of data. This responsibility is not narrow in scope, as it includes many other duties. DBAs must ensure that their organizations can meet their data needs and that databases perform at optimal levels and function properly; in case of any issues, they are also the ones called upon to recover the data.

Over the past decade, their responsibilities have also been reshaped thanks to new data-producing devices (smartphones and IoT devices, for example) that continue to drive data growth, thus ultimately increasing the number of database instances under management, as well as a wider array of database management systems. More recent developments have even seen DBAs increasingly involved in application development, making them emerging key influencers in the overall data management infrastructure.

In the next few sections, we will look into the most common and pressing challenges that DBMSs are facing today, and for which a DBA should be prepared.

Overwhelming traffic load increase

Ever since the introduction of the iPhone, mobile phones have gained an increasingly important role in our lives, allowing us to do more than place and receive phone calls while on the go. We now shop, order food, book our vacations, do our banking, hunt for jobs, consume entertainment, and connect with our family and friends thanks to the little devices in our pockets. While this interconnectivity gave rise to multiple new industries and business models (think sharing economy and calling an Uber), they all have one thing in common: data. The amount of data we consume and produce has ballooned to levels that were inconceivable just 15 years ago.

With the advent of the internet, it has become the norm for successful websites or business services that support apps to be receiving visits that reach well into the billions every week.

Sales days such as Cyber Monday in North America or 11/11 (also known as Singles Day) in China (the largest shopping festival in the world) are excellent examples of traditional retail enterprises that adapted to the digital world. Now, they must contend with new needs to successfully achieve their business goals. In cases such as these, retailers are looking to drive traffic to their pages or online stores. But what happens if they succeed and their database clusters are put under incredible pressure? The question becomes a technical one, with DBAs and R&D teams wondering if their database cluster will be able to handle the visitors' traffic.

Microservice architecture for frontend services

To deal with a large number of visitors, the monolithic architecture has largely been phased out. Instead, the microservices architecture has become the new favorite.

A microservices architecture structures an application as a collection of loosely coupled services. In other words, the application is built as a set of independent components, each running as its own service and performing one part of the whole system. These components communicate through lightweight APIs, and because each service runs independently, it can be deployed, updated, and scaled according to specific business requirements.

Cloud-native disrupts delivery and stale deployment practices

The advent of the cloud has brought deep and significant changes, including overturning the way to host, deliver, and start up software.

One of the major changes that can be attributed to the advent of the cloud is the conceptual advance it brought by breaking the barrier between hardware and software. All our media, emails, and the digits of our bank accounts are spread across thousands of servers controlled by hundreds of companies. This is even more impressive if we consider that, not even 20 years ago, the internet was in its inception stages, and only used by early adopters or academics who knew how to search a directory or transfer a file over FTP.

In a sense, the cloud is the natural result of the stars aligning and all the right conditions being met. If we look back, we can see how the success of the cloud was thanks to the wider adoption of broadband internet, the higher penetration rate of smartphones, allowing constant internet connectivity, and all the other innovations that made data centers easier to build and maintain. This is one of the rare instances where enterprise and consumer innovation seem to be advancing at a comparable pace. From a consumer angle, we can already see how physical storage will soon be unnecessary thanks to the internet, while for business needs, we now find offerings that allow us to run computing tasks on third-party servers – even for free.

In the perennial pursuit of flexibility, many enterprises are now moving their technologies to the cloud because of the scalability and affordability it brings. Being flexible can arguably be interpreted as being adaptable, which is exactly what executives would be after to be able to respond to industry or broader market changes. Plus, it opens the door for startups to sell their product and services directly on the cloud. It also allows them to build, manage, and deploy their applications anywhere with freedom and flexibility.

Considering the significant potential opportunities offered by the cloud, some organizations have already started to adopt a cloud-first strategy, which simply means including or moving to a cloud-based solution at the expense of a strategy built around in-house data centers. This new IT trend is going to move the databases to the cloud as a Database-as-a-Service (DBaaS).

Considering the numerous and significant changes and requirements that businesses and services face in their quest for digital transformation, and their need to keep pace with their respective industries, we can easily understand what motivates companies to change the way they store, query, and manage data in their databases. The following diagram shows how databases are used to store, query, and manage data:

Figure 1.1 – Database challenges flow

As you can see, the databases on the right are marked with a question mark. This represents two things: what are the possibilities, and what are the directions that you can undertake in your role as a database professional to be prepared for them?

In the next section, you will be introduced to the opportunities and future directions that you should be aware of when it comes to databases. Not only can they give you an advantage in your profession, but they can also help you chart your career if you keep them in mind when it's time to make decisions about your professional development.

The opportunities and future directions for DBMSs

Let's review the opportunities, as well as future directions, that DBMSs are headed in. In the next few subsections, you will encounter topics ranging from database security to industry novelties such as DBaaS.

Database safety

Database safety has been one of the key focus areas for DBMSs. On the one hand, database vendors strive to deliver and iterate on existing solutions to solve database issues.

On the other hand, cloud vendors are committed to protecting the data and applications that exist in the cloud infrastructure. The internet, software, load balancers, and all the components of the data transmission flow are seeing their safety measures being upgraded one by one.

Considering this ongoing improvement process, the natural question that arises is this: how can we achieve the seamless integration that's needed between the projects that are developed in different languages and various databases?

To answer this question and the necessary challenges that come with dealing with such important questions, we are seeing an increasingly significant number of resources being dedicated to both leading enterprises and promising new start-up ventures.

More than two-thirds of CIOs are concerned about the constraints that could emerge because of cloud providers. It is for these reasons that open source databases are becoming the go-to solution.

Data security has not only become paramount for enterprises but can be the determinant between survival or being forgotten forever as another firm that went out of business. If you think about ransomware and how it is increasingly widespread, you may be able to understand how open source technology empowers organizations to defend themselves against such risks. Open source allows organizations to be in total control of their security needs by giving them complete access to source code, as well as the flexibility that comes with being able to configure and extend the software as they see fit.

There is certainly a counter-argument to the criticism about the security of open source that was prevalent years ago. Rapid adoption by enterprises seems to be settling the argument in favor of open source. No company will remain untouched by the power of open source database progress.

SQL, NoSQL, and NewSQL

When SQL is brought up in a conversation, people immediately think about the good old relational database, which has been supporting higher-level services for the past couple of decades.

Unfortunately, the relational database has since started to show its age and is now considered by many as not adequate to meet the new requirements that businesses must nowadays respond to. This has caused industry giants in the database field to take aggressive actions to reshape their product offerings or deliver new solutions.

NoSQL is one such example. It pioneered the non-relational database, which provides a mechanism for storing and retrieving data modeled in a non-relational fashion, such as key-value pairs, graphs, documents, or wide columns. Nevertheless, many NoSQL products compromise consistency in favor of availability and partition tolerance. By giving up transactions and the advantages of standard SQL, NoSQL databases gain the high availability and elastic scale-out that are necessary to respond to the vital concerns of the new era. The success of Couchbase, HBase, MongoDB, and others stands as clear evidence in support of this thesis. NoSQL databases also sometimes emphasize that they are Not Only SQL and that they do recognize the value of the traditional SQL database. This type of appreciation has led to NoSQL databases gradually adopting some of the benefits of mainstream SQL products.

NewSQL can be defined as a type of relational database management system (RDBMS) that seeks to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads, all while keeping the ACID guarantees of a traditional database system.

The discussion is still ongoing both in academia and in the industry, with the definition being regarded as fluid and evolving. An excellent resource is the paper What's Really New with NewSQL? (https://dl.acm.org/doi/10.1145/3003665.3003674), which set out to categorize the databases according to their architecture and functions.

All the databases that claim to be NewSQL products are seeking a good balance between consistency, availability, and partition tolerance (the CAP theorem). But which products belong to NewSQL?

New architecture

Among the opportunities for DBMSs that are currently available and slated to bring significant changes to the industry in the short to medium term, new database architectures certainly merit consideration. This is where databases are designed from an entirely new code base, thus leaving behind any of the architectural baggage of legacy systems – a clean slate of sorts that allows for near endless possibilities, as new databases are being conceptualized and built to meet the needs of the new era.

Embracing a transparent sharding middleware

A transparent sharding middleware, such as Apache ShardingSphere, allows a user – or in this case, an organization – to split a database into multiple shards that are stored across a cluster of single-node DBMS instances. This section will help you understand what data sharding is. Database administrators are constantly looking for ways to optimize their database management systems. When data input spikes, you must have strategies in place to handle it. One of the best techniques for this is to split the data by rows or by columns, as data sharding and partitioning do. The following sections will introduce you to, or refresh, these concepts and the difference between them.

Data sharding

When a large database table is split into multiple smaller tables, shards are created; the newly created tables are called shards or partitions. These shards are stored across multiple nodes so that they can work efficiently, improving scalability and performance. This type of scalability is known as horizontal scalability. Sharding ultimately helps database administrators such as yourself use computing resources as efficiently as possible, which is one form of database optimization.

Optimizing computing resources is one key benefit. More importantly, each query has to scan fewer rows, so the database can respond to users much faster than it could by going through one colossal table.
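To make the idea concrete, here is a minimal sketch of splitting one table into shards and answering a query from a single shard. It is plain Python rather than ShardingSphere code, and everything in it is an assumption made for illustration: the modulo rule, the shard count of 4, the `user_id` sharding key, and the function names.

```python
from collections import defaultdict

SHARD_COUNT = 4  # hypothetical number of shards (nodes)


def shard_for(user_id: int, shard_count: int = SHARD_COUNT) -> int:
    """Pick a shard with a simple modulo rule on the sharding key."""
    return user_id % shard_count


def distribute(rows, shard_count: int = SHARD_COUNT):
    """Split one large table (a list of dicts) into per-shard tables."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row["user_id"], shard_count)].append(row)
    return shards


def find_orders(shards, user_id: int, shard_count: int = SHARD_COUNT):
    """Answer a query by scanning only the shard that can hold the key."""
    target = shards[shard_for(user_id, shard_count)]
    return [row for row in target if row["user_id"] == user_id]


# One "colossal" table of 1,000 orders spread over 10 users.
orders = [{"order_id": i, "user_id": i % 10} for i in range(1000)]
shards = distribute(orders)
result = find_orders(shards, user_id=7)
```

In this sketch, the lookup for user 7 scans only the rows of one shard instead of all 1,000 rows, which is the effect described above. A real deployment would place each shard on a separate node and would usually choose the sharding rule with more care than a bare modulo.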

Data partitioning

Partitioning may sound confusing at first. That is completely normal, as data partitioning is often confused with data sharding.

Partitioning refers to a database that has been broken down into different subsets that are still stored within a single database. This single database is sometimes referred to as the database instance. So, what is the difference between sharding and partitioning? Both involve breaking large datasets into smaller ones. The key difference is that sharding implies that the resulting subsets are spread across multiple computers, whether the split is horizontal or vertical.
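The distinction can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: each in-memory connection stands in for a separate database instance (in practice, a separate machine), and the table names, the split by year, and the sample rows are all hypothetical.

```python
import sqlite3

# Sample rows: (order_id, year). Even ids fall in 2020, odd ids in 2021.
rows = [(order_id, 2020 + order_id % 2) for order_id in range(6)]

# Partitioning: the subsets live in ONE database instance.
single = sqlite3.connect(":memory:")
for year in (2020, 2021):
    single.execute(f"CREATE TABLE orders_{year} (order_id INTEGER, year INTEGER)")
for order_id, year in rows:
    single.execute(f"INSERT INTO orders_{year} VALUES (?, ?)", (order_id, year))

# Sharding: each subset lives in its OWN database instance.
nodes = {year: sqlite3.connect(":memory:") for year in (2020, 2021)}
for conn in nodes.values():
    conn.execute("CREATE TABLE orders (order_id INTEGER, year INTEGER)")
for order_id, year in rows:
    nodes[year].execute("INSERT INTO orders VALUES (?, ?)", (order_id, year))

# The same subset can be read either way; only its location differs.
partition_rows = single.execute("SELECT COUNT(*) FROM orders_2021").fetchone()[0]
shard_rows = nodes[2021].execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Both approaches produce the same subsets of data. With partitioning, every subset still competes for the resources of one instance, while with sharding each subset can be served by its own node.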

Database-as-a-Service

The DBaaS providers not only provide the remodeled cloud databases but are responsible for maintaining their physical configuration as well. Users do not need to care about where the database is located; the cloud allows the cloud database providers to take care of the physical databases' maintenance and related operations.

NoSQL and NewSQL are unavoidable opportunities that most, if not all, database vendors are moving toward, and they represent the future of DBMSs. Many startups are moving into this space to fill this market gap and deliver services that directly compete with the ones provided by established industry giants.

AI database management platform

The technological developments of the last 10 years are allowing advances in nascent fields such as machine learning (ML) and artificial intelligence (AI). Such technologies will eventually impact all aspects of our lives, and enterprises and their databases are no different.

AI-driven database operation and maintenance are poised to become the main growth drivers for the future of DBMSs. The relationship between AI and databases may not seem evident at first: while AI has become a sort of buzzword these days, database management, although increasingly automated, platform-based, and observable, still requires intensive human interaction.

When AI technology is eventually integrated into databases' operation and maintenance work, new avenues will be opened. The historical experience of the previous operations that were performed during database management tasks will be machine-learned, and databases empowered by AI will be able to provide suggestions and specific actions to manage, operate, maintain, and protect database clusters.

Furthermore, AI database management platforms will also be able to interface with the monitoring and warning system, or even undertake some urgent operations to avoid significant production accidents. Productivity improvement and headcount optimization are always central concerns of enterprises.

Database migration

When it comes to database migration, there is some good news and some bad news. In the spirit of optimism about the future, let's consider the good news first: we have new database candidates, such as all the NewSQL and NoSQL offerings that have hit the market recently.

When it comes to the bad news, data migration will have to be delivered at the lowest possible cost.

In this old-to-new process, data migration and database selection occupy an important part of people's minds. Many enterprises choose to stick with stale database architecture to avoid any negative effect on production and the instability that could be caused by new databases.

Additionally, legacy and complicated IT systems significantly discourage risk-taking and undermine confidence in performing data migration. In such cases, many database vendors or database service companies will offer to develop new products for this bulky work and insert themselves into this market to get a piece of the billion-dollar pie that is the database industry.

To recap, some of the main opportunities for DBMSs in the future include database security, leveraging new database architectures, considering embracing data sharding or DBaaS, and fully mastering database migration.

Before moving on to the next section, there is one last thing you have probably already thought of at some point in your career. There are still concerns during this old-to-new-database transition period, such as the following:

On-premises versus the cloud
The lowest cost to migrate data to new databases
Increased program refactoring work costs caused by using multiple databases

The following diagram illustrates

A Definitive Guide to Apache ShardingSphere

Contributors

About the authors

About the reviewers

Table of Contents

Preface

Section 1: Introducing Apache ShardingSphere

Chapter 1: The Evolution of DBMSs, DBAs, and the Role of Apache ShardingSphere

The evolution of DBMSs

Industry pain points

The new industry needs are creating new opportunities for DBMSs

The evolving role of the DBAs

Overwhelming traffic load increase

Microservice architecture for frontend services

Cloud-native disrupts delivery and stale deployment practices

The opportunities and future directions for DBMSs

Database safety

SQL, NoSQL, and NewSQL

New architecture

Embracing a transparent sharding middleware

Database-as-a-Service

AI database management platform

Database migration

Understanding Apache ShardingSphere

Connect

Enhance

Pluggable

Summary

Chapter 2: Architectural Overview of Apache ShardingSphere

What is a distributed database architecture?

The SQL-based load-balancing layer

Sidecar improves performance and availability

Database Mesh innovates the cloud-native database development path

Apache ShardingSphere and Database Mesh

Solving database pain points with Database Plus

An architecture inspired by the Database Plus concept

Feature architecture

Introduction to the feature layer

Deployment architecture

Plugin platform

Microkernel ecosystem

Simple Push Down Engine

SQL Federation Engine

Summary

Section 2: Apache ShardingSphere Architecture, Installation, and Configuration

Chapter 3: Key Features and Use Cases – Your Distributed Database Essentials

Distributed database solutions

Understanding data sharding

Understanding vertical sharding

Understanding horizontal sharding

Data sharding key points

Why you need sharding

Understanding SQL optimization

SQL optimization definition

Overview and characteristics of distributed transactions

Distributed transactions

ShardingSphere's support for transactions

Transaction modes comparison

An introduction to elastic scaling

Mastering elastic scaling

The workflow to implement elastic scaling

Elastic scaling key points

How to leverage this technology to solve real-world issues

Read/write splitting

Read/write splitting definition

Key points regarding the read/write splitting function

How it works

Application scenarios

Summary

Chapter 4: Key Features and Use Cases – Focusing on Performance and Security

Understanding High Availability

Database HA

ShardingSphere HA

Application scenarios

Introducing data encryption and decryption

What are data encryption and decryption?

Key components