Data can be found everywhere, from cloud environments and relational and non-relational databases to data lakes, data warehouses, and data lakehouses. Data management practices can be standardized across the cloud, on-premises, and edge devices with Data Fabric, a powerful architecture that creates a unified view of data. This book will enable you to design a Data Fabric solution by addressing all the key aspects that need to be considered.
The book begins by introducing you to Data Fabric architecture, why you need it, and how it relates to other strategic data management frameworks. You’ll then quickly progress to grasping the principles of DataOps, an operational model for Data Fabric architecture. The next set of chapters will show you how to combine Data Fabric with DataOps and Data Mesh, and how the three work together to deliver the most value. After that, you’ll discover how to design Data Integration, Data Governance, and Self-Service analytics architecture. The book ends with the technical architecture needed to implement distributed data management and regulatory compliance, followed by industry best practices and principles.
By the end of this book, you will have a clear understanding of what Data Fabric is and what its architecture looks like, along with the level of effort that goes into designing a Data Fabric solution.
You can read this e-book in Legimi apps or in any app that supports the following format:
Page count: 295
Year of publication: 2023
Become a data-driven organization by implementing Data Fabric solutions efficiently
Sonia Mezzetta
BIRMINGHAM—MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Heramb Bhavsar
Content Development Editor: Manikandan Kurup
Technical Editor: Kavyashree K S
Copy Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Prashant Ghare
Marketing Coordinator: Nivedita Singh
First published: April 2023
Production reference: 1310323
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80461-522-5
www.packtpub.com
To my daughter, Melania; you are and will always be my forever inspiration in everything I do. You are my hero. I know it wasn’t easy at times not having my undivided attention, so thank you for your patience while I wrote this book.
To Mike, thank you for your love and significant support throughout this journey. I couldn’t have done this without your help.
To my parents and sisters, thank you for always being there for me and for cheering me on. Family is everything.
To my loving pets at home, Cody and Stella, and to those pets no longer with us, Bella and Lobo. I miss you both dearly.
– Sonia Mezzetta
Sonia Mezzetta is a senior certified IBM architect working as a Data Fabric program director. She has an eye for detail and enjoys problem-solving data pain points. She started her data management career at IBM as a data architect. She is an expert in Data Fabric, DataOps, Data Governance, and Data Analytics. With over 20 years of experience, she has designed and architected several enterprise data solutions. She has authored numerous data management white papers and has a master’s and bachelor’s degree in computer science. Sonia is originally from New York City and currently resides in the area of Westchester County, New York.
Lena Woolf is an experienced senior technical staff member of IBM Watson Knowledge Catalog, Data, and AI. Over the course of her IBM career, Lena has made sustained technical contributions to IBM software products and has been recognized as a deep technical expert and thought leader in the fields of information management and governance. Lena regularly speaks at professional conferences. As an inventor, she has contributed to many patents and constantly pushes the boundaries of what’s possible with IBM technology. Lena is passionate about providing a gender-inclusive workspace where everyone can collaborate to drive innovation and business growth.
Jo Ramos is a distinguished engineer and the Director and Chief Solutions Architect for Data Fabric at IBM Expert Labs. Jo leads the technology and platform architecture team to support clients on their data modernization and transformation journey to accelerate the adoption of data and AI enterprise capabilities. Jo has extensive experience working as a technologist and thought leader across multiple industries, designing innovative data and analytics solutions for enterprises. His specialties include Data Fabric, Data Mesh, DataOps, Data Governance, Data Integration, Big Data, Data Science, and Analytics.
Rosalind Radcliffe is an IBM Fellow and CIO DevSecOps CTO. Rosalind is responsible for driving DevSecOps and application modernization transformation for the IBM CIO office with the goal of making the office the showcase for hybrid cloud. In this role, she works with the CIO office and partners on research and development to drive the adoption of common practices and tools. Ultimately, this effort will transform, standardize, and automate the processes, tools, and methodologies used to make IBM the most secure, agile, efficient, and automated hybrid cloud engineering organization. In her prior role, she was responsible for bringing open modern toolchains to the z/OS platform and working with clients on their DevOps transformation. She is a frequent speaker at conferences, a master inventor, a member of the IBM Academy of Technology, and the author of Enterprise Bug Busting.
Data constitutes facts, statistics, and information based on real-world entities and events. The word fabric represents a body of material with texture and structure, such as silk cloth. Together, these two keywords, Data Fabric, evoke disparate data woven together by a data architecture driven by governance, active metadata, automated data integration, and self-service. In today’s big data era, enterprises looking to become data driven face many complexities. Many of these issues, such as data silos, a lack of agility, poor collaboration between business and IT, high maintenance costs, data breaches, and compromised data integrity, revolve around the large volume and velocity of proliferated data. Data Fabric is a mature, composable data architecture that faces these complexities head-on to enable the management of data at high scale with established business value.
I wrote this book to introduce a slightly different perspective on the definition of Data Fabric architecture. The view I offer is flexible and use case agnostic and supports diverse data management styles, operational models, and technologies. I describe Data Fabric architecture as taking a people, process, and technology approach that can be applied in a complementary manner with other trending data management frameworks, such as Data Mesh and DataOps. The main theme of this book is to provide a guide to the design of Data Fabric architecture, explain the foundational role of Data Governance, and provide an understanding of how Data Fabric architecture achieves automated Data Integration and Self-Service. The technique I use is to describe “a day in the life of data” as it steps through the phases of its life cycle: create, ingest, integrate, consume, archive, and destroy. I talk about how each layer in Data Fabric architecture executes in a high-performing and thorough manner to address today’s big data complexities. I provide a set of guidelines, architecture principles, best practices, and key concepts to enable the design and implementation of a successful Data Fabric architecture.
The perspective I offer is based on decades of experience in the areas of Enterprise Architecture, Data Architecture, Data Governance, and Product Management. When I started my career in Data Governance, I faced many challenges convincing others of the business value that successful data management with Data Governance achieves. I saw what many others failed to see at that time, and that was when I knew data was my passion! Since then, I’ve broadened and deepened my knowledge and experience. I have learned from brilliant thought leaders at IBM and from a diverse set of clients. All these experiences have shaped the frame of reference in this book.
As technologists, we are very passionate about our points of view, ideas, and perspectives. This is my point of view on what a Data Fabric architecture design represents, which aims to achieve significant business value while addressing the complexities enterprises face today.
Note
The views expressed in the book belong to the author and do not necessarily represent the opinions or views of their employer, IBM.
This book is for organizations looking to embark on a digital transformation journey, as well as existing data-driven organizations looking to mature further in their data journey. It is intended for a diverse set of roles, both business and technical, with a vested interest in strategic, automated, and modern data management, including the following:
- Executive leaders such as chief data officers, chief technology officers, chief information officers, and data leaders prioritizing strategic investments to execute an enterprise data strategy
- Enterprise architects, data architects, Data Governance roles such as data security and data privacy roles, and technical leaders tasked with designing and implementing a mature and governed Self-Service data platform
- Business analysts and data scientists looking to understand their role as data producers or data consumers in a Self-Service ecosystem leveraging Data Fabric architecture
- Developers such as data engineers, software engineers, and business intelligence developers looking to comprehend Data Fabric architecture to learn how it achieves the rapid development of governed, trusted data

Chapter 1, Introducing Data Fabric, presents an introduction to the definition of Data Fabric architecture. It offers a position on what Data Fabric is and what it isn’t. Key characteristics and architectural principles are explained. Essential concepts and terminology are defined. The business value statement of Data Fabric architecture is discussed and the core building blocks that make up its design are established.
Chapter 2, Show Me the Business Value, is a chapter focused on providing a business point of view on the benefits of Data Fabric architecture. It establishes the business value the architecture offers by explaining how the building blocks that make up a Data Fabric design address pain points faced by enterprises today. Data Fabric architecture takes a strategic and multi-faceted approach to achieve data monetization. Real-life examples illustrate the impact of lacking the right level of focus in each of Data Fabric’s building blocks. Finally, a perspective is offered on how Data Fabric architecture can be leveraged by large, medium-sized, and small organizations.
Chapter 3, Choosing between Data Fabric and Data Mesh, provides an overview of the key principles of Data Mesh architecture. Both Data Fabric and Data Mesh are discussed, including where they share similar objectives and where they take different but complementary approaches. Both architectures represent sophisticated designs focused on data trust and enable the high-scale sharing of quality data. This chapter closes with a view on how Data Fabric and Data Mesh can be used together to achieve rapid data access, high-quality data, and automated Data Governance.
Chapter 4, Introducing DataOps, introduces the DataOps framework. It discusses the business value it provides and describes the 18 driving principles that make up DataOps. The role of data observability and its relationship to the Data Quality and Data Governance pillar is explained. This chapter concludes by explaining how to apply DataOps as an operational model for Data Fabric architecture.
Chapter 5, Building a Data Strategy, kicks off the creation and implementation of a data strategy document. It describes a data strategy document as a visionary statement and a plan for profitable revenue and cost savings. You will familiarize yourself with the different sections that should be defined in a data strategy document and be introduced to three data maturity frameworks to use as input to a data strategy. The chapter ends with tips on how Data Fabric architecture can be positioned as part of a data strategy document.
Chapter 6, Designing a Data Fabric Architecture, sets the foundation for the design of a Data Fabric architecture. It introduces key architecture concepts and architecture principles that compose the logical data architecture of a Data Fabric. The three architecture layers, Data Governance, Data Integration, and Self-Service, in a Data Fabric architecture are introduced. The objectives of each layer are highlighted, with a discussion on the necessary capabilities represented as components.
Chapter 7, Designing Data Governance, dives into the design of the Data Governance layer of a Data Fabric architecture. Key architecture patterns, such as metadata-driven and event-driven architectures, are discussed. The architecture components, such as active metadata, metadata knowledge graphs, and life cycle governance, are explained. The chapter ends with an explanation of how the Data Governance layer executes and governs data at each phase in its life cycle.
Chapter 8, Designing Data Integration and Self-Service, drills into the design of the two remaining architecture layers in a Data Fabric, Data Integration and Self-Service. The Data Integration layer is reviewed, which focuses on the development of data with a DataOps lens. The Self-Service layer is also discussed, including how it aims to democratize data. An understanding is provided of how both architecture layers work with each other, and how they rely on the Data Governance layer. At the end of the chapter, a Data Fabric reference architecture is presented.
Chapter 9, Realizing a Data Fabric Technical Architecture, positions a technical Data Fabric architecture as modular and composable, consisting of several tools and technologies. The required capabilities and the kinds of tools to implement each of the three layers in a Data Fabric architecture are discussed. Two use cases are reviewed – distributed data management via Data Mesh and regulatory compliance – as examples of how to apply a Data Fabric architecture. The chapter ends by presenting a Data Fabric with Data Mesh technical reference architecture.
Chapter 10, Industry Best Practices, presents 16 best practices in data management. Best practices are grouped into four categories: Data Strategy, Data Architecture, Data Integration and Self-Service, and Data Governance. Each best practice is described and is accompanied by a “why should you care” statement.
To understand the key concepts and themes in this book, you should have a general understanding of the IT industry, enterprise architectures, data management, and Data Governance.
There are a number of text conventions used throughout this book.
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Select System info from the Administration panel.”
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you’ve read Principles of Data Fabric, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there; you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below:
https://packt.link/free-ebook/9781804615225
Submit your proof of purchase
That’s it! We’ll send your free PDF and other benefits to your email directly.

Data Fabric architecture, alongside its distinguishing qualities and business value proposition, needs to first be defined to enable its adoption as part of any data management strategy.
The first part of this book introduces Data Fabric architecture by establishing its core building blocks and their business value proposition. I offer a different perspective on what defines Data Fabric architecture than the ones on the market today. Data Fabric is a flexible and composable architecture capable of adopting several data management styles and operational models. Foundational Data Governance pillars and their intended focus are explained, and a list of key characteristics of what does and doesn’t define Data Fabric is provided.
By the end of Part 1, you will have an understanding of what Data Fabric architecture is, its differentiating characteristics and architecture principles, and the impact of not having a Data Governance-centric architecture.
This part comprises the following chapters:
- Chapter 1, Introducing Data Fabric
- Chapter 2, Show Me the Business Value

Data Fabric is a distributed data architecture that connects scattered data across tools and systems with the objective of providing governed access to fit-for-purpose data at speed. Data Fabric focuses on Data Governance, Data Integration, and Self-Service data sharing. It leverages a sophisticated active metadata layer that captures knowledge derived from data and its operations, data relationships, and business context. Data Fabric continuously analyzes data management activities to recommend value-driven improvements. Data Fabric works with both centralized and decentralized data systems and supports diverse operational models. This book focuses on Data Fabric and describes its data management approach, differentiating design, and emphasis on automated Data Governance.
In this chapter, we’ll focus on understanding the definition of Data Fabric and why it’s important, as well as introducing its building blocks. By the end of this chapter, you’ll have an understanding of what a Data Fabric design is and why it’s essential.
In this chapter, we’ll cover the following topics:
- What is Data Fabric?
- Why is Data Fabric important?
- Data Fabric building blocks
- Operational Data Governance models

Note
The views expressed in the book belong to the author and do not necessarily represent the opinions or views of their employer, IBM.
Data Fabric is a distributed and composable architecture that is metadata and event driven. It’s use case agnostic and excels in managing and governing distributed data. It integrates dispersed data with automation, strong Data Governance, protection, and security. Data Fabric focuses on the Self-Service delivery of governed data.
Data Fabric does not require the migration of data into a centralized data storage layer, nor to a specific data format or database type. It can support a diverse set of data management styles and use cases across industries, such as a 360-degree view of a customer, regulatory compliance, cloud migration, data democratization, and data analytics.
In the next section, we’ll touch on the characteristics of Data Fabric.
Data Fabric is a composable architecture made up of different tools, technologies, and systems. It has an active metadata and event-driven design that automates Data Integration while achieving interoperability. Data Governance, Data Privacy, Data Protection, and Data Security are paramount to its design and to enable Self-Service data sharing. The following figure summarizes the different characteristics that constitute a Data Fabric design.
Figure 1.1 – Data Fabric characteristics
Data Fabric takes a proactive and intelligent approach to data management. It monitors and evaluates data operations to learn from them and suggest future improvements, leading to greater productivity and better decision-making. It approaches data management with flexibility, scalability, automation, and governance in mind and supports multiple data management styles. What distinguishes Data Fabric architecture from others is its inherent embedding of Data Governance into the data life cycle as part of its design, leveraging metadata as the foundation. Data Fabric focuses on business controls with an emphasis on robust and efficient data interoperability.
In the next section, we will clarify what is not representative of a Data Fabric design.
Let’s understand what Data Fabric is not:
- It is not a single technology, such as data virtualization. While data virtualization is a key Data Integration technology in Data Fabric, the architecture supports several more technologies, such as data replication, ETL/ELT, and streaming.
- It is not a single tool, like a data catalog, and it doesn’t have to be a single data storage system, like a data warehouse. It represents a diverse set of tools, technologies, and storage systems that work together in a connected ecosystem via a distributed data architecture, with active metadata as the glue.
- It doesn’t just support centralized data management but also federated and decentralized data management. It excels at connecting distributed data.
- Data Fabric is not the same as Data Mesh. They are different data architectures that tackle the complexities of distributed data management using different but complementary approaches. We will cover this topic in more depth in Chapter 3, Choosing between Data Fabric and Data Mesh.

The following diagram summarizes what Data Fabric architecture does not constitute:
Figure 1.2 – What Data Fabric is not
We have discussed in detail what defines Data Fabric and what does not. In the next section, we will discuss why Data Fabric is important.
Data Fabric enables businesses to leverage the power of connected, trusted, protected, and secure data no matter where it’s geographically located or stored (cloud, multi-cloud, hybrid cloud, on-premises, or the edge). Data Fabric handles the diversity of data, use cases, and technologies to create a holistic end-to-end picture of data with actionable insights. It addresses the shortcomings of previous data management solutions while considering lessons learned and building on industry best practices. Data Fabric’s approach is based on a common denominator, metadata. Metadata is the secret sauce of Data Fabric architecture, along with automation enabled by machine learning and artificial intelligence (AI), deep Data Governance, and knowledge management. All these aspects lead to the efficient and effective management of data to achieve business outcomes, therefore cutting down on operational costs and increasing profit margins through strategic decision-making.
Some of the key benefits of Data Fabric are as follows:
- It addresses data silos with actionable insights from a connected view of disparate data across environments (cloud, multi-cloud, hybrid cloud, on-premises, or the edge) and geographies
- Data democratization leads to a shorter time to business value with frictionless Self-Service data access
- It establishes trusted, secure, and reliable data via automated Data Governance and knowledge management
- It enables business users with intuitive discovery, understanding, and access to data while addressing technical users’ needs, supporting various data processing techniques, batch or real time, including ETL/ELT, data virtualization, change data capture, and streaming

Now that we have a view of why Data Fabric is important and how it takes a modern approach to data management, let’s review some of the drawbacks of earlier data management approaches.
Data is spread everywhere: on-premises, across cloud environments, and in different types of data stores, such as SQL and NoSQL databases, data lakes, data warehouses, and data lakehouses. Many of the challenges associated with this in the past decade, such as data silos, still exist today. The traditional data management approach to analytics is to move data into a centralized data storage system. Moving data into one central system facilitates control and decreases the necessary checkpoints across a large number of different environments and data systems. Thinking about this logically, it makes total sense. In everyday life, we are successful at controlling and containing things when they are in one central place.
As an example, consider the shipment of goods from a warehouse to a store that requires inspection during delivery. Inspecting the shipment of goods in one store will require a smaller number of people and resources as opposed to accomplishing this for 100 stores located across different locations. Seamless management and quality control become a lot harder to achieve across the board. The same applies to data management, and this is what led to the solution of centralized data management.
While centralized data management was the de facto approach for decades and is still used today, it has several shortcomings. Data movement and integration come at a high cost, especially when dealing with on-premises data storage solutions. It heavily relies on data duplication to satisfy a diverse set of use cases requiring different contexts. Complex and performance-intensive data pipelines built to enable data movement require intricate maintenance and significant infrastructure investments, especially if automation or governance is nowhere in the picture. In a traditional operating model, IT departments centrally manage technical platforms for business domains. In the past and still today, this model creates bottlenecks in the delivery of and access to data, delaying time to value.
Enterprise data warehouses are complex systems that require consensus across business domains on common definitions of data. An enterprise data model is tightly coupled to data assets. Any changes to the physical data model without proper dependency management break downstream consumption. There are also challenges in Data Quality, such as data duplication and the lack of business skills to manage data within the technical platform team.
Data lakes came after data warehouses to offer a flexible way of loading data quickly without the restrictions of upfront data modeling. Data lakes can load raw data as is and defer transformation and proper data modeling until later. Data lakes are typically managed in NoSQL databases or file-based distributed storage such as Hadoop. Data lakes support semi-structured and unstructured data in addition to structured data. Challenges with data lakes stem from the very fact that they bypass the need to model data upfront, thereby creating unusable data without any proper business context. Such data lakes have been referred to as data swamps, where the stored data has no business value.
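To make schema-on-read concrete, the following is a minimal Python sketch, with hypothetical paths and field names, of how raw events might land in a lake-style directory as is, with typing applied only when the data is read for a specific use case. It illustrates the general pattern, not any particular product’s API:

```python
import json
from pathlib import Path

import pandas as pd

# Hypothetical lake layout: raw events land as is, with no upfront modeling.
lake = Path("lake/raw/clickstream")
lake.mkdir(parents=True, exist_ok=True)

events = [
    {"user": "u1", "action": "view", "ts": "2023-04-01T10:00:00"},
    # Schema-on-write would force a decision about the extra "amount" field;
    # schema-on-read simply accepts it.
    {"user": "u2", "action": "buy", "ts": "2023-04-01T10:05:00", "amount": 19.99},
]
(lake / "events.json").write_text("\n".join(json.dumps(e) for e in events))

# The schema is applied only at read time, by the consuming use case.
df = pd.read_json(lake / "events.json", lines=True)
df["ts"] = pd.to_datetime(df["ts"])  # typing deferred until consumption
print(df.dtypes)
```

The same flexibility that makes loading effortless is what produces a data swamp when no business context or metadata is ever attached to what was landed.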
The data lakehouse is a newer technology that combines data warehouse and data lake designs. Data lakehouses support structured, semi-structured, and unstructured data and are capable of addressing both data science and business intelligence use cases.
While centralized data systems such as data warehouses, data lakes, and data lakehouses offer several great capabilities, the reality is that all of these systems now have a role to play, which creates the need for decentralized data management. A single centralized data management system is not equipped to handle all possible use cases in an organization while at the same time excelling at proper data management. I’m not saying there is no need for a centralized data system; rather, it can represent a stage in a progression. For example, a small company might start with one centralized system that fits its business needs and, as it grows, evolve into more decentralized data management.
Another example is a business domain within a large company that owns and manages a data lake or data lakehouse that needs to co-exist with several other data systems owned by other business domains. This again represents decentralized data management. Cloud technologies have further accelerated the proliferation of data. There is a multitude of cloud providers, each with its own set of capabilities and cost incentives, leading organizations to adopt multi-cloud and hybrid cloud environments.
We have evolved from a world of centralized data management as the best practice to a world in which decentralized data management is necessary. There is a seat at the table for all types of centralized systems. What’s important is for these systems to have a data architecture that connects data in an intelligent and cohesive manner. This means a data architecture with the right level of control and rigor while balancing quick access to trusted data, which is where Data Fabric architecture plays a major role.
In the next section, let’s briefly discuss considerations in building Data Fabric architecture.
Building Data Fabric architecture is not an easy undertaking. It’s not a matter of building a simple 1-2-3 application or applying specific technologies. It requires collaboration, business alignment, and strategic thinking about the design of the data architecture; the careful evaluation and selection of different tools, data storage systems, and technologies; and thought about when to buy versus build. Metadata is the common thread that ties data together in a Data Fabric design. Metadata must be embedded into every aspect of the data life cycle from start to finish. Data Fabric actively manages metadata, which enables scalability and automation and creates a design that can handle the growing demands of businesses. It offers a future-proof design that can grow to accommodate subsequent tools and technologies.
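As an illustration of metadata acting as that common thread, here is a minimal sketch, with hypothetical names, of a metadata record that follows a data asset through the life cycle phases used in this book (create, ingest, integrate, consume, archive, and destroy). It is a conceptual aid, not a reference implementation:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Phase(Enum):
    CREATE = "create"
    INGEST = "ingest"
    INTEGRATE = "integrate"
    CONSUME = "consume"
    ARCHIVE = "archive"
    DESTROY = "destroy"


@dataclass
class MetadataRecord:
    """Illustrative active-metadata record that travels with a data asset."""
    asset_id: str
    owner: str
    classification: str  # e.g., "public", "confidential", "PII"
    lineage: list = field(default_factory=list)

    def log(self, phase: Phase, detail: str) -> None:
        # Every life cycle event is captured, so governance can act on it later.
        self.lineage.append((datetime.now(timezone.utc), phase, detail))


record = MetadataRecord("sales.orders", owner="sales-domain", classification="confidential")
record.log(Phase.CREATE, "created in order-entry system")
record.log(Phase.INGEST, "landed in lake/raw/orders")
record.log(Phase.INTEGRATE, "joined with customer master; quality checks passed")
for ts, phase, detail in record.lineage:
    print(ts.isoformat(), phase.value, detail)
```

Because every phase appends to the same record, lineage, classification, and ownership stay attached to the asset, which is what allows governance to be automated rather than bolted on afterward.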
Now, with this in mind, let’s introduce a bird’s-eye view of a Data Fabric design by discussing its building blocks.
Data Fabric’s building blocks represent groupings of different components and characteristics. They are high-level blocks that describe a package of capabilities that address specific business needs. The building blocks are Data Governance and its knowledge layer, Data Integration, and Self-Service. Figure 1.3 illustrates the key architecture building blocks in a Data Fabric design.
Figure 1.3 – Data Fabric building blocks
Data Fabric’s building blocks have foundational principles that must be enforced in its design. Let’s introduce what they are in the following subsection.
Data Fabric’s foundational principles ensure that the data architecture is on the right path to deliver high-value, high-quality data management that keeps data secure and protected. The following list introduces the principles that need to be incorporated as part of a Data Fabric design. In Chapter 6, Designing a Data Fabric Architecture, we’ll discuss each principle in more depth:
- Data assets can evolve into Data Products (TOGAF and Data Mesh): Represents a transition in which assets have active product management across their life cycle, from creation to end of life, carry a specific value proposition, and are enabled for high-scale data sharing
- Data is shared (TOGAF): Empower high-quality data sharing
- Data is accessible (TOGAF): Ease of access to data
- Data product owner (TOGAF and Data Mesh): The Data Product owner manages the life cycle of a Data Product and is accountable for the quality, business value, and success of data
- Common vocabulary and data definitions (TOGAF): Business language and definitions associated with data
- Data security (TOGAF): Data needs to have the right level of Data Privacy, Data Protection, and Data Security
- Interoperable (TOGAF): Defined data standards that achieve data interoperability
- Always be architecting (Google): Continuously evolve a data architecture to keep up with business and technology changes
- Design for automation (Google): Automate repeatable tasks to accelerate time to value

These principles have been referenced directly from, or inspired new principles based on, several noteworthy sources: TOGAF (https://pubs.opengroup.org/togaf-standard/adm-techniques/chap02.html#tag_02_06_02), Google (https://cloud.google.com/blog/products/application-development/5-principles-for-cloud-native-architecture-what-it-is-and-how-to-master-it), and Data Mesh, created by Zhamak Dehghani. They capture the essence of what is necessary for a modern Data Fabric architecture. I have slightly modified a couple of the principles to better align with today’s data trends and needs.
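To suggest how some of these principles might be made operational, the following is a hedged Python sketch of a Data Product descriptor with a simple pre-publication check. All field names, values, and the validate function are illustrative assumptions, not part of TOGAF, Data Mesh, or any product:

```python
# Hypothetical descriptor for a Data Product, expressed as plain Python.
data_product = {
    "name": "customer-360",
    "owner": "crm-domain-team",          # Data product owner (TOGAF and Data Mesh)
    "vocabulary": {                       # Common vocabulary and data definitions
        "customer": "An individual or organization with at least one active account",
    },
    "access": {                           # Data is shared / data is accessible
        "endpoint": "https://example.internal/api/customer-360",
        "formats": ["parquet", "rest"],   # Interoperable: published data standards
    },
    "security": {                         # Data security principle
        "classification": "confidential",
        "pii_columns": ["email", "phone"],
        "masking": "hash",
    },
}


def validate(product: dict) -> list:
    """Check that a descriptor satisfies the principles before publication."""
    problems = []
    if not product.get("owner"):
        problems.append("every Data Product needs an accountable owner")
    if not product.get("vocabulary"):
        problems.append("business definitions are missing")
    if not product.get("security", {}).get("classification"):
        problems.append("security classification is required")
    return problems


print(validate(data_product) or "descriptor satisfies the checks")
```

A check like this, run automatically before a Data Product is published, is one way the design-for-automation principle can reinforce the governance principles rather than leaving them as documentation.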
Let’s briefly discuss the four Vs in big data management, which are important dimensions that need to be considered in the design of Data Fabric.
In data management, the four Vs – Volume, Variety, Velocity, and Veracity (https://www.forbes.com/sites/forbestechcouncil/2022/08/23/understanding-the-4-vs-of-big-data/?sh=2187093b5f0a) – represent dimensions of data that need to be addressed as part of Data Fabric architecture. Different levels of focus are needed across each building block. Let’s briefly introduce each dimension:
- Volume: The size of data impacts Data Integration and Self-Service approaches. It requires a special focus on performance and capacity. Social media and IoT data have led to the creation of enormous volumes of data in today’s data era; data volumes are now effectively unbounded. Classifying data to enable its prioritization is necessary, since not all data requires the same level of Data Governance rigor and focus. For example, operational customer data requires high rigor when compared to an individual’s social media status.
- Variety: Data has