The overall objective of this book is to show that data management is an exciting and valuable capability that is worth time and effort. More specifically it aims to achieve the following goals: 1. To give a “gentle” introduction to the field of DM by explaining and illustrating its core concepts, based on a mix of theory, practical frameworks such as TOGAF, ArchiMate, and DMBOK, as well as results from real-world assignments. 2. To offer guidance on how to build an effective DM capability in an organization. This is illustrated by various use cases, linked to the previously mentioned theoretical exploration as well as the stories of practitioners in the field. The primary target groups are: busy professionals who “are actively involved with managing data”. The book is also aimed at (Bachelor’s/ Master’s) students with an interest in data management. The book is industry-agnostic and should be applicable in different industries such as government, finance, telecommunications etc. Typical roles for which this book is intended: data governance office/ council, data owners, data stewards, people involved with data governance (data governance board), enterprise architects, data architects, process managers, business analysts and IT analysts. The book is divided into three main parts: theory, practice, and closing remarks. Furthermore, the chapters are as short and to the point as possible and also make a clear distinction between the main text and the examples. If the reader is already familiar with the topic of a chapter, he/she can easily skip it and move on to the next.
Number of pages: 464
Year of publication: 2020
Data Management: a gentle introduction
Van Haren Publishing (VHP) specializes in titles on Best Practices, methods and standards within four domains:
- IT and IT Management
- Architecture (Enterprise and IT)
- Business Management and
- Project Management
Van Haren Publishing also publishes on behalf of leading organizations and companies: ASL BiSL Foundation, BRMI, CA, Centre Henri Tudor, CM Partners, Gaming Works, IACCM, IAOP, IFDC, Innovation Value Institute, IPMA-NL, ITSqc, NAF, KNVI, PMI-NL, PON, The Open Group, The SOX Institute.
Topics are (per domain):
IT and IT Management
ABC of ICT
ASL®
CMMI®
COBIT®
e-CF
ISO/IEC 20000
ISO/IEC 27001/27002
ISPL
IT4IT®
IT-CMF™
IT Service CMM
ITIL®
MOF
MSF
SABSA
SAF
SIAM™
TRIM
VeriSM™
Enterprise Architecture
ArchiMate®
GEA®
Novius Architectuur Methode
TOGAF®
Business Management
BABOK® Guide
BiSL® and BiSL® Next
BRMBOK™
BTF
CATS CM®
EFQM
eSCM
IACCM
ISA-95
ISO 9000/9001
OPBOK
SixSigma
SOX
SqEME®
Project Management
A4-Projectmanagement
DSDM/Atern
ICB / NCB
ISO 21500
MINCE®
M_o_R®
MSP®
P3O®
PMBOK® Guide
Praxis®
PRINCE2®
For the latest information on VHP publications, visit our website: www.vanharen.net.
Title:
Data Management: a gentle introduction
Subtitle:
Balancing theory and practice
Author:
Bas van Gils, managing partner @ Strategy Alliance
Illustrations:
Andy Lo Tam Loi
Reviewers:
Mirjam Visscher and Tanja Glisin
Text editor:
Lisa Gaudette and Steve Newton (Galatea)
Publisher:
Van Haren Publishing, ’s-Hertogenbosch
ISBN hard copy:
978 94 018 0550 6
ISBN eBook (pdf):
978 94 018 0552 0
ISBN ePUB:
978 94 018 0555 1
Edition:
First edition, first impression, February 2020
Layout and DTP:
Coco Bookmedia, Amersfoort – NL
Copyright:
© 2020 Van Haren Publishing
Trademark notices:
TOGAF® and ArchiMate® are registered trademarks of The Open Group. All rights reserved.
DMBOK® is a registered trademark of DAMA International.
All rights reserved. No part of this publication may be reproduced in any form by print, photo print, microfilm or any other means without written permission by the publisher.
Although this publication has been composed with much care, neither author, nor editor, nor publisher can accept any liability for damage caused by possible errors and/or incompleteness in this publication.
I wonder if Bas van Gils had in mind the quote by Albert Einstein, that “Everything should be made as simple as possible, but not simpler”, because in this book you are about to read, he has created a “gentle introduction” which truly serves the purpose of explaining data and data management. Personally, when I first got into the world of data 20+ years ago, and coming from a background in marketing and business development, I had to learn about data management through the gradual osmosis of interacting with data professionals. While this is useful in understanding the “what” of real-world practice, it doesn’t fill in the theoretical foundations of “how” and “why” which are necessary to understand why that real-world practice works the way it does. I know I would have come up to speed a whole lot faster, if I’d had access to this book.
One of the big themes in corporate data today is data literacy, and as organizations strive to become more data-driven, it is a theme that will only grow in relevance. Data is not a trend that’s going to flame out in a few years, so just like financial literacy and human capital management, it is now obvious that data literacy is going to be a critical knowledge requirement for all managers and executives in the future. As such, we should be thinking about data education in the same way we think about financial and HR education, building the foundations in schools and universities, then continuing to apply those foundations to practical experience through employee onboarding programs, and broader corporate training.
This book serves these objectives well. All the important enterprise-level data management topics are included. It serves as a valuable curriculum for someone just starting out in a professional data career, or indeed for someone who, like me, picked up bits and pieces without much structure to their learning. Bas’s explanations are clear, and build upon each other systematically. I personally appreciate the research that has gone into identifying the clearest definitions available, even when that means quoting other sources. Bas has effectively curated the “best of” from existing industry literature, and tied everything together into a consistent whole, through his own lucid insight, analysis and explanations.
I wish you, the reader, well whether this is the start of your data management journey, or like me, you are finding structure for your fragmented knowledge. You have found an excellent resource to help you fulfill your objectives.
Tony Shaw, CEO & Founder of Dataversity
October 2019
“Language (die Sprache) is always a mediator”, the famous Von Humboldt wrote 200 years ago. “It is between the finite and the infinite”, he continues, “and at the same time between one individual and the other”. In traditional philosophical categories: as a subject-object relator and a subject-subject relator. That Von Humboldt spoke using the terms finite and infinite says something about his view of the human subject (its finiteness, in several respects). It is important to note that when Von Humboldt calls language a mediator, he explicitly wants to say that the two things that get mediated do not exist independently of each other, but that in a way they come into existence through the mediation. The mediator is more than a formal relationship. That is why for him language is not a coding system where an (arbitrary) sign is determined for something that already exists for us. Such a coding system does not make language, it presupposes language.
To some extent, Von Humboldt’s characterization of language can also be applied to data, the subject of this book. Yes, the formal data structures in a computer have been designed, so as such they are not language in the Von Humboldt sense. Still, they draw on language, and so take over some of its characteristics. Data also mediates between subjects. This is one reason why data needs to be protected, as identified in chapters 17 and 21 of this book, and why “shared understanding” is a fundamental goal. It also mediates with an infinite world around us. To use a phrase of Bas, “data codifies what we know about the world”. Elsewhere, data is defined as the combination of fact and meaning. If this is true (and who am I to question Bas?), it means that managing data has two rather different faces, because managing facts, as stored in files on a disk, is quite different from managing such an intangible thing as “meaning”. I don’t want to push this point too much, but I think here is one reason why data management is not simple and not comparable to the management of physical assets such as vehicles or library books, in spite of some similarities.
When data is a mediator, it also runs the risk of the mediator’s fate: always to fall in between. So that neither the IT department nor the business unit cares for it; that there is no budget for it. That it is seen as instrumental only, and so is not a genuine concern in its own right. In the short history of IT so far we have learned that this would be a big mistake. Data needs to be recognized as an asset, and needs to be managed. Not as a goal in its own right, of course – a point that is stressed by Bas several times in this book. It remains a mediator, but still, it needs to be managed properly. Therefore I am glad to see this book, which takes data management seriously. A book that tries to integrate insights on data management from theory and practice. A book that can not only serve practitioners and companies that struggle with data management but that can also be a good reference text for academic courses in the field of Information Management or Data Science. I wish it all the best!
Dr. Hans Weigand, Associate Professor Information Systems, Tilburg University
October 2019
When I started my studies at Tilburg University in 1998, one of the first things that I learned was an appreciation for the ‘golden triangle’ of processes, data, and systems. Only through careful alignment of these three can organizations function well. It was interesting to see that so many people – academics and professionals alike – worried mostly about either systems or processes, while data appeared to take the back seat.
After my studies, I started working on my dissertation at Nijmegen University. The focus of my research was Web information retrieval. The main idea behind my research was based on economic principles: if you have demand and supply of data, then all you have to do is “match” the two. How hard can that be? After all, the topic of information retrieval had been studied for decades. Let’s just say that I learned a lot in those days, not just about the information needs of people surfing the Internet, but also about semantics, data modeling, data structures, etc.
Since then, I have worked in many different roles, from IT professional to strategy consultant and pretty much every role in between. Over the years, I noticed that data was becoming an increasingly important topic. People started to recognize that mishandling data was costing the organization in missed opportunities, rework, reputational damage, etc. and that products and services could be greatly enhanced when enriched with data. Around this time, people started talking about data as “the new oil” and recognized it for the valuable asset that it really was. This was further strengthened by the apparent rise of topics such as artificial intelligence, data science, and big data.
I started studying data management in earnest around 2008. A few years later, Tanja Glisin suggested I study the DAMA DMBOK [MBEH09] which really opened my eyes to the depth and breadth of the field. I found that the DMBOK was the reference within our field at the time, especially when complemented with other – more in-depth – publications. The second version of the DMBOK was published in 2017 and showed the significant improvement of our knowledge of the field [Hen17]. I have used both versions of the DMBOK over the years, both as a reference during consultancy assignments and teaching.
The DMBOK is a great reference, but many practitioners find it too theoretical to be of practical use. A more pragmatic book that combines theory with practical recommendations is missing. After much debate and discussion with friends, many of whom I have interviewed for this book, I decided to attempt to fill this gap.
The decision to actually move forward with the writing project was made in March of 2019, while visiting the Enterprise Data World conference in Boston, Massachusetts. I wrote the first version of the book during the summer months of 2019 and am forever grateful for all the support and help I received. There are so many people to thank and I sincerely hope I am not forgetting anyone. First of all, I would like to thank my colleagues at Strategy Alliance for their patience and help in preparing the manuscript. I would also like to thank Maurits van der Plas, Ivo van Haren, and Bart Verbrugge of Van Haren Publishing: I know that I have strong opinions on how and what I want with the book, and I have probably tried your patience over and over. Then, of course, there are the people who graciously granted me interviews to use in this book. You are all heroes:
■ Marco van der Winden is manager of the corporate data management office at PGGM, a Dutch pension provider.
■ Marc van den Berg is managing director of IT and Innovation at PGGM, a Dutch pension provider.
■ Frank Harmsen is managing director at PNA and professor at Maastricht University.
■ Lisa Gaudette is director in the Office of Sponsored Programs and Research of Clark University.
■ Jan Robat is head of data quality management at ABN AMRO.
■ Fanny Vuillemin is senior data manager at AXA.
■ Céline Lescop is lead data architect at AXA.
■ Piethein Strengholt is principal data architect at ABN AMRO.
■ Eric D. Schabell is global technology evangelist and portfolio architect director at Red Hat.
■ Tanja Glisin is an experienced data management professional and frequent collaborator of the author of this book.
■ Norbert van de Ven is data governance consultant at Hot ITem.
■ Stijn Hoppenbrouwers is professor of Data & Knowledge Engineering at HAN University of Applied Sciences, Arnhem and assistant professor at Radboud University Nijmegen.
■ Jeroen Cloo is partner at Novius Adviesgroep.
■ Kiean Bitaraf is data management consultant at Deloitte.
■ Raymond Slot is managing partner at Strategy Alliance.
■ Paul Heisen is senior enterprise architect at De Lage Landen (DLL).
■ Robin Vuyk is head of business architecture and design at PGGM, a Dutch pension provider.
■ Daan Riepma is a smart data consultant at Axians.
■ Ronald Damhof, “just a data-guy”, self-employed, often in the role of enterprise (data) architect in large (mostly public) organizations.
The book wouldn’t have been nearly as good without the help of Lisa Gaudette. Thank you so much for your patience, hard work, and grammar/ punctuation lessons. Whenever I thought we had cleaned up a piece of text, you always found more ways to make it better. I would also like to thank Mirjam Visscher for her extensive review of the manuscript as well as the pleasant discussions we had on data management. Last but not least, I would like to thank my family for their support. I know I have been hiding behind my computer to finish the manuscript and wouldn’t have been able to make so much progress without your flexibility and support.
As a last remark, I would like to point out that a lot of time and effort went into checking the material. Any errors that remain are my own. I hope you find the book interesting and useful. Enjoy the read!
Bas van Gils
October 2019
1 INTRODUCTION
1.1 Goals for this book
1.2 Intended audience
1.3 Approach
2 DATA AS AN ASSET
2.1 Data
2.2 Asset
2.3 Data and process
2.4 Visual summary
3 DATA MANAGEMENT: WHY BOTHER?
3.1 A definition of data management
3.2 Value of DM
3.3 Key challenges for DM
3.4 Visual summary
4 POSITIONING DATA MANAGEMENT
4.1 The center of the universe
4.2 DM and business process management
4.3 DM and IT management
4.4 Information/data analysis
4.5 Database management
4.6 DM and enterprise architecture management
4.7 Philosophical considerations
4.8 Visual summary
PART I: THEORY
5 INTRODUCTION
6 TERMINOLOGY
6.1 Introduction
6.2 Data codifies what we know about the world
6.3 Storing data in systems
6.4 Data in processes
6.5 Connecting the business and IT perspective
6.6 Outlook
6.7 Visual summary
7 DATA MANAGEMENT: A DEFINITION
7.1 Introduction
7.2 Managing the lifecycle of data
7.3 Deconstructing DM
7.4 Visual summary
8 TYPES OF DATA
8.1 Classifying data
8.2 Five fundamentally different types of data
8.3 Transaction data
8.4 Master data
8.5 Business intelligence data
8.6 Reference data
8.7 Metadata
8.8 Visual summary
9 DATA GOVERNANCE
9.1 Introduction
9.2 Data governance and data management
9.3 Data governance activities in DMBOK
9.4 A modern approach to data governance
9.5 Position of data governance
9.6 Visual summary
10 METADATA
10.1 Types of metadata
10.1.1 Business metadata
10.1.2 Technical metadata
10.1.3 Operational metadata
10.2 Metadata is the foundation
10.3 Metadata repositories
10.4 Visual summary
11 MODELING
11.1 Scope
11.2 Abstraction levels
11.3 Modeling languages
11.3.1 Fact-based modeling
11.3.2 Entity relationship modeling
11.3.3 Architecture modeling with ArchiMate
11.4 Relationship to other DM capabilities
11.5 Visual summary
12 ARCHITECTURE
12.1 Architecture
12.2 Data architecture
12.3 Relationship to other (data management) capabilities
12.4 Visual summary
13 INTEGRATION
13.1 Introduction to data integration
13.2 Common integration patterns
13.2.1 Batch integration
13.2.2 Accessing data through services
13.2.3 Change data capture
13.2.4 Streaming data integration
13.2.5 Data virtualization
13.3 Integration from an architecture perspective
13.3.1 Dealing with the number of potential connections
13.3.2 Dealing with different names and structures
13.3.3 Dealing with different patterns
13.4 Visual summary
14 REFERENCE DATA
14.1 Definition
14.2 Using reference data to harmonize the meaning of data
14.3 Historic versions of reference data sets
14.4 Reference data and governance
14.5 Visual summary
15 MASTER DATA
15.1 Multiple versions of the truth
15.2 Basic MDM concepts
15.3 Relationship to other data management capabilities
15.4 Visual summary
16 QUALITY
16.1 Introduction
16.2 The notion of quality
16.3 Data quality
16.4 Data quality management
16.5 Critical data elements
16.6 Relationship to other capabilities
16.7 Visual summary
17 RISK AND SECURITY
17.1 Risks and risk mitigating measures
17.2 ISO standards
17.3 Data security management
17.4 Training and certification
17.5 Relationship to other capabilities
17.6 Visual summary
18 BUSINESS INTELLIGENCE & ANALYTICS
18.1 Defining business intelligence and analytics
18.2 Common system types
18.3 Structuring data
18.4 Self-service BI
18.5 Relationship to other capabilities
18.6 Visual summary
19 BIG DATA
19.1 Definition of big data
19.2 Dealing with big data
19.3 Technical capabilities and architecture
19.4 Relationship to other capabilities
19.5 Visual summary
20 TECHNOLOGY
20.1 People are key
20.2 Observations about technology
20.3 Technology and the functional areas of DMBOK
20.3.1 Data governance and stewardship
20.3.2 Metadata
20.3.3 Modeling
20.3.4 Architecture
20.3.5 Integration
20.3.6 Reference and master data
20.3.7 Quality
20.3.8 Security
20.3.9 Business intelligence
20.3.10 Big data
20.4 Technology adoption
20.5 Visual summary
21 DATA (HANDLING) ETHICS & COMPLIANCE
21.1 Ethics in data
21.2 Ethical handling of data
21.2.1 Ethical principles behind data protection
21.2.2 The data lifecycle
21.2.3 Using ethical principles in the data lifecycle
21.3 The relationship between ethics and governance
21.4 Visual summary
PART II: PRACTICE
22 INTRODUCTION
23 BUILDING THE BUSINESS CASE FOR DATA MANAGEMENT
23.1 The need for a business case
23.2 Qualitative and quantitative business case
23.3 Incremental approach to building a business case
24 KICK-STARTING DATA QUALITY MANAGEMENT
24.1 Top-down approach
24.2 A motivation for starting small
24.3 Setting up your first experiments with data quality management
24.4 Scaling up after successful experimentation
25 FINDING DATA OWNERS AND DATA STEWARDS
25.1 Top-down and bottom-up
25.2 Ownership/stewardship models
25.3 Finding owners and stewards
26 THE ROLE OF TRAINING
26.1 People first, and the need for training
26.2 Types of training
26.3 How to design a training program
27 SETTING UP A DATA MANAGEMENT POLICY
27.1 Data management policy
27.2 Typical structure for a data management policy
27.3 Setting up a data management policy
27.3.1 Top-down
27.3.2 Bottom-up
27.4 Recommendations
28 BUSINESS CONCEPTS AND THE CONCEPTUAL DATA MODEL
28.1 Freezing language
28.2 Definitions and conceptual data models
28.3 Definitions in a context
28.4 Recommendations
29 SETTING UP A METADATA REPOSITORY
29.1 The importance of metadata
29.2 Metadata repository architectures
29.3 Implementation strategies
29.3.1 Top-down metadata strategy
29.3.2 Bottom-up metadata strategy
29.3.3 Matching the strategy to the situation
29.4 Recommendations
30 LEVERAGING ENTERPRISE ARCHITECTURE
30.1 EA as a source of information
30.2 EA models and visualizations
30.3 Building effective solutions
30.4 Recommendations
31 INTEGRATION ARCHITECTURE
31.1 Data is everywhere
31.2 Start simple
31.3 Keep it simple
31.4 Recommendations
32 A PRAGMATIC APPROACH TO DATA SECURITY
32.1 Motivation for a security framework
32.2 Security use cases
32.3 Security levels in business terms
32.4 The link to security measures and controls
32.5 Tying it together
33 ROLES IN DATA MANAGEMENT
33.1 Change and run
33.2 Roles in the DMBOK
33.3 Skills in the SFIA framework
33.4 Definition of roles
33.4.1 Architect
33.4.2 Business management
33.4.3 Data owner, data steward
33.4.4 Project management
33.4.5 Chief data officer
33.4.6 Business analyst, process analyst, and system analyst
33.5 Reflection and recommendation
34 WORKING WITH BIG DATA
34.1 Observations about big data adoption
34.2 Building a culture of innovation
34.3 Linking to data management defense
34.4 The future of big data
35 BUILDING A DATA MANAGEMENT ROADMAP
35.1 To roadmap or not to roadmap
35.2 The steps towards an effective roadmap
35.3 Techniques
35.3.1 Vision phase
35.3.2 Analysis phase
35.3.3 Portfolio phase
35.3.4 Execution phase
35.4 Recommendations
PART III: CLOSING REMARKS
36 SYNTHESIS OF THE RECOMMENDATIONS
36.1 Data management
36.2 Antifragility and complexity
36.3 Expected benefits
37 CONCLUSION
37.1 Review
37.2 Outlook
37.3 Call to action
BIBLIOGRAPHY
INDEX
ABOUT THE AUTHOR
Figure 2.1 Fact, data, information and intelligence
Figure 4.1 Positioning data management
Figure 4.2 From architecture to a more “detailed design”
Figure 4.3 The Cynefin framework, based on [SB07]
Figure 7.1 The DMBOK wheel
Figure 8.1 Five types of data
Figure 9.1 Data Governance & Data Management (Taken from [Hen17])
Figure 9.2 Data governance model
Figure 12.1 Nested scopes
Figure 13.1 Data virtualization
Figure 13.2 Introducing a “hub” to reduce the number of connections between systems
Figure 15.1 Four MDM patterns
Figure 18.1 Typical BI architecture, from source systems to end-users
Figure 18.2 Example BI architecture, including self-service
Figure 19.1 Big data adoption (taken from [Agr19] and based on research by Dresner Advisory)
Figure 19.2 Example big data architecture
Figure 20.1 Balancing DM offense and defense with people, process, (meta)data, and technology
Figure 23.1 System dynamics model as input for a business case
Figure 25.1 Stewardship models, inspired by [Pol13]
Figure 25.2 Publishing an overview of data owners and data stewards
Figure 27.1 Position of policies
Figure 28.1 Concepts in context
Figure 29.1 Metadata from different sources
Figure 32.1 (Cluster of) security use case(s)
Figure 32.2 Visualizing impact of security measures
Figure 33.1 Structure of the SFIA framework
Figure 34.1 Start-up, scale-up, benefits
Figure 35.1 TOGAF’s Architecture Development Method (taken from [The11])
Figure 35.2 Benefit realization diagram
Figure 35.3 Business blueprint
Figure 35.4 Capability analysis
Figure 35.5 Portfolio analysis
Figure 36.1 Balancing data management offense and defense, theory and practice
Figure 36.2 Dynamic framework for social change
Figure 36.3 Synthesis of recommendations in part II
It is often said that “data is the new oil”. It is hard to figure out with any certainty who wrote about this metaphor first. A cursory search on Google suggests it was used originally in an article by The Economist [Par17] with many authors following suit by describing why, for all practical reasons, data is not the new oil (e.g. [Mar18]). Whatever the practical implications, the metaphor at least illustrates that data is an important business asset that deserves to be managed as such. This is the field of data management (or DM for short). See also sidebar 1.
Sidebar 1. Interview with Marco van der Winden (Summer 2019)
My experience is that the importance of data is underestimated, in the sense that there was (and is) no primary focus on it. Living in the Low Countries, where there is an abundance of water, data is mostly seen as something that can easily be obtained, just like water. To continue the comparison, the Dutch are very good at containing the water streams and keeping the seawater out with dikes. But with data we are less experienced. We sometimes let data flow uncontrollably through our fields without knowing where it goes or even why we are doing it.
We are not in the Middle Ages (when we became increasingly proficient at water management) and it should be clear that data must be governed in a way that we are more in control and that we can profit more from it. By the way, I think that a comparison with oil is not a smart one. Sooner or later there will be a shortage of oil. On top of that, there are also some environmental disadvantages with oil. Data is more like water. It’s the source of all living things. You can’t live without it and there will always be water.
Marco van der Winden is manager of the corporate data management office at PGGM, a Dutch pension provider.
A key question that needs answering is: what does that entail? In other words: what is data management (DM) and how do you make it work? These are hard questions. Data is often seen as an abstract “thing” that sits in the realm of the IT department. This isn’t helped by the fact that a lot of technology is so closely related to data that it is easy to confuse one for the other. Worse, data management professionals are prone to using complicated terminology such as metadata, master data, lineage and so on, which makes it hard for outsiders to truly understand what is going on. This is not a good thing: DM is an important capability that organizations must master.¹
To illustrate this point, I will borrow a slightly altered example from [Soa11] in example 1.
Example 1. Data management benefits
Assume you are working for a large global company with approximately 10 million customers. On average each customer purchases 1.2 products every year. Your strategy is to attempt to get more revenue from the existing customer base, rather than try to capture a bigger market share. To that end, a global customer 360 initiative is considered. The data management team and marketing have worked together to compile a business case.
First, it is expected that a better overview of each customer will increase the number of purchases from 1.2 to 1.4, which is expected to raise an extra 8 million dollars in revenues over three years. Furthermore, it is estimated that the direct cost of wading through duplicated/ inconsistent data about customers by customer service representatives adds up to about half a million dollars over three years. The direct cost of the IT department around data integration issues is expected to be reduced by another half a million dollars over three years. This adds up to nine million dollars in benefits. Would that justify a significant investment in data management?
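The arithmetic behind this business case can be written out as a minimal sketch. The three benefit figures come straight from the example; the number of additional purchases and the implied revenue per additional purchase are derived here and are not stated in the example. The sketch also assumes that the customer service cost is avoided in full, and the investment amount in the final lines is purely hypothetical.

```python
# Figures stated in Example 1 (US dollars, three-year horizon).
CUSTOMERS = 10_000_000
PURCHASES_BEFORE = 1.2  # purchases per customer per year today
PURCHASES_AFTER = 1.4   # expected purchases per customer per year
YEARS = 3

extra_revenue = 8_000_000      # from the higher purchase rate
service_savings = 500_000      # assumed fully avoided: wading through duplicated data
integration_savings = 500_000  # reduced IT cost around data integration issues

total_benefits = extra_revenue + service_savings + integration_savings

# Derived (not stated in the example): extra purchases over the period,
# and the revenue each of them would have to bring in on average.
extra_purchases = round(CUSTOMERS * (PURCHASES_AFTER - PURCHASES_BEFORE) * YEARS)
implied_revenue_per_purchase = extra_revenue / extra_purchases


def benefit_to_cost(investment: float) -> float:
    """Return total three-year benefits divided by the investment."""
    return total_benefits / investment


# Hypothetical investment, chosen only to illustrate the ratio.
hypothetical_investment = 3_000_000
print(f"Total benefits: {total_benefits:,} dollars")
print(f"Extra purchases over {YEARS} years: {extra_purchases:,}")
print(f"Implied revenue per extra purchase: {implied_revenue_per_purchase:.2f}")
print(f"Benefit-to-cost ratio: {benefit_to_cost(hypothetical_investment):.1f}")
```

Read this way, almost all of the projected benefit rests on the revenue assumption, so that is the figure worth challenging first when judging the investment.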
One of the best ways to make progress in our field is to put knowledge in the public domain such that everyone can benefit from it. There are many ways to do this: scientific studies provide academic rigor but tend to be low on practical relevance. Handbooks such as the DMBOK² are the inverse: there is a lot of practical value but they tend to be low on academic rigor [Hen17]. Balancing rigor and relevance is tricky to say the least. This book leans towards the practical relevance side and provides academic rigor whenever possible. The unique selling point of this book lies in the fact that it offers (1) an up-to-date overview of the field, (2) practical guidance in the form of a capability-based framework, and (3) real-world evidence through mini case studies.
The overall objective is to show that data management (DM) is an exciting and valuable capability that is worth time and effort. More specifically, I hope to achieve the following goals. First, I hope to give a “gentle” introduction to the field of DM by explaining and illustrating its core concepts. In doing so, I will demystify terminology as much as possible. To this end, I will use a mix of theory, practical frameworks such as TOGAF, ArchiMate, and DMBOK, as well as results from real-world assignments [The11, The16a, Hen17].
Second, I will offer guidance on how to build an effective DM capability for your organization. I will do so by considering various use cases, linked to the previously mentioned theoretical exploration as well as the stories of practitioners in the field.
The book aims at a broad audience: busy professionals who “are actively involved with managing data”. This might be a bit too broad because it is hard to imagine a book that would successfully address the needs of strategic decision makers all the way down to analysts and database administrators. The book is also aimed at (Bachelor’s/ Master’s) students with an interest in data management. A more specific characterization of the (professional) audience is:
■ In the strategic/ tactical/ operational continuum, I will go for the middle ground. This means: stay away from executives and top management. It also means: stay away from true day-to-day business operations.
■ In the business/ technology continuum, again, I will aim for the middle ground. It is increasingly true that there is no real difference between business and IT but for the sake of the argument: I am aiming at business people with a sense of IT, IT people with a sense of business and those who straddle both worlds.
■ Industry-wise, the book should be agnostic and should be applicable in different industries such as government, finance, telecommunications etc.
Typical roles that come to mind are: data governance office/ council, data owners, data stewards, people involved with data governance (data governance board), enterprise architects, data architects, process managers, business analysts and IT analysts.
In this book, I will combine elements from theory and from practice. The former comes in the shape of citations to books, articles and web resources. I will attempt to link to original sources whenever possible, but also make an attempt to give the book a look-and-feel that is not too academic. The same goes for the practical part: I will combine my own experience of 15+ years as a consultant and teacher with stories from other professionals. I will provide the names of organizations and people whenever possible. In some places, stories have been anonymized to ensure privacy, or to comply with non-disclosure agreements. The theory part of the book will give a broad overview of the field of data management. The practical part will cover specific topics and use cases in more depth. More detailed coverage of specific topics can be found by following the citations or reaching out to listed practitioners.
The book is mainly aimed at busy professionals - while I also take into account that students and perhaps even scholars will find the book useful. Because of this, I have made two decisions with respect to the book structure. First of all, I have chosen to split the book into three main parts: theory, practice, and closing remarks. Furthermore, I have chosen to keep the chapters as short and to the point as possible and also make a clear distinction between the main text and the examples. Because of this choice, the book will have many short chapters. If you are already familiar with the topic of a chapter, you can easily skip it and move on to the next.
2 The DMBOK is the Data Management Body of Knowledge. It is a reference book by DAMA, the Data Management Association. The DMBOK compiles data management principles and best practices.
Synopsis - In this chapter, I will give an overview of why data is one of the key assets of an organization. To achieve this, I will first define the notions of data and asset. Then I will show what it means for data to be an asset. I will do this by stressing the relationship between processes (the “engine” of the organization), and data (the “fuel”) which are both needed to create value. I will illustrate the value of data through two short examples.
So far, I have been using the word “data” colloquially without really defining it. Experience shows that people use the word differently so I will explore this concept first. On any such venture, the first step is to check a dictionary. The lemma for data from the Merriam-Webster Dictionary has three definitions:
1. Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation.
2. Information in digital form that can be transmitted or processed.
3. Information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful.
These definitions are very similar to the way of thinking in the Design & Engineering Methodology for Organizations (DEMO) approach where a distinction is made between three levels of abstraction: forma - being all about documenting/ expressing facts and data; informa - being all about thought and reasoning; and finally performa - being all about using facts and data in the real-world, for example to decide on a course of action [RD99, Die06].
Citing earlier work from the mid 1980s by Appleton, Peter Aiken - one of the eminent writers about DM - positions the term data in relation to other concepts such as facts and information [App86, AG13]. Figure 2.1 summarizes this way of thinking. One of the things that can be learned from this diagram is that data is said to consist of facts which have a meaning. Another important aspect is that data can be used, which shows intelligence. Comparing this approach to the previously cited definition, the question arises whether it is possible, or even useful, to clearly and unambiguously distinguish between the concepts of data and information: the Merriam-Webster Dictionary definition for data heavily relies on the notion of information and vice versa.
Figure 2.1 Fact, data, information and intelligence
For purposes of this book, I will not make a hard distinction between the two concepts. I will use the term data as an umbrella term, meaning all three definitions from the Merriam-Webster Dictionary. Even more, I intend to use it both as the “raw ingredient” (data codified in systems) and how it is used in business processes (sometimes called “information” by other authors). I will expand on this discussion further in chapter 6. Example 2 clarifies this way of thinking further.
Example 2. Data management benefits
Suppose you are an avid runner, like me. Your coach has explained that your heart rate provides a good indicator of how your body is doing and that it should be used to guide your bi-weekly training sessions. After purchasing a heart rate monitor, you go out for your first run.
During your run, you can check your new gadget. It will measure how you are doing and individual data points are shown as you go along. Presumably, the gadget will also store this data, so that it can later be transferred to some online application for further processing. Together with your coach, you can use this data to analyze your fitness and training schedule for weeks to come.
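The step from individual data points to something you and your coach can act on can be sketched in a few lines of Python. This is only a minimal illustration: the readings, the `summarize` function and the training zone of 130 to 150 beats per minute are hypothetical and are not taken from any real device or training plan.

```python
from statistics import mean

# Hypothetical readings: (seconds since start, beats per minute).
readings = [(0, 92), (60, 128), (120, 141), (180, 150), (240, 147)]


def summarize(readings, zone_low=130, zone_high=150):
    """Turn raw data points into a small summary for a training decision.

    The zone boundaries are illustrative assumptions, not medical advice.
    """
    rates = [bpm for _, bpm in readings]
    in_zone = [bpm for bpm in rates if zone_low <= bpm <= zone_high]
    return {
        "average_bpm": mean(rates),
        "max_bpm": max(rates),
        "share_in_zone": len(in_zone) / len(rates),
    }


summary = summarize(readings)
print(summary)
```

The raw readings are the data as codified by the gadget; the summary is the same data put to use, which some authors would call information.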
As stated in the opening paragraph of chapter 1: it is often said that “data is an asset”. For example, the DMBOK states [Hen17]:
Data and information are not just assets in the sense that organizations invest in them in order to derive future value. Data and information are also vital to the day-to-day operations of most organizations. They have been called the “currency”, the “life blood” and even the “new oil” of the information economy. Whether or not an organization gets value from its analytics, it cannot even transact business without data.
The question that needs to be answered is: what is an asset? Relying once more on the dictionary, an asset can be defined as “an item of value that is owned or possessed”. Let’s explore that further through the cases listed in example 3.
Example 3. Examples of assets
Assume the asset is a car. It has different types of value to me: it gets me from A to B, but it also has monetary value. Now assume that the asset is money. Its value is in the security that I have some buying power to take care of myself. Finally assume that the asset is customer data. Its value is that I know who my customers are, where they live and what they have purchased in the past so that I can help them well in the future.
The examples show that assets can be tangible or intangible. They also show that assets have value. The latter point deserves further exploration. In previous research, I have shown that value is both personal (one person may see it differently than another person) and situational (in one situation it may be worth more than another) [Gil06]. Again, two small examples illustrate the point:
Example 4. Value of assets
The first example pertains to art. Let’s take a famous painting such as White on White¹ by Kazimir Malevich. Some will claim it is priceless, whereas others will claim it to be something so simple that a five-year-old can create it. Both observers, of course, are correct. This shows the personal nature of the valuation of assets.
The second example pertains to the value of water when compared to money. In most cases, I would value $10 over a small bottle of water. When standing in the middle of the desert, though, I may think differently. This shows the situational nature of valuation of assets.
1 https://en.wikipedia.org/wiki/White_on_White, last checked 2 June 2019.
The implications for data as an asset are clear: when we say that we consider data to be an important asset, we mean that we believe the data in our systems has much value, either intrinsically (we have data that is worth money, for example if we sell it) or indirectly (we can use it in our processes to create value). This, finally, brings us to the relationship between data and business processes.
Before we dive into this relationship, there is one point that should be made. There is a big distinction between data assets and tangible assets: there is only one copy of a tangible asset but this doesn’t have to be the case for (intangible) data assets. To put it differently: you can make as many copies of data assets as you like without affecting the original. If this were the case for physical assets then we would all be as rich as Croesus for sure. This property of data is important in chapters to come when we talk about storing, using, transferring, and managing data.
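The copy property is easy to demonstrate. The sketch below uses a hypothetical customer record and Python's standard `copy.deepcopy` function; the field names and values are made up for illustration.

```python
import copy

# Hypothetical customer record; all values are invented for this sketch.
customer_record = {
    "name": "A. Example",
    "city": "Utrecht",
    "purchases": ["heart rate monitor"],
}

# A deep copy is a fully independent duplicate of the data asset.
duplicate = copy.deepcopy(customer_record)

# Changing the copy leaves the original untouched.
duplicate["purchases"].append("running shoes")

print(customer_record["purchases"])
print(duplicate["purchases"])
```

The same property has a downside: once several copies exist and start to diverge, someone has to decide which one is the version to trust.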
This brings me to the final part of this chapter: the relationship between data and process. It is safe to say that data does not magically spring into existence. On the contrary: creating data takes effort by business professionals, for example by adding data into computer systems or by manipulating existing data to create new data.
The fact that we are not so (consciously) aware of this is not surprising. Years ago – before the computer era – a lot of our data sat in paper files and records. Creating data meant getting in there and updating the files. More data meant more paper. More paper meant more space required to store the data. This, eventually, led to bigger and bigger libraries¹. In the computer age this is different: most data is now stored digitally and adding more bits and bytes requires very little extra physical space.
Producing data in business processes is useful in itself. Things become more interesting when we consider where else that data can be put to good use to create value. Example 5 illustrates this point.
Example 5. Data and processes
Suppose you work at a company that leases expensive medical equipment to hospitals. Each time the company closes a new deal with a hospital, its records are updated (new data is added to their systems). The value of this data is that it proves that the transaction took place and that the company is owed a certain fee each time.
The data is likely to be used in other parts of the company as well. For example, sales and marketing representatives are interested in the data to investigate whether they can cross-sell insurance products with the newly leased equipment, whilst management will be interested in monthly sales reports to see how well the company is doing.
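The reuse described in this example can be sketched as follows. One set of lease records, created once by the sales process, feeds three different uses. All names, fields and amounts are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical lease records, created when a deal is closed.
deals = [
    {"hospital": "Hospital A", "equipment": "MRI scanner",
     "month": "2019-05", "monthly_fee": 12000, "insured": False},
    {"hospital": "Hospital B", "equipment": "CT scanner",
     "month": "2019-05", "monthly_fee": 9000, "insured": True},
    {"hospital": "Hospital A", "equipment": "Ultrasound",
     "month": "2019-06", "monthly_fee": 3000, "insured": False},
]


def invoices(deals):
    """Finance: what does each hospital owe per deal?"""
    return [(d["hospital"], d["monthly_fee"]) for d in deals]


def cross_sell_leads(deals):
    """Sales and marketing: which hospitals lease uninsured equipment?"""
    return sorted({d["hospital"] for d in deals if not d["insured"]})


def monthly_sales(deals):
    """Management: total fee value of the deals closed per month."""
    totals = defaultdict(int)
    for d in deals:
        totals[d["month"]] += d["monthly_fee"]
    return dict(totals)


print(invoices(deals))
print(cross_sell_leads(deals))
print(monthly_sales(deals))
```

None of the three functions creates new data. Each one puts the same records to use in a different process.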
This example illustrates a point that I cannot make enough: there is a strong relationship between business processes and data (see e.g. [BRS19] for a recent discussion of this topic, bridging the gap between research and practice). Data without use in processes has no value. Processes without data cannot happen: if processes are the value creation engine of the organization, then data is its fuel. As a corollary of this discussion, this book will also have much to say about processes and not just about data.
Data can only be used if it is of the right quality and can be found. The former point is easily understood: just as poor materials will likely lead to the construction of a poor physical asset, so poor data leads to poor process performance. The latter point requires a bit more explanation. The general thinking seems to be: our data is stored in our systems and we know which systems we have – so how hard can it be to find our data? Example 6 shows that in practice this may not be as easy as it seems.
Example 6. Finding data
Let’s go back to the library case that was mentioned previously. Libraries are structured in such a way that, by and large, it should be straightforward to find the books and articles that you need. In the old days this was done through extensive cataloguing, classification, and index systems. These days all of this is automated¹. It is true that in most organizations all data is stored electronically in systems. In theory it should be easy to find. However, do you have any idea how many systems your organization has for storing data about customers or products? Chances are there are dozens! Finding the right information for use is one of the key challenges for many organizations.
1 If you want to know more about information retrieval, consider reading e.g. [Pai99] - which also has a good historical overview.
The point that this example tries to make is that data is often dispersed across many systems, which makes it harder to locate the right data for the right person doing his/ her job at the right time. This, in turn, shows that the value of data depends on more than it being a correct representation of the real world: being able to use it in processes in a timely manner might be just as important. If your data is “correct” but it can’t be found in time to be used in a process then, in fact, its value is very low, or even zero.
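A very small inventory of systems already shows how quickly data about a single subject spreads out. The sketch below is a minimal illustration. The system names, the subjects and the `systems_holding` function are hypothetical and do not describe any particular organization or tool.

```python
# Hypothetical inventory: which system holds data about which subject.
catalog = {
    "crm": {"customer", "contact"},
    "billing": {"customer", "invoice"},
    "webshop": {"customer", "product", "order"},
    "warehouse": {"product", "stock"},
}


def systems_holding(subject, catalog):
    """Return, in alphabetical order, the systems that store data on a subject."""
    return sorted(name for name, subjects in catalog.items() if subject in subjects)


print(systems_holding("customer", catalog))
print(systems_holding("product", catalog))
print(systems_holding("supplier", catalog))
```

With only four systems, customer data already lives in three places. Without such an inventory, the lookup becomes a matter of asking around.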
1 An interesting overview of the history of libraries can be found in [Mur09].
Synopsis - This chapter picks up where the previous chapter left off: if data is an important asset, then it should be managed as such. In this chapter, I will briefly introduce the Data Management Body of Knowledge (DMBOK) reference work on data management upon which part I of this book is based. I will use this as a backdrop to discuss some of its key challenges for data management. The challenges are illustrated with small examples.
In the previous chapter, I have discussed the concept of data as an asset to signify the importance of data for an organization. We pick up the discussion with a claim: if data is such an important asset to the organization, then it should be managed as such. This is the realm of data management.
Simply put, DM is the capability that is concerned with managing data as an asset. This definition is still somewhat vague and requires further clarification. In [AB13], Peter Aiken points out that “any holistic examination of the information technology field will reveal that it is largely about technology – not about information”. We begin by stating that data management is largely about putting the “I” back in “IT”. This observation shows that DM is not solely an IT capability.
Sidebar 2. Interview with Marc van den Berg (summer 2019)
Many organizations are currently experiencing challenges with data due to past decisions and are paying the price because of the investment they have to make to fix their data after the fact. At the same time, these organizations want to make a quantum leap forward and reap the benefits from new technologies such as big data and artificial intelligence. This will not work, as first you must have your house in order. In my view this means: make sure you have shared goals about what you want to achieve with data, and subsequently align business and IT to attain those goals.
Marc van den Berg is managing director of IT and Innovation at PGGM, a Dutch pension provider.
It appears that in most organizations there is no longer a real, meaningful difference between “the business side of the organization” and “the IT side of the organization”, at least not in the classic sense of business/ IT alignment literature from the 1980s and 1990s [PB89, HV93]. With the rise of process automation and digitalization, the two perspectives are now intertwined to such a degree that the distinction is fading rapidly (see e.g. [RBM19, Gue12], in which a distinction is made between digitalizing existing processes and a more radical departure: creating digital, information-enriched value propositions). In this context, it feels safe to say that DM is an important capability for the organization, regardless of whether it leans towards business, IT, or both.
The DMBOK definition of DM is as follows [Hen17]:
Data management is the development, execution, and supervision of plans, policies, programs, and practices that deliver, control, protect, and enhance the value of data and information assets throughout their lifecycle.
The interesting aspect that can be learned from this definition is that data management encompasses many activities that together enable the organization to use data effectively. For now, this exploration of the definition of DM will have to suffice. A more detailed discussion will follow in chapter 7.
The DMBOK also states that these activities are likely to be cross-functional and that “the primary driver for data management is to enable organizations to get value from their data assets, just as effective management of financial and physical assets enables organizations to get value from those assets”. The value of DM is discussed further in the next section.
The key point of DM is to manage data as an asset which helps the organization to derive value from its data assets. As such, it has no direct business value. Its value is more indirect; it enables the organization to achieve goals through data. This means that organizations should think carefully about which goals they want to achieve through the use of data and what would be required to realize these goals.
In a recent article about data strategy, this was compared to sports such as soccer or ice hockey [DD17]. In these sports, you’ll never win the game if you only do defense: it will be hard for the opponent to score goals, but you’ll never get to score goals yourself either. The inverse is also true: you’ll never win the game if you only do offense: you’ll probably score a few goals, but it will be super easy for the opponent to score goals since there is no one to defend your own goal.
The trick to being successful is to balance between offense and defense and to make sure that the two stay connected. Example 7 illustrates this point.
Example 7. Balancing data management offense and defense
This example stems from the early 2000s when I did a consultancy assignment with a large Dutch governmental organization. Roughly speaking, the organization had several units which served citizens as well as businesses. The organization was structured along the lines of a classic front-office, mid-office, and back-office pattern. At the front-office level the units operated independently. At the mid-office and back-office level, this organization was attempting to standardize several processes and systems. This included the launch of a data delivery platform which served both analytics and reporting functions.
From a business perspective it was very clear what the value of data was and how it could be used to fuel their business processes (data management offense). From an IT perspective it was – after some searching – clear what data was available in which system and how it should be transported to the data delivery platform in a timely manner while retaining high levels of data quality (data management defense).
Unfortunately, communication between the two groups was less than optimal – to say the least. The effect was that it took years before their supply of data on this platform was well suited to meet the demands of business stakeholders, and a lot of the data that had been loaded on the platform early on was never actually used. This endeavor was not only costly, it also gave data/ DM a bad reputation at this organization.
The same line of thinking also applies to DM. Here, defense pertains to “grip on data”, meaning the activities through which the organization knows what data assets it has, where and when they were created, what their quality is, etc. This is what traditionally was seen as DM. In this context, offense pertains to generating value through the use of data, meaning the activities related to using data in business processes. This can take various shapes and forms, such as selling the data itself, handling business transactions, using big data analyses to detect fraud patterns, or using traditional business intelligence reports to manage a business unit.
The final topic for this chapter deals with two questions: what are the key challenges that DM attempts to solve, and what are the key challenges to overcome when getting started with DM?
The first challenge you have to tackle is for the organization (or at least key stakeholders in the organization) to recognize that DM is really a “thing” they should worry about. As stated previously, many people seem to think along these lines: data is stored in our systems, we know which systems we have, so what’s the big deal? The thinking has to change to: processes are the value creation engine of the organization and we change systems all the time so we should really take good care of our data to help us to be successful. This transition is usually the biggest challenge. Sidebar 3 illustrates this point.
Sidebar 3. Interview with Marco van der Winden (Summer 2019)
We are now realizing that data is the link between business(-operations) and systems. It is the universal language between business and IT. We have to understand that it will make our lives easier instead of more complex by focusing on data and not on systems or our own operation. My experience is that people only think that focusing on data is about more rules, more work, and being more accountable. I think (and hope) that we’ll understand we have to spend less time on acquiring data and changing our operations in favor of the more exciting things we can do with our data.
Marco van der Winden is manager of the corporate data management office at PGGM, a Dutch pension provider.
It is often the case that discussions about DM lead to the question of a business case, possibly with the exception of situations where regulators simply demand that an organization has a strong DM capability. Making a business case is the second challenge and it is a topic that we will address in greater detail in chapter 23. This challenge ties in with the previous one: if people confuse data with systems, then it is hard to argue that the organization should invest in managing its data. One aspect that I would like to mention is this: rather than boiling the ocean¹, it often makes more sense to identify a small area that needs improvement, solve it, and use the “win” as a catalyst to set up the next improvement iteration.
The third challenge is related to building a DM capability that is “just right” for the needs of the organization. In many cases we see that this capability is over-engineered or too focused on implementing tools that will act like a silver bullet and make all the problems go away. The purpose of part II of this book is to show how specific topics in this category can be solved by building the DM capability one step at a time.
1 The phrase “boil the ocean” is a colloquialism that refers to taking on an overly large and potentially impossible task given the reality of your resources.
Synopsis - This book is about data management, so I will position data management as the center of the universe, at least as far as this book is concerned. In this chapter, I will clarify the role of data management, by relating it to other (management) capabilities such as business process management (BPM), enterprise architecture, and IT management. I will also briefly discuss the philosophical considerations related to the challenge of building an effective data management capability. I will base this discussion on the Cynefin framework and the concept of antifragility [SB07, Tal12].
In the introduction of chapter 2, I presented an analogy between processes as the value creation engine of the organization, and data as the fuel for this engine. The point of this analogy is that it is hard to meaningfully separate the two, as “data” and “process” are so intertwined. Yet, this book is about data management (DM) so this topic will be front and center in most of this book. More specifically, in part I, I will discuss DM and its functional areas from a theoretical perspective. I will attempt to do so in an objective manner¹. In part II, I will offer good practices for building an effective data management capability.
While DM is important, it can hardly be discussed without considering its context. For understanding DM from a theoretical perspective, as well as for designing a program to build or improve a DM capability, it is recommended to take a systems perspective of the organization, meaning that (1) we should consider the
