Data Management: a gentle introduction – 2nd edition - Bas van Gils - E-Book

Description

The overall objective of this second edition is to reaffirm that data management is an exciting and valuable capability - one that deserves dedicated time and effort. Building on the foundation of the first edition, this updated version introduces new chapters, fresh insights, and additional interviews with practitioners to reflect the evolving landscape of the field.

More specifically, the book now aims to:

- Provide an enriched introduction to data management, combining core concepts with updated theory, practical frameworks such as TOGAF, ArchiMate, and DMBOK, and new real-world examples drawn from recent assignments.

- Offer guidance on building effective data management capabilities, illustrated through a broader set of use cases and enriched by new practitioner stories that highlight current challenges and solutions.

The book continues to serve busy professionals actively involved in managing data, as well as Bachelor’s and Master’s students interested in the field. It remains industry-agnostic, with relevance across sectors such as government, finance, telecommunications, and more. Intended roles include: members of data governance offices or councils, data owners, data stewards, enterprise and data architects, process managers, business analysts, and IT analysts.

The structure remains clear and accessible, divided into three main parts: theory, practice, and closing remarks. Chapters are concise and focused, with a clear separation between main text and examples. Readers familiar with a topic can easily skip ahead, while newcomers will find a smooth and engaging learning path.

Pages: 493

Year of publication: 2025

Data Management: a gentle introduction - 2nd edition

Other publications by Van Haren Publishing

Van Haren Publishing (VHP) specializes in titles on Best Practices, methods, frameworks and standards within four domains:

- IT Management

- Architecture (Enterprise and IT)

- Business Management and

- Project Management

Van Haren Publishing is publishing on behalf of leading organizations and companies: Agile Consortium, World Commerce and Contracting, IAOP, IPMA World, KNVI, PMI-NL, NLAIC and The Open Group.

Van Haren Publishing is part of the Van Haren Group and, in addition to book publishing, also provides the following services: accredited training materials and e-learning through Van Haren Learning Solutions, as well as independent professional certification via examination through Van Haren Certify.

Topics are (per domain):

IT Management

- IT Service Management: FitSM, ISM®, ISO/IEC20000, IT4IT®, ITIL®, VerISM®, SAF, TRIM, XLA®

- Data Management: Data literacy, Data visualization, DMBOK

- IT Asset Management: HAM, ITAM, SAM

- IT Security Management: BIO, ISO/IEC27001, NIS2

- Test Management: CTAP

- Application Management: ASL

- Other: eCF, IT-CMF, Scrum

Project Management

- Project Management: Half Double, ICB, ISO/IEC21500, P3.express, PM2, PMBOK Guide, Praxis, PRINCE2

- Agile: Agile, Agile PM

- Other: PMO

Business Management

- Operations Management: Lean, Lean Six Sigma, OBM, OMC, RASCI

- Contract Management: CATS CM, CATS RVM, WorldCC

- Business Information Management: BiSL, DID

- Artificial Intelligence: AI, Generative AI

- Outsourcing: OPBOK

Enterprise Architecture

- Enterprise Architecture: BIAN, TOGAF

- Modeling: ArchiMate, BPMN

- Software Architecture: ISAQB

- Other: Open Agile Architecture

For the latest information on VHP publications, visit our website: www.vanharen.net.

Colophon

Title: Data Management: a gentle introduction - 2nd edition

Subtitle: Balancing theory and practice

Author: Bas van Gils, managing partner @ Strategy Alliance

Illustrations: Andy Lo Tam Loi

Reviewers: Mirjam Visscher and Tanja Glisin

Text editors: Lisa Gaudette and Steve Newton (Galatea)

Publisher: Van Haren Publishing, ’s-Hertogenbosch

ISBN hard copy: 978 94 018 1312 9

ISBN eBook (pdf): 978 94 018 1313 6

ISBN ePUB: 978 94 018 1314 3

Editions: First edition, first impression, February 2020; Second edition, first impression, September 2025

Layout and DTP: Coco Bookmedia, Amersfoort – NL

Copyright: © 2020, 2025 Van Haren Publishing

Trademark notices

TOGAF® and ArchiMate® are registered trademarks of The Open Group. All rights reserved. DMBOK® is a registered trademark of DAMA International.

Disclaimer

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the copyright owners.

Specifically, without such written permission, the use or incorporation of this publication, in whole or in part, is not permitted for the purposes of training or developing large language models (LLM) or any other generative artificial intelligence systems. It is also not permitted for use in, or in connection with, such technologies, tools, or models to generate any data or content and/or to synthesize or combine it with any other data or content.

Foreword by Tony Shaw

I wonder if Bas van Gils had in mind the quote by Albert Einstein, that “everything should be made as simple as possible, but not simpler”, because in this book you are about to read, he has created a “gentle introduction” which truly serves the purpose of explaining data and data management. Personally, when I first got into the world of data 20+ years ago, and coming from a background in marketing and business development, I had to learn about data management through the gradual osmosis of interacting with data professionals. While this is useful in understanding the “what” of real-world practice, it doesn’t fill in the theoretical foundations of “how” and “why” which are necessary to understand why that real-world practice works the way it does. I know I would have come up to speed a whole lot faster, if I’d had access to this book.

One of the big themes in corporate data today is data literacy, and as organizations strive to become more data-driven, it is a theme that will only grow in relevance. Data is not a trend that’s going to flame out in a few years, so just like financial literacy and human capital management, it is now obvious that data literacy is going to be a critical knowledge requirement for all managers and executives in the future. As such, we should be thinking about data education in the same way we think about financial and HR education, building the foundations in schools and universities, then continuing to apply those foundations to practical experience through employee onboarding programs, and broader corporate training.

This book serves these objectives well. All the important enterprise-level data management topics are included. It serves as a valuable curriculum for someone just starting out in a professional data career, or indeed for someone who, like me, picked up bits and pieces without much structure to their learning. Bas’s explanations are clear, and build upon each other systematically. I personally appreciate the research that has gone into identifying the clearest definitions available, even when that means quoting other sources. Bas has effectively curated the “best of” from existing industry literature, and tied everything together into a consistent whole, through his own lucid insight, analysis and explanations.

I wish you, the reader, well whether this is the start of your data management journey, or like me, you are finding structure for your fragmented knowledge. You have found an excellent resource to help you fulfill your objectives.

Tony Shaw, CEO & Founder of Dataversity

October 2019

Foreword by Hans Weigand

“Language (die Sprache) is always a mediator”, the famous Von Humboldt wrote 200 years ago. “It is between the finite and the infinite”, he continues, “and at the same time between one individual and the other”. In traditional philosophical categories: as a subject-object relator and a subject-subject relator. That Von Humboldt spoke using the terms finite and infinite says something about his view of the human subject (its finiteness, in several respects). It is important to note that when Von Humboldt calls language a mediator, he explicitly wants to say that the two things that get mediated do not exist independently of each other, but that in a way they come into existence through the mediation. The mediator is more than a formal relationship. That is why for him language is not a coding system where an (arbitrary) sign is determined for something that already exists for us. Such a coding system does not make language, it presupposes language.

To some extent, the characterization of Von Humboldt for language can also be applied to data, the subject of this book. Yes, the formal data structures in a computer have been designed, so as such they are not language in the Von Humboldt sense. Still, they draw on language and so take over some of its characteristics. Data also mediates between subjects. This is one reason why data needs to be protected, as identified in chapters 17 and 21 of this book, and why “shared understanding” is a fundamental goal. It also mediates with an infinite world around us. To use a phrase of Bas, “data codifies what we know about the world”. Elsewhere, data is defined as the combination of fact and meaning. If this is true (and who am I to question Bas?), it means that managing data has two rather different faces, because managing facts, as stored in files on a disk, is quite different from managing such an intangible thing as “meaning”. I don’t want to push this point too much, but I think here is one reason why data management is not simple and not comparable to the management of physical assets such as vehicles or library books, in spite of some similarities.

When data is a mediator, it also runs the risk of suffering the fate of the mediator: always to fall in between. So that neither the IT department nor the business unit cares for it; that there is no budget for it. That it is seen as instrumental only and so is not a genuine concern in its own right. In the short history of IT so far we have learned that this would be a big mistake. Data needs to be recognized as an asset and needs to be managed. Not as a goal in its own right, of course – a point that is stressed by Bas several times in this book. It remains a mediator, but still, it needs to be managed properly. Therefore, I am glad about this book, which takes data management seriously. A book that tries to integrate insights on data management from theory and practice. A book that can not only serve practitioners and companies that struggle with data management but that can also be a good reference text for academic courses in the field of Information Management or Data Science. I wish it all the best!

Dr. Hans Weigand, Associate Professor Information Systems, Tilburg University

October 2019

Preface

When I started my studies at Tilburg University in 1998, one of the first things that I learned was an appreciation for the ‘golden triangle’ of processes, data, and systems. Only through careful alignment of these three can organizations function well. It was interesting to see that so many people – academics and professionals alike – worried mostly about either systems or processes, while data appeared to take the back seat.

After my studies, I started working on my dissertation at Nijmegen University. The focus of my research was Web information retrieval. The main idea behind my research was based on economic principles: if you have demand and supply of data, then all you have to do is “match” the two. How hard can that be? After all, the topic of information retrieval had been studied for decades. Let’s just say that I learned a lot in those days, not just about the information needs of people surfing the Internet, but also about semantics, data modeling, data structures, etc.

Since then, I have worked in many different roles, from IT professional to strategy consultant and pretty much every role in between. Over the years, I noticed that data was becoming an increasingly important topic. People started to recognize that mishandling data was costing the organization in missed opportunities, rework, reputational damage, etc. and that products and services could be greatly enhanced when enriched with data. Around this time, people started talking about data as “the new oil” and recognized it for the valuable asset that it really was. This was further strengthened by the apparent rise of topics such as artificial intelligence, data science, and big data.

I started studying data management in earnest around 2008. A few years later, Tanja Glisin suggested I study the DAMA DMBOK® [MBEH09] which really opened my eyes to the depth and breadth of the field. I found that the DMBOK was the reference within our field at the time, especially when complemented with other – more in-depth – publications. The second version of the DMBOK was published in 2017 and showed the significant improvement of our knowledge of the field [Hen17]. I have used both versions of the DMBOK over the years, both as a reference during consultancy assignments and teaching.

The DMBOK is a great reference, but many practitioners find it too theoretical to be of practical use. A more pragmatic book that combines theory with practical recommendations is missing. After much debate and discussions with friends, many of whom I have interviewed for this book, I decided to attempt to fill this gap.

The decision to actually move forward with the writing project was made in March of 2019, while visiting the Enterprise Data World conference in Boston, Massachusetts. I wrote the first version of the book during the summer months of 2019 and am forever grateful for all the support and help I received. A few years later, I wrote my second book on data management [Gil23]. That publication picked up where this book leaves off. It also takes the DMBOK as a basis but goes much deeper. One could say that the Gentle Introduction is more pragmatic whereas Data in Context is more theoretical in nature. In the fall of 2024, I decided it was time for an update of the Gentle Introduction. Life happened (several challenges in the family) and caused the update to take a bit longer than expected. Still, we got it done and this new edition will provide the reader with more up-to-date insights.

For the update, I adopted the following strategy. I went through each of the chapters individually and asked myself two questions: (1) In teaching/speaking about this topic, have I received any feedback that I should process? (2) Have I learned something new that requires me to update the material? Somewhat surprisingly, most of the material still seems very relevant and up-to-date. This is the result of the choice to stay away from specific technologies and focus on core concepts. All in all, I did feel the need to add several topics, include some new interviews, and make some (small) changes.

There are so many people to thank, and I sincerely hope I am not forgetting anyone. First of all, I would like to thank my colleagues at Strategy Alliance for their patience and help in preparing the manuscript. I would also like to thank Maurits van der Plas, Ivo van Haren, and Bart Verbrugge of Van Haren Publishing: I know that I have strong opinions on how/what I want with the book - and I have probably tried your patience over and over.

The book wouldn’t have been nearly as good without the help of Lisa Gaudette. She is my rock and “language hero”. Thank you so much for your patience, hard work, and grammar/punctuation lessons. Whenever I thought we had cleaned up a piece of text, you always found more ways to make it better. I would also like to thank Mirjam Visser for her extensive review of the first version of the manuscript as well as the pleasant discussions we had on data management. My colleagues at Antwerp Management School, Strategy Alliance, and DAMA Netherlands also deserve a big thank you: writing is an intensive process, and I know I have been busier than normal over the last few months. So, thank you for your patience and help! Last but not least, I would like to thank my family for their support. I know I have been hiding behind my computer to finish the manuscript and wouldn’t have been able to make so much progress without your flexibility and support.

Regarding the interviews and intermezzos in this book, I want to mention that some respondents have changed jobs since the first version of this book. On the one hand, I was tempted to change the roles to reflect their new positions. On the other hand, it seemed better to keep the original roles since those capture the context of the interviews best. I decided to go with the latter.

As a final remark, I would like to point out that a lot of time and effort went into checking the material. Any errors that remain are my own. I hope you find the book interesting and useful. Enjoy the read!

Bas van Gils

August 2025

Contents

1 INTRODUCTION

1.1 Goals for this book

1.2 Intended audience

1.3 Approach

2 DATA AS AN ASSET

2.1 Data

2.2 Asset

2.3 Data and process

2.4 Visual summary

3 DATA MANAGEMENT: WHY BOTHER?

3.1 A definition of data management

3.2 Value of DM

3.3 Key challenges for DM

3.4 Visual summary

4 POSITIONING DATA MANAGEMENT

4.1 The center of the universe

4.2 DM and business process management

4.3 DM and IT management

4.4 Information/data analysis

4.5 Database management

4.6 DM and enterprise architecture management

4.7 Philosophical considerations

4.8 Visual summary

PART I: THEORY

5 INTRODUCTION

6 TERMINOLOGY

6.1 Introduction

6.2 Data codifies what we know about the world

6.3 Storing data in systems

6.4 Data in processes

6.5 Connecting the business and IT perspective

6.6 Outlook

6.7 Visual summary

7 DATA MANAGEMENT: A DEFINITION

7.1 Introduction

7.2 Managing the lifecycle of data

7.3 Deconstructing DM

7.4 Visual summary

8 TYPES OF DATA

8.1 Classifying data

8.2 Five fundamentally different types of data

8.3 Transaction data

8.4 Master data

8.5 Business intelligence data

8.6 Reference data

8.7 Metadata

8.8 Visual summary

9 DATA GOVERNANCE

9.1 Introduction

9.2 Data governance and data management

9.3 Data governance activities in DMBOK®

9.4 A modern approach to data governance

9.5 Position of data governance

9.6 Visual summary

10 METADATA

10.1 Types of metadata

10.1.1 Business metadata

10.1.2 Technical metadata

10.1.3 Operational metadata

10.2 Metadata is the foundation

10.3 Metadata repositories

10.4 Visual summary

11 MODELING

11.1 Scope

11.2 Abstraction levels

11.3 Modeling languages

11.3.1 Fact-based modeling

11.3.2 Entity relationship modeling

11.3.3 Architecture modeling with ArchiMate

11.4 Relationship to other DM capabilities

11.5 Visual summary

12 ARCHITECTURE

12.1 Architecture

12.2 Data architecture

12.3 Relationship to other (data management) capabilities

12.4 Visual summary

13 INTEGRATION

13.1 Introduction to data integration

13.2 Common integration patterns

13.2.1 Batch integration

13.2.2 Accessing data through services

13.2.3 Change data capture

13.2.4 Streaming data integration

13.2.5 Data virtualization

13.3 Integration from an architecture perspective

13.3.1 Dealing with the number of potential connections

13.3.2 Dealing with different names and structures

13.3.3 Dealing with different patterns

13.4 Data mesh

13.5 Visual summary

14 REFERENCE DATA

14.1 Definition

14.2 Using reference data to harmonize the meaning of data

14.3 Historic versions of reference data sets

14.4 Reference data and governance

14.5 Visual summary

15 MASTER DATA

15.1 Multiple versions of the truth

15.2 Basic MDM concepts

15.3 Relationship to other data management capabilities

15.4 Visual summary

16 QUALITY

16.1 Introduction

16.2 The notion of quality

16.3 Data quality

16.4 Data quality management

16.5 Critical data elements

16.6 Relationship to other capabilities

16.7 Visual summary

17 DOCUMENT AND CONTENT MANAGEMENT

17.1 Characteristics of documents

17.2 Lifecycle and archives

17.2.1 Documents, originals and copies

17.2.3 Archives: authenticity and proof

17.2.4 Records continuum model

17.2.5 Implications

17.3 Other document collections

17.4 Visual summary

18 RISK AND SECURITY

18.1 Risks and risk mitigating measures

18.2 ISO standards

18.3 Data security management

18.4 Training and certification

18.5 Relationship to other capabilities

18.6 Visual summary

19 BUSINESS INTELLIGENCE & ANALYTICS

19.1 Defining business intelligence and analytics

19.2 Common system types

19.3 Structuring data

19.4 Self-service BI

19.5 Relationship to other capabilities

19.6 Visual summary

20 DATA SCIENCE & AI

20.1 Algorithms

20.2 Data science

20.3 Artificial intelligence

20.4 Offense and defense

20.5 Visual summary

21 TECHNOLOGY

21.1 People are key

21.2 Observations about technology

21.3 Technology and the functional areas of DMBOK®

21.3.1 Data governance and stewardship

21.3.2 Metadata

21.3.3 Modeling

21.3.4 Architecture

21.3.5 Integration

21.3.6 Reference and master data

21.3.7 Quality

21.3.8 Security

21.3.9 Business intelligence

21.3.10 Big data

21.4 Technology adoption

22 DATA (HANDLING) ETHICS & COMPLIANCE

22.1 Ethics in data

22.2 Ethical handling of data

22.2.1 Ethical principles behind data protection

22.2.2 The data lifecycle

22.2.3 Using ethical principles in the data lifecycle

22.3 The relationship between ethics and governance

22.4 Visual summary

PART II: PRACTICE

23 INTRODUCTION

24 BUILDING THE BUSINESS CASE FOR DATA MANAGEMENT

24.1 The need for a business case

24.2 Qualitative and quantitative business case

24.3 Incremental approach to building a business case

25 KICK-STARTING DATA QUALITY MANAGEMENT

25.1 Top-down approach

25.2 A motivation for starting small

25.3 Setting up your first experiments with data quality management

25.4 Scaling up after successful experimentation

26 FINDING DATA OWNERS AND DATA STEWARDS

26.1 Top-down and bottom-up

26.2 Ownership/stewardship models

26.3 Finding owners and stewards

27 THE ROLE OF TRAINING

27.1 People first, and the need for training

27.2 Types of training

27.3 How to design a training program

28 SETTING UP A DATA MANAGEMENT POLICY

28.1 Data management policy

28.2 Typical structure for a data management policy

28.3 Setting up a data management policy

28.3.1 Top-down

28.3.2 Bottom-up

28.4 Recommendations

29 BUSINESS CONCEPTS AND THE CONCEPTUAL DATA MODEL

29.1 Freezing language

29.2 Definitions and conceptual data models

29.3 Definitions in a context

29.4 Recommendations

30 SETTING UP A METADATA REPOSITORY

30.1 The importance of metadata

30.2 Metadata repository architectures

30.3 Implementation strategies

30.3.1 Top-down metadata strategy

30.3.2 Bottom-up metadata strategy

30.3.3 Matching the strategy to the situation

30.4 Recommendations

31 LEVERAGING ENTERPRISE ARCHITECTURE

31.1 EA as a source of information

31.2 EA models and visualizations

31.3 Building effective solutions

31.4 Recommendations

32 INTEGRATION ARCHITECTURE

32.1 Data is everywhere

32.2 Start simple

32.3 Keep it simple

32.4 Recommendations

33 A PRAGMATIC APPROACH TO DATA SECURITY

33.1 Motivation for a security framework

33.2 Security use cases

33.3 Security levels in business terms

33.4 The link to security measures and controls

33.5 Tying it together

34 ROLES IN DATA MANAGEMENT

34.1 Change and run

34.2 Roles in the DMBOK

34.3 Skills in the SFIA framework

34.4 Definition of roles

34.4.1 Architect

34.4.2 Business management

34.4.3 Data owner, data steward

34.4.4 Project management

34.4.5 Chief data officer

34.4.6 Business analyst, process analyst, and system analyst

34.5 Reflection and recommendation

35 BUILDING A DATA MANAGEMENT ROADMAP

35.1 To roadmap or not to roadmap

35.2 The steps towards an effective roadmap

35.3 Techniques

35.3.1 Vision phase

35.3.2 Analysis phase

35.3.3 Portfolio phase

35.3.4 Execution phase

35.4 Recommendations

PART III: CLOSING REMARKS

36 SYNTHESIS OF THE RECOMMENDATIONS

36.1 Data management

36.2 Antifragility and complexity

36.3 Expected benefits

37 CONCLUSION

37.1 Review

37.2 Outlook

37.3 Call to action

BIBLIOGRAPHY

INDEX

ABOUT THE AUTHOR

1 Introduction

It is often said that “data is the new oil”. It is hard to figure out with any certainty who wrote about this metaphor first. A cursory search on Google suggests it was used originally in an article by The Economist [Par17] with many authors following suit by describing why, for all practical reasons, data is not the new oil (e.g. [Mar18]). Whatever the practical implications, the metaphor at least illustrates that data is an important business asset that deserves to be managed as such. This is the field of data management (or DM for short). See also sidebar 1.

Sidebar 1. Interview with Marco van der Winden (Summer 2019)

My experience is that the importance of data is underestimated in the way that there was/is no primary focus on it. Living in the low countries where there is an abundance of water, data is mostly seen as something that can be easily obtained, just like water. To continue the comparison, the Dutch are very good at containing the water streams and keeping the seawater outside with dikes. But with data we are less experienced. We sometimes let data flow uncontrollably through our fields without knowing where it goes or even why we are doing it.

We are not in the Middle Ages (when we became increasingly proficient at water management) and it should be clear that data must be governed in a way that we are more in control and that we can profit more from it. By the way, I think that a comparison with oil is not a smart one. Sooner or later there will be a shortage of oil. On top of that, there are also some environmental disadvantages with oil. Data is more like water. It’s the source of all living things. You can’t live without it and there will always be water.

Marco van der Winden is manager of the corporate data management office at PGGM, a Dutch pension provider.

A key question that needs answering is: what does that entail? In other words: what is data management (DM) and how do you make it work? These are hard questions. Data is often seen as an abstract “thing” that sits in the realm of the IT department. This isn’t helped by the fact that a lot of technology is so closely related to data that it is easy to confuse one for the other. Worse, data management professionals are prone to using complicated terminology such as metadata, master data, lineage and so on, which makes it hard for outsiders to truly understand what is going on. This is not a good thing: DM is an important capability that organizations must master¹.

Years later, I interviewed Marco again. I asked him for his latest insights on the same question as a few years earlier: what are your views on data/introducing data management in the organization? His responses can be found in sidebar 2.

Sidebar 2. Interview with Marco van der Winden (2)

The importance of good data that is made available quickly (at the right time for the right stakeholders) has increased further in recent years. The comparison with water, which is essential for human life, is apt: the same thing applies to a healthy business operation, where data is the water that flows through the organization. Establishing a solid data infrastructure is no easy task as it involves a combination of introducing new data technologies and the agility (of employees and management) within an organization to adapt and change.

Investing in your data infrastructure requires a long-term vision—not only in terms of what you want to achieve with your data to meet your business goals but also on what needs to be done to structure your data infrastructure accordingly. For the first goal, the offensive side, you can usually gain broad support. After all, for part of the organization, it is quite exciting to imagine all the things that could be done with data in the future. It stimulates the entrepreneurial spirit within the company. However, the fact that this also requires significant investment in technology and employees is less appealing.

This investment in the defensive side of data management often demands more effort than the offensive side. It means spending large sums on new data applications and, more importantly, dedicating a lot of time to changing the “way of working” or, in other words, increasing data literacy within the organization. The realization that substantial investment in the defensive side is necessary to enable the offensive side — in other words, to achieve your business strategy — is essential. You cannot harvest fruit from a tree that you do not water.

Another factor at play is the rapid pace of technological development in this field. For many organizations, this means that it is not just a matter of adjusting their way of working and investing, but also of becoming more agile and absorbing new technologies more quickly. By the time you have “completed” your data program, you will likely have already been overtaken by technological advancements. As a result, the speed of implementation is becoming an increasingly important factor.

In the field of Artificial Intelligence, for example, we are still at the beginning of what is possible. This could potentially lead to a paradigm shift, where the challenge is no longer mastering the technology itself but rather leveraging technology to become more proficient in data management. This shift will place an even greater emphasis on the human element as the key to the success of any data management project.

Marco van der Winden is manager of the corporate data management office at PGGM, a Dutch pension provider.

It appears that the insights have remained largely the same. The emphasis is on striking a balance between data management offense (creating value with data) and data management defense (getting to grips with the complexities of managing data as an asset). The focus has shifted somewhat, though. Marco mentioned technological developments and they are certainly a key factor. New technologies and architectures (e.g. “data platforms”) were hardly mentioned years ago and are now a part of normal business conversations. The same is true for artificial intelligence (AI) and generative AI (GenAI) which also rely heavily on data and data management.

To illustrate the relationship between offense (value) and defense (grip, investment), I will borrow a slightly altered example from [Soa11] in example 1.

Example 1. Data management benefits

Assume you are working for a large global company with approximately 10 million customers. On average each customer purchases 1.2 products every year. Your strategy is to attempt to get more revenue from the existing customer base, rather than try to capture a bigger market share. To that end, a global customer 360 initiative is considered. The data management team and marketing have worked together to compile a business case.

First, it is expected that a better overview of each customer will increase the number of purchases from 1.2 to 1.4, which is expected to raise an extra 8 million dollars in revenues over three years. Furthermore, it is estimated that the direct cost of wading through duplicated/inconsistent data about customers by customer service representatives adds up to about half a million dollars over three years. The direct cost of the IT department around data integration issues is expected to be reduced by another half a million dollars over three years. This adds up to nine million dollars in benefits. Would that justify a significant investment in data management?
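The arithmetic behind this business case can be made explicit with a small sketch. The three benefit figures below come from the example; the investment amount is not given in the text, so `HYPOTHETICAL_INVESTMENT_MUSD` is purely an assumed value for illustration, as are all function and variable names.

```python
# Benefit estimates from Example 1, in millions of dollars over three years.
BENEFITS_MUSD = {
    "extra revenue (1.2 -> 1.4 purchases per customer)": 8.0,
    "customer service: less wading through duplicated data": 0.5,
    "IT: fewer data integration issues": 0.5,
}

# ASSUMPTION: the example does not state what the initiative would cost.
# This figure is hypothetical and only serves to illustrate the comparison.
HYPOTHETICAL_INVESTMENT_MUSD = 3.0


def total_benefits(benefits):
    """Sum all benefit estimates (millions of dollars)."""
    return sum(benefits.values())


def net_benefit(benefits, investment):
    """Benefits minus investment (millions of dollars)."""
    return total_benefits(benefits) - investment


def benefit_cost_ratio(benefits, investment):
    """Dollars of benefit per dollar invested."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return total_benefits(benefits) / investment


if __name__ == "__main__":
    total = total_benefits(BENEFITS_MUSD)
    net = net_benefit(BENEFITS_MUSD, HYPOTHETICAL_INVESTMENT_MUSD)
    ratio = benefit_cost_ratio(BENEFITS_MUSD, HYPOTHETICAL_INVESTMENT_MUSD)
    print(f"Total benefits over three years: {total:.1f} million dollars")
    print(f"Net benefit at the assumed investment: {net:.1f} million dollars")
    print(f"Benefit/cost ratio at the assumed investment: {ratio:.1f}")
```

Changing the assumed investment shows how quickly the answer to the closing question shifts: the nine million dollars in benefits only justifies the initiative if the cost of the defensive work stays well below that amount.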

1.1 GOALS FOR THIS BOOK

One of the best ways to make progress in our field is to put knowledge in the public domain such that everyone can benefit from it. There are many ways to do this: scientific studies provide academic rigor but tend to be low on practical relevance. Handbooks such as the DMBOK®2 are the inverse: they offer a lot of practical value but tend to be low on academic rigor [Hen17]. Balancing rigor and relevance is tricky to say the least. This book leans towards practical relevance and provides academic rigor whenever possible. Its unique selling point is that it (1) offers an up-to-date overview of the field, (2) provides practical guidance in the form of a capability-based framework, and (3) is supported by real-world evidence through mini case studies.

The overall objective is to show that data management (DM) is an exciting and valuable capability that is worth time and effort. More specifically, I hope to achieve the following goals. First, I hope to give a “gentle” introduction to the field of DM by explaining and illustrating its core concepts. In doing so, I will demystify terminology as much as possible. To this end, I will use a mix of theory, practical frameworks such as TOGAF, ArchiMate, and DMBOK, as well as results from real-world assignments [The11, The16a, Hen17]. I will shy away from the latest technological trends. They change so often that this text would be outdated by the time the proverbial ink is dry. Instead, I will focus on concepts and patterns that will remain relevant for a longer time. However: nothing lasts forever.

Second, I will offer guidance on how to build an effective DM capability for your organization. I will do so by considering various use cases, linked to the previously mentioned theoretical exploration as well as the stories of practitioners in the field.

1.2 INTENDED AUDIENCE

The book aims at a broad audience: busy professionals who “are actively involved with managing data”. This might be a bit too broad because it is hard to imagine a book that would successfully address the needs of strategic decision makers all the way down to analysts and database administrators. The book is also aimed at (Bachelor’s/Master’s) students with an interest in data management. A more specific characterization of the (professional) audience is:

■ In the strategic/tactical/operational continuum, I will go for the middle ground. This means: stay away from executives and top management. It also means: stay away from true day-to-day business operations.

■ In the business/technology continuum, again, I will aim for the middle ground. It is increasingly true that there is no real difference between business and IT but for the sake of the argument: I am aiming at business people with a sense of IT, IT people with a sense of business and those who straddle both worlds.

■ Industry-wise, the book should be agnostic and should be applicable in different industries such as government, finance, telecommunications etc.

Typical roles that come to mind are: data governance office/council, data owners, data stewards, people involved with data governance (data governance board), enterprise architects, data architects, process managers, business analysts and IT analysts. Since “data” is increasingly pervasive, I also kept a broader business audience in mind when writing this text. Business professionals — both managerial and in the trenches — are involved in managing, using, and creating data. This text should be “gentle” enough to also interest that audience.

1.3 APPROACH

In this book, I will combine elements from theory and from practice. The former comes in the shape of citations to books, articles and web resources. I will attempt to link to original sources whenever possible but also seek to give the book a look-and-feel that is not too academic. The same goes for the practical part: I will combine my own experience of 15+ years as a consultant and teacher with stories from other professionals. I will provide the names of organizations and people whenever possible. In some places, stories have been anonymized to ensure privacy, or to comply with non-disclosure agreements. The theory part of the book will give a broad overview of the field of data management. The practical part will cover specific topics and use cases in more depth. More detailed coverage of specific topics can be found by following the citations or reaching out to listed practitioners.

The book is mainly aimed at busy professionals, although I also take into account that students and perhaps even scholars will find the book useful. Because of this, I have made two decisions with respect to the book structure. First of all, I have chosen to split the book into three main parts: theory, practice, and closing remarks. Furthermore, I have chosen to keep the chapters as short and to the point as possible and to make a clear distinction between the main text and the examples. Because of this choice, the book will have many short chapters. If you are already familiar with the topic of a chapter, you can easily skip it and move on to the next.

 

________

2 The DMBOK is the Data Management Body of Knowledge. It is a reference book by DAMA, the Data Management Association. The DMBOK compiles data management principles and best practices.

2 Data as an asset

Synopsis - In this chapter, I will give an overview of why data is one of the key assets of an organization. To achieve this, I will first define the notions of data and asset. Then I will show what it means for data to be an asset. I will do this by stressing the relationship between processes (the “engine” of the organization), and data (the “fuel”) which are both needed to create value. I will illustrate the value of data through two short examples.

2.1 DATA

So far, I have been using the word “data” colloquially without really defining it. Experience shows that people use the word differently so I will explore this concept first. On any such venture, the first step is to check a dictionary. The lemma for data from the Merriam-Webster Dictionary has three definitions:

1. Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation.

2. Information in digital form that can be transmitted or processed.

3. Information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful.

These definitions are very similar to the way of thinking in the Design & Engineering Methodology for Organizations (DEMO) approach where a distinction is made between three levels of abstraction: forma - being all about documenting/expressing facts and data; informa - being all about thought and reasoning; and finally performa - being all about using facts and data in the real world, for example to decide on a course of action [RD99, Die06].

Citing earlier work from the mid-1980s by Appleton, Peter Aiken - one of the eminent writers about DM - positions the term data in relation to other concepts such as facts and information [App86, AG13]. Figure 2.1 summarizes this way of thinking. One of the things that can be learned from this diagram is that data is said to consist of facts which have a meaning. Another important aspect is that data can be used, which shows intelligence. Comparing this approach to the previously cited definition, the question arises whether it is possible, or even useful, to clearly and unambiguously distinguish between the concepts of data and information: the Merriam-Webster Dictionary definition for data heavily relies on the notion of information and vice versa.

Figure 2.1 Fact, data, information and intelligence

For purposes of this book, I will not make a hard distinction between the two concepts1. I will use the term data as an umbrella term, meaning all three definitions from the Merriam-Webster Dictionary. Even more, I intend to use it both as the “raw ingredient” (data codified in systems) and how it is used in business processes (sometimes called “information” by other authors). I will expand on this discussion further in chapter 6. Example 2 clarifies this way of thinking further. In my book Data in context: using models as enablers for managing and using data [Gil23] I go a few steps further in fleshing out the formal definitions of these elusive concepts.

Example 2. Data and its use

Suppose you are an avid runner, like me. Your coach has explained that your heart rate provides a good indicator of how your body is doing and that it should be used to guide your bi-weekly training sessions. After purchasing a heart rate monitor, you go out for your first run.

During your run, you can check your new gadget. It will measure how you are doing and individual data points are shown as you go along. Presumably, the gadget will also store this data, so that it can later be transferred to some online application for further processing. Together with your coach, you can use this data to analyze your fitness and training schedule for weeks to come.

2.2 ASSET

As stated in the opening paragraph of chapter 1: it is often said that “data is an asset”. For example, the DMBOK states [Hen17]:

Data and information are not just assets in the sense that organizations invest in them in order to derive future value. Data and information are also vital to the day-to-day operations of most organizations. They have been called the “currency”, the “life blood” and even the “new oil” of the information economy. Whether or not an organization gets value from its analytics, it cannot even transact business without data.

The question that needs to be answered is: what is an asset? Relying once more on the dictionary, an asset can be defined as “an item of value that is owned or possessed”. Let’s explore that further through the cases listed in example 3.

Example 3. Examples of assets

Assume the asset is a car. It has different types of value to me: it gets me from A to B, but it also has monetary value. Now assume that the asset is money. Its value is in the security that I have some buying power to take care of myself. Finally assume that the asset is customer data. Its value is that I know who my customers are, where they live and what they have purchased in the past so that I can help them well in the future.

The examples show that assets can be tangible or intangible. They also show that assets have value. The latter point deserves further exploration. In previous research, I have shown that value is both personal (one person may see it differently than another person) and situational (in one situation it may be worth more than another) [Gil06]. Again, two small examples illustrate the point:

Example 4. Value of assets

The first example pertains to art. Let’s take a famous painting such as White on White1 by Kazimir Malevich. Some will claim it priceless, whereas others will claim it to be something so simple that a five-year-old can create it. Both observers, of course, are correct. This shows the personal nature of the valuation of assets.

The second example pertains to the value of water when compared to money. In most cases, I would value $10 over a small bottle of water. When standing in the middle of the desert, though, I may think differently. This shows the situational nature of valuation of assets.

__

1 https://en.wikipedia.org/wiki/White_on_White, last checked 2 June 2019.

The implications for data as an asset are clear: when we say that we consider data to be an important asset, we mean that we believe that the data in our systems has much value, either intrinsic (we have data that is worth money, for example if we sell it) or indirect (we can use it in our processes to create value). This, finally, brings us to the relationship between data and business processes.

Before we dive into this relationship, there is one point that should be made. There is a big distinction between data assets and tangible assets: there is only one copy of a tangible asset but this doesn’t have to be the case for (intangible) data assets. To put it differently: you can make as many copies of data assets as you like without affecting the original. If this were the case for physical assets then we would all be as rich as Croesus for sure. This property of data is important in chapters to come when we talk about storing, using, transferring, and managing data. This point is also emphasized in ISO standard 55013 on asset management. That standard explicitly refers to asset data as (paraphrased) “data about assets”. Moreover, it emphasizes the link to processes by stating that understanding requirements around asset data is a key success factor when organizations wish to maximize the value of assets along their lifecycle.

2.3 DATA AND PROCESS

This brings me to the final part of this chapter: the relationship between data and process. It is safe to say that data does not magically spring into existence. On the contrary: creating data takes effort by business professionals, for example by adding data into computer systems or by manipulating existing data to create new data.

The fact that we are not so (consciously) aware of this is not surprising. Years ago – before the computer era – a lot of our data sat in paper files and records. Creating data meant getting in there and updating the files. More data meant more paper. More paper meant more space required to store the data. This, eventually, led to bigger and bigger libraries2. In the computer age this is different: most data is now stored digitally and adding more bits and bytes requires very little extra physical space.

Producing data in business processes is useful in itself. Things become more interesting when we consider where else that data can be put to good use to create value. Example 5 illustrates this point.

Example 5. Data and processes

Suppose you work at a company that leases expensive medical equipment to hospitals. Each time the company closes a new deal with a hospital, its records are updated (new data is added to their systems). The value of this data is that it proves that the transaction took place and that the company is owed a certain fee each time.

The data is likely to be used in other parts of the company as well. For example, sales and marketing representatives are interested in the data to investigate whether they can cross-sell insurance products with the newly leased equipment, whilst management will be interested in monthly sales reports to see how well the company is doing.

This example illustrates a point that I cannot make enough: there is a strong relationship between business processes and data (see e.g. [BRS19] for a recent discussion of this topic, bridging the gap between research and practice). Data without use in processes has no value. Processes without data cannot happen: if processes are the value creation engine of the organization, then data is its fuel. As a corollary of this discussion, this book will also have much to say about processes and not just about data.

Data can only be used if it is of the right quality and can be found. The former point is easily understood: just as poor materials will likely lead to the construction of a poor physical asset, so poor data leads to poor process performance. The latter point requires a bit more explanation. The general thinking seems to be: our data is stored in our systems and we know which systems we have, so how hard can it be to find our data? Example 6 shows that in practice this may not be as easy as it seems. Moreover, it may seem that the rise of artificial intelligence (AI) and generative AI (GenAI) has “solved” many of the problems around accessing data. Getting your hands on a data set with a nice visual appears to be just a good prompt away. This may be true but please keep in mind that (1) this consumes considerable computing resources, so you are having an impact on the environment, and (2) the AI may not be as smart as you think it is, so you’d better verify the results it gives you.

Example 6. Finding data

Let’s go back to the library case that was mentioned previously. Libraries are structured in such a way that, by and large, it should be straightforward to find the books and articles that you need. In the old days this was done through extensive cataloguing, classification, and index systems. These days all of this is automated1. It is true that in most organizations all data is stored electronically in systems. In theory it should be easy to find. However, do you have any idea how many systems your organization has for storing data about customers or products? Chances are there are dozens! Finding the right information for use is one of the key challenges for many organizations.

__

1 If you want to know more about information retrieval, consider reading e.g. [Pai99] - which also has a good historical overview.

The point that this example tries to make is that data is often dispersed across many systems, which makes it harder to locate the right data for the right person doing his/her job at the right time. This, in turn, shows that the value of data depends on more than it being a correct representation of the real world: being able to use it in processes in a timely manner might be just as important. If your data is “correct” but it can’t be found in time to be used in a process then, in fact, its value is very low, or even zero.
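To make the dispersion problem a little more tangible, the following minimal sketch inverts a list of systems into a simple lookup that answers the question “where do we keep data about customers?”. The system names, the data subjects and the function `build_catalog` are all hypothetical and chosen only for illustration; a real organization would have far more systems and a far richer catalog.

```python
from collections import defaultdict

# Hypothetical inventory: which systems hold which kinds of data.
system_inventory = {
    "crm": ["customer", "contact history"],
    "billing": ["customer", "invoice"],
    "webshop": ["customer", "product", "order"],
    "warehouse": ["product", "stock level"],
}


def build_catalog(inventory):
    """Invert the inventory: map each data subject to the systems storing it."""
    catalog = defaultdict(list)
    for system, subjects in inventory.items():
        for subject in subjects:
            catalog[subject].append(system)
    # Sort the system names so that lookups give a stable, readable answer.
    return {subject: sorted(systems) for subject, systems in catalog.items()}


catalog = build_catalog(system_inventory)
print(catalog["customer"])  # ['billing', 'crm', 'webshop']
print(catalog["product"])   # ['warehouse', 'webshop']
```

Even in this toy setup, customer data lives in three places, which hints at why finding the right data in time is harder than it seems.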

2.4 VISUAL SUMMARY

 

________

1 As a small aside, note that it is often a legal or even philosophical discussion whether something is a “fact”. That is, whether it is considered to be “factual” and therefore “true”. It is easy to get lost in this discussion. I will avoid using the word “fact” in this book.

2 An interesting overview of the history of libraries can be found in [Mur09]. Even more, I highly recommend visiting Museum Plantin Moretus in Antwerp: it gives an excellent view on how books were published in the 16th century. Insights on maintaining quality, checking content, and adding good illustrations are still highly relevant.

3 Data management: why bother?

Synopsis - This chapter picks up where the previous chapter left off: if data is an important asset, then it should be managed as such. In this chapter, I will briefly introduce the Data Management Body of Knowledge (DMBOK), the reference work on data management upon which part I of this book is based. I will use this as a backdrop to discuss some of the key challenges for data management. The challenges are illustrated with small examples.

3.1 A DEFINITION OF DATA MANAGEMENT

In the previous chapter, I have discussed the concept of data as an asset to signify the importance of data for an organization. We pick up the discussion with a claim: if data is such an important asset to the organization, then it should be managed as such. This is the realm of data management.

Simply put, DM is the capability that is concerned with managing data as an asset. This definition is still somewhat vague and requires further clarification. In [AB13], Peter Aiken points out that “any holistic examination of the information technology field will reveal that it is largely about technology – not about information”. We begin by stating that data management is largely about putting the “I” back in “IT”. This observation shows that DM is not solely an IT capability.

Sidebar 3. Interview with Marc van den Berg (summer 2019)

Many organizations are currently experiencing challenges with data due to past decisions and are paying the price because of the investment they have to make to fix their data after the fact. At the same time, these organizations want to make a quantum leap forward and reap the benefits from new technologies such as big data and artificial intelligence. This will not work, as first you must have your house in order. In my view this means: make sure you have shared goals about what you want to achieve with data, and subsequently align business and IT to attain those goals.

At the time, Marc van den Berg was managing director of IT and Innovation at PGGM, a Dutch pension provider.

It appears that in most organizations there is no longer a real, meaningful difference between “the business side of the organization” and “the IT side of the organization”, at least not in the classic sense of business/IT alignment literature from the 1980s and 1990s [PB89, HV93]. With the rise of process automation and digitalization, we see that the two perspectives are now intertwined to such a degree that the distinction is fading rapidly (see e.g. [RBM19, Gue12], in which a distinction is made between the digitalization of existing processes and a more radical departure that creates digital, information-enriched value propositions). In this context, it feels safe to say that DM is an important capability for the organization, regardless of whether it leans towards business, IT, or both.

The DMBOK definition of DM is as follows [Hen17]:

Data management is the development, execution, and supervision of plans, policies, programs, and practices that deliver, control, protect, and enhance the value of data and information assets throughout their lifecycle.

The interesting aspect that can be learned from this definition is that data management encompasses many activities that together enable the organization to use data effectively. For now, this exploration of the definition of DM will have to suffice. A more detailed discussion will follow in chapter 7.

The DMBOK also states that these activities are likely to be cross-functional and that “the primary driver for data management is to enable organizations to get value from their data assets, just as effective management of financial and physical assets enables organizations to get value from those assets”. The value of DM is discussed further in the next section.

3.2 VALUE OF DM

The key point of DM is to manage data as an asset which helps the organization to derive value from its data assets. As such, it has no direct business value. Its value is more indirect; it enables the organization to achieve goals through data. This means that organizations should think carefully about which goals they want to achieve through the use of data and what would be required to realize these goals.

In a recent article about data strategy, this was compared to sports such as soccer or ice hockey [DD17]. In these sports, you’ll never win the game if you only do defense: it will be hard for the opponent to score goals, but you’ll never get to score goals yourself either. The inverse is also true: you’ll never win the game if you only do offense: you’ll probably score a few goals, but it will be super easy for the opponent to score goals since there is no one to defend your own goal.

The trick to being successful is to balance between offense and defense and to make sure that the two stay connected. Example 7 illustrates this point.

Example 7. Balancing data management offense and defense

This example stems from the early 2000s when I did a consultancy assignment with a large Dutch governmental organization. Roughly speaking, the organization had several units which served citizens as well as businesses. The organization was structured along the lines of a classic front-office, mid-office, and back-office pattern. At the front-office level the units operated independently. At the mid-office and back-office level, this organization was attempting to standardize several processes and systems. This included the launch of a data delivery platform which served both analytics and reporting functions.

From a business perspective it was very clear what the value of data was and how it could be used to fuel their business processes (data management offense). From an IT perspective it was – after some searching – clear what data was available in which system and how it should be transported to the data delivery platform in a timely manner while retaining high levels of data quality (data management defense).

Unfortunately, communication between the two groups was less than optimal – to say the least. The effect was that it took years before their supply of data on this platform was well suited to meet the demands of business stakeholders, and a lot of the data that had been loaded on the platform early on was never actually used. This endeavor was not only costly, it also gave data/DM a bad reputation at this organization.

The same line of thinking also applies to DM. Here, defense pertains to “grip on data”, meaning the activities through which the organization knows what data assets it has, where and when they were created, what their quality is, etc. This is what traditionally was seen as DM. In this context, offense pertains to generating value through the use of data, meaning the activities related to using data in business processes. This can take various shapes and forms, such as selling the data itself, handling business transactions, using big data analyses to detect fraud patterns, or using traditional business intelligence reports to manage a business unit.

In a more recent publication, I explored this line of thinking a bit further and came to the conclusion that we need to think of a double means-end relationship [Gil23]. First, we can say that data is a means to achieve the ends in our (business) strategy. This links to the data management offense perspective. Second, we can say that data management is a means to achieve the end of having good enough data (to achieve the ends in our business strategy). This links to the data management defense perspective.

3.3 KEY CHALLENGES FOR DM

The final topic for this chapter deals with two questions: what are the key challenges that DM attempts to solve, and what are the key challenges to overcome when getting started with DM?

The first challenge you have to tackle is for the organization (or at least key stakeholders in the organization) to recognize that DM is really a “thing” they should worry about. As stated previously, many people seem to think along these lines: data is stored in our systems, we know which systems we have, so what’s the big deal? Thinking has to change to: processes are the value creation engine of the organization and we change systems all the time so we should really take good care of our data to help us to be successful. This transition is usually the biggest challenge. Sidebar 4 illustrates this point.

Sidebar 4. Interview with Marco van der Winden (summer 2019)

We are now realizing that data is the link between business(-operations) and systems. It is the universal language between business and IT. We have to understand that it will make our lives easier instead of more complex by focusing on data and not on systems or our own operation. My experience is that people only think that focusing on data is about more rules, more work, and being more accountable. I think (and hope) that we’ll understand we have to spend less time on acquiring data and changing our operations in favor of the more exciting things we can do with our data.

Marco van der Winden is manager of the corporate data management office at PGGM, a Dutch pension provider.

It is often the case that discussions about DM lead to the question of a business case, possibly with the exception of situations where regulators simply demand that an organization has a strong DM capability. Making a business case is the second challenge and it is a topic that we will address in greater detail in chapter 23. This challenge ties in with the previous one: if people confuse data with systems, then it is hard to argue that the organization should invest in managing its data. One aspect that I would like to mention is this: rather than boiling the ocean1 it often makes more sense to identify a small area that needs improvement, solve it, and use the “win” as a catalyst to set up the next improvement iteration.

The third challenge is related to building a DM capability that is “just right” for the needs of the organization. In many cases we see that this capability is over-engineered or too focused on implementing tools that will act like a silver bullet and make all the problems go away. The purpose of part II of this book is to show how specific topics in this category can be solved by building the DM capability one step at a time.

The last challenge is, once again, related to the people in the organization and pertains to the necessity to hold the attention (on the data management initiative) long enough to keep it going after the initial excitement fades. Implementing data management is not a one-shot initiative. As business circumstances continue to evolve, so should the data management structures that are implemented in the organization. Failing to adjust leads to strategic drift and a data management function that fails to deliver on its promises.