Data as a Service - Pushpak Sarkar - E-Book

Data as a Service E-Book

Pushpak Sarkar

Description

Data as a Service shows how organizations can leverage "data as a service" by providing real-life case studies on the various and innovative architectures and related patterns.

* Comprehensive approach to introducing data as a service in any organization
* A reusable and flexible SOA-based architecture framework
* Roadmap to introduce 'big data as a service' for potential clients
* Presents a thorough description of each component in the DaaS reference architecture so readers can implement solutions

Page count: 558

Year of publication: 2015




IEEE Press Editorial Board

Tariq Samad, Editor in Chief

George W. Arnold

Jeffrey Nanzer

Dmitry Goldgof

Ray Perez

Ekram Hossain

Linda Shafer

Mary Lanzerotti

Zidong Wang

Vladimir Lumelsky

MengChu Zhou

Pui-In Mak

George Zobrist

Technical Reviewer

Frank Ferrante, College of William and Mary

About IEEE Computer Society

IEEE Computer Society is the world's leading computing membership organization and the trusted information and career-development source for a global workforce of technology leaders including: professors, researchers, software engineers, IT professionals, employers, and students. The unmatched source for technology information, inspiration, and collaboration, the IEEE Computer Society is the source that computing professionals trust to provide high-quality, state-of-the-art information on an on-demand basis. The Computer Society provides a wide range of forums for top minds to come together, including technical conferences, publications, and a comprehensive digital library, unique training webinars, professional training, and the TechLeader Training Partner Program to help organizations increase their staff's technical knowledge and expertise, as well as the personalized information tool myComputer. To find out more about the community for technology leaders, visit http://www.computer.org.

IEEE/Wiley Partnership

The IEEE Computer Society and Wiley partnership allows the CS Press authored book program to produce a number of exciting new titles in areas of computer science, computing, and networking with a special focus on software engineering. IEEE Computer Society members continue to receive a 15% discount on these titles when purchased through Wiley or at wiley.com/ieeecs.

To submit questions about the program or send proposals, please contact Mary Hatcher, Editor, Wiley-IEEE Press: Email: [email protected], Telephone: 201-748-6903, John Wiley & Sons, Inc., 111 River Street, MS 8-01, Hoboken, NJ 07030-5774.

Data as a Service

A Framework for Providing Reusable Enterprise Data Services

Pushpak Sarkar

Copyright © 2015 by the IEEE Computer Society. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data is available.

ISBN: 978-1-119-04658-5

 

 

 

Dedicated to my parents and family for making me believe that

Contents

Guest Introduction

Guest Introduction

Preface (Includes the Reader's Guide)

The Reader's Guide

PART 1: Overview of Fundamental Concepts Includes Chapters 1 to 3

PART 2: DaaS Architecture Framework and Components Includes Chapters 4 to 8

PART 3: DaaS Solution Blueprints Includes Chapters 9 to 11

PART 4: Ensuring Organizational Success Includes Chapters 12 to 14

What Is Not Covered in this Book

Acknowledgments

Part One: Overview of Fundamental Concepts

Chapter 1: Introduction to DaaS

Topics covered in this chapter

Data-Driven Enterprise

Defining a Service

Drivers for Providing Data as a Service

Data as a Service Framework: A Paradigm Shift

Chapter 2: DaaS Strategy and Reference Architecture

Topics Covered in this Chapter

Enterprise Data Strategy, Goals, and Principles

Critical Success Factors

Reference Architecture of the DaaS Framework

How to leverage the DaaS Reference Architecture

Summary

Chapter 3: Data Asset Management

Topics Covered in this Chapter

Introduction to Major Categories of Enterprise Data

Transaction Data (Includes Big Data)

Significance of EIM in Supporting the DaaS Program

Role of Enterprise Data Architect

Summary

Part Two: DaaS Architecture Framework and Components

Chapter 4: Enterprise Data Services

Topics Covered in this Chapter

Emergence of Enterprise Data Services

Need for an Enterprise Perspective

Emergence of Enterprise Data Services

Publication of Enterprise Data

Interdependencies between DaaS, EIM, and SOA

Case Study: Amazon's Adoption of Public Data Service Interfaces

Summary

Chapter 5: Enterprise and Canonical Modeling

Topics Covered in this Chapter

A Model-Driven Approach Toward Developing Reusable Data Services

Defining a Standards-Driven Approach toward Developing New Data Services

Role of the Enterprise Data Model

Developing the Canonical Model

Enterprise Data Model

Canonical Model

Implementing the Canonical Model

Publishing Data Services with the Canonical Model as a Foundation

Implementing the Canonical Model in Real-life Projects

Data Services Roll Out and Future Releases

Case Study: DaaS in Real Life, Electronic-Data Interchange in U.S. Healthcare Exchanges

Summary

Chapter 6: Business Glossary for DaaS

Topics Covered in this Chapter

Problem of Meaning and the Case for a Shared Business Glossary

Using Metadata in Various Disciplines

Role of an Organization's Business Glossary

Enterprise Metadata Repository

Implementing the Enterprise Metadata Repository

Metadata Standards for Enterprise Data Services

Metadata Governance

Summary

Chapter 7: SOA and Data Integration

Topics Covered in this Chapter

SOA as an Enabler of Data Integration

Role of Enterprise Service Bus

What is a Data Service?

Foundational Components of a Data Service

Service Interface

Major Service Categories

Overview of Data Virtualization

Consolidated Data Infrastructure Platform

Summary

Chapter 8: Data Quality and Standards

Topics Covered in this Chapter

Where to Begin Data Standardization Efforts in Your Organization

Role of Data Discovery/Profiling to Identify DaaS Quality Issues

Data Quality and the Investment Paradox

Quality of a Data Service

Setting Up Standards in a DaaS environment

Summary

Part Three: DaaS Solution Blueprints

Chapter 9: Reference Data Services

Topics Covered in this Chapter

Delivering Market and Reference Data Using Real-Time Data Services

Comparing Usage of Reference Data Against Master Data

Understanding Challenges of Reference Data Management

Other Reference Data Management Challenges

Role of Reference Data Standards and Vocabulary Management

Collaborative Reference Data Management Implementation Using Business Process Management/Workflow

Summary

Chapter 10: Master Data Services

Topics Covered in this Chapter

Introduction to Master Data Services

Pros and Cons of Master Data Services (Virtual Master Data Management)

Leveraging the Golden Source to Resolve Deep-Rooted Source Differences

Future Trends in Master Data Management Using DaaS

Comparing Master Data Services Approach (Virtual) with Master Data Management Approach Involving Physical Consolidation

Case Study: Master Data Services for a Premier Investment Bank

Detailed Scope and Benefits

Proposed Solution Architecture for Master Data Services

Enterprise and Canonical Model for Master Data Management Implementation

Summary

Chapter 11: Big Data and Analytical Services

Topics Covered in this Chapter

Big Data

Big Data Analytics

Relationship Between DaaS and Big Data Analytics

Future Impact of DaaS on Big Data Analytics

Extending DaaS Reference Architecture for Big Data and Cloud Services

Fostering an Enterprise Data Mindset

Case Study: Big DaaS in the Automotive Industry

Summary

Part Four: Ensuring Organizational Success

Chapter 12: DaaS Governance Framework

Topics Covered in this Chapter

Role of Data Governance

Data Governance

People Governance

Process Governance

Service Governance

Technology Governance

Summary

Chapter 13: Securing the DaaS Environment

Topics covered in this chapter

Impact of Data Breach on DaaS Operations

Major Security Considerations for DaaS

Multilayered Security for the DaaS Environment

Identity and Access Management

Data Entitlements to Safeguard Privacy

Impact of Increased Privacy Regulations on Data Providers

Information Risk Management

Important Data Security and Privacy Regulations that Impact DaaS

Checklist to Protect Data Providers from Data Breaches

Summary

Chapter 14: Taking DaaS from Concept to Reality

Topics Covered in this Chapter

Service Performance Measurement Using the Balanced Scorecard

Implementing the Performance Scorecard to Improve Data Services

Embarking on the DaaS Journey with a Vision

Using AGILE Principles for New Data Services Development

Sustaining DaaS in an Organization: How to Keep the Program Going

In Conclusion

Appendix A: Data Standards Initiatives and Resources

Appendix B: Data Privacy & Security Regulations

Appendix C: Terms and Acronyms

Appendix D: Bibliography

Internet Resources and Further Reading

Index

EULA

List of Illustrations

Preface

Figure 1 Key topics covered in the book by chapter

Figure 2 Roadmap of the book's different chapters

Chapter 1

Figure 1.1 DaaS in the business environment

Figure 1.2 Data Service Bus

Figure 1.3 Key features of a service

Figure 1.4 Real-life example of data services sold by D&B Hoovers (company search and results)

Figure 1.5 Overview of Cloud-based Data Services

Figure 1.6 Example of Data Services provided by a leading UN Data agency

Figure 1.7 Example of DaaS in the Retail Sector

Figure 1.8 Key phases of enabling the DaaS vision phase

Figure 1.9 Data Services Blueprint: key activities and deliverables

Figure 1.10 Service Delivery Model (SDM)

Chapter 2

Figure 2.1 Accessing and sharing data with Enterprise Data Services

Figure 2.2 Identifying critical success factors

Figure 2.3 DaaS reference architecture description

Figure 2.4 Reference architecture for DaaS framework

Figure 2.5 Evolution of data analysis needs within an organization

Figure 2.6 Trade-offs between data sharing and data privacy/legal considerations

Figure 2.7 Linking DaaS reference architecture components

Chapter 3

Figure 3.1 Scope of data asset management within a typical enterprise

Figure 3.2 Key considerations for data asset management

Figure 3.3 Key categories of enterprise data and their usage in the real world

Figure 3.4 Differentiating between enterprise data and other types of data

Figure 3.5 Examples of master data elements

Figure 3.6 Reference data example for ISO-specified currency codes

Figure 3.7 Example of real-life usage of enterprise data

Figure 3.8 Role of EIM in building future DaaS roadmaps

Chapter 4

Figure 4.1 Business and technology trends driving the future growth of DaaS

Figure 4.2 Building blocks of a typical enterprise data service

Figure 4.3 Publication of enterprise data services within the DaaS framework

Figure 4.4 Getting virtual enterprise access with data services

Figure 4.5 Multi-disciplinary approach to building reusable data services

Figure 4.6 Evolving role of big data analytics

Figure 4.7 Overview of data services at Amazon

Figure 4.8 Amazon's use of standardized data services to exchange data

Chapter 5

Figure 5.1 Role of canonical models in efficient data access and exchange

Figure 5.2 Mapping a standardized data exchange model to the XML messages used

Figure 5.3 Overview of an enterprise and canonical model in a DaaS environment

Figure 5.4 Developing the canonical model

Figure 5.5 Comparing the EDM with the canonical model

Figure 5.6 Service integration and reuse with and without a canonical model

Figure 5.7 Critical success factors for developing the canonical model

Figure 5.8 Responsibility assignment matrix for canonical model-related tasks

Figure 5.9 Standard process for deployment of reusable data services across the enterprise

Figure 5.10 Sample canonical model-based XML schema of an address messaging block

Figure 5.11 Evolving role of healthcare information exchanges

Figure 5.12 Major list of healthcare EDI transactions

Figure 5.13 Healthcare claim processing using EDI

Chapter 6

Figure 6.1 Role of the business glossary in the organization

Figure 6.2 Example of metadata required by DaaS consumers in the online retail sector

Figure 6.3 Major components stored in the business glossary

Figure 6.4 Real-life example of a business glossary

Figure 6.5 Example of varied business term definitions across multiple divisions

Figure 6.6 Varying instances of product definitions

Figure 6.7 Metadata repository: Initial setup

Figure 6.8 Mapping semantic inconsistencies across customer applications in an enterprise

Figure 6.9 Structural definition of data

Figure 6.10 Illustrative example of value domains

Chapter 7

Figure 7.1 Functional components of SOA

Figure 7.2 Data service to check airline flight status

Figure 7.3 Multilayer services used for airline reservations

Figure 7.4 Data service interface is deployed based on enterprise-data definitions

Figure 7.5 Service categories and types

Figure 7.6 Comparing data integration and data virtualization approaches

Figure 7.7 Conceptual framework for data virtualization

Figure 7.8 Representative technology for hosting data services

Chapter 8

Figure 8.1 Achieving data interoperability in a real-life environment

Figure 8.2 Identifying the root cause of quality problems for a data service

Figure 8.3 Data profiling results on reference data

Figure 8.4 Data profiling process flow

Figure 8.5 Periodic data assessments can drive data quality improvement efforts

Figure 8.6 Major dimensions of DaaS quality assessment

Figure 8.7 Major categories of data standards

Figure 8.8 Usage of KPIs for data service quality monitoring and improvement

Chapter 9

Figure 9.1 An example of reference data from the airline sector

Figure 9.2 Reference data services for market data

Figure 9.3 Overview of reference data management

Figure 9.4 Example of hierarchies associated with reference data for ISO country codes and state codes

Figure 9.5 Exchange of LEI reference data using data services

Figure 9.6 Leveraging BRMS reference data management

Figure 9.7 Diagnosis codes for sprained and strained ankles: ICD-9 CM versus ICD-10 CM

Figure 9.8 Healthcare claim submission and payment exchanges

Chapter 10

Figure 10.1 Role of the enterprise data model

Figure 10.2 Virtual master data management approach leveraging data services

Figure 10.3 Role of ODS/staging area in master data services implementation

Figure 10.4 Solution blueprint for master data services in banking

Figure 10.5 DaaS supports a 360-degree view of a customer with party identifier

Figure 10.6 ELDM for customer (demographics) subject area

Figure 10.7 Canonical messaging blocks (XML) for data exchange of customer master data

Figure 10.8 Mapping XML message schemas to existing financial applications in the organization

Figure 10.9 Overview of data integration environment in the global bank

Chapter 11

Figure 11.1 Real-life applications of big data and analytics

Figure 11.2 Major components of a big data analytics solution

Figure 11.3 Kayak's price trend predictor

Figure 11.4 Integrating big data and Cloud services with enterprise data warehouse

Figure 11.5 Reference architecture components required to support big data analytics

Figure 11.6 Real-time mobile DaaS application with data visualization

Figure 11.7 Lifecycle of real-time analytical processing of big data using stream computing

Figure 11.8 Use of data virtualization in big data analytics environment

Figure 11.9 Leveraging the metadata repository for ensuring big data privacy

Figure 11.10 Data collection from automobile sensors to support big DaaS

Chapter 12

Figure 12.1 Critical pillars for IT governance to support a DaaS program

Figure 12.2 Why do we need governance?

Figure 12.3 Major components of enterprise information management (EIM)

Figure 12.4 Mission statement for DaaS governance

Figure 12.5 High-level process for governance of enterprise data

Figure 12.6 Enterprise and local governance committees

Figure 12.7 Sub-committees set up for specialized areas in data governance

Figure 12.8 Co-ordination across individual data services development teams

Chapter 13

Figure 13.1 Multi-layered security framework for data services

Figure 13.2 Multiple levels of IT security

Figure 13.3 Multiple identity profiles for a consumer

Figure 13.4 Data privacy and entitlements for sensitive client data (external)

Chapter 14

Figure 14.1 Initial assessment and planning for setting up a DaaS framework

Figure 14.2 Example of balanced scorecard for driving data service quality improvements

Figure 14.3 Data maturity curve for data providers

Figure 14.4 Role of feedback/learning process in data services scorecard initiatives

Figure 14.5 Key benefits of adopting DaaS


Guest Introduction

With the advent of social media and the Internet of Things (IoT), businesses are receiving far more data than they ever did in the past. The volume of data is increasing exponentially, the variety is increasing, and so is the velocity of its arrival. Companies that can analyze this data, derive insights, and share what they learn effectively, both across business lines within the company and with their external ecosystem of partners, in order to transform their businesses are invariably the ones that are going to win. This specific trend has been captured in Accenture's Technology Vision 2013 as “Data Velocity” and “Design for Analytics” and again in 2014 as “Data Supply Chain.” Personally, as a Managing Director of Accenture, I have seen this trend resonate with our Fortune 500 clients across Accenture's five Operating Groups: Communications, Media & High Tech, Financial Services, Health and Public Services, Resources, and Products.

Given the need to consume data from heterogeneous sources, both internal and external to a company, hosted either on premises or in the cloud, and, on the flip side, to make their own data available in exactly the same reusable form for partners to consume, companies can no longer afford to keep data locked into silos of applications, nor can they treat it as a second-class object when architecting their IT infrastructure. Data needs to be decoupled from applications so that the data generated by one application can be used effectively by a completely different set of applications, and the insights generated by analyzing the data within one business line of a company can be shared with other business lines in order to maximize the Return on Investment (RoI) on the data available to the company as a whole. I have seen this happen with a leading drugstore in the United States, where sharing data between the store's loyalty program and the sales department enabled better targeting of products, leading to significantly increased sales.

The most effective way of sharing the data and insights is to make data a first-class object in the design of IT architecture and make it available as a service. Once exposed as a service, any application, whether internal or external to a company, can consume data in a seamless manner and use it creatively to make a tangible difference to business. In fact, there are several examples of completely new businesses created across industries, from healthcare to insurance to automotive to real estate, fuelled by a company sharing data in the form of APIs with its ecosystem of partners. The ecosystem, in turn, has a huge impact on the company's existing business, leading to mutual business benefits. For example, GM exposed its OnStar Application Programming Interface (API) to power a new business service via a start-up called RelayRides that enabled individuals to rent out their personal cars, thereby disrupting the rental car business. We have seen the same trend with Walgreens, which is offering access to its data through a variety of APIs and Software Development Kits (SDKs) to fuel new businesses with its ecosystem of partners.

Similarly, there is a plethora of examples of how companies have successfully exploited the synergy across their business lines by sharing data and insights within the company, leading to higher efficiency and the creation of new revenue streams. The previously cited example of the leading drugstore sharing data between the customer loyalty program and the sales department fits this category. Thus, data sharing, internally as well as externally, has proven to be transformational for businesses across industries.

With business transformations happening across the globe based on the availability of huge amounts of data and its analysis, this book on Data as a Service, providing a comprehensive view into the world of Data Engineering and its implications for business, is a must-read for every IT professional and business leader.

SANJOY PAUL, PhD
Managing Director – Accenture Technology Labs

Guest Introduction

When I wrote my first book, Data Crush, I attempted to capture the ways in which the technical innovations of mobility, Cloud computing, and big data were leading to entirely new social and business phenomena. Several of the impacts that these new technologies have had on our world are driving the demand for Data as a Service; hence, I was elated when Pushpak asked me to introduce his work, which you now hold in your hands. There are three social forces that are making Data as a Service a new business imperative: quantification, appification, and cloudification. Let us look at each in turn.

Quantification is the growing trend of measuring absolutely everything, across all aspects of business. I recently met the CIO of a commercial property management company that is spending over $1 billion to quantify his business. Over a two-year period, his company will connect every lightbulb in every one of their buildings to the Internet. When I asked him what he hoped to learn from these connected bulbs, his response was, “I have no idea, but what I do know is that if I don't have the data there's nothing to analyze.” You will likely see this sort of pervasive data collection occurring throughout every process in every organization over the coming decade.

Appification is our growing expectation of instant gratification, at little or no cost, regardless of how irrational this expectation may be. Indeed, we are becoming so appified that we expect our needs to be met predictively. Delivering on this expectation demands that organizations not only analyze data but also do so perpetually and rapidly. The notion that business insights only come from a Research and Development department, or from IT, is outdated, because there simply is not time to push analytics to a central organization. Rather, appification means that organizations must collect, digest, and act upon data as close to the customer as possible, in both time and space.

Finally, Cloudification is the notion that the paradigm of building and owning the assets of your business has become obsolete. Cloud initially entered the world of applications with Software as a Service, and is rapidly spreading to all other aspects of business operations. More and more, companies will simply aggregate third-party services in order to meet customer needs, rather than produce those outputs themselves. Data management and analysis will follow this trend, leading to Data as a Service being the standard mode of putting data to work in organizations.

Acting upon these societal forces is challenging. Much of this mode of operating runs counter to how we have run IT for half a century. Nonetheless, it is imperative that organizations embrace Data as a Service if they hope to remain relevant in our accelerating world. This book provides a practical, implementable approach to reaching this goal. I trust that you will find Pushpak's guidance valuable as you work to meet the new expectations of an ever-more-competitive world.

CHRISTOPHER SURDAK
Engineer, ex-Rocket Scientist, Juris Doctor, Technology Evangelist, and author of “Data Crush,” GetAbstract's International Book of the Year for 2014

Preface (Includes the Reader's Guide)

Typically, once every couple of decades, a disruptive new technology emerges that fundamentally changes the business landscape. Innovative, high-tech products that start a trend often reach the mainstream market so rapidly that they transform the existing way of doing business. These trends also create a new market that eventually disrupts the existing market and related network, often displacing the earlier technology.

In most cases, organizations that understand the underlying competitive dynamics of innovation and adapt to these disruptive trends win. Today such fundamental shifts take place in the world of data and analytics daily, and they are changing the global business landscape significantly.

If one closely observes the global marketplace, it is safe to say that many businesses are trying to harness an unprecedentedly large amount of data to derive new insights that support their competitive analyses. The huge amount of data gathered from diverse channels (e.g., social media, clickstream analysis) needs to be translated by businesses into concrete actions. Organizations that understand the competitive dynamics at play and can then predictively analyze that data will win, whereas those that fail to recognize this challenge and respond to it will become extinct.

While data has always been considered an essential part of IT infrastructure across most organizations to support their business operations, today it is recognized as the key commodity upon which an enterprise runs its business and day-to-day operations. A complete paradigm shift has occurred in which data is increasingly recognized as an asset that can be commercially sold as a service, in and of itself.

Based on the author's first-hand experience and expertise, this book offers a proven framework for sharing core enterprise data using reusable data services. The book covers how organizations can generate business revenues by providing Data as a Service to their clients for fee-based subscriptions. The book goes on to explain in detail how to acquire and distribute data across heterogeneous platforms effectively by using enterprise SOA principles and industry data standards, and by leveraging new technologies such as data virtualization, cloud, and big data stream computing. The book also offers the following:

A comprehensive approach for introducing Data as a Service (DaaS) in any organization for the first time.

Recommended best practices and industry standards for sharing master, reference, and big data with data consumers.

Commercialization aspects of Data as a Service and its potential for generating revenues.

Real-world applications of DaaS, such as big Data as a Service.

Real-life case studies on various innovative architecture blueprints and related patterns.

The topics covered in this book are wide ranging, starting with a presentation on the need for providing DaaS and the technical challenges involved in making that transformation. Some of the areas of the book that may particularly appeal to readers include:

How DaaS can become a strategic enabler for sharing data with customers on company products they are interested in purchasing, browsing online, or viewing on social media.

How the DaaS framework can help organizations recognize the monetizable intent of their customers, and those customers' dependency on accessing company data while buying company products.

How enhanced on-demand data services can attract potential clients for organizations that plan on mining customer, social media, and online conversations over a big data platform, using sophisticated predictive algorithms and data analytics tools.

How to adopt best practices for successfully deploying reusable data services in your organization along with a reference architecture comprising common sets of data standards, guidelines, and processes.

Covering so much ground, from canonical modeling to data governance and XML-based services, can be challenging for some readers, so the book offers a roadmap to help guide you through it.

The Reader's Guide

The Reader's Guide is provided to help readers determine who should read the book and why. It also summarizes each chapter to explain the step-by-step approach required for the successful introduction of DaaS in any organization.

The successful adoption of DaaS in any organization is based on three fundamental areas: architecture, organizational processes, and deployment of the appropriate technology components. However, this should be based on real-world experiences and lessons learned from prior IT/DaaS implementations. This is one of the reasons this book includes case studies in several chapters.

The next section will guide readers on how best to use the book by sharing details of every chapter. It will also help guide readers to determine the best approach to use the DaaS framework in their current IT landscape within their organization. Figures 1.1 and 1.2 illustrate key topics in the book along with the suggested roadmap.

Figure 1.1 Key topics covered in the book by chapter

Figure 1.2 Roadmap of the book's different chapters

PART 1: Overview of Fundamental Concepts (Includes Chapters 1 to 3)

The introductory section of the book introduces you to Data as a Service (DaaS). It also provides readers with a clear overview of how an organization can deliver on the promise of providing DaaS to its business stakeholders and end customers.

Chapter 1: “Introduction to DaaS” provides a high-level overview of the core concepts of the DaaS framework. It also explores commercialization aspects of Data as a Service, its immense potential for generating revenues for most organizations, as well as some of its common limitations. It describes the details of service delivery management while suggesting necessary key steps for preparing the blueprint for enterprise data services in your organization.

Chapter 2: “DaaS Strategy and Reference Architecture” provides an overview of the DaaS reference architecture along with the key components that make up the DaaS framework. It also explains the significance of formally creating an enterprise data strategy that lays out a long-term roadmap to deliver Data as a Service (DaaS).

Chapter 3: “Data Asset Management” explores the significance of enterprise data and the foundational role it plays in making enterprise data services successful in any organization. It explains the underlying principles of data asset management and why companies need to treat data as a corporate asset. It also examines the major types of enterprise data and contrasts their key features.

PART 2: DaaS Architecture Framework and Components (Includes Chapters 4 to 8)

This section of the book focuses on the architecture framework and components required to deploy DaaS in your organization. It also describes in detail common patterns, standards, and processes that can help shape the DaaS Reference Architecture. This section also provides readers with a high-level overview of best practices from a few related disciplines (e.g., EIM, EA, SOA, data services) to make DaaS a scalable data delivery mechanism for organizations.

Chapter 4: “Enterprise Data Services” describes the core concepts of enterprise data services as a fundamental component of the DaaS framework. It illustrates with examples how several organizations have successfully developed a set of standardized service interfaces (termed EDS) to enable data sharing with their various stakeholders (customers, vendors, regulatory agencies, government, etc.).

Chapter 5: “Enterprise and Canonical Modeling” explains the significance of enterprise and canonical modeling and their foundational role in promoting consistent and reliable data exchange across disparate systems spread out over the organization. It also explains the significance of the enterprise data model (EDM) as the foundational component required for building a robust and mature set of data structures that can be reused across the entire organization.

Chapter 6: “Business Glossary for DaaS” provides a detailed overview of why organizations need to develop a standardized business glossary for data services published for user consumption. Storing glossary terms in a shared metadata repository across the organization will improve the overall productivity of both the businesses and the external subscribers to enterprise data services (EDS).

Chapter 7: “SOA and Data Integration” provides a high-level overview of key data acquisition and integration patterns with service-oriented architecture (SOA) as the underlying foundation. It also covers a few technologies (e.g., data virtualization, stream computing for big data, and data federation) that can be leveraged by the DaaS framework to publish data services with enhanced efficiency, performance, and a scalable architecture.

Chapter 8: “Data Quality and Standards” provides details on how to ensure that the quality of data published by enterprise data services is suitable and fit for public consumption. It explains the significance of data standards for the success of any DaaS program. The chapter also discusses the role of data profiling as a foundational process for the success of any DaaS quality program. Finally, it looks at some of the major data profiling and quality measures that are critical for implementing a DaaS project in real life.

PART 3: DaaS Solution Blueprints (Includes Chapters 9 to 11)

This section of the book provides a number of important solution blueprints where the DaaS framework can benefit organizations across several industries. Solution blueprints of data services can be very useful for readers because they help explain the relationship between the architecture patterns explained earlier and the specific business requirements of organizations to exchange various types of enterprise data. Solution blueprints are based on the DaaS reference architecture, which is also explained in the earlier sections of the book. Finally, this section covers a variety of real-life case studies on how organizations have successfully utilized the DaaS framework and its architectural patterns to improve their business efficiency over the long term.

Chapter 9: “Reference Data Services” presents a detailed overview of how DaaS can be deployed successfully in organizations for disseminating shared reference data to downstream data subscribers and consumers. It also presents real-life case studies on reference data services from the financial and healthcare sectors.

Chapter 10: “Master Data Services” provides a detailed architectural pattern for designing and developing Master Data Services (MDS) that can be reused across an enterprise by using common design components and standards. It also evaluates how MDS can be utilized by organizations as an effective alternative to the existing styles of MDM implementation without physically consolidating master data in a single hub. A detailed case study on an MDS implementation at a large financial institution is presented.

Chapter 11: “Big Data and Analytical Services” explains how big data analytics users can leverage data services to access the data they need for advanced analytics and make decisions in real time. This chapter includes several case studies from organizations that have successfully implemented big data and mobile-based analytics services, leveraging the DaaS framework. It provides a detailed solution blueprint for designing and developing big data as a service that can be reused across the enterprise by using the design components and standards proposed under the DaaS framework.

PART 4: Ensuring Organizational Success (Includes Chapters 12 to 14)

Introducing DaaS is uncharted territory for many organizations. Not all businesses are likely to face the same urgency for providing Data as a Service to their consumers, nor will they encounter the same challenges. An organizational roadmap has been included containing several best practices with respect to DaaS program management and service delivery-related aspects. Adopting these best practices and guidelines will ensure that the DaaS program continues to be useful and provides business value to stakeholders over the long term.

Chapter 12: “DaaS Governance” explores the critical nature of data governance in DaaS and how people, process, and technology factors can be leveraged to successfully deploy data services within any organization. This chapter also suggests various governance policies and controls that an organization can utilize to track and monitor the overall user experience while using a reusable enterprise data service (EDS). It examines the emerging role of the chief data officer (CDO) across organizations, as a key change agent to align data initiatives with the business strategy of an organization.

Chapter 13: “Securing the DaaS Environment” explains why data security and privacy-related issues have become such a critical consideration for any organization interested in publishing data services. It also demonstrates the key features of a comprehensive information risk management program that can mitigate risks to the DaaS program. It provides a practical list of data security and privacy measures that can be deployed by any organization planning to set up DaaS operations.

Chapter 14: “Taking DaaS from Concept to Reality” discusses best practices with respect to DaaS project management and delivery. Adopting these best practices and guidelines will ensure that the DaaS program continues to be useful and relevant to stakeholders over the long term. It discusses the benefits of employing Agile methodology for new data services development as an alternative to the traditional software development life cycle. The chapter also illustrates steps to build a DaaS performance scorecard for monitoring the overall service performance of a data provider organization.

Again, I strongly reiterate that adopting DaaS will decouple data from underlying business and application complexities, although technology constraints will not become entirely irrelevant. The flexibility gained from this decoupling should help IT organizations react more quickly to technological changes. At the same time, business decision makers can focus on what they really need from their data organization and not on how they circumvent their existing system or platform-related constraints. As is explained with numerous illustrative examples from the real world, DaaS can potentially also offer a new monetization capability to some organizations by leveraging data as a revenue-generating service. In short, reading this book will provide an excellent overview of the exciting possibilities of leveraging data assets in your organization as well as uncover their inherent commercial value in the business market.

Who Should Read this Book

This book should appeal to any practitioner interested in implementing or selling the value of the DaaS program to business stakeholders. It should be of value to a diverse business and technical audience, ranging from business executives to experienced IT architects to those new to the topic of DaaS. Given the wide range of readers who may benefit from this book, there is no predetermined order or sequence suggested for reading it.

Some of the ways this book can be useful to specific reader communities are listed here.

Business executives: If you are a stakeholder responsible for providing direction or governing data in your organization, then this book gives you an excellent overview of the exciting possibilities to leverage your organization's data so as to meet the needs of your consumers as well as formulate the economic value proposition of providing Data as a Service. If your organization has plans to become a DaaS service provider, this book will help you understand the requirements of your data customers and suggest service-based solutions that can help address the customer's data needs.

Enterprise architects: If you are an enterprise architect, the book provides a good introduction to the key enterprise design considerations while developing a data services strategy. In addition to this benefit, you will learn how DaaS can add to your overall business strategy, by ensuring long-term improvements to the data infrastructure of an enterprise.

Data architects: If you are a data architect, this book gives you valuable advice on the design of a sound data foundation layer. You will learn how to ensure long-term improvements to the data infrastructure of an enterprise while leveraging the DaaS framework for fulfilling the master data, reference data, and analytical data needs of your consumers.

SOA architects: If you are a SOA/data services architect, this book provides detailed guidelines on how to apply various technology and architecture patterns while deploying DaaS in your organization. It will also make you aware of the various data security standards and best practices to ensure integrity of published data services.

IT applications designers or developers: If you are an experienced applications designer or developer, then you will find this book useful for understanding the entire process of developing data services, with an awareness of the specific benefits of data reuse and of how reusing service patterns can help with quicker deployment of applications in your organization. The book also gives practical advice and detailed guidelines on how your business applications can save development time and costs by leveraging reusable data services.

Systems management and IT/MIS students: If you are relatively unfamiliar with the role of data in IT Systems Management, this book provides you with an excellent introduction to key data-related disciplines like enterprise modeling, data governance, metadata, and SOA from a data practitioner's perspective.

What Is Not Covered in this Book

As mentioned earlier, this book should serve most readers as a comprehensive guide for setting up DaaS in their organizations. While the book attempts to cover all the key business and technical aspects of DaaS, one size rarely fits all. Consequently, the book does not attempt to cover any physical implementation or related details, such as those recommended by software products and vendor tools, that are specific to your individual organization's needs. There are several organizational and IT aspects that are unique to every industry and country regarding implementation and deployment of DaaS solutions. Therefore, such detailed decision-making at the organizational level is best left to the people who know their organization's needs closely. However, guidance has been provided throughout this book on how to address some of these implementation challenges from a larger perspective.

Acknowledgments

The creation of this book on an area as complex and innovative as Data as a Service required the participation and support of a number of individuals. In fact, this book would not have been possible without their active support and encouragement.

I want to thank a number of thought leaders in data management, architecture, and analytics who have provided me their guidance and insights while writing the book: John Zachman, Prof. Peter Aiken, Dr. Sanjoy Paul, Aaron Zornes, Steve Hoberman, Krish Krishnan, and John Ladley.

I also want to thank Shiraz Kassam, Dr. Arka Mukherjee, and Dr. S. Kaisar Alam for helping me stay inspired while writing this book and sustain the effort. I want to acknowledge the contributions of Prithvijit Mazumder and Aditya Mehta in helping review and enhance various portions of the work. I want to thank Ms. Shreya Sarkar for her terrific edits to the initial manuscript and also the editorial team from Wiley, Mary Hatcher and Brady Chin, for their continued advice, help, and support during the authorship of this book.

Last but not least, I owe special gratitude to my family and friends for their time, patience, encouragement, and support in innumerable ways.

Part One: Overview of Fundamental Concepts

Chapter 1: Introduction to DaaS

Topics covered in this chapter

This chapter introduces the Data as a Service (DaaS) framework and the approach taken by several organizations to introduce DaaS into their organization.

It provides an introductory overview of the underlying drivers for the transformation of data into a monetized asset and evaluates how commercial trends in the marketplace will further drive this service trend.

It also suggests several key steps for preparing the blueprint for Enterprise Data Services in your organization. These steps include establishing a service delivery model (SDM) comprising a service catalog, service governance, and a resourcing strategy.

Finally, this chapter looks at commercialization aspects of data as a service, its potential for generating revenues as well as some of its common limitations.

The most profound technologies are those that disappear. They weave themselves into the fabric of our everyday life until they are indistinguishable from it.

—Late Prof. Mark Weiser (Father of Ubiquitous Computing)

This book sets out on a large undertaking. It aims to offer a definitive roadmap on how to significantly transform your organization by providing Data as a Service (DaaS) to consumers of your data across the enterprise. It also suggests ways to explore the promise of data and its expanded role as a strategic business enabler.

Using DaaS as the unifying conceptual framework, the book shows readers how they can successfully integrate distributed systems across heterogeneous platforms virtually and publish data to subscribers securely using industry data standards and governance mechanisms.

This introductory chapter provides an overview of the exciting possibilities around leveraging reusable data services across any organization as well as the economic value proposition of providing DaaS to your customers. It also explains the overall approach and necessary steps for any data provider to establish a service delivery model (SDM) for offering DaaS to subscribers.

Data-Driven Enterprise

In the words of Peter Drucker, a world-renowned management visionary, an information-based organization requires “clear, simple, and common objectives that translate into actions.”

In this chapter, we examine what these guiding objectives are and how they define the new persona of a successful information-based organization.

The DaaS framework presented in this book entails a paradigm shift in a fundamental sense, a shift that can help any organization transform itself into a data services-driven organization. Indeed, the DaaS framework can offer end users convenient and timely access to data from multiple, heterogeneous data sources within the company as reusable data services. These data services can be useful to external and internal data subscribers, business partners, regulatory agencies, and others (Figure 1.1). Additionally, this capability can be leveraged by organizations interested in becoming commercial data providers, by publishing data for their customers and subscribers as a marketable service.

Figure 1.1 DaaS in the business environment

For example, if we look at the high-tech sector, the underlying shift toward IT services is being driven by new advances in technology and its resulting societal consequences. In effect, many organizations need to change how they do business. They will need to respect demands from an increasingly tech-savvy generation of customers who now spend more time interacting with each other on mobile devices, through texting, and on social media sites.

All these factors have created a marketplace that will be dominated by organizations that understand new trends driving the global market. Organizations need to anticipate these changes before their competitors do and provide services rapidly whenever requested by their customers. Companies that undergo this business transformation are data-driven enterprises.

Concept of a Data Service Bus

To become more prompt and effective in responding to business or market demands, any service-based organization needs to place a larger emphasis on information sharing. The challenges faced while exchanging data usually result from a fragmented data environment made up of different platforms having no common standards. Consequently, the data entities and attributes of these systems often do not share the same syntax or semantics, which is a necessary condition for systems to reliably share information. In addition, the majority of current systems have not been designed for data interoperability and sharing. This is where the DaaS framework can enhance the implementation of data services with the basic concept of a real-world Data Service Bus. The Data Service Bus can act as a key foundation for data reuse in any DaaS deployment.

For effective sharing of enterprise data across divisions, it is essential for large organizations to build an underlying data foundation (similar to a bus architecture) that provides a consistent view of enterprise-level data in the organization. The concept of a data service bus, which is a logical data abstraction layer created at the enterprise level, can act as a foundation for virtually sharing and reusing information across IT applications. However, it should not be confused with the enterprise service bus (ESB). In some ways, the Data Service Bus can be compared to a data broker that facilitates exchange of enterprise data from a DaaS Provider, or Data Provider, to its subscribers.

In my view, the true potential of DaaS can be realized by an organization if it sets up a well-architected Data Service Bus, comprising common data modules for reuse by downstream applications and customers as well as using standardized Enterprise Data Services. In addition to the data foundation layer, successful DaaS deployments also need to maintain standardized business logic and rules to process data that downstream systems can exploit (Figure 1.2).

Figure 1.2 Data Service Bus
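
The idea of a logical abstraction layer that brokers data between providers and subscribers can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the book's implementation: the class name `DataServiceBus`, the methods `register_source` and `get`, the two stand-in source systems, and all field names are hypothetical. Each source maps its local column names to a shared vocabulary before handing records to the bus, so a subscriber asks for an entity by name and never touches the source systems directly.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class DataServiceBus:
    """Hypothetical logical layer: subscribers request an entity, not a system."""

    def __init__(self) -> None:
        # Maps an enterprise-level entity name to the functions that supply it.
        self._sources: Dict[str, List[Callable[[], List[Record]]]] = {}

    def register_source(self, entity: str, fetch: Callable[[], List[Record]]) -> None:
        """Attach one provider of records for the given entity."""
        self._sources.setdefault(entity, []).append(fetch)

    def get(self, entity: str) -> List[Record]:
        """Return records from every registered source, in registration order."""
        if entity not in self._sources:
            raise KeyError(f"no source registered for entity {entity!r}")
        records: List[Record] = []
        for fetch in self._sources[entity]:
            records.extend(fetch())
        return records


def crm_customers() -> List[Record]:
    # Stand-in for a CRM system with its own (assumed) column names.
    raw = [{"cust_no": 1, "nm": "Acme"}]
    return [{"customer_id": r["cust_no"], "name": r["nm"]} for r in raw]


def billing_customers() -> List[Record]:
    # Stand-in for a billing system that names the same attributes differently.
    raw = [{"id": 2, "customer_name": "Globex"}]
    return [{"customer_id": r["id"], "name": r["customer_name"]} for r in raw]


bus = DataServiceBus()
bus.register_source("customer", crm_customers)
bus.register_source("customer", billing_customers)
customers = bus.get("customer")
```

In this sketch the mapping to shared field names lives with each source; a real deployment might instead centralize it in a canonical model, which is a design choice the sketch does not try to settle.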

To align the Data Service Bus with long-term business strategy, an organization interested in setting up DaaS should also establish an overall data strategy that integrates data from both internal and external data sources (social media, Twitter feeds, etc.). Also recommended is the adoption of a few architectural principles and goals that will enable data sharing and interoperability across the enterprise as part of the DaaS architectural framework. This topic is explained in greater detail in Chapter 2 of this book.

Let us now try to understand the concept of a data-driven organization and what it means in the context of data-oriented services.

Defining a Service

Over the last few years, businesses have increasingly felt pressure to transform into providers of value-added services. Often, these services become necessary for customers to fulfill some of their daily needs. This concept is not entirely new or radically different from the traditional definition of a service. According to Merriam-Webster's Collegiate Dictionary, a service is a “facility supplying some public demand.” For example, a utility company provides households with water or electricity services. Similarly, a life insurance company exists in the service marketplace primarily to fulfill the need felt by most people for security and well-being (Figure 1.3).

Figure 1.3 Key features of a service

Any type of service displays a few common characteristics:

It provides the means of providing a clear value to customers.

It facilitates outcomes that the customers want to achieve.

It is delivered through a few capabilities, while managing associated risks.

Service Taxonomy and Decomposition

In the context of DaaS, a data service is defined as a remotely accessible, self-contained module that provides data to authorized service consumers to help them carry out their business. Consumers can access the service in a standardized manner that is well documented and listed in a service catalog. The catalog lets consumers find out whether a service exists and what functionality it offers.
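
The two catalog questions described above, whether a service exists and what it does, can be sketched in a few lines. This is a minimal illustration under assumptions: the names `DataService` and `ServiceCatalog`, their methods, the sample services, and the placeholder endpoints are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DataService:
    """One catalog entry for a self-contained, remotely accessible data service."""
    name: str
    description: str
    endpoint: str  # placeholder address, assumed for illustration only


@dataclass
class ServiceCatalog:
    """Hypothetical catalog that lets consumers discover published services."""
    entries: Dict[str, DataService] = field(default_factory=dict)

    def register(self, service: DataService) -> None:
        self.entries[service.name] = service

    def exists(self, name: str) -> bool:
        """Answer the first consumer question: is the service listed?"""
        return name in self.entries

    def describe(self, name: str) -> str:
        """Answer the second question: what does it do? Raises KeyError if unlisted."""
        return self.entries[name].description

    def search(self, keyword: str) -> List[str]:
        """Return names of services whose description mentions the keyword."""
        kw = keyword.lower()
        return sorted(
            s.name for s in self.entries.values() if kw in s.description.lower()
        )


catalog = ServiceCatalog()
catalog.register(DataService(
    name="customer-profile",
    description="Master data for customers",
    endpoint="https://example.invalid/customer-profile",
))
catalog.register(DataService(
    name="currency-codes",
    description="Reference data for currency codes",
    endpoint="https://example.invalid/currency-codes",
))

found = catalog.exists("customer-profile")
reference_services = catalog.search("reference data")
```

Access control for authorized consumers is deliberately left out of the sketch; it only covers discovery.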

Drivers for Providing Data as a Service

Organizations around the world face increasing pressure to provide data services to their customers. Along with other business drivers, this pressure is often caused by technology advances in the IT sector.

Engaging Customers with Data-Driven Choices

Over the last few years, we have witnessed a large trend toward “social shopping.” Many online shoppers embrace the social-media ecosystem as their preferred channel. These shoppers usually conduct their own informal research by browsing products that they need, or they discover the latest products or services through what others find interesting on social media. For example, Facebook makes this process quite convenient by registering our likes and dislikes. Shoppers then compare online prices offered by different retailers before committing to their actual purchase. Consequently, with this trend, a larger segment of customers has become dependent on the social network ecosystem, and their online behavior will affect businesses on a significant scale in the future (Shih, 2009).

As an outcome of this new trend, customers are likely to feel encouraged to take a more proactive role in deciding on their day-to-day purchases. Over the past few years, several online retailers (e.g., Amazon, Groupon, Alibaba) have seen huge growth in their business globally by providing customers with useful data that can help them decide what products to purchase. In the face of new competition, many traditional retailers such as Walmart and Target have also followed suit. Similarly, supermarket chains such as UK-based Tesco have grown to be market leaders in recent years by transforming themselves into data-driven enterprises.

Leveraging data, predictive analytics, and customer insight has become part of retailers' competitive weaponry. In most of these cases, however, the customer has become the real beneficiary because they can now take fuller advantage of personalized discounts and reward coupons offered by web-based and traditional retailers.

Monetization

While the majority of business organizations offer DaaS to their customers as a complimentary service, some companies have been able to identify corporate data assets that they can rent to customers on a fee-based model, a practice also called monetization. Using monetization, several data providers within the DaaS market have generated revenues to seize the initiative and grow their data services commercially. A good example of a business monetizing DaaS in the current market is Dun & Bradstreet (D&B), in particular its subsidiary Hoovers (Figure 1.4). This pioneering organization provides business data to its corporate clients and individual subscribers for a specific service fee. The D&B Hoovers website can stream data to its client organizations in the form of a list of specific leads, which go directly to sales teams who then contact people to make sales. Several other firms in the market have also been taking the lead as DaaS pioneers, providing various kinds of data services to interested subscribers. These data services range from providing financial data to supplying data on a manufacturer's parts catalog for distributors as part of supply-chain and logistics management (Soderling, 2010).

Figure 1.4 Real-life example of data services sold by D&B Hoovers (company search and results)

Another good example of an organization monetizing DaaS in the current market is the cloud-based data services provider Treasure Data, a company recently named among the coolest big data vendors by Gartner. This company provides DaaS to several clients, charging them a flat monthly rate for its data offerings.

As part of its services, Treasure Data collects, manages, and analyzes massive volumes of big data for its clients (Figure 1.5). It can also store the client's data in the cloud, based on a pre-built data model that supports easy data integration and export (storing different types of data formats).

Figure 1.5 Overview of Cloud-based Data Services

The data provider can set up the data requirements for its client in the cloud environment in a matter of weeks. The client can then focus on analyzing data without worrying about database administration or other underlying DB infrastructure-related maintenance issues. The service includes 24-hour support and monitoring, seven days a week, after the initial implementation.

Public Sector and Government

Today, a similar story is taking shape in the public sector and government. Data is delivered by these agencies to their consumers in several innovative ways. For example, the United Nations Statistics Division now provides statistical data as an online data service to its members across the world (Figure 1.6). They disseminate information on country-specific statistics such as country gross domestic product (GDP), population, education, life expectancy, crime, and so on.

Figure 1.6 Example of Data Services provided by a leading UN Data agency

Similarly, several community-based organizations in the healthcare sector are making results from big data analyses of patient data accessible to physicians and healthcare workers in real time through data services to save innumerable lives. A prime example of this was witnessed recently when Harvard's HealthMap service (http://healthmap.org) spotted the Ebola outbreak and alerted the medical community before the World Health Organization formally announced the epidemic. HealthMap's role in tracking Ebola was heavily dependent on using big data analytics to harness public health information. HealthMap compiles, collates, and creates a visual report of global disease outbreaks, after sifting through millions of social media posts from healthcare workers in the affected African countries blogging about their work.

Technology Shift

Finally, the use of new technology (e.g., mobile computing, big data) will expand exponentially as more customers around the world become tech-savvy. For example, in the insurance sector, customers are finding it convenient to use automobile insurers such as geico.com