The Data Industry - Chunlei Tang - E-Book


Chunlei Tang

Description

Introduces the data industry to the field of economics

This book bridges the gap between economics and data science to help data scientists understand the economics of big data, and to enable economists to analyze the data industry. It begins by explaining data resources and introducing the data asset. It then defines a data industry chain, enumerates data enterprises’ business models versus operating models, and proposes a mode of industrial development for the data industry. The author describes five types of enterprise agglomeration and multiple industrial cluster effects. A discussion of the establishment and development of laws and regulations related to the data industry is provided. In addition, this book discusses several scenarios on how to convert data-driven forces into productivity that can then serve society. This book is designed to serve as a reference and training guide for data scientists, data-oriented managers and executives, entrepreneurs, scholars, and government employees.

  • Defines and develops the concept of a “Data Industry,” and explains the economics of data to data scientists and statisticians
  • Includes numerous case studies and examples from a variety of industries and disciplines
  • Serves as a useful guide for practitioners and entrepreneurs in the business of data technology

The Data Industry: The Business and Economics of Information and Big Data is a resource for practitioners in the data science industry, government, and students in economics, business, and statistics.

CHUNLEI TANG, Ph.D., is a research fellow at Harvard University. She is the co-founder of Fudan’s Institute for Data Industry and proposed the concept of the “data industry”. She received a Ph.D. in Computer and Software Theory in 2012 and a Master of Software Engineering in 2006 from Fudan University, Shanghai, China.


Page count: 417

Publication year: 2016




TABLE OF CONTENTS

COVER

TITLE PAGE

COPYRIGHT

BIBLIOGRAPHY

DEDICATION

ENDORSEMENTS

PREFACE

CHAPTER 1: WHAT IS DATA INDUSTRY?

1.1 DATA

1.2 INDUSTRY

1.3 DATA INDUSTRY

CHAPTER 2: DATA RESOURCES

2.1 SCIENTIFIC DATA

2.2 ADMINISTRATIVE DATA

2.3 INTERNET DATA

2.4 FINANCIAL DATA

2.5 HEALTH DATA

2.6 TRANSPORTATION DATA

2.7 TRANSACTION DATA

CHAPTER 3: DATA INDUSTRY CHAIN

3.1 INDUSTRIAL CHAIN DEFINITION

3.2 INDUSTRIAL CHAIN STRUCTURE

3.3 INDUSTRIAL CHAIN FORMATION

3.4 EVOLUTION OF INDUSTRIAL CHAIN

3.5 INDUSTRIAL CHAIN GOVERNANCE

3.6 THE DATA INDUSTRY CHAIN AND ITS INNOVATION NETWORK

CHAPTER 4: EXISTING DATA INNOVATIONS

4.1 WEB CREATIONS

4.2 DATA MARKETING

4.3 PUSH SERVICES

4.4 PRICE COMPARISON

4.5 DISEASE PREVENTION

CHAPTER 5: DATA SERVICES IN MULTIPLE DOMAINS

5.1 SCIENTIFIC DATA SERVICES

5.2 ADMINISTRATIVE DATA SERVICES

5.3 INTERNET DATA SERVICES

5.4 FINANCIAL DATA SERVICES

5.5 HEALTH DATA SERVICES

5.6 TRANSPORTATION DATA SERVICES

5.7 TRANSACTION DATA SERVICES

CHAPTER 6: DATA SERVICES IN DISTINCT SECTORS

6.1 NATURAL RESOURCE SECTORS

6.2 MANUFACTURING SECTOR

6.3 LOGISTICS AND WAREHOUSING SECTOR

6.4 SHIPPING SECTOR

6.5 REAL ESTATE SECTOR

6.6 TOURISM SECTOR

6.7 EDUCATION AND TRAINING SECTOR

6.8 SERVICE SECTOR

6.9 MEDIA, SPORTS, AND THE ENTERTAINMENT SECTOR

6.10 PUBLIC SECTOR

CHAPTER 7: BUSINESS MODELS IN THE DATA INDUSTRY

7.1 GENERAL ANALYSIS OF THE BUSINESS MODEL

7.2 DATA INDUSTRY BUSINESS MODELS

7.3 INNOVATION OF DATA INDUSTRY BUSINESS MODELS

CHAPTER 8: OPERATING MODELS IN THE DATA INDUSTRY

8.1 GENERAL ANALYSIS OF THE OPERATING MODEL

8.2 DATA INDUSTRY OPERATING MODELS

8.3 INNOVATION OF DATA INDUSTRY OPERATING MODELS

CHAPTER 9: ENTERPRISE AGGLOMERATION OF THE DATA INDUSTRY

9.1 DIRECTIVE AGGLOMERATION

9.2 DRIVEN AGGLOMERATION

9.3 INDUSTRIAL SYMBIOSIS

9.4 WHEEL-AXLE TYPE AGGLOMERATION

9.5 REFOCUSING AGGLOMERATION

CHAPTER 10: CLUSTER EFFECTS OF THE DATA INDUSTRY

10.1 EXTERNAL ECONOMIES

10.2 INTERNAL ECONOMIES

10.3 TRANSACTION COST

10.4 COMPETITIVE ADVANTAGES

10.5 NEGATIVE EFFECTS

CHAPTER 11: A MODE OF INDUSTRIAL DEVELOPMENT FOR THE DATA INDUSTRY

11.1 GENERAL ANALYSIS OF THE DEVELOPMENT MODE

11.2 A BASIC DEVELOPMENT MODE FOR THE DATA INDUSTRY

11.3 AN OPTIMIZED DEVELOPMENT MODE FOR THE DATA INDUSTRY

CHAPTER 12: A GUIDE TO THE EMERGING DATA LAW

12.1 DATA RESOURCE LAW

12.2 DATA ANTITRUST LAW

12.3 DATA FRAUD PREVENTION LAW

12.4 DATA PRIVACY LAW

12.5 DATA ASSET LAW

REFERENCES

INDEX

END USER LICENSE AGREEMENT


List of Illustrations

CHAPTER 1: WHAT IS DATA INDUSTRY?

Figure 1.1 DIKW pyramid. Reproduced by permission of Gene Bellinger

Figure 1.2 Advantages of managing data assets. Reproduced by permission of Wiley [13]

Figure 1.3 Structure of the data industry

CHAPTER 2: DATA RESOURCES

Figure 2.1 Evolution of the blog

CHAPTER 3: DATA INDUSTRY CHAIN

Figure 3.1 Data industry chain

Figure 3.2 Evolution of data industry chain

CHAPTER 5: DATA SERVICES IN MULTIPLE DOMAINS

Figure 5.1 Possible congestion contributing factors

CHAPTER 7: BUSINESS MODELS IN THE DATA INDUSTRY

Figure 7.1 Business model building blocks: 4 pillars and 9 main elements. Adapted from [58]

Figure 7.2 Ways of value creation and acquisition. Adapted from [61]

CHAPTER 9: ENTERPRISE AGGLOMERATION OF THE DATA INDUSTRY

Figure 9.1 Directive agglomeration. The dashed outline shows a location area with certain resources, the × symbol shows the product-specific interactions, and the triangles the locations of the enterprises

CHAPTER 10: CLUSTER EFFECTS OF THE DATA INDUSTRY

Figure 10.1 Measurement scale of an enterprise's efficiency

Figure 10.2 Coopetition between data industry and traditional industry players

THE DATA INDUSTRY: THE BUSINESS AND ECONOMICS OF INFORMATION AND BIG DATA

CHUNLEI TANG

Copyright © 2016 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Names: Tang, Chunlei, author.

Title: The data industry : the business and economics of information and big data / Chunlei Tang.

Description: Hoboken, New Jersey : John Wiley & Sons, 2016. | Includes bibliographical references and index.

Identifiers: LCCN 2015044573 (print) | LCCN 2016006245 (ebook) | ISBN 9781119138402 (cloth) | ISBN 9781119138419 (pdf) | ISBN 9781119138426 (epub)

Subjects: LCSH: Information technology–Economic aspects. | Big data–Economic aspects.

Classification: LCC HC79.I55 T36 2016 (print) | LCC HC79.I55 (ebook) | DDC 338.4/70057–dc23

LC record available at http://lccn.loc.gov/2015044573

BIBLIOGRAPHY

The data industry is a reversal, derivation, and upgrading of the information industry that touches nearly every aspect of modern life. This book is written to introduce this new industry to the field of economics, and it is among the first books on this topic. The data industry ranges widely: any domain (or field) can be called a “data industry” if it has one fundamental feature, the use of data technologies. This book (1) explains data resources; (2) introduces the data asset; (3) defines a data industry chain; (4) enumerates data enterprises' business models and operating models, and proposes a mode of industrial development for the data industry; (5) describes five types of enterprise agglomeration and multiple industrial cluster effects; and (6) discusses the establishment and development of laws and regulations related to the data industry.

DEDICATION

To my parents, for their tireless support and love

To my mentors, for their unquestioning support as I moved forward in my own way

ENDORSEMENTS

“I have no doubt that data will become a fundamental resource, integrated into every fiber of our society. The data industry will produce incredible value in the future. Dr. Tang, a gifted young scientist in this field, gives a most up-to-date and systematic account of the fast-growing data industry. A must-read for any practitioner in this area.”

Chen, Yixin Ph.D., Full Professor of Computer Science and Engineering, Washington University in St Louis

“Data is a resource whose value can only be realized when analyzed effectively. Understanding what our data can tell us will help organizations lead successfully and accelerate business transformation. This book brings new insights into how best to optimize our learning from data, so critical to meeting the challenges of the future.”

Volk, Lynn A., MHS, Associate Director, Clinical and Quality Analysis, Information Services, Partners HealthCare

PREFACE

In late 2009, my doctoral advisor, Dr. Yangyong Zhu of Fudan University, published his book Datalogy and sent me a copy as a gift. On the title page he wrote: “Every domain will be implicated in the development of data science theory and methodology, which definitely is becoming an emerging industry.” For months I pondered the meaning of these words before I felt able to discuss them with him. As I had expected, he meant to encourage me to think deeply about this area and to plan a future career that combined my work experience with my doctoral training in data science.

Ever since then, I have been thinking about this interdisciplinary problem. It took me a couple of years to collect my thoughts, and an additional year to write them down in the form of a book. I chose to put “data industry” in the book's title to convey both the resource nature and the technological character of “data.” That manuscript was published in Chinese in 2013 by Fudan University Press. With the title The Data Industry, I also wanted to clarify the essence of this new industry, which expands on the theory and concepts of data science, supports the frontier development of multiple scientific disciplines, and explains the natural correlation between data industrial clusters and present-day socioeconomic developments.

With the book now published, I intend to begin my journey into healthcare, with the ultimate goal of achieving the best experience for all in healthcare through big data analytics. To date, healthcare has been a major battleground for data innovations that aim to improve collective human health experiences. In my postdoctoral research at Harvard, I work with Dr. David W. Bates, an internationally renowned expert on innovation science in healthcare. My focus is on commercialization-oriented healthcare services, and this has led to my engagement in several activities, including compiling materials on healthcare big data, proposing an Allergy Screener app, and designing a workout app for promoting bone health in children. However, a gap still exists between data technology push and medical application pull. At present, many clinicians consider the commercialization of healthcare data applications irrelevant and do not know how to translate research into commercial technology, despite the fact that “big data” is at the peak of inflated expectations in Gartner's Hype Cycles. To help bridge this gap, I plan to rewrite my book in English, mainly to respond to many shifting opinions, my own included.

Data science is an application-oriented technology, as its development is driven by the needs of other domains (e.g., finance, retail, manufacturing, medicine). Rather than replacing a specific area, data science serves as a foundation for improving and refining that area's performance. Data technologies have two basic strengths: one is their ability to promote efficiency and increase profit in existing industrial systems; the other is their ability to identify hidden patterns and trends that cannot be found using traditional analytic tools, human experience, or intuition. Findings drawn from data, when combined with human experience and rationality, are usually less influenced by prejudice. In my forthcoming book, I will discuss several scenarios on how to convert data-driven forces into productivity that can serve society.

Several colleagues have helped me in writing and revising this book, and have contributed to the formation of my viewpoints. I want to extend my special thanks to them for their valuable advice. Indeed, they are not just colleagues but dear friends: Yajun Huang, Xiaojia Yu, Joseph M. Plasek, and Changzheng Yuan.

CHAPTER 1: WHAT IS DATA INDUSTRY?

The next generation of information technology (IT) is an emerging and promising industry. But, what's truly the “next generation of IT”? Is it the next generation mobile networks (NGMN), Internet of Things (IoT), high-performance computing (HPC), or is it something else entirely? Opinions vary widely.

From the academic perspective, the debates over specific and sophisticated technical concepts are merely hype. Why? Consider the essence of information technology reform (IT reform): digitization. Technically, digitization is a process that takes “information” generated in the real world by the human mind and stores it in digital form, as “data,” in cyberspace. No matter what new technologies emerge, data remains at the core. As the Oxford scholar Viktor Mayer-Schönberger once said [1], it's time to focus on the “I” in IT reform. The “I,” information, can only be obtained by analyzing data. Because the challenge we face is a bursting “data tsunami,” or “data explosion,” a data reform is already underway. The world of “being digital,” advocated some time ago by Nicholas Negroponte [2], has gradually been transformed into one of “being in cyberspace.”1

With the “big data wave” touching nearly all human activities, not only is the academic community resolved to change the way it explores the world, through the “fourth paradigm,”2 but the industrial community is also looking forward to profiting from “inexhaustible” data innovations. Admittedly, it is not difficult to predict that the emerging data industry will become a strategic industry in the near future. So the initiative is ours to seize, and we encourage the enterprising individual who seeks creative destruction through a business startup, or who wants to revamp a traditional industry to secure its survival. We ask the reader to follow us, if only for a cursory glimpse, into the emerging big data industry, which handily demonstrates the properties of all four categories in the Fisher–Clark classification: the resource property of primary industry, the manufacturing property of secondary industry, the service property of tertiary industry, and the “increasing profits of other industries” property of quaternary industry.

At present, industrial transformation and the emerging data industry business pose big challenges for most IT giants. Both business magnate Warren Buffett and financial wizard George Soros are bullish that such transformations will happen. For example,3 after IBM shifted its business model toward “big data,” Buffett and Soros increased their holdings in IBM in 2012 by 5.5% and 11%, respectively.

1.1 DATA

Scientists who attempt to uncover the mysteries of humankind are usually interested in intelligence. For instance, Sir Francis Galton,4 the founder of differential psychology, tried to evaluate human intelligence by measuring a subject's physical performance and sense perception. In 1971, another psychologist, Raymond Cattell, was acclaimed for establishing the theories of Crystallized Intelligence and Fluid Intelligence, which divide general intelligence into two components [3]. Crystallized Intelligence describes “the ability to use skills, knowledge, and experience”5 acquired through education and previous experience, and it improves as a person ages. Fluid Intelligence is the biological capacity “to think logically and solve problems in novel situations, independently of acquired knowledge.”5

The primary objective of twentieth-century IT reform was to endow the computing machine with “intelligence,” “brainpower,” and, in effect, “wisdom.” This all started back in 1946, when John von Neumann, while supervising the building of the ENIAC (Electronic Numerical Integrator and Computer), observed several important differences between the functioning of the computer and the human mind (such as processing speed and parallelism) [4]. Like the human mind, the machine used a “storage device” to save data and a “binary system” to organize data. By this analogy, the complexities of the machine's “memory” and “comprehension” could be worked out.

What, then, is data? Data is often regarded as the potential source of factual information or scientific knowledge, and it is physically stored in bytes (a unit of measurement). Data is a “discrete and objective” factual description related to an event; it can consist of atomic data, data items, data objects, and data sets (collections of data) [5]. Metadata, simply put, is data that describes data. Data that processes data, such as a program or software, is known as a data tool. A data set is a collection of data objects, a data object is an assembly of data items, a data item consists of a quantity of atomic data, and atomic data represents the lowest level of detail in all computer systems. A data item describes a characteristic of a data object (naming it and defining its data type) but has no independent meaning. A data object goes by other names [6] (record, point, vector, pattern, case, sample, observation, entity, etc.) and is described by a number of attributes (also called variables, features, fields, or dimensions) that capture its basic characteristics.
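The hierarchy just described (atomic data inside data items, data items inside data objects, data objects inside a data set, plus metadata and a data tool) can be sketched with Python dataclasses. This is a minimal illustration under our own assumptions: the class names, the `build_metadata` helper, and the attribute names `age` and `blood_type` are hypothetical and do not come from any standard.

```python
from dataclasses import dataclass, field
from typing import Any

# Atomic data: the lowest level of detail, modeled here as a single Python value.
AtomicData = Any

@dataclass
class DataItem:
    """Names and types one characteristic of a data object; no independent meaning."""
    name: str          # hypothetical attribute name, e.g. "age"
    dtype: type        # the data type this item declares
    value: AtomicData

    def __post_init__(self) -> None:
        # Enforce the declared data type of the item.
        if not isinstance(self.value, self.dtype):
            raise TypeError(f"{self.name} expects {self.dtype.__name__}")

@dataclass
class DataObject:
    """A record / point / case / observation assembled from data items."""
    items: list[DataItem]

    def attribute_names(self) -> list[str]:
        return [item.name for item in self.items]

@dataclass
class DataSet:
    """A collection of data objects plus metadata (data that describes data)."""
    objects: list[DataObject]
    metadata: dict[str, Any] = field(default_factory=dict)

def build_metadata(ds: DataSet) -> dict[str, Any]:
    """A tiny 'data tool': a program that processes data to describe it."""
    names = sorted({n for obj in ds.objects for n in obj.attribute_names()})
    return {"num_objects": len(ds.objects), "attributes": names}

# Usage: one data object with two data items.
record = DataObject([DataItem("age", int, 42), DataItem("blood_type", str, "O+")])
ds = DataSet([record])
ds.metadata = build_metadata(ds)
print(ds.metadata)
```

Keeping the metadata as a separate field, filled in by a function, mirrors the text's distinction between data, data about data, and programs that process data.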

1.1.1 Data Resources

Reaping the benefits of Moore's law, advances in mass storage are generally credited for the drop in cost per megabyte from US$6,000 in 1955 to less than 1 cent in 2010, and this vast change in storage capacity has made big data storage feasible.
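The size of this price drop is easier to grasp as an average annual rate. The sketch below takes the two figures from the text and treats “less than 1 cent” as exactly US$0.01, so the results are lower bounds on how fast storage became cheaper; the constant-rate assumption is ours, not the source's.

```python
import math

# Figures quoted in the text: about US$6,000 per megabyte in 1955 and
# "less than 1 cent" in 2010 (assumed here to be exactly US$0.01).
cost_1955 = 6000.0
cost_2010 = 0.01
years = 2010 - 1955

fold_drop = cost_1955 / cost_2010                      # how many times cheaper
annual_factor = (cost_2010 / cost_1955) ** (1 / years)  # constant yearly multiplier
annual_decline = 1 - annual_factor                     # average fractional drop per year
halving_time = math.log(2) / -math.log(annual_factor)  # years for the cost to halve

print(f"{fold_drop:,.0f}-fold drop")
print(f"{annual_decline:.1%} average decline per year")
print(f"cost halves roughly every {halving_time:.1f} years")
```

Under these assumptions, the cost fell about 600,000-fold, or a bit more than 20% per year, halving roughly every three years.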

Moreover, data is now being generated at a sharply increasing rate. Even data that was handwritten decades ago is being collected and stored with new tools. To make data size easy to express, standard units of storage are used: kilobyte (KB), megabyte (MB), gigabyte (GB), terabyte (TB), petabyte (PB), exabyte (EB), zettabyte (ZB), and yottabyte (YB), each 1,000 times the previous one in the decimal (SI) convention. Informal names for even larger units, such as nonabyte (NB), doggabyte (DB), and coydonbyte (CB), have also circulated, although they are not part of any recognized standard.
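Converting between these units is a matter of repeated factors of 1,000. The sketch below is a minimal helper under the decimal (SI) convention only; it deliberately leaves out the binary units (KiB, MiB, and so on, based on 1,024) and the informal names beyond yottabyte. The function names are our own.

```python
# Standard decimal (SI) byte units; each step is a factor of 1,000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest decimal unit that keeps the value >= 1."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} {UNITS[-1]}"  # anything larger stays in yottabytes

def to_bytes(value: float, unit: str) -> float:
    """Convert a value expressed in one of the decimal units back to bytes."""
    return value * 1000 ** UNITS.index(unit)

# Usage with figures that appear in this chapter.
print(human_readable(295 * 10**18))       # global data estimate of 295 EB
print(human_readable(to_bytes(15, "TB")))  # Library of Congress, 15 TB
```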

To put this in perspective, a special report in The Economist, “All too much: monstrous amounts of data”6 (February 2010), offers an ingenious description of the magnitude of these storage units. For instance, “a kilobyte can hold about half of a page of text, while a megabyte holds about 500 pages of text.”7 On a larger scale, the data in the US Library of Congress amounts to 15 TB. And if 1 ZB of 5 MB songs stored in MP3 format were played nonstop at a rate of 1 MB per minute, it would take 1.9 billion years to finish the playlist.
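The playlist figure can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 ZB = 10^21 bytes, 1 MB = 10^6 bytes) and a 365.25-day year. Playback time depends only on the total volume and the 1 MB-per-minute rate; the 5 MB song size only determines how many songs the playlist holds.

```python
# Decimal units, as in the Economist comparison (an assumption for this sketch).
ZETTABYTE = 10**21
MEGABYTE = 10**6
TERABYTE = 10**12

MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumed year length (Julian year)

minutes_of_music = ZETTABYTE / MEGABYTE           # playback at 1 MB per minute
years_of_music = minutes_of_music / MINUTES_PER_YEAR
songs = ZETTABYTE // (5 * MEGABYTE)               # 5 MB per song
libraries_of_congress = ZETTABYTE / (15 * TERABYTE)

print(f"{years_of_music / 1e9:.1f} billion years")  # 1.9 billion years
print(f"{songs:.2e} songs")
print(f"{libraries_of_congress:.2e} Library of Congress collections")
```

The same zettabyte would hold roughly 67 million copies of the 15 TB Library of Congress collection.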

A study by Martin Hilbert of the University of Southern California and Priscila López of the Open University of Catalonia at Santiago provides another interesting observation: “the total amount of global data is 295 EB” [7]. In a related line of research, the data storage giant EMC sponsored a series of “Explore the Digital Universe” market surveys by the well-known firm IDC (International Data Corporation). These surveys, published from 2007 to 2011, were titled “The Diverse and Exploding Digital Universe,” “The Expanding Digital Universe: A Forecast of Worldwide Information,” “As the Economy Contracts, The Digital Universe Expands,” “A Digital Universe – Are You Ready?” and “Extracting Value from Chaos.”

The 2009 report estimated the scale of data for that year and pointed out that, despite the Great Recession, total data had increased by 62% over 2008, approaching 0.8 ZB. The report forecast that total data in 2010 would grow to 1.2 ZB. The 2010 report forecast that total data in 2020 would be 44 times that of 2009, amounting to 35 ZB. Moreover, the number of data objects was expected to grow even faster than the total volume of data. The 2011 report brought us to the unsettling conclusion that we have reached a stage where we need new data tools to handle big data, which is sure to change our lifestyles completely.
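The IDC figures quoted above are consistent with one another, and they imply a steep compound growth rate. The sketch below derives the implied 2008 volume, the 2020 forecast, and the compound annual growth rate (CAGR) from 2009 to 2020. It uses only the numbers in the text and assumes smooth exponential growth between the endpoints, which the reports themselves do not claim.

```python
# Figures from the IDC reports as quoted in the text (zettabytes).
total_2009 = 0.8            # "approaching 0.8 ZB"
growth_over_2008 = 0.62     # +62% compared to 2008
forecast_2010 = 1.2
multiple_2020 = 44          # 2020 volume = 44 x 2009 volume

implied_2008 = total_2009 / (1 + growth_over_2008)
implied_2020 = total_2009 * multiple_2020
growth_2010 = forecast_2010 / total_2009 - 1
# Compound annual growth rate over the 11 years from 2009 to 2020.
cagr_2009_2020 = multiple_2020 ** (1 / (2020 - 2009)) - 1

print(f"implied 2008 volume: {implied_2008:.2f} ZB")
print(f"implied 2020 volume: {implied_2020:.1f} ZB")
print(f"forecast growth in 2010: {growth_2010:.0%}")
print(f"implied CAGR 2009-2020: {cagr_2009_2020:.1%}")
```

Multiplying 0.8 ZB by 44 gives about 35 ZB, matching the report, and a 44-fold increase over 11 years corresponds to growth of roughly 41% per year.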

When logically connected data organizations, and data areas assembled from huge volumes of data, reach a “certain scale,” these massive, heterogeneous data sets become “data resources” [5]. Data resources can be among the most vital modern strategic resources for humanity, possibly exceeding, in the twenty-first century, the combined resources of oil, coal, and mineral products. The reason is that all human activities today, without exception, generate and rely on data, including the exploration, exploitation, transportation, processing, and sale of petroleum, coal, and mineral products.

Today, data resources are generated and stored in many different scientific disciplines, such as astronomy, geography, geochemistry, geology, oceanography, meteorology, biology, and medical science. Moreover, various large-scale transnational collaborative experiments continuously provide big data that can be captured, stored, communicated, aggregated, and analyzed, such as CERN's LHC (Large Hadron Collider),8 the American Pan-STARRS (Panoramic Survey Telescope and Rapid Response System),9 the Australian radio telescope SKA (Square Kilometre Array),10 and the INSDC (International Nucleotide Sequence Database Collaboration).11 The INSDC's mission is to capture, preserve, and present globally comprehensive public domain biological data. In economic areas, data resources include those constructed by financial organizations, as well as economic data, social behavior data, personal identity data, and Internet data, namely the data generated by social networking services, electronic commerce, online games, email, and instant messaging tools.

1.1.2 The Data Asset

As defined in academe, a standard asset has four characteristics: (1) it should have unexpired value, (2) it should be a debit balance, (3) it should be an economic resource, and (4) it should have future economic benefits. The US Financial Accounting Standards Board expands on this definition: “[assets are] probable future economic benefits obtained or controlled by a particular entity as a result of past transactions or events.”12 Basically, by this definition, assets have two properties: (1) an economic property, in that an asset must be able to produce an economic benefit, and (2) a legal property, in that an asset must be controllable.

It is now commonly understood that the intellectual asset, one of the three key components13 of intellectual capital, is a “special asset.” This understanding rests on the concept of intellectual capital introduced in 1969 by John Kenneth Galbraith, an institutional economist of the Keynesian school, and later developed by Annie Brooking [8], Thomas Stewart [9], and Patrick Sullivan [10]. In more recent years, the concept of the intellectual asset was further refined in a stepwise process: by the British business theorist Max Boisot, who theorized the “knowledge asset” (1999) [11]; by George Stigler of the Chicago School of Economics, who added the “information asset” (2003) [12]; and by DataFlux CEO Tony Fisher, who proposed a “data asset” specification process (2009) [13]. This progression closely follows the DIKW (data, information, knowledge, and wisdom) pyramid shown in Figure 1.1.

Figure 1.1 DIKW pyramid. Reproduced by permission of Gene Bellinger

According to the ISO 27001:2005 standard, data assets are an important component of information assets and include source code, applications, development tools, operational software, database information, technical proposals and reports, job records, configuration files, topological graphs, system message lists, and statistical data.

We therefore treat the data asset in the broadest sense of the term. That is, we redefine a data asset as data exceeding a certain scale that is owned or controlled by a specific agent, collected from the agent's past transactions involving information processes, and capable of bringing future economic benefits to the agent.
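The redefinition above combines four conditions: scale, ownership or control (the legal property), origin in the agent's past transactions, and expected future benefit (the economic property). A minimal sketch can express it as a predicate. The field names are hypothetical, and because the text only says “a certain scale,” the threshold is left as a parameter that the caller must choose.

```python
from dataclasses import dataclass

@dataclass
class DataHolding:
    # Hypothetical field names chosen to mirror the definition in the text.
    size_bytes: int                  # how much data the agent holds
    owned_or_controlled: bool        # legal property: the agent controls the data
    from_past_transactions: bool     # collected from the agent's own information processes
    expected_future_benefit: float   # economic property, e.g. an estimated value in US$

def is_data_asset(h: DataHolding, scale_threshold_bytes: int) -> bool:
    """Return True only if all four conditions of the broad definition hold.

    The scale threshold is an assumption supplied by the caller,
    since the definition does not fix a number.
    """
    return (
        h.size_bytes > scale_threshold_bytes
        and h.owned_or_controlled
        and h.from_past_transactions
        and h.expected_future_benefit > 0
    )

# Usage: an enterprise's own transaction logs versus data it merely scraped.
own_logs = DataHolding(5 * 10**12, True, True, 250_000.0)
scraped = DataHolding(5 * 10**12, False, False, 250_000.0)
print(is_data_asset(own_logs, scale_threshold_bytes=10**12))
print(is_data_asset(scraped, scale_threshold_bytes=10**12))
```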

According to Fisher's book The Data Asset, how well an enterprise manages its data assets may determine its competitive advantage, helping it mitigate risk, control cost, optimize revenue, and increase business capacity, as shown in Figure 1.2. Accordingly, data asset management should follow the data closely throughout its life cycle, from discovery through design, delivery, and support to archiving.

Figure 1.2 Advantages of managing data assets. Reproduced by permission of Wiley [13]
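The life cycle named in Fisher's account (discovery, design, delivery, support, archive) can be modeled as an ordered sequence of stages. This sketch makes a simplifying assumption of our own: data moves strictly forward one stage at a time, and the archive stage is terminal. Real data management may loop back (for example, redesigning a delivered data set).

```python
from enum import Enum

class Stage(Enum):
    # Stage names follow the life cycle listed in the text.
    DISCOVERY = 1
    DESIGN = 2
    DELIVERY = 3
    SUPPORT = 4
    ARCHIVE = 5

def next_stage(stage: Stage) -> Stage:
    """Advance one step along the life cycle; ARCHIVE is terminal."""
    if stage is Stage.ARCHIVE:
        raise ValueError("archived data has no further stage")
    return Stage(stage.value + 1)

def lifecycle_path(start: Stage = Stage.DISCOVERY) -> list[Stage]:
    """List every stage from `start` through ARCHIVE."""
    path = [start]
    while path[-1] is not Stage.ARCHIVE:
        path.append(next_stage(path[-1]))
    return path

print([s.name for s in lifecycle_path()])
```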

Our view14 is that the primary value of data assets lies in people's willingness to use data for some purpose, as reflected in human activities arising from data ownership or data application. In a sense, data ownership, which identifies the rightful owner of data assets, depends on the “granularity of data items.” Here is a brief clinical example of how to determine data ownership. Diagnostic records are associated with (1) the patient's disease status, in terms of disease activity, disease progression, and prognosis, and (2) the physician's medical experience with symptoms, diagnosis, and treatment. Strictly speaking, the patient and the physician are both data owners of diagnostic records. However, we can reduce the diagnostic records to the patient's disease status, that is, reduce their granularity, so that only the patient holds data ownership of the diagnostic records.
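The clinical example can be expressed as a small projection over a record's data items. In this sketch, which is only an illustration and not a clinical data model, each field is tagged with the party whose information it records. The owners of a record are the owners of its fields, and restricting the record to the patient's disease status leaves a single owner. All field names and values are hypothetical.

```python
# Hypothetical field names, tagged with the party whose information they record,
# following the two groups in the clinical example.
FIELD_OWNER = {
    "disease_activity": "patient",
    "disease_progression": "patient",
    "prognosis": "patient",
    "symptom_assessment": "physician",
    "diagnosis": "physician",
    "treatment": "physician",
}

def owners(record: dict) -> set[str]:
    """The data owners of a record are the owners of every field it contains."""
    return {FIELD_OWNER[f] for f in record}

def restrict_to_owner(record: dict, owner: str) -> dict:
    """Keep only the data items belonging to one owner (a coarser record)."""
    return {f: v for f, v in record.items() if FIELD_OWNER[f] == owner}

record = {
    "disease_activity": "moderate",
    "prognosis": "good",
    "diagnosis": "asthma",
    "treatment": "inhaled corticosteroid",
}
print(sorted(owners(record)))                                # both parties
print(sorted(owners(restrict_to_owner(record, "patient"))))  # patient only
```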

1.2 INDUSTRY

The division of labor, discussed in An Inquiry into the Nature and Causes of the Wealth of Nations (1776), one of Adam Smith's two classic works, is generally recognized as the foundation of industry [14], the industry cluster, and other industrial schemes.

Industry is the inevitable outcome of the social division of labor. It was spawned by scientific and technological progress and by the market economy. Industry is in fact a generic term for a market composed of various businesses having interrelated benefits and related divisions of labor.

1.2.1 Industry Classification

In economics, classification is usually the starting point and the foundation of research for industries. Industries can be classified in various ways:

By Economic Activity

Primary industry refers to all the resource industries dealing with “the extraction of resources directly from the Earth,” secondary industry to industries involved in “the processing products from primary industries,” tertiary industry to all service industries, and quaternary industry to industries that can significantly increase the industrial profits of other industries. The classification of tertiary industries is due to Fisher (1935), and the classification of quaternary industries is due to Clark (1940).

By Level of Industrial Activity

There are three levels: use of similar products, as differentiated by an “industrial organization”; use of similar technologies or processes, as differentiated by an “industrial linkage”; and use of similar economic activities, as differentiated by an “industrial structure.”

By a System of Standards

For international classification standards, we have the North American Industry Classification System (NAICS), the International Standard Industrial Classification of All Economic Activities (ISIC), and so forth.

Of course, industries can be further identified by products, such as the chemical industry, petroleum industry, automotive industry, electronic industry, meatpacking industry, hospitality industry, food industry, fish industry, software industry, paper industry, entertainment industry, and semiconductor industry.
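Classification systems such as NAICS encode their hierarchy in the code itself: a 2-digit sector, 3-digit subsector, 4-digit industry group, 5-digit NAICS industry, and 6-digit national industry. The sketch below splits a code into those levels. It includes only a partial table of sector names. The mapping from NAICS sectors onto the Fisher–Clark economic-activity categories is an illustrative assumption of ours, not part of NAICS.

```python
# Partial table of NAICS two-digit sectors (the full table has more entries).
NAICS_SECTORS = {
    "11": "Agriculture, Forestry, Fishing and Hunting",
    "21": "Mining, Quarrying, and Oil and Gas Extraction",
    "31": "Manufacturing",
    "32": "Manufacturing",
    "33": "Manufacturing",
    "51": "Information",
    "52": "Finance and Insurance",
}

# Illustrative assumption: placing these sectors in the Fisher-Clark categories.
FISHER_CLARK = {
    "Agriculture, Forestry, Fishing and Hunting": "primary",
    "Mining, Quarrying, and Oil and Gas Extraction": "primary",
    "Manufacturing": "secondary",
    "Information": "tertiary",
    "Finance and Insurance": "tertiary",
}

# NAICS level names indexed by code length.
LEVELS = {2: "sector", 3: "subsector", 4: "industry group",
          5: "NAICS industry", 6: "national industry"}

def naics_hierarchy(code: str) -> dict[str, str]:
    """Return every enclosing level of a NAICS code, keyed by level name."""
    if not (code.isdigit() and 2 <= len(code) <= 6):
        raise ValueError(f"not a 2- to 6-digit NAICS code: {code!r}")
    return {LEVELS[n]: code[:n] for n in range(2, len(code) + 1)}

def classify(code: str) -> tuple[str, str]:
    """Return (NAICS sector name, Fisher-Clark category) for a code."""
    sector = NAICS_SECTORS.get(code[:2], "not in this partial table")
    return sector, FISHER_CLARK.get(sector, "unclassified")

print(naics_hierarchy("518210"))
print(classify("518210"))
```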

1.2.2 The Modern Industrial System

Computational optimization, modeling, and simulation, as a paradigm, not only drove IT reform in the information industry but also blurred technology boundaries, as new trends such as software as a service, embedded software, and integrated networks were added to the industry. In this way, IT reform atomized traditional industries and transformed their modes of operation, prompting the birth of a new industrial system. The industries in this modern industrial system include, but are not limited to, the knowledge economy, high-technology industry, information industry, creative industries, cultural industries, and wisdom industry.

Knowledge Economy

The “knowledge economy” is a term introduced by Austrian economist Fritz Machlup of Princeton University in his book The Production and Distribution of Knowledge in the United States (1962). It is a general category that groups education, research and development (R&D), and information service industries, while excluding “knowledge-intensive manufacturing,” within “an economy directly based on the production, distribution, and use of knowledge and information,” in accord with the 1997 definition by the OECD (Organization for Economic Co-operation and Development).

High-Technology Industry

The high-technology industry is a derivative of the knowledge economy that uses “R&D intensity” and “percentage of R&D employees” as its classification criteria. Its main fields are information, biology, new materials, aerospace, nuclear, and ocean technologies, and it is characterized by (1) high demand for scientific research and high intensity of R&D expenditure, (2) a high level of innovativeness, (3) fast diffusion of technological innovations, (4) rapid obsolescence of products and technologies, (5) a high level of employment of scientific and technical personnel, (6) high capital expenditure and a high turnover rate of technical equipment, (7) high investment risk and rapid devaluation of investments, (8) intense strategic domestic and international cooperation with other high-technology enterprises and with scientific and research centers, (9) the embodiment of technical knowledge in numerous patents and licenses, and (10) increasing competition in international trade.
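The two classification criteria can be made concrete with a small calculation. The sketch below computes R&D intensity (taken here as R&D expenditure divided by gross output) and the share of R&D employees, then flags an industry as high-technology when both meet a threshold. The threshold values, the "both criteria" rule, and the sample figures are hypothetical assumptions for illustration, not official cut-offs.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; real statistical
# classifications define their own cut-offs and weighting.
RD_INTENSITY_THRESHOLD = 0.05  # R&D expenditure / gross output
RD_STAFF_THRESHOLD = 0.10      # R&D employees / total employees


@dataclass
class Industry:
    name: str
    output: float          # gross output, in any currency unit
    rd_expenditure: float  # R&D spending, same unit as output
    employees: int
    rd_employees: int

    @property
    def rd_intensity(self) -> float:
        """R&D expenditure as a share of output."""
        return self.rd_expenditure / self.output

    @property
    def rd_staff_share(self) -> float:
        """R&D employees as a share of all employees."""
        return self.rd_employees / self.employees


def is_high_tech(ind: Industry) -> bool:
    """Assumed rule: high-technology only if the industry meets both criteria."""
    return (ind.rd_intensity >= RD_INTENSITY_THRESHOLD
            and ind.rd_staff_share >= RD_STAFF_THRESHOLD)


if __name__ == "__main__":
    samples = [
        Industry("aerospace (hypothetical figures)", 1000.0, 120.0, 5000, 900),
        Industry("meatpacking (hypothetical figures)", 1000.0, 5.0, 8000, 40),
    ]
    for ind in samples:
        print(f"{ind.name}: R&D intensity {ind.rd_intensity:.1%}, "
              f"R&D staff {ind.rd_staff_share:.1%}, high-tech: {is_high_tech(ind)}")
```

Requiring both criteria is only one possible design; a classification could instead weight them or use either one alone.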

Information Industry

The “information industry” concept was developed in the 1970s and is also associated with the pioneering efforts of Machlup. In 1977 it was advanced by Marc Uri Porat [15], who estimated that by 1960 the predominant occupational sector was engaged in information work, and who established what became known as Porat's measurements. The North American Industry Classification System (NAICS) recognized the information industry as an independent sector in 1997. According to NAICS, the information industry comprises establishments engaged in “(1) producing and distributing information and cultural products,” “(2) providing the means to transmit or distribute these products as well as data or communications,” and “(3) processing data.”
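An occupational measurement in the spirit of Porat's approach can be sketched as follows: tag each occupation with a broad sector, sum employment per sector, and report each sector's share and the predominant one. The occupations, sector labels, and worker counts are hypothetical, and the coding is far coarser than Porat's own. His work also measured the information sector's contribution to national output, which this sketch does not attempt.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical workforce data: (occupation, sector label, number of workers).
# Sector labels are illustrative; Porat's occupational coding was far more detailed.
WORKFORCE: List[Tuple[str, str, int]] = [
    ("clerk", "information", 900),
    ("teacher", "information", 300),
    ("programmer", "information", 200),
    ("farm worker", "agriculture", 400),
    ("machine operator", "industry", 700),
    ("cook", "service", 500),
]


def sector_shares(workforce: List[Tuple[str, str, int]]) -> Dict[str, float]:
    """Return each sector's share of total employment."""
    totals: Counter = Counter()
    for _occupation, sector, workers in workforce:
        totals[sector] += workers
    grand_total = sum(totals.values())
    return {sector: count / grand_total for sector, count in totals.items()}


def predominant_sector(workforce: List[Tuple[str, str, int]]) -> str:
    """Return the sector that employs the most workers."""
    shares = sector_shares(workforce)
    return max(shares, key=shares.get)


if __name__ == "__main__":
    for sector, share in sorted(sector_shares(WORKFORCE).items()):
        print(f"{sector:12s} {share:.1%}")
    print("predominant:", predominant_sector(WORKFORCE))
```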

Creative Industries

Paul Romer, an endogenous growth theorist, suggested in 1986 that countless new derived products, new markets, and new opportunities for wealth creation [16] could lead to the creation of new industries. Although Australia put forward the concept of a “creative nation” in 1994, Britain was the first to give the “creative industries” concrete form, establishing them as a new strategic industry with the support of national policy. The UK Department for Culture, Media and Sport (DCMS) Creative Industries Mapping Document (1998) defines the creative industries as those “which have their origin in individual creativity, skill and talent and which have a potential for wealth and job creation through the generation and exploitation of intellectual property.” The concept quickly swept the globe, spreading from London to New York, Tokyo, Paris, Singapore, Beijing, Shanghai, and Hong Kong.

Cultural Industries

The notion of a culture industry can be credited to the popularity of mass culture. The term “culture industry” was coined by the critical theorists Max Horkheimer and Theodor Adorno. In the post-industrial age, material overproduction likewise influenced culture, to the extent that the monopoly of traditional personal creation was broken. To criticize such “logic of domination in post-enlightenment modern society by monopoly capitalism or the nation state,” Horkheimer and Adorno argued that “in attempting to realise enlightenment values of reason and order, the holistic power of the individual is undermined.”15 Walter Benjamin, an eclectic thinker also associated with the Frankfurt School, took the opposite view. He regarded culture as shaped by “technological advancements in art.” The divergence between these views reflects the movement of culture “from elites to the common people” or “from religious to secular,” and it was such debates that accelerated the industrialization of culture into the “cultural industry.” In the 1960s, the Council of Europe and UNESCO (United Nations Educational, Scientific and Cultural Organization) changed “industry” to the plural form “industries” to denote a type of industrial economy in a broader sense.
In 1993, UNESCO revised the 1986 cultural statistics framework and defined the cultural industries as “those industries which produce tangible or intangible artistic and creative outputs, and which have a potential for wealth creation and income generation through the exploitation of cultural assets and production of knowledge-based goods and services (both traditional and contemporary).” Additionally, what cultural industries “have in common is that they all use creativity, cultural knowledge, and intellectual property to produce products and services with social and cultural meaning.” The cultural industries therefore include cultural heritage, publishing and printing, literature, music, performance art, visual arts, new digital media, sociocultural activities, sports and games, environment, and nature.

Wisdom Industry

Taking the lead in exalting “wisdom” in a commercial sense, IBM has been a vital player in building a “Smarter Planet” (2008). IBM had previously advanced two other such commercial hypes: “e-Business” in 1996 and “e-Business on Demand” in 2002. As these commercial concepts were expanded in both connotation and denotation, they allowed IBM to explore both the depth and the breadth of the market. With the intensive promotion of Cloud computing and the IoT, hundreds of Chinese second-tier and third-tier cities have now discussed constructing a “Smart City.” In the last couple of years, IBM has won bids for huge projects in Shenyang, Nanjing, and Shenzhen, among other places. To the best of our knowledge, however, the wisdom industry, which has so far appeared only in China, is based on machines, which, we believe, will never be able to possess wisdom, knowledge, or even information without the human input of data and thus data mining.

From these descriptions of related industries, we can see that the cultural industries have a relatively broad interpretation. The United States treats cultural industries as copyright industries in the commercial and legal sense, whereas Japan has shifted to the expression “content industries,” based on the transmission medium. In their inclination to emphasize “intellectual property” over “commoditization,” the wisdom industry, knowledge economy, and information industry (disregarding their order of appearance) outwardly correspond to the upper layers of the DIKW (data, information, knowledge, wisdom) pyramid. The information industry may be further divided into two sectors. The first is the hardware manufacturing sector, which includes equipment manufacturing, optical communication, mobile communication, integrated circuits, display devices, and application electronics. The second is the information component of the services sector, which includes the software industry, network information service (NIS), digital publishing, interactive entertainment, and telecommunications service.16 The wisdom industry, which is essentially commercial hype despite being labeled “an upgraded version of creative industries,” is no more than the use of “human beings” as industrial carriers disguised as “machines.”

1.3 DATA INDUSTRY

From the foregoing description, one could say that the information industry may be simply understood as digitization. Technically, IT is a process that stores the “information” generated in the real world by human minds in digital form, where it accumulates as “data” in cyberspace; in other words, IT is also the process of producing data. Over time, the accumulated data can be sourced from multiple domains and distinct sectors.

The mining of “data resources” to extract useful information already seems “inexhaustible,” as data innovations keep emerging. Thus, effectively endowing all these data innovations with a business model, namely industrialization, calls for us to give this strategic emerging industry, which is strong enough to influence the world economy, a new name: the “data industry.” The data industry is the reversal, derivation, and upgrading of the information industry.

1.3.1 Definitions

Connotation and denotation are two principal ways of describing objects, events, or relationships. Connotation relates to a wide variety of natural associations, whereas denotation consists in a precise description. Here, based on these two types of descriptions, we offer two definitions, in both a wide and a narrow sense, for the data industry.

In a wide sense, the data industry has developed through three technical processes: data preparation, data mining, and visualization. By these means, the data industry connotes the rational development and utilization of data resources, effective management of data assets, breakthrough innovation in data technologies, and direct commoditization of data products. Accordingly, by this definition, existing industrial sectors such as publishing and printing, new digital media, electronic libraries and intelligence, digital content, development of domain-specific data resources, and data services in distinct sectors should be included in the data industry. To these we should add existing data innovations such as web creations, data marketing, push services, price comparison, and disease prevention.

In a narrow sense, the data industry is usually divided into three major components: upstream, midstream, and downstream. In this sense, the data industry denotes data acquisition, data storage, data management, data processing, data mining, data analysis,17 data presentation, and data product pricing, valuation, and trading.

1.3.2 An Industry Structure Study

To understand the profitability of a new industry, one must look at the distinctive structure that shapes the unfolding nature of its competitive interactions. On the surface, the data industry is extremely complex. However, only four connotative factors are associated with it: data resources, data assets, data technologies, and data products. In a nutshell, from a vertical, bottom-up view, the structure of the data industry (as shown in Figure 1.3) can be expressed by (1) the precipitation of data assets, which forms the foundation of the data industry, (2) innovation in data technologies as its core, and (3) the circulation of data products as its means. Theoretically, these three layers rely on data sources via mutually independent units that form underlying substructures, and together they vertically form the entire data industry chain.

Figure 1.3 Structure of the data industry

Technology Substructure

The essence of an industry's technology substructure lies in its conversion technologies. For the data industry, the corresponding field is “data science,” which is “a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics, similar to Knowledge Discovery in Databases (KDD).”18

Peter Naur, a Danish pioneer in computer science and Turing Award winner, coined a new word, datalogy, in 1966 because he disliked the term computer science. Datalogy was subsequently adopted in Denmark, and in Sweden as datalogi. However, Naur's coinage proved less influential than that of William Cleveland of Purdue University, even though Naur was far better known than Cleveland. In 2001, Cleveland proposed a new word combination, data science, as an extension of statistics, and two academic journals using that term, the Data Science Journal and The Journal of Data Science, were launched in 2002 and 2003, respectively. Cleveland's proposal has had an enormous impact over the years: now, whenever or wherever people mention “data analysis,” they first associate it with statistical models. Yet this curious episode did not stop data technologies from evolving.

Returning to the technology substructure, the data industry has developed through the following three steps.

Step 1: Data Preparation.

Much like geological survey and analysis during mineral exploration [5], data preparation determines data quality and guides the selection of subsequent mining. Its methods include (1) judging the availability of a data set (e.g., whether a data source is heterogeneous and whether the data set is accessible); (2) analyzing the physical and logical structure of a data set; and (3) acquiring and integrating metadata.
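The three preparation methods can be sketched for a single tabular source. This minimal example assumes the data set arrives as CSV text; it approximates (1) availability as "there are records and every record matches the header layout" (a stand-in for a real heterogeneity check), (2) structure analysis as a simple per-column type guess, and (3) metadata acquisition as collecting column names, inferred types, record counts, and missing-value counts. The function names `prepare` and `infer_type` are hypothetical.

```python
import csv
import io
from typing import Dict, List


def infer_type(values: List[str]) -> str:
    """Guess a column's logical type from its non-empty string values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue  # at least one value does not fit this type; try the next
    return "text"


def prepare(text: str) -> Dict[str, object]:
    """Apply the three preparation checks to CSV text and return collected metadata."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {"available": False}
    header, body = rows[0], rows[1:]
    # (1) Availability: any records at all, and do all records match the header layout?
    consistent = all(len(row) == len(header) for row in body)
    metadata: Dict[str, object] = {
        "available": bool(body),
        "consistent_layout": consistent,
        "records": len(body),
    }
    if not consistent:
        return metadata  # ragged records: structural analysis would be unreliable
    # (2) Structure analysis and (3) metadata acquisition, column by column
    columns: Dict[str, Dict[str, object]] = {}
    for i, name in enumerate(header):
        values = [row[i] for row in body]
        columns[name] = {
            "type": infer_type(values),
            "missing": sum(1 for v in values if not v.strip()),
        }
    metadata["columns"] = columns
    return metadata


if __name__ == "__main__":
    sample = "id,region,amount\n1,north,10.5\n2,south,\n3,east,7\n"
    print(prepare(sample))
```

Stopping early on ragged records mirrors the idea in the text that preparation decides whether follow-up mining is worthwhile at all.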

Step 2: Data Mining.

As an efficient and scalable tool, data mining draws on ideas [6] from other disciplines. These include (1) query optimization techniques from traditional database technologies, such as indexing, labeling, and join algorithms, which enhance query processing; (2) sampling, estimation, and hypothesis testing from statistics; (3) search algorithms, modeling techniques, and learning theories from artificial intelligence, pattern recognition, and machine learning; and (4) high-performance or parallel computing, optimization, evolutionary computing, information theory, signal processing, and information retrieval from other areas. In general, data mining tasks are divided into two categories [6]: predictive tasks and descriptive tasks. Both kinds of task use massive volumes of data and exploratory rules to discover hidden patterns or trends that cannot be found with traditional analytical tools or by human intuition.
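The two task categories can be illustrated with deliberately tiny examples: counting the support of item pairs (a descriptive task, and the first step of association-rule mining) and a 1-nearest-neighbour classifier (a predictive task). Both are standard textbook techniques chosen here for illustration; the baskets, labels, and function names are hypothetical, and real systems would use the scalable methods listed above.

```python
from collections import Counter
from itertools import combinations
from math import dist
from typing import Dict, List, Sequence, Set, Tuple


def frequent_pairs(transactions: List[Set[str]],
                   min_support: float) -> Dict[Tuple[str, str], float]:
    """Descriptive task: item pairs whose support (share of transactions
    containing both items) is at least min_support."""
    counts: Counter = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def nearest_neighbor(train: List[Tuple[Sequence[float], str]],
                     x: Sequence[float]) -> str:
    """Predictive task: copy the label of the closest training point (Euclidean distance)."""
    _, label = min(train, key=lambda item: dist(item[0], x))
    return label


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "milk", "eggs"},
               {"milk", "eggs"}, {"bread", "butter"}]
    print(frequent_pairs(baskets, min_support=0.5))
    history = [((1.0, 1.0), "low spender"), ((8.0, 9.0), "high spender")]
    print(nearest_neighbor(history, (7.5, 8.0)))
```

The descriptive function summarizes patterns already in the data, whereas the predictive function uses labelled history to assign a value to a new record.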

Step 3: Visualization.

The idea of visualization originated with images created by computer graphics. Exploration in the field of information visualization [17] became popular in the early 1990s, when it was used to help people understand abstract analytic results. Visualization has remained an effective way to support cognitively demanding tasks, and cognitive applications have grown in step with the large heterogeneous data sets in fields such as retail, finance, management, and digital media. Data visualization [18], an emerging term that encompasses both “scientific visualization” and “information visualization,” has gradually been accepted. Its scope has been extended to include the interpretation of data through 3D graphics modeling, image rendering, and animation.
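At its simplest, visualization maps numbers to visual quantities such as length. The sketch below renders a horizontal text bar chart using only the standard library, scaling the largest value to a fixed width. The sector names and figures in the usage example are hypothetical, and the function `bar_chart` is an illustrative helper, not part of any visualization library.

```python
from typing import Dict


def bar_chart(data: Dict[str, float], width: int = 40, mark: str = "#") -> str:
    """Render a horizontal text bar chart; the largest value gets `width` marks."""
    if not data:
        return ""
    peak = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Bar length is proportional to the value relative to the peak.
        length = round(value / peak * width) if peak > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {mark * length} {value:g}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical data volumes by sector, in arbitrary units.
    print(bar_chart({"retail": 120, "finance": 60, "media": 30}, width=20))
```

Production tools add axes, color, and interaction, but they rest on the same proportional mapping from value to visual length.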

Resource Substructure

Data resources have problems similar to those of traditional climate, land, and mineral resources. These include an uneven distribution of resource endowments, a reversed configuration of production and use, and difficulties in development. That is to say, a single property or a combination of properties of a data resource (e.g., diversity, high dimensionality, complexity, and uncertainty) can reflect a specific region's position and priority within a given time frame and thereby directly dictate regional market performance.

The resource substructure of the data industry consists of (1) a resource spatial structure (i.e., the spatial distribution of isomorphic data resources in different regions); (2) a resource type structure (i.e., the spatial distribution of non-isomorphic data resources in the same region); (3) a resource development structure (i.e., the spatial-temporal distribution of data resources approved for development, whether yet to be developed or already developed); (4) a resource utilization structure (i.e., the spatial-temporal distribution of multilevel deep processing of already-developed data resources); and (5) a resource protection structure (i.e., the spatial-temporal distribution of data resources protected for a specific demand or a particular purpose).
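The first two structures can be read as two views of the same inventory table. The sketch below assumes that "isomorphic" data resources can be modeled as resources sharing a type label, and computes (1) the spatial structure, meaning how one resource type is split across regions, and (2) the type structure, meaning how one region's holdings are split across types. The regions, types, and volumes are hypothetical, and the temporal structures (3) to (5) are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical inventory: (region, resource type, volume in TB).
Inventory = List[Tuple[str, str, float]]

SAMPLE: Inventory = [
    ("east", "medical", 40.0),
    ("east", "traffic", 10.0),
    ("west", "medical", 10.0),
    ("west", "satellite", 30.0),
    ("north", "traffic", 20.0),
]


def _shares(totals: Dict[str, float]) -> Dict[str, float]:
    """Normalize totals into shares that sum to 1 (empty if nothing matched)."""
    grand = sum(totals.values())
    return {key: value / grand for key, value in totals.items()} if grand else {}


def spatial_structure(inv: Inventory, resource_type: str) -> Dict[str, float]:
    """(1) Share of one resource type held by each region: same type, different regions."""
    totals: Dict[str, float] = defaultdict(float)
    for region, rtype, volume in inv:
        if rtype == resource_type:
            totals[region] += volume
    return _shares(totals)


def type_structure(inv: Inventory, region: str) -> Dict[str, float]:
    """(2) Share of each resource type within one region: different types, same region."""
    totals: Dict[str, float] = defaultdict(float)
    for reg, rtype, volume in inv:
        if reg == region:
            totals[rtype] += volume
    return _shares(totals)


if __name__ == "__main__":
    print("medical data by region:", spatial_structure(SAMPLE, "medical"))
    print("east region by type:", type_structure(SAMPLE, "east"))
```

A skewed spatial structure, such as one region holding most of a resource type, is one simple way to see the uneven distribution of endowments mentioned in the preceding paragraph.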

Sector Substructure

The sector substructure of the data industry is based on the relationships among various data products, arising from their commonality and individuality in the processes of production, circulation, distribution, and consumption.

As with the information industry, the sub-industries of the data industry can be divided in two ways. The first is whether data products are produced, which yields (a) a nonproductive sub-industry and (b) a productive sub-industry. Data acquisition, data storage, and data management belong to the nonproductive sub-industry; within the productive sub-industry, data processing and data visualization directly produce data products, while data pricing, valuation, and trading produce them indirectly. The second is whether data products are made available to society, which yields (c) an output projection sub-industry and (d) an inner circulation sub-industry: the former provides data products directly to society, and the latter provides data products within a sub-industry or to other sub-industries.
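The first division can be encoded directly as a lookup table, which makes the grouping easy to query. This sketch uses only the assignments stated in the paragraph above; the second division (output projection versus inner circulation) is left out because the text does not assign specific activities to it. The names `Role`, `FIRST_DIVISION`, and the helper functions are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Role(Enum):
    NONPRODUCTIVE = "nonproductive"
    PRODUCES_DIRECTLY = "productive (direct)"
    PRODUCES_INDIRECTLY = "productive (indirect)"


# First division: does the activity produce data products?
# Assignments follow the text; nothing here is added beyond it.
FIRST_DIVISION: Dict[str, Role] = {
    "data acquisition": Role.NONPRODUCTIVE,
    "data storage": Role.NONPRODUCTIVE,
    "data management": Role.NONPRODUCTIVE,
    "data processing": Role.PRODUCES_DIRECTLY,
    "data visualization": Role.PRODUCES_DIRECTLY,
    "data pricing": Role.PRODUCES_INDIRECTLY,
    "data valuation": Role.PRODUCES_INDIRECTLY,
    "data trading": Role.PRODUCES_INDIRECTLY,
}


def is_productive(activity: str) -> bool:
    """True if the activity belongs to the productive sub-industry, directly or indirectly."""
    return FIRST_DIVISION[activity] is not Role.NONPRODUCTIVE


def activities_by_role(role: Role) -> List[str]:
    """List the activities assigned to a given role, alphabetically."""
    return sorted(a for a, r in FIRST_DIVISION.items() if r is role)


if __name__ == "__main__":
    for role in Role:
        print(f"{role.value}: {', '.join(activities_by_role(role))}")
```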

1.3.3 Industrial Behavior

Industrial behavior in the data industry is concentrated in four areas: data scientists (or quants [19]), data privacy, product pricing, and product rivalry.

Data Scientist

Victor Fuchs, often called the “Dean of health economists,” named the physician “the captain of the team” in his book Who Shall Live? Health, Economics, and Social Choice (1974). Data scientists could similarly be regarded as the “captains” of the data industry.

In October 2012, Harvard Business Review announced19 that the data scientist is “the sexiest job of the 21st century.” What does it mean to be called “sexiest”? The word implies not only the attraction of this career path but, more likely, “having rare qualities that are much in demand.” The authors of this HBR article were Thomas Davenport and D. J. Patil, both well known in academic and industrial circles. Davenport is a famous academic author and the former head of the Accenture Institute for Strategic Change (now called the Accenture Institute for High Performance Business, based in Cambridge, Massachusetts). Davenport was named one of the world's “Top 25 Consultants” by Consulting magazine in 2003. Patil is a copartner at Greylock Partners and was named the first US Chief Data Scientist by the White House in February 2015. In the article, they described the data scientist as a person who gains clear insights from data through the use of scientific methods and mining tools. Data scientists need to test hunches, find patterns, and form theories. They need not only a professional background in “math, statistics, probability, or computer science” but also “a feel for business issues and empathy for customers.” In particular, the top data scientists should be developers of new data mining algorithms or innovators of data products and/or processes.

According to an earlier report by the McKinsey Global Institute (MGI),20 data scientists are in demand worldwide, and their talents are especially sought after by large corporations such as Google, Facebook, StumbleUpon, and PayPal. Almost 80% of employees in the field expect the yearly salary of this profession to rise. The yearly salary for a vice president of operations may be as high as US$132,000. MGI estimated that “by 2018, in the United States, 4 million positions will require skills” gained from experience working with big data and that “there is a potential shortfall of 1.5 million data-savvy managers and analysts.”

Data Privacy

Russian-American philosopher Ayn Rand wrote in her 1943 novel The Fountainhead that “Civilization is the progress toward a society of privacy.” As social activities increasingly “go digital,” privacy increasingly becomes an issue of the data people post. January 28 is designated as Data Privacy Day (DPD) in the United States, Canada, and 47 European countries, to “raise awareness and promote privacy and data protection best practices.”21

Private data include medical and social insurance records, traffic tickets, credit history, and other financial information. A striking metaphor circulates on the Internet: computers, laptops, and smart phones are the “windows” of your “private home,” and more and more people (not just identity thieves and fraudsters) are trying to break through them to access your private information. The simple logic behind this metaphor is that your private data, if available in sufficient quantity for analysis, can be of huge commercial interest to some people.

Over the past several years, much attention has been paid to the snooping of private data and to the storage of tremendous amounts of raw data in the name of national security. For instance, in 2011, Google received 12,271 requests from US government agencies, including law enforcement agencies, to hand over its users' private data, according to the company's annual Transparency Report. Telecom operators responded to “a portion of the 1.3 million”22 law enforcement requests for text messages and phone location data, largely without warrants having been issued. However, a much greater and more immediate threat to data privacy comes from a large number of companies that most people have probably never heard of, called “data brokers.”23 They electronically collect, analyze, and package some of the most sensitive personal information and often sell it as a commodity, without the owner's direct knowledge, to other companies, advertisers, and even the government. One large data broker, Acxiom, for example, has boasted that it has, on average, “1,500 pieces of information on more than 200 million Americans [as of 2014].”23

No doubt, data privacy will be a central issue for many years to come. The right to decide whether private electronic data may be transferred should be returned to the data's owners, taken back from the handful of companies that profiteer from other people's private information.

Product Pricing

We use the search engine (a primary data product) to demonstrate how a data product is priced. It is noteworthy that a search engine is neither really software nor really free. As early as 1998, Bill Gross, the founder of GoTo.com, Inc. (later renamed Overture), applied for a patent on search engine pricing.

Today's popular search engines operate on an open and free business model, meaning they do not make money from users but are instead paid by advertisers. There are two types of search engine advertising. One is the pay-per-click (PPC) model used by Google, whereby the advertiser pays nothing if no user clicks on the ad. The other is the “ranking bid” model “innovated” by Baidu, whereby search results are ranked according to the payments made by advertisers. In October 2010, Google adjusted its cost-per-click (CPC) pricing by adding a 49% premium for “wrestler-type” advertising sponsors24 who want to take the optimum position in the results. This premium pricing is similar to Baidu's left-side ranking, which has existed for a long time and contributes almost 80% of its advertising revenue.
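The pricing rules described above can be sketched in a few lines: a pay-per-click charge that is zero when nobody clicks, a ranking-bid ordering by amount paid, and a CPC with a percentage premium (the 49% figure is taken from the text). This is a deliberately simplified model under stated assumptions. It ignores the auctions, quality factors, and budget rules real search engines use, and the names `Ad`, `ppc_charge`, `ranking_bid_order`, and `premium_cpc` are hypothetical. `Decimal` is used so money amounts are exact.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class Ad:
    advertiser: str
    bid: Decimal  # amount the advertiser offers to pay


def ppc_charge(cost_per_click: Decimal, clicks: int) -> Decimal:
    """Pay-per-click: the advertiser pays per click, so zero clicks cost nothing."""
    return cost_per_click * clicks


def ranking_bid_order(ads: List[Ad]) -> List[str]:
    """Ranking bid: order results by the amount each advertiser pays, highest first."""
    return [ad.advertiser for ad in sorted(ads, key=lambda ad: ad.bid, reverse=True)]


def premium_cpc(base_cpc: Decimal, premium: Decimal = Decimal("0.49")) -> Decimal:
    """CPC after a percentage premium (default 49%, the figure cited in the text)."""
    return base_cpc * (1 + premium)


if __name__ == "__main__":
    print(ppc_charge(Decimal("0.50"), 0))   # no clicks, no charge
    print(ppc_charge(Decimal("0.50"), 3))
    print(premium_cpc(Decimal("1.00")))
    ads = [Ad("a", Decimal("2")), Ad("b", Decimal("5")), Ad("c", Decimal("3"))]
    print(ranking_bid_order(ads))
```

The contrast is in who bears the risk: under PPC the advertiser pays only for realized clicks, while under a ranking bid the payment buys position regardless of how users respond.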