While artificial intelligence is considered to be the engine of innovation and growth for years to come, little is known about the factors that secure a competitive advantage for companies using it. This thesis addresses this gap. Combining case study research and survey research, this study provides empirical evidence that the resource data is a potential source of competitive advantage, contingent on the type of offering. The study further proposes data as a complementary asset that partially explains the strong increase in corporate research in the field of artificial intelligence, in contrast to an overall decline in corporate science activities.
Number of pages: 214
Year of publication: 2020
TECHNISCHE UNIVERSITÄT MÜNCHEN
Faculty of Economics and Business (Fakultät für Wirtschaftswissenschaften)
Dr. Theo Schöller Endowed Chair for Technology and Innovation Management
Complete reprint of the dissertation approved by the Faculty of Economics and Business of the Technische Universität München for the award of the academic degree of
Doktor der Wirtschaftswissenschaften (Dr. rer. pol.).
Chair:
Prof. Dr. Alwine Mohnen
Examiners of the dissertation:
1. Prof. Dr. Joachim Henkel
2. Prof. Dr. Christoph Ungemach, Ph.D.
The dissertation was submitted to the Technische Universität München on 24 July 2019 and accepted by the Faculty of Economics and Business on 15 October 2019.
Writing this dissertation would not have been possible without the support of many people, to whom I would like to express my deep gratitude.
First and foremost, I would like to thank my supervisor Prof. Joachim Henkel for supporting, advising, and challenging me during this dissertation project. With his deep passion for research, wealth of experience, and integrity, he is the doctoral supervisor every student hopes to have. Furthermore, I would like to thank Prof. Christoph Ungemach for acting as a second supervisor of my dissertation, as well as Prof. Alwine Mohnen for chairing my dissertation committee.
Presenting my work at several seminars - in particular the TIME seminar - and conferences gave me the opportunity to receive valuable feedback; thanks to all official and unofficial discussants for their comments that further shaped this dissertation.
I had the great opportunity to spend a few months at the University of Cambridge and would like to thank Prof. Andy Neely and Dr. Mohamed Zaki for inviting and hosting me during this time.
Many industry experts contributed to this thesis by dedicating time for interviews - their willingness to openly share their experiences laid the foundation for this research. Furthermore, I am grateful to all of the participants of my survey.
Writing this dissertation would have been twice as hard and half as much fun without the wonderful colleagues at the TIM chair, who always helped with any academic and non-academic issues. Furthermore, I had the pleasure of working with some very talented student assistants who supported my research and carried out even tedious tasks with great dedication.
Finally, I would like to thank my family for their continuous support and encouragement. In particular, I would like to thank my wife Nikola and my son Leopold - you are my sunshine!
List of figures
List of tables
List of abbreviations
Abstract
Introduction
1.1 Motivation
1.2 Research objectives and research design
1.3 Structure of the thesis
Theoretical foundations
2.1 Introduction
2.2 Toward a definition of data-driven innovation
2.2.1 Overview
2.2.2 Existing definitions of data-driven innovation
2.2.3 What does data-driven mean?
2.2.4 What is an innovation?
2.2.5 Definition of data-driven innovation
2.3 Technical foundations: Artificial intelligence and machine learning
2.3.1 Artificial intelligence
2.3.2 Machine learning
2.3.3 Deep learning
2.3.4 Machine learning-based programs
2.4 Profiting from data-driven innovation
2.4.1 Overview
2.4.2 Data-driven process innovation
2.4.3 Data-driven product or service innovation
2.4.4 Data-driven business model innovation
2.4.5 Data as a barrier to competition
2.5 Conclusion
A resource-based perspective on data-driven innovation
3.1 Introduction
3.2 The resource-based view
3.2.1 Propositions of the resource-based view
3.2.2 Critique of the resource-based view
3.2.3 Resource complementarity
3.3 Resource-based view of IT and data-driven innovation
3.3.1 Overview
3.3.2 Resource-based view in information systems research
3.3.3 Resource-based research on data-driven innovation
3.4 Resources for data-driven innovation
3.4.1 Overview
3.4.2 Data
3.4.3 Algorithm
3.4.4 IT infrastructure
3.4.5 Technical knowledge
3.4.6 Domain knowledge
3.4.7 Organizational assets
3.5 Summary and conclusion
How companies create value from data-driven innovation - qualitative research
4.1 Introduction
4.2 Methodology
4.2.1 Research design
4.2.2 Sample selection
4.2.3 Data collection
4.2.4 Data analysis
4.2.5 Validity and reliability of research
4.3 Results
4.3.1 Types of data-driven innovations
4.3.2 Resources for data innovation
4.3.3 Factors hindering success
4.4 Discussion of qualitative results
4.4.1 Resources for data-driven innovation
4.4.2 Relation to existing research
4.4.3 Limitations
4.5 Summary and conclusion
Securing a competitive advantage in data-driven innovation - quantitative study
5.1 Introduction
5.2 Methodology
5.2.1 Sample and sample selection
5.2.2 Data collection
5.3 Data and variables
5.3.1 Research design
5.3.2 Dependent variable
5.3.3 Independent variables: VRIN Scores
5.3.4 Control variables
5.3.5 Robustness of the research design
5.4 Descriptive results
5.4.1 Sample characteristics
5.4.2 Resource importance
5.4.3 VRIN characteristics
5.4.4 Sources of resources
5.5 Regression analysis
5.5.1 Results
5.5.2 Regression diagnostics
5.6 Discussion
5.7 Conclusion and limitations
The rise of corporate research in AI
6.1 Introduction
6.2 The rise and decline of corporate science
6.2.1 What is corporate science?
6.2.2 Motivation for corporate science
6.2.3 Motivation for revealing the results
6.2.4 The decline of corporate science
6.3 Data and data collection
6.4 Results
6.4.1 Share of corporate publications
6.4.2 Quality of corporate science paper
6.4.3 Major publishers of corporate research in AI
6.4.4 Affiliation of top researchers
6.4.5 Patents of major tech companies
6.4.6 Acquisition of AI companies
6.4.7 Key financial data
6.5 Discussion
6.6 Conclusion, implications, and limitations
Conclusion
7.1 Key results and research implications
7.2 Managerial implications
7.3 Policy implications
Appendix
A 1 Interview questionnaire
A 2 Qualitative research (Chapter 4) coding tree
A 3 Survey (Chapter 5) sample demographics
A 4 Survey (Chapter 5) descriptive results
References
Figure 1: Structure of thesis
Figure 2: Google search trend for related concepts
Figure 3: Basic principle of deep learning (adapted from LeCun et al., 2015)
Figure 4: Differences between traditional programs and machine learning/deep learning-based programs (adapted from Goodfellow et al., 2016, p. 10)
Figure 5: Stated goals of using artificial intelligence (Davenport and Ronanki, 2018)
Figure 6: Categories of data-driven innovation
Figure 7: Overview of resource conceptualizations in the context of big data analytics
Figure 8: Resources for data-driven innovation
Figure 9: Data analysis process
Figure 10: Overview of applications of DDI, types of innovations and maturity
Figure 11: Factors hindering data-driven innovation
Figure 12: Resources for data-driven innovation
Figure 13: Summary of survey design process (based on Groves, 2009)
Figure 14: Summary of sampling process
Figure 15: Overview of research design
Figure 16: Calculation of additive and multiplicative VRIN scores
Figure 17: Characteristics of sample
Figure 18: Importance of resources
Figure 19: Additive VRIN Scores for resources
Figure 20: Multiplicative VRIN Scores for resources
Figure 21: Correlation among VRIN Scores
Figure 22: Fitted values plotted against standardized residuals
Figure 23: Standardized residuals plotted against standard normal distribution
Figure 24: Cook's Distance Di for all observations i
Figure 25: Average indexed share of corporate publications for both groups of conferences
Figure 26: LOESS fitted line of average indexed corporate share
Figure 27: Indexed relative citations for AI conferences
Figure 28: Number of yearly filed patents in AI (patent families)
Table 1: Overview of IS resources
Table 2: Overview of key AI advances and the respective algorithms and datasets (Wissner-Gross, 2016)
Table 3: Overview of case firms
Table 4: List of qualitative interviews
Table 5: Overview of control variables
Table 6: VRIN characteristics of resources
Table 7: Sources of algorithms and data used by companies in the sample
Table 8: Ordinary least squares (OLS) Regressions
Table 9: Conference sample for AI as well as control group
Table 10: Absolute number (abs.) and relative share (rel.) of corporate publications
Table 11: Most active companies contributing to the AI conferences in the samples
Table 12: Former and current affiliation of top researchers (deep learning)
Table 13: Companies employing the leading researchers
Table 14: Number of AI-related acquisitions
AI: Artificial intelligence
DDI: Data-driven innovation
IS: Information systems
IT: Information technology
ML: Machine learning
RBV: Resource-based view
USD: United States dollar
VC: Venture capital
VRIN: Valuable, rare, inimitable, non-substitutable; the attributes that make a resource a potential source of competitive advantage (Barney, 1991)
Artificial intelligence and machine learning, its most important subfield, are considered the engines of innovation and growth for years to come, and data is considered the fuel. Companies across a range of industries are already using data and machine learning to improve existing processes and to create new products or entirely new business models, solving problems that range from fraud detection to crop predictions. However, so far, little is known about the factors needed to secure a competitive advantage for companies that use machine learning. This thesis addresses this gap. Overall, the results suggest that it is the resource data that secures a competitive advantage in machine learning, while the technology is relatively less important. However, this is contingent on the specific application. This thesis makes three key contributions.
First, based on a mixed-methods study combining qualitative and quantitative research, I provide empirical evidence that the resource data is a potential source of competitive advantage, contingent on the respective offering. A set of different resources is required for data-driven innovation, but not all of these resources are equally important in securing a competitive advantage. The qualitative study shows that it is primarily data that firms consider key for securing a competitive advantage. The results of the quantitative study confirm this: start-up companies with data as a strategic asset are associated with more VC funding, which is regarded as a proxy for future success. However, this relationship depends on the type of offering.
Second, I propose complementary assets as a novel explanation for why companies invest in basic research. I propose that it is the availability of data as a complementary asset that (partially) explains the strong increase in corporate research in artificial intelligence, in contrast to an overall decline in corporate science activities. I document this increase using corporate publications at the most important AI and machine learning conferences. By controlling large data assets, the companies that invest in corporate AI research benefit strongly from scientific advances, have an advantage in researching AI, and can appropriate a large share of the value of their research.
Third, by providing a definition of data-driven innovation and a set of required resources, I lay the foundation for subsequent research. Research on data-driven innovation is nascent, and several related concepts are used to describe a similar phenomenon. The proposed definition of data-driven innovation unites the different streams of literature. I further propose a set of resources for data-driven innovation, derived from the literature and empirically tested through case studies and quantitative survey data, that might guide future resource-based research on data-driven innovation.
Artificial intelligence (AI) and machine learning, its most important subfield,¹ are considered the engines of innovation and growth for years to come, and data is considered the fuel (Brynjolfsson and McAfee, 2017b; Henke et al., 2016; OECD, 2015). Companies across a range of industries are already using data and machine learning to improve existing products and processes and to create new ones, while start-up companies are creating entirely new business models to solve problems ranging from fraud detection to crop predictions (Gann et al., 2014). However, so far little is known about the factors securing a competitive advantage for companies that use machine learning (George et al., 2014).
Artificial intelligence is an umbrella term for the field concerned with building "intelligent entities" (Russell and Norvig, 2009, p. 1); it encompasses various subfields, including machine learning and robotics. The term 'artificial intelligence' was coined in the 1950s, and high expectations regarding potential applications were already raised at that time. However, these expectations remained mostly unfulfilled because a lack of computing power and of available datasets limited the practical applicability of the technology. In recent years, the processing power of computers has improved while the costs of data storage and processing have decreased. At the same time, the amount of data generated has grown rapidly.
Subsequently, AI and in particular machine learning applications have achieved impressive performance on specific tasks, even surpassing human experts on tasks that would have been considered difficult for machines a few years ago. Most prominently, in 2016 Google's AlphaGo, a computer program for the board game Go, defeated grandmaster Lee Sedol, the eight-time Go world champion. In image and speech recognition, machine learning algorithms are on par with humans (He et al., 2015). In healthcare, for example, machine learning-based systems are better than human experts at detecting breast cancer metastasis² or skin cancer³ (Esteva et al., 2017; Liu et al., 2017). Machine learning is also already behind many products that are used daily by millions of people. Virtual personal assistants such as Amazon's Alexa or Google Assistant are able to understand voice-based commands and provide automated answers. Transportation company Uber uses machine learning to provide precise estimated arrival times (Hermann and Del Balso, 2017). Entertainment company Netflix uses data and analytics to provide its customers with new movie recommendations (Amatriain and Basilico, 2012). Google uses machine learning to reduce the energy consumption of its data centers by up to 40 percent (Knight, 2018).
There is an abundance of articles from the business press and white papers from technology vendors and consulting firms that provide case evidence for the use of machine learning and claim its potential for innovation. However, the academic literature lags behind and is scattered across various disciplines. While there is some early evidence that the use of data and analytics has a positive effect on firm performance (cf., e.g., Bughin, 2016), little is known about the factors that help companies benefit from the use of machine learning (George et al., 2014).
Frequently, data is claimed to be the source of competitive advantage; it is said to be "the new oil" (Economist, 2017; OECD, 2015). Anecdotal evidence seems to confirm this view. Some of the most valuable firms in the world, among them the U.S. technology companies Alphabet,⁴ Facebook, and Amazon as well as their Chinese counterparts Tencent, Alibaba, and Baidu, are to a large extent built upon the data that they possess and create while offering many services for free (Henke et al., 2016).
However, the same companies are also investing huge sums in basic research in AI. For example, in 2018 Google employed more than 1,700 AI researchers (Google, 2018c), and universities report that their leading scientists are being hired away on a massive scale (Sample, 2017).
This thesis addresses the overarching question of how companies that innovate using data and machine learning (hereinafter referred to as data-driven innovation) can secure a competitive advantage. Addressing this question and the aforementioned gap in the literature, this thesis has three research objectives. The first is to provide a definition of data-driven innovation and a comprehensive review of the related literature (Research objective 1). The second is to investigate which resources secure a competitive advantage from data-driven innovation and which factors hinder data-driven innovation (Research objective 2). The third is to understand why companies conduct corporate research in AI (Research objective 3).
As outlined above, the literature on data and machine learning from an innovation perspective is nascent and scattered across various disciplines. There is no universally accepted definition of the term data-driven innovation and different authors use the term with a varying scope. Thus, the first research objective lays the foundation for this study as well as for subsequent research on data-driven innovation. I define data-driven innovation building upon existing definitions as well as a review of related concepts.
To address the second research objective, following the suggestion by Kohli and Tan (2016), I apply a resource-based perspective. The resource-based view is frequently used in strategic management as well as in information systems (IS) research to understand sources of competitive advantage. I use a mixed-methods approach combining qualitative and quantitative research. In particular, I use an exploratory sequential research design consisting of two phases (Creswell and Plano Clark, 2011, p. 71). First, I use a case-study approach; second, building on the results of the qualitative analysis, I collect quantitative survey data to further detail and generalize the qualitative findings. Because the two studies were conducted separately and sequentially, the approach can be considered a partially mixed sequential design (Leech and Onwuegbuzie, 2009). The mixed-methods design allows a more complete understanding of the novel field of data-driven innovation and makes the results more robust (Yauch and Steudel, 2016).
The third research objective addresses the seemingly contradictory observation that the large technology companies invest large sums in research on AI and machine learning. This runs counter to the overall decline in corporate research activities as well as to the widespread assumption that data is the key to competitive advantage. I use secondary data on conference publications as well as other sources to document and describe this phenomenon and to develop an explanation.
The remainder of this thesis is structured in three main parts. Figure 1 illustrates the structure of the thesis and the allocation of the chapters to the three research objectives.
Figure 1: Structure of thesis
Following this introduction, Chapter 2 provides a definition of data-driven innovation as well as a brief overview of the technical foundations needed to understand this thesis. Chapters 3 to 5 address the second research objective from different perspectives. Chapter 3 investigates the resources needed for data-driven innovation from a theoretical perspective. It includes a brief discussion of the resource-based view (RBV) as a theory, a review of existing resource-based research related to data-driven innovation, a proposed resource framework derived from the literature, and a detailed discussion of these resources. Chapter 4 uses a case-study approach to empirically test the resource framework and to identify key resources for data-driven innovation as well as factors hindering its implementation. Chapter 5 provides a survey-based quantitative analysis of the resources for data-driven innovation. Chapter 6 addresses the third research objective by providing an empirical documentation of the rise of corporate science in AI as well as potential explanations for this trend. Chapter 7 closes this thesis with a summarizing conclusion that discusses the key contributions as well as implications for future research, firms, and policy makers.
1 The terms "artificial intelligence" and "machine learning" are often used interchangeably. However, machine learning is a subfield of AI, albeit its most important one. Cf. Section 2.3 for a discussion of the different terms.
2 Liu et al. (2017) report a tumor detection rate of 92.4% (true positives), compared to 73.2% for a human pathologist attempting an exhaustive search.
3 Esteva et al. (2017) report an accuracy of 72.1% for the machine learning system, compared to 65.5% and 66.0%, respectively, for two dermatologists, in classifying images of skin lesions into three disease classes.
4 Alphabet Inc. is the parent company of Google.
Data and analytics are frequently proposed as key drivers of innovation as well as a potential source of value creation (George and Lin, 2017; Henke et al., 2016; OECD, 2015). Yet, the academic literature addressing the question of how data actually drives innovation is still nascent (George et al., 2016; Günther et al., 2017). Empirical evidence of data-driven innovation is mostly limited to anecdotal evidence from the business press and white papers from consulting and technology companies.
The objective of this chapter is to lay the theoretical foundation for this thesis by providing a definition of the key concepts as well as a review of the related literature. The chapter is structured as follows. First, I provide a definition of data-driven innovation that is based upon a review of existing definitions as well as a review of related concepts. Second, I give an overview of the technical concepts as well as the historic development of AI and machine learning that are required for the understanding of the subsequent chapters. Third, I review the literature on how data is driving innovation. A summarizing conclusion closes this chapter.
Various business publications and white papers claim that the use of data and analytics is a major source of innovation. For instance, consulting firm McKinsey proposes that the use of data “enables companies to create new products and services, enhance existing ones, and invent entirely new business models” (Manyika et al., 2011, p. 6). In this context, the term data-driven innovation is increasingly used among scholars as well as practitioners; however, it is not yet established in the academic literature and lacks an agreed-upon definition. In the following section I develop a definition of data-driven innovation that serves as a foundation for this thesis. To develop this definition, I briefly discuss the two terms that make up data-driven innovation: what does data-driven mean, and what is an innovation?
There are only a few existing definitions of the term data-driven innovation, and they disagree about what it includes. One of the earliest definitions of the term was put forth by the Organisation for Economic Co-operation and Development (OECD). Acknowledging the important role of data and analytics for the economy, the OECD defines data-driven innovation as the “use of data and analytics to improve or foster new products, organizational methods and markets” (OECD, 2015). Other authors use similarly broad definitions of the term. Curley and Salmelin (2018), for instance, define data-driven innovation as “using data for innovation” (p. 123) and propose six patterns of how data can be used for innovation. Consulting firm Deloitte defines data-driven innovation as “innovative applications derived from data analytics” (Deloitte, 2016, p. 5).
However, some authors limit the definition of data-driven innovation to the use of data and analytics to support the innovation process (e.g., Chien et al., 2016; Kusiak, 2009; Zillner et al., 2016). Zillner et al. (2016), for example, use such a narrow definition and describe data-driven innovation as the “exploitation of any kind of data in the innovation process” (p. 171).
Data-driven refers to decisions, processes or products that are “determined by or dependent on the collection or analysis of data” (Oxford Dictionaries, 2018). Collecting and analyzing data is not a new phenomenon; 20,000 years ago, people stored data using tally sticks made from animal bones⁵ (Everett, 2017, pp. 35ff.). In classical antiquity, philosophers such as Aristotle were collecting and interpreting empirical data (Leroi et al., 2014). In 1660, John Graunt published his “bills of mortality,” which is frequently considered the birth of statistics (Fienberg et al., 1992). Starting in the 1940s and 1950s, computers were increasingly used for analyzing data. For instance, in 1950 the first computerized weather forecast was calculated (Holsapple et al., 2014).
With the increasing prevalence of computer use in companies and the growth of digitally stored data, the analysis of this data became more and more important in practice, which is also reflected in an increasing academic interest in the topic. Several different concepts in the information systems (IS) and management literature are related to this phenomenon, among them business analytics, big data, and AI. These concepts are not congruent but closely related, and the popularity of the different terms has changed over time. Figure 2 shows the change in interest in these concepts over time, measured by the normalized, relative worldwide Google search volume for these terms (Google, 2018b).⁶ While business intelligence was the most popular concept as measured by this proxy until 2012, big data became increasingly popular starting in 2011 and was only recently surpassed by the interest in machine learning.
Figure 2: Google search trend for related concepts
The term business intelligence became popular in the 1990s and was later complemented by the term business analytics (Chen et al., 2012). Even though the term business analytics is widely used among scholars as well as practitioners, there is no commonly accepted definition (Holsapple et al., 2014).⁷ A frequently adopted definition by Davenport and Harris (2007) describes business analytics as “the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions” (p. 7). Often, three types of business analytics techniques are distinguished: descriptive, predictive, and prescriptive analytics (Delen and Demirkan, 2013). Descriptive analytics is mainly concerned with summarizing and reporting past data, predictive analytics tries to predict future outcomes using existing data as inputs, and prescriptive analytics attempts to prescribe an optimal decision given a set of objectives and constraints. Traditionally, business analytics applications are concerned with data stored in a structured database format (Chen et al., 2012).
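The distinction between the three types of analytics can be made concrete with a minimal sketch. The example below is purely illustrative and not taken from the cited literature: the sales figures, the function names, and the simple profit model (a unit margin, a unit holding cost, and a warehouse capacity) are all assumptions chosen for brevity.

```python
from statistics import mean

# Hypothetical monthly unit sales (illustrative numbers, not from the thesis).
sales = [100, 110, 120, 130, 140, 150]


def describe(history):
    """Descriptive analytics: summarize and report past data."""
    return {"mean": mean(history), "min": min(history), "max": max(history)}


def predict_next(history):
    """Predictive analytics: forecast the next period from past data
    using a least-squares linear trend (an assumed, deliberately simple model)."""
    n = len(history)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(history)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


def prescribe_order(forecast, unit_margin, unit_holding_cost, capacity):
    """Prescriptive analytics: choose the order quantity that maximizes
    profit (the objective) without exceeding capacity (the constraint)."""
    def profit(quantity):
        sold = min(quantity, forecast)
        return unit_margin * sold - unit_holding_cost * (quantity - sold)

    return max(range(capacity + 1), key=profit)


summary = describe(sales)
forecast = predict_next(sales)
order = prescribe_order(forecast, unit_margin=5, unit_holding_cost=1, capacity=200)
```

In this sketch, the descriptive step only reports what happened, the predictive step extrapolates it, and the prescriptive step turns the forecast into a recommended decision under an explicit objective and constraint.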
Since the 2000s, the amount of stored data has increased rapidly, driven by, among other factors, a strong increase in Internet usage, the emergence of user-generated content, the diffusion of smartphones, and the growing use of sensors embedded in various applications (Chen et al., 2012; Manyika et al., 2011). Furthermore, a large share of the new data sources produce data that is considered 'unstructured' from a traditional database perspective, such as videos, images, or text content (Chen et al., 2012). These new types of data posed a challenge to traditional database and data analysis techniques (Constantiou and Kallinikos, 2015). The term big data,