Media technologies now provide facts, answers, and “knowledge” to people – search engines, apps, and virtual assistants increasingly articulate responses rather than direct people to other sources.
Semantic Media is about this emerging era of meaning-making technologies. Companies like Apple, Google, Facebook, Amazon, and Microsoft organize information in new media products that seek to “intuitively” grasp what people want to know and the actions they want to take. This book describes some of the insidious technological practices through which organizations achieve this, addresses the changing contexts of internet searches, and examines the social and political consequences that follow when large companies become primary sources of information.
Written in an accessible style, Semantic Media will be of interest to students and scholars in media, science and technology, communication, and internet studies, as well as professionals wanting to learn more about the changing dynamics of contemporary data practices.
Page count: 309
Publication year: 2022
Cover
Epigraph
Title Page
Copyright Page
Figures and Tables
Figures
Tables
Acknowledgments
Introduction
Macedonia
Where Trees Stand in Water
The Power to Name
Let’s Take a Trip
Almost Everyone
A Shift in Search
Can You Trust a Knowledge Panel?
Semantic Media
Critique
Outline of the Book
Notes
1. A History of Semantics
Early Semantics
Ontological Semantics
Subjects, Predicates, Objects
Computational Semantics
The Semantic Web
Language Models
Metadata, Ontologies, and Semantic Technologies
Notes
2. Knowledge Graphs
Semantic Search
Knowledge Bases
Knowledge Graphs
Google’s Knowledge Graph
Critiques of Google’s Knowledge Graph
Facebook’s Graphs
Amazon’s Graphs
Microsoft’s Graphs
Airbnb, Uber, Netflix, Pinterest, and Spotify Graphs
News and Journalism Graphs
3. One Schema to Rule Them All
Schemas
Semantic Platformization
Schema.org
Methods
Analysis and Findings
Fact-Checks and Misinformation
COVID-19 and Data Commons
A Global Ontology for Whom?
Schema Governance
Notes
4. The Wiki Wrangler
Wikidata
Upper Ontology
Wikidata’s Ontology
Methods
Results
Just the Facts
Common Knowledge?
The Spread of Wikidata
Notes
5. “An Ontology-Driven Application for the Masses”
Flat Earth
Ambiguity
Virtual Assistant History
Siri
Google Assistant
Alexa
Virtual Assistants and Politics
Levels of Communication
Visions
A Plea for Access
A Farewell to Facts?
Notes
Conclusion
Returning to the Politics of Search
Five Theses
Notes
References
Index
End User License Agreement
“Nothing should be named lest by so doing we change it.”
The Waves, Virginia Woolf
“The perfect search engine should understand exactly what you mean.”
Amit Singhal, former Senior Vice President of Search, Google
Andrew Iliadis
polity
Copyright © Andrew Iliadis 2023
The right of Andrew Iliadis to be identified as Author of this Work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.
First published in 2023 by Polity Press
Polity Press
65 Bridge Street
Cambridge CB2 1UR, UK
Polity Press
111 River Street
Hoboken, NJ 07030, USA
All rights reserved. Except for the quotation of short passages for the purpose of criticism and review, no part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher.
ISBN-13: 978-1-5095-4257-4
ISBN-13: 978-1-5095-4258-1 (pb)
A catalogue record for this book is available from the British Library.
Library of Congress Control Number: 2022938560
Typeset by Fakenham Prepress Solutions, Fakenham, Norfolk NR21 8NL
The publisher has used its best endeavours to ensure that the URLs for external websites referred to in this book are correct and active at the time of going to press. However, the publisher has no responsibility for the websites and can make no guarantee that a site will remain live or that the content is or will remain appropriate.
Every effort has been made to trace all copyright holders, but if any have been overlooked the publisher will be pleased to include any necessary credits in any subsequent reprint or edition.
For further information on Polity, visit our website: politybooks.com
0.1: Knowledge panel for Armensko
0.2: Bing results showing carousels, featured boxes, and knowledge panels
0.3: Overlapping foundations of semantic media
1.1: Semantic triple
1.2: From top: semantic triples as network, list, and code
1.3: From top: Linked Open Data Cloud (lod-cloud.net), Linked Open Vocabularies (lov.linkeddata.es/dataset/lov).
1.4: Hidden technical debt in machine learning systems
2.1: “Question Answering to Populate Knowledge Base”
2.2: Facebook entity knowledge representation
2.3: Intentional search for entities on Bing from Microsoft
3.1: Schema.org semantic network
4.1: Kardashian family as extracted from Wikidata
4.2: Wikidata entity and its values
4.3: Wikidata’s top-level ontology as a graph
4.4: Wikidata statements and references as of Nov 30, 2021
5.1: Siri’s fact processing (top) described in Cheyer and Guzzoni (2006), and Active Ontology (bottom) described in Gruber et al. (2010)
5.2: Alexa ontology breaking down statements into entities and relations
5.3: The political responses of virtual assistants
2.1: Quantities in five knowledge bases
2.2: Media technology companies’ knowledge graphs
3.1: Summary of last 10 Schema.org updates as of 08/23/21
Writing this book would not have been possible without the people who, directly or indirectly, supported it along the way. I’d like to thank all the wonderful mentors, teachers, students, colleagues, and friends that I have had the good fortune of working with and learning from over the last several years, before and after the start of my position at Temple University. My home in Temple’s Department of Media Studies and Production, within the Klein College of Media and Communication, has been a happy one.
The students at Temple have been outstanding. I want to thank those who participated in my Introduction to Media Theory, Technology and Culture, and Information and Society undergraduate classes and those in my Communication Research Methods and Social Media Analytics graduate classes. Our conversations and projects have helped me reflect on my research practices and approaches to pedagogy, and I have learned a lot from our shared experiences. I have also benefited from working with several fantastic student research assistants, including Wesley Stevens, Christiana Dillard, Aiden Kosciesza, and Sezgi Kavakli.
I’m grateful to be working in an environment that fosters collaboration and mentorship among colleagues. I’d particularly like to thank my colleagues in the department who have made research at Temple a joy. My thanks to Geoffrey Baym, Amy Caples, Alice Castellini, Sherri Hope Culver, Jan Fernback, Matt Fine, Paul Gluck, Tom Jacobson, Peter Jaroff, Jack Klotz, Lauren Kogen, Joseph Kraus, Marc Lamont Hill, Matthew Lombard, Larisa Mann, Nancy Morris, Patrick Murphy, Wazhmah Osman, Hector Postigo, Clemencia Rodríguez, Adrienne Shaw, Betsy Leebron Tutelman, Barry Vacker, Kristine Weatherston, and Laura Zaylea.
As part of our college, I’ve had the good fortune of working on research and service with colleagues in other Klein departments over the last few years. I’d also like to give special thanks to Deborah Cai, Erin Coyle, Brian Creech, Fabienne Darling-Wolf, Edward Fink, Scott Gratson, Bruce Hardy, Don Heller, Lance Holbert, Tricia Jones, Carolyn Kitch, Magda Konieczna, Heather LaMarre, David Mindich, Logan Molyneux, Kathy Mueller, Devon Powers, Dana Saewitz, Soomin Seo, Meghnaa Tallapragada, Lori Tharps, Karen Turner, and Andrea Wenzel. Extra special thanks to our Dean at Klein, David Boardman, who has shown extraordinary leadership during my time at Temple.
This book would not have been possible without a Sabbatical Award and Grant-in-Aid Award from Temple. Both provided me with the time and resources to complete the manuscript.
I thank Neal Thomas, Ryan Shaw, Patrick Golden, Melanie Feinberg, and the Organization Research Group at the University of North Carolina at Chapel Hill. They provided constructive feedback on a draft of the manuscript. Heather Ford (School of Communication, University of Technology, Sydney) aided my thinking throughout the project and provided inspiration for many of its ideas, leading to new and exciting research projects. Amelia Acker (School of Information, University of Texas at Austin) also read a draft and helped improve the book; she has been a great writing partner, collaborator, and friend. Isabel Pedersen (Faculty of Social Science and Humanities, Ontario Tech University) has been an extraordinary mentor and confidant; I’m fortunate to continue collaborating with Isabel and everyone at Ontario Tech’s Decimal Lab and the Digital Life Institute.
The team at Polity did an incredible job helping me complete this book, from pitch to publication. Thanks to Mary Savigar, Stephanie Homer, Ellen MacDonald-Kramer, Ian Tuttle, and the anonymous reviewers who provided feedback.
I wanted to write a book that would be interesting for a non-specialist audience. I’m not sure if the book succeeds in this respect—if it does, it is thanks to those who have helped me rethink media scholarship; any shortcomings can be attributed solely to me.
Some of the research in this book has appeared in another form in my academic articles. I acknowledge these previous works, which include the following: “Algorithms, ontology, and social progress,” Global Media and Communication (2018); “The Tower of Babel problem: Making data make sense with basic formal ontology,” Online Information Review (2019); and “The seer and the seen: Surveying Palantir’s surveillance platform,” The Information Society (2022); as well as two as-yet-unpublished manuscripts, “One schema to rule them all: How Schema.org models the world of search” and “Fabricating facts: A semantic network analysis of the Wikidata ontology.”
I wrote parts of this book in Canada, on Manitoulin Island. Special thanks to Nick, Amanda, and Charlie for hosting me.
Lastly, to my parents, Cathy and Jim, my brother Michael, Lindsay, Edward, my aunts, uncles, and cousins—nothing has enriched my life more than our big, happy family. I dedicate this book to the memories of my grandparents.
Andrew Iliadis, Philadelphia, 2022
Where do you usually look when you need to quickly find a piece of information? People who can afford them tend to use their smartphones or laptops to browse their preferred social media platforms to catch up on the day’s news (Pew Research Center, 2021). But what about when you casually want to find an answer to a specific question, like when a revolutionary figure was born, or when you want to look up a common fact (like the official capital of Canada)? Maybe you’re searching for a vegan recipe or the director of a documentary. Or perhaps you’re looking up the alma mater of a politician or an exciting book that you heard about but whose title currently escapes you.
No matter what fact you look for, our technologies can quickly provide an answer without hassle. If you’re like most people with an internet connection and a computer, you will type (or speak) your query into a search engine (or virtual assistant) to get the answer.1 Depending on your service, the device will neatly display that answer in panels, menus, and labels directly in the search results (or the device will read it aloud to you). Most of the time, your search ends right there with the initial results, without requiring you to seek any additional items. You usually don’t have to follow any links or visit other sites because the information and facts are already neatly presented.
According to some controversial search engine studies, searches like this are today considered “zero-click”: people find what they need with a single search and do not need to navigate to other sources (Fishkin, 2019, 2021; Ferguson, 2021; Sullivan, 2021). The ability to conduct zero-click searches might seem helpful to many of us, efficient even. Some of us probably wouldn’t think twice about it, grateful to receive our information and go about our day.
Yet, when pressed, problems quickly arise. What if someone asked you where the resulting facts (displayed or spoken) came from? Could you identify a source or give a correct answer? “Google” or “Wikipedia” would likely be common responses. “Alexa” or “Siri” might be other replies. These would be excellent but vague guesses, and they would miss some insidious new features of internet-based media technologies. Alternatively, I might use a spatial image and ask you exactly where these facts were stored (in your phone, on a website, in a search engine, etc.) and how your chosen application found them. More pointedly, what if I asked you about the verifiability of the answer you received? How would you know if the answer you saw in the results was accurate? What criteria would you use to determine that accuracy?
More and more each day, media technologies provide facts in our searches instead of leading us to different sources. Search engines, applications, platforms, and virtual assistants are now in the business of articulating information for us in answer to our questions.2 Our searches increasingly do not lead to other sources (ranked lists of Google search results, Wikipedia pages, etc.) but end with answers that appear to be provided by the companies that own the products we are using. These processes suggest that we are in a relatively new media era. This era began about a decade ago, when companies started to focus on products that try to guess our intentions and what we want to know by offering direct answers to queries based on context. Such products also began to provide mechanisms for actions based on our searches (buying tickets, scheduling an appointment, etc.), becoming more central in our daily lives (Kofler et al., 2016). Part of this shift is due to the changing nature of our search behaviors. Yet a large part comes from the fact that internet companies attempt to position themselves as internet users’ “one-stop shop” for obtaining information, thus decreasing our need for browsing.
This book is about what I refer to as semantic media, which I describe as media technologies that orchestrate and convey facts, answers, meanings, and “knowledge” about things directly in media products, rather than leading people to other sources.3 The book describes the ways, often invisible to the non-specialist, that internet companies are now actively involved in constructing information about the world. The book is about how organizations like Apple, Google, Facebook, Amazon, and Microsoft are in the business of creating and storing facts to be served up to users in new and emerging media products and what this might mean for knowledge in the future. It will look at how design decisions bake these facts into the apps and platforms people use daily while focusing on the infrastructures dedicated to orchestrating and presenting this information. The goal is to understand the technologies that will drive social and political outcomes when large internet companies become a primary conduit through which people directly acquire an understanding of facts about the world. But perhaps a more concrete example will help.
My family is Macedonian, and my parents have always told me that the mountainous village from which my family emigrated (before I was born) is named Armensko, located in what is today known as northern Greece. Almost all my family is from the region; my father was born there, as were my grandparents on both sides. Half my family still lives there. When large portions of the family immigrated to Toronto, they brought the culture and language with them. In Toronto, I grew up speaking Macedonian, attending Macedonian folk dancing lessons, and going to Macedonian weddings and social events.
When I use Google to search for “Armensko” on my laptop, the response I receive in the panels is for Armenia (Google thinks I’ve made a typo). If I search instead for “Armensko, Greece,” the results this time return a panel (Google calls these knowledge panels) for a village called Alona, including a Google Map image showing a geographic location identical to what I know of as Armensko. Also included are Wikipedia details (which Google has orchestrated) for the Alona entry, along with bits of information like its municipality, elevation, etc. I examine the images displayed directly in the knowledge panel where I see a photograph (probably one from the Wikipedia entry). The graphic is a small black and white photo of the village I recognize, dated 1917 (figure 0.1). The name printed on the photo reads “Armensko” (the name of my family’s village) and not “Alona” (the official name returned in the Google results).
Figure 0.1: Knowledge panel for Armensko
Source: Wikimedia Commons
On the one hand, there’s an apparent reason that Google’s panels show information for a village called “Alona” instead of “Armensko.” “Alona” is the official Greek name of the location, and “Armensko” is its Macedonian name. Like similar geographical naming disputes, there is a long and complex history of war, migration, and identity tied to this tiny mountainous village in the Balkans.
The region that the village sits in is part of the Balkan Peninsula and a historically contested transnational space called “Macedonia” that has roots in antiquity and that today contains parts of Serbia, Kosovo, Albania, and Bulgaria, all of North Macedonia, and a large portion of northern Greece. Borders in the region shifted because of the Balkan Wars, the decline of the Ottoman Empire, and the dissolution of Yugoslavia, and Slavic Macedonians did not achieve statehood until 1991 (the country now called North Macedonia). The Macedonian area of what is now northern Greece is where Slavic Macedonians like my family live, yet many fled due to compelled emigration and forced renaming of towns and family names after the Greek Civil War. Greece and North Macedonia have both laid claim to the term “Macedonia” (one Greek, the other Slavic) since 1991, and this has resulted in over 30 years of tension and instability in the region.
No wonder Google has a hard time with this query about a Macedonian village; how is Google’s knowledge panel expected to capture these meaningful nuances? Yet Google attempts to provide an answer in the knowledge panel when asked about the location of Armensko by giving information for Armenia (completely incorrect) and Alona (somewhat correct). According to Google’s knowledge panel, Slavic Armensko may never have existed. Odd, considering that I grew up visiting a place people told me was called Armensko while speaking Macedonian with residents, many of whom still live there.4
Now, suppose I search for a larger city like Toronto. In this case, Google’s knowledge panel contains much more information, including an official summary that Google does not appear to take from Wikipedia. Instead, it seems as though Google wrote it (the description is authoritatively marked “—Google” at the end). There is information presented from Weather.com, population information from the United Nations, links to Toronto’s current mayor, area codes, universities, etc. Though Toronto’s municipal website foregrounds land acknowledgments in the Accessibility & Human Rights section, the Google knowledge panel neglects to mention that Toronto is on Indigenous land, including that of the Mississaugas of the Credit, the Anishnabeg, the Chippewa, the Haudenosaunee, and the Wendat peoples. Like the Alona/Armensko example, the knowledge panel contains what Google thinks is the most relevant information about Toronto, rather than any Indigenous information about Tkaronto, the original name that Aboriginal and Indigenous peoples use, which means something to the effect of “where the trees stand in the water” (Recollet & Johnson, 2019).
Some impatient people might argue that it makes sense that Google should present searchers with facts and information about Alona in its knowledge panels and not Armensko. Or that it should be acceptable that Google’s knowledge panel for Toronto does not mention Tkaronto or Indigenous land. Their hasty argument might be that most people who use the knowledge panel are looking for quick and contemporary information like the weather, people in leadership, and a description of the geography. Yet, one can also argue that focusing only on such details elides the complex histories and identities that make people and locations what they are. This type of decision is an example of the tradeoffs that today’s media companies are willing to make. Companies like Google are interested in providing quick facts and answers in their “intuitive” search results, providing the most “relevant” facts to most users. Yet, in doing so, such companies imagine a type of ideal proxy customer that does not exist (Mulvin, 2021).
Such results tell us something interesting about the temporality of the factual content that media technology companies think they should provide (always show content that applies to the “here and now”).5 They also tell us something about their approach to verifying information (seldom are references provided). For example, it is not clear that people will always want only a snapshot of the current and “official” facts about a geographic location instead of a deeper cultural understanding. As of this writing, search engines also do not always build into their search products things like provenance information (Hartig, 2009) concerning where data come from, or digital heritage information (Kremers, 2020) about cultural history. In terms of references, the most that people can hope for are tiny blue links, but knowledge panels do not always include these.
Similarly, virtual assistants like Siri and Alexa do not usually inform users about the history of information or the sourcing and verifiability of facts that they retrieve. Sometimes we are lucky if they cite Wikipedia. These practices omit and foreclose what Acker (2015) has called the hermeneutics of data—how we might go about interpreting data that inform the facts that we receive.
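What might provenance look like if it were built in? Here is a minimal sketch, in Python, of a single retrieved fact carrying Hartig-style provenance metadata alongside the claim itself; the record structure, field names, and example values are illustrative assumptions on my part, not any company’s actual data model.

```python
from dataclasses import dataclass
from datetime import date

# A hypothetical record for one retrieved fact, with provenance
# (source, retrieval date, license) stored next to the claim itself.
# The schema is invented for illustration.
@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str      # where the statement came from
    retrieved: date  # when it was collected
    license: str     # terms under which it may be reused

fact = Fact(
    subject="Canada",
    predicate="capital",
    obj="Ottawa",
    source="https://en.wikipedia.org/wiki/Canada",
    retrieved=date(2021, 11, 30),
    license="CC BY-SA 3.0",
)

# A knowledge panel or virtual assistant that surfaced provenance could
# print the citation next to the answer instead of omitting it.
print(f"{fact.subject} ({fact.predicate}): {fact.obj}")
print(f"  source: {fact.source}, retrieved {fact.retrieved}")
```

Nothing in such a record is technically exotic; its absence from consumer search products looks less like an engineering limit than the kind of design tradeoff described above.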
Media technology companies have incredible power to name.6 Researchers such as Ford and Graham (2016a, 2016b) have critically addressed the semantic power that companies like Google have in naming, managing, and establishing popular understandings of places. “Semantics” should be understood here broadly, as referring to how people linguistically and logically create meaning for words, sentences, and texts. Journalists have covered Ford and Graham’s work in The Washington Post in an article titled “You probably haven’t even noticed Google’s sketchy quest to control the world’s knowledge” (Dewey, 2016). The report examines how Google now attempts to answer controversial questions like “What is the capital of Israel?” or mundane queries like “What is D.C.’s best restaurant?” Google’s knowledge panels are supposed to bring this information directly to you in the form of facts, and these facts often include contested and controversial statements. Ford and Graham’s work on the status of contested cities and locations (for example, Jerusalem or Taiwan) and how Google represents them is evidence of Google’s power to make important decisions about how we understand facts and the meanings that are associated with those places (Graham, 2015).
Journalists at other newspapers also note the immense control that media technology companies have in presenting facts. In 2019, The Wall Street Journal published an article titled “How Google interferes with its search algorithms and changes your results” (Grind et al., 2019). The article describes the increasing importance of things like knowledge panels, which many people now rely on when looking up facts. The authors state that “Google engineers regularly make behind-the-scenes adjustments to other information the company is increasingly layering on top of its basic search results” and that these often include “boxes called ‘knowledge panels.’” The reporting documents that specific facts and words appear on internal blocklists that “affect the results in organic search and Google News, as well as other search products, such as Web answers and knowledge panels.” Google wields significant power over the information articulated in these knowledge panels.
Media technology companies have the power to name and express facts and definitions about things, but they can now also organize these facts in interactive ways beyond static knowledge panels. Companies display facts in orchestrated carousels (movable tracks where people can cycle through results) and featured boxes (containing lists with dropdown menus). Like the knowledge panels described above, carousels and featured boxes often rely on a taxonomy (an organized set of definitions and classifications) created by the media company. These taxonomies can be about anything: German actors, aid organizations, dictators, biological information, tourist recommendations, etc. Who is making these categories? How do media companies decide what pieces of information to include in them?
Let’s switch to a different search engine. If I search Bing for “Things to do in Philadelphia,” Bing will produce many buttons and dropdown menus with what Microsoft thinks are pertinent information categories (figure 0.2a). These include things like “Things to do,” “Events,” “What to Eat,” etc., which then break down into topic areas like “Architecture,” “Outdoors,” “Museums,” etc. The categories appear to be coming from Microsoft, as there is no discernible information about their creation. Different sites display their data as one browses these categories, including TripAdvisor, Wikipedia, etc. Yet, all this orchestrated information remains on Bing—as I click and browse, I never have to leave Bing’s page. It’s almost as if Bing has a micro-website for every topic, as it quickly retrieves information and organizes it for me using Bing’s proprietary algorithms and taxonomy. Bing provides a full day’s worth of activities for me, and I never have to visit another website. This decrease in browsing has a negative effect on local websites and apps. Furthermore, such results often contain significant sociocultural biases. If I ask for a list of “American inventors,” the resulting featured box labeled “Popular inventors from United States” is made up entirely of men (figure 0.2b).
Figure 0.2: Bing results showing carousels, featured boxes, and knowledge panels
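To make the role of these taxonomies concrete, here is a minimal Python sketch of a featured box driven by a company-authored classification. The top-level labels echo Bing’s, but the groupings and members are invented for illustration; Microsoft’s actual taxonomy is not public.

```python
# A toy taxonomy of the kind a media company might maintain internally.
# The structure and members are invented, not taken from Bing.
taxonomy = {
    "Things to do": {
        "Museums": ["Philadelphia Museum of Art", "The Franklin Institute"],
        "Outdoors": ["Fairmount Park", "Spruce Street Harbor Park"],
    },
    "What to Eat": {
        "Classics": ["Cheesesteak", "Soft pretzel"],
    },
}

def featured_box(topic: str) -> None:
    """Render one top-level category as a featured box."""
    for subtopic, items in taxonomy.get(topic, {}).items():
        print(subtopic)
        for item in items:
            print(f"  - {item}")

# The company's classification decisions, not the searcher's, determine
# what appears in the box and how it is grouped.
featured_box("Things to do")
```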
Internet-based media technology companies today want one thing: to allow you to do everything (look up facts, take specific actions, etc.) directly on their media products. This process might seem innocuous at first. Who wouldn’t want helpful details about prices and store opening hours quickly made available? Yet, as we shall see, there are deeper problems associated with knowledge representation and information retrieval that this book will explain. How do Google and Microsoft decide what information to produce about events, things, people, and locations in their media products like knowledge panels and virtual assistants? Some data appear to be authoritative and come directly from the companies, while some come from other sources (and attribution is not always discernible). The knowledge panels answer a searcher’s query regardless of the data source. Media companies orchestrate these facts somehow. This book is about the technologies that enable this type of information to be discoverable on knowledge panels, virtual assistants, platforms, and beyond.
The results concerning who receives a knowledge panel, what gets included, and where linked resources lead vary drastically. If I search for “Turkish coffee” or recipes for “Rhubarb pie,” Google or Bing will provide a knowledge panel that lists the main ingredients (figure 0.2c). Each has navigable links back to Google or Bing knowledge panels for every item (it’s a loop that keeps you on their product). Some information in the description is from Wikipedia, while on other occasions, it is not. Google also offers a “Feedback” button in the knowledge panel where I can correct specific sections, such as if the calories are off or the recipe is incorrect. This feedback goes to Google, and the company then makes some determinations about editing the knowledge panel. But not all recipes receive a knowledge panel—currently, there is no knowledge panel for the popular Ethiopian dish kitfo, for example. Some recipes appear in Google’s rich results and dropdown menus, lifted from things like personal websites. People will see these recipes and never have to visit the sites that originally posted the material.
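Part of what makes recipes liftable in this way is the structured markup that site owners embed in their pages. As a rough sketch, the snippet below assembles a Schema.org Recipe description and serializes it as JSON-LD, the format crawlers parse when populating rich results; the recipe details are invented for illustration, and Schema.org itself is the subject of chapter 3.

```python
import json

# A minimal Schema.org Recipe description. Embedded in a web page inside
# a <script type="application/ld+json"> tag, markup like this lets a
# crawler treat the page's recipe as structured data. The details below
# are invented for illustration.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Rhubarb Pie",
    "author": {"@type": "Person", "name": "A. Home Cook"},
    "recipeIngredient": ["rhubarb", "sugar", "flour", "butter"],
    "recipeInstructions": [
        {"@type": "HowToStep", "text": "Mix rhubarb, sugar, and flour."},
        {"@type": "HowToStep", "text": "Fill the crust and bake."},
    ],
}

print(json.dumps(recipe, indent=2))
```

Once a page carries markup like this, a search engine can reproduce the recipe in its own interface, which is precisely how content ends up displayed without a visit to the originating site.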
On Facebook, if I search for Milo Yiannopoulos (the extreme right-wing media personality), I receive a knowledge panel about him, who he is, etc. (unlike some results on Alexa and Google, which do not indicate a source, Facebook notes this information is from Wikipedia). Like Google, Facebook also offers the opportunity to provide feedback by including an “Is this information accurate?” button. Yet, who receives a knowledge panel on Facebook (as with Google) seems somewhat random. For example, currently, Joseph Stalin has a knowledge panel, while Che Guevara does not. John Schnatter (the founder of Papa John’s pizzas) has a knowledge panel. But there is no knowledge panel for Sir Lady Java (the activist and entertainer), who helped start the transgender rights movement in the United States in the 1960s.
As Noble (2018) shows in her essential book Algorithms of oppression: How search engines reinforce racism, many large platform companies offer results that are culturally, racially, and sexually biased in their oversimplification of representation:
What we find in search engines about people and culture is important. They oversimplify complex phenomena. They obscure any struggle over understanding, and they can mask history. Search results can reframe our thinking and deny us the ability to engage deeply with essential information and knowledge we need, knowledge that has traditionally been learned through teachers, books, history, and experience.
Noble’s book clarifies how search results from large platform companies can often show significant racial bias in autocomplete term suggestions as users type search queries, what images display in search results, and how links are ranked. She describes how search “does not merely present pages but structures knowledge, and the results retrieved in a commercial search engine create their particular material reality.” A robust critical tradition of search engine studies focuses on these themes. Halavais’ (2008/2017) Search engine society laid the groundwork for critically reflecting on the habits of search engine users and the search results of major internet platforms, while Vaidhyanathan’s (2011) The Googlization of everything provided a significant political economy perspective on search companies like Google and their knowledge monopoly. As technology evolves, there are new opportunities to bring such sustained critiques to bear on the political economy of today’s corporate knowledge infrastructures, as Haider and Sundin (2019) have in their comprehensive work on the histories and new directions of search engines and search studies.7
One thing is clear: the media we use when searching for facts have changed. Whether the information comes from external or internal sources, platforms build mechanisms into their media products to present facts and answers about our world. They answer questions about everything from cities and people to ingredients and pies, which has enormous implications for how media function now and in the future. We already worry about companies like Google acting as gatekeepers to information regarding their ability to rank and link to sites (privileging some websites higher in the rankings) or about Wikipedia containing uneven coverage in its content. We must also worry about how these companies directly convey facts in media products. Companies organize and manipulate this presentation of facts; the data is sometimes overtly biased or of unknown origin, and it has social and political consequences.8
Search results have changed drastically since the days of Ask Jeeves, an early and popular search engine from the 1990s built for answering natural language questions (Ask Jeeves was a market competitor before the rise and domination of Google). Since roughly 2012, Google has similarly focused on deepening its search responses to answer questions about people, places, and things to create something like a “Star Trek computer” (Ingraham, 2012). Yet, it was not long before journalists started noticing less-than-stellar responses in the curated answers—including, for example, results offering creationist (religious) accounts of the dinosaurs and their extinction (Jacoby, 2016).
Errors aside, 2012 represented a significant change in how large media technology companies conceptualized information search and retrieval. They began to focus on framing inquiry in terms of the logical concepts of things (e.g., identifying people, places, entities) based on taxonomies rather than only the algorithmic sorting of strings (e.g., ranked lists of blue links) based on keyword similarity. The difference is between previously conceptualizing a search for a city like Philadelphia as a search for only the word “Philadelphia” (which would return websites that have the word “Philadelphia” on them) versus conceptualizing the inquiry as being for the place called “Philadelphia” (which would return information about the actual city). The second chapter provides a closer look at this transformational change in search.
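To make the distinction concrete, consider a minimal sketch (not any company’s actual implementation): the first function below matches strings in documents, while the second looks up facts about a thing in a tiny knowledge graph of subject–predicate–object triples. The documents and triples are illustrative.

```python
documents = {
    "page1": "Philadelphia has many historic sites.",
    "page2": "The Philadelphia Museum of Art is popular.",
}

# A tiny knowledge graph: facts stored as (subject, predicate, object)
# triples, the structure introduced in chapter 1.
triples = [
    ("Philadelphia", "type", "City"),
    ("Philadelphia", "locatedIn", "Pennsylvania"),
    ("Philadelphia", "population", "1,603,797"),
]

def string_search(query: str) -> list[str]:
    """Old model: return pages whose text contains the keyword."""
    return [pid for pid, text in documents.items()
            if query.lower() in text.lower()]

def entity_search(entity: str) -> dict[str, str]:
    """Newer model: return facts about the entity itself."""
    return {pred: obj for subj, pred, obj in triples if subj == entity}

print(string_search("Philadelphia"))  # ['page1', 'page2']: links to pages
print(entity_search("Philadelphia"))  # facts, ready for a knowledge panel
```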
Such a focus on concepts of things continues, particularly in today’s heightened climate of misinformation, conspiracy theories, and the COVID-19 pandemic. In 2018, partly in response to misinformation surrounding the 2016 US presidential election and the lies of former US President Donald Trump, Google announced that it would “spend $300 million over the next three years to help combat the spread of misinformation online and help journalism outlets” (Shaban, 2018). Knowledge graphs (large databases of facts) would play a crucial part in this project. The following year, Google released a white paper titled “How Google fights disinformation.” The report describes how the company is attempting to give users more context in searches, including through knowledge and information panels that provide “high-level facts about a person or issue” and make it “easier to discover the work of fact-checkers on Google.” The company explains that it will include these facts in results to provide “users with contextual information from trusted sources to help them be more informed consumers of content on the platform” (Google, 2019).
Companies like Google are now engaged in the increasing semanticization of digital domains. They interpret, label, organize, and provide facts, acting as intermediaries between people and information. Such practices are evident with COVID-19, where search engines relay information in knowledge panels, such as statistics about current cases and deaths. Companies provide these statistics with accompanying charts and maps, and the data in this context typically reference sources such as Our World in Data or The New York Times. Google results for COVID-19 include a link labeled “About this data,” which, when clicked, leads to a page that states that “Data comes from Wikipedia, government health ministries, The New York Times, and other authoritative sources, as attributed.” This information is beneficial, and one can understand why it may be necessary to include it in semantically enriched search results.
