'Vital reading. This is the book on artificial intelligence we need right now.' Mike Krieger, cofounder of Instagram Artificial intelligence is rapidly dominating every aspect of our modern lives, influencing the news we consume, whether we get a mortgage, and even which friends wish us happy birthday. But as algorithms make ever more decisions on our behalf, how do we ensure they do what we want? And fairly? This conundrum – dubbed 'The Alignment Problem' by experts – is the subject of this timely and important book. From the AI program which cheats at computer games to the sexist algorithm behind Google Translate, bestselling author Brian Christian explains how, as AI develops, we rapidly approach a collision between artificial intelligence and ethics. If we stand by, we face a future with unregulated algorithms that propagate our biases – and worse – violate our most sacred values. Urgent and fascinating, this is an accessible primer to the most important issue facing AI researchers today.
‘An abundantly researched and captivating book that explores the road humanity has taken to create a successor for itself – a road that’s rich with surprising discoveries, unexpected obstacles, ingenious solutions and, increasingly, hard questions about the soul of our species.’
Jaan Tallinn, co-founder of Skype and the Future of Life Institute
‘A deeply enjoyable and meticulously researched account of how computer scientists and philosophers are defining the biggest question of our time: how will we create intelligent machines that will improve our lives rather than complicate or even destroy them? There’s no better book than The Alignment Problem at spelling out the issues of governing AI safely.’
James Barrat, bestselling author of Our Final Invention
‘Brian Christian is a fine writer and has produced a fascinating book. AI seems destined to become, for good or ill, increasingly prominent in our lives. We should be grateful for this balanced and hype-free perspective on its scope and limits.’
Martin Rees, Emeritus Professor of Cosmology and Astrophysics, University of Cambridge
‘A riveting and deeply complex look at artificial intelligence and the significant challenge in creating computer models that “capture our norms and values”... Lay readers will find Christian’s revealing study to be a helpful guide to an urgent problem in tech.’
Publishers Weekly
Brian Christian is the author of the acclaimed bestsellers The Most Human Human and Algorithms to Live By, which have been translated into nineteen languages. A visiting scholar at the University of California, Berkeley, he lives in San Francisco.
THE ALIGNMENT PROBLEM
How Can Machines Learn Human Values?
BRIAN CHRISTIAN
First published in the United States in 2020 by W. W. Norton & Company, Inc., New York.
First published in hardback in Great Britain in 2021 by Atlantic Books, an imprint of Atlantic Books Ltd.
Copyright © Brian Christian, 2020
The moral right of Brian Christian to be identified as the author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act of 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of both the copyright owner and the above publisher of this book.
Every effort has been made to trace or contact all copyright holders. The publishers will be pleased to make good any omissions or rectify any mistakes brought to their attention at the earliest opportunity.
10 9 8 7 6 5 4 3 2 1
A CIP catalogue record for this book is available from the British Library.
Hardback ISBN: 978 1 78649 430 6
Trade paperback ISBN: 978 1 78649 431 3
E-book ISBN: 978 1 78649 432 0
Printed in Great Britain
Atlantic Books
An imprint of Atlantic Books Ltd
Ormond House
26–27 Boswell Street
London
WC1N 3JZ
www.atlantic-books.co.uk
For Peter, who convinced me
And for everyone doing the work
I remember in 2000 hearing James Martin, the leader of the Viking missions to Mars, saying that his job as a spacecraft engineer was not to land on Mars, but to land on the model of Mars provided by the geologists.
— PETER NORVIG1
The world is its own best model.
— RODNEY BROOKS2
All models are wrong.
— GEORGE BOX3
PROLOGUE
INTRODUCTION
I. Prophecy
1 REPRESENTATION
2 FAIRNESS
3 TRANSPARENCY
II. Agency
4 REINFORCEMENT
5 SHAPING
6 CURIOSITY
III. Normativity
7 IMITATION
8 INFERENCE
9 UNCERTAINTY
CONCLUSION
Acknowledgments
Notes
Bibliography
Index
THE ALIGNMENT PROBLEM
1935, Detroit. Walter Pitts is running down the street, chased by bullies.
He ducks into the public library to take shelter, and he hides. He hides so well that the library staff don’t even realize he’s there, and they close for the night. Walter Pitts is locked inside.1
He finds a book on the shelves that looks interesting, and he starts reading it. For three days, he reads the book cover to cover.
The book is a two-thousand-page treatise on formal logic; famously, its proof that 1+1=2 does not appear until page 379.2 Pitts decides to write a letter to one of the authors—British philosopher Bertrand Russell—because he believes he’s found several mistakes.
Several weeks go by, and Pitts gets a letter in the mail postmarked from England. It’s Bertrand Russell. Russell thanks him for writing, and invites Pitts to become one of his doctoral students at Cambridge.3
Unfortunately, Walter Pitts must decline the offer—because he’s only twelve years old, and in the seventh grade.
Three years later, Pitts learns that Russell will be visiting Chicago to give a public lecture. He runs away from home to attend. He never goes back.
At Russell’s lecture, Pitts meets another teenager in the audience, named Jerry Lettvin. Pitts only cares about logic. Lettvin only cares about poetry and, a distant second, medicine.4 They become inseparable best friends.
Pitts begins hanging out around the University of Chicago campus, dropping in on classes; he still lacks a high school diploma and never formally enrolls. One of these classes is taught by the famed German logician Rudolf Carnap. Pitts walks into his office hours, declaring he’s found a few “flaws” in Carnap’s latest book. Skeptically, Carnap consults the book; Pitts, of course, is right. They talk awhile, then Pitts walks out without giving his name. Carnap spends months asking around about the “newsboy who knew logic.”5 Eventually Carnap finds him again and, in what will become a motif throughout Pitts’s academic life, becomes his advocate, persuading the university to give him a menial job so he will at least have some income.
It’s now 1941. Lettvin—still a poet first, in his own mind—has, despite himself, gotten into medical school at the University of Illinois, and finds himself working under the brilliant neurologist Warren McCulloch, newly arrived from Yale. One day Lettvin invites Pitts over to meet him. At this point Lettvin is twenty-one and still living with his parents. Pitts is seventeen and homeless.6 McCulloch and his wife take them both in.
Throughout the year that follows, McCulloch comes home in the evenings and he and Pitts—who is barely older than McCulloch’s own children—regularly stay up past midnight talking. Intellectually, they are the perfect team: the esteemed midcareer neurologist and the prodigy logician. One lives in practice—the world of nervous systems and neuroses—and the other lives in theory—the world of symbols and proofs. They both want nothing more than to understand the nature of truth: what it is, and how we know it. The fulcrum of this quest—the thing that sits at the perfect intersection of their two disparate worlds—is, of course, the brain.
It was already known by the early 1940s that the brain is built of neurons wired together, and that each neuron has “inputs” (dendrites) as well as an “output” (axon). When the impulses coming into a neuron exceed a certain threshold, then that neuron, in turn, emits a pulse. Immediately this begins to feel, to McCulloch and Pitts, like logic: the pulse or its absence signifying on or off, yes or no, true or false.7
They realize that a neuron with a low-enough threshold, such that it fires if any of its inputs do, functions like a physical embodiment of the logical or. A neuron with a high-enough threshold, such that it fires only if all of its inputs do, is a physical embodiment of the logical and. There is nothing, then, that can be done with logic—they start to realize—that such a “neural network,” so long as it is wired appropriately, cannot do.
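In modern terms, their insight is easy to sketch in code. What follows is a toy illustration of a McCulloch-Pitts-style threshold unit, written in Python; the function names and the two-input setup are assumptions made for demonstration, not anything from their paper.

```python
# A toy sketch (an illustration, not code from McCulloch and Pitts) of a
# threshold unit: inputs and output are all-or-none, 0 or 1, and the unit
# "fires" only when the number of firing inputs reaches its threshold.

def mcp_neuron(inputs, threshold):
    """Return 1 if enough inputs are firing to reach the threshold, else 0."""
    return 1 if sum(inputs) >= threshold else 0

def logical_or(a, b):
    # Low threshold: fires if *any* input fires.
    return mcp_neuron([a, b], threshold=1)

def logical_and(a, b):
    # High threshold: fires only if *all* inputs fire.
    return mcp_neuron([a, b], threshold=2)

for a in (0, 1):
    for b in (0, 1):
        print(f"a={a} b={b}  or={logical_or(a, b)}  and={logical_and(a, b)}")
```

Add inhibitory connections for not, and—as their 1943 paper argues formally—a suitably wired network of such units can realize any expression of propositional logic.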
Within months they have written a paper together—the middle-aged neurologist and teenage logician. They call it “A Logical Calculus of Ideas Immanent in Nervous Activity.”
“Because of the ‘all-or-none’ character of nervous activity,” they write, “neural events and the relations among them can be treated by means of propositional logic. It is found that the behavior of every net can be described in these terms . . . and that for any logical expression satisfying certain conditions, one can find a net behaving in the fashion it describes.”
The paper is published in 1943 in the Bulletin of Mathematical Biophysics. To Lettvin’s frustration, it makes little impact on the biology community.8 To Pitts’s disappointment, the neuroscience work of the 1950s, notably a landmark study of the optic nerve of the frog—done by none other than his best friend, Jerry Lettvin—will show that neurons appear to be much messier than the simple “true” or “false” circuits he envisioned. Perhaps propositional logic—its ands, ors, and nots—was not, ultimately, the language of the brain, or at least not in so straightforward a form. This kind of impurity saddened Pitts.
But the impact of the paper—of those long conversations into the night at McCulloch’s house—would be enormous, if not entirely in the way that McCulloch and Pitts envisioned. It would be the foundation for a completely new field: the project to actually build mechanisms out of these simplified versions of neurons, and see just what such “mechanical brains” could do.9
In the summer of 2013, an innocuous post appeared on Google’s open-source blog titled “Learning the Meaning Behind Words.”1
“Today computers aren’t very good at understanding human language,” it began. “While state-of-the-art technology is still a ways from this goal, we’re making significant progress using the latest machine learning and natural language processing techniques.”
Google had fed enormous datasets of human language, mined from newspapers and the internet—in fact, thousands of times more text than had ever been successfully used before—into a biologically inspired “neural network,” and let the system pore over the sentences for correlations and connections between the terms.
The system, using so-called “unsupervised learning,” began noticing patterns. It noticed, for instance, that the word “Beijing” (whatever that meant) had the same relationship to the word “China” (whatever that was) as the word “Moscow” did to “Russia.”
Whether this amounted to “understanding” or not was a question for philosophers, but it was hard to argue that the system wasn’t capturing something essential about the sense of what it was “reading.”
Because the system transformed the words it encountered into numerical representations called vectors, Google dubbed the system “word2vec,” and released it into the wild as open source.
To a mathematician, vectors have all sorts of wonderful properties that allow you to treat them like simple numbers: you can add, subtract, and multiply them. It wasn’t long before researchers discovered something striking and unexpected. They called it “linguistic regularities in continuous space word representations,”2 but it’s much easier to explain than that. Because word2vec made words into vectors, it enabled you to do math with words.
For instance, if you typed China + river, you got Yangtze. If you typed Paris − France + Italy, you got Rome. And if you typed king − man + woman, you got queen.
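As a concrete illustration—not from Google’s original release—the same word arithmetic can be reproduced today with the open-source gensim library and a pretrained word2vec file. The filename below is an assumption, standing in for whichever embedding file you have on disk.

```python
# A sketch of "doing math with words," assuming a pretrained word2vec file
# is available locally; the path below is a placeholder.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True
)

# king - man + woman  ->  expected to land near "queen"
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# Paris - France + Italy  ->  expected to land near "Rome"
print(vectors.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=1))
```

Swapping in occupation words—doctor, shopkeeper, computer programmer—is all it takes to reproduce the troubling analogies described below.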
The results were remarkable. The word2vec system began humming under the hood of Google’s translation service and its search results, inspiring others like it across a wide range of applications including recruiting and hiring, and it became one of the major tools for a new generation of data-driven linguists working in universities around the world.
No one realized what the problem was for two years.
In November 2015, Boston University PhD student Tolga Bolukbasi went with his advisor to a Friday happy-hour meeting at Microsoft Research. Amid wine sipping and informal chat, he and Microsoft researcher Adam Kalai pulled out their laptops and started messing around with word2vec.
“We were playing around with these word embeddings, and we just started randomly putting words into it,” Bolukbasi says. “I was playing on my PC; Adam started playing.”3 Then something happened.
They typed:
doctor − man + woman
The answer came back:
nurse
“We were shocked at that point, and we realized there was a problem,” says Kalai. “And then we dug deeper and saw that it was even worse than that.”4
The pair tried another.
shopkeeper − man + woman
The answer came back:
housewife
They tried another.
computer programmer − man + woman
Answer:
homemaker
Other conversations in the room by this point had stopped, and a group had formed around the screen. “We jointly realized,” says Bolukbasi, “Hey, there’s something wrong here.”
In judiciaries across the country, more and more judges are coming to rely on algorithmic “risk-assessment” tools to make decisions about things like bail and whether a defendant will be held or released before trial. Parole boards are using them to grant or deny parole. One of the most popular of these tools was developed by the Michigan-based firm Northpointe and goes by the name Correctional Offender Management Profiling for Alternative Sanctions—COMPAS, for short.5 COMPAS has been used by states including California, Florida, New York, Michigan, Wisconsin, New Mexico, and Wyoming, assigning algorithmic risk scores—risk of general recidivism, risk of violent recidivism, and risk of pretrial misconduct—on a scale from 1 to 10.
Amazingly, these scores are often deployed statewide without formal audits.6 COMPAS is a proprietary, closed-source tool, so neither attorneys, defendants, nor judges know exactly how its model works.
In 2016, a group of data journalists at ProPublica, led by Julia Angwin, decided to take a closer look at COMPAS. With the help of a public records request to Florida’s Broward County, they were able to get the records, and the risk scores, of some seven thousand defendants arrested in 2013 and 2014.
Because they were doing their research in 2016, the ProPublica team had the equivalent of a crystal ball. Looking at data from two years prior, they actually knew whether these defendants, predicted either to reoffend or not, actually did. And so they asked two simple questions. One: Did the model actually correctly predict which defendants were indeed the “riskiest”? And two: Were the model’s predictions biased in favor of or against any group in particular?
An initial look at the data suggested something might be wrong. They found, for instance, two defendants arrested for similar counts of drug possession. The first, Dylan Fugett, had a prior offense of attempted burglary; the second, Bernard Packer, had a prior offense of nonviolently resisting arrest. Fugett, who is White, was assigned a risk score of 3/10. Packer, who is Black, was assigned a risk score of 10/10.
From the crystal ball of 2016, they also knew that Fugett, the 3/10 risk, went on to be convicted of three further drug offenses. Over the same time period, Packer, the 10/10 risk, had a clean record.
In another pairing, they juxtaposed two defendants charged with similar counts of petty theft. The first, Vernon Prater, had a prior record of two armed robberies and one attempted armed robbery. The other defendant, Brisha Borden, had a prior record of four juvenile misdemeanors. Prater, who is White, was assigned a risk score of 3/10. Borden, who is Black, was assigned a risk score of 8/10.
From the vantage of 2016, Angwin’s team knew that Prater, the “low-risk” defendant, went on to be convicted of a later count of grand theft and given an eight-year prison sentence. Borden, the “high-risk” defendant, had no further offenses.
Even the defendants themselves seemed confused by the scores. James Rivelli, who is White, was arrested for shoplifting and rated a 3/10 risk, despite having prior offenses including aggravated assault, felony drug trafficking, and multiple counts of theft. “I spent five years in state prison in Massachusetts,” he told a reporter. “I am surprised it is so low.”
A statistical analysis appeared to affirm that there was a systemic disparity.7 The article ran with the logline “There’s software used across the country to predict future criminals. And it’s biased against blacks.”
Others weren’t so sure—and ProPublica’s report, published in the spring of 2016, touched off a firestorm of debate: not only about COMPAS, not only about algorithmic risk assessment more broadly, but about the very concept of fairness itself. How, exactly, are we to define—in statistical and computational terms—the principles, rights, and ideals articulated by the law?
When US Supreme Court Chief Justice John Roberts visits Rensselaer Polytechnic Institute later that year, he’s asked by university president Shirley Ann Jackson, “Can you foresee a day when smart machines—driven with artificial intelligences—will assist with courtroom fact-finding or, more controversially, even judicial decision-making?”
“It’s a day that’s here,” he says.8
That same fall, Dario Amodei is in Barcelona to attend the Neural Information Processing Systems conference (“NeurIPS,” for short): the biggest annual event in the AI community, having ballooned from several hundred attendees in the 2000s to more than thirteen thousand today. (The organizers note that if the conference continues to grow at the pace of the last ten years, by the year 2035 the entire human population will be in attendance.)9 But at this particular moment, Amodei’s mind isn’t on “scan order in Gibbs sampling,” or “regularizing Rademacher observation losses,” or “minimizing regret on reflexive Banach spaces,” or, for that matter, on Tolga Bolukbasi’s spotlight presentation, some rooms away, about gender bias in word2vec.10
He’s staring at a boat, and the boat is on fire.
He watches as it does donuts in a small harbor, crashing its stern into a stone quay. The motor catches fire. It continues to spin wildly, the spray dousing the flames. Then it slams into the side of a tugboat and catches fire again. Then it spins back into the quay.
It is doing this because Amodei ostensibly told it to. In fact it is doing exactly what he told it to. But it is not what he meant.
Amodei is a researcher on a project called Universe, where he is part of a team working to develop a single, general-purpose AI that can play hundreds of different computer games with human-level skill—a challenge that has been something of a holy grail among the AI community.
“And so I just, I ran a few of these environments,” Amodei tells me, “and I was VPNing in and looking to see how each one was doing. And then just the normal car race was going fine, and there was like a truck race or something, and then there was this boat race.” Amodei watches for a minute. “And I was looking at it, and I was like, ‘This boat is, like, going around in circles. Like, what in the world is going on?!’ ”11 The boat wasn’t simply acting randomly; it wasn’t wild or out of control. In fact, it was the opposite. It had settled on this. From the computer’s perspective, it had found a nearly perfect strategy, and was executing it to a T. Nothing made sense.
“Then I eventually looked at the reward,” he says.
Amodei had made the oldest mistake in the book: “rewarding A, while hoping for B.”12 What he wanted was for the machine to learn how to win the boat race. But it was complicated to express this rigorously—he would need to find a way to formalize complex concepts like track position, laps, placement among the other boats, and so on. Instead, he used what seemed like a sensible proxy: points. The machine found a loophole, a tiny harbor with replenishing power-ups where it could ignore the race entirely, do donuts, and rack up points . . . forever.
“And, of course, it’s partially my fault,” he says. “I just run these various games; I haven’t looked super closely at the objective function. . . . In the other ones, score was sensibly correlated to finishing the race. You got points for getting power-ups that were always along the road. . . . The proxy of score that came with the game was good for the other ten environments. But for this eleventh environment, it wasn’t good.”13
“People have criticized it by saying, ‘Of course, you get what you asked for,’ ” Amodei says. “It’s like, ‘You weren’t optimizing for finishing the race.’ And my response to that is, Well—” He pauses. “That’s true.”
Amodei posts a clip to his group’s Slack channel, where the episode is instantly deemed “hilarious” by all concerned. In its cartoonish, destructive slapstick, it certainly is. But for Amodei—who now leads the AI safety team at San Francisco research lab OpenAI—there is another, more sobering message. At some level, this is exactly what he’s worried about.
The real game he and his fellow researchers are playing isn’t to try to win boat races; it’s to try to get increasingly general-purpose AI systems to do what we want, particularly when what we want—and what we don’t want—is difficult to state directly or completely.
The boat scenario is admittedly just a warm-up, just practice. The property damage is entirely virtual. But it is practice for a game that is, in fact, no game at all. A growing chorus within the AI community—first a few voices on the fringe, and increasingly the mainstream of the field—believes, if we are not sufficiently careful, that this is literally how the world will end. And—for today at least—the humans have lost the game.
This is a book about machine learning and human values: about systems that learn from data without being explicitly programmed, and about how exactly—and what exactly—we are trying to teach them.
The field of machine learning comprises three major areas: In unsupervised learning, a machine is simply given a heap of data and—as with the word2vec system—told to make sense of it, to find patterns, regularities, useful ways of condensing or representing or visualizing it. In supervised learning, the system is given a series of categorized or labeled examples—like parolees who went on to be rearrested and others who did not—and told to make predictions about new examples it hasn’t seen yet, or for which the ground truth is not yet known. And in reinforcement learning, the system is placed into an environment with rewards and punishments—like the boat-racing track with power-ups and hazards—and told to figure out the best way to minimize the punishments and maximize the rewards.
On all three fronts, there is a growing sense that more and more of the world is being turned over, in one way or another, to these mathematical and computational models. Though they range widely in complexity—from something that might fit on a spreadsheet on the one hand, to something that might credibly be called artificial intelligence on the other—they are steadily replacing both human judgment and explicitly programmed software of the more traditional variety.
This is happening not only in technology, not only in commerce, but in areas with ethical and moral weight. State and federal law increasingly mandates the use of “risk-assessment” software to determine bail and parole. The cars and trucks on our freeways and neighborhood streets are increasingly driving themselves. We no longer assume that our mortgage application, our résumé, or our medical tests will be seen by human eyes before a verdict is rendered. It is as if the better part of humanity were, in the early twenty-first century, consumed by the task of gradually putting the world—figuratively and literally—on autopilot.
In recent years, alarm bells have gone off in two distinct communities. The first are those focused on the present-day ethical risks of technology. If a facial-recognition system is wildly inaccurate for people of one race or gender but not another, or if someone is denied bail because of a statistical model that has never been audited and that no one in the courtroom—including the judge, attorneys, and defendant—understands, this is a problem. Issues like these cannot be addressed within traditional disciplinary camps, but rather only through dialogue: between computer scientists, social scientists, lawyers, policy experts, ethicists. That dialogue has begun in a hurry.
The second are those worried about the future dangers that await as our systems grow increasingly capable of flexible, real-time decision-making, both online and in the physical world. The past decade has seen what is inarguably the most exhilarating, abrupt, and worrying progress in the history of machine learning—and, indeed, in the history of artificial intelligence. There is a consensus that a kind of taboo has been broken: it is no longer forbidden for AI researchers to discuss concerns of safety. In fact, such concerns have over the past five years moved from the fringes to become one of the central problems of the field.
Though there is a rivalry of sorts over whether the immediate or the longer-term issues should take priority, these two communities are united in their larger aims.
As machine-learning systems grow not just increasingly pervasive but increasingly powerful, we will find ourselves more and more often in the position of the “sorcerer’s apprentice”: we conjure a force, autonomous but totally compliant, give it a set of instructions, then scramble like mad to stop it once we realize our instructions are imprecise or incomplete—lest we get, in some clever, horrible way, precisely what we asked for.
How to prevent such a catastrophic divergence—how to ensure that these models capture our norms and values, understand what we mean or intend, and, above all, do what we want—has emerged as one of the most central and most urgent scientific questions in the field of computer science. It has a name: the alignment problem.
The reaction to this alarm—both that the bleeding edge of research is getting ever closer to developing so-called “general” intelligence and that real-world machine-learning systems are touching more and more ethically fraught parts of personal and civic life—has been sudden and energetic. A diverse group is mustering across traditional disciplinary lines. Nonprofits, think tanks, and institutes are taking root. Leaders within both industry and academia are speaking up, some of them for the first time, to sound notes of caution—and redirecting their research funding accordingly. The first generation of graduate students focused explicitly on the ethics and safety of machine learning is matriculating. The alignment problem’s first responders have arrived at the scene.
This book is the product of nearly a hundred formal interviews and many hundreds of informal conversations, over the course of four years and many tens of thousands of miles, with researchers and thinkers from this field’s young history and its sprawling frontier. What I found was a field finding its legs, amid exhilarating and sometimes terrifying progress. A story I thought I knew showed itself to be, by turns, more riveting, harrowing, and hopeful than I had understood.
Machine learning is an ostensibly technical field crashing increasingly on human questions. Our human, social, and civic dilemmas are becoming technical. And our technical dilemmas are becoming human, social, and civic. Our successes and failures alike in getting these systems to do “what we want,” it turns out, offer us an unflinching, revelatory mirror.
This is a story in three distinct parts. Part one explores the alignment problem’s beachhead: the present-day systems already at odds with our best intentions, and the complexities of trying to make those intentions explicit in systems we feel capable of overseeing. Part two turns the focus to reinforcement learning, as we come to understand systems that not only predict, but act; there are lessons here for understanding evolution, human motivation, and the delicacy of incentives, with implications for business and parenting alike. Part three takes us to the forefront of technical AI safety research, as we tour some of the best ideas currently going for how to align complex autonomous systems with norms and values too subtle or elaborate to specify directly.
For better or worse, the human story in the coming century is likely to be one of building such systems and setting them, one by one, in motion. Like the sorcerer’s apprentice, we will find ourselves just one set of agents among many, in a world crowded—as it were—with brooms.
How, exactly, do we intend to teach them?
And what?
In the summer of 1958, a group of reporters are gathered by the Office of Naval Research in Washington, D.C., for a demonstration by a twenty-nine-year-old researcher at the Cornell Aeronautical Laboratory named Frank Rosenblatt. Rosenblatt has built something he calls the “perceptron,” and in front of the assembled press corps he shows them what it can do.
Rosenblatt has a deck of flash cards, each of which has a colored square on it, either on the left side of the card or on the right. He pulls one card out of the deck and places it in front of the perceptron’s camera. The perceptron takes it in as a black-and-white, 20-by-20-pixel image, and each of those four hundred pixels is turned into a binary number: 0 or 1, dark or light. The four hundred numbers, in turn, are fed into a rudimentary neural network, the kind that McCulloch and Pitts had imagined in the early 1940s. Each of these binary pixel values is multiplied by an individual negative or positive “weight,” and then they are all added together. If the total is negative, it will output a −1 (meaning the square is on the left), and if it’s positive, it will output a 1 (meaning the square is on the right).
The perceptron’s four hundred weights are initially random, and its outputs, as a result, are nonsense. But every time the system guesses “wrong,” Rosenblatt “trains” it, by dialing up the weights that were too low and turning down the weights that were too high.
Fifty of these trials later, the machine now consistently tells left-side cards and right-side cards apart, including ones he hasn’t shown it before.
The demonstration itself is strikingly modest, but it signifies something grander. The machine is, in effect, learning from experience—what Rosenblatt calls a “self-induced change in the wiring diagram.”1
McCulloch and Pitts had imagined the neuron as a simple unit of input and output, of logic and arithmetic, and they had shown the enormous power of such rudimentary mechanisms, in great enough numbers and suitably connected. But they had said next to nothing about how exactly the “suitably connected” part was actually meant to be achieved.2
“Rosenblatt made a very strong claim, which at first I didn’t believe,” says MIT’s Marvin Minsky, coincidentally a former classmate of Rosenblatt’s at the Bronx High School of Science.3 “He said that if a perceptron was physically capable of being wired up to recognize something, then there would be a procedure for changing its responses so that eventually it would learn to carry out the recognition. Rosenblatt’s conjecture turned out to be mathematically correct, in fact. I have a tremendous admiration for Rosenblatt for guessing this theorem, since it is very hard to prove.”
The perceptron, simple as it is, forms the blueprint for many of the machine-learning systems we will go on to discuss. It contains a model architecture: in this case, a single artificial “neuron” with four hundred inputs, each with its own “weight” multiplier, which are then summed together and turned into an all-or-nothing output. The architecture has a number of adjustable variables, or parameters: in this case, the positive or negative multipliers attached to each input. There is a set of training data: in this case, a deck of flash cards, each marked with a square on either its left or its right side. The model’s parameters are tuned using an optimization algorithm, or training algorithm.
The basic training procedure for the perceptron, as well as its many contemporary progeny, has a technical-sounding name—“stochastic gradient descent”—but the principle is utterly straightforward. Pick one of the training examples at random (“stochastic”) and input it to the model. If the output is exactly what you want, do nothing. If there is a difference between what you wanted and what you got, then figure out in which direction (“gradient”) to adjust each weight—whether by literal turning of physical knobs or simply the changing of numbers in software—to lower the error for this particular example. Move each of them a little bit in the appropriate direction (“descent”). Pick a new example at random, and start again. Repeat as many times as necessary.
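To make the recipe concrete, here is a toy Python sketch of a Rosenblatt-style perceptron trained by exactly this procedure. The synthetic “flash cards”—bright squares placed on the left or right half of a 20-by-20 grid—and all of the particular numbers are illustrative assumptions, not Rosenblatt’s actual setup.

```python
# A toy perceptron trained by the procedure described above: pick an example
# at random, compare the all-or-nothing output to the desired answer, and
# nudge the weights when the guess is wrong. Purely illustrative.
import random

SIZE = 20  # the perceptron "sees" a 20-by-20-pixel card

def make_card():
    """A synthetic card: a bright 4-by-4 square on the left (-1) or right (+1)."""
    side = random.choice([-1, 1])
    pixels = [0] * (SIZE * SIZE)
    row = random.randint(0, SIZE - 5)
    col = random.randint(0, SIZE // 2 - 5)
    if side == 1:
        col += SIZE // 2                      # place the square in the right half
    for r in range(row, row + 4):
        for c in range(col, col + 4):
            pixels[r * SIZE + c] = 1
    return pixels, side

# Four hundred weights, initially random: the outputs start out as nonsense.
weights = [random.uniform(-1, 1) for _ in range(SIZE * SIZE)]
learning_rate = 0.1

for step in range(2000):
    pixels, label = make_card()                          # "stochastic"
    total = sum(w * x for w, x in zip(weights, pixels))
    guess = 1 if total > 0 else -1                       # all-or-nothing output
    if guess != label:                                   # wrong: adjust each weight
        for i, x in enumerate(pixels):
            weights[i] += learning_rate * label * x      # a little "descent"

# Test on cards the perceptron has never seen before.
tests = [make_card() for _ in range(200)]
correct = sum(
    (1 if sum(w * x for w, x in zip(weights, p)) > 0 else -1) == y
    for p, y in tests
)
print(f"accuracy on unseen cards: {correct / len(tests):.0%}")
```

After a few thousand of these tiny nudges, the toy network, like Rosenblatt’s machine, reliably tells left-side cards from right-side cards it has never encountered.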
This is the basic recipe for the field of machine learning—and the humble perceptron will be both an overestimation and an underestimation of what is to come.
“The Navy,” reports the New York Times, “revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”4
The New Yorker writes that the perceptron, “as its name implies, is capable of original thought.” “Indeed,” they write, “it strikes us as the first serious rival to the human brain ever devised.”
Says Rosenblatt to the New Yorker reporter, “Our success in developing the perceptron means that for the first time a non-biological object will achieve an organization of its external environment in a meaningful way. That’s a safe definition of what the perceptron can do. My colleague disapproves of all the loose talk one hears nowadays about mechanical brains. He prefers to call our machine a self-organizing system, but, between you and me, that’s precisely what any brain is.”5
That same year, New Scientist publishes an equally hopeful, and slightly more sober, article called “Machines Which Learn.”6 “When machines are required to perform complicated tasks it would often be useful to incorporate devices whose precise mode of operation is not specified initially,” they write, “but which learn from experience how to do what is required. It would then be possible to produce machines to do jobs which have not been fully analysed because of their complexity. It seems likely that learning machines will play a part in such projects as the mechanical translation of languages and the automatic recognition of speech and of visual patterns.”
“The use of the term ‘learning machine’ invites comparison with the learning of people and animals,” the article continues. “The drawing of analogies between brains and machines requires caution to say the least, but in a general way it is stimulating for workers in either field to know something of what is happening in the other, and it is possible that speculation about machines which learn may eventually produce a system which is a true analogue of some form of biological learning.”
The history of artificial intelligence is famously one of cycles of alternating hope and gloom, and the Jetsonian future that the perceptron seemed to herald is slow to arrive.
Rosenblatt, with a few years of hindsight, will wish the press had used a bit more caution in their reactions to his invention. The popular press “fell to the task with all of the exuberance and sense of discretion of a pack of happy bloodhounds,” he says—while admitting, on his own behalf, a certain “lack of mathematical rigor in preliminary reports.”7
Minsky, despite his “tremendous admiration” for Rosenblatt and his machine, begins “to worry about what such a machine could not do.” In 1969, he and his MIT colleague Seymour Papert publish a book called Perceptrons that effectively slams the door shut on the entire vein of research. Minsky and Papert show, with the stiff formality of mathematical proof, that there are seemingly basic patterns that Rosenblatt’s model simply will never be able to recognize. For instance, it is impossible to train one of Rosenblatt’s machines to recognize when a card has an odd versus an even number of squares on it. The only way to recognize more complex categories like this is to use a network with multiple layers, with earlier layers creating a representation of the raw data, and the later layers operating on the representation. But no one knows how to tune the parameters of the early layers to make representations useful for the later ones. The field hits the equivalent of a brick wall. “There had been several thousand papers published on perceptrons up to 1969,” says Minsky.
“Our book put a stop to those.”8
It is as if a dark cloud has settled over the field, and everything falls apart: the research, the money, the people. Pitts, McCulloch, and Lettvin, who have all three moved to MIT, are sharply exiled after a misunderstanding with MIT’s Norbert Wiener, who had been like a second father figure to Pitts and now won’t speak to him. Pitts, alcoholic and depressed, throws all of his notes and papers into a fire, including an unpublished dissertation about three-dimensional neural networks that MIT tries desperately to salvage. Pitts dies from cirrhosis in May 1969, at the age of 46.9 A few months later Warren McCulloch, at the age of 70, succumbs to a heart seizure after a long series of cardiopulmonary problems. In 1971, while celebrating his 43rd birthday, Frank Rosenblatt drowns in a sailing accident on the Chesapeake Bay.
By 1973, both the US and British governments have pulled their funding support for neural network research, and when a young English psychology student named Geoffrey Hinton declares that he wants to do his doctoral work on neural networks, again and again he is met with the same reply: “Minsky and Papert,” he is told, “have proved that these models were no good.”10
It is 2012 in Toronto, and Alex Krizhevsky’s bedroom is too hot to sleep. His computer, attached to twin Nvidia GTX 580 GPUs, has been running day and night at its maximum thermal load, its fans pushing out hot exhaust, for two weeks.
“It was very hot,” he says. “And it was loud.”11
He is teaching the machine how to see.
Geoffrey Hinton, Krizhevsky’s mentor, is now 64 years old and has not given up. There is reason for hope.
By the 1980s it became understood that networks with multiple layers (so-called “deep” neural networks) could, in fact, be trained from examples just as a shallow one could.12 “I now believe,” admitted Minsky, “that the book was overkill.”13
By the late ’80s and early ’90s, a former postdoc of Hinton’s named Yann LeCun, working at Bell Labs, had trained neural networks to identify handwritten numerals from 0 to 9, and neural networks found their first major commercial use: reading zip codes in post offices, and deposit checks in ATMs.14 By the 1990s, LeCun’s networks were processing 10 to 20% of all checks in the United States.15
But the field hit another plateau, and by the 2000s, researchers were still largely stuck fiddling with databases of handwritten zip codes. It was understood that, in principle, a big-enough neural network, with enough training examples and time, could learn almost anything.16 But no one had fast-enough computers, enough data to train on, or enough patience to make good on that theoretical potential. Many lost interest, and the field of computer vision, along with computational linguistics, largely moved on to other things. As Hinton would later summarize, “Our labeled datasets were thousands of times too small. [And] our computers were millions of times too slow.”17 Both of these things, however, would change.
With the growth of the web, if you wanted not fifty but five hundred thousand “flash cards” for your network, suddenly you had a seemingly bottomless repository of images. There was only one problem, which was that they usually didn’t come with their category label readily attached. You couldn’t train a network unless you knew what the network’s output was supposed to be.
In 2005, Amazon launched its “Mechanical Turk” service, allowing for the recruiting of human labor on a large scale, making it possible to hire thousands of people to perform simple actions for pennies a click. (The service was particularly well suited to the kinds of things that future AI is thought to be able to do—hence its tagline: artificial artificial intelligence.) In 2007, Princeton professor Fei-Fei Li used Amazon Mechanical Turk to recruit human labor, at a scale previously unimaginable, to build a dataset that was previously impossible. It took more than two years to build, and had three million images, each labeled, by human hands, into more than five thousand categories. Li called it ImageNet, and released it in 2009. The field of computer vision suddenly had a mountain of new data to learn from, and a new grand challenge. Beginning in 2010, teams from around the world began competing to build a system that can reliably look at an image—dust mite, container ship, motor scooter, leopard—and say what it is.
Meanwhile, the relatively steady progress of Moore’s law throughout the 2000s meant that computers could do in minutes what the computers of the 1980s took days to do. One further development, however, turned out to be crucial. In the 1990s, the video-game industry began to produce dedicated graphics processors called GPUs, designed to render complex 3D scenes in real time; instead of executing instructions with perfect precision one after another, as a traditional CPU does, they are capable of doing a great many simple and sometimes approximate calculations at once.18 Only later, in the mid-2000s, did it come to be appreciated that the GPU could do a lot more than light and texture and shadow.19 It turned out that this hardware, designed for computer gaming, was in fact tailor-made for training neural networks.
At the University of Toronto, Alex Krizhevsky had taken a class on writing code for GPUs, and decided to try it on neural networks. He applied himself to a popular image-recognition benchmark called CIFAR-10, which contained thumbnail-sized images that each belonged to one of ten categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, or truck. Krizhevsky built a network and began using a GPU to train it to categorize CIFAR-10 images. Shockingly, he was able to train his network from a random starting configuration all the way to state-of-the-art accuracy. In eighty seconds.20
It is at this point Krizhevsky’s labmate, Ilya Sutskever, takes notice and offers him what will become a kind of siren song. “I bet,” Sutskever says, “you can make it work on ImageNet.”
They build an enormous neural network: 650,000 artificial neurons, arranged into 8 layers, connected by 60 million adjustable weights. In his bedroom at his parents’ house, Krizhevsky starts showing it pictures.
Step by step, piece by piece, the system gets a few percent more accurate.
The dataset—as big as it is, a few million pictures—isn’t enough. But Krizhevsky realizes he can fake it. He starts doing “data augmentation,” feeding the network mirror images of the data. That seems to help. He feeds it images that are cropped slightly, or tinted slightly. (A cat, after all, still looks like a cat when you lean forward or to the side, or go from natural to artificial light.) This seems to help.
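In today’s terms, this kind of data augmentation takes only a few lines of code. The sketch below uses the torchvision library to generate mirrored, slightly cropped, and slightly tinted variants of each training image on the fly; it is an approximation of the idea, not Krizhevsky’s actual pipeline, and the particular parameter values and file path are assumptions.

```python
# A sketch of data augmentation: each time an image is drawn for training,
# it is randomly mirrored, cropped, and tinted, so the same photograph
# yields a slightly different example every epoch.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),               # mirror images
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # slight crops
    transforms.ColorJitter(brightness=0.2, hue=0.05),     # slight tints
    transforms.ToTensor(),
])

photo = Image.open("cat.jpg")   # placeholder path
variant = augment(photo)        # a new 3 x 224 x 224 tensor each call
```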
He plays with different architectures—this number of layers, that number of layers—groping more or less blindly for what configuration might just happen to work best.
Krizhevsky occasionally loses the faith. Sutskever never does. Time and again he spurs Krizhevsky on. You can make it work.
“Ilya was like a religious figure,” he says. “It’s always good to have a religious figure.”
Trying out a new version of the model, and training it until the accuracy maxed out, takes about two weeks, running twenty-four hours a day—which means that the project, though at some level frantic, also has a lot of downtime. Krizhevsky thinks. And tinkers. And waits. Hinton has come up with an idea called “dropout,” where during training certain portions of the network get randomly turned off. Krizhevsky tries this, and it seems, for various reasons, to help. He tries using neurons with a so-called “rectified linear” output function. This, too, seems to help.
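For readers who want to see those two ingredients in code, here is a minimal PyTorch sketch of a small network that uses rectified linear units and dropout. It illustrates the techniques only; the layer sizes are arbitrary, and this is not the AlexNet architecture itself.

```python
# A minimal sketch of ReLU and dropout in a tiny network (not AlexNet).
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(4096, 1024),
    nn.ReLU(),            # the "rectified linear" output function
    nn.Dropout(p=0.5),    # during training, randomly switch off half the units
    nn.Linear(1024, 10),
)

model.train()                 # dropout is active while training
x = torch.randn(8, 4096)      # a batch of 8 made-up inputs
print(model(x).shape)         # torch.Size([8, 10])

model.eval()                  # at test time, dropout is switched off automatically
```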
He submits his best model on the ImageNet competition deadline, September 30, and then the final wait begins.
Two days later, Krizhevsky gets an email from Stanford’s Jia Deng, who is organizing that year’s competition, cc’d to all of the entrants. In plain, unemotional language, Deng says to click the link provided to see the results.
Krizhevsky clicks the link provided and sees the results.
Not only has his team won, but they have obliterated the rest of the entire field. The neural network trained in his bedroom—its official name is “SuperVision,” but history will remember it simply as “AlexNet”—made half as many errors as the model that came in second.
By the Friday of the conference, when it is time for the ImageNet Large Scale Visual Recognition Challenge workshop, the word has spread. Krizhevsky has been given the final talk slot of the day, and at 5:05 p.m. he takes his place up at the presenter’s lectern. He looks around the room. In the front row is Fei-Fei Li; to the side is Yann LeCun. There is a majority, it seems, of the leading computer vision researchers in the world. The room is over capacity, with people standing along the aisles and walls.
“I was nervous,” he says. “I was not comfortable.”
And then, in front of the standing-room audience, not comfortable, Alex Krizhevsky tells them everything.
When Frank Rosenblatt was interviewed about his perceptron in 1958, he was asked what practical or commercial uses a machine like the perceptron might have. “At the moment, none whatever,” he replied cheerfully.21
“In these matters, you know, use follows invention.”
On Sunday evening, June 28, 2015, web developer Jacky Alciné was at home watching the BET Awards when he got a notification that a friend had shared a picture with him through Google Photos. When he opened Google Photos, he noticed the site had been redesigned. “I was like, ‘Oh, the UI’s changed!’ I remembered I/O [Google’s annual software developer conference] happened, but I was curious; I clicked through.”22 Google’s image recognition software had automatically identified groups of photos, and gave each a thematic caption. “Graduation,” said one—and Alciné was impressed that the system had managed to identify the mortarboard and tassel on his younger brother’s head. Another caption stopped him cold. The album cover was a selfie of Alciné and a friend of his. Alciné is Haitian-American; both he and his friend are Black.
“Gorillas,” it said.
“So I thought—To be honest, I thought that I did something.” He opened the album, expecting he had somehow misclicked or mistagged something. The album was full of dozens of photos of Alciné and his friend. And nothing else. “I’m like—This was seventy-plus photos. There’s no way. . . . That’s actually where I really realized what happened.”
Alciné took to Twitter. “Google Photos,” he wrote, “y’all fucked up. My friend’s not a gorilla.”23
Within two hours, Google+ chief architect Yonatan Zunger reached out. “Holy fuck,” he wrote. “This is 100% Not OK.”
Zunger’s team deployed a change to Google Photos within another few hours, and by the following morning, only two photos were still mislabeled. Then Google took a more drastic step: they removed the label entirely.
In fact, three years later, in 2018, Wired reported that the label “gorilla” was still manually deactivated on Google Photos. That means that, years later, nothing will be tagged as a gorilla, including gorillas.24
Curiously, the press in 2018, just as in 2015, appeared to repeatedly mischaracterize the nature of the mistake. Headlines proclaimed, “Two Years Later, Google Solves ‘Racist Algorithm’ Problem by Purging ‘Gorilla’ Label from Image Classifier”; “Google ‘Fixed’ Its Racist Algorithm by Removing Gorillas from Its Image-Labeling Tech”; and “Google Images ‘Racist Algorithm’ Has a Fix But It’s Not a Great One.”25
Being himself a programmer and familiar with machine-learning systems, Alciné knew the issue wasn’t a biased algorithm. (The algorithm was stochastic gradient descent, just about the most generic, vanilla, all-purpose idea in computer science: go through your training data at random, tune your model’s parameters to assign slightly higher probability to the correct category for that image, and repeat as needed.) No, what he immediately sensed was that something had gone terribly awry in the training data itself. “I couldn’t even blame the algorithm,” he says. “It’s not even the algorithm at fault. It did exactly what it was designed to do.”
The problem, of course, with a system that can, in theory, learn just about anything from a set of examples is that it finds itself, then, at the mercy of the examples from which it’s taught.
The extent to which we take everyday objects for granted is the precise extent to which they govern and inform our lives.
— MARGARET VISSER26
The single most photographed American of the nineteenth century—more than Abraham Lincoln or Ulysses S. Grant—was Frederick Douglass, the abolitionist author and lecturer who had himself escaped from slavery at the age of twenty.27 This was no accident; for Douglass, the photograph was just as important as the essay or the speech. The photograph was just coming into its own through the daguerreotype in the 1840s, and Douglass immediately understood its power.
Before the photograph, representations of Black Americans were limited to drawings, paintings, and engravings. “Negroes can never have impartial portraits at the hands of white artists,” Douglass wrote. “It seems to us next to impossible for white men to take likenesses of black men, without most grossly exaggerating their distinctive features.”28 One exaggeration, in particular, prevailed during Douglass’s time. “We colored men so often see ourselves described and painted as monkeys, that we think it a great piece of good fortune to find an exception to this general rule.”29
The photograph not only countered such caricatures but, further, made possible a kind of transcending empathy and recognition. “Whatever may be the prejudices of those who may look upon it,” said Douglass of a photograph of the first Black US senator, Hiram Revels, “they will be compelled to admit that the Mississippi Senator is a man.”30
But all was not entirely well. As photography became more standardized and mass-produced in the twentieth century, some began to feel that the field of photography was itself worthy of critique. As W.E.B. Du Bois wrote in 1923, “Why do not more young colored men and women take up photography as a career? The average white photographer does not know how to deal with colored skins and having neither sense of the delicate beauty or tone nor will to learn, he makes a horrible botch of portraying them.”
We often hear about the lack of diversity in film and television—among casts and directors alike—but we don’t often consider that this problem exists not only in front of the camera, not only behind the camera, but in many cases inside the camera itself. As Concordia University communications professor Lorna Roth notes, “Though the available academic literature is wide-ranging, it is surprising that relatively few of these scholars have focused their research on the skin-tone biases within the actual apparatuses of visual reproduction.”31
For decades, she writes, film manufacturers and film developers used a test picture as a color-balance benchmark. This test picture became known as the “Shirley card,” named after Shirley Page, a Kodak employee and the first model to pose for it.32 It perhaps goes without saying that Shirley and her successors were overwhelmingly White. The chemical processing of film was tuned accordingly, and as a result cameras simply didn’t take good photos of Black people.
(In video just as in photography, colors have for decades been calibrated to White skin. In the 1990s, Roth interviewed one of the camera operators on Saturday Night Live about the process of tuning the cameras before broadcast. He explained, “A good VCR person will have a color girl stand in front of the cameras and stay there while the technicians focus on her flesh tones to do their fine adjustments to balance the cameras. This color girl is always white.”)33
Amazingly, Kodak executives in the 1960s and ’70s described the major impetus for making film that was sensitive to a wider range of darker tones as having come not from the civil rights movement but from the furniture and chocolate industries, which complained that film wasn’t properly showing the grains of darker woods, or the difference between milk and dark chocolate.34
Former manager of Kodak Research Studios Earl Kage reflects on this period of research: “My little department became quite fat with chocolate, because what was in the front of the camera was consumed at the end of the shoot.” Asked about the fact that this was all happening against the backdrop of the civil rights movement, he adds, “It is fascinating that this has never been said before, because it was never Black flesh that was addressed as a serious problem that I knew of at the time.”35
In time Kodak began using models of more diverse skin tones. “I started incorporating black models pretty heavily in our testing, and it caught on very quickly,” recalls Kodak’s Jim Lyon. “I wasn’t attempting to be politically correct. I was just trying to give us a chance of making a better film, one that reproduced everybody’s skin tone in an appropriate way.”
By the 1990s, the official Kodak Shirley card now had three different models on it, of different races. Their Gold Max film—initially marketed with the claim that it could photograph a “dark horse in low light”—now featured in television commercials with diverse families. One depicts a Black boy in a bright white karate gi, smiling as he performs a kata and presumably receives his next belt. It says, “Parents, would you trust this moment to anything other than Kodak Gold film?”
Their original target audience had given them a problematic calibration measure. Now a new calibration measure had given them a new audience.
All machine-learning systems, from the perceptron onward, have a kind of Shirley card at their heart: namely, the set of data on which they were trained. If a certain type of data is underrepresented or absent from the training data but present in the real world, then all bets are off.36
As UC Berkeley’s Moritz Hardt argues, “The whole spiel about big data is that we can build better classifiers largely as a result of having more data. The contrapositive is that less data leads to worse predictions. Unfortunately, it’s true by definition that there is always proportionately less data available about minorities. This means that our models about minorities generally tend to be worse than those about the general population.”37
Alciné’s frustrated tweets the night of the incident echo exactly this sentiment. He’s a software engineer. He instantly diagnoses what has gone wrong. Google Photos, he infers, just didn’t have nearly as many pictures of Black people in it as pictures of White people. And so the model, seeing anything unfamiliar, was much more prone to error.
“Again, I can completely understand how that happens,” Alciné tells me.38 “Like if you take a picture of an apple, but only red apples, when it sees a green apple it might think it’s a pear. . . . Little things like that. That I understand. But then, you’re the world’s—Your mission is to index the entire world’s social knowledge, so how did you, like, just skip over an entire continent of people?”
The problems of the twentieth century appear to be repeating themselves uncannily in the twenty-first. Fortunately, it seems that some of the solutions are, too. All it would take was someone willing to question exactly who and what were represented in these twenty-first-century “Shirley cards,” anyway—and what a better one might look like.
When Joy Buolamwini was a computer science undergrad at Georgia Tech in the early 2010s, she was given an assignment to program a robot to play peekaboo. The programming part was easy, but there was one issue: the robot wouldn’t recognize Buolamwini’s face. “I borrowed my roommate’s face to get the project done, submitted the assignment, and figured, ‘You know what, somebody else will solve this problem.’ ”39
Later in her undergraduate studies, she traveled to Hong Kong for an entrepreneurship competition. A local startup was giving a demo of one of its “social robots.” The demo worked on everyone in the tour group . . . except for Buolamwini. As it happened, the startup was using the very same off-the-shelf, open-source face-recognition code that she herself had used back at Georgia Tech.
In one of the first articles explicitly addressing the notion of bias in computing systems, the University of Washington’s Batya Friedman and Cornell’s Helen Nissenbaum had warned that “computer systems, for instance, are comparatively inexpensive to disseminate, and thus, once developed, a biased system has the potential for widespread impact. If the system becomes a standard in the field, the bias becomes pervasive.”40
Or, as Buolamwini herself puts it, “Halfway around the world, I learned that algorithmic bias can travel as quickly as it takes to download some files off of the internet.”41
After a Rhodes Scholarship at Oxford, Buolamwini came to the MIT Media Lab, and there she began working on an augmented-reality project she dubbed the “Aspire Mirror.” The idea was to project empowering or uplifting visuals onto the user’s face—making the onlooker transform into a lion, for instance. Again, there was only one problem. The Aspire Mirror only worked on Buolamwini herself when she put on a white mask.
The culprit is not stochastic gradient descent; it is, clearly, the sets of images on which these systems are trained. Every face-detection or face-recognition system has, behind it and implicitly within it, a set of images—typically tens or hundreds of thousands—on which the system was originally trained and developed. This training data, the Shirley cards of the twenty-first century, is often invisible, or taken for granted, or absent entirely: a pretrained model disseminated online almost never comes with its training data included. But it is very much present, and will permanently shape the behavior of a deployed system.
A major movement in rooting out bias, then, is trying to better expose, and better understand, the training datasets behind major academic and commercial machine-learning systems.
One of the more popular public-domain databases of pictures of faces, for instance, is what’s known as the Labeled Faces in the Wild (LFW) dataset, painstakingly assembled in 2007 from online news articles and image captions by a team from UMass Amherst, and used by innumerable researchers thereafter.42 The composition of this database was not deeply studied, however, until many years later. In 2014, Michigan State’s Hu Han and Anil Jain analyzed the dataset and determined it was more than 77% male, and more than 83% White.43 The most common individual in the dataset is the person who appeared most often in online news photos in 2007: then-president George W. Bush, with 530 unique images. In fact, there are more than twice as many images of George W. Bush in the LFW dataset as there are of all Black women, combined.44
The original 2007 paper describing the database noted that a set of images gathered from online news articles “clearly has its own biases,” but these “biases” are considered from a technical, rather than social, standpoint: “For example, there are not many images which occur under extreme lighting conditions, or very low lighting conditions.” Such lighting issues aside, the authors write, “the range and diversity of pictures present is very large.”
Twelve years later, however, in the fall of 2019, a disclaimer suddenly appeared on the webpage of the Labeled Faces in the Wild dataset that takes a different view. It notes, “Many groups are not well represented in LFW. For example, there are very few children, no babies, very few people over the age of 80, and a relatively small proportion of women. In addition, many ethnicities have very minor representation or none at all.”45
In recent years, greater attention has been paid to the makeup of these training sets, though much remains to be done. In 2015, the United States Office of the Director of National Intelligence and the Intelligence Advanced Research Projects Activity released a face image dataset called IJB-A, boasting, they claimed, “wider geographic variation of subjects.”46 With Microsoft’s Timnit Gebru, Buolamwini did an analysis of the IJB-A and found that it was more than 75% male, and almost 80% light-skinned. Just 4.4% of the dataset were dark-skinned females.47
Eventually it became clear to Buolamwini that the “somebody else [who] will solve this problem” was—of course—her. She started a broad investigation into the current state of face-detection systems, which became her MIT thesis. She and Gebru set out first to build a dataset with a more balanced representation of both gender and skin tone. But where would they get their images from? Previous datasets, drawing from online news, for instance, were totally imbalanced. They decided on parliaments
