Description

Nobody expected this—not even its creators: ChatGPT has burst onto the scene as an AI capable of writing at a convincingly human level. But how does it really work? What's going on inside its "AI mind"? In this short book, prominent scientist and computation pioneer Stephen Wolfram provides a readable and engaging explanation that draws on his decades-long unique experience at the frontiers of science and technology. Find out how the success of ChatGPT brings together the latest neural net technology with foundational questions about language and human thought posed by Aristotle more than two thousand years ago.





What Is ChatGPT Doing ... and Why Does It Work?

Copyright © 2023 Stephen Wolfram, LLC

Wolfram Media, Inc. | wolfram-media.com

ISBN 978-1-57955-081-3 (paperback)
ISBN 978-1-57955-082-0 (ebook)

Technology/Computers

Library of Congress Cataloging-in-Publication Data:

Names: Wolfram, Stephen, 1959- author.
Title: What is ChatGPT doing ... and why does it work? / Stephen Wolfram.
Other titles: ChatGPT
Description: First edition. | [Champaign, Illinois] : Wolfram Media, Inc., [2023] | Includes bibliographical references.
Identifiers: LCCN 2023009927 (print) | LCCN 2023009928 (ebook) | ISBN 9781579550813 (paperback) | ISBN 9781579550820 (ebook)
Subjects: LCSH: Natural language generation (Computer science)—Computer programs. | Neural networks (Computer science) | ChatGPT. | Wolfram language (Computer program language)
Classification: LCC QA76.9.N38 W65 2023 (print) | LCC QA76.9.N38 (ebook) | DDC 006.3/5—dc23/eng/20230310
LC record available at https://lccn.loc.gov/2023009927
LC ebook record available at https://lccn.loc.gov/2023009928

For permission to reproduce images, contact [email protected].

Visit the online version of this text at wolfr.am/SW-ChatGPT and wolfr.am/ChatGPT-WA. Click any picture to copy the code behind it.

ChatGPT screenshots were generated with GPT-3, OpenAI’s AI system that produces natural language.

First edition.

Contents

Preface

What Is ChatGPT Doing ... and Why Does It Work?

It’s Just Adding One Word at a Time · Where Do the Probabilities Come From? · What Is a Model? · Models for Human-Like Tasks · Neural Nets · Machine Learning, and the Training of Neural Nets · The Practice and Lore of Neural Net Training · “Surely a Network That’s Big Enough Can Do Anything!” · The Concept of Embeddings · Inside ChatGPT · The Training of ChatGPT · Beyond Basic Training · What Really Lets ChatGPT Work? · Meaning Space and Semantic Laws of Motion · Semantic Grammar and the Power of Computational Language · So ... What Is ChatGPT Doing, and Why Does It Work? · Thanks

Wolfram|Alpha as the Way to Bring Computational Knowledge Superpowers to ChatGPT

ChatGPT and Wolfram|Alpha · A Basic Example · A Few More Examples · The Path Forward

Additional Resources

Preface

This short book is an attempt to explain from first principles how and why ChatGPT works. In some ways it’s a story about technology. But it’s also a story about science. As well as about philosophy. And to tell the story, we’ll have to bring together a remarkable range of ideas and discoveries made across many centuries.

For me it’s exciting to see so many things I’ve so long been interested in come together in a burst of sudden progress. From the complex behavior of simple programs to the core character of language and meaning, and the practicalities of large computer systems—all of these are part of the ChatGPT story.

ChatGPT is based on the concept of neural nets—originally invented in the 1940s as an idealization of the operation of brains. I myself first programmed a neural net in 1983—and it didn’t do anything interesting. But 40 years later, with computers that are effectively a million times faster, with billions of pages of text on the web, and after a whole series of engineering innovations, the situation is quite different. And—to everyone’s surprise—a neural net that is a billion times larger than the one I had in 1983 is capable of doing what was thought to be that uniquely human thing of generating meaningful human language.

This book consists of two pieces that I wrote soon after ChatGPT debuted. The first is an explanation of ChatGPT and its ability to do the very human thing of generating language. The second looks forward to ChatGPT being able to use computational tools to go beyond what humans can do, and in particular being able to leverage the computational knowledge “superpowers” of our Wolfram|Alpha system.

It’s only been three months since ChatGPT launched, and we are just beginning to understand its implications, both practical and intellectual. But for now its arrival is a reminder that even after everything that has been invented and discovered, surprises are still possible.

Stephen Wolfram
February 28, 2023

What Is ChatGPT Doing ... and Why Does It Work?

(February 14, 2023)

It’s Just Adding One Word at a Time

That ChatGPT can automatically generate something that reads even superficially like human-written text is remarkable, and unexpected. But how does it do it? And why does it work? My purpose here is to give a rough outline of what’s going on inside ChatGPT—and then to explore why it is that it can do so well in producing what we might consider to be meaningful text. I should say at the outset that I’m going to focus on the big picture of what’s going on—and while I’ll mention some engineering details, I won’t get deeply into them. (And the essence of what I’ll say applies just as well to other current “large language models” [LLMs] as to ChatGPT.)

The first thing to explain is that what ChatGPT is always fundamentally trying to do is to produce a “reasonable continuation” of whatever text it’s got so far, where by “reasonable” we mean “what one might expect someone to write after seeing what people have written on billions of webpages, etc.”

So let’s say we’ve got the text “The best thing about AI is its ability to”. Imagine scanning billions of pages of human-written text (say on the web and in digitized books) and finding all instances of this text—then seeing what word comes next what fraction of the time. ChatGPT effectively does something like this, except that (as I’ll explain) it doesn’t look at literal text; it looks for things that in a certain sense “match in meaning”. But the end result is that it produces a ranked list of words that might follow, together with “probabilities”:

And the remarkable thing is that when ChatGPT does something like write an essay, what it's essentially doing is just asking over and over again "given the text so far, what should the next word be?"—and each time adding a word. (More precisely, as I'll explain, it's adding a "token", which could be just a part of a word, which is why it can sometimes "make up new words".)

But, OK, at each step it gets a list of words with probabilities. But which one should it actually pick to add to the essay (or whatever) that it’s writing? One might think it should be the “highest-ranked” word (i.e. the one to which the highest “probability” was assigned). But this is where a bit of voodoo begins to creep in. Because for some reason—that maybe one day we’ll have a scientific-style understanding of—if we always pick the highest-ranked word, we’ll typically get a very “flat” essay, that never seems to “show any creativity” (and even sometimes repeats word for word). But if sometimes (at random) we pick lower-ranked words, we get a “more interesting” essay.

The fact that there’s randomness here means that if we use the same prompt multiple times, we’re likely to get different essays each time. And, in keeping with the idea of voodoo, there’s a particular so-called “temperature” parameter that determines how often lower-ranked words will be used, and for essay generation, it turns out that a “temperature” of 0.8 seems best. (It’s worth emphasizing that there’s no “theory” being used here; it’s just a matter of what’s been found to work in practice. And for example the concept of “temperature” is there because exponential distributions familiar from statistical physics happen to be being used, but there’s no “physical” connection—at least so far as we know.)
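To make the effect of this parameter concrete, here is a minimal sketch (not ChatGPT's actual sampler) of one common convention: each probability is raised to the power 1/T and the list is renormalized, so that T < 1 sharpens the distribution toward the top-ranked word and T > 1 flattens it. The word list and probabilities here are made up purely for illustration:

  (* reweight word probabilities by a "temperature" t and sample one word;
     this follows the usual softmax-with-temperature convention, assumed here *)
  temperatureSample[probs_Association, t_] :=
    RandomChoice[Normalize[Values[probs]^(1/t), Total] -> Keys[probs]]

  (* toy example with made-up probabilities *)
  temperatureSample[<|"learn" -> 0.5, "predict" -> 0.3, "make" -> 0.2|>, 0.8]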

Before we go on I should explain that for purposes of exposition I’m mostly not going to use the full system that’s in ChatGPT; instead I’ll usually work with a simpler GPT-2 system, which has the nice feature that it’s small enough to be able to run on a standard desktop computer. And so for essentially everything I show I’ll be able to include explicit Wolfram Language code that you can immediately run on your computer.

For example, here’s how to get the table of probabilities above. First, we have to retrieve the underlying “language model” neural net:
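In Wolfram Language this step looks roughly like the following; the exact resource name in the Wolfram Neural Net Repository and the "Task" option are assumptions here and may differ slightly from the code in the original:

  (* fetch a GPT-2 network set up for next-token ("language modeling") prediction *)
  model = NetModel[{"GPT-2 Transformer Trained on WebText Data",
     "Task" -> "LanguageModeling"}]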

Later on, we’ll look inside this neural net, and talk about how it works. But for now we can just apply this “net model” as a black box to our text so far, and ask for the top 5 words by probability that the model says should follow:
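Assuming the net's output uses the standard class decoder, the top-5 query can be sketched like this (the "TopProbabilities" property is the usual decoder mechanism, assumed here to apply to this model):

  (* the 5 most probable next tokens, with their probabilities *)
  topWords = model["The best thing about AI is its ability to",
     {"TopProbabilities", 5}]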

This takes that result and makes it into an explicit formatted “dataset”:
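Continuing from the topWords result above, a sketch of that formatting step:

  (* arrange the token -> probability rules into a formatted dataset,
     sorted with the most probable token first *)
  Dataset[ReverseSort[Association[topWords]]]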

Here’s what happens if one repeatedly “applies the model”—at each step adding the word that has the top probability (specified in this code as the “decision” from the model):
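A sketch of that loop, assuming the decoder's "Decision" property returns the single most probable token and that the decoded tokens carry their own leading spaces (if they don't, a separator would need to be inserted):

  (* repeatedly append the model's top-probability ("decision") token *)
  NestList[StringJoin[#, model[#, "Decision"]] &,
    "The best thing about AI is its ability to", 7]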

What happens if one goes on longer? In this (“zero temperature”) case what comes out soon gets rather confused and repetitive:

But what if instead of always picking the “top” word one sometimes randomly picks “non-top” words (with the “randomness” corresponding to “temperature” 0.8)? Again one can build up text:
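A corresponding sketch, assuming the decoder supports a "RandomSample" property with a "Temperature" option:

  (* repeatedly append a token sampled at "temperature" 0.8 *)
  NestList[StringJoin[#, model[#, {"RandomSample", "Temperature" -> 0.8}]] &,
    "The best thing about AI is its ability to", 7]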

And every time one does this, different random choices will be made, and the text will be different—as in these 5 examples:

It's worth pointing out that even at the first step there are a lot of possible "next words" to choose from (at temperature 0.8), though their probabilities fall off quite quickly (and, yes, the straight line on this log-log plot corresponds to an n^(-1) "power-law" decay that's very characteristic of the general statistics of language):
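A sketch of how such a plot can be made, assuming the decoder's "Probabilities" property returns the full association of token probabilities:

  (* rank-ordered next-token probabilities on log-log axes *)
  probs = model["The best thing about AI is its ability to", "Probabilities"];
  ListLogLogPlot[Reverse[Sort[Values[probs]]]]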

So what happens if one goes on longer? Here’s a random example. It’s better than the top-word (zero temperature) case, but still at best a bit weird:

This was done with the simplest GPT-2 model (from 2019). With the newer and bigger GPT-3 models the results are better. Here’s the top-word (zero temperature) text produced with the same “prompt”, but with the biggest GPT-3 model:

And here’s a random example at “temperature 0.8”:

Where Do the Probabilities Come From?

OK, so ChatGPT always picks its next word based on probabilities. But where do those probabilities come from? Let’s start with a simpler problem. Let’s consider generating English text one letter (rather than word) at a time. How can we work out what the probability for each letter should be?

A very minimal thing we could do is just take a sample of English text, and calculate how often different letters occur in it. So, for example, this counts letters in the Wikipedia article on “cats”:

And this does the same thing for “dogs”:
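The counting itself is a one-liner in Wolfram Language (fetching the article text requires network access):

  (* letter counts for the Wikipedia articles on "cats" and "dogs" *)
  LetterCounts[WikipediaData["cats"]]
  LetterCounts[WikipediaData["dogs"]]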

The results are similar, but not the same (“o” is no doubt more common in the “dogs” article because, after all, it occurs in the word “dog” itself). Still, if we take a large enough sample of English text we can expect to eventually get at least fairly consistent results:

Here’s a sample of what we get if we just generate a sequence of letters with these probabilities:

We can break this into “words” by adding in spaces as if they were letters with a certain probability:
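A sketch of both steps, with the space treated as just another "letter"; its probability here (0.18, with the letter probabilities scaled down accordingly) is an assumed typical value for English text, not a number taken from the original:

  counts = LetterCounts[ToLowerCase[WikipediaData["cats"]]];
  probs = N[counts/Total[counts]];

  (* letters sampled independently with their observed frequencies *)
  StringJoin[RandomChoice[Values[probs] -> Keys[probs], 100]]

  (* the same, but with a space included as an extra "letter" *)
  withSpace = Append[0.82*probs, " " -> 0.18];
  StringJoin[RandomChoice[Values[withSpace] -> Keys[withSpace], 200]]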

We can do a slightly better job of making “words” by forcing the distribution of “word lengths” to agree with what it is in English:
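Continuing with probs from above, one way to impose a word-length distribution is simply to sample the lengths of words seen in the same sample text (an approximation to the distribution for "English" as a whole):

  (* empirical word lengths from the sample text *)
  wordLengths = StringLength /@ TextWords[WikipediaData["cats"]];

  (* build each "word" from independently sampled letters, with a sampled length *)
  randomWord[] := StringJoin[
     RandomChoice[Values[probs] -> Keys[probs], RandomChoice[wordLengths]]];
  StringRiffle[Table[randomWord[], 20]]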

We didn’t happen to get any “actual words” here, but the results are looking slightly better. To go further, though, we need to do more than just pick each letter separately at random. And, for example, we know that if we have a “q”, the next letter basically has to be “u”.

Here’s a plot of the probabilities for letters on their own:

And here’s a plot that shows the probabilities of pairs of letters (“2-grams”) in typical English text. The possible first letters are shown across the page, the second letters down the page:

And we see here, for example, that the “q” column is blank (zero probability) except on the “u” row. OK, so now instead of generating our “words” a single letter at a time, let’s generate them looking at two letters at a time, using these “2-gram” probabilities. Here’s a sample of the result—which happens to include a few “actual words”:
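A sketch of this, estimating the 2-gram frequencies from the same Wikipedia sample and then growing a "word" one letter at a time, conditioned on the previous letter:

  letters = Select[Characters[ToLowerCase[WikipediaData["cats"]]],
     MemberQ[Alphabet[], #] &];
  pairCounts = Counts[Partition[letters, 2, 1]];

  (* given a letter, sample the next one according to the observed pair frequencies *)
  nextLetter[c_] := With[{rules = Select[Normal[pairCounts], #[[1, 1]] === c &]},
     RandomChoice[Values[rules] -> (Last /@ Keys[rules])]];

  (* a 6-letter "word" grown from a random starting letter *)
  StringJoin[NestList[nextLetter, RandomChoice[letters], 5]]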

With sufficiently much English text we can get pretty good estimates not just for probabilities of single letters or pairs of letters (2-grams), but also for longer runs of letters. And if we generate “random words” with progressively longer n-gram probabilities, we see that they get progressively “more realistic”:

But let’s now assume—more or less as ChatGPT does—that we’re dealing with whole words, not letters. There are about 40,000 reasonably commonly used words in English. And by looking at a large corpus of English text (say a few million books, with altogether a few hundred billion words), we can get an estimate of how common each word is. And using this we can start generating “sentences”, in which each word is independently picked at random, with the same probability that it appears in the corpus. Here’s a sample of what we get:
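As a small-scale stand-in for that corpus (a few hundred billion words is far beyond what fits here), the same idea can be sketched with a single built-in sample text:

  (* word frequencies estimated from one book-length sample text *)
  words = TextWords[ToLowerCase[ExampleData[{"Text", "AliceInWonderland"}]]];
  wordCounts = Counts[words];

  (* a "sentence" of 20 words, each picked independently with its observed frequency *)
  StringRiffle[RandomChoice[Values[wordCounts] -> Keys[wordCounts], 20]]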

Not surprisingly, this is nonsense. So how can we do better? Just like with letters, we can start taking into account not just probabilities for single words but probabilities for pairs or longer n-grams of words. Doing this for pairs, here are 5 examples of what we get, in all cases starting from the word “cat”:

It’s getting slightly more “sensible looking”. And we might imagine that if we were able to use sufficiently long n-grams we’d basically “get a ChatGPT”—in the sense that we’d get something that would generate essay-length sequences of words with the “correct overall essay probabilities”. But here’s the problem: there just isn’t even close to enough English text that’s ever been written to be able to deduce those probabilities.

In a