Transformer, BERT, and GPT - Mercury Learning and Information - E-Book

Description

This book offers an in-depth exploration of the Transformer architecture, BERT models, and the GPT series, including GPT-3 and GPT-4. Beginning with foundational concepts like the attention mechanism and tokenization techniques, it delves into the intricacies of Transformer and BERT architectures. Advanced topics cover the latest developments in the GPT series, including ChatGPT. Key chapters provide insights into the evolution and significance of attention in deep learning, the nuances of Transformer architecture, a detailed exploration of the BERT family, and hands-on guidance on working with GPT-3.
The journey continues with a comprehensive overview of ChatGPT, GPT-4, and visualization using generative AI. The book also discusses influential AI organizations such as DeepMind, OpenAI, Cohere, and Hugging Face. Readers will gain a thorough understanding of the current landscape of NLP models, their underlying architectures, and practical applications.
Companion files with numerous code samples and figures from the book enhance the learning experience, providing practical tools and resources. This book is an essential guide for those seeking to master the latest advancements in natural language processing and generative AI.

Available formats: EPUB, MOBI

Page count: 460

Publication year: 2024




LICENSE, DISCLAIMER OF LIABILITY, AND LIMITED WARRANTY

By purchasing or using this book and companion files (the “Work”), you agree that this license grants permission to use the contents contained herein, including the disc, but does not give you the right of ownership to any of the textual content in the book / disc or ownership to any of the information or products contained in it. This license does not permit uploading of the Work onto the Internet or on a network (of any kind) without the written consent of the Publisher. Duplication or dissemination of any text, code, simulations, images, etc. contained herein is limited to and subject to licensing terms for the respective products, and permission must be obtained from the Publisher or the owner of the content, etc., in order to reproduce or network any portion of the textual material (in any media) that is contained in the Work.

MERCURY LEARNING AND INFORMATION (“MLI” or “the Publisher”) and anyone involved in the creation, writing, or production of the companion disc, accompanying algorithms, code, or computer programs (“the software”), and any accompanying Web site or software of the Work, cannot and do not warrant the performance or results that might be obtained by using the contents of the Work. The author, developers, and the Publisher have used their best efforts to ensure the accuracy and functionality of the textual material and/or programs contained in this package; we, however, make no warranty of any kind, express or implied, regarding the performance of these contents or programs. The Work is sold “as is” without warranty (except for defective materials used in manufacturing the book or due to faulty workmanship).

The author, developers, and the publisher of any accompanying content, and anyone involved in the composition, production, and manufacturing of this work will not be liable for damages of any kind arising out of the use of (or the inability to use) the algorithms, source code, computer programs, or textual material contained in this publication. This includes, but is not limited to, loss of revenue or profit, or other incidental, physical, or consequential damages arising out of the use of this Work.

The sole remedy in the event of a claim of any kind is expressly limited to replacement of the book and/or disc, and only at the discretion of the Publisher. The use of “implied warranty” and certain “exclusions” vary from state to state, and might not apply to the purchaser of this product.

Companion files for this title are available by writing to the publisher at [email protected].

Copyright ©2024 by MERCURY LEARNING AND INFORMATION. An Imprint of DeGruyter, Inc. All rights reserved.

This publication, portions of it, or any accompanying software may not be reproduced in any way, stored in a retrieval system of any type, or transmitted by any means, media, electronic display or mechanical display, including, but not limited to, photocopy, recording, Internet postings, or scanning, without prior permission in writing from the publisher.

Publisher: David Pallai

MERCURY LEARNING AND INFORMATION

121 High Street, 3rd Floor

Boston, MA 02110

[email protected]

www.merclearning.com

800-232-0223

O. Campesato. Transformer, BERT, and GPT: Including ChatGPT and Prompt Engineering.

ISBN 978-1-68392-898-0

The publisher recognizes and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products. All brand names and product names mentioned in this book are trademarks or service marks of their respective companies. Any omission or misuse (of any kind) of service marks or trademarks, etc. is not an attempt to infringe on the property of others.

Library of Congress Control Number: 2023945517

23 24 25 3 2 1    This book is printed on acid-free paper in the United States of America.

Our titles are available for adoption, license, or bulk purchase by institutions, corporations, etc. For additional information, please contact the Customer Service Dept. at 800-232-0223 (toll free).

All of our titles are available in digital format at academiccourseware.com and other digital vendors. Companion files (code listings) for this title are available by contacting [email protected]. The sole obligation of MERCURY LEARNING AND INFORMATION to the purchaser is to replace the disc, based on defective materials or faulty workmanship, but not based on the operation or functionality of the product.

I’d like to dedicate this book to my parents

– may this bring joy and happiness into their lives.

CONTENTS

Preface

Chapter 1 Introduction

What is Generative AI?

Conversational AI Versus Generative AI

Is DALL-E Part of Generative AI?

Are ChatGPT-3 and GPT-4 Part of Generative AI?

DeepMind

OpenAI

Cohere

Hugging Face

AI21

InflectionAI

Anthropic

What are LLMs?

What is AI Drift?

Machine Learning and Drift (Optional)

What is Attention?

Calculating Attention: A High-Level View

An Example of Self Attention

Multi-Head Attention (MHA)

Summary

Chapter 2 Tokenization

What is Pre-Tokenization?

What is Tokenization?

Word, Character, and Subword Tokenizers

Trade-Offs with Character-Based Tokenizers

Subword Tokenization

Subword Tokenization Algorithms

Hugging Face Tokenizers and Models

Hugging Face Tokenizers

Tokenization for the DistilBERT Model

Token Selection Techniques in LLMs

Summary

Chapter 3 Transformer Architecture Introduction

Sequence-to-Sequence Models

Examples of seq2seq Models

What About RNNs and LSTMs?

Encoder/Decoder Models

Examples of Encoder/Decoder Models

Autoregressive Models

Autoencoding Models

The Transformer Architecture: Introduction

The Transformer is an Encoder/Decoder Model

The Transformer Flow and Its Variants

The transformers Library from Hugging Face

Transformer Architecture Complexity

Hugging Face Transformer Code Samples

Transformer and Mask-Related Tasks

Summary

Chapter 4 Transformer Architecture in Greater Depth

An Overview of the Encoder

What are Positional Encodings?

Other Details Regarding Encoders

An Overview of the Decoder

Encoder, Decoder, or Both: How to Decide?

Delving Deeper into the Transformer Architecture

Autoencoding Transformers

The “Auto” Classes

Improved Architectures

Hugging Face Pipelines and How They Work

Hugging Face Datasets

Transformers and Sentiment Analysis

Source Code for Transformer-Based Models

Summary

Chapter 5 The BERT Family Introduction

What is Prompt Engineering?

Aspects of LLM Development

Kaplan and Under-Trained Models

What is BERT?

BERT and NLP Tasks

BERT and the Transformer Architecture

BERT and Text Processing

BERT and Data Cleaning Tasks

Three BERT Embedding Layers

Creating a BERT Model

Training and Saving a BERT Model

The Inner Workings of BERT

Summary

Chapter 6 The BERT Family in Greater Depth

A Code Sample for Special BERT Tokens

BERT-Based Tokenizers

Sentiment Analysis with DistilBERT

BERT Encoding: Sequence of Steps

Sentence Similarity in BERT

Generating BERT Tokens (1)

Generating BERT Tokens (2)

The BERT Family

Working with RoBERTa

Italian and Japanese Language Translation

Multilingual Language Models

Translation for 1,000 Languages

M-BERT

Comparing BERT-Based Models

Web-Based Tools for BERT

Topic Modeling with BERT

What is T5?

Working with PaLM

Summary

Chapter 7 Working with GPT-3 Introduction

The GPT Family: An Introduction

GPT-2 and Text Generation

What is GPT-3?

GPT-3 Models

What is the Goal of GPT-3?

What Can GPT-3 Do?

Limitations of GPT-3

GPT-3 Task Performance

How GPT-3 and BERT are Different

The GPT-3 Playground

Inference Parameters

Overview of Prompt Engineering

Details of Prompt Engineering

Few-Shot Learning and Fine-Tuning LLMs

Summary

Chapter 8 Working with GPT-3 in Greater Depth

Fine-Tuning and Reinforcement Learning (Optional)

GPT-3 and Prompt Samples

Working with Python and OpenAI APIs

Text Completion in OpenAI

The Completion() API in OpenAI

Text Completion and Temperature

Text Classification with GPT-3

Sentiment Analysis with GPT-3

GPT-3 Applications

Open-Source Variants of GPT-3

Miscellaneous Topics

Summary

Chapter 9 ChatGPT and GPT-4

What is ChatGPT?

Plugins, Code Interpreter, and Code Whisperer

Detecting Generated Text

Concerns about ChatGPT

Sample Queries and Responses from ChatGPT

ChatGPT and Medical Diagnosis

Alternatives to ChatGPT

Machine Learning and ChatGPT: Advanced Data Analytics

What is InstructGPT?

VizGPT and Data Visualization

What is GPT-4?

ChatGPT and GPT-4 Competitors

LlaMa-2

When Will GPT-5 Be Available?

Summary

Chapter 10 Visualization with Generative AI

Generative AI and Art and Copyrights

Generative AI and GANs

What is Diffusion?

CLIP (OpenAI)

GLIDE (OpenAI)

Text-to-Image Generation

Text-to-Image Models

The DALL-E Models

DALL-E 2

DALL-E Demos

Text-to-Video Generation

Text-to-Speech Generation

Summary

Index

PREFACE

WHAT IS THE VALUE PROPOSITION FOR THIS BOOK?

This book begins with foundational concepts such as the attention mechanism, covers tokenization techniques, explores the nuances of Transformer and BERT architectures, and culminates in advanced topics related to the latest in the GPT series, including ChatGPT.

Key chapters provide insights into the evolution and significance of attention in deep learning, the intricacies of the Transformer architecture, a two-part exploration of the BERT family, and hands-on guidance on working with GPT-3. The concluding chapters present an overview of ChatGPT, GPT-4, and the world of visualization using DALL-E and generative AI.

In addition to the primary topics, the document also describes influential AI organizations such as DeepMind, OpenAI, Cohere, Hugging Face, and more. Through this guide, readers will gain a comprehensive understanding of the current landscape of NLP models, their underlying architectures, and practical applications.

THE TARGET AUDIENCE

This book is intended primarily for people who have a basic knowledge of machine learning and for software developers who are interested in working with LLMs. Specifically, this book is for readers who are accustomed to searching online for more detailed information about technical topics. If you are a beginner, there are other books that may be more suitable for you, and you can find them by performing an online search.

This book is also intended to reach an international audience of readers with highly diverse backgrounds in various age groups. For that reason, it uses standard English rather than colloquial expressions that might be confusing to some readers. The goal is to provide a comfortable and meaningful learning experience for the intended readers.

DO I NEED TO LEARN THE THEORY PORTIONS OF THIS BOOK?

Once again, the answer depends on the extent to which you plan to become involved in working with LLMs and generative AI. In addition to creating a model, you will use various algorithms to see which ones provide the level of accuracy (or some other metric) that you need for your project. In general, it’s probably worthwhile to learn the more theoretical aspects of LLMs that are discussed in this book.

GETTING THE MOST FROM THIS BOOK

Some people learn well from prose, while others learn well from sample code (and lots of it), which means that there is no single style that works for everyone.

Moreover, some programmers want to run the code first, see what it does, and then return to the code to delve into the details (and others use the opposite approach).

Consequently, there are various types of code samples in this book: some are short, some are long, and other code samples “build” from earlier code samples.

WHAT DO I NEED TO KNOW FOR THIS BOOK?

Although this book is introductory in nature, some knowledge of Python 3.x will certainly be helpful for the code samples. Knowledge of other programming languages (such as Java) can also be helpful because of the exposure to programming concepts and constructs. The less technical knowledge you have, the more diligence will be required to understand the various topics that are covered.

If you want to be sure that you can grasp the material in this book, glance through some of the code samples to get an idea of how much is familiar to you and how much is new for you.

DOES THIS BOOK CONTAIN PRODUCTION-LEVEL CODE SAMPLES?

This book contains basic code samples that are written in Python, and their primary purpose is to show you how to access the functionality of LLMs such as BERT and GPT-3. Moreover, clarity has a higher priority than writing more compact code that is more difficult to understand (and possibly more prone to bugs). If you decide to use any of the code in this book, you ought to subject that code to the same rigorous analysis as the other parts of your code base.
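For a flavor of the style of sample used throughout, here is a minimal sketch, in plain Python, of scaled dot-product attention, the mechanism at the heart of the Transformer models covered in this book. The function names and the toy vectors are purely illustrative and are not taken from the book's companion files.

```python
import math

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d_k = len(query)
    # Similarity of the query to each key, scaled by sqrt(d_k)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
              for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted sum of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Toy example: one query attending over two key/value pairs
result = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```

Because the query aligns more closely with the first key, the output is pulled toward the first value vector; production implementations (for example, in PyTorch or TensorFlow) perform the same computation with batched matrix operations.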

WHAT ARE THE NON-TECHNICAL PREREQUISITES FOR THIS BOOK?

Although the answer to this question is more difficult to quantify, it's important to have a desire to learn about NLP, along with the motivation and discipline to read and understand the code samples. As a reminder, even simple APIs can be a challenge to understand the first time you encounter them, so be prepared to read the code samples several times.

HOW DO I SET UP A COMMAND SHELL?

If you are a Mac user, there are three ways to do so. The first method is to use Finder to navigate to Applications > Utilities and then double click on the Terminal application. Next, if you already have a command shell available, you can launch a new one by typing the following command:

open /Applications/Utilities/Terminal.app

A second method for Mac users is to open a new command shell on a MacBook from a command shell that is already visible by pressing command+n in that command shell, and your Mac will launch another command shell.

If you are a PC user, you can install Cygwin (an open source toolkit available at https://cygwin.com/), which simulates bash commands, or use another toolkit such as MKS (a commercial product). Please read the online documentation that describes the download and installation process. Note that custom aliases are not automatically set if they are defined in a file other than the main start-up file (such as .bash_login).

COMPANION FILES

All the code samples and figures in this book may be obtained by writing to the publisher at [email protected].

WHAT ARE THE “NEXT STEPS” AFTER FINISHING THIS BOOK?

The answer to this question varies widely, mainly because the answer depends heavily on your objectives. If you are interested primarily in NLP, then you can learn about other LLMs (large language models).

If you are primarily interested in machine learning, there are some subfields of machine learning, such as deep learning and reinforcement learning (and deep reinforcement learning) that might appeal to you. Fortunately, there are many resources available, and you can perform an Internet search for those resources. One other point: the aspects of machine learning you need depend on who you are: the needs of a machine learning engineer, data scientist, manager, student or software developer are all different.