BERT (Bidirectional Encoder Representations from Transformers) has revolutionized the world of natural language processing (NLP) with its promising results. This book is an introductory guide that will help you get to grips with Google's BERT architecture. With a detailed explanation of the transformer architecture, this book will help you understand how the transformer's encoder and decoder work.
You'll explore the BERT architecture by learning how the BERT model is pre-trained and how to use pre-trained BERT for downstream tasks by fine-tuning it for NLP tasks such as sentiment analysis and text summarization with the Hugging Face transformers library. As you advance, you'll learn about different variants of BERT such as ALBERT, RoBERTa, and ELECTRA, and look at SpanBERT, which is used for NLP tasks like question answering. You'll also cover simpler and faster BERT variants based on knowledge distillation, such as DistilBERT and TinyBERT. The book takes you through M-BERT, XLM, and XLM-R in detail and then introduces you to Sentence-BERT, which is used for obtaining sentence representations. Finally, you'll discover domain-specific BERT models such as BioBERT and ClinicalBERT, and explore an interesting variant called VideoBERT.
By the end of this BERT book, you'll be well-versed in using BERT and its variants for performing practical NLP tasks.
Page count: 384
Year of publication: 2021
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Kunal Parikh
Publishing Product Manager: Devika Battike
Content Development Editor: Sean Lobo
Senior Editor: Roshan Kumar
Technical Editor: Manikandan Kurup
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Priyanka Dhadke
Production Designer: Prashant Ghare
First published: January 2021
Production reference: 1210121
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-83882-159-3
www.packt.com
To my adorable mom, Kasthuri, and to my beloved dad, Ravichandiran.
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Sudharsan Ravichandiran is a data scientist, researcher, and bestselling author. He completed his bachelor's in information technology at Anna University. His area of research focuses on practical implementations of deep learning and reinforcement learning, including natural language processing and computer vision. He is an open source contributor and loves answering questions on Stack Overflow. He also authored a best seller, Hands-On Reinforcement Learning with Python, published by Packt Publishing.
Dr. Armando Fandango creates AI-empowered products by leveraging reinforcement learning, deep learning, and distributed computing. Armando has provided thought leadership in diverse roles at small and large enterprises, including Accenture, Nike, Sonobi, and IBM, along with advising high-tech AI-based start-ups. Armando has authored several books, including Mastering TensorFlow, TensorFlow Machine Learning Projects, and Python Data Analysis, and has published research in international journals and presented his research at conferences. Dr. Armando's current research and product development interests lie in the areas of reinforcement learning, deep learning, edge AI, and AI in simulated and real environments (VR/XR/AR).
Ashwin Sreenivas is the cofounder and chief technology officer of Helia AI, a computer vision company that structures and understands the world's video. Prior to this, he was a deployment strategist at Palantir Technologies. Ashwin graduated Phi Beta Kappa from Stanford University with a master's degree in artificial intelligence and a bachelor's degree in computer science.
Gabriel Bianconi is the founder of Scalar Research, an artificial intelligence and data science consulting firm. Past clients include start-ups backed by Y Combinator and leading venture capital firms (for example, Scale AI and Fandom), investment firms and their portfolio companies (for example, the Two Sigma-backed insurance firm MGA), and large enterprises (for example, an industrial conglomerate in Asia and a leading strategy consulting firm). Beyond consulting, Gabriel is a frequent speaker at major technology conferences and a reviewer for top academic conferences (for example, ICML) and AI textbooks. Previously, he received B.S. and M.S. degrees in computer science from Stanford University, where he conducted award-winning research in computer vision and deep learning.
Mani Kanteswara has a bachelor's and a master's in finance (tech) from BITS Pilani and over 10 years of strong technical expertise and statistical knowledge of analytics. He is currently working as a lead strategist with Google and has previously worked as a senior data scientist at WalmartLabs. He has worked in the deep learning, computer vision, machine learning, and natural language processing spaces, building solutions and frameworks capable of solving different business problems and building algorithmic products. He has extensive expertise in solving problems in the IoT, telematics, social media, web, and e-commerce spaces. He strongly believes that learning concepts through practical implementation and exploring their application areas leads to a great foundation.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Getting Started with Google BERT
Dedication
About Packt
Why subscribe?
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1 - Starting Off with BERT
A Primer on Transformers
Introduction to the transformer
Understanding the encoder of the transformer
Self-attention mechanism
Understanding the self-attention mechanism
Step 1
Step 2
Step 3
Step 4
Multi-head attention mechanism
Learning position with positional encoding
Feedforward network
Add and norm component
Putting all the encoder components together
Understanding the decoder of a transformer
Masked multi-head attention
Multi-head attention
Feedforward network
Add and norm component
Linear and softmax layers
Putting all the decoder components together
Putting the encoder and decoder together
Training the transformer
Summary
Questions
Further reading
Understanding the BERT Model
Basic idea of BERT
Working of BERT
Configurations of BERT
BERT-base
BERT-large
Other configurations of BERT
Pre-training the BERT model
Input data representation
Token embedding
Segment embedding
Position embedding
Final representation
WordPiece tokenizer
Pre-training strategies
Language modeling
Auto-regressive language modeling
Auto-encoding language modeling
Masked language modeling
Whole word masking
Next sentence prediction
Pre-training procedure
Subword tokenization algorithms
Byte pair encoding
Tokenizing with BPE
Byte-level byte pair encoding
WordPiece
Summary
Questions
Further reading
Getting Hands-On with BERT
Exploring the pre-trained BERT model
Extracting embeddings from pre-trained BERT
Hugging Face transformers
Generating BERT embeddings
Preprocessing the input
Getting the embedding
Extracting embeddings from all encoder layers of BERT
Extracting the embeddings
Preprocessing the input
Getting the embeddings
Fine-tuning BERT for downstream tasks
Text classification
Fine-tuning BERT for sentiment analysis
Importing the dependencies
Loading the model and dataset
Preprocessing the dataset
Training the model
Natural language inference
Question-answering
Performing question-answering with fine-tuned BERT
Preprocessing the input
Getting the answer
Named entity recognition
Summary
Questions
Further reading
Section 2 - Exploring BERT Variants
BERT Variants I - ALBERT, RoBERTa, ELECTRA, and SpanBERT
A Lite version of BERT
Cross-layer parameter sharing
Factorized embedding parameterization
Training the ALBERT model
Sentence order prediction
Comparing ALBERT with BERT
Extracting embeddings with ALBERT
Robustly Optimized BERT pre-training Approach
Using dynamic masking instead of static masking
Removing the NSP task
Training with more data points
Training with a large batch size
Using BBPE as a tokenizer
Exploring the RoBERTa tokenizer
Understanding ELECTRA
Understanding the replaced token detection task
Exploring the generator and discriminator of ELECTRA
Training the ELECTRA model
Exploring efficient training methods
Predicting span with SpanBERT
Understanding the architecture of SpanBERT
Exploring SpanBERT
Performing Q&As with pre-trained SpanBERT
Summary
Questions
Further reading
BERT Variants II - Based on Knowledge Distillation
Introducing knowledge distillation
Training the student network
DistilBERT – the distilled version of BERT
Teacher-student architecture
The teacher BERT
The student BERT
Training the student BERT (DistilBERT)
Introducing TinyBERT
Teacher-student architecture
Understanding the teacher BERT
Understanding the student BERT
Distillation in TinyBERT
Transformer layer distillation
Attention-based distillation
Hidden state-based distillation
Embedding layer distillation
Prediction layer distillation
The final loss function
Training the student BERT (TinyBERT)
General distillation
Task-specific distillation
The data augmentation method
Transferring knowledge from BERT to neural networks
Teacher-student architecture
The teacher BERT
The student network
Training the student network
The data augmentation method
Understanding the masking method
Understanding the POS-guided word replacement method
Understanding the n-gram sampling method
The data augmentation procedure
Summary
Questions
Further reading
Section 3 - Applications of BERT
Exploring BERTSUM for Text Summarization
Text summarization
Extractive summarization
Abstractive summarization
Fine-tuning BERT for text summarization
Extractive summarization using BERT
BERTSUM with a classifier
BERTSUM with a transformer and LSTM
BERTSUM with an inter-sentence transformer
BERTSUM with LSTM
Abstractive summarization using BERT
Understanding ROUGE evaluation metrics
Understanding the ROUGE-N metric
ROUGE-1
ROUGE-2
Understanding ROUGE-L
The performance of the BERTSUM model
Training the BERTSUM model
Summary
Questions
Further reading
Applying BERT to Other Languages
Understanding multilingual BERT
Evaluating M-BERT on the NLI task
Zero-shot
TRANSLATE-TEST
TRANSLATE-TRAIN
TRANSLATE-TRAIN-ALL
How multilingual is multilingual BERT?
Effect of vocabulary overlap
Generalization across scripts
Generalization across typological features
Effect of language similarity
Effect of code switching and transliteration
Code switching
Transliteration
M-BERT on code switching and transliteration
The cross-lingual language model
Pre-training strategies
Causal language modeling
Masked language modeling
Translation language modeling
Pre-training the XLM model
Evaluation of XLM
Understanding XLM-R
Language-specific BERT
FlauBERT for French
Getting a representation of a French sentence with FlauBERT
French Language Understanding Evaluation
BETO for Spanish
Predicting masked words using BETO
BERTje for Dutch
Next sentence prediction with BERTje
German BERT
Chinese BERT
Japanese BERT
FinBERT for Finnish
UmBERTo for Italian
BERTimbau for Portuguese
RuBERT for Russian
Summary
Questions
Further reading
Exploring Sentence and Domain-Specific BERT
Learning about sentence representation with Sentence-BERT
Computing sentence representation
Understanding Sentence-BERT
Sentence-BERT with a Siamese network
Sentence-BERT for a sentence pair classification task
Sentence-BERT for a sentence pair regression task
Sentence-BERT with a triplet network
Exploring the sentence-transformers library
Computing sentence representation using Sentence-BERT
Computing sentence similarity
Loading custom models
Finding a similar sentence with Sentence-BERT
Learning multilingual embeddings through knowledge distillation
Teacher-student architecture
Using the multilingual model
Domain-specific BERT
ClinicalBERT
Pre-training ClinicalBERT
Fine-tuning ClinicalBERT
Extracting clinical word similarity
BioBERT
Pre-training the BioBERT model
Fine-tuning the BioBERT model
BioBERT for NER tasks
BioBERT for question answering
Summary
Questions
Further reading
Working with VideoBERT, BART, and More
Learning language and video representations with VideoBERT
Pre-training a VideoBERT model
Cloze task
Linguistic-visual alignment
The final pre-training objective
Data source and preprocessing
Applications of VideoBERT
Predicting the next visual tokens
Text-to-video generation
Video captioning
Understanding BART
Architecture of BART
Noising techniques
Token masking
Token deletion
Token infilling
Sentence shuffling
Document rotation
Comparing different pre-training objectives
Performing text summarization with BART
Exploring BERT libraries
Understanding ktrain
Sentiment analysis using ktrain
Building a document answering model
Document summarization
bert-as-service
Installing the library
Computing sentence representation
Computing contextual word representation
Summary
Questions
Further reading
Assessments
Chapter 1, A Primer on Transformers
Chapter 2, Understanding the BERT Model
Chapter 3, Getting Hands-On with BERT
Chapter 4, BERT Variants I - ALBERT, RoBERTa, ELECTRA, and SpanBERT
Chapter 5, BERT Variants II – Based on Knowledge Distillation
Chapter 6, Exploring BERTSUM for Text Summarization
Chapter 7, Applying BERT to Other Languages
Chapter 8, Exploring Sentence and Domain-Specific BERT
Chapter 9, Working with VideoBERT, BART, and More
Other Books You May Enjoy
Leave a review - let other readers know what you think
Section 1 - Starting Off with BERT
In this section, we will familiarize ourselves with BERT. First, we will understand how the transformer works, and then we will explore BERT in detail. We will also get hands-on with BERT and learn how to use the pre-trained BERT model.
The following chapters are included in this section:
Chapter 1, A Primer on Transformers
Chapter 2, Understanding the BERT Model
Chapter 3, Getting Hands-On with BERT
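As a brief preview of the hands-on work in Chapter 3, here is a minimal sketch of loading a pre-trained BERT model and extracting contextual embeddings with the Hugging Face transformers library. It assumes a recent version of transformers with PyTorch installed; the example sentence and the printed shape check are purely illustrative:

from transformers import BertModel, BertTokenizer

# Download the pre-trained BERT-base model and its WordPiece tokenizer
model = BertModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenize an example sentence and feed it through the encoder
inputs = tokenizer('I love Paris', return_tensors='pt')
outputs = model(**inputs)

# The final encoder layer yields one 768-dimensional vector per token
print(outputs.last_hidden_state.shape)

Chapter 3 builds on this workflow, covering how to extract embeddings from all encoder layers and how to fine-tune the model for downstream tasks such as sentiment analysis and question answering.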