Generative AI Foundations in Python

Carlos Rodriguez
Description

The intricacies and breadth of generative AI (GenAI) and large language models can sometimes eclipse their practical application, so it is pivotal to understand the foundational concepts needed to implement generative AI. This guide explains the core concepts behind state-of-the-art generative models by combining theory and hands-on application.
Generative AI Foundations in Python begins by laying a foundational understanding, presenting the fundamentals of generative LLMs and their historical evolution, while also setting the stage for deeper exploration. You’ll also understand how to apply generative LLMs in real-world applications. The book cuts through the complexity and offers actionable guidance on deploying and fine-tuning pre-trained language models with Python. Later, you’ll delve into topics such as task-specific fine-tuning, domain adaptation, prompt engineering, quantitative evaluation, and responsible AI, focusing on how to effectively and responsibly use generative LLMs.
By the end of this book, you’ll be well-versed in applying generative AI capabilities to real-world problems, confidently navigating its enormous potential ethically and responsibly.




Generative AI Foundations in Python

Discover key techniques and navigate modern challenges in LLMs

Carlos Rodriguez

Generative AI Foundations in Python

Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Niranjan Naikwadi

Publishing Product Manager: Tejashwini R

Book Project Manager: Hemangi Lotlikar

Senior Editor: Shrishti Pandey

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Proofreader: Shrishti Pandey

Indexer: Manju Arasan

Production Designer: Alishon Mendonca

Senior DevRel Marketing Coordinator: Vinishka Kalra

First published: July 2024

Production reference: 1020724

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK

ISBN 978-1-83546-082-5

www.packtpub.com

To the memory of my late friend, Austin Tribble, for exemplifying resilience and determination. To my wife, Jill Rodriguez, whose brilliance and intellectual curiosity have inspired me every day since the day we met.

– Carlos Rodriguez

Foreword

Large Language Models (LLMs) are poised to transform the way we interact with technology, offering unprecedented capabilities in understanding and generating human language. They have become essential tools in numerous applications, from chatbots and virtual assistants to content creation and translation services.

For a subject that is extremely dynamic and complex, Carlos has managed to distill years of expertise into a work that is both accessible and comprehensive. This book not only demystifies the complexities of LLMs but also provides a comprehensive guide for practitioners and enthusiasts alike. So, it is with great pride and excitement that I pen this foreword for my good friend and esteemed colleague, Carlos Rodriguez, whose work on LLMs delves into the intricacies of model architecture, training methodologies, and practical implementations, all while maintaining a clarity that ensures readers, regardless of their background, can grasp the fundamental principles and potential applications of LLMs.

Our journey together began only two short years ago; however, we found ourselves to be kindred spirits in the ever-evolving world of AI. From the outset, I was struck by Carlos’ insatiable curiosity and unyielding dedication to the field of AI. Over numerous discussions and collaborative projects, I have witnessed firsthand the depth of his knowledge, the rigor of his research, and the passion that fuels his relentless pursuit of innovation.

What sets Generative AI Foundations in Python apart is Carlos’ unique ability to blend technical depth with practical insights. Each chapter is a testament to his meticulous approach and his commitment to bridging the gap between theoretical concepts and real-world solutions. Interweaving real-world examples, code snippets, and practical considerations ensures that seasoned professionals and newcomers to the field alike will find this book to be an invaluable resource.

In closing, I invite you to embark on this journey with an open mind and a passion for learning. The landscape of LLMs is vast; there is no better guide than the one you hold in your hands. May this book inspire, educate, and ignite a passion for learning and discovery in every reader. Enjoy the journey.

– Samira Shaikh, PhD.

VP of Data Science, Artificial Intelligence, and Advanced Analytics, Popular Bank

Associate Professor of Computer Science, UNC Charlotte

Contributors

About the author

Carlos Rodriguez is the Director of AI Risk at a major financial institution, where he oversees the validation of cutting-edge AI and machine learning models, including generative AI, to ensure that they remain trustworthy, unbiased, and compliant with stringent regulatory standards. With a degree in data science, numerous professional certifications, and two decades of experience in emerging technology, Carlos is a recognized expert in natural language processing and machine learning. Throughout his career, he has fostered and led high-performing machine learning engineering and data science teams specializing in natural language processing and AI risk, respectively. Known for his human-centered approach to AI, Carlos is a passionate autodidact who continuously expands his knowledge as a data scientist, machine learning practitioner, and risk executive. His current focus lies in developing a comprehensive framework for evaluating generative AI models within a regulatory setting, aiming to set new industry standards for responsible AI adoption and deployment.

I want to express my gratitude to everyone who supported me throughout this process, with special thanks to my wife, Jill, for her unwavering support. I also want to extend a thank you to my parents, particularly my mother, a lifelong educator who has always encouraged me to find every opportunity to teach others. Finally, a special thanks to Morgan, Eric, Jeremy, Rose, and Samira, who so graciously took the time to review the manuscript at various stages.

About the reviewers

Morgan Boyce’s education includes bachelor’s degrees in economics and finance, a master’s degree in mathematical finance, and a PhD in economics. He has worked in the financial services industry for nearly 20 years in various roles such as economic and quantitative research, model development, analytics, and model validation. Outside the financial services industry, Morgan’s research focuses on the economics of technological innovation as well as public entrepreneurship. He also teaches various economics courses at university level.

Eric Rui is a distinguished technology and data leader in the financial services industry, renowned for his expertise in Data and AI. With a career dedicated to driving innovation and efficiency, Eric leverages cutting-edge technologies and data-driven insights to transform organizational processes. His strategic vision and technical acumen make him a key influencer, excelling in creating robust solutions that enhance data utilization and analytical capabilities. Additionally, Eric has deep knowledge of practical generative AI, applying advanced machine learning techniques to drive business growth and optimize decision-making.

Table of Contents

Preface

Part 1: Foundations of Generative AI and the Evolution of Large Language Models

1

Understanding Generative AI: An Introduction

Generative AI

Distinguishing generative AI from other AI models

Briefly surveying generative approaches

Clarifying misconceptions between discriminative and generative paradigms

Choosing the right paradigm

Looking back at the evolution of generative AI

Overview of traditional methods in NLP

Arrival and evolution of transformer-based models

Development and impact of GPT-4

Looking ahead at risks and implications

Introducing use cases of generative AI

The future of generative AI applications

Summary

References

2

Surveying GenAI Types and Modes: An Overview of GANs, Diffusers, and Transformers

Understanding Generative Artificial Intelligence (GAI) Types – distinguishing features of GANs, diffusers, and transformers

Deconstructing GAI methods – exploring GANs, diffusers, and transformers

A closer look at GANs

A closer look at diffusion models

A closer look at generative transformers

Applying GAI models – image generation using GANs, diffusers, and transformers

Working with Jupyter Notebook and Google Colab

Stable diffusion transformer

Scoring with the CLIP model

Summary

References

3

Tracing the Foundations of Natural Language Processing and the Impact of the Transformer

Early approaches in NLP

Advent of neural language models

Distributed representations

Transfer learning

Advent of NNs in NLP

The emergence of the Transformer in advanced language models

Components of the transformer architecture

Sequence-to-sequence learning

Evolving language models – the AR Transformer and its role in GenAI

Implementing the original Transformer

Data loading and preparation

Tokenization

Data tensorization

Dataset creation

Embeddings layer

Positional encoding

Multi-head self-attention

FFN

Encoder layer

Encoder

Decoder layer

Decoder

Complete transformer

Training function

Translation function

Main execution

Summary

References

4

Applying Pretrained Generative Models: From Prototype to Production

Prototyping environments

Transitioning to production

Mapping features to production setup

Setting up a production-ready environment

Local development setup

Visual Studio Code

Project initialization

Docker setup

Requirements file

Application code

Creating a code repository

CI/CD setup

Model selection – choosing the right pretrained generative model

Meeting project objectives

Model size and computational complexity

Benchmarking

Updating the prototyping environment

GPU configuration

Loading pretrained models with LangChain

Setting up testing data

Quantitative metrics evaluation

Alignment with CLIP

Interpreting outcomes

Responsible AI considerations

Addressing and mitigating biases

Transparency and explainability

Final deployment

Testing and monitoring

Maintenance and reliability

Summary

Part 2: Practical Applications of Generative AI

5

Fine-Tuning Generative Models for Specific Tasks

Foundation and relevance – an introduction to fine-tuning

PEFT

LoRA

AdaLoRA

In-context learning

Fine-tuning versus in-context learning

Practice project: Fine-tuning for Q&A using PEFT

Background regarding question-answering fine-tuning

Implementation in Python

Evaluation of results

Summary

References

6

Understanding Domain Adaptation for Large Language Models

Demystifying domain adaptation – understanding its history and importance

Practice project: Transfer learning for the finance domain

Training methodologies for financial domain adaptation

Evaluation and outcome analysis – the ROUGE metric

Summary

References

7

Mastering the Fundamentals of Prompt Engineering

The shift to prompt-based approaches

Basic prompting – guiding principles, types, and structures

Guiding principles for model interaction

Prompt elements and structure

Elevating prompts – iteration and influencing model behaviors

LLMs respond to emotional cues

Effect of personas

Situational prompting or role-play

Advanced prompting in action – few-shot learning and prompt chaining

Practice project: Implementing RAG with LlamaIndex using Python

Summary

References

8

Addressing Ethical Considerations and Charting a Path Toward Trustworthy Generative AI

Ethical norms and values in the context of generative AI

Investigating and minimizing bias in generative LLMs and generative image models

Constrained generation and eliciting trustworthy outcomes

Constrained generation with fine-tuning

Constrained generation through prompt engineering

Understanding jailbreaking and harmful behaviors

Practice project: Minimizing harmful behaviors with filtering

Summary

References

Index

Other Books You May Enjoy

Part 1: Foundations of Generative AI and the Evolution of Large Language Models

This part provides an overview of generative AI and the role of large language models. It covers the basics of generative AI, different types of generative models, including GANs, diffusers, and transformers, and the foundational aspects of natural language processing. Additionally, it explores how pretrained generative models can be applied from prototype to production, setting the stage for more advanced topics.

This part contains the following chapters:

Chapter 1, Understanding Generative AI: An Introduction

Chapter 2, Surveying GenAI Types and Modes: An Overview of GANs, Diffusers, and Transformers

Chapter 3, Tracing the Foundations of Natural Language Processing and the Impact of the Transformer

Chapter 4, Applying Pretrained Generative Models: From Prototype to Production

1

Understanding Generative AI: An Introduction

In his influential book The Singularity Is Near (2005), renowned inventor and futurist Ray Kurzweil asserted that we were on the precipice of an exponential acceleration in technological advancements. He envisioned a future where technological innovation would continue to accelerate, eventually leading to a singularity—a point where artificial intelligence (AI) could transcend human intelligence, blurring the lines between humans and machines. Fast-forward to today and we find ourselves advancing along the trajectory Kurzweil outlined, with generative AI marking a significant stride along this path. Today, state-of-the-art generative models can behave as collaborators, capable of synthetic understanding and of generating sophisticated responses that mirror human intelligence. The rapid and exponential growth of generative approaches is propelling Kurzweil’s vision forward, fundamentally reshaping how we interact with technology.

In this chapter, we lay the conceptual groundwork for anyone hoping to apply generative AI to their work, research, or field of study, building a fundamental understanding of what this technology does, how it was derived, and how it can be used. The chapter establishes how generative models differ from classical machine learning (ML) paradigms and elucidates how they discern complex relationships and idiosyncrasies in data to synthesize human-like text, audio, and video. We will explore critical foundational generative methods, such as generative adversarial networks (GANs), diffusion models, and transformers, with a particular emphasis on their real-world applications.

Additionally, this chapter dispels some common misunderstandings surrounding generative AI and provides guidelines for adopting this emerging technology ethically, considering its environmental footprint and advocating for responsible development and adoption. We will also highlight scenarios where generative models are apt for addressing business challenges. By the conclusion of this chapter, we will better understand the potential of generative AI and its applications across a wide array of sectors and will have critically assessed the risks, limitations, and long-term considerations.

Whether your interest is casual, you are a professional transitioning from a different field, or you are an established practitioner in the fields of data science or ML, this chapter offers a contextual understanding to make informed decisions regarding the responsible adoption of generative AI.

Ultimately, we aim to establish a foundation through an introductory exploration of generative AI and large language models (LLMs), dissected into two parts.

The beginning of the book will introduce the fundamentals and history of generative AI, surveying various types, such as GANs, diffusers, and transformers, tracing the foundations of natural language generation (NLG), and demonstrating the basic steps to implement generative models from prototype to production. Moving forward, we will focus on slightly more advanced application fundamentals, including fine-tuning generative models, prompt engineering, and addressing ethical considerations toward the responsible adoption of generative AI. Let’s get started.

Generative AI

In recent decades, AI has made incredible strides. The origins of the field stem from classical statistical models meticulously designed to help us analyze and make sense of data. As we developed more robust computational methods to process and store data, the field shifted—intersecting computer science and statistics and giving us ML. ML systems could learn complex relationships and surface latent insights from vast amounts of data, transforming our approach to statistical modeling.

This shift laid the groundwork for the rise of deep learning, a substantial step forward that introduced multi-layered neural networks (i.e., a system of interconnected functions) to model complex patterns. Deep learning enabled powerful discriminative models that became pivotal for advancements in diverse fields of research, including image recognition, voice recognition, and natural language processing.

However, the journey continues with the emergence of generative AI. Generative AI harnesses the power of deep learning to accomplish a broader objective. Instead of classifying and discriminating data, generative AI seeks to learn and replicate data distributions to “create” entirely new and seemingly original data, mirroring human-like output.

Distinguishing generative AI from other AI models

Again, the critical distinction between discriminative and generative models lies in their objectives. Discriminative models aim to predict target outputs given input data. Classification algorithms, such as logistic regression or support vector machines, find decision boundaries in data to categorize inputs as belonging to one or more classes. Neural networks learn input-output mappings by optimizing weights through backpropagation (or tracing back to resolve errors) to make accurate predictions. Advanced gradient boosting models, such as XGBoost or LightGBM, further enhance these discriminative models by employing decision trees and incorporating the principles of gradient boosting (or the strategic ensembling of models) to make highly accurate predictions.
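To make the discriminative paradigm concrete, here is a minimal sketch of a classifier learning a decision boundary with scikit-learn. The synthetic dataset and default hyperparameters are illustrative choices only:

# A minimal sketch of a discriminative model: logistic regression
# learning a decision boundary on synthetic two-class data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Generate a simple two-class dataset
X, y = make_classification(n_samples=1000, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit the model; it learns weights that define a linear decision boundary
clf = LogisticRegression().fit(X_train, y_train)

print(clf.predict(X_test[:5]))    # predicted class labels
print(clf.score(X_test, y_test))  # accuracy on held-out data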

Generative methods learn complex relationships through expansive training in order to generate new data sequences, enabling many downstream applications. Effectively, these models create synthetic outputs by replicating the statistical patterns and properties discovered in training data, capturing nuances and idiosyncrasies that closely reflect human behaviors.

In practice, a discriminative image classifier labels images containing a cat or a dog. In contrast, a generative model can synthesize diverse, realistic cat or dog images by learning the distributions of pixels and implicit features from existing images. Moreover, generative models can be trained across modalities to unlock new possibilities in synthesis-focused applications to generate human-like photographs, videos, music, and text.

Several key methods have formed the foundation for many of the recent advancements in generative AI, each with unique approaches and strengths. In the next section, we survey generative advancements over time, including adversarial networks, variational autoencoders, diffusion models, and autoregressive transformers, to better understand their impact and influence.

Briefly surveying generative approaches

Modern generative modeling encompasses diverse architectures suited to different data types and distinct tasks. Here, we briefly introduce some of the key approaches that have emerged over the years, bringing us to the state-of-the-art models:

Generative adversarial networks (GANs) involve two interconnected neural networks—one acting as a generator to create realistic synthetic data and the other acting as a discriminator that distinguishes between real and synthetic (fake) data points. The generator and discriminator are adversaries in a zero-sum game, each fighting to outperform the other. This adversarial relationship gradually improves the generator’s capacity to produce vividly realistic synthetic data, making GANs adept at creating intricate image distributions and achieving photo-realistic image synthesis.

Variational autoencoders (VAEs) employ a unique learning method to compress data into a simpler form (or latent representation). This process involves an encoder and a decoder that work conjointly (Kingma & Welling, 2013). While VAEs may not be the top choice for image quality, they are unmatched in efficiently separating and understanding complex data patterns.

Diffusion models continuously add Gaussian noise to data over multiple steps to corrupt it. Gaussian noise can be thought of as random variations applied to a signal to distort it, creating “noise”. Diffusion models are trained to eliminate the added noise to recover the original data distribution. This type of reverse engineering process equips diffusion models to generate diverse, high-fidelity samples that closely replicate the original data distribution (Ho et al., 2020). A minimal sketch of the forward noising process appears after this list.

Autoregressive transformers leverage parallelizable self-attention to model complex sequential dependencies, showing exceptional performance in language-related tasks (Vaswani et al., 2017). Pretrained models such as GPT-4 or Claude have demonstrated the capability for generalizations in natural language tasks and impressive human-like text generation. Despite ethical issues and misuse concerns, transformers have emerged as the frontrunners in language modeling and multimodal generation.
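To ground the description of diffusion above, the following is a minimal sketch of the forward (noising) process from Ho et al. (2020), using an illustrative linear variance schedule and a toy array standing in for an image:

# A minimal sketch of the diffusion forward (noising) process:
# each step mixes the original data with Gaussian noise according
# to a variance schedule. All values here are illustrative.
import numpy as np

T = 1000                              # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)    # linear variance schedule
alphas_bar = np.cumprod(1.0 - betas)  # cumulative signal retention

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Sample x_t from q(x_t | x_0) in closed form."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

x0 = np.ones((8, 8))         # a toy "image"
x_mid = q_sample(x0, t=500)  # partially corrupted
x_end = q_sample(x0, t=999)  # nearly pure Gaussian noise
print(x_mid.std(), x_end.std())

A model trained to predict and remove the injected noise can then run this process in reverse, step by step, to generate new samples.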

Collectively, these methodologies paved the way for advanced generative modeling across a wide array of domains, including images, videos, audio, and text. While architectural and engineering innovations progress daily, generative methods showcase unparalleled synthesis capabilities across diverse modalities. Throughout the book, we will explore and apply generative methods to simulate real-world scenarios. However, before diving in, we further distinguish generative methods from traditional ML methods by addressing some common misconceptions.

Clarifying misconceptions between discriminative and generative paradigms

To better understand the distinctive capabilities and applications of traditional ML models (often referred to as discriminative) and generative methods, here, we clear up some common misconceptions and myths:

Myth 1: Generative models cannot recognize patterns as effectively as discriminative models.

Truth: State-of-the-art generative models are well known for their impressive abilities to recognize and trace patterns, rivaling some discriminative models. Despite primarily focusing on creative synthesis, generative models display classification capabilities. However, the class labels produced by a generative model can be difficult to explain, as generative models are not explicitly trained to learn decision boundaries or predetermined relationships. Instead, they may only learn to simulate classification based on labels learned implicitly (or organically) during training. In short, in cases where the explanation of model outcomes is important, classification using a discriminative model may be the better choice.

Example: Consider GPT-4. In addition to synthesizing human-like text, it can understand context, capture long-range dependencies, and detect patterns in texts. GPT-4 uses these intrinsic language processing capabilities to discriminate between classes, much as traditional classifiers do. However, because GPT learns semantic relationships through extensive training, explaining its decision-making cannot be accomplished using any established methods.
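As an illustration of this simulated classification, the following hedged sketch prompts a generative model to act as a sentiment classifier through the OpenAI chat completions API. The model name, label set, and prompt wording are assumptions made for the example:

# A sketch of using a generative LLM as a text classifier via prompting.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def classify_sentiment(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the user's text. "
                        "Answer with exactly one word: positive or negative."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_sentiment("The battery life on this laptop is fantastic."))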

Myth 2: Generative AI will eventually replace discriminative AI.

Truth: This is a common misunderstanding. Discriminative models have consistently been the preferred option for high-stakes prediction tasks because they focus directly on learning the decision boundary between classes, ensuring high precision and reliability. More importantly, discriminative models can be explained post-hoc, making them the ultimate choice for critical applications in sectors such as healthcare, finance, and security. However, generative models may increasingly become more popular for high-stakes modeling as explainability techniques emerge.

Example: Consider a discriminative model trained specifically for disease prediction in healthcare. A specialized model can classify data points (e.g., images of skin) as healthy or unhealthy, giving healthcare professionals a tool for early intervention and treatment plans. Post-hoc explanation methods, such as SHAP, can be employed to identify and analyze the key features that influence classification outcomes. This approach offers clear insights into the specific results (i.e., feature attribution).
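The following is a minimal sketch of this kind of post-hoc feature attribution using the SHAP library. The gradient-boosted model and the public breast cancer dataset stand in for a real clinical model and are purely illustrative:

# A sketch of post-hoc explanation with SHAP for a discriminative
# classifier; the model and dataset are illustrative stand-ins.
import shap
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
model = xgb.XGBClassifier().fit(data.data, data.target)

# TreeExplainer attributes each prediction to the input features
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data)

# Per-feature contributions to the first prediction (feature attribution)
for name, value in zip(data.feature_names, shap_values[0]):
    print(f"{name}: {value:+.3f}")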

Myth 3: Generative models continuously learn from user input.

Truth: Not exactly. Generative LLMs are trained using a static approach. This means they learn from vast training corpora, and their knowledge is limited to the information contained within that training window. While models can be augmented with additional data or in-context information to help them contextualize, giving the impression of real-time learning, the underlying model itself is essentially frozen and does not learn in real time.

Example: GPT-3 was trained in 2020, and its knowledge remained fixed at that point until its successor, GPT-3.5, was released in March 2023. Naturally, GPT-4 was trained on more recent data, but due to training limitations (including diminishing performance returns), it is reasonable to expect that subsequent training checkpoints will be released periodically rather than continuously.

While generative and discriminative models have distinct strengths and limitations, knowing when to apply each paradigm requires evaluating several key factors. As we have clarified some common myths about their capabilities, let’s turn our attention to guidelines for selecting the right approach for a given task or problem.

Choosing the right paradigm

The choice between generative and discriminative models depends on various factors, such as the task or problem at hand, the quality and quantity of data available, the desired output, and the level of performance required. The following is a list of key considerations:

Task specificity: Discriminative models are more suitable for high-stakes applications, such as disease diagnosis, fraud detection, or credit risk assessment, where precision is crucial. However, generative models are more adept at creative tasks such as synthesizing images, text, music, or video.

Data availability: Discriminative models tend to overfit (or memorize examples) when trained on small datasets, which may lead to poor generalization. On the other hand, because generative models are often pretrained on vast amounts of data, they can produce a diverse output even with minimal input, making them a viable choice when data are scarce.

Model performance: Discriminative models outperform generative models in tasks where it is crucial to learn and explain a decision boundary between classes or where expected relationships in the data are well understood. Generative models usually excel in less constrained tasks that require a measure of perceived creativity and flexibility.

Model explainability: While both paradigms can include models that are considered “black boxes” or not intrinsically interpretable, generative models can be more difficult, or at times impossible, to explain, as they often involve complex data generation processes that rely on understanding the underlying data distribution. Discriminative models, by contrast, often focus on learning the boundary between classes. In use cases where model explainability is a key requirement, discriminative models may be more suitable. However, generative explainability research is gaining traction.

Model complexity: Generally, discriminative models require less computational power because they learn to directly predict some output given a well-defined set of inputs.

Alternatively, generative models may consume more computational resources, as their training objective is to jointly capture the intricate hidden relationships between both inputs and presumed outputs. Accurately learning these intricacies requires vast amounts of data and large computations. Computational efficiency in generative LLM training (e.g., quantization) is a vibrant area of research.

Ultimately, the choice between generative and discriminative models should be made by considering the trade-offs involved. Moreover, the adoption of these paradigms requires different levels of infrastructure, data curation, and other prerequisites. Occasionally, a hybrid approach that combines the strengths of both models can serve as an ideal solution. For example, a pretrained generative model can be fine-tuned as a classifier. We will learn about task-specific fine-tuning in Chapter 5.
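As a brief illustration of this hybrid approach, the following sketch reuses a pretrained generative backbone as a sequence classifier with Hugging Face Transformers. GPT-2 and the two-label setup are illustrative choices, and the freshly initialized classification head would still need fine-tuning on labeled data:

# A sketch of repurposing a pretrained generative model as a classifier.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Attach a new classification head on top of the pretrained backbone
model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2", num_labels=2, pad_token_id=tokenizer.pad_token_id
)

inputs = tokenizer("A promising quarterly earnings report.", return_tensors="pt")
logits = model(**inputs).logits  # scores are meaningless until fine-tuned
print(logits)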

Now that we have explored the key distinctions between traditional ML (i.e., discriminative) and generative paradigms, including their distinct risks, we can look back at how we arrived at this paradigm shift. In the next section, we take a brief look at the evolution of generative AI.

Looking back at the evolution of generative AI

The field of generative AI has experienced an unprecedented acceleration, leading to a surge in the development and adoption of foundation models such as GPT. However, this momentum has been building for several decades, driven by continuous and significant advancements in ML and natural language generation research. These developments have brought us to the current generation of state-of-the-art models.

To fully appreciate the current state of generative AI, it is important to understand its evolution, beginning with traditional language processing techniques and moving through to more recent advancements.

Overview of traditional methods in NLP

Natural language processing (NLP) technology has enabled machines to understand, interpret, and generate human language. It emerged from traditional statistical techniques such as n-grams and hidden Markov models (HMMs), which converted linguistic structures into mathematical models that machines could understand.

Initially, n-grams and HMMs were the primary methods used in NLP. N-grams predicted the next word in a sequence based on the last “n” words, while HMMs modeled sequences by treating words as observations emitted from hidden states in a Markov process. These early methods were good at capturing local patterns and short-range dependencies in language.
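To make the n-gram idea concrete, the following toy example builds a bigram model that predicts the next word from pair counts over a tiny illustrative corpus:

# A worked toy example of the n-gram idea: a bigram model predicting
# the next word from counts of word pairs in a tiny corpus.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each preceding word
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent word observed after `word`."""
    return bigrams[word].most_common(1)[0][0]

print(predict_next("the"))  # 'cat', which follows 'the' twice in the corpus
print(predict_next("cat"))  # 'sat' or 'ate', a tie in this tiny corpus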

As computational power and data availability grew, more sophisticated techniques for natural language processing emerged. Among these was the recurrent neural network (RNN), which managed relationships across extended sequences and proved effective in tasks where prior context influenced future predictions.

Subsequently, long short-term memory networks (LSTMs) were developed. Unlike traditional RNNs, LSTMs had a unique ability to retain relevant long-term information while disregarding irrelevant data, maintaining semantic relationships across prolonged sequences.
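The following is a minimal PyTorch sketch of an LSTM consuming a token sequence; the vocabulary size and dimensions are arbitrary illustrative values:

# A minimal sketch of an LSTM over a token sequence. The hidden state
# carries context forward, and gates decide what to keep or discard.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 100, 32, 64

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

tokens = torch.randint(0, vocab_size, (1, 10))  # a batch of 10 token IDs
outputs, (h_n, c_n) = lstm(embedding(tokens))

print(outputs.shape)  # (1, 10, 64): one hidden state per input position
print(h_n.shape)      # (1, 1, 64): final hidden state summarizing the sequence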

Further advancements led to the introduction of sequence-to-sequence models, often utilizing LSTMs as their underlying structure. These models revolutionized fields such as machine translation and text summarization by dramatically improving efficiency and effectiveness.