Become an expert in Generative AI through immersive, hands-on projects that leverage today’s most powerful models for Natural Language Processing (NLP) and computer vision. Generative AI with Python and PyTorch is your end-to-end guide to creating advanced AI applications, made easy by Raghav Bali, a seasoned data scientist with multiple patents in AI, and Joseph Babcock, a PhD and machine learning expert. Through business-tested approaches, this book simplifies complex GenAI concepts, making learning both accessible and immediately applicable.
From NLP to image generation, this second edition explores practical applications and the underlying theories that power these technologies. By integrating the latest advancements in LLMs, it prepares you to design and implement powerful AI systems that transform data into actionable intelligence.
You’ll build your versatile LLM toolkit by gaining expertise in GPT-4, LangChain, RLHF, LoRA, RAG, and more. You’ll also explore deep learning techniques for image generation and apply style transfer using GANs, before advancing to implement CLIP and diffusion models.
Whether you’re generating dynamic content or developing complex AI-driven solutions, this book equips you with everything you need to harness the full transformative power of Python and AI.
Generative AI with Python and PyTorch
Second Edition
Navigating the AI frontier with LLMs, Stable Diffusion, and next-gen AI applications
Joseph Babcock
Raghav Bali
Generative AI with Python and PyTorch
Second Edition
Copyright © 2025 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Portfolio Director: Gebin George
Relationship Lead: Vignesh Raju
Project Manager: Prajakta Naik
Content Engineer: Deepayan Bhattacharjee
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Indexer: Rekha Nair
Proofreader: Safis Editing
Production Designer: Ajay Patule
Growth Lead: Kunal Sawant
First published: April 2021
Second edition: March 2025
Production reference: 1240325
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-83588-444-7
www.packtpub.com
Joseph Babcock has spent over a decade working with big data and AI in the e-commerce, digital streaming, and quantitative finance domains. Throughout his career, he has worked on recommender systems, petabyte-scale cloud data pipelines, A/B testing, causal inference, and time-series analysis. He completed his PhD studies at Johns Hopkins University, applying machine learning to drug discovery and genomics.
Raghav Bali is a Principal Data Scientist at Delivery Hero. With more than 14 years of experience, he is involved in the research and development of data-driven, enterprise-level solutions based on machine learning, deep learning, and natural language processing. He has published multiple peer-reviewed papers at leading conferences and eight well-received books with major publishers, and is a co-inventor of more than 10 patents across various domains. His recent books include Generative AI with Python and TensorFlow 2 and Hands-On Transfer Learning with Python.
To my wife, parents, and teachers, without whom this would not have been possible. To all the researchers whose work continues to inspire me to learn. And to my co-author, Joseph, the reviewers, and the Packt team (especially Pradeep, Namrata, Bhavesh, Deepayan, Vignesh, and Prajakta) for their hard work in transforming our work into this amazing book.
Ajinkya Pahinka is an ML engineer with expertise in deep learning, computer vision, and NLP. He has worked on projects spanning the tire industry, agriculture, and satellite imaging. Ajinkya holds a master’s degree in data science from Indiana University Bloomington, where he conducted research in biomedical image segmentation and NLP. His work on tire defect detection using CNNs was published at an IEEE conference, and he has authored research on computer vision in internationally recognized journals. Ajinkya has contributed to machine learning initiatives for agricultural pest prediction and satellite image enhancement as part of an ISRO-funded project. He is currently a software developer at ServiceLink, a subsidiary of Fidelity National Financial, where he works on cutting-edge financial products in the mortgage industry.
Darshil Modi is an AI research engineer at DeGirum Corp, a semiconductor company that ships AI models on its hardware. He earned a master’s degree in computer science from Santa Clara University and has over five years of experience in NLP and AI. He has helped numerous Silicon Valley startups build LLM-based products and is the creator of the LLM framework AutoMeta RAG, published by LlamaIndex and Qdrant. A tech speaker, Darshil has been invited to various conferences and events to discuss tackling real-world challenges using AI and LLMs. He is also a technical reviewer for several publications and is co-authoring a book on RAG with Manning Publications. His expertise lies in bridging business problems with comprehensive, end-to-end AI solution architectures and executing them efficiently.
Have questions about the book or want to contribute to discussions on Generative AI and LLMs? Join our Discord server at https://packt.link/I1tSU and our Reddit channel at https://packt.link/rmYYs to connect, share, and collaborate with like-minded AI professionals.
Preface
Who this book is for
What this book covers
To get the most out of this book
Get in touch
Introduction to Generative AI: Drawing Data from Models
Discriminative versus generative models
Implementing generative models
The rules of probability
Discriminative and generative modeling, and Bayes’ theorem
Why generative models?
The promise of deep learning
Generating images
Data augmentation
Style transfer and image transformation
Fake news and chatbots
Unique challenges of generative models
Summary
References
Building Blocks of Deep Neural Networks
Perceptrons: A brain in a function
From tissues to TLUs
From TLUs to tuning perceptrons
Multilayer perceptrons and backpropagation
Backpropagation in practice
The shortfalls of backpropagation
Varieties of networks: convolution and recursive
Networks for seeing: convolutional architectures
Early CNNs
AlexNet and other CNN innovations
AlexNet architecture
Networks for sequential data
RNNs and LSTMs
Transformers
Building a better optimizer
Gradient descent to ADAM
Xavier initialization
Summary
References
The Rise of Methods for Text Generation
Text representation
Sparse representations (Bag of Words)
Dense representations
Word2vec
GloVe
FastText
Contextual representations
Text generation and the magic of LSTMs
Language models
Hands-on: Character-level language model
Decoding strategies
Greedy decoding
Beam search
Sampling
Hands-on: Decoding strategies
LSTM variants and convolutions for text
Bidirectional LSTMs
Convolutions and text
Summary
References
NLP 2.0: Using Transformers to Generate Text
Attention
Self-attention
Transformers
Overall architecture
Multi-head self-attention
Positional encodings
NLP tasks and transformer architectures
Encoder-only architectures
Decoder-only architectures
Encoder-decoder architectures
DistilBERT in action
Hands-on with DistilBERT
Text generation with GPT
Generative re-training: GPT
GPT-2
Hands-on with GPT-2
GPT-3
Summary
References
Join our communities on Discord and Reddit
LLM Foundations
Recap: Transformer architectures
Updated training setup
Instruction fine-tuning
Hands-on: Instruction tuning
Problem statement
Dataset preparation
Training setup
Analyze the results
Reinforcement Learning with Human Feedback (RLHF)
Hands-on: RLHF using PPO
Problem statement
Dataset preparation
PPO setup
Reward model
Training loop
Analyze training results
LLMs
Summary
Open-Source LLMs
The LLaMA models
Exploring LLaMA 8B in Hugging Face
Mixtral
Dolly
Falcon
Grok-1
Summary
References
Join our communities on Discord and Reddit
Prompt Engineering
Prompt engineering
Prompt design fundamentals
System instructions
Prompt template
Context preprocessing
LLM parameters
Prompting strategies
Be clear and specific
Use system instructions
Break down complex tasks
Provide examples
Add contextual information
Prompting techniques
Task-specific prompting techniques
Advanced prompting techniques
Chain of Thought
Tree of Thought
ReAct
Self-consistency
Cross-domain prompting
Adversarial prompting
Jailbreaks
Prompt injection and leakage
Defence mechanisms
Limitations of prompt engineering
Summary
References
LLM Toolbox
The LangChain ecosystem
Building a simple LLM application
Creating an LLM chain
Creating the LLM application
Logging LLM results to LangSmith
Creating complex applications with LangGraph
Adding a chat interface
Adding a vector store for RAG
Adding a memory thread
Adding a human interrupt
Adding a search function
Summary
References
Join our communities on Discord and Reddit
LLM Optimization Techniques
Why optimize?
Pre-training optimizations
Data efficiency
Architectural improvements
Quantization and mixed precision
Architectural efficiencies
Mixture of experts
Fine-tuning optimizations
Parameter efficient fine-tuning
Additive PEFT
Reparameterization PEFT
Inference time improvements
Emerging trends and research areas
Alternate architectures
Specialized hardware and frameworks
Small foundational models
Summary
References
Emerging Applications in Generative AI
Advances in model development
Improved text generation
Improved reinforcement learning
Model distillation
New usages for LLMs
Detecting hallucinations
Multi-modal models
AI agents
Summary
References
Neural Networks Using VAEs
Creating separable encodings of images
The variational objective
The reparameterization trick
Inverse autoregressive flow
Importing CIFAR
Creating the network in PyTorch
Creating a Bernoulli MLP layer
Creating a Gaussian MLP layer
Combining subnetworks in a VAE
Summary
References
Join our communities on Discord and Reddit
Image Generation with GANs
Generative adversarial networks
Discriminator model
Generator model
Training GANs
Non-saturating generator cost
Maximum likelihood game
Vanilla GAN
Improved GANs
Deep convolutional GANs
Conditional GANs
Progressive GANs
Overview
Progressive growth-smooth fade-in
Minibatch standard deviation
Equalized learning rate
Pixelwise normalization
PyTorch GAN zoo implementation
Challenges
Training instability
Mode collapse
Uninformative loss and evaluation metrics
Summary
References
Join our communities on Discord and Reddit
Style Transfer with GANs
Pix2Pix-GAN: paired style transfer
U-Net generator
PatchGAN discriminator
Loss
Training Pix2Pix
CycleGAN: unpaired style transfer
Overall setup for CycleGAN
Adversarial loss
Cycle loss
Identity loss
Overall loss
Hands-on
Generator setup
Discriminator setup
GAN setup
Training loop
Summary
References
Join our communities on Discord and Reddit
Deepfakes with GANs
Deepfakes overview
Modes of operation
Replacement
Re-enactment
Editing
Other key feature sets
The FACS
3DMM
Key feature set
Facial landmarks
Facial landmark detection using OpenCV
Facial landmark detection using Dlib
Facial landmark detection using MTCNN
High-level workflow
Re-enactment using Pix2Pix
Dataset preparation
Pix2Pix GAN setup and training
Results and limitations
Challenges
Ethical issues
Technical challenges
Generalization
Occlusions
Temporal issues
Off-the-shelf implementations
Summary
References
Join our communities on Discord and Reddit
Diffusion Models and AI Art
A walk through image generation: Why we need diffusion models
Pictures from noise: Using diffusion to model natural image variability
Using variational inference to generate high-quality diffusion models
Stable Diffusion: Generating images in latent space
Running Stable Diffusion in the cloud
Installing dependencies and running an example
Key parameters for Stable Diffusion text-to-image generation
Deep dive into the text-to-image pipeline
The tokenizer
Generating text embedding
Generating the latent image using the VAE decoder
The U-Net
Summary
References
Join our communities on Discord and Reddit
Other Books You May Enjoy
Index
Once you’ve read Generative AI with Python and PyTorch, Second Edition, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry: now, with every Packt book, you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below: https://packt.link/free-ebook/9781835884447
Submit your proof of purchase. That’s it! We’ll send your free PDF and other benefits to your email directly.