Master the innovative world of deepfakes and generative AI for face replacement with this full-color guide
Purchase of the print or Kindle book includes a free PDF eBook
Book Description
Applying deepfakes will allow you to tackle a wide range of scenarios creatively.
Learning from experienced authors will help you to intuitively understand what is going on inside the model. You'll learn what deepfakes are and what makes them different from other machine learning techniques, and understand the entire process from beginning to end, from finding faces to preparing them, training the model, and performing the final swap.
We'll discuss various uses for face replacement before we begin building our own pipeline. Spending some extra time thinking about how you collect your input data can make a huge difference to the quality of the final video. We look at the importance of this data and guide you with simple concepts to understand what your data needs to really be successful.
No discussion of deepfakes can avoid discussing the controversial, unethical uses for which the technology initially became known. We'll go over some potential issues, and talk about the value that deepfakes can bring to a variety of educational and artistic use cases, from video game avatars to filmmaking.
By the end of the book, you'll understand what deepfakes are, how they work at a fundamental level, and how to apply those techniques to your own needs.
Who this book is for
This book is for AI developers, data scientists, and anyone looking to learn more about deepfakes or to use deepfake techniques to generate new image data. Working knowledge of the Python programming language and basic familiarity with OpenCV, Pillow, PyTorch, or TensorFlow is recommended to get the most out of the book.
The e-book can be read in Legimi apps or in any app that supports the following format:
Page count: 280
Publication year: 2023
Deploy powerful AI techniques for face replacement and more with this comprehensive guide
Bryan Lyon
Matt Tora
BIRMINGHAM—MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Ali Abidi
Publishing Product Manager: Gebin George, Sunith Shetty
Senior Editor: David Sugarman
Technical Editor: Kavyashree K. S.
Copy Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Sejal Dsilva
Production Designer: Joshua Misquitta
Marketing Coordinator: Shifa Ansari
First published: February 2023
Production reference: 1280223
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80181-069-2
www.packtpub.com
Bryan Lyon is a seasoned AI expert with over a decade of experience in and around the field. His background is in computational linguistics and he has worked with the cutting-edge open source deepfake software Faceswap since 2018. Currently, Bryan serves as the chief technology officer for an AI company based in California.
Matt Tora is a seasoned software developer with over 15 years of experience in the field. He specializes in machine learning, computer vision, and streamlining workflows. He leads the open source deepfake project Faceswap and consults with VFX studios and tech start-ups on integrating machine learning into their pipelines.
Saisha Chhabria is a computer engineer with diverse experience ranging from software engineering to deep learning. She is currently based in Singapore and is pursuing her master’s in computing, specializing in artificial intelligence from the National University of Singapore. She strives for challenging opportunities to build upon her repertoire of computation, development, and collaboration skills, and aspires to combat problems that impact the community.
The media attention around deepfakes often focuses on the ills and dangers of the technology. Even many of the articles about what deepfakes can do fail to account for why you might want to do it. The truth is that deepfakes bring new techniques and abilities to neural networks for those who know how to use them.
Beyond replacing faces, deepfakes provide insights into all areas of generative AI, especially when traditional methods fall short. Join us as we explore what deepfakes are, what they can be used for, and how they may change in the future.
There are a lot of concerns when it comes to the use of deepfakes. To that end, we have to establish some common ground so that we can communicate and discuss deepfake technology:
- Deepfakes are not for creating inappropriate content
- Deepfakes are not for changing faces without consent or with the intent of hiding their use
- Deepfakes are not to be utilized for any illicit, unethical, or questionable purposes
- Deepfakes exist to experiment with and discover AI techniques, for social or political commentary, movies, and any number of other ethical and reasonable uses

We are very troubled by the fact that Faceswap has been used in unethical and disreputable ways. However, we support the development of tools and techniques that can be used ethically, as well as providing education and experience in AI for anyone who wants to learn it hands-on. We take a zero-tolerance approach to anyone using Faceswap for any unethical purposes and actively discourage any such uses.
This book is for anyone interested in learning about deepfakes. From academics to researchers to content creators to developers, we’ve written this book so it has something for everybody.
The early chapters will cover the essential background of deepfakes, how they work, their ethics, and how to make one yourself using free software that you can download and use without any technical knowledge.
The middle chapters will go in depth into the exact methodology that deepfakes use to work, including working code that you can run and follow step-by-step as we get hands-on with the major processes of deepfakes: extraction, training, and conversion.
The final chapters will look at where you can go from there. They cover how to use deepfake techniques in your own tasks, and where the technology might go in the future.
Chapter 1, Surveying Deepfakes, provides a look into the past and present of deepfakes with a description of how they work and are used.
Chapter 2, Examining Deepfake Ethics and Dangers, provides a look at the sordid history of deepfakes and guidelines on creating ethical deepfakes.
Chapter 3, Mastering Data, teaches you how to get the most from your data, whether you make it yourself or have to find it.
Chapter 4, The Deepfake Workflow, provides a step-by-step walk-through of using Faceswap from the installation to the final output.
Chapter 5, Extracting Faces, is where we begin our hands-on dive into the code of a deepfake by learning how we detect, align, and mask faces for deepfakes.
Chapter 6, Training a Deepfake Model, is where we continue exploring the code as we train a model from scratch, including defining the layers of the model, feeding it images, and updating the model weights.
Chapter 7, Swapping the Face Back into the Video, is where we complete the code analysis with conversion, the process that puts the swapped face back into the original video.
Chapter 8, Applying the Lessons of Deepfakes, teaches you the process of solving hypothetical problems using deepfake techniques.
Chapter 9, The Future of Generative AI, examines where generative AI will move in the future and what limitations it needs to overcome.
This book is designed to build knowledge as you read through the chapters. If you’re starting with no background knowledge of deepfakes, then we suggest you start at the beginning. If you want to skip straight to the code, then you’ll want to look at Part 2 (though we hope you’ll give Part 1 a peruse once you’re ready). If you only care about what you can do with the techniques moving forward, then check out Part 3 (but we promise that the earlier parts have some juicy nuggets of information).
We use Python for all code examples in this book. If you know Python, you should be able to understand all the code samples with the help of the text. If you don’t know Python, then don’t worry! There is a lot of non-code explanation, and even the code includes hands-on explanations of what is going on in it.
All the libraries used in this book are explained when they’re used, but this book should not be considered a guide or in-depth explanation of any of the libraries. Many of these libraries have books of their own dedicated to them, and their use in this book is solely functional.
Software covered in the book
Operating system requirements
Python
Faceswap
Windows, macOS, or Linux
PyTorch
OpenCV
Pillow (PIL Fork)
We use Anaconda (https://www.anaconda.com/) for package management and sandboxing throughout this book. If you want to follow along, we highly recommend you install it from the site listed here. If you would rather use Python virtual environments, you may, but if you do, the instructions in this book will not always work without modification, especially installing the necessary packages. If you choose to use that route, you will have to find the correct version of libraries to install yourself.
If you are using the digital version of this book, we advise you to access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Included in each hands-on chapter is a list of exercises. Please don’t take these as directions on what you must do, but consider them as helpers to more completely understand what it is that the code is doing and how you can use the techniques for yourself. They do not have “answers” as they are not really questions; they’re just prompts for you to find new and exciting ways to apply your knowledge.
If you do complete any of the exercises (or come up with something impressive of your own), we’d appreciate it if you would “fork” the book’s repo into your own GitHub account and show the world your accomplishment! We’d love to see what you can do with deepfakes.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Exploring-Deepfakes. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.”
A block of code is set as follows:
html, body, #map { height: 100%; margin: 0; padding: 0 }

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Select System info from the Administration panel.”
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you’ve read Exploring Deepfakes, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there. You can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
1. Scan the QR code or visit the link below:

https://packt.link/free-ebook/9781801810692

2. Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits to your email directly.

Deepfakes are a new (and controversial) technique using generative AI. But despite the basic idea that they swap one face with another, how much do you really know about deepfakes?
It’s normal to have questions about something such as deepfakes, and this section will address those questions. We’ll start with the basics of how they work and the machine learning principles that deepfakes are built on, and then take a look at what software is available to make deepfakes. After that, we’ll examine the ethics of deepfakes, including the unsavory beginnings, and build a framework of sorts to evaluate how to make ethical deepfakes. Then, we’ll look at the most important part of creating a deepfake: data, including explanations of what makes good data and how to get the most from your (not-so-great) data. Finally, we’ll walk you through a complete deepfake using Faceswap, an open source deepfake program.
By the end of this part, you’ll have a good understanding of what makes a deepfake, how they work, and even how to make one for yourself.
This part comprises the following chapters:
Chapter 1, Surveying Deepfakes
Chapter 2, Examining Deepfake Ethics and Dangers
Chapter 3, Acquiring and Processing Data
Chapter 4, The Deepfake Workflow

Understanding deepfakes begins with understanding where they came from and what they can do. In this chapter, we’ll begin to explore deepfakes and their operation. We will go through the basics of what makes a deepfake work, talking about the differences between a generative auto-encoder and a generative adversarial network (GAN). We will examine their uses in media, education, and advertising. We’ll investigate their limitations and consider how to plan and design your deepfakes to avoid the common pitfalls. Finally, we’ll examine existing deepfake software and discuss what each kind can do.
We’ll cover this in the following sections:
Introducing deepfakes
Exploring the uses of deepfakes
Discovering how deepfakes work
Assessing the limitations of generative AI
Looking at existing deepfake software

The name deepfake comes from a portmanteau of “deep,” referring to deep learning, and “fake,” referring to the fact that the images generated are not genuine. The term first came into use on the popular website Reddit, where the original author released several deepfakes of adult actresses with other women’s faces artificially applied to them.
Note
The ethics of deepfakes are controversial, and we will cover this in more depth in Chapter 2, Examining Deepfake Ethics and Dangers.
This unethical beginning is still what the technology is most known for, but it’s not all that it can be used for. Since that time, deepfakes have moved into movies, memes, and more. Tom Cruise signed up for Instagram only after “Deep Tom Cruise” beat him to it, Steve Buscemi remarked to Stephen Colbert that he “never looked better” when his face was placed on top of Jennifer Lawrence’s, and a younger version of Bill Nighy was deepfaked onto his own older self for a news clip from the “past” in the movie Detective Pikachu.
In this book, we will be taking a fairly narrow view of what deepfaking is, so let’s define it now. A deepfake is the use of a neural network trained on two faces to replace one face with another. There are other technologies that swap faces which aren’t deepfakes, and there are generative AIs that do other things besides swapping faces, but including all of those in the term just muddies the water and confuses the issue.
The original use of deepfakes might be the one that required the least amount of imagination. Putting one person’s face onto another person has many different uses in various fields. Please don’t consider the ideas here to be the full extent of the capabilities of deepfakes – someone is bound to imagine something new!
Entertainment is the first area that comes to mind for most people when they consider the usage of deepfakes. There are two main areas of entertainment in which we see deepfakes playing a significant role: narrative and parody.
The utility of deepfakes in movies is obvious. Imagine an actor’s face being superimposed onto their stunt double or an actor who becomes unavailable being replaced by another performer without any changes to the faces in the final movie.
While deepfakes may not yet seem good enough for professional use, they are already being used in Hollywood and other media today – from Detective Pikachu, which used deepfakes to de-age Bill Nighy, to For All Mankind, which used them to put actors face to face with Ronald Reagan. Agencies and VFX shops are all examining how to use deepfakes in their work.
These techniques are not unique to deepfakes. CGI (in this book, referring to 3D graphics) face replacements have been used in many movies. However, CGI face replacement is expensive and complicated, requiring filming to be done in particular ways with lots of extra data captured for the artists to get the CGI face to look good in the final scene. This is an art more than a science and requires extensive skills and knowledge to accomplish. Deepfakes solve many of these problems, making new forms of face replacement possible.
Making a deepfake requires no special filming techniques (although some awareness will make the process smoother). Deepfakes also require very little attention or skill compared to CGI face replacements. This makes them ideal for lower-cost face replacements, but they can also be higher quality, since the AI accounts for details that even the most dedicated artist can’t recreate.
Parody is an extremely popular form of social criticism and forms the basis for entire movies, TV shows, and other forms of media. Parody is normally done by professional impersonators. In some cases, those impersonators look (or can be made to look) similar to the person they’re impersonating. Other times, there is a reliance on their performance to make the impersonation clear.
Deepfakes provide an opportunity to change the art of parody wherein the impersonator can be made to look like the individual being parodied via a deepfake instead of by chance of birth. By removing the attention from basic appearance, deepfakes allow the focus to be placed directly on the performance itself.
Deepfakes also enable a whole new form of parody in which normal situations can become parodic simply due to the changed face. This particular form becomes humorous due to the distinct oddity of very different faces, instead of an expected swap.
Figure 1.1 – Steve Buscemi as Jennifer Lawrence by birbfakes
Note
This image is included with the kind permission of its original creator, birbfakes. You can view the original video here: https://youtu.be/r1jng79a5xc.
Video games present an interesting opportunity when it comes to deepfakes. The idea here is that a computer-generated character could be deepfaked into a photorealistic avatar. This could be done for any character in the game, even the player’s character. For example, it would be possible to make a game in which, when the player’s character looked into a mirror, they would see their own face looking back at them. Another possibility would be to replace a non-player character with a deepfake of the original actor, allowing for a far more realistic appearance without making a complete 3D clone of the actor.
Education could also benefit from deepfakes. Imagine if your history class had a video of Abraham Lincoln himself reading the Gettysburg address. Or a corporate training video in which the entire video is hosted by the public mascot (who may not even be a real person) without having to resort to costumes or CGI. It could even be used to allow multiple videos or scenes filmed at significantly different times to appear to be more cohesive by appearing to show the actor at the same time.
Many people are very visual learners and seeing a person “come alive” can really bring the experience home. Bringing the pre-video past to life using deepfakes enables a whole new learning experience. One example of this is the Dalí Museum, which created a series of videos of Salvador Dalí talking to guests. This was done by training a deepfake model on an actor to put Dalí’s face on the videos. Once the model was trained and set up, they were able to convert many videos, saving a lot of time and effort compared to a CGI solution.
Advertising agencies are always looking for the newest way to grab attention and deepfakes could be a whole new way to catch viewers’ attention. Imagine if you walked past a clothing store, you stopped to look at an item of clothing in the window, and suddenly the screen beside the item showed a video of an actor wearing the item but with your face, allowing you to see how the item would look on you. Alternatively, a mascot figure could be brought to life in a commercial. Deepfakes offer a whole new tool for creative use, which can grab attention and provide whole new experiences in advertising.
Now that we’ve got some idea of a few potential uses for deepfakes, let’s take a quick look under the hood and see how they work.
Deepfakes are a unique variation of a generative auto-encoder being used to generate the face swap. This requires a special structure, which we will explain in this section.
The particular type of neural network that regular deepfakes use is called a generative auto-encoder. Unlike a Generative Adversarial Network (GAN), an auto-encoder does not use a discriminator or any “adversarial” techniques.
All auto-encoders work by training a collection of neural network models to solve a problem. In the case of generative auto-encoders, the AI is used to generate a new image with new details that weren’t in the original image. However, with a normal auto-encoder, the problem is usually something such as classification (deciding what an image is), object identification (finding something inside an image), or segmentation (identifying different parts of an image). To do this, there are two types of models used in the autoencoder – the encoder and decoder. Let’s see how this works.
The training cycle is a cyclical process in which the model is continuously trained on images until stopped. The process can be broken down into four steps:
1. Encode faces into smaller intermediate representations.
2. Decode the intermediate representations back into faces.
3. Calculate the loss of (meaning, the difference between) the original face and the output of the model.
4. Modify (backpropagate) the models toward the correct answer.

Figure 1.2 – Diagram of the training cycle
In more detail, the process unfolds as follows:
The encoder’s job is to encode two different faces into an array, which we call the intermediate representation. The intermediate representation is much smaller than the original image size, with enough space to describe the lighting, pose, and expression of the faces. This process is similar to compression, where unnecessary data is thrown out to fit the data into a smaller space.

The decoder is actually a matched pair of models, which turn the intermediate representation back into faces. There is one decoder for each of the input faces, which is trained only on images of that one person’s face. This process tries to create a new face that matches the original face that was given to the encoder and encoded into the intermediate representation.

Figure 1.3 – Encoder and decoder
Loss is a score given to the auto-encoder based on how well it recreates the original faces. It is calculated by comparing the original image to the output of the encoder-decoder process. This comparison can be done in many ways, from a strict difference between the two images to something significantly more complicated that includes human perception as part of the calculation. No matter how it’s done, the result is the same: a number from 0 to 1, with 0 being the score for returning the exact same image and 1 the score for its exact opposite. In practice, every score falls somewhere in between, as a perfect reconstruction (or its exact opposite) is impossible.

Note
The loss is where an auto-encoder differs from a GAN. In a GAN, the comparison loss is either replaced or supplemented with an additional network (usually an auto-encoder itself), which then produces a loss score of its own. The theory behind this structure is that the loss model (called a discriminator) can learn to get better at detecting the output of the generating model (called a generator) while the generator can learn to get better at fooling the discriminator.
Finally, there is backpropagation, a process in which the models are adjusted by following the path back through both the decoder and the encoder that generated the face and nudging those paths toward the correct answer.

Figure 1.4 – Loss and backpropagation
Once complete, the whole process starts over at the encoder. This repeats until the neural network has finished training. The decision of when to end training can be made in several ways: after a certain number of repetitions have occurred (called iterations), after all the data has been gone through (called an epoch), or when the results meet a certain loss score.
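The training cycle above can be sketched with a toy linear auto-encoder in plain NumPy. This is an illustration only, not Faceswap’s actual code: the random arrays stand in for aligned face crops, and the array sizes, learning rate, and iteration count are all arbitrary choices. Note the key structural point – one shared encoder, one decoder per person:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for aligned, flattened 8x8 face crops of two people.
faces_a = rng.random((32, 64))
faces_b = rng.random((32, 64))

latent = 16                                  # size of the intermediate representation
enc = rng.normal(0.0, 0.1, (64, latent))     # ONE encoder, shared by both faces
dec_a = rng.normal(0.0, 0.1, (latent, 64))   # decoder trained only on person A
dec_b = rng.normal(0.0, 0.1, (latent, 64))   # decoder trained only on person B
lr = 0.05

def train_step(x, dec):
    z = x @ enc                        # 1. encode into the intermediate representation
    out = z @ dec                      # 2. decode back into a face
    err = out - x
    loss = float(np.mean(err ** 2))    # 3. loss: how far the output is from the input
    grad_dec = (z.T @ err) / len(x)    # 4. backpropagate: nudge the decoder...
    grad_enc = (x.T @ (err @ dec.T)) / len(x)  # ...and the shared encoder
    dec -= lr * grad_dec               # (in-place updates on the weight arrays)
    enc[...] -= lr * grad_enc
    return loss

losses = [train_step(faces_a, dec_a) + train_step(faces_b, dec_b)
          for _ in range(300)]
print(f"loss fell from {losses[0]:.3f} to {losses[-1]:.3f}")

# The swap itself: encode a face of person A, then decode it with B's decoder.
swapped = (faces_a @ enc) @ dec_b
```

The last line is the whole trick of a deepfake: because the encoder is shared, B’s decoder can reconstruct A’s pose, lighting, and expression as B’s face.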
GANs are one of the current darlings of generative networks. They are extremely popular and widely used, particularly for super-resolution (intelligent upscaling), music generation, and sometimes even deepfakes. However, there are some reasons why they’re not used in all deepfake solutions.
GANs are popular due to their “imaginative” nature. They learn through the interaction of their generator and discriminator to fill in gaps in the data. Because they can fill in missing pieces, they are great at reconstruction tasks or at tasks where new data is required.
The ability of a GAN to create new data where it is missing is great for numerous tasks, but it has a critical flaw when used for deepfakes. In deepfakes, the goal is to replace one face with another face. An imaginative GAN would likely learn to fill the gaps in the data from one face with the data from the other. This leads to a problem that we call “identity bleed” where the two faces aren’t swapped properly; instead, they’re blended into a face that doesn’t look like either person, but a mix of the two.
This flaw in a GAN-created deepfake can be corrected or prevented but requires much more careful data collection and processing. In general, it’s easier to get a full swap instead of a blending by using a generative auto-encoder instead of a GAN.
Another name for an auto-encoder is an “hourglass” model. The reason for this is that each layer of an encoder is smaller than the layer before it while each layer of a decoder is larger than the one before. Because of this, the auto-encoder figure starts out large at the beginning, shrinks toward the middle, and then widens back out again as it reaches the end:
Figure 1.5 – Hourglass structure of an autoencoder
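The hourglass shape can be made concrete with a set of layer widths (these particular numbers are illustrative, not taken from any specific model):

```python
# Illustrative layer widths for an hourglass-shaped auto-encoder:
# a flattened 64x64 input narrows to the intermediate representation,
# then the decoder widens back out to the full image size.
encoder_widths = [4096, 1024, 256, 64]   # each encoder layer smaller than the last
decoder_widths = [64, 256, 1024, 4096]   # each decoder layer larger than the last

# The hourglass property: strictly narrowing in, strictly widening out.
assert all(a > b for a, b in zip(encoder_widths, encoder_widths[1:]))
assert all(a < b for a, b in zip(decoder_widths, decoder_widths[1:]))
```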
While these methods are flexible and have many potential uses, there are limitations. Let’s examine those limitations now.
Generative AIs like those used in deepfakes are not a panacea and actually have some significant limitations. However, by knowing about these limitations, they can generally be worked around or sidestepped with careful design.
Deepfakes are limited in the resolution at which they can swap. This is a hardware and time limitation: more powerful hardware and more time can provide higher-resolution swaps. However, this is not 1:1 linear growth. Doubling the resolution (from, say, 64x64 to 128x128) actually quadruples the amount of required VRAM – that is, the memory that a GPU has direct access to – and expands the necessary training time by a roughly equivalent amount. Because of this, resolution is often a balancing act, where you’ll want to make the deepfake the lowest resolution you can without sacrificing the results.
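The quadratic scaling is easy to check with a little arithmetic. This sketch assumes cost grows with pixel count, which is the rough rule of thumb above; real memory use also depends on the model architecture and framework:

```python
def relative_cost(resolution, base=64):
    """Cost grows with the pixel count, which is the resolution squared."""
    return (resolution / base) ** 2

for res in (64, 128, 256, 512):
    # Each doubling of resolution roughly quadruples VRAM and training time.
    print(f"{res}x{res}: ~{relative_cost(res):.0f}x the cost of 64x64")
```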
To provide the best results, traditional deepfakes require that you train on every face pair that you wish to swap. This means that if you wanted to swap your own face with two of your friends, you’d have to train two separate models. This is because each model has one encoder and two decoders, which are trained only to swap the faces they were given.
There is a workaround for some multi-face swaps: you could write your own version with more than two decoders, allowing a single model to swap additional faces. This is an imperfect solution, however, as each decoder takes up a significant amount of VRAM, requiring you to balance the number of faces carefully.
It may be better to simply train multiple pairs. By splitting the task up on multiple computers, you could train multiple models simultaneously, allowing you to create many face pairs at once.
Another option is to use a different type of AI face replacement. First Order Model (which is covered in the Looking at existing deepfake software section of this chapter) uses a different technique: instead of a paired approach, it uses AI to animate an image to match the actions of a replacement. This solution removes the need to retrain on each face pair, but comes at the cost of greatly reduced quality of the swap.
Generative AIs require a significant amount of training data to accomplish their tasks. Sometimes, finding sufficient data, or data of a high enough quality, is not possible. For example, how would someone create a deepfake of William Shakespeare when there are no videos or photographs of him? This is a tricky problem, but it can be worked around in several ways. While it is unfortunately impossible to create a proper deepfake of England’s greatest playwright, it would be possible to use an actor who looks like his portraits and then deepfake that actor as Shakespeare.
Tip
We will cover more on how to deal with poor or insufficient data in Chapter 3, Mastering Data.
