
Master the innovative world of deepfakes and generative AI for face replacement with this full-color guide



Purchase of the print or Kindle book includes a free PDF eBook

Key Features



  • Understand what deepfakes are, their history, and how to use the technology ethically
  • Get well-versed with the workflow and processes involved to create your own deepfakes
  • Learn how to apply the lessons and techniques of deepfakes to your own problems

Book Description



Applying deepfakes will allow you to tackle a wide range of scenarios creatively.



Learning from experienced authors will help you to intuitively understand what is going on inside the model. You'll learn what deepfakes are and what makes them different from other machine learning techniques, and understand the entire process from beginning to end, from finding faces to preparing them, training the model, and performing the final swap.



We'll discuss various uses for face replacement before we begin building our own pipeline. Spending some extra time thinking about how you collect your input data can make a huge difference to the quality of the final video. We look at the importance of this data and guide you through simple concepts that help you understand what your data needs in order to be truly successful.



No discussion of deepfakes can avoid discussing the controversial, unethical uses for which the technology initially became known. We'll go over some potential issues, and talk about the value that deepfakes can bring to a variety of educational and artistic use cases, from video game avatars to filmmaking.



By the end of the book, you'll understand what deepfakes are, how they work at a fundamental level, and how to apply those techniques to your own needs.

What you will learn



  • Gain a clear understanding of deepfakes and their creation
  • Understand the risks of deepfakes and how to mitigate them
  • Collect efficient data to create successful deepfakes
  • Get familiar with the deepfakes workflow and its steps
  • Explore the application of deepfakes methods to your own generative needs
  • Improve results by augmenting data and avoiding overtraining
  • Examine the future of deepfakes and other generative AIs
  • Use generative AIs to increase video content resolution

Who this book is for



This book is for AI developers, data scientists, and anyone looking to learn more about deepfakes or to apply deepfake techniques and technologies to generate new image data. A working knowledge of the Python programming language and basic familiarity with OpenCV, Pillow, PyTorch, or TensorFlow are recommended to get the most out of the book.




Exploring Deepfakes

Deploy powerful AI techniques for face replacement and more with this comprehensive guide

Bryan Lyon

Matt Tora

BIRMINGHAM—MUMBAI

Exploring Deepfakes

Copyright © 2023 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Ali Abidi

Publishing Product Manager: Gebin George, Sunith Shetty

Senior Editor: David Sugarman

Technical Editor: Kavyashree K. S.

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Sejal Dsilva

Production Designer: Joshua Misquitta

Marketing Coordinator: Shifa Ansari

First published: February 2023

Production reference: 1280223

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80181-069-2

www.packtpub.com

Contributors

About the authors

Bryan Lyon is a seasoned AI expert with over a decade of experience in and around the field. His background is in computational linguistics and he has worked with the cutting-edge open source deepfake software Faceswap since 2018. Currently, Bryan serves as the chief technology officer for an AI company based in California.

Matt Tora is a seasoned software developer with over 15 years of experience in the field. He specializes in machine learning, computer vision, and streamlining workflows. He leads the open source deepfake project Faceswap and consults with VFX studios and tech start-ups on integrating machine learning into their pipelines.

About the reviewer

Saisha Chhabria is a computer engineer with diverse experience ranging from software engineering to deep learning. She is currently based in Singapore and is pursuing her master’s in computing, specializing in artificial intelligence from the National University of Singapore. She strives for challenging opportunities to build upon her repertoire of computation, development, and collaboration skills, and aspires to combat problems that impact the community.

Table of Contents

Preface

Part 1: Understanding Deepfakes

1

Surveying Deepfakes

Introducing deepfakes

Exploring the uses of deepfakes

Entertainment

Parody

Education

Advertisements

Discovering how deepfakes work

Generative auto-encoders

Assessing the limitations of generative AI

Resolution

Training required for each face pair

Training data

Looking at existing deepfake software

Faceswap

DeepFaceLab

First Order Model

Reface

Summary

2

Examining Deepfake Ethics and Dangers

The unethical origin of deepfakes

Being an ethical deepfaker

Consent

Respect

Deception

Putting it into practice

The dangers of deepfakes

Reputation

Politics

Avoiding consequences by claiming manipulation

Preventing damage from deepfakes

Starving the model of data

Authenticating any genuine media

Deepfake detection

Public relations

Public awareness

Summary

3

Acquiring and Processing Data


Why data is important

Understanding the value of variety

Pose

Expression

Lighting

Bringing this variety together

Sourcing data

Filming your own data

Getting data from historical sources

Improving your data

Linear color

Data matching

Upscaling

Summary

4

The Deepfake Workflow

Technical requirements

Identifying suitable candidates for a swap

Preparing the training images

Extracting faces from your source data

Curating training images

Training a model

Setting up

Launching and monitoring training

Manual intervention

Applying a trained model to perform a swap

The alignments file

Cleaning the alignments file

Fixing the alignments file

Using the Preview tool

Generating the swap

Summary

Part 2: Getting Hands-On with the Deepfake Process

5

Extracting Faces

Technical requirements

Getting image files from a video

Running extract on frame images

face_alignments.json

face_bbox_{filename}_{face number}.png

face_aligned_{filename}_{face number}.png

face_mask_{filename}_{face number}.png

Getting hands-on with the code

Initialization

Image preparation

Face detection

Face landmarking/aligning

Summary

Exercises

6

Training a Deepfake Model

Technical requirements

Understanding convolutional layers

Getting hands-on with AI

Defining our upscaler

Creating the encoder

Building the decoders

Exploring the training code

Creating our models

Looping over the training

Teaching the network

Saving results

Summary

Exercises

7

Swapping the Face Back into the Video

Technical requirements

Preparing to convert video

Getting hands-on with the convert code

Initialization

Loading the AI

Preparing data

The conversion loop

Creating the video from images

Summary

Exercises

Part 3: Where to Now?

8

Applying the Lessons of Deepfakes

Technical requirements

Aligning other types of images

Finding an aligner

Using the library

Using the landmarks to align

The power of masking images

Types of masking

Finding a usable mask for your object

Examining an example

Getting data under control

Defining your rules

Evolving your rules

Dealing with errors

Summary

9

The Future of Generative AI

Generating text

Recent developments

Building sentences

The future of text generation

Improving image quality

Various tactics

The future of image quality upgrading

Text-guided image generation

CLIP

Image generation with CLIP

The future of image generation

Generating sound

Voice swapping

Text-guided music generation

The future of sound generation

Deepfakes

Sound generation

Text-guided image generation

Improving image quality

Text generation

The future of deepfakes

The future of AI ethics

Summary

Index

Other Books You May Enjoy

Preface

The media attention around deepfakes often focuses on the ills and dangers of the technology. Even many of the articles about what deepfakes can do fail to account for why you might want to do it. The truth is that deepfakes bring new techniques and abilities to neural networks for those who know how to use them.

Beyond replacing faces, deepfakes provide insights into all areas of generative AI, especially when traditional methods fall short. Join us as we explore what deepfakes are, what they can be used for, and how they may change in the future.

A manifesto for the ethical use of deepfakes

There are a lot of concerns when it comes to the use of deepfakes. To that end, we have to establish some common ground so that we can communicate and discuss deepfake technology:

  • Deepfakes are not for creating inappropriate content
  • Deepfakes are not for changing faces without consent or with the intent of hiding their use
  • Deepfakes are not to be utilized for any illicit, unethical, or questionable purposes
  • Deepfakes exist to experiment with and discover AI techniques, for social or political commentary, movies, and any number of other ethical and reasonable uses

We are very troubled by the fact that Faceswap has been used in unethical and disreputable ways. However, we support the development of tools and techniques that can be used ethically, as well as providing education and experience in AI for anyone who wants to learn it hands-on. We take a zero-tolerance approach to anyone using Faceswap for any unethical purposes and actively discourage any such uses.

Who this book is for

This book is for anyone interested in learning about deepfakes. From academics to researchers to content creators to developers, we’ve written this book so it has something for everybody.

The early chapters will cover the essential background of deepfakes, how they work, their ethics, and how to make one yourself using free software that you can download and use without any technical knowledge.

The middle chapters will go in depth into the exact methodology that deepfakes use to work, including working code that you can run and follow step-by-step as we get hands-on with the major processes of deepfakes: extraction, training, and conversion.

The final chapters will look at where you can go from there. They cover how to use deepfake techniques in your own tasks, and where the technology might go in the future.

What this book covers

Chapter 1, Surveying Deepfakes, provides a look into the past and present of deepfakes with a description of how they work and are used.

Chapter 2, Examining Deepfake Ethics and Dangers, provides a look at the sordid history of deepfakes and guidelines on creating ethical deepfakes.

Chapter 3, Acquiring and Processing Data, teaches you how to get the most from your data, whether you make it yourself or have to find it.

Chapter 4, The Deepfake Workflow, provides a step-by-step walk-through of using Faceswap from the installation to the final output.

Chapter 5, Extracting Faces, is where we begin our hands-on dive into the code of a deepfake by learning how we detect, align, and mask faces for deepfakes.

Chapter 6, Training a Deepfake Model, is where we continue exploring the code as we train a model from scratch, including defining the layers of the model, feeding it images, and updating the model weights.

Chapter 7, Swapping the Face Back into the Video, is where we complete the code analysis with conversion, the process that puts the swapped face back into the original video.

Chapter 8, Applying the Lessons of Deepfakes, teaches you the process of solving hypothetical problems using deepfake techniques.

Chapter 9, The Future of Generative AI, examines where generative AI will move in the future and what limitations it needs to overcome.

To get the most out of this book

This book is designed to build knowledge as you read through the chapters. If you’re starting with no background knowledge of deepfakes, then we suggest you start at the beginning. If you want to skip straight to the code, then you’ll want to look at Part 2 (though we hope you’ll give Part 1 a peruse once you’re ready). If you only care about what you can do with the techniques moving forward, then check out Part 3 (but I promise that the earlier parts have some juicy nuggets of information).

We use Python for all code examples in this book. If you know Python, you should be able to understand all the code samples with the help of the text. If you don’t know Python, then don’t worry! There is a lot of non-code explanation, and even the code includes hands-on explanations of what is going on in it.

All the libraries used in this book are explained when they’re used, but this book should not be considered a guide or in-depth explanation of any of the libraries. Many of these libraries have books of their own dedicated to them, and their use in this book is solely functional.

Software covered in the book: Python, Faceswap, PyTorch, OpenCV, Pillow (PIL Fork)

Operating system requirements: Windows, macOS, or Linux

We use Anaconda (https://www.anaconda.com/) for package management and sandboxing throughout this book. If you want to follow along, we highly recommend you install it from the site listed here. If you would rather use Python virtual environments, you may, but if you do, the instructions in this book will not always work without modification, especially when installing the necessary packages. If you choose that route, you will have to find the correct versions of the libraries to install yourself.
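For example, the commands below show one way to set up an isolated Anaconda environment. This is an illustrative sketch only – the environment name and Python version are placeholders, and the exact packages you need are covered in the relevant chapters:

$ conda create -n exploring-deepfakes python=3.10
$ conda activate exploring-deepfakes
$ conda install pytorch torchvision -c pytorch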

If you are using the digital version of this book, we advise you to access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Included in each hands-on chapter is a list of exercises. Please don’t take these as directions on what you must do, but consider them as helpers to more completely understand what it is that the code is doing and how you can use the techniques for yourself. They do not have “answers” as they are not really questions; they’re just prompts for you to find new and exciting ways to apply your knowledge.

If you do complete any of the exercises (or come up with something impressive of your own), we’d appreciate it if you would “fork” the book’s repo into your own GitHub account and show the world your accomplishment! We’d love to see what you can do with deepfakes.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Exploring-Deepfakes. If there’s an update to the code, it will be updated in the GitHub repository.
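For instance, if you have Git installed, one way to get a local copy of the examples is to clone the repository:

$ git clone https://github.com/PacktPublishing/Exploring-Deepfakes.git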

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.”

A block of code is set as follows:

html, body, #map {
  height: 100%;
  margin: 0;
  padding: 0
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Select System info from the Administration panel.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share your thoughts

Once you’ve read Exploring Deepfakes, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there; you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781801810692

Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits to your email directly

Part 1: Understanding Deepfakes

Deepfakes are a new (and controversial) technique using generative AI. But despite the basic idea that they swap one face with another, how much do you really know about deepfakes?

It’s normal to have questions about something such as deepfakes, and this section will address those questions. We’ll start with the basics of how they work and the machine learning principles that deepfakes are built on, and then take a look at what software is available to make deepfakes. After that, we’ll examine the ethics of deepfakes, including the unsavory beginnings, and build a framework of sorts to evaluate how to make ethical deepfakes. Then, we’ll look at the most important part of creating a deepfake: data, including explanations of what makes good data and how to get the most from your (not-so-great) data. Finally, we’ll walk you through a complete deepfake using Faceswap, an open source deepfake program.

By the end of this part, you’ll have a good understanding of what makes a deepfake, how they work, and even how to make one for yourself.

This part comprises the following chapters:

  • Chapter 1, Surveying Deepfakes
  • Chapter 2, Examining Deepfake Ethics and Dangers
  • Chapter 3, Acquiring and Processing Data
  • Chapter 4, The Deepfake Workflow

1

Surveying Deepfakes

Understanding deepfakes begins with understanding where they came from and what they can do. In this chapter, we’ll begin to explore deepfakes and their operation. We will go through the basics of what makes a deepfake work, talking about the differences between a generative auto-encoder and a generative adversarial network (GAN). We will examine their uses in media, education, and advertising. We’ll investigate their limitations and consider how to plan and design your deepfakes to avoid the common pitfalls. Finally, we’ll examine existing deepfake software and discuss what each kind can do.

We’ll cover this in the following sections:

  • Introducing deepfakes
  • Exploring the uses of deepfakes
  • Discovering how deepfakes work
  • Assessing the limitations of generative AI
  • Looking at existing deepfake software

Introducing deepfakes

The name deepfake is a portmanteau of “deep,” referring to deep learning, and “fake,” referring to the fact that the images generated are not genuine. The term first came into use on the popular website Reddit, where the original author released several deepfakes of adult actresses with other women’s faces artificially applied to them.

Note

The ethics of deepfakes are controversial, and we will cover this in more depth in Chapter 2, Examining Deepfake Ethics and Dangers.

This unethical beginning is still what the technology is most known for, but it’s not all that it can be used for. Since that time, deepfakes have moved into movies, memes, and more. Tom Cruise signed up for Instagram only after “Deep Tom Cruise” beat him to it. Steve Buscemi has remarked to Stephen Colbert that he “never looked better” when his face was placed on top of Jennifer Lawrence’s, and a younger version of Bill Nighy was deepfaked onto his own older self for a news clip from the “past” in the movie Detective Pikachu.

In this book, we will be taking a fairly narrow view of what deepfaking is, so let’s define it now. A deepfake is the use of a neural network trained on two faces to replace one face with another. There are other technologies to swap faces that aren’t deepfakes, and there are generative AIs that do other things besides swapping faces, but including all of those in the term just muddies the water and confuses the issue.

Exploring the uses of deepfakes

The original use of deepfakes might be the one that required the least amount of imagination. Putting one person’s face on another person has many different uses in various fields. Please don’t consider the ideas here as the full extent of the capabilities of deepfakes – someone is bound to imagine something new!

Entertainment

Entertainment is the first area that comes to mind for most people when they consider the usage of deepfakes. There are two main areas of entertainment in which I see deepfakes playing a significant role: narrative and parody.

Narrative

The utility of deepfakes in movies is obvious. Imagine an actor’s face being superimposed onto their stunt double or an actor who becomes unavailable being replaced by another performer without any changes to the faces in the final movie.

While deepfakes may not yet seem good enough, they are already being used in Hollywood and other media today – from Detective Pikachu, which used deepfakes to de-age Bill Nighy, to For All Mankind, which used them to put actors face to face with Ronald Reagan. Agencies and VFX shops are all examining how to use deepfakes in their work.

These techniques are not unique to deepfakes. CGI (in this book, referring to 3D graphics) face replacements have been used in many movies. However, CGI face replacement is expensive and complicated, requiring filming to be done in particular ways, with lots of extra data captured for the artists to use in getting the CGI face to look good in the final scene. This is an art more than a science and requires extensive skill and knowledge to accomplish. Deepfakes solve many of these problems, making new forms of face replacement possible.

Making a deepfake requires no special filming techniques (although some awareness will make the process smoother). Deepfakes also require very little attention or skill compared to CGI face replacements. This makes them ideal for lower-cost face replacements, but the results can also be higher quality, since the AI accounts for details that even the most dedicated artist can’t recreate.

Parody

Parody is an extremely popular form of social criticism and forms the basis for entire movies, TV shows, and other forms of media. Parody is normally done by professional impersonators. In some cases, those impersonators look (or can be made to look) similar to the person they’re impersonating. Other times, there is a reliance on their performance to make the impersonation clear.

Deepfakes provide an opportunity to change the art of parody wherein the impersonator can be made to look like the individual being parodied via a deepfake instead of by chance of birth. By removing the attention from basic appearance, deepfakes allow the focus to be placed directly on the performance itself.

Deepfakes also enable a whole new form of parody in which normal situations can become parodic simply due to the changed face. This particular form becomes humorous due to the distinct oddity of seeing a very different face where it isn’t expected.

Figure 1.1 – Steve Buscemi as Jennifer Lawrence by birbfakes

Note

This image is included with the kind permission of its original creator, birbfakes. You can view the original video here: https://youtu.be/r1jng79a5xc.

Video games

Video games present an interesting opportunity when it comes to deepfakes. The idea here is that a computer-generated character could be deepfaked into a photorealistic avatar. This could be done for any character in the game, even the player’s character. For example, it would be possible to make a game in which, when the player’s character looked into a mirror, they would see their own face looking back at them. Another possibility would be to replace a non-player character with a deepfake of the original actor, allowing for a far more realistic appearance without making a complete 3D clone of the actor.

Education

Education could also benefit from deepfakes. Imagine if your history class had a video of Abraham Lincoln himself reading the Gettysburg Address, or a corporate training video hosted entirely by the company’s public mascot (who may not even be a real person) without having to resort to costumes or CGI. It could even be used to make multiple videos or scenes filmed at significantly different times appear more cohesive by appearing to show the actor at the same time.

Many people are very visual learners and seeing a person “come alive” can really bring the experience home. Bringing the pre-video past to life using deepfakes enables a whole new learning experience. One example of this is the Dalí Museum, which created a series of videos of Salvador Dalí talking to guests. This was done by training a deepfake model on an actor to put Dalí’s face on the videos. Once the model was trained and set up, they were able to convert many videos, saving a lot of time and effort compared to a CGI solution.

Advertisements

Advertising agencies are always looking for the newest way to grab attention and deepfakes could be a whole new way to catch viewers’ attention. Imagine if you walked past a clothing store, you stopped to look at an item of clothing in the window, and suddenly the screen beside the item showed a video of an actor wearing the item but with your face, allowing you to see how the item would look on you. Alternatively, a mascot figure could be brought to life in a commercial. Deepfakes offer a whole new tool for creative use, which can grab attention and provide whole new experiences in advertising.

Now that we’ve got some idea of a few potential uses for deepfakes, let’s take a quick look under the hood and see how they work.

Discovering how deepfakes work

Deepfakes use a unique variation of a generative auto-encoder to generate the face swap. This requires a special structure, which we will explain in this section.

Generative auto-encoders

The particular type of neural network that regular deepfakes use is called a generative auto-encoder. Unlike a Generative Adversarial Network (GAN), an auto-encoder does not use a discriminator or any “adversarial” techniques.

All auto-encoders work by training a collection of neural network models to solve a problem. With a normal auto-encoder, the problem is usually something such as classification (deciding what an image is), object identification (finding something inside an image), or segmentation (identifying different parts of an image). In the case of generative auto-encoders, however, the AI is used to generate a new image with new details that weren’t in the original image. To do this, there are two types of models used in the auto-encoder – the encoder and the decoder. Let’s see how this works.

The deepfake training cycle

Training is a cyclical process in which the model is continuously trained on images until stopped. The process can be broken down into four steps:

  1. Encode faces into smaller intermediate representations.
  2. Decode the intermediate representations back into faces.
  3. Calculate the loss (meaning, the difference) between the original face and the output of the model.
  4. Modify (backpropagate) the models toward the correct answer.

Figure 1.2 – Diagram of the training cycle

In more detail, the process unfolds as follows:

The encoder’s job is to encode two different faces into an array, which we call the intermediate representation. The intermediate representation is much smaller than the original image size, with enough space to describe the lighting, pose, and expression of the faces. This process is similar to compression, where unnecessary data is thrown out to fit the data into a smaller space.

The decoder is actually a matched pair of models, which turn the intermediate representation back into faces. There is one decoder for each of the input faces, which is trained only on images of that one person’s face. This process tries to create a new face that matches the original face that was given to the encoder and encoded into the intermediate representation.

Figure 1.3 – Encoder and decoder

Loss is a score that is given to the auto-encoder based on how well it recreates the original faces. This is calculated by comparing the original image to the output from the encoder-decoder process. This comparison can be done in many ways, from a strict pixel-by-pixel difference to something significantly more complicated that includes human perception as part of the calculation. No matter how it’s done, the result is the same: a number from 0 to 1, with 0 being the score for the model returning the exact same image and 1 being the exact opposite of the image. Most scores will fall between 0 and 1; in practice, a perfect reconstruction (or its exact opposite) is impossible.
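As a minimal illustration (not the book’s code), here is what one of the simplest options – the mean absolute error between images scaled to the 0-1 range – looks like in Python. Identical images score 0.0, and a pure white image compared against a pure black image scores 1.0:

import numpy as np

# Illustrative only: mean absolute error between two images whose pixel
# values lie in the 0-1 range. Real deepfake losses are often more elaborate.
def reconstruction_loss(original, reconstruction):
    return float(np.mean(np.abs(original - reconstruction)))

face = np.random.rand(64, 64, 3)               # stand-in for a real face crop
print(reconstruction_loss(face, face))         # 0.0 -- a perfect reconstruction
print(reconstruction_loss(np.zeros((64, 64, 3)), np.ones((64, 64, 3))))  # 1.0 -- the worst case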

Note

The loss is where an auto-encoder differs from a GAN. In a GAN, the comparison loss is either replaced or supplemented with an additional network (usually an auto-encoder itself), which then produces a loss score of its own. The theory behind this structure is that the loss model (called a discriminator) can learn to get better at detecting the output of the generating model (called a generator) while the generator can learn to get better at fooling the discriminator.

Finally, there is backpropagation, a process in which the models are adjusted by following the path back through both the decoder and encoder that generated the face and nudging those paths toward the correct answer.

Figure 1.4 – Loss and backpropagation

Once complete, the whole process starts over at the encoder. This repeats until the neural network has finished training. The decision of when to end training can happen in several ways: when a certain number of repetitions have occurred (called iterations), when all the data has been gone through (called an epoch), or when the results meet a certain loss score.
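To make the cycle concrete, here is a rough PyTorch sketch (illustrative only, not Faceswap’s or the book’s code). A shared encoder feeds two per-identity decoders, a simple L1 loss scores the reconstructions, and backpropagation nudges all three models; the random tensors stand in for batches of aligned 64x64 face crops:

import torch
from torch import nn

# Tiny stand-in models: a shared encoder and one decoder per identity.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 256), nn.ReLU())
decoder_a = nn.Sequential(nn.Linear(256, 64 * 64 * 3), nn.Sigmoid())
decoder_b = nn.Sequential(nn.Linear(256, 64 * 64 * 3), nn.Sigmoid())

params = (list(encoder.parameters()) + list(decoder_a.parameters())
          + list(decoder_b.parameters()))
optimizer = torch.optim.Adam(params, lr=5e-5)
criterion = nn.L1Loss()                    # a simple 0-1 reconstruction loss

faces_a = torch.rand(8, 3, 64, 64)         # stand-in batch for identity A
faces_b = torch.rand(8, 3, 64, 64)         # stand-in batch for identity B

for iteration in range(100):               # or stop per epoch / loss target
    out_a = decoder_a(encoder(faces_a)).view_as(faces_a)          # steps 1-2: encode, decode
    out_b = decoder_b(encoder(faces_b)).view_as(faces_b)
    loss = criterion(out_a, faces_a) + criterion(out_b, faces_b)  # step 3: loss
    optimizer.zero_grad()
    loss.backward()                        # step 4: backpropagation
    optimizer.step()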

Why not GANs?

GANs are one of the current darlings of generative networks. They are extremely popular and used extensively, particularly for super-resolution (intelligent upscaling), music generation, and sometimes even deepfakes. However, there are some reasons that they’re not used in all deepfake solutions.

GANs are popular due to their “imaginative” nature. They learn through the interaction of their generator and discriminator to fill in gaps in the data. Because they can fill in missing pieces, they are great at reconstruction tasks or at tasks where new data is required.

The ability of a GAN to create new data where it is missing is great for numerous tasks, but it has a critical flaw when used for deepfakes. In deepfakes, the goal is to replace one face with another face. An imaginative GAN would likely learn to fill the gaps in the data from one face with the data from the other. This leads to a problem that we call “identity bleed” where the two faces aren’t swapped properly; instead, they’re blended into a face that doesn’t look like either person, but a mix of the two.

This flaw in a GAN-created deepfake can be corrected or prevented, but doing so requires much more careful data collection and processing. In general, it’s easier to get a full swap rather than a blend by using a generative auto-encoder instead of a GAN.

The auto-encoder structure

Another name for an auto-encoder is an “hourglass” model. The reason for this is that each layer of an encoder is smaller than the layer before it while each layer of a decoder is larger than the one before. Because of this, the auto-encoder figure starts out large at the beginning, shrinks toward the middle, and then widens back out again as it reaches the end:

Figure 1.5 – Hourglass structure of an autoencoder
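To make the hourglass shape concrete, here is a minimal PyTorch sketch (illustrative only – not Faceswap’s architecture, and the layer counts and channel sizes are arbitrary choices). Each encoder layer halves the spatial size, and each decoder layer doubles it back up, with one decoder per identity:

import torch
from torch import nn

class Encoder(nn.Module):
    # Shrinks a 64x64 face down to a small intermediate representation.
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2),    # 64x64 -> 32x32
            nn.LeakyReLU(0.1),
            nn.Conv2d(32, 64, 5, stride=2, padding=2),   # 32x32 -> 16x16
            nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 5, stride=2, padding=2),  # 16x16 -> 8x8
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.layers(x)               # the intermediate representation

class Decoder(nn.Module):
    # Widens the intermediate representation back out to a 64x64 face.
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),    # 32x32 -> 64x64
            nn.Sigmoid(),                    # pixel values back in the 0-1 range
        )

    def forward(self, z):
        return self.layers(z)

encoder = Encoder()
decoder_a, decoder_b = Decoder(), Decoder()  # one decoder per identity
faces = torch.rand(4, 3, 64, 64)             # stand-in batch of face crops
print(decoder_a(encoder(faces)).shape)       # torch.Size([4, 3, 64, 64])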

While these methods are flexible and have many potential uses, there are limitations. Let’s examine those limitations now.

Assessing the limitations of generative AI

Generative AIs like those used in deepfakes are not a panacea and actually have some significant limitations. However, once you know about these limitations, they can generally be worked around or sidestepped with careful design.

Resolution

Deepfakes are limited in the resolution that they can swap. This is a hardware and time limitation: better hardware and more time can provide higher-resolution swaps. However, this is not 1:1 linear growth. Doubling the resolution (from, say, 64x64 to 128x128) actually quadruples the amount of required VRAM – that is, the memory that a GPU has direct access to – and the time necessary to train grows by a roughly equivalent amount. Because of this, resolution is often a balancing act, where you’ll want to make the deepfake the lowest resolution you can without sacrificing the results.
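As a quick back-of-the-envelope illustration (not a precise VRAM calculation), the pixel count – and with it, roughly, the memory and time cost – grows with the square of the resolution:

# Rough illustration: doubling the side length quadruples the pixel count,
# which is why VRAM use and training time climb so quickly with resolution.
for side in (64, 128, 256, 512):
    pixels = side * side
    print(f"{side}x{side}: {pixels:>7} pixels ({pixels // (64 * 64)}x a 64x64 face)")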

Training required for each face pair

To provide the best results, traditional deepfakes require that you train on every face pair that you wish to swap. This means that if you wanted to swap your own face with two of your friends, you’d have to train two separate models. This is because each model has one encoder and two decoders, which are trained only to swap the faces they were given.

There is a workaround for some multi-face swaps: you could write your own version with more than two decoders, allowing you to swap additional faces. This is an imperfect solution, however, as each decoder takes up a significant amount of VRAM, requiring you to balance the number of faces carefully.
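As a purely hypothetical sketch of that idea – reusing the Encoder and Decoder classes from the hourglass example earlier in this chapter – you could keep one decoder per identity behind a single shared encoder, remembering that every extra decoder costs additional VRAM:

# Hypothetical extension, not a Faceswap feature: one decoder per identity,
# all sharing a single encoder. The identity names here are placeholders.
decoders = {name: Decoder() for name in ("face_a", "face_b", "face_c")}
encoded = encoder(faces)                  # shared intermediate representation
swapped = decoders["face_c"](encoded)     # render any of the trained identities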

It may be better to simply train multiple pairs. By splitting the task across multiple computers, you could train multiple models simultaneously, allowing you to create many face pairs at once.

Another option is to use a different type of AI face replacement. First Order Model (which is covered in the Looking at existing deepfake software section of this chapter) uses a different technique: instead of a paired approach, it uses AI to animate a single image to match the motion in a driving video. This solution removes the need to retrain on each face pair, but comes at the cost of greatly reduced swap quality.

Training data

Generative AIs require a significant amount of training data to accomplish their tasks. Sometimes, finding sufficient data, or data of a high enough quality, is not possible. For example, how would someone create a deepfake of William Shakespeare when there are no videos or photographs of him? This is a tricky problem but can be worked around in several ways. While it is unfortunately impossible to create a proper deepfake of England’s greatest playwright, it would be possible to use an actor who looks like his portraits and then deepfake that actor as Shakespeare.

Tip

We will cover more on how to deal with poor or insufficient data in Chapter 3, Acquiring and Processing Data.