32,39 €
Practical Generative AI with ChatGPT is your hands-on guide to unlocking the full potential of ChatGPT. From building AI assistants and mastering prompt engineering to analyzing documents and images and even generating code, this book equips you with the skills to integrate generative AI into your workflow.
Written by a technical architect specializing in AI and intelligent applications, this book provides the tools and knowledge you need to streamline tasks, enhance productivity, and create intelligent solutions. You’ll learn how to craft precise prompts, leverage ChatGPT for daily efficiency, and develop custom AI assistants tailored to your needs.
The chapters show you how to use ChatGPT’s multimodal capabilities to generate images with DALL·E and even transform images into code. This ChatGPT book goes beyond basic interactions by showing you how to design custom GPTs and integrate OpenAI’s APIs into your applications. You’ll explore how businesses use OpenAI models, from building AI applications, including semantic search, to creating an AI roadmap. Each chapter is packed with practical examples, ensuring you can apply the techniques right away.
By the end of this book, you’ll be well equipped to leverage OpenAI's technology for competitive advantage.
Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:
Seitenzahl: 301
Veröffentlichungsjahr: 2025
Practical Generative AI with ChatGPT
Second Edition
Unleash your prompt engineering potential with OpenAI technologies for productivity and creativity
Valentina Alto
Practical Generative AI with ChatGPT
Second Edition
Copyright © 2025 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing or its dealers and distributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
The author acknowledges the use of cutting-edge AI, such as Microsoft Copilot and ChatGPT, with the sole aim of enhancing and improving the clarity of the language, code, and images within the book, thereby ensuring a smooth reading experience for readers. It’s important to note that the content itself has been crafted by the author and edited by a professional publishing team.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Portfolio Director: Gebin George Relationship Lead: Ali Abidi Project Manager: Prajakta Naik Content Engineer: Aditi Chatterjee Technical Editor: Irfa Ansari Copy Editor: Safis Editing Indexer: Pratik Shirodkar Proofreader: Safis Editing Production Designer: Ganesh Bhadwalkar Growth Lead: Nimisha Dua
First published: May 2023 Second edition: April 2025
Production reference: 1210325
Published by Packt Publishing Ltd.Grosvenor House|11 St Paul’s Square Birmingham B3 1RB, UK.
ISBN 978-1-83664-785-0
www.packt.com
To my family and friends—thank you for your support, patience, and encouragement throughout this journey.
– Valentina
Valentina Alto is a technical architect specializing in AI and intelligent apps at Microsoft Innovation Hub in Dubai. During her tenure at Microsoft, she covered different roles as a solution specialist, focusing on data, AI, and applications workloads within the manufacturing, pharmaceutical, and retail industries and driving customers’ digital transformations in the era of AI. Valentina is an active tech author and speaker who contributes to books, articles, and events on AI and machine learning. Over the past two years, Valentina has published two books on generative AI and large language models, further establishing her expertise in the field.
I would like to thank my family and friends for their unwavering support, patience, and understanding throughout this process. Your encouragement has been invaluable.
I am also grateful to my colleagues and peers in the AI and technology community for the insightful discussions, feedback, and inspiration that have shaped my understanding of generative AI. Your contributions continue to push the boundaries of innovation.
A special thanks to Bhavesh Amin for giving me the opportunity to write this second edition, which was a very enriching experience. Special thanks to Rebecca Youé, Ali Abidi, Prajakta Naik, Ganesh Bhadwalkar, and Aditi Chatterjee for their valuable input and time reviewing this book and to the entire Packt team for their support during the course of writing this book.
Dr. Michael Seller is an AI strategist, prompt engineering expert, and business consultant specializing in AI-driven solutions. He holds a Doctorate in Business Administration and certifications in AI and data analytics. As the founder of AI Alchemy, he has developed over 200 tailored prompts across various domains, helping businesses and nonprofits optimize their operations. Dr. Seller has conducted AI training workshops for organizations such as the Humanity House Foundation, the Center of Public Safety for Women, and Ampac, equipping professionals with practical AI skills. His work spans academia, consulting, and technical reviewing for AI publications.
Bharat Saxena has over 19 years of experience in data science, machine learning, and AI, with a strong focus on NLP, generative AI, anomaly detection, and explainable AI. Bharat has worked across diverse organizations, including enterprise tech companies like NTT Data, BMC Software, and Accenture, delivering innovative AI-driven solutions. His expertise spans agentic frameworks, retrieval-augmented generation (RAG), knowledge graphs, and federated learning. Bharat has led the design and deployment of large-scale AI architectures, optimizing LLM-based applications for real-world adaptability, contributed to cloud-native AI applications, and built scalable data pipelines for production environments. His work has been published at leading conferences, and he actively contributes to the AI research community.
Have questions about the book or want to contribute to discussions on Generative AI and LLMs? Join our Discord server at https://packt.link/I1tSU and our Reddit channel at https://packt.link/jwAmA to connect, share, and collaborate with like-minded enthusiasts.
Preface
Who this book is for
What this book covers
To get the most out of this book
Get in touch
Fundamentals of Generative AI and OpenAI
Introduction to Generative AI
Introducing generative AI
Domains of generative AI
Text generation
Image generation
Music generation
Video generation
Main trends and innovations
Retrieval augmented generation
Multimodality
AI agents
Small language models
Legal and ethical landscape of generative AI
Copyright and intellectual property issues
Misinformation, hallucinations, and the risk of fake news
Deepfakes and deceptive manipulation
Bias, discrimination, and social harm
Summary
References
OpenAI and ChatGPT: Beyond the Market Hype
Technical requirements
What is OpenAI?
The origins of OpenAI
The emergence of ChatGPT
An overview of OpenAI model families
Getting started with ChatGPT
Creating an OpenAI account
ChatGPT Plus tour
The art of the possible with ChatGPT
Image understanding and generation
Mathematical thinking
Analytical skills
Summary
References
ChatGPT in Action
Understanding Prompt Engineering
Technical requirements
What is prompt engineering?
Understanding zero-, one-, and few-shot learning
Zero-shot learning
One-shot learning
Few-shot learning
Principles of prompt engineering
Clear instructions
Split complex tasks into subtasks
Ask for justification
Generate many outputs, then use the model to pick the best one
Use delimiters
Meta-prompting
Exploring some advanced techniques
Chain of thought
ReAct
Ethical considerations to avoid bias
Summary
References
Boosting Day-to-Day Productivity with ChatGPT
Technical requirements
ChatGPT as a daily assistant
Generating text
Improving writing skills and translation
Quick information retrieval and competitive intelligence
Summary
Developing the Future with ChatGPT
Technical requirements
Why should developers use ChatGPT?
Generating, optimizing, and debugging code
Generating documentation and code explainability
Understanding ML model interpretability
Translation among different programming languages
Working with code on canvas
Summary
Mastering Marketing with ChatGPT
Technical requirements
Leveraging ChatGPT for marketing
New product development and the go-to-market strategy
Bonus prompts
A/B testing for marketing comparison
Bonus prompts
Boosting SEO
Sentiment analysis for quality and customer satisfaction
Summary
Research Reinvented with ChatGPT
Researchers’ need for ChatGPT
Brainstorming literature for your study
Bonus prompts
Providing support for the design and framework of your experiment
Bonus prompts
Generating and formatting a bibliography
Generating a presentation of the study
Summary
References
Unleashing Creativity Visually with ChatGPT
What is multimodality?
Prompt design to generate stunning illustrations with DALL-E
Defining the subject and setting
Setting the mood with color and lighting
Introducing camera angles and materials
Infusing artistic influence
Setting the cultural and historical context
Choosing a medium and form
Adding style, techniques, and aspect ratio
Combining techniques for maximum impact
Leveraging ChatGPT as a designer assistant
Fashion assistant
UX designer
Style transfer
Exploring advanced plugins within the GPT store
Canva
Wix
Veed.io
Summary
References
Exploring GPTs
Technical requirements
What are GPTs?
Personal assistant
Code assistant
Marketing assistant
Research assistant
Summary
References
OpenAI for Enterprises
Leveraging OpenAI’s Models for Enterprise-Scale Applications
Technical requirements
How GenAI is disrupting industries
Healthcare
Case study
Finance
Case study
Retail and e-commerce
Case study
Manufacturing
Case study
Media and entertainment
Case study
Legal services
Case study
Education
Case study
Understanding OpenAI models’ APIs
What is a model API?
How to use OpenAI models’ APIs with the Python SDK
Architectural patterns to build applications with models’ APIs
New application components
AI orchestrators
LangChain
Haystack
Semantic Kernel
Introducing the public cloud: Azure OpenAI
AOAI Service
Summary
References
Epilogue and Final Thoughts
An overview of what we have learned so far
It’s not all about OpenAI
Mistral AI
Meta
Microsoft
Anthropic
Ethical implications of generative AI and why we need responsible AI
What to expect in the near future
Summary
References
Appendix
Trying OpenAI models in the Playground
Chat
Assistants
Completions
Text to speech
Customizing your model
Summary
Other Books You May Enjoy
Index
Cover
Index
We are living in an era of rapid technological transformation, where artificial intelligence (AI) is no longer just a tool but an active collaborator in our daily lives. Among the many advancements in AI, generative AI has emerged as a disruptive force, reshaping how we interact with technology, create content, and drive innovation. From generating human-like text and producing stunning visuals to composing music and even writing code, generative AI has unlocked possibilities that once belonged only to science fiction.
This book serves as a comprehensive guide to generative AI, with a special focus on ChatGPT, one of the most influential players in this evolving landscape. It is designed for both beginners and professionals who want to understand the underlying principles, practical applications, and enterprise-scale implementations of large language models (LLMs).
The book is structured into three parts:
Part 1, Fundamentals of Generative AI and OpenAI,introduces the core concepts of generative AI, the evolution of AI models, and the mechanics behind large foundation models. It also provides an in-depth look at OpenAI, its model families (such as GPT-4, DALL·E, and Whisper), and the rapid adoption of ChatGPT.Part 2, ChatGPT in Action, explores how to interact with ChatGPT effectively, covering prompt engineering techniques and real-world applications across various domains, including productivity, software development, marketing, research, and creativity. This section also introduces GPTs, the next step in AI customization, allowing users to build their own personalized AI assistants.Part 3, OpenAI for Enterprises, shifts the focus to enterprise-scale applications, discussing how businesses can leverage OpenAI’s models via APIs to develop powerful AI-driven solutions. The book concludes with a forward-looking epilogue, analyzing the broader AI landscape and what to expect in the near future.This book is for AI enthusiasts, business professionals, and researchers who want to harness the power of generative AI. Whether you’re a software engineer exploring AI-driven development, a marketer leveraging AI for content creation, or a business leader strategizing AI adoption, this book provides the knowledge and practical insights you need.
Chapter 1, Introduction to Generative AI, lets you discover the evolution of AI from traditional methods to generative AI, explore the foundation of LLMs, and understand how generative AI powers text, image, music, and video generation.
Chapter 2, OpenAI and ChatGPT: Beyond the Market Hype, dives into OpenAI’s ecosystem, explores the different model families (GPT-4, DALL·E, and Whisper), and understands ChatGPT’s rapid rise and its capabilities for everyday and professional use.
Chapter 3, Understanding Prompt Engineering, explores the art of crafting effective prompts, including techniques like ReAct and Chain of Thought (CoT), and shows how structured prompting enhances AI-generated responses.
Chapter 4, Boosting Day-to-Day Productivity with ChatGPT, leverages ChatGPT as a personal productivity assistant, showing how to automate tasks, improve writing, translate content, retrieve quick information, and enhance research efficiency.
Chapter 5, Developing the Future with ChatGPT, explores how ChatGPT aids developers in generating, optimizing, and debugging code and translating programming languages.
Chapter 6, Mastering Marketing with ChatGPT, uncovers how ChatGPT can revolutionize marketing—enhancing content creation, optimizing SEO, running A/B testing, and improving customer engagement with sentiment analysis.
Chapter 7, Research Reinvented with ChatGPT, shows how ChatGPT can assist researchers in brainstorming ideas, structuring studies, formatting bibliographies, and presenting findings in a clear and concise manner.
Chapter 8, Unleashing Creativity Visually with ChatGPT, explores ChatGPT’s multimodal capabilities, including GPT-4 Vision and DALL-E, enabling AI-driven image generation, visual Q&A, and enhanced creative workflows.
Chapter 9, Exploring GPTs, teaches the concept of GPTs, explores assistant-based AI workflows, and shows how to build your own AI-powered assistants for tasks like research, analysis, and marketing.
Chapter 10, Leveraging OpenAI Models for Enterprise-Scale Applications, delves into OpenAI’s model APIs, comprehends enterprise applications of LLMs, and explores how businesses can integrate generative AI into their workflows responsibly.
Chapter 11, Epilogue and Final Thoughts, reflects on the evolving landscape of generative AI, discusses ethical implications, and looks ahead to the future of AI.
The Appendix contains a set of hands-on examples of real-world use cases leveraging OpenAI and Python code.
Following along will be easier if you keep the following in mind:
Learn through hands-on examples: Many sections include practical exercises and real-world applications. Whenever possible, try them out using OpenAI’s APIs, ChatGPT, and other tools.Experiment with different prompts: Since prompt engineering is a key skill in working with generative AI, experiment with different prompts and observe how slight modifications affect the results.Explore the APIs and developer tools: If you’re a developer, take time to explore OpenAI’s API documentation and try integrating AI capabilities into your own applications.Think beyond the basics: This book provides a foundation, but AI is an evolving field. Stay updated with the latest research and industry trends to deepen your understanding.Here is a list of things you need to have:
Software/hardware covered in the book
System requirements
Python 3.7.1 or higher
Windows, macOS, or Linux
Streamlit
Windows, macOS, or Linux
LangChain
Windows, macOS, or Linux
OpenAI model APIs
An OpenAI account
Azure OpenAI Service (optional)
An Azure subscription enabled for Azure OpenAI (optional)
The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/Practical-GenAI-with-ChatGPT-Second-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing. Check them out!
There are a number of text conventions used throughout this book.
A block of code is set as follows:
{"prompt": "<prompt text>", "completion": "<ideal generated text>"} {"prompt": "<prompt text>", "completion": "<ideal generated text>"} {"prompt": "<prompt text>", "completion": "<ideal generated text>"}Bold: Indicates a new term, an important word, or words that you see on the screen. For instance, words in menus or dialog boxes appear in the text like this. For example: “As always, a subject-matter expert (SME) is needed in the loop to review the results.”
Warnings or important notes appear like this.
Tips and tricks appear like this.
Subscribe to AI_Distilled, the go-to newsletter for AI professionals, researchers, and innovators, at https://packt.link/aWQQB.
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book’s title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you reported this to us. Please visit http://www.packtpub.com/submit-errata, click Submit Errata, and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com/.
Once you’ve read Practical Generative AI with ChatGPT, Second Edition, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below:https://packt.link/free-ebook/9781836647850
Submit your proof of purchase.That’s it! We’ll send your free PDF and other benefits to your email directly.In Part 1 of this book, the fundamentals of generative AI and GPT models are introduced, including a brief history of the development of OpenAI and its flagship set of models, the GPT family.
This part starts with an overview of the domain of generative AI, providing you with foundational knowledge about this area of AI, including its history and state-of-the-art developments. You will also get familiar with the applications of generative AI, ranging from text generation to music composition.
It then introduces the company that brought the power of generative AI to the general public: OpenAI. You will get familiar with the technology behind OpenAI’s most popular release – ChatGPT – and understand the research journey that, starting from artificial neural networks (ANNs), led to large language models (LLMs).
This part contains the following chapters:
Chapter 1, Introduction to Generative AIChapter 2, OpenAI and ChatGPT: Beyond the Market HypeHello! Welcome to Practical Generative AI with ChatGPT! In this book, we will explore the fascinating world of generative artificial intelligence (AI) and its groundbreaking applications, with a particular focus on ChatGPT.
Generative AI has transformed the way we interact with machines, enabling computers to create, predict, and learn without explicit human instruction. Since the launch of OpenAI’s ChatGPT in November 2022, we have witnessed unprecedented advances in natural language processing, image and video synthesis, and many other fields. Whether you are a curious beginner or an experienced practitioner, this guide will equip you with the knowledge and skills to effectively navigate the exciting landscape of generative AI. So, let’s dive in and start the book with some definitions of the context we are moving in.
In this chapter, we focus on the applications of generative AI to various fields, such as image synthesis, text generation, and music composition, highlighting the potential of generative AI to revolutionize various industries with concrete examples and recent developments. Being aware of the research journey toward the current state of the art of generative AI will give you an understanding of the foundations of recent developments and state-of-the-art models.
All this, we will cover through the following topics:
Introducing generative AIExploring the domains of generative AIMain trends and innovation after 2 years of ChatGPTLegal and ethical landscape of generative AIBy the end of this chapter, you will be familiar with the exciting world of generative AI, its applications, the research history behind it, and the current developments that could have – and are currently having – a disruptive impact on businesses.
Generative AI is an exciting branch of AI that focuses on creating new content, such as text, images, music, or even videos, that is often indistinguishable from something made by humans.
To understand where it fits, let’s break it down:
AI: AI is the broad field that enables machines to mimic human-like tasks, such as decision-making or problem-solving.Machine learning (ML): Within AI, ML refers to techniques where machines learn patterns from data to make predictions or decisions without being explicitly programmed. The process of learning is made possible by sophisticated mathematical models called algorithms.Deep learning (DL): A subset of ML, DL uses complex algorithms inspired by the human brain to process large amounts of data and recognize intricate patterns. Because of their architecture – inspired by our brains and neural connections – these algorithms are called artificial neural networks.Definition
An artificial neural network is a type of computer program designed to learn patterns by processing information in a way that’s inspired by the human brain. Instead of following strict, step-by-step rules, it uses interconnected “nodes” (like virtual brain cells) that work together and adjust their connections over time. By repeatedly reviewing examples, it gradually improves at tasks like recognizing images, understanding speech, or predicting outcomes—all without needing explicit instructions for each step.
Generative AI emerges from DL and uses specialized algorithms to generate something entirely new based on what it has learned from existing data. For example, a generative AI model trained on thousands of paintings could create brand-new art that blends different styles or themes.
The following figure shows how these areas of research are related to each other:
Figure 1.1: Relationship between AI, ML, DL, and generative AI
Generative AI models are trained on vast amounts of data and then they can generate new examples based on user’s requests. And the game-changer element here is that these requests are made in the easiest way possible – using our natural language. These models are called large language models (LLMs).
Definition
LLMs are a type of artificial neural network featured by a particular architectural framework called “Transformer.” They are characterized by a huge number of parameters (in the order of billions) and have been trained on billions of words. Given the training set, LLMs are capable of inferring language patterns and intents in user queries and generating natural language responses.
The possibility of interacting in natural language with LLMs is disruptive, and a whole new science has been born around that activity. This science is called “prompt engineering,” named after the term “prompt,” which we are going to cover in Chapter 3.
Definition
A prompt is the specific text, question, or description you provide to a generative AI model to guide it toward producing the kind of output you want—whether that’s a helpful explanation, a creative story, or a detailed solution. How you phrase the prompt can greatly affect the AI’s response. This practice of carefully designing and refining prompts, often called “prompt engineering,” involves experimenting with different word choices, instructions, and formats to improve both the quality and accuracy of the AI’s output. By learning how to craft effective prompts, you help ensure the AI more consistently gives you results that are useful, engaging, and aligned with your goals.
Even though text understanding and generation is probably one of the most outstanding features of Generative AI, this field covers many domains, which we will cover next.
In recent years, generative AI has made significant advancements and has expanded its applications to a wide range of domains, such as art, music, fashion, and architecture. In some of them, it is indeed transforming the way we create, design, and understand the world around us. In others, it is improving and making existing processes and operations more efficient.
For example, in the context of the pharmaceutical industry, generative AI is revolutionizing drug discovery by enabling the rapid design of novel therapeutic molecules, thereby significantly reducing development timelines and costs. By analyzing extensive datasets of chemical and biological information, generative AI models can identify promising drug candidates and predict their interactions within the human body. For instance, Insilico Medicine utilized generative AI to develop ISM001-055, a drug candidate for idiopathic pulmonary fibrosis, which progressed to Phase II clinical trials in 2023 (https://insilico.com/blog/first_phase2).
Another example is the way generative AI is revolutionizing game development by enabling the creation of dynamic and adaptive environments that respond to player actions, thereby enhancing immersion and replayability. By leveraging generative AI, developers can procedurally generate vast, ever-changing game worlds, ensuring that each playthrough offers a unique experience. This technology facilitates the creation of realistic non-playable characters (NPCs) with behaviors that adapt to player interactions, making game narratives more engaging. Additionally, generative AI streamlines the development process by automating asset creation, which reduces production time and costs.
As a result, developers can focus more on crafting innovative gameplay mechanics and rich storytelling, ultimately delivering more personalized and captivating gaming experiences (https://www.xcubelabs.com/blog/generative-ai-in-game-development-creating-dynamic-and-adaptive-environments/).
Lastly, generative AI can have a great impact on advertising and visual asset generation. For example, in March 2023, Coca-Cola launched the “Create Real Magic” platform (https://www.coca-colacompany.com/media-center/coca-cola-invites-digital-artists-to-create-real-magic-using-new-ai-platform), inviting digital artists worldwide to craft original artwork using iconic brand assets from its archives. Developed in collaboration with OpenAI and Bain & Company, this innovative platform combines the capabilities of GPT-4 and DALL-E, enabling users to generate unique pieces that blend Coca-Cola’s heritage with modern AI technology. Participants had the opportunity to submit their creations for a chance to be featured on Coca-Cola’s digital billboards in New York’s Times Square and London’s Piccadilly Circus, exemplifying the brand’s commitment to fostering creativity through cutting-edge technology. These are just a few examples of how generative AI can reshape business processes.
Now, the fact that generative AI is used in many domains also implies that its models can deal with different kinds of data, from natural language to audio or images. In the next section, we’ll explore how generative AI models address different types of data and domains.
The evolution of text generation within AI has been a journey from early theoretical concepts to today’s sophisticated language models. The 1950s marked the formal inception of AI as a field, with pioneers like Alan Turing exploring machine intelligence. Early efforts in natural language processing (NLP) during the 1960s and 1970s led to programs such as ELIZA, which simulated conversation through pattern matching. The 1980s and 1990s saw the development of statistical models that improved language modeling by probabilistically predicting word sequences. The advent of ML algorithms during this period further advanced text generation capabilities.
A significant breakthrough occurred in 2017 with the introduction of the Transformer architecture which, as aforementioned, is the framework that features today’s LLMs.
The unique element of this new series of models featuring the landscape of generative AI is that – once they are trained – they can be consumed, queried, and instructed in the easiest way possible. The introduction of LLMs marked a paradigm shift in the context of AI since no advanced skills were needed to benefit from them.
Today, one of the greatest applications of generative AI—and the one we are going to cover the most throughout this book—is its ability to produce new content in natural language. Indeed, generative AI models can be used to generate new coherent and grammatically correct text in different languages, such as articles, poetry, and product descriptions. They can also extract relevant features from text such as keywords, topics, or full summaries.
Here is an example of working with GPT-4o, one of the latest models released by OpenAI and available through ChatGPT:
Figure 1.2: Example of ChatGPT responding to a user’s query in natural language
As you can see, the model was not only able to answer my question with an explanation of what a proton is; it also adapted its style and jargon to a specific target audience – in my case, a 5-year-old child. This is remarkable since it paves the way for many scenarios of hyper-personalization that were not possible before. In the next chapters, we will cover many examples of that.
ChatGPT is the main focus of this book, and in the upcoming chapters, you will see examples that showcase this powerful application.
Now, we will move on to image generation.
One of the earliest and most well-known examples of generative AI in image synthesis is the generative adversarial network (GAN) architecture introduced in the 2014 paper by I. Goodfellow et al., Generative Adversarial Networks. The purpose of GANs is to generate realistic images that are indistinguishable from real images. This ability has several interesting business applications, such as generating synthetic datasets for training computer vision models, generating realistic product images, and generating realistic images for virtual reality and augmented reality applications.
Then, in 2021, a new generative AI model was introduced in this field by OpenAI, DALL-E. Different from GANs, the DALL-E model is designed to generate images from descriptions in natural language and can generate a wide range of images. The main difference here is that while GANs are often used to create or improve realistic images, models like DALL-E are ideal for visual creativity, turning any description in natural language into an illustration.
DALL-E has great potential in creative industries such as advertising, product design, and fashion to create unique and creative images.
Since its first release to the time of writing (December 2024), DALL-E has improved dramatically, as you can see in the following examples. Below is an artistic creation by DALL-E at the dawn of its life:
Figure 1.3: Images generated by DALL-E with a natural language prompt as input
Let’s now see what DALL-E3, the most recent version of the model at the time of writing this book, can produce (here, we will use Microsoft Image Creator, powered by DALL-E3. You can try it at https://copilot.microsoft.com/images/create):
Figure 1.4: Images generated by DALL-E3 with a natural language prompt as input
It’s impressive to see the level of improvement of this model in less than 2 years. We are just scraping the surface of the massive improvements occurring at a fast pace.
The first approaches to generative AI for music generation trace back to the 1950s, with research in the field of algorithmic composition, a technique that uses algorithms to generate musical compositions. In 1957, Lejaren Hiller and Leonard Isaacson created the Illiac Suite for String Quartet (https://www.youtube.com/watch?v=n0njBFLQSk8), the first piece of music entirely composed by AI. Since then, the field of generative AI for music has been the subject of ongoing research.
Among recent years’ developments, new architectures and frameworks have become widespread among the general public, such as the WaveNet architecture introduced by Google in 2016, which has been able to generate high-quality audio samples, and the Magenta project, also developed by Google, which uses recurrent neural networks (RNNs) and other ML techniques to generate music and other forms of art.
Definition
RNNs are a type of neural network designed to process sequential data by retaining information from previous inputs through a loop-like structure. This allows them to recognize patterns and dependencies over time, making them ideal for tasks like language modeling, time-series prediction, and speech recognition.
In 2020, OpenAI also announced Jukebox, a neural network that generates music when provided with genre, artist, and lyrics as input.
These and other frameworks became the foundations of many AI composer assistants for music generation. An example is Flow Machines, developed by Sony CSL Research. This generative AI system was trained on a large database of musical pieces to create new music in a variety of styles. It was used by French composer Benoît Carré to compose an album called Hello World (https:// www.helloworldalbum.net/), which features collaborations with several human musicians.
Here, you can see an example of a track generated entirely by Music Transformer, one of the models within the Magenta project:
Figure 1.5: Music Transformer allows users to listen to musical performances generated by AI (https://magenta.tensorflow.org/music-transformer)
Another incredible application of generative AI within the music domain is speech synthesis. This refers to AI tools that can create audio based on text inputs in the voices of well-known singers.
For example, if you have always wondered how your songs would sound if Lady Gaga performed them, well, you can now fulfill your dreams with tools such as FakeYou Text to Speech (https://fakeyou.com/tts) or UberDuck.ai (https://uberduck.ai/)!
Figure 1.6: Text-to-speech synthesis with fakeyou.com
The results are really impressive! If you want to have fun, you can also try voices from your favorite cartoons, such as Winnie the Pooh. The only thing you need to do is input the text of the song you want your favorite voice to sing aloud.
Let’s go even further. What if we could generate a song from scratch, just asking the generative AI to do that for us in natural language? Well, we can do that seamlessly today and without any knowledge about music. Among the generative AI products that are rising in the music market today is Suno, whose mission is “[...]building a future where anyone can make great music. Whether you’re a shower singer or a charting artist, we break barriers between you and the song you dream of making. No instrument needed, just imagination. From your mind to music.” (source: https://suno.com/about).
Figure 1.7: Example of an entire song generated by Suno.com from a description in natural language
As you can see, on the left-hand side of the picture, I provided a very brief song description in natural language – this was my prompt. From that, the model was able to generate not only the title and lyrics of a song (on the right-hand side) but also the music!
Can you believe that it became my summer 2024 hit? If you want to create your summer hit too, you can try it for free at https://suno.com/create.
Generative AI for video generation shares a similar timeline of development with image generation. One of the key developments in the field of video generation has been the development of GANs. Thanks to their accuracy in producing realistic images, researchers have started to apply this technique to video generation as well. One of the most notable examples of GAN-based video generation is DeepMind’s Veo, which generates high-quality videos from a single image and a sequence of motions. Another great example is NVIDIA’s video-to-video synthesis (Vid2Vid) DL-based framework, which uses GANs to synthesize high-quality videos from input videos.
The Vid2Vid system can generate temporally consistent videos, meaning that they maintain smooth and realistic motion over time. The technology can be used to perform a variety of video synthesis tasks, such as the following:
Converting videos from one domain into another (for example, converting a daytime video into a nighttime video or a sketch into a realistic image)Modifying existing videos (for example, changing the style or appearance of objects in a video)Creating new videos from static images (for example, animating a sequence of still images)In September 2022, Meta’s researchers announced the general availability of Make-A-Video (https://makeavideo.studio/), a new AI system that allows users to convert their natural language prompts into video clips. Behind this technology, you can recognize many of the models that we mentioned in other domains – language understanding for the prompt, image and motion generation with image generation, and background music made by AI composers.
Now, everything we’ve mentioned above pales in comparison to the latest text-to-video models. To name one, OpenAI announced a text-to-video model called SORA