Practical Generative AI with ChatGPT - Valentina Alto - E-Book

Description

Practical Generative AI with ChatGPT is your hands-on guide to unlocking the full potential of ChatGPT. From building AI assistants and mastering prompt engineering to analyzing documents and images and even generating code, this book equips you with the skills to integrate generative AI into your workflow.
Written by a technical architect specializing in AI and intelligent applications, this book provides the tools and knowledge you need to streamline tasks, enhance productivity, and create intelligent solutions. You’ll learn how to craft precise prompts, leverage ChatGPT for daily efficiency, and develop custom AI assistants tailored to your needs.
The chapters show you how to use ChatGPT’s multimodal capabilities to generate images with DALL·E and even transform images into code. This ChatGPT book goes beyond basic interactions by showing you how to design custom GPTs and integrate OpenAI’s APIs into your applications. You’ll explore how businesses use OpenAI models, from building AI applications, including semantic search, to creating an AI roadmap. Each chapter is packed with practical examples, ensuring you can apply the techniques right away.
By the end of this book, you’ll be well equipped to leverage OpenAI's technology for competitive advantage.

You can read the e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 301

Publication year: 2025




Practical Generative AI with ChatGPT

Second Edition

Unleash your prompt engineering potential with OpenAI technologies for productivity and creativity

Valentina Alto

Copyright © 2025 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing or its dealers and distributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

The author acknowledges the use of cutting-edge AI, such as Microsoft Copilot and ChatGPT, with the sole aim of enhancing and improving the clarity of the language, code, and images within the book, thereby ensuring a smooth reading experience for readers. It’s important to note that the content itself has been crafted by the author and edited by a professional publishing team.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Portfolio Director: Gebin George
Relationship Lead: Ali Abidi
Project Manager: Prajakta Naik
Content Engineer: Aditi Chatterjee
Technical Editor: Irfa Ansari
Copy Editor: Safis Editing
Indexer: Pratik Shirodkar
Proofreader: Safis Editing
Production Designer: Ganesh Bhadwalkar
Growth Lead: Nimisha Dua

First published: May 2023
Second edition: April 2025

Production reference: 1210325

Published by Packt Publishing Ltd.
Grosvenor House, 11 St Paul’s Square, Birmingham B3 1RB, UK.

ISBN 978-1-83664-785-0

www.packt.com

To my family and friends—thank you for your support, patience, and encouragement throughout this journey.

– Valentina

Contributors

About the author

Valentina Alto is a technical architect specializing in AI and intelligent apps at Microsoft Innovation Hub in Dubai. During her tenure at Microsoft, she covered different roles as a solution specialist, focusing on data, AI, and applications workloads within the manufacturing, pharmaceutical, and retail industries and driving customers’ digital transformations in the era of AI. Valentina is an active tech author and speaker who contributes to books, articles, and events on AI and machine learning. Over the past two years, Valentina has published two books on generative AI and large language models, further establishing her expertise in the field.

I would like to thank my family and friends for their unwavering support, patience, and understanding throughout this process. Your encouragement has been invaluable.

I am also grateful to my colleagues and peers in the AI and technology community for the insightful discussions, feedback, and inspiration that have shaped my understanding of generative AI. Your contributions continue to push the boundaries of innovation.

A special thanks to Bhavesh Amin for giving me the opportunity to write this second edition, which was a very enriching experience. Special thanks to Rebecca Youé, Ali Abidi, Prajakta Naik, Ganesh Bhadwalkar, and Aditi Chatterjee for their valuable input and time reviewing this book and to the entire Packt team for their support during the course of writing this book.

About the reviewers

Dr. Michael Seller is an AI strategist, prompt engineering expert, and business consultant specializing in AI-driven solutions. He holds a Doctorate in Business Administration and certifications in AI and data analytics. As the founder of AI Alchemy, he has developed over 200 tailored prompts across various domains, helping businesses and nonprofits optimize their operations. Dr. Seller has conducted AI training workshops for organizations such as the Humanity House Foundation, the Center of Public Safety for Women, and Ampac, equipping professionals with practical AI skills. His work spans academia, consulting, and technical reviewing for AI publications.

Bharat Saxena has over 19 years of experience in data science, machine learning, and AI, with a strong focus on NLP, generative AI, anomaly detection, and explainable AI. Bharat has worked across diverse organizations, including enterprise tech companies like NTT Data, BMC Software, and Accenture, delivering innovative AI-driven solutions. His expertise spans agentic frameworks, retrieval-augmented generation (RAG), knowledge graphs, and federated learning. Bharat has led the design and deployment of large-scale AI architectures, optimizing LLM-based applications for real-world adaptability, contributed to cloud-native AI applications, and built scalable data pipelines for production environments. His work has been published at leading conferences, and he actively contributes to the AI research community.

Join our communities on Discord and Reddit

Have questions about the book or want to contribute to discussions on Generative AI and LLMs? Join our Discord server at https://packt.link/I1tSU and our Reddit channel at https://packt.link/jwAmA to connect, share, and collaborate with like-minded enthusiasts.

Contents

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Fundamentals of Generative AI and OpenAI

Introduction to Generative AI

Introducing generative AI

Domains of generative AI

Text generation

Image generation

Music generation

Video generation

Main trends and innovations

Retrieval augmented generation

Multimodality

AI agents

Small language models

Legal and ethical landscape of generative AI

Copyright and intellectual property issues

Misinformation, hallucinations, and the risk of fake news

Deepfakes and deceptive manipulation

Bias, discrimination, and social harm

Summary

References

OpenAI and ChatGPT: Beyond the Market Hype

Technical requirements

What is OpenAI?

The origins of OpenAI

The emergence of ChatGPT

An overview of OpenAI model families

Getting started with ChatGPT

Creating an OpenAI account

ChatGPT Plus tour

The art of the possible with ChatGPT

Image understanding and generation

Mathematical thinking

Analytical skills

Summary

References

ChatGPT in Action

Understanding Prompt Engineering

Technical requirements

What is prompt engineering?

Understanding zero-, one-, and few-shot learning

Zero-shot learning

One-shot learning

Few-shot learning

Principles of prompt engineering

Clear instructions

Split complex tasks into subtasks

Ask for justification

Generate many outputs, then use the model to pick the best one

Use delimiters

Meta-prompting

Exploring some advanced techniques

Chain of thought

ReAct

Ethical considerations to avoid bias

Summary

References

Boosting Day-to-Day Productivity with ChatGPT

Technical requirements

ChatGPT as a daily assistant

Generating text

Improving writing skills and translation

Quick information retrieval and competitive intelligence

Summary

Developing the Future with ChatGPT

Technical requirements

Why should developers use ChatGPT?

Generating, optimizing, and debugging code

Generating documentation and code explainability

Understanding ML model interpretability

Translation among different programming languages

Working with code on canvas

Summary

Mastering Marketing with ChatGPT

Technical requirements

Leveraging ChatGPT for marketing

New product development and the go-to-market strategy

Bonus prompts

A/B testing for marketing comparison

Bonus prompts

Boosting SEO

Sentiment analysis for quality and customer satisfaction

Summary

Research Reinvented with ChatGPT

Researchers’ need for ChatGPT

Brainstorming literature for your study

Bonus prompts

Providing support for the design and framework of your experiment

Bonus prompts

Generating and formatting a bibliography

Generating a presentation of the study

Summary

References

Unleashing Creativity Visually with ChatGPT

What is multimodality?

Prompt design to generate stunning illustrations with DALL-E

Defining the subject and setting

Setting the mood with color and lighting

Introducing camera angles and materials

Infusing artistic influence

Setting the cultural and historical context

Choosing a medium and form

Adding style, techniques, and aspect ratio

Combining techniques for maximum impact

Leveraging ChatGPT as a designer assistant

Fashion assistant

UX designer

Style transfer

Exploring advanced plugins within the GPT store

Canva

Wix

Veed.io

Summary

References

Exploring GPTs

Technical requirements

What are GPTs?

Personal assistant

Code assistant

Marketing assistant

Research assistant

Summary

References

OpenAI for Enterprises

Leveraging OpenAI’s Models for Enterprise-Scale Applications

Technical requirements

How GenAI is disrupting industries

Healthcare

Case study

Finance

Case study

Retail and e-commerce

Case study

Manufacturing

Case study

Media and entertainment

Case study

Legal services

Case study

Education

Case study

Understanding OpenAI models’ APIs

What is a model API?

How to use OpenAI models’ APIs with the Python SDK

Architectural patterns to build applications with models’ APIs

New application components

AI orchestrators

LangChain

Haystack

Semantic Kernel

Introducing the public cloud: Azure OpenAI

AOAI Service

Summary

References

Epilogue and Final Thoughts

An overview of what we have learned so far

It’s not all about OpenAI

Mistral AI

Meta

Microsoft

Google

Anthropic

Ethical implications of generative AI and why we need responsible AI

What to expect in the near future

Summary

References

Appendix

Trying OpenAI models in the Playground

Chat

Assistants

Completions

Text to speech

Customizing your model

Summary

Other Books You May Enjoy

Index

Preface

We are living in an era of rapid technological transformation, where artificial intelligence (AI) is no longer just a tool but an active collaborator in our daily lives. Among the many advancements in AI, generative AI has emerged as a disruptive force, reshaping how we interact with technology, create content, and drive innovation. From generating human-like text and producing stunning visuals to composing music and even writing code, generative AI has unlocked possibilities that once belonged only to science fiction.

This book serves as a comprehensive guide to generative AI, with a special focus on ChatGPT, one of the most influential players in this evolving landscape. It is designed for both beginners and professionals who want to understand the underlying principles, practical applications, and enterprise-scale implementations of large language models (LLMs).

The book is structured into three parts:

Part 1, Fundamentals of Generative AI and OpenAI, introduces the core concepts of generative AI, the evolution of AI models, and the mechanics behind large foundation models. It also provides an in-depth look at OpenAI, its model families (such as GPT-4, DALL·E, and Whisper), and the rapid adoption of ChatGPT.

Part 2, ChatGPT in Action, explores how to interact with ChatGPT effectively, covering prompt engineering techniques and real-world applications across various domains, including productivity, software development, marketing, research, and creativity. This section also introduces GPTs, the next step in AI customization, allowing users to build their own personalized AI assistants.

Part 3, OpenAI for Enterprises, shifts the focus to enterprise-scale applications, discussing how businesses can leverage OpenAI’s models via APIs to develop powerful AI-driven solutions. The book concludes with a forward-looking epilogue, analyzing the broader AI landscape and what to expect in the near future.

Who this book is for

This book is for AI enthusiasts, business professionals, and researchers who want to harness the power of generative AI. Whether you’re a software engineer exploring AI-driven development, a marketer leveraging AI for content creation, or a business leader strategizing AI adoption, this book provides the knowledge and practical insights you need.

What this book covers

Chapter 1, Introduction to Generative AI, lets you discover the evolution of AI from traditional methods to generative AI, explore the foundation of LLMs, and understand how generative AI powers text, image, music, and video generation.

Chapter 2, OpenAI and ChatGPT: Beyond the Market Hype, dives into OpenAI’s ecosystem, explores the different model families (GPT-4, DALL·E, and Whisper), and examines ChatGPT’s rapid rise and its capabilities for everyday and professional use.

Chapter 3, Understanding Prompt Engineering, explores the art of crafting effective prompts, including techniques like ReAct and Chain of Thought (CoT), and shows how structured prompting enhances AI-generated responses.

Chapter 4, Boosting Day-to-Day Productivity with ChatGPT, presents ChatGPT as a personal productivity assistant, showing how to automate tasks, improve writing, translate content, retrieve quick information, and enhance research efficiency.

Chapter 5, Developing the Future with ChatGPT, explores how ChatGPT aids developers in generating, optimizing, and debugging code and translating programming languages.

Chapter 6, Mastering Marketing with ChatGPT, uncovers how ChatGPT can revolutionize marketing—enhancing content creation, optimizing SEO, running A/B testing, and improving customer engagement with sentiment analysis.

Chapter 7, Research Reinvented with ChatGPT, shows how ChatGPT can assist researchers in brainstorming ideas, structuring studies, formatting bibliographies, and presenting findings in a clear and concise manner.

Chapter 8, Unleashing Creativity Visually with ChatGPT, explores ChatGPT’s multimodal capabilities, including GPT-4 Vision and DALL-E, enabling AI-driven image generation, visual Q&A, and enhanced creative workflows.

Chapter 9, Exploring GPTs, introduces the concept of GPTs, explores assistant-based AI workflows, and shows how to build your own AI-powered assistants for tasks like research, analysis, and marketing.

Chapter 10, Leveraging OpenAI’s Models for Enterprise-Scale Applications, delves into OpenAI’s model APIs, examines enterprise applications of LLMs, and explores how businesses can integrate generative AI into their workflows responsibly.

Chapter 11, Epilogue and Final Thoughts, reflects on the evolving landscape of generative AI, discusses ethical implications, and looks ahead to the future of AI.

The Appendix contains a set of hands-on examples of real-world use cases leveraging OpenAI and Python code.

To get the most out of this book

Following along will be easier if you keep the following in mind:

Learn through hands-on examples: Many sections include practical exercises and real-world applications. Whenever possible, try them out using OpenAI’s APIs, ChatGPT, and other tools.

Experiment with different prompts: Since prompt engineering is a key skill in working with generative AI, experiment with different prompts and observe how slight modifications affect the results.

Explore the APIs and developer tools: If you’re a developer, take time to explore OpenAI’s API documentation and try integrating AI capabilities into your own applications.

Think beyond the basics: This book provides a foundation, but AI is an evolving field. Stay updated with the latest research and industry trends to deepen your understanding.

Here is a list of things you need to have:

Software/hardware covered in the book | System requirements
Python 3.7.1 or higher | Windows, macOS, or Linux
Streamlit | Windows, macOS, or Linux
LangChain | Windows, macOS, or Linux
OpenAI model APIs | An OpenAI account
Azure OpenAI Service (optional) | An Azure subscription enabled for Azure OpenAI (optional)

Download the example code files

The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/Practical-GenAI-with-ChatGPT-Second-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

A block of code is set as follows:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

Bold: Indicates a new term, an important word, or words that you see on the screen. For instance, words in menus or dialog boxes appear in the text like this. For example: “As always, a subject-matter expert (SME) is needed in the loop to review the results.”

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Subscribe to AI_Distilled, the go-to newsletter for AI professionals, researchers, and innovators, at https://packt.link/aWQQB.

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book’s title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you reported this to us. Please visit http://www.packtpub.com/submit-errata, click Submit Errata, and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com/.

Share your thoughts

Once you’ve read Practical Generative AI with ChatGPT, Second Edition, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below:

https://packt.link/free-ebook/9781836647850

Submit your proof of purchase.

That’s it! We’ll send your free PDF and other benefits to your email directly.

Part 1

Fundamentals of Generative AI and OpenAI

In Part 1 of this book, the fundamentals of generative AI and GPT models are introduced, including a brief history of the development of OpenAI and its flagship set of models, the GPT family.

This part starts with an overview of the domain of generative AI, providing you with foundational knowledge about this area of AI, including its history and state-of-the-art developments. You will also get familiar with the applications of generative AI, ranging from text generation to music composition.

It then introduces the company that brought the power of generative AI to the general public: OpenAI. You will get familiar with the technology behind OpenAI’s most popular release – ChatGPT – and understand the research journey that, starting from artificial neural networks (ANNs), led to large language models (LLMs).

This part contains the following chapters:

Chapter 1, Introduction to Generative AI

Chapter 2, OpenAI and ChatGPT: Beyond the Market Hype

1

Introduction to Generative AI

Hello! Welcome to Practical Generative AI with ChatGPT! In this book, we will explore the fascinating world of generative artificial intelligence (AI) and its groundbreaking applications, with a particular focus on ChatGPT.

Generative AI has transformed the way we interact with machines, enabling computers to create, predict, and learn without explicit human instruction. Since the launch of OpenAI’s ChatGPT in November 2022, we have witnessed unprecedented advances in natural language processing, image and video synthesis, and many other fields. Whether you are a curious beginner or an experienced practitioner, this guide will equip you with the knowledge and skills to effectively navigate the exciting landscape of generative AI. So, let’s dive in, starting with some definitions that set the context for the rest of the book.

In this chapter, we focus on the applications of generative AI in fields such as image synthesis, text generation, and music composition, highlighting its potential to revolutionize entire industries with concrete examples and recent developments. Understanding the research journey that led to the current state of the art will also give you a grounding in the foundations of the latest models.

We will cover all of this through the following topics:

Introducing generative AI

Exploring the domains of generative AI

Main trends and innovations after 2 years of ChatGPT

Legal and ethical landscape of generative AI

By the end of this chapter, you will be familiar with the exciting world of generative AI, its applications, the research history behind it, and the current developments that could have – and are currently having – a disruptive impact on businesses.

Introducing generative AI

Generative AI is an exciting branch of AI that focuses on creating new content, such as text, images, music, or even videos, that is often indistinguishable from something made by humans.

To understand where it fits, let’s break it down:

AI: AI is the broad field that enables machines to mimic human-like tasks, such as decision-making or problem-solving.

Machine learning (ML): Within AI, ML refers to techniques where machines learn patterns from data to make predictions or decisions without being explicitly programmed. The process of learning is made possible by sophisticated mathematical models called algorithms.

Deep learning (DL): A subset of ML, DL uses complex algorithms inspired by the human brain to process large amounts of data and recognize intricate patterns. Because of their architecture – inspired by our brains and neural connections – these algorithms are called artificial neural networks.

Definition

An artificial neural network is a type of computer program designed to learn patterns by processing information in a way that’s inspired by the human brain. Instead of following strict, step-by-step rules, it uses interconnected “nodes” (like virtual brain cells) that work together and adjust their connections over time. By repeatedly reviewing examples, it gradually improves at tasks like recognizing images, understanding speech, or predicting outcomes—all without needing explicit instructions for each step.
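To make the idea of a node that "adjusts its connections over time" concrete, here is a minimal sketch of a single artificial node (a perceptron) learning the logical OR function from four examples. It is an illustration only, not code from the book: the function names, the learning rate, and the number of passes over the data are all assumptions chosen for the example, and real neural networks combine many such nodes and use more sophisticated update rules.

```python
# One artificial "node" (a perceptron) learning the logical OR function.
# All names and settings here are illustrative assumptions.

def predict(weights, bias, inputs):
    """Fire (return 1) if the weighted sum of the inputs is positive."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(examples, epochs=20, learning_rate=0.1):
    """Repeatedly review the examples, nudging the weights after each mistake."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            # Strengthen or weaken each connection in proportion to the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Each example is (inputs, expected output) for logical OR.
OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train_perceptron(OR_EXAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in OR_EXAMPLES]
print(predictions)  # [0, 1, 1, 1]
```

Nothing in the code spells out the OR rule itself; the node arrives at it only by correcting its own mistakes on the examples.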

Generative AI emerges from DL and uses specialized algorithms to generate something entirely new based on what it has learned from existing data. For example, a generative AI model trained on thousands of paintings could create brand-new art that blends different styles or themes.

The following figure shows how these areas of research are related to each other:

Figure 1.1: Relationship between AI, ML, DL, and generative AI

Generative AI models are trained on vast amounts of data and can then generate new examples based on users’ requests. The game-changing element is that these requests are made in the easiest way possible: using natural language. The models that understand and generate natural language in this way are called large language models (LLMs).

Definition

LLMs are a type of artificial neural network built on a particular architecture called the “Transformer.” They are characterized by a huge number of parameters (on the order of billions) and have been trained on billions of words. From this training data, LLMs learn to infer language patterns and intents in user queries and to generate natural language responses.

The possibility of interacting in natural language with LLMs is disruptive, and a whole new science has been born around that activity. This science is called “prompt engineering,” named after the term “prompt,” which we are going to cover in Chapter 3.

Definition

A prompt is the specific text, question, or description you provide to a generative AI model to guide it toward producing the kind of output you want—whether that’s a helpful explanation, a creative story, or a detailed solution. How you phrase the prompt can greatly affect the AI’s response. This practice of carefully designing and refining prompts, often called “prompt engineering,” involves experimenting with different word choices, instructions, and formats to improve both the quality and accuracy of the AI’s output. By learning how to craft effective prompts, you help ensure the AI more consistently gives you results that are useful, engaging, and aligned with your goals.
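As a small illustration of how the same request can be phrased in different ways, the following sketch assembles several prompt variants from one task. It only builds strings and does not call any model. The helper `build_prompt` and its parameters are hypothetical names introduced for this example, not part of any OpenAI library.

```python
# Building prompt variants for the same task.
# build_prompt and its parameters are hypothetical, for illustration only.

def build_prompt(task, audience=None, output_format=None):
    """Combine a task with optional audience and format instructions."""
    parts = [task.strip()]
    if audience:
        parts.append(f"Explain it for this audience: {audience}.")
    if output_format:
        parts.append(f"Format the answer as: {output_format}.")
    return " ".join(parts)


variants = [
    build_prompt("What is a proton?"),
    build_prompt("What is a proton?", audience="a 5-year-old child"),
    build_prompt(
        "What is a proton?",
        audience="a physics undergraduate",
        output_format="three bullet points",
    ),
]

for prompt in variants:
    print(prompt)
```

Sending each variant to a model and comparing the answers is one simple way to see how word choice, instructions, and format change the output.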

Even though text understanding and generation are probably among the most outstanding capabilities of generative AI, the field covers many domains, which we will explore next.

Domains of generative AI

In recent years, generative AI has made significant advancements and has expanded its applications to a wide range of domains, such as art, music, fashion, and architecture. In some of them, it is transforming the way we create, design, and understand the world around us. In others, it is improving existing processes and operations and making them more efficient.

For example, in the context of the pharmaceutical industry, generative AI is revolutionizing drug discovery by enabling the rapid design of novel therapeutic molecules, thereby significantly reducing development timelines and costs. By analyzing extensive datasets of chemical and biological information, generative AI models can identify promising drug candidates and predict their interactions within the human body. For instance, Insilico Medicine utilized generative AI to develop ISM001-055, a drug candidate for idiopathic pulmonary fibrosis, which progressed to Phase II clinical trials in 2023 (https://insilico.com/blog/first_phase2).

Another example is the way generative AI is revolutionizing game development by enabling the creation of dynamic and adaptive environments that respond to player actions, thereby enhancing immersion and replayability. By leveraging generative AI, developers can procedurally generate vast, ever-changing game worlds, ensuring that each playthrough offers a unique experience. This technology facilitates the creation of realistic non-playable characters (NPCs) with behaviors that adapt to player interactions, making game narratives more engaging. Additionally, generative AI streamlines the development process by automating asset creation, which reduces production time and costs.

As a result, developers can focus more on crafting innovative gameplay mechanics and rich storytelling, ultimately delivering more personalized and captivating gaming experiences (https://www.xcubelabs.com/blog/generative-ai-in-game-development-creating-dynamic-and-adaptive-environments/).

Lastly, generative AI can have a great impact on advertising and visual asset generation. For example, in March 2023, Coca-Cola launched the “Create Real Magic” platform (https://www.coca-colacompany.com/media-center/coca-cola-invites-digital-artists-to-create-real-magic-using-new-ai-platform), inviting digital artists worldwide to craft original artwork using iconic brand assets from its archives. Developed in collaboration with OpenAI and Bain & Company, this innovative platform combines the capabilities of GPT-4 and DALL-E, enabling users to generate unique pieces that blend Coca-Cola’s heritage with modern AI technology. Participants had the opportunity to submit their creations for a chance to be featured on Coca-Cola’s digital billboards in New York’s Times Square and London’s Piccadilly Circus, exemplifying the brand’s commitment to fostering creativity through cutting-edge technology. These are just a few examples of how generative AI can reshape business processes.

Now, the fact that generative AI is used in many domains also implies that its models can deal with different kinds of data, from natural language to audio or images. In the next section, we’ll explore how generative AI models address different types of data and domains.

Text generation

The evolution of text generation within AI has been a journey from early theoretical concepts to today’s sophisticated language models. The 1950s marked the formal inception of AI as a field, with pioneers like Alan Turing exploring machine intelligence. Early efforts in natural language processing (NLP) during the 1960s and 1970s led to programs such as ELIZA, which simulated conversation through pattern matching. The 1980s and 1990s saw the development of statistical models that improved language modeling by probabilistically predicting word sequences. The advent of ML algorithms during this period further advanced text generation capabilities.
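The statistical models mentioned above can be sketched in a few lines. The example below is a toy bigram model: it counts which word follows which in a tiny corpus and turns those counts into next-word probabilities. It is a simplified illustration under stated assumptions (the corpus, the function names, and whitespace tokenization are all invented for this example) and is not how modern LLMs are built.

```python
# A toy bigram language model: estimate the probability of the next word
# from how often it followed the current word in a small corpus.
# The corpus and function names are illustrative assumptions.
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts


def next_word_probabilities(model, word):
    """Turn the follower counts for `word` into probabilities."""
    followers = model[word]
    total = sum(followers.values())
    return {candidate: count / total for candidate, count in followers.items()}


corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
probabilities = next_word_probabilities(model, "the")
print(probabilities)  # "cat" gets probability 2/3 and "mat" gets 1/3
```

Repeatedly sampling a next word from these probabilities produces text one word at a time, which is the basic idea that later, far larger models refined.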

A significant breakthrough occurred in 2017 with the introduction of the Transformer architecture which, as aforementioned, is the framework that features today’s LLMs.

The distinctive trait of this new generation of models, which now dominates the generative AI landscape, is that once they are trained, they can be consumed, queried, and instructed in the easiest way possible. The introduction of LLMs marked a paradigm shift in AI, since no advanced skills are needed to benefit from them.

Today, one of the greatest applications of generative AI—and the one we are going to cover the most throughout this book—is its ability to produce new content in natural language. Indeed, generative AI models can be used to generate new coherent and grammatically correct text in different languages, such as articles, poetry, and product descriptions. They can also extract relevant features from text such as keywords, topics, or full summaries.

Here is an example of working with GPT-4o, one of the latest models released by OpenAI and available through ChatGPT:

Figure 1.2: Example of ChatGPT responding to a user’s query in natural language

As you can see, the model was not only able to answer my question with an explanation of what a proton is; it also adapted its style and jargon to a specific target audience – in my case, a 5-year-old child. This is remarkable since it paves the way for many scenarios of hyper-personalization that were not possible before. In the next chapters, we will cover many examples of that.

ChatGPT is the main focus of this book, and in the upcoming chapters, you will see examples that showcase this powerful application.

Now, we will move on to image generation.

Image generation

One of the earliest and most well-known examples of generative AI in image synthesis is the generative adversarial network (GAN) architecture introduced in the 2014 paper by I. Goodfellow et al., Generative Adversarial Networks. The purpose of GANs is to generate realistic images that are indistinguishable from real images. This ability has several interesting business applications, such as generating synthetic datasets for training computer vision models, generating realistic product images, and generating realistic images for virtual reality and augmented reality applications.

Then, in 2021, OpenAI introduced a new generative AI model in this field: DALL-E. Unlike GANs, DALL-E is designed to generate images from natural language descriptions, and it can produce a wide range of images. The main difference is that while GANs are often used to create or improve realistic images, models like DALL-E are ideal for visual creativity, turning any natural language description into an illustration.

DALL-E has great potential in creative industries such as advertising, product design, and fashion to create unique and creative images.

From its first release to the time of writing (December 2024), DALL-E has improved dramatically, as you can see in the following examples. Below is an artistic creation by DALL-E at the dawn of its life:

Figure 1.3: Images generated by DALL-E with a natural language prompt as input

Let’s now see what DALL-E 3, the most recent version of the model at the time of writing this book, can produce. Here, we will use Microsoft Image Creator, which is powered by DALL-E 3 and which you can try at https://copilot.microsoft.com/images/create.

Figure 1.4: Images generated by DALL-E 3 with a natural language prompt as input

It’s impressive to see how much this model has improved in less than 2 years. We are just scratching the surface of the massive improvements occurring at a fast pace.

Music generation

The first approaches to generative AI for music generation trace back to the 1950s, with research in the field of algorithmic composition, a technique that uses algorithms to generate musical compositions. In 1957, Lejaren Hiller and Leonard Isaacson created the Illiac Suite for String Quartet (https://www.youtube.com/watch?v=n0njBFLQSk8), the first piece of music entirely composed by AI. Since then, the field of generative AI for music has been the subject of ongoing research.

Among the developments of recent years, new architectures and frameworks have become widely known, such as the WaveNet architecture, introduced by Google’s DeepMind in 2016, which can generate high-quality audio samples, and the Magenta project, also developed by Google, which uses recurrent neural networks (RNNs) and other ML techniques to generate music and other forms of art.

Definition

RNNs are a type of neural network designed to process sequential data by retaining information from previous inputs through a loop-like structure. This allows them to recognize patterns and dependencies over time, making them ideal for tasks like language modeling, time-series prediction, and speech recognition.
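The loop-like structure described above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration with a single hidden unit and hand-picked weights (w_x, w_h, and b are assumptions, not trained values); real RNNs use weight matrices learned from data.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step for a single hidden unit:
    h = tanh(w_x * x + w_h * h_prev + b)."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = 0.0  # initial hidden state: no memory yet
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)  # the new state depends on the old one
        states.append(h)
    return states


# The same trailing inputs (zeros) produce different states depending on
# what came earlier in the sequence.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])
print(states_a)
print(states_b)
```

In the first sequence, the hidden state stays above zero even after the input returns to zero, because each step reuses the previous state. That carried-over state is the "memory" that lets an RNN pick up dependencies over time.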

In 2020, OpenAI also announced Jukebox, a neural network that generates music when provided with genre, artist, and lyrics as input.

These and other frameworks became the foundations of many AI composer assistants for music generation. An example is Flow Machines, developed by Sony CSL Research. This generative AI system was trained on a large database of musical pieces to create new music in a variety of styles. It was used by French composer Benoît Carré to compose an album called Hello World (https://www.helloworldalbum.net/), which features collaborations with several human musicians.

Here, you can see an example of a track generated entirely by Music Transformer, one of the models within the Magenta project:

Figure 1.5: Music Transformer allows users to listen to musical performances generated by AI (https://magenta.tensorflow.org/music-transformer)

Another incredible application of generative AI within the music domain is speech synthesis. This refers to AI tools that can create audio based on text inputs in the voices of well-known singers.

For example, if you have always wondered how your songs would sound if Lady Gaga performed them, well, you can now fulfill your dreams with tools such as FakeYou Text to Speech (https://fakeyou.com/tts) or UberDuck.ai (https://uberduck.ai/)!

Figure 1.6: Text-to-speech synthesis with fakeyou.com

The results are really impressive! If you want to have fun, you can also try voices from your favorite cartoons, such as Winnie the Pooh. The only thing you need to do is input the text of the song you want your favorite voice to sing aloud.

Let’s go even further. What if we could generate a song from scratch, just asking the generative AI to do that for us in natural language? Well, we can do that seamlessly today and without any knowledge about music. Among the generative AI products that are rising in the music market today is Suno, whose mission is “[...]building a future where anyone can make great music. Whether you’re a shower singer or a charting artist, we break barriers between you and the song you dream of making. No instrument needed, just imagination. From your mind to music.” (source: https://suno.com/about).

Figure 1.7: Example of an entire song generated by Suno.com from a description in natural language

As you can see, on the left-hand side of the picture, I provided a very brief song description in natural language – this was my prompt. From that, the model was able to generate not only the title and lyrics of a song (on the right-hand side) but also the music!

Can you believe that it became my summer 2024 hit? If you want to create your summer hit too, you can try it for free at https://suno.com/create.

Video generation

Generative AI for video generation shares a similar development timeline with image generation. One of the key advances in this field has been the rise of GANs. Thanks to their accuracy in producing realistic images, researchers have started to apply this technique to video generation as well. One of the most notable examples of GAN-based video generation is DeepMind’s Veo, which generates high-quality videos from a single image and a sequence of motions. Another great example is NVIDIA’s video-to-video synthesis (Vid2Vid) DL-based framework, which uses GANs to synthesize high-quality videos from input videos.

The Vid2Vid system can generate temporally consistent videos, meaning that they maintain smooth and realistic motion over time. The technology can be used to perform a variety of video synthesis tasks, such as the following:

Converting videos from one domain into another (for example, converting a daytime video into a nighttime video or a sketch into a realistic image)

Modifying existing videos (for example, changing the style or appearance of objects in a video)

Creating new videos from static images (for example, animating a sequence of still images)

In September 2022, Meta’s researchers announced Make-A-Video (https://makeavideo.studio/), a new AI system that allows users to convert their natural language prompts into video clips. Behind this technology, you can recognize many of the models that we mentioned in other domains: language understanding for the prompt, image generation models for the visuals and their motion, and background music made by AI composers.

Now, everything we’ve mentioned above pales in comparison to the latest text-to-video models. To name one, OpenAI announced a text-to-video model called SORA