Generative AI - Martin Musiol - E-Book

Martin Musiol

Description

An engaging and essential discussion of generative artificial intelligence

In Generative AI: Navigating the Course to the Artificial General Intelligence Future, celebrated author Martin Musiol—founder and CEO of generativeAI.net and GenAI Lead for Europe at Infosys—delivers an incisive and one-of-a-kind discussion of the current capabilities, future potential, and inner workings of generative artificial intelligence. In the book, you'll explore the short but eventful history of generative artificial intelligence, what it's achieved so far, and how it's likely to evolve in the future. You'll also get a peek at how emerging technologies are converging to create exciting new possibilities in the GenAI space.

Musiol analyzes complex and foundational topics in generative AI, breaking them down into straightforward and easy-to-understand pieces. You'll also find:

  • Bold predictions about the future emergence of Artificial General Intelligence via the merging of current AI models
  • Fascinating explorations of the ethical implications of AI, its potential downsides, and the possible rewards
  • Insightful commentary on Autonomous AI Agents and how AI assistants will become integral to daily life in professional and private contexts

Perfect for anyone interested in the intersection of ethics, technology, business, and society—and for entrepreneurs looking to take advantage of this tech revolution—Generative AI offers an intuitive, comprehensive discussion of this fascinating new technology.

The e-book can be read in Legimi apps or any other app that supports the following format:

EPUB

Page count: 550

Publication year: 2024




Table of Contents

Cover

Table of Contents

Title Page

Introduction

CHAPTER 1: AI in a Nutshell

What Is AI?

What Is Discriminative AI?

What Is Generative AI?

Note

CHAPTER 2: Innovative Approaches for High-Quality Data Generation

Why Generative Models?

From Birth to Maturity: Tracing the Development of Generative Models

GANs: The Era of Modern Generative AI Begins

From Pixels to Perfection: The Evolution of AI Image Generation

A Crucial Tech Disruption: Text Generation

Tech Triumphs in Text Generation

Notes

CHAPTER 3: Generative AI's Broad Spectrum of Applications

Foundational and Specialized AI Models, and the Question of Open Source vs. Closed Source

Application Fields

The Untapped Potential of Generative AI

Notes

CHAPTER 4: Generative AI's Exponential Growth

The Growth Pattern of New Technologies—The S-Curve

Technological Convergence

Exponential Progress in Computing

Exponential Growth in Data

Exponential Patterns in Research, Development, and Financial Allocations

Requirements for Growth

CHAPTER 5: Ethical Concerns and Social Implications of Generative AI

Intellectual Property and the Generative AI Platform

Bias and Fairness in AI-Generated Data

Misinformation and Misuse of Generative AI

Privacy, Safety, and Security

Generative AI's Impact on Jobs and Industry

The Dependency on AI

Environmental Concerns

AI Oversight and Self-Regulation

On a Positive Note

Notes

CHAPTER 6: Artificial General Intelligence in Sight

What Is Next in Generative AI?

Scaled Utilization of AI: Autonomous AI Agents

Embodiment of AGI: (Humanoid) Robots

The Human Potential Is Boundless; Optimism Helps

Acknowledgments

About the Author

Index

Copyright

Dedication

End User License Agreement

List of Illustrations

Chapter 1

FIGURE 1.1 The relationship between AI, ML, and DL

FIGURE 1.2 In supervised training of an ML model, two main steps are involved...

FIGURE 1.3 Prediction mode in a supervised ML model.

FIGURE 1.4 In ML, the concept of classification involves assigning data to o...

FIGURE 1.5 In regression, data like house details go into the ML model, whic...

FIGURE 1.6 Clustering model identifying buying patterns

FIGURE 1.7 Dimensionality reduction

FIGURE 1.8 Technical workings of reinforcement learning models

FIGURE 1.9 Exploration versus exploitation in RL training over time

Chapter 2

FIGURE 2.1 Representation of a discriminative model, showing how it distingu...

FIGURE 2.2 Representation of a generative model, highlighting the joint prob...

FIGURE 2.3 A conversation with the ELIZA chatbot.

FIGURE 2.4 Boltzmann machine concept

FIGURE 2.5 Deep Blue, a computer similar to this one, defeated chess world c...

FIGURE 2.6 Garry Kasparov.

FIGURE 2.7 Concept of restricted Boltzmann machines.

FIGURE 2.8 A deep belief network.

FIGURE 2.9 The autoencoder architecture.

FIGURE 2.10 The variational autoencoder architecture.

FIGURE 2.11 The generative adversarial network architecture.

FIGURE 2.12 Training of CLIP.

FIGURE 2.13 A probability diagram of a Markov chain for text generation. Eac...

FIGURE 2.14 A recurrent neural network unrolled.

FIGURE 2.15 A standard RNN unit.

FIGURE 2.16 An LSTM unit

FIGURE 2.17 The big picture perspective of a Seq2Seq model.

FIGURE 2.18 The different stages of receiving a desired LLM output.

FIGURE 2.19 Few-shot prompting.

FIGURE 2.20 Zero-shot prompting.

FIGURE 2.21 Self-consistency prompting: same question asked multiple times, ...

FIGURE 2.22 Generated knowledge prompting structure.

FIGURE 2.23 Directional stimulus prompting.

FIGURE 2.24 ReAct prompting.

FIGURE 2.25 Chinchilla scaling in table.

FIGURE 2.26 The multimodal capabilities of GPT-4 allow it to comprehend the ...

FIGURE 2.27 GPT-4 performance on simulated exams. Additional visual information helps th...

FIGURE 2.28 Steerability example of GPT-4 as a Socratic tutor.

FIGURE 2.29 The Alpaca model development process: starting with a seed set o...

Chapter 3

FIGURE 3.1 From foundation models to serving specific tasks.

FIGURE 3.2 Preliminary generative AI tech stack.

FIGURE 3.3 Tweet from Elon Musk about OpenAI turning from a nonprofit to a f...

FIGURE 3.4 Hugging Face's LLM-Leaderboard, mapping performances for various ...

FIGURE 3.5 Airbus APWorks launches the Light Rider, the world's first 3D-pri...

FIGURE 3.6 The Elbo chair, an exemplar of generative design and additive man...

FIGURE 3.7 Midjourney prompt: “Architecture futuristic city designed from pa...

FIGURE 3.8 AlphaFold's predictive power.

FIGURE 3.9 The exponential growth of the protein database, now encompassing ...

FIGURE 3.10 An overview of how to build an LLM agent, its structure, classes...

FIGURE 3.11 Using a single command to generate a plot from the data containe...

FIGURE 3.12 The rapid expansion of ChatGPT plug-ins, with over 100 unique pl...

FIGURE 3.13 (3D) U-Net: The 3D U-Net is an extension of the U-Net, designed ...

FIGURE 3.14 Market share distribution of cloud service providers.

FIGURE 3.15 The generator component of 3D generative adversarial networks....

FIGURE 3.16 A highly detailed stone bust of Theodoros Kolokotronis.

FIGURE 3.17 The NVIDIA Picasso service structure, showcasing the integration...

FIGURE 3.18 Real images are augmented using a publicly available off-the-she...

FIGURE 3.19 AugGPT's structure involves: (a) using ChatGPT for data augmenta...

FIGURE 3.20 The dominant form of data employed in AI will shift toward synth...

FIGURE 3.21 How to come up with your generative AI idea in this dynamic AI m...

Chapter 4

FIGURE 4.1 The life cycle of innovation: the S-curve.

FIGURE 4.2 The evolution of innovation: successive waves of technological ad...

FIGURE 4.3 Moore's law in action: a logarithmic scale representation of the ...

FIGURE 4.4 A quantum computer's intricate design: the loops, which straighte...

FIGURE 4.5 The AutoML workflow: an overview of automated ML's end-to-end pro...

FIGURE 4.6 GitHub Copilot at work: seamlessly providing Python code suggesti...

FIGURE 4.7 Digital storage units, from bytes to zettabytes.

FIGURE 4.8 Annual global data generation: Historical trends and projections ...

FIGURE 4.9 Annual global data generation: Historical trends and projections ...

FIGURE 4.10 Evolution of real vs. synthetic data ratios over time.

FIGURE 4.11 ARK Investment Management's projections: a tale of two futures....

Chapter 5

FIGURE 5.1 Deepfakes can be nearly indistinguishable from authentic images: ...

FIGURE 5.2 Jobs least likely to be automated by AI

Chapter 6

FIGURE 6.1 A simple two-step prompt unfolding the horizon of endless ideas a...

FIGURE 6.2 ImageBind unveils a realm of possibilities, including the innovat...

FIGURE 6.3 Autonomous AI agents framework.

FIGURE 6.4 Star history of one of the first AI agent repositories. Going vir...

FIGURE 6.5 A high-level diagram of LangChain capabilities.

FIGURE 6.6 SuperAGI's Marketplace.

FIGURE 6.7 Orchestrate, automate, and optimize complex LLM workflows with cu...

FIGURE 6.8 The listening and speaking screen of Pi.

FIGURE 6.9 A moment where technology transcends code, entering a realm of co...

FIGURE 6.10 Optimus at an exhibition in 2023.



Praise for Generative AI

“Cutting through the clutter, Martin Musiol explains generative AI with great insight and clarity. The reader is left with a clear understanding of the technology, without the need to master complex mathematics or code. A must read for those who want to understand the future.”

—Rens ter Weijde, Chairman & CEO of KIMO.AI

“An illuminating guide through the evolving landscape of generative AI and AGI, this book masterfully demystifies complex concepts, making them accessible to all and igniting the imagination about the boundless possibilities of the future.”

—David Foster, author of Generative Deep Learning, Partner at Applied Data Science Partners

“This book is a must-read for anyone wanting to improve their understanding of where AI has come from, where it stands today, and, importantly, where it is heading. The advent of AGI and ASI is too important not to understand, and Martin meticulously explains many potential outcomes with a factual and unbiased perspective.”

— Roy Bhasin (Zeneca), author, entrepreneur, angel investor

“Highly recommended. Musiol deeply and expertly demonstrates how to navigate the complex, exhilarating, and essential landscape of generative AI.”

— Katie King, published author, CEO of AI in Business

“Generative AI by Martin Musiol offers a comprehensive overview of the GenAI technology and skillfully demystifies complex concepts of this transformative AI.”

— Sheamus McGovern, entrepreneur, investor, Founder & CEO Open Data Science

“Martin, my esteemed former colleague and an AI expert, has authored this crucial book designed for anyone seeking to enhance their knowledge of generative AI, autonomous AI agents, and AGI. Turning complex subjects into compelling and easily comprehensible material, this book is invaluable for business applications and everyday life.”

— Martin Weis, Country Head Switzerland & Global Co-Lead AI, Analytics & Automation at Infosys Consulting

“Martin's book masterfully encapsulates the transformative power of AI and provides great foundational knowledge for innovators and builders to explore the industry further.”

— Anton Volovyk, Co-CEO Reface (GenAI app, 250m downloads, backed by a16z)

“This book is akin to a comprehensive playbook, detailing strategies and rules for navigating the complex field of AI, much like a coach laying out a winning game plan. It masterfully presents the evolutionary stages, key players beyond ChatGPT, foundational technologies, and practical guidance, equipping readers to effectively 'play' and excel in the dynamic and competitive arena of AI.”

— Dr. Harald Gunia, Leader for Applied Artificial Intelligence Europe at Infosys Consulting

“Martin Musiol's book on generative AI provides a compelling narrative that unveils the meticulous evolution of this groundbreaking technology. From the quiet simmering of its inception, to the carefully curated recipe of technological advancements that propelled it to unprecedented heights, Musiol carefully peels back the layers, revealing the pivotal factors that shaped the rise of generative AI.”

— Matteo Penzo, Co-Founder & CEO of zicklearn.com

“Martin's book offers deep insights and a comprehensive overview that makes this complex subject accessible to all readers.”

—Prof. Dr. Patrick Glauner

“This book is a must-read for anyone like me captivated by artificial intelligence's present and future implications.”

—Catherine Adenle, Senior Director, Global Employer Brand, Elsevier, top 22 AI and tech influencer

Generative AI

Navigating the Course to the Artificial General Intelligence Future

 

Martin Musiol

 

 

 

 

 

 

Introduction

In the realm of technology, epochs of transformation are often ignited by the spark of human imagination, fused with the finesse of engineering artistry. We stand at the precipice of such an epoch, where the realms of generative AI unfurl into the once uncharted territories of artificial general intelligence (AGI). I am both thrilled and humbled to be your guide on this expedition into the future, a journey that begins with the pages of this book.

The technological zeitgeist of our times is one of exponential progress. A mere glimpse into the recent past reveals the embryonic stages of generative AI, yet, within a fleeting span, advancements like ChatGPT have marked a point of no return. This crescendo of innovation is not confined to textual realms alone but spans across images, videos, 3D objects, datasets, virtual realities, code, music, and sound generation, each stride accelerating our pace toward the enigmatic horizon of AGI. The rapid maturation and adoption of generative AI outshine the evolutionary arcs of many preceding technologies.

It was during the cusp of this book's creation that the concept of autonomous AI agents morphed into a tangible reality, courtesy of emerging open source frameworks. Now, a subscription away, the first AI agents are at our beck and call. This swift progression, magnifying the efficiency of AI model development, underscores the urgency and the timeliness of delving into the discourse this book intends to foster. As you traverse through its chapters, you'll realize we are merely at the dawn of an exhilarating technological epoch with a vast expanse yet to be unveiled.

Who should venture into this exploration? Whether you're a technology aficionado, a student with a zest for the unknown, a policymaker, or someone who's merely curious, this book beckons. No prior acquaintance with AI or machine learning is required; your curiosity is the sole ticket to this expedition. As we commence, we'll demystify the essence of AI, its lexicon, and its metamorphosis over time. With each page, we'll delve deeper, yet the narrative is crafted to foster an understanding, irrespective of your prior knowledge. By the narrative's end, your imagination will be aflame with the boundless possibilities that the future holds.

The narrative arc of this book has been meticulously crafted to offer not merely an understanding but a profound insight into generative AI and its trajectory toward AGI. Our expedition begins with the rudiments of AI, tracing its evolution and the brilliant minds that propelled it forward. As we delve into the heart of generative AI, we'll explore its broad spectrum of applications, unraveling potential startup ideas and pathways to venture into this domain. The discussion will then turn to the convergence of diverse technological realms, each advancing exponentially toward a shared zenith. Ethical and social considerations, indispensable to this discourse, will be deliberated upon before we venture into the realms of AGI, humanoid and semi-humanoid robotics, and beyond. Through the annals of my experience, including my tenure as the generative AI lead for EMEA at Infosys Consulting, we'll traverse through real-world scenarios, albeit veiled for confidentiality, offering a pragmatic lens to envision the theoretical discourse.

What sets this narrative apart is not merely the content, but the vantage point from which it is observed. My journey, from advocating generative AI since 2016, founding GenerativeAI.net in 2018, to now sharing a platform with luminaries at the AI Speaker Agency, has been nothing short of exhilarating. It's through the crucible of real-world implementations and continuous discourse with global thought leaders that the insights within this book have been honed. Our conversations, a confluence of diverse perspectives, have enriched the narrative, making it a distillation of collective wisdom.

A treasure trove of knowledge awaits to equip you to navigate the complex yet exhilarating landscape of generative AI and AGI. The ethos of this narrative is to empower you to become a 10X more effective human, to harness the tools that propel you forward, and should a spark of an idea ignite within, to pursue it with vigor. Things can be figured out along the way, especially in this era equipped with generative AI tools. Remember, AI in itself won't replace us, but those wielding AI effectively certainly will have an edge.

In the words of British physicist David Deutsch, our civilization thrives on technological growth, and it's our prerogative to strive for a better future. This book is a stepping stone toward that endeavor, and I invite you to step into the future, one page at a time.

How to Contact the Publisher

If you believe you've found a mistake in this book, please bring it to our attention. At John Wiley & Sons, we understand how important it is to provide our customers with accurate content, but even with our best efforts an error may occur.

In order to submit your possible errata, please email it to our Customer Service Team at [email protected] with the subject line “Possible Book Errata Submission.”

How to Contact the Author

I appreciate your input and questions about this book! Feel free to contact me at the following:

Martin Musiol's email: [email protected]

Martin's LinkedIn profile: www.linkedin.com/in/martinmusiol1

GenerativeAI.net's web page: https://generativeai.net

CHAPTER 1: AI in a Nutshell

No other field of technology has such inconsistent jargon as artificial intelligence (AI). From mainstream media to tech influencers to research scientists, each layer of media has contributed to that confusion. In order of their degree of contribution and frequency, I observed mainstream media simplifying and misusing terms consistently, tech influencers misunderstanding the tech in depth, and even some research scientists over-complicating their model findings with fancy terms. By no means do I intend to criticize research scientists. They are the backbone of everything discussed in this book. Their work offers solutions to a plethora of problems, making AI the umbrella term for almost every intelligent problem. However, its interdisciplinary nature, the rapid advancements in this space, and AI's general complexity make it already difficult to gain a clear understanding of this field. I am convinced that consistent and clear language would make this field easier to understand.

We can see two broad classes in AI: generative AI, the subject of this book, and discriminative AI. The latter is the traditional and better-known part of AI. Before delving into both AI classes, let's take a moment to understand the broader picture of AI, machine learning (ML), deep learning (DL), and the process of training models, to avoid getting ahead of ourselves.

What Is AI?

Even though AI includes a broad spectrum of intelligent code, the term is often incorrectly used. Figure 1.1 shows how AI, ML, and DL are related. ML, a part of AI, learns from data. DL, a deeper part of ML, uses layered setups to solve tougher problems. Non-self-learning programs like expert systems don't learn from data, unlike ML and DL. We'll explore these more next.

FIGURE 1.1 The relationship between AI, ML, and DL

How AI Trains Complex Tasks

AI can perform tasks ranging from predefined expert answers, also known as expert systems, to tasks that require human-level intelligence. Think about recognizing speech and images, understanding natural language (natural language processing, or NLP), making sophisticated decisions, and solving complex problems. For tasks like these, the AI has to train on a respective dataset until it is able to perform the desired activity as well as possible. This self-learning part of AI is referred to as machine learning (ML). Because most of the interesting applications happen through machine learning in one way or another, and to keep things simple, we use AI and ML interchangeably.

To make it tangible, we are designing an AI system that rates the cuteness of cats from 5 (absolutely adorable) to 1 (repulsively inelegant). The ideal dataset would consist of pictures of cute kittens, normal cats, and those half-naked grumpy cats from the Internet. Further, for classifying pictures in a case like this, we would need labeled data, meaning a realistic rating of the cats. The model comes to life through three essential steps: training, validation, and evaluation.

In training, the model looks at each picture, rates it, compares it with the actually labeled cuteness of the cat, and adjusts the model's trainable parameters for a more accurate rating next time—much like a human learns by strengthening the connections between neurons in the brain. Figure 1.2 and Figure 1.3 illustrate training and prediction, respectively.

Throughout the training process, the model needs to make sure training goes in the right direction—the validation step. In validation, the model checks the progress of the training against separate validation data. As an analogy, when we acquire a skill like solving mathematical problems, it makes sense to test it in dedicated math exams.

After training has been successfully completed and respective accuracy goals have been reached, the model enters the prediction or evaluation mode. The trainable parameters are not being adjusted anymore, and the model is ready to rate all the cats in the world.

FIGURE 1.2 In supervised training of an ML model, two main steps are involved: predict the training data point, then update the trainable parameters meaningfully based on the prediction's accuracy.

FIGURE 1.3 Prediction mode in a supervised ML model.
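The three steps described above can be sketched in a few lines of Python. Everything here is illustrative: a real cuteness rater would learn from pixels with a deep network, whereas this toy model uses a single invented "fluffiness" feature and gradient descent on squared error.

```python
def predict(w, b, fluffiness):
    """Forward pass: map a feature to a cuteness score."""
    return w * fluffiness + b

def train(data, epochs=2000, lr=0.05):
    """Fit the trainable parameters by gradient descent on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for fluffiness, label in data:
            error = predict(w, b, fluffiness) - label  # compare with the label
            w -= lr * error * fluffiness               # adjust the parameters...
            b -= lr * error                            # ...toward a better rating
    return w, b

def mean_abs_error(w, b, data):
    """Metric used for both validation and final evaluation."""
    return sum(abs(predict(w, b, f) - y) for f, y in data) / len(data)

# Labeled data: (invented "fluffiness" feature, human cuteness rating 1-5)
train_data = [(0.9, 5), (0.8, 5), (0.5, 3), (0.2, 2), (0.1, 1)]
val_data = [(0.7, 4), (0.3, 2)]

w, b = train(train_data)                    # training step
val_error = mean_abs_error(w, b, val_data)  # validation step on held-out data
```

The key point is that the validation data never influences the parameter updates; it only tells us whether training is going in the right direction.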

It is typical for a model in production that accuracy gets worse over time. The reason could be that the real-world data has changed. Maybe we are only looking at kittens, and they are all cute compared to our training data. Retraining the model, either whenever accuracy decreases or on a periodic schedule, tackles the discrepancy between the distribution of the training data and that of the evaluation data.

Perhaps you have a sense already that training AI models requires much more computing power than they need in prediction mode. To adjust the trainable parameters, often referred to as weights, we need to calculate the degree of adjustment carefully. This happens through a famous algorithm called backpropagation. It entails the backward propagation of prediction errors—the learning from mistakes made in the training process. The errors are traced back to the respective weights for improvement. This means that we go forward to predict a data point and backward to adjust the weights. In prediction mode, however, we don't adjust the weights anymore, but just go forward and predict. The function that was learned from the training data is simply applied, which is comparatively cheap.
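The forward/backward asymmetry can be illustrated with a toy network of two weights. This is only a sketch of the chain rule at work, not a real deep learning implementation: the error computed at the output flows backward, and each weight receives its share of the adjustment.

```python
def relu(z):
    """A common activation function: pass positives, zero out negatives."""
    return z if z > 0 else 0.0

def forward(w1, w2, x):
    h = relu(w1 * x)  # hidden activation
    y = w2 * h        # prediction
    return h, y

w1, w2, lr = 0.5, 0.5, 0.1
x, target = 1.0, 2.0  # we want the network to map 1.0 -> 2.0

for _ in range(100):
    h, y = forward(w1, w2, x)  # forward pass (cheap; this is all prediction needs)
    error = y - target         # prediction error at the output
    grad_w2 = error * h        # backward pass: the error reaches w2 first...
    grad_h = error * w2        # ...then propagates to the hidden activation...
    grad_w1 = grad_h * (x if w1 * x > 0 else 0.0)  # ...through the ReLU, to w1
    w2 -= lr * grad_w2         # adjust each weight by its share of the error
    w1 -= lr * grad_w1

_, y_final = forward(w1, w2, x)  # after training, the prediction is near 2.0
```

Prediction only ever runs the forward pass; the backward pass, with its gradient bookkeeping, is what makes training so much more expensive.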

Unsupervised Learning

When ML models reach a certain complexity by having many computing stages, called layers, we enter the realm of deep learning (DL). Most of the cutting-edge applications are at least partially drawing their algorithms from DL. Algorithms are step-by-step instructions for solving problems or performing tasks.

The preceding example of rating the cuteness of a cat was simplified drastically and didn't tell the whole story. A relevant addition is that because we train on labeled cat pictures, with the label being the cuteness of the cats, we call this supervised machine learning. With labels, we provide guidance or feedback to the learning process in a supervised fashion.

The counterpart of supervised ML is called unsupervised machine learning. The main difference between them is that in unsupervised ML the training data is not labeled. The algorithms have to find patterns in the data by themselves.

For example, imagine you have a dataset of customer purchases at a grocery store, with information about the type of product, the price, and the time of day. In AI these attributes are called features. You could use an unsupervised clustering algorithm to group similar purchases together based on these features. This could help the store better understand customer buying habits and preferences. The algorithm might identify that some customers tend to buy a lot of fresh produce and dairy products together, whereas others tend to purchase more processed foods and snacks. This information could be used to create targeted marketing campaigns or to optimize store layout and product placement.
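A minimal sketch of this idea uses k-means, one common unsupervised clustering algorithm, in plain Python. The purchase data and the choice of two clusters are invented for illustration; note that no labels are ever consulted.

```python
def kmeans(points, k, iterations=20):
    """Group points into k clusters by alternating assign/update steps."""
    centroids = points[:k]  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each purchase to its nearest centroid (squared distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # move each centroid to the mean of the points assigned to it
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, clusters

# Features per purchase: (basket price in euros, hour of day) -- invented data
purchases = [(3.0, 9), (4.5, 10), (2.5, 9), (12.0, 19), (10.5, 20), (14.0, 21)]
centroids, clusters = kmeans(purchases, k=2)
```

On this data the algorithm separates small morning purchases from larger evening ones entirely on its own, which is exactly the kind of buying-pattern insight the store is after.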

Comparing the performance of unsupervised learning applications to that of supervised learning applications is akin to contrasting boats with cars—they represent distinct methodologies for addressing fundamentally diverse problems. Nevertheless, there are several reasons why supervised learning methods reached success years earlier than unsupervised ones.

In supervised learning, the model is given a training dataset that already includes correct answers through labels. Understandably, this helpful information supports model learning. It also accurately outlines the AI model's intended objective. The model knows precisely what it is trying to achieve. Evaluating the model's performance is simpler than it is in unsupervised machine learning, as accuracy and other metrics can be easily calculated. These metrics help in understanding how well the model is performing.

With this information, a variety of actions can be taken to enhance the model's learning process and ultimately improve its performance in achieving the desired outcomes.

Unsupervised models face the challenge of identifying data patterns autonomously, which is difficult when no patterns are apparent or when there are many plausible ways to group the available data.

Generative AI a Decade Later

Generative AI predominantly employs unsupervised learning. Crafting complex images, sounds, or texts that resemble reasonable outputs, like an adorable cat, is a challenging task compared to evaluating existing options. This is primarily due to the absence of explicit labels or instructions.

Two main reasons explain why generative AI is taking off roughly a decade after discriminative AI. First, generative AI is mostly based on unsupervised learning, which is inherently more challenging. Second, generating intricate outputs in a coherent manner is much more complex than simply choosing between alternatives. As a result, generative AI's development has been slower, but its potential applications are now visible.

Between supervised and unsupervised learning, there are plenty of hybrid approaches. We could go arbitrarily deep into the nitty-gritty of these ML approaches, but because we want to focus on generative AI, it is better to leave it at that. If you want to dive deeper into the technicalities, I recommend the book Deep Learning (Adaptive Computation and Machine Learning series), by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (MIT Press, 2016), which covers ML and DL in great detail, laying the theoretical generative AI foundation. It is regarded as the best book in the space, which isn't surprising, given the authors. I will come back to those gentlemen later.

The AI landscape is vast and ever-expanding. In this book, I strike a balance between simplifying concepts for clarity and providing sufficient detail to capture the essence of recent AI advancements. To understand what generative AI is and its value proposition, we first have to understand the traditional part of AI, called discriminative AI.

What Is Discriminative AI?

Discriminative AI models made headlines long before large language models (LLMs) like ChatGPT by OpenAI and image generation models like Stable Diffusion by Stability AI entered the stage. Since the term “artificial intelligence” was coined by John McCarthy in 1955, discriminative models have yielded great results, especially in the past 15 years.

Discriminative AI focuses on algorithms that learn to tell apart different data classes. They recognize patterns and features unique to each class, aiming to link input features with labels for the output. This way, they can effectively classify instances into predefined groups, making it easier to distinguish one class from another. Discriminative AI has found numerous applications in various domains, including NLP, recommendations, and computer vision.

In the field of NLP, discriminative AI is used to classify text data into different categories, such as sentiment analysis or topic classification. In the domain of recommendations, discriminative AI is used to predict user preferences and make personalized product recommendations. In computer vision, discriminative AI is used to recognize objects and classify images based on their content. The applications of discriminative AI are vast and diverse, and its impact on various industries is immense.

Looking at existing applications, discriminative AI generally performs five main tasks: classification, regression, clustering, dimensionality reduction, and reinforcement learning. Understanding them is not crucial for following the book's thread, but it helps to grasp them conceptually, because the term “discriminative” and what it means in the context of AI then becomes apparent. Put simply, in one way or another, this part of AI decides, selects, distinguishes, or differentiates on the data or problem at hand.

Classification

The objective of classification is to accurately predict the class of new inputs based on prior training with labeled examples (Figure 1.4). This supervised learning process uses training examples accompanied by their respective class labels.

For instance, consider unlocking your phone with facial recognition. You initially show your face from various angles, allowing the classifier model to learn your appearance. Advanced face recognition systems, like the iPhone's Face ID, quickly identify you due to their extensive pretraining and incorporation of biometric information to deterministically classify users. In essence, the model or system of models assesses your face and discriminates whether you belong to the “person with access rights” or “person without access rights” class.

FIGURE 1.4 In ML, the concept of classification involves assigning data to one of a finite set of categories.

Classification has driven breakthroughs in diverse applications, including image classification, sentiment analysis, disease diagnosis, and spam filtering. These applications typically involve multiple processing steps and rely on deep learning techniques.
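To make the idea concrete, here is a minimal sketch of a classifier: a 1-nearest-neighbor model that assigns a new input to the class of its closest training example. The feature vectors, labels, and the `classify` helper are invented for illustration; real face recognition systems use deep networks over far richer features.

```python
import math

# Toy training set: (feature vector, class label).
# The 2D features stand in for whatever a real system would extract from a face.
training_data = [
    ((1.0, 1.2), "person_with_access"),
    ((0.9, 1.0), "person_with_access"),
    ((4.0, 4.5), "person_without_access"),
    ((4.2, 3.9), "person_without_access"),
]

def classify(point):
    """1-nearest-neighbor: predict the label of the closest training example."""
    nearest = min(training_data, key=lambda ex: math.dist(point, ex[0]))
    return nearest[1]

print(classify((1.1, 1.1)))  # close to the first group -> "person_with_access"
```

Even this trivial model captures the essence of classification: it discriminates between a finite set of predefined classes based on labeled examples.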

Regression

A regression model in AI is designed to predict numerical values for new inputs based on data it has learned from a given problem. In this case, the output is not a class label but a continuous value. For example, imagine you want to buy a 100-square-meter apartment with a balcony in Munich, Germany. A real estate agent presents three similar apartments, priced at 2 million, 2.5 million, and 2.7 million euros.

You have three options: the naive approach, where you assume these three properties represent the market; the informed approach, where you estimate market prices by researching multiple offers; or the data science approach, which involves building a machine learning model to determine a fair price by analyzing all available properties in the market with their price tags.

A well-trained regression model will give you a market-based and rational price, as it takes into account all the characteristics of apartments in the market (Figure 1.5), helping you make a more informed decision. By recommending a price, the model inherently has a discriminative nature.

FIGURE 1.5 In regression, data like house details go into the ML model, which then predicts its price based on these features.
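The apartment example can be sketched as a one-feature linear regression fitted by least squares. The listing sizes and prices below are hypothetical figures in the spirit of the example; a real model would use many more features and listings.

```python
# Fit a one-feature linear regression (price ~ slope * size + intercept)
# by ordinary least squares, using invented Munich listings.
sizes = [80.0, 100.0, 120.0]   # square meters
prices = [2.0, 2.5, 2.7]       # million euros

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
         / sum((x - mean_x) ** 2 for x in sizes))
intercept = mean_y - slope * mean_x

def predict_price(size_m2):
    """Predict a continuous value (price), not a class label."""
    return slope * size_m2 + intercept

print(round(predict_price(100.0), 2))  # market-based estimate for a 100 m2 flat
```

The output is a continuous number rather than a category, which is precisely what separates regression from classification.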

Clustering

As the name suggests, this application field of AI clusters data points. Whether people, groceries, or songs, items are grouped based on a similarity measure. By the way, you are being clustered all the time. For example, Internet ads are targeted to your digital persona, including your sex, age, IP address (which reveals your location), and all other data that ad-serving companies have collected about you. To put it plainly: if you use a service that recommends songs (Spotify), movies (Netflix), or products (Amazon), then you have been clustered. Clustering algorithms have played a crucial role in the success of big tech companies like these, as they are the backbone of every recommendation engine.

In clustering tasks, the data comes without labels. For instance, there are no labels on our heads indicating “prefers Ben & Jerry's Chubby Hubby.” Clustering models must identify patterns and groups autonomously, making it an unsupervised learning task. Moreover, the process of assigning items or personas to clusters is a decision-making aspect of discriminative AI. Figure 1.6 illustrates the conceptual operation of a clustering model. By analyzing other people's behavior, it infers that individuals who purchase butter and milk might also prefer cereals. Adding soda to the mix increases the likelihood of a preference for Ben & Jerry's Chubby Hubby.

FIGURE 1.6 Clustering model identifying buying patterns
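Conceptually, a clustering algorithm such as k-means groups unlabeled points by repeatedly assigning each point to its nearest centroid and then recomputing the centroids. This is a minimal sketch; the "shopping basket" coordinates are invented (say, spend on dairy versus spend on snacks), and real recommendation engines operate on far higher-dimensional data.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: group unlabeled points into k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k starting centroids
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Two visually obvious groups of "shoppers" (invented feature vectors).
points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)]
centroids, clusters = kmeans(points, k=2)
print([len(c) for c in clusters])  # the two natural groups of three shoppers
```

Note that no labels were provided anywhere: the grouping emerges purely from similarity, which is what makes clustering an unsupervised task.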

Dimensionality Reduction

Dimensionality reduction is not an application field of AI that is discussed much in mainstream media. It is rather research-heavy and often a means to achieve something greater, more efficiently.

Its primary purpose is to strip away low-information data, chiefly to make machine learning applications as efficient and effective as possible. By “low-information data,” I mean data that contains little to no meaningful insight for solving a problem. See Figure 1.7 for a visual representation.

FIGURE 1.7 Dimensionality reduction

Imagine that you have an extensive recipe book with hundreds of recipes. Each recipe has several ingredients, and some of them are similar. For example, many recipes might call for salt, pepper, and olive oil. If you were to list all the ingredients used in the book, it would be a long list with many similar items.

Now imagine that you want to make a simpler version of the recipe book that is easy to use on a daily basis. One way to do this is to group similar ingredients. For example, you could create a category called “seasonings” that includes salt, pepper, and other spices used in the recipes. You could also create a category called “cooking oils” that contains olive oil, vegetable oil, and so forth.

In the world of data science, the same thing happens. We might have a large dataset with many different features, and we want to simplify it to make it easier to work with. Dimensionality reduction techniques help us to do this by finding a way to represent the data with fewer features while still preserving essential information. They make it easier to analyze data, build models, or visualize data more understandably.

Naturally, the data is not labeled, and we don't know up front which features carry relevant information. In an unsupervised manner, the models must learn to distinguish which low-information data can be modified or truncated, and how. The models must decide, or discriminate, indicating that we are still in the realm of discriminative AI.
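One of the simplest dimensionality reduction techniques is variance thresholding: a feature whose values barely change carries little information and can be dropped. The sketch below uses an invented dataset; real pipelines often rely on more powerful methods such as principal component analysis, which combines features rather than merely discarding them.

```python
# Variance thresholding: drop features that carry almost no information.
# Rows are samples, columns are features (invented measurements).
data = [
    [1.0, 0.50, 3.1],
    [2.0, 0.50, 2.9],
    [3.0, 0.51, 3.0],
    [4.0, 0.50, 3.2],
]

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def reduce_features(rows, threshold=0.01):
    """Keep only the columns whose variance exceeds the threshold."""
    cols = list(zip(*rows))
    keep = [i for i, col in enumerate(cols) if variance(col) > threshold]
    return [[row[i] for i in keep] for row in rows], keep

reduced, kept = reduce_features(data)
print(kept)  # column 1 is nearly constant, so only columns 0 and 2 survive
```

The middle column is like the "salt, pepper, and olive oil" of the recipe analogy: it appears everywhere with almost the same value, so removing it loses essentially nothing.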

Reinforcement Learning

Reinforcement learning (RL) models, typically called agents, learn from positive or negative consequences that their actions yield in real-world or virtual environments. A positive consequence is a reward, and a negative consequence is a punishment. In Figure 1.8, the agent executes an action in a virtual/physical environment, altering the environment (even if minimally), and receives a reward or penalty based on its stated goal. During the training phase of the RL model, initial emphasis is on exploration to identify available paths (e.g., for warehouse navigation), gradually shifting to an exploitation phase for efficient goal achievement (or technically, maximizing rewards), as indicated in Figure 1.9.

FIGURE 1.8 Technical workings of reinforcement learning models

Virtual environments encompass a wide range of applications, from simulations for practicing real-world maneuvers to gaming experiences, and even stock market environments for trading agents. In gaming, AI has demonstrated remarkable superhuman abilities, excelling in games such as Super Mario. When an RL agent acts in a real-world environment, it is probably a robot in a warehouse or Boston Dynamics's Atlas performing ninja moves. The agents acquire the ability to determine the optimal action in a given situation, positioning them as a component of discriminative AI.

FIGURE 1.9 Exploration versus exploitation in RL training over time
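The reward-driven loop and the shift from exploration to exploitation can be sketched with tabular Q-learning, one of the simplest RL algorithms. The corridor environment below is invented: the agent starts at state 0 and must learn to walk right to reach the goal at state 4.

```python
import random

# Tabular Q-learning on a tiny corridor: states 0..4, the goal is state 4.
# Actions: 0 = step left, 1 = step right. Reaching the goal yields reward +1.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate

q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value table: q[state][action]
rng = random.Random(42)

def step(state, action):
    """Environment dynamics: move left or right; reward 1 on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

for _ in range(500):  # training episodes
    state = 0
    while state != GOAL:
        # Exploration vs. exploitation: occasionally try a random action.
        if rng.random() < EPSILON:
            action = rng.choice([0, 1])
        else:
            action = 0 if q[state][0] > q[state][1] else 1
        nxt, reward = step(state, action)
        # Q-update: nudge the estimate toward reward + discounted future value.
        q[state][action] += ALPHA * (reward + GAMMA * max(q[nxt]) - q[state][action])
        state = nxt

policy = ["left" if a[0] > a[1] else "right" for a in q[:GOAL]]
print(policy)  # the learned policy moves right toward the goal in every state
```

The agent is never told the route; it discovers, from rewards alone, that "right" is the best action in every state, which is exactly the decide-the-optimal-action behavior described above.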

Reinforcement learning has many exciting aspects, one of which is forming great synergies with generative AI. It was of little public interest for decades until its turning point in 2016, when AlphaGo by Google's DeepMind won a series of Go matches against the former world champion Lee Sedol. Go is a complex Chinese board game played on a 19×19 grid, giving it roughly 10^170 possible board configurations. For comparison, there are an estimated 10^80 atoms in the observable universe. RL not only plays complex games exceptionally well but also delivers on a variety of tasks, ranging from autonomous vehicles to energy management in buildings. More on the powerful collaboration between RL and generative AI later.

Additionally, RL is helping to advance our understanding of the learning process itself, leading to new insights into how intelligence works and how it can be developed and applied.

What Is Generative AI?

So far we have talked about discriminative AI, which can decide, distinguish, or discriminate between different options or continuous values.

Generative AI, however, is fundamentally different. It has the ability to generate all kinds of data and content. By learning the patterns and characteristics of given datasets, generative AI models can create new data samples that are similar to the original data.

Recent advancements, such as the mind-blowing creations of Midjourney's image generation, the early steps in video generation such as Meta's Make-A-Video, and the conversational abilities of ChatGPT, have completely altered the way we view AI. It is a fascinating field that revolutionizes the way we create products and interact with data.

Generally speaking, generative AI models can perform three tasks, each with a unique and exciting set of applications.

Data Generation

First, and most obviously, they can generate all kinds of data, including images, videos, 3D objects, music, voice, other types of audio, and also text—like book summaries, poems, and movie scripts. By learning the patterns and characteristics of given data, generative AI models can create new data samples that are similar in style and content to the original.
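A toy illustration of this learn-patterns-then-sample principle is a first-order Markov chain over characters: it learns which character tends to follow which from a tiny corpus, then samples new strings in the same style. Modern generative models are vastly more sophisticated, but the underlying idea of generating new data that mimics the training data is the same. The corpus and helper names here are invented.

```python
import random

# A tiny generative model: learn character-to-character transition patterns
# from a corpus, then sample new strings that mimic those patterns.
corpus = "banana bandana banana"
transitions = {}
for current, nxt in zip(corpus, corpus[1:]):
    transitions.setdefault(current, []).append(nxt)

def generate(start="b", length=8, seed=1):
    """Sample a new string by repeatedly choosing a likely next character."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        out.append(rng.choice(transitions.get(out[-1], [" "])))
    return "".join(out)

print(generate())  # a new string built from the corpus's letter patterns
```

The generated string never appears verbatim in the corpus, yet it is unmistakably "banana-like"—the hallmark of a generative model, scaled down to a few lines.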

Data Transformation

The second task of generative AI is to perform data transformations. This means transforming existing data samples to create new variations of them. Transformations can reveal new insights and create appealing outputs for various applications. For example, you can transform winter pictures into summer pictures or day pictures into night pictures. Translating an image from one domain (for example, summer) into another (winter) is called a domain transfer. Image style transformation involves taking an image, such as a photograph of your garden, and maintaining the content (i.e., the garden) while altering its appearance to resemble the artistic style of, say, Monet's paintings. This process, known as style transfer, is not limited to visual content like photos and videos but can also be applied to other data types like music, text, speech, and more. The essence of style transfer lies in preserving the original content while imbuing it with a distinct and recognizable, often artistic, flair.
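As a deliberately simplified illustration of the input-to-transformed-output idea, the sketch below "converts" a day image into a night image by darkening pixel values. Real domain and style transfer use learned models such as GANs rather than a fixed rule like this; the 2×2 grayscale "image" is invented purely for demonstration.

```python
# Toy "domain transfer": darken pixel intensities to turn a "day" image
# into a "night" image. Values are 0-255 grayscale intensities.
day_image = [[200, 180], [220, 240]]

def to_night(image, factor=0.3):
    """Scale every pixel down by a fixed factor to simulate low light."""
    return [[round(px * factor) for px in row] for row in image]

night_image = to_night(day_image)
print(night_image)  # [[60, 54], [66, 72]]
```

A learned domain-transfer model differs in that the mapping itself is discovered from paired or unpaired example images, not hand-coded—but the shape of the task (image in, transformed image out) is the same.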

Style transfer is more than just a delightful tool; it possesses the potential to significantly improve datasets for broader applications. For example, researchers from Korea and Switzerland have independently investigated the use of style transfer techniques to augment the segmentation of cancer cells in medical images using machine learning. This method, dubbed contextual style transfer, relies on the seamless integration of style-transferred instances within the overall image, ensuring a smooth and cohesive appearance—something that generative adversarial networks (GANs) are able to perform. In a fascinating study, Nvidia showcased a remarkable improvement in segmentation performance by incorporating synthetic data into the training set. This integration led to a leap from 64 percent to 82 percent in accuracy simply by augmenting the dataset, without modifying the machine learning pipeline in any way.

Data Enrichment

As already indicated with style transfer, the third task of generative AI is to enrich datasets, ultimately improving machine learning models. This involves generating new data samples similar to the original dataset to increase its size and diversity. By doing so, generative AI can help improve the accuracy and robustness of machine learning models.

Imagine we want to build a computer vision model that uses ML techniques to classify whether rare cancer cells are benign or malignant. As we are looking at a rare cancer type, it will be a small dataset to train on. In real-world scenarios, privacy issues are another data-diminishing factor. However, our neural net is data-hungry and we can't get the most out of its power, landing at 64 percent classification accuracy. Through generative AI, rare cancer images can be generated to create a larger and more diverse training dataset for improved detection performance.
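A minimal sketch of the enrichment idea is classic data augmentation: jittering existing samples with small random noise to enlarge a dataset. The measurements below are invented, and a real medical imaging pipeline would generate synthetic images with a trained generative model such as a GAN rather than simple noise injection—but the goal of growing a small training set is the same.

```python
import random

# Toy data augmentation: enlarge a small dataset by jittering existing samples.
rng = random.Random(0)
original = [(5.1, 3.5), (4.9, 3.0), (6.2, 3.4)]  # e.g., invented cell measurements

def augment(samples, copies=3, noise=0.05):
    """Return the originals plus `copies` noisy variants of each sample."""
    out = list(samples)
    for x, y in samples:
        for _ in range(copies):
            out.append((x + rng.uniform(-noise, noise),
                        y + rng.uniform(-noise, noise)))
    return out

enriched = augment(original)
print(len(original), "->", len(enriched))  # 3 -> 12
```

With four times the data, a downstream classifier has more variation to learn from, which is exactly the mechanism behind the accuracy gains described above.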

Overall, the capabilities of generative AI are truly remarkable, and the potential applications are vast and varied. AI limits are being pushed every day, not only by research but also by for-profit companies. This is especially true of generative AI.

If we zoom out further, we see that the overall concept of generative AI is even simpler. Models generate data based on some input. The complexity of the input can vary a lot. It could range from simple tasks, such as transforming a single digit like 6 into a handwritten image, to complex endeavors like applying domain transformations to a video.

Under the Radar No More: Picking Up Speed

What we often observe, especially in AI, is that a new tech approach has early roots, but has been in stealth mode for a couple of decades. Once sufficient advancements transpire in a related tech domain, the dormant technology awakens, delivering substantial value in real-world applications. This is recognized as technological convergence.

Deep Learning Tech Convergence with GPUs  The advent of deep learning, the underlying technology propelling fields such as computer vision and robotics, traces its roots back to 1967, when one of the first deep multilayer perceptron-style networks was conceived and introduced by two prominent Soviet scientists, Ivakhnenko and Lapa.1 For numerous decades, deep learning struggled to yield tangible business value and real-world applications. However, a transformative moment arrived with the emergence of graphics processing units (GPUs) at the onset of the 21st century.

GPUs first became popular in the gaming industry. In the late 1990s and early 2000s, video games grew increasingly complex and required more processing power to render high-quality graphics and animations. GPUs were initially developed to provide specialized processing for intricate 3D graphics and rendering in video games and other computer applications, with firms such as 3dfx, ATI, and Nvidia spearheading these advancements. The early 2000s then witnessed another significant development for GPUs: the introduction of parallel processing, enabling multiple calculations to be executed simultaneously.

This ability to compute large amounts of data breathed new life into deep learning, allowing it to gain traction and experience a surge in research popularity. Leveraging GPUs' enhanced capabilities, researchers and practitioners accelerated deep learning's potential, sparking a multitude of practical applications. Today, it's unimaginable to train a robust machine learning or deep learning model without the assistance of GPUs.

Deep learning has reaped the benefits of other advancements as well. The Internet's growth and technological innovations provided abundant data for training models, while dedicated research led to numerous breakthroughs in deep neural networks. This progress extends from convolutional neural networks achieving remarkable feats in image recognition to recurrent neural networks demonstrating advanced NLP capabilities. It's not just researchers who are passionate about the subject; capital allocators and profit-driven companies have also invested heavily in the field.

Incidentally, it's worth mentioning that we are now seeing, and will likely keep seeing, a similar rise in interest in generative AI. The growth of other areas, especially discriminative AI and computational power, along with the increasing amount of data, were crucial for generative models to evolve in the background.

Today, we see billions being invested in generative AI projects aimed at tackling a wide range of business and non-business applications, as long as people can imagine it. This growing focus on generative AI promises to bring even more transformative advancements in the near future, building on the foundation established by previous AI breakthroughs.

In today's attention economy, capturing the focus of individuals has become increasingly challenging, as attention itself is a scarce and valuable resource. The widespread adoption of the Internet, social media, and other digital technologies has led to an overwhelming influx of information and stimuli, all competing for our limited attention. Consequently, only groundbreaking technologies can truly stand out and capture the spotlight. For a long time, generative AI remained relatively obscure in this competitive landscape. However, recent advances and remarkable achievements have now propelled generative AI into prominence, showcasing its immense potential and securing its place at the forefront of technological innovation.

Generative AI's Early Impact  Generative AI is still quite new, but its future effects are expected to be amazing, going beyond what we've seen so far. Its influence can be noticed in many areas, but it has mainly made a difference in three sectors: creative industries, gaming, and natural language processing.

Creative Industries Generative AI has made a lasting impact on creative fields like art. This technology enables artists to create unique and inventive digital artworks. By studying patterns and styles in existing art, music, and fashion, AI algorithms can produce new content that matches market trends and engages audiences. In the world of music, these algorithms can generate original tracks or remix current ones, opening up fresh possibilities for both producers and artists.

The integration of generative AI has led to new business models in the creative industry, such as selling exclusive digital art or creating customized products using AI-generated designs. This growth has occurred alongside a technological convergence between AI and the rapidly expanding cryptocurrency landscape.

In the last eight years, the cryptocurrency world has seen incredible progress, with numerous coins quickly making some people wealthy and leaving others financially devastated. Decentralized finance and institutional adoption have drawn significant interest. However, the most far-reaching impact may come from non-fungible tokens (NFTs).

NFTs allow artists and creators to produce unique, verifiable digital assets, leading to a growing demand for imaginative, high-quality AI-generated art. While not the sole driving force behind advancements in image generation, the NFT market has undeniably accelerated progress in this area.

Gaming Industry The gaming industry has experienced a significant transformation due to generative AI, which has opened up possibilities for a variety of new game content, such as levels, characters, 3D objects, scenarios, and even entire quests. A notable example is Microsoft Flight Simulator, for which Microsoft partnered with Blackshark.ai to generate a photorealistic, three-dimensional world covering the whole Earth from two-dimensional satellite images.

The popularity of open-world concepts in gaming has encouraged many companies to adopt AI-generated content. Imagine AI algorithms that study player behavior and dynamically modify game difficulty or generate new content on the spot, leading to personalized and engaging gaming experiences. Consider the potential of giving non-player characters (NPCs) AI-driven language models for more captivating and immersive interactions. These advancements could make returning to the real world a challenge.

By using generative AI to create in-game items and environments more efficiently, gaming companies can allocate more time and resources to concentrate on core aspects, ensuring the production of intriguing and original content. The future of gaming, fueled by generative AI, is set to be an exciting and immersive adventure for players.

Natural Language Processing The third impact vertical is not a single industry per se but rather many industries.

Generative AI can be used to generate new content such as text, summaries, or translations. Large language models are at the forefront of generative AI applications, with widespread impacts across various industries. LLMs can improve operational efficiencies by automating repetitive internal processes and accelerating innovation through customer feedback analysis, insights, and market research. These models can also improve customer experiences with concise answers and summaries available 24/7. The potential for managing knowledge is perhaps one of the most significant aspects of AI systems; organizations with specialized knowledge can offer their expertise in a tailored and concise manner to end users.

Take the Mayo Clinic, for instance. Specializing in patient care, research, and education, the Mayo Clinic has amassed a wealth of data on medical conditions and treatments, such as patient records, research studies, and medical imaging data. They could create chatbots and virtual assistants that harness this data to provide expert guidance and advice to patients. By integrating these AI-driven tools into the Mayo Clinic's website or mobile app, patients could access expert medical advice from anywhere around the globe.

Language models don't just generate language, but also code, music, poetry, stories, jokes, captions, summaries, translations, recommendations, and much more. The fields will further broaden, with LLMs providing innovative solutions for businesses and society.

Generative AI is immensely exciting as it will undoubtedly revolutionize how we create, consume, and process content across all aspects of our lives. As the technology develops, we can expect further paradigm shifts, leading to groundbreaking advancements in industries worldwide.

Note

1. A. G. Ivakhnenko and Valentin Grigor'evich Lapa, Cybernetics and Forecasting Techniques, American Elsevier Publishing Company, 1967.