Coding with ChatGPT and Other LLMs

Dr. Vincent Austin Hall
Description

Keeping up with the AI revolution and its application in coding can be challenging, but with guidance from AI and ML expert Dr. Vincent Hall—who holds a PhD in machine learning and has extensive experience in licensed software development—this book helps both new and experienced coders to quickly adopt best practices and stay relevant in the field.
You’ll learn how to use LLMs such as ChatGPT and Gemini to produce efficient, explainable, and shareable code and discover techniques to maximize the potential of LLMs. The book focuses on integrated development environments (IDEs) and provides tips to avoid pitfalls, such as bias and unexplainable code, to accelerate your coding speed. You’ll master advanced coding applications with LLMs, including refactoring, debugging, and optimization, while examining ethical considerations, biases, and legal implications. You’ll also use cutting-edge tools for code generation, architecting, description, and testing to avoid legal hassles while advancing your career.
By the end of this book, you’ll be well-prepared for future innovations in AI-driven software development, with the ability to anticipate emerging LLM technologies and generate ideas that shape the future of development.




Coding with ChatGPT and Other LLMs

Navigate LLMs for effective coding, debugging, and AI-driven development

Dr. Vincent Austin Hall

Coding with ChatGPT and Other LLMs

Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

The author acknowledges the use of cutting-edge AI, such as ChatGPT, with the sole aim of enhancing the language and clarity within the book, thereby ensuring a smooth reading experience for readers. It’s important to note that the content itself has been crafted by the author and edited by a professional publishing team.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Niranjan Naikwadi

Publishing Product Manager: Nitin Nainani

Book Project Manager: Aparna Nair

Senior Editor: Joseph Sunil

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Proofreader: Joseph Sunil

Indexer: Manju Arasan

Production Designer: Joshua Misquitta

Senior DevRel Marketing Executive: Vinishka Kalra

First published: November 2024

Production reference: 1061124

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-80512-505-1

www.packtpub.com

Contributors

About the author

Dr. Vincent Austin Hall is a computer science lecturer at Birmingham Newman University and CEO of Build Intellect Ltd, an AI consultancy. Build Intellect works closely with ABT News LTD, based in Reading, England. He holds a physics degree from the University of Leeds, an MSc in biology, chemistry, maths, and coding from Warwick, and a PhD in machine learning and chemistry, also from Warwick, where he developed licensed software for pharma applications. With experience in tech firms and academia, he’s worked on ML projects in the automotive and medtech sectors. He supervises dissertations at the University of Exeter, consults on AI strategies, coaches students and professionals, and shares insights through blogs and YouTube content.

I would like to thank my supportive and patient family: my excellent and wise partner Anna, our brilliant, different, and loving son Peter and our brilliant, inventive, and hilarious daughter Lara, for allowing me time to work on this book over many weekends and evenings and understanding that good things take long, hard work, and many iterations.

Thank you to Packt Publishing: Editor Joseph Sunil for making only good suggestions and improving my work; Book Project Manager, Aparna Nair for keeping the project progressing well and making sure everything got done; Publishing Product Manager, Nitin Nainani for managing and further direction; Priyanshi J for bringing me on board and suggesting this book in the first place; as well as the technical reviewers for helping Joseph and me to keep the book quality high.

Thanks to my business partner, Chief Chigbo Uzokwelu, CEO of ABT News Ltd, for lots of support in friendship and business: legal, sales, business communications, proofreading, and marketing.

Thanks to the reader for reading and learning, sharing what you've learned and helping others to upskill and create the best code, careers and solutions for Earth (and future populated worlds).

About the reviewers

Parth Santpurkar is a senior software engineer with over a decade of industry experience based out of the San Francisco Bay area. He’s a senior IEEE member and his expertise and interests range from software engineering and distributed systems to machine learning and artificial intelligence.

Sougata Pal is a passionate technology specialist performing the role of an enterprise architect in software architecture design, application scalability management, and team building and management. With over 15 years of experience, they have worked with different start-ups and large-scale enterprises to develop their business application infrastructure, enhancing their reach to customers. They have contributed to different open source projects on GitHub to empower the open source community. For the last couple of years, they have been experimenting with federated learning and cybersecurity algorithms, applying federated learning concepts to enhance the performance of cybersecurity processes.

Table of Contents

Preface

Part 1: Introduction to LLMs and Their Applications

1

What is ChatGPT and What are LLMs?

Introduction to LLMs

Origins of LLMs

Early LLMs

GPT lineage

BERT

LaMDA

LLaMA’s family tree

Exploring modern LLMs

GPT-4

LLaMA-2

Gemini (formerly Bard)

Amazon Olympus

How Transformers work

How an LLM processes a piece of text

ChatGPT uses reinforcement learning from human feedback

LLMs are expensive

A note on the mathematics of LLMs

Applications of LLMs

Summary

Bibliography

2

Unleashing the Power of LLMs for Coding: A Paradigm Shift

Technical requirements

Unveiling the advantages of coding with LLMs

The short version

The longer version

Planning your LLM-powered coding

1. Understanding your purpose – unveiling the why

2. Identifying your audience – tailoring the experience

3. Defining the environment – where your code calls home

4. Mapping user interaction – charting the navigation flow

5. Identifying data sources – feeding the machine learning beast

6. What data format?

7. How will you plumb in the data?

8. Visualizing the interface

Getting into LLM-powered coding

Back to the HTML code for Prompt 5

Back to the Flask code for Prompt 5

Making it work for you

Summary

3

Code Refactoring, Debugging, and Optimization: A Practical Guide

Technical requirements

Dealing with error codes – debugging

Prompt 4 debugging

Prompt 5 debugging – HTML

Prompt 5 debugging – Python/Flask

Where’s the code?

Refactoring code

Refactoring code with Claude 3

Documenting code

Let’s get ChatGPT to explain some code

Testing code

How do you test code?

Virtual software companies

Agents

Relevance to virtual software companies?

ChatDev

Summary

Part 2: Be Wary of the Dark Side of LLM-Powered Coding

4

Demystifying Generated Code for Readability

Technical requirements

Generating more readable code

Introduction to data compression methods

Code to compress data, written in Python 3.10

Let’s look at some well-written code

What makes code hard or easy to read?

Why is reading code hard?

Dos and don’ts of readable code – how to make readable code

Summarizing code for understanding

Generating documentation

Documentation for crypto_price_and_indicators.py

Summary

Bibliography

5

Addressing Bias and Ethical Concerns in LLM-Generated Code

Technical requirements

Understanding bias in LLM-generated code

Where does bias in LLMs come from?

Examining ethical dilemmas – challenges in LLM-enhanced working

Meta AI, or Meta Llama 3

ChatGPT on international security measures

Racist Gemini 1.5

Detecting bias – tools and strategies

Biases you might find in code and how to improve them

Analyzing the training data

Examining the code

Preventing biased code – coding with ethical considerations

Get good data

Ethical guidelines

Create transparent and explainable code

Code reviews

Your inevitable success

Examples of getting the balance right

Summary

Bibliography

6

Navigating the Legal Landscape of LLM-Generated Code

Technical requirements

Unraveling copyright and intellectual property considerations

The EU – needs the human touch

The UK – human creativity and arrangements necessary for the creation

The USA – no ownership of AI-generated works

The People’s Republic of China – whoever made the greater contribution

Taiwan – human creative expression

India and Canada – human author’s skill and judgment

Australia – to the person making the necessary arrangements

Japan – copyright requires human authorship

South Korea

Brazil – human authorship required

Indonesia – human authorship needed

Evolving legal landscape

Precedent

Addressing liability and responsibility for LLM-generated code

Licensing

Attribution and credit

Quality and reliability

Ethical considerations

Product liability

Use case restrictions

Security concerns

Transparency and explainability

Third-party dependencies

Use good communication to avoid legal action

Code of ethics when using AI

Accountability and redress mechanisms

Examining legal frameworks governing the use of LLMs in coding

UN resolution on AI

EU – the European Parliament adopts the “AI Act”

California AI kill switch bill proposed

AI Acts of other countries

Other regulations

Possible future of the regulation of AI-generated code

Key points moving forward

Questions that should still be answered

Keep up to date

Summary

Bibliography

7

Security Considerations and Measures

Technical requirements

Understanding the security risks of LLMs

Data privacy and confidentiality

Security risks in LLM-generated code

Implementing security measures for LLM-powered coding

Input sanitization and validation

Secure integration patterns

Monitoring and logging

Version control and traceability

Encryption and data protection

Regular security assessments

Incident response planning

Bonus – training

Who can help here?

Best practices for secure LLM-powered coding

Making the future more secure

Emerging threats

Shifting focus

Summary

Bibliography

Part 3: Explainability, Shareability, and the Future of LLM-Powered Coding

8

Limitations of Coding with LLMs

Technical requirements

Inherent limitations of LLMs

Core limitations

Other limitations to LLMs

Evaluating LLM performance

Overcoming inherent limitations

Challenges in integrating LLMs into coding workflows

Relevant workflow example

Security risks

IP concerns

Dependency management

Explainability

Future research directions to address limitations

Continuous learning

Novel architectures

Computational efficiency

Specialized training

Summary

Bibliography

9

Cultivating Collaboration in LLM-Enhanced Coding

Technical requirements

Why share LLM-generated code?

Benefits of sharing code

Real-world examples

Best practices for code sharing

Documentation

Consistent coding standards

Version control

Code security best practices

Proper attribution

Test the code thoroughly

Continuous improvement

Knowledge management – capturing and sharing expertise

Creating knowledge repositories

Conducting regular knowledge-sharing sessions

Peer mentorship – sharing the wisdom

Making the best use of collaborative platforms

Code review tools

Project management software

Communication channels – keeping the conversation flowing

Summary

Bibliography

10

Expanding the LLM Toolkit for Coders: Beyond LLMs

Technical requirements

Code completion and generation tools

Eclipse’s Content Assist

PyCharm’s code completion

NetBeans’ code completion

VS Code’s IntelliSense

SCA and code review tools

SonarQube

ESLint

PMD

Checkstyle for Java

Fortify Static Code Analyzer

CodeSonar

Coverity

FindBugs/SpotBugs

Bandit

HoundCI

Testing and debugging tools

Jest

Postman

Cypress

Selenium

Mocha

Charles Proxy

Summary

Bibliography

Part 4: Maximizing Your Potential with LLMs: Beyond the Basics

11

Helping Others and Maximizing Your Career with LLMs

Why Mentor Others in LLM-powered coding?

Mentoring in the time of LLMs

The Ripple Effect of Mentorship

Elevating Standards in the Field

Personal Growth Through Mentorship

Supporting a Culture of Continuous Learning

Section Summary

Other Ways to Share Your Expertise and Work

Blogging and Writing Articles

Online Courses

Open-Source Projects

Running Workshops

Social Media and Online Communities

Section Summary

Attend, Build, Network

Speaking Engagements and Workshops

Joining Professional Organizations

Network with Peers and Experts

Building Genuine Relationships

Seeking Mentorship and Offering Support

Section Summary

New Approaches from LLMs

Embracing Collaborative Coding

Latest Developments in LLMs

Section Summary

Summary

Bibliography

12

The Future of LLMs in Software Development

Technical requirements

Emerging trends in LLM technologies

Multimodal LLMs

Human-AI collaboration

Multi-agent systems

Generative business intelligence (Gen BI)

Your wish is my command

Future impacts

Democratization of coding and more

Feedback loop

Harmful AI?

Coming challenges and opportunities

Legal

Politics and government

No jobs for humans?

Scale to the stars, literally

Human directed

Summary

Like my ideas or want to change them?

Bibliography

Index

Other Books You May Enjoy

Part 1: Introduction to LLMs and Their Applications

This section lays the groundwork for understanding Large Language Models (LLMs) and their transformative potential across various fields. It introduces LLMs such as ChatGPT and explains how they work. We will explore the different ways LLMs are applied across industries, from customer service to content generation, and check out their unique capabilities in software development.

This section covers the following chapters:

Chapter 1, What is ChatGPT and What are LLMs?
Chapter 2, Unleashing the Power of LLMs for Coding: A Paradigm Shift
Chapter 3, Code Refactoring, Debugging, and Optimization: A Practical Guide

1

What is ChatGPT and What are LLMs?

The world has been strongly influenced by the recent advancements in AI, especially large language models (LLMs) such as ChatGPT and Gemini (formerly Bard). We’ve witnessed stories such as ChatGPT reaching one million users in five days, huge tech company lay-offs, history-revising image scandals, more tech companies getting multi-trillion-dollar valuations (Microsoft and NVIDIA), a call for funding of $5–7 trillion for the next stage of technology, and talk of revolutions in how everything is done!

Yes, these are all because of new AI technologies, especially LLM tech.

LLMs are large in multiple ways: not just large training sets and large training costs but also large impacts on the world!

This book is about harnessing that power effectively, for your benefit, if you are a coder.

Coding has changed, and we must all keep up or our skills will become outdated. This book covers the tools coders need to generate code quickly and well: to comment, debug, and document it, and to stay ethical and on the right side of the law.

If you’re a programmer or coder, this is for you. Software, especially AI/machine learning, is changing everything at ever-accelerating rates, so you’ll have to learn this stuff quickly, and then use it to create and understand future technologies.

I don’t want to delay you any longer, so let’s get into the first chapter.

In this chapter, we’ll cover some basics of ChatGPT, Gemini, and other LLMs, where they come from, who develops them, and what the architectures entail. We’ll introduce some organizations that use LLMs and their services. We’ll also briefly touch on some mathematics that go into LLMs. Lastly, we’ll check out some of the competition and applications of LLMs in the field.

This chapter covers the following topics:

Introduction to LLMs
Origins of LLMs
Early LLMs
Exploring modern LLMs
How transformers work
Applications of LLMs

Introduction to LLMs

ChatGPT is an LLM. LLMs can be used to answer questions and generate emails, marketing materials, blogs, video scripts, code, and even books that look a lot like they’ve been written by humans. However, you probably want to know about the technology.

Let’s start with what an LLM is.

LLMs are deep learning models, specifically, transformer networks or just “transformers.” Transformers certainly have transformed our culture!

An LLM is trained on huge amounts of text data, petabytes (thousands of terabytes) of data, and predicts the next word or words. Because of the way LLMs operate, they are not perfect at outputting text; they can produce “hallucinations”: confident-sounding statements that are simply false.
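To make next-word prediction concrete, here is a minimal sketch using the Hugging Face transformers library and GPT-2, a small, freely downloadable ancestor of ChatGPT (the prompt text is just an illustration):

from transformers import pipeline

# Load a small, freely available LLM (GPT-2) as a text generator.
generator = pipeline("text-generation", model="gpt2")

# The model repeatedly predicts likely next words to continue the prompt.
result = generator("Large language models are trained to",
                   max_new_tokens=20, do_sample=True)
print(result[0]["generated_text"])

Run it a few times and you will see different continuations: the model samples from a probability distribution over possible next words rather than retrieving stored sentences, which is also why it can hallucinate.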

ChatGPT is, at the time of writing, the most popular and famous LLM, created and managed by OpenAI. OpenAI is a non-profit with a capped-profit subsidiary, based in San Francisco [OpenAI_LP, OpenAIStructure].

ChatGPT is now widely used for multiple purposes by a huge number of people around the world. Of course, there are also GPT-4 and now GPT-4 Turbo, which are paid, more powerful, can do more things, and accept more text in their prompts.

It’s called ChatGPT: Chat because that’s what you do with it, it’s a chatbot, and GPT is the technology and stands for generative pre-trained transformer. We will get more into that in the GPT lineage subsection.

A transformer is a type of neural network architecture, and it is the basis of the most successful LLMs today (2024). GPT is a Generative Pre-trained Transformer, and Gemini is also a transformer [ChatGPT, Gemini, Menon, HuggingFace]. OpenAI’s GPT-4 is a remarkable advancement in the field of AI. This model, the fourth iteration of the GPT series, introduced a major new capability: it can take images as input alongside text. This is a significant leap from its predecessors, which were primarily text-based models.

OpenAI also has an image generation AI, DALL-E, and an AI that can connect images and text and does image recognition, called CLIP (OpenAI_CLIP). The image generation capability of DALL-E is achieved by training the transformer model on image data. This means that the model has been exposed to a vast array of images during its training phase, enabling it to understand and generate visual content [OpenAI_DALL.E].

Furthermore, since images can be sequenced to form videos, DALL-E can also be considered a video generator. This opens up a plethora of possibilities for content creation, ranging from static images to dynamic videos. It’s a testament to the versatility and power of transformer models, and a glimpse into the future of AI capabilities.

In essence, tools from OpenAI are not just text generators but a comprehensive suite of content generators, capable of producing a diverse range of outputs. It’s called being multi-modal. This makes these tools invaluable in numerous applications, from content creation and graphic design to research and development. The evolution from GPT-3 to GPT-4 signifies a major milestone in AI development, pushing the boundaries of what AI models can achieve.

Origins of LLMs

Earlier neural networks that could read sentences and predict the next word processed only one word at a time; these were called recurrent neural networks (RNNs). RNNs attempted to mimic human-like sequential processing of words and sentences but struggled with long-term dependencies between words and sentences due to very limited memory capacity.

In 1925, the groundwork was laid by Wilhelm Lenz and Ernst Ising with their non-learning Ising model, considered an early RNN architecture [Brush, Gemini].

In 1972, Shun’ichi Amari made this architecture adaptive, paving the way for learning RNNs. This work was later popularized by John Hopfield in 1982 [Amari, Gemini].

Due to this, there has been a fair amount of research into ways to stretch this memory to include more text and capture more context. One result was the long short-term memory (LSTM) network, a more advanced version of the RNN; note that RNNs and LSTMs are not transformers, and we won’t go into them further here [Brownlee_LLMs, Gemini]. LSTMs were invented by Hochreiter and Schmidhuber in 1997 [Wiki_LSTM, Hochreiter1997].

There is another kind of network called the convolutional neural network (CNN). Without going into much detail, CNNs are very good with images and lead the world in image recognition and similar tasks. CNNs (or ConvNets) were invented in 1980 by Kunihiko Fukushima and developed by Yann LeCun, but they only really became popular in the 2000s, when GPUs became available. Chellapilla et al. compared training CNNs on CPUs and GPUs and found that training on GPUs was 4.1 times faster [Fukushima1980, LeCun1989, Chellapilla2006]. Sometimes, your inventions take time to bear fruit, but keep inventing! CNNs use many layers or stages that apply different mathematical operations to their inputs, looking at them in different ways: from different angles, with detail taken out (dropout layers), pooling nearby regions of each image, zeroing negative numbers (ReLU), and other tricks; see the sketch that follows.
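As a rough, toy illustration of those stages (the layer sizes here are arbitrary, not taken from any real model), a tiny CNN in PyTorch might look like this:

import torch.nn as nn

# A toy CNN: each stage applies a different mathematical operation to the image.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learned filters scan the RGB image
    nn.ReLU(),                                   # zero out negative numbers
    nn.MaxPool2d(2),                             # pool nearby regions of the image
    nn.Dropout(0.25),                            # take some detail out to reduce overfitting
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                 # classify (assumes 32x32 input images)
)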

What was needed was a model with some form of memory to remember and also generate sentences and longer pieces of writing.

In 2017, Ashish Vaswani and others published a paper called Attention Is All You Need [Vaswani, 2017]. In this important paper, the transformer architecture was proposed, based on attention mechanisms: the model used neither the recurrence of RNNs nor the convolutions of CNNs, even though those methods had been very successful and popular AI architectures in their own right.

Compared to RNNs and CNNs, Vaswani’s Transformer trained faster and allowed for much greater parallelization.

The Transformer set the benchmark for English-to-German translation and established a new state-of-the-art single-model result on the WMT 2014 English-to-French translation task. It achieved this after being trained for a small fraction of the training time of the next best existing models. Indeed, Transformers were a groundbreaking advancement in natural language processing [Vaswani, 2017].
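At the heart of that architecture is scaled dot-product attention: every word (as a query) scores its relevance against every other word (as keys) and takes a weighted average of their value vectors. Here is a minimal NumPy sketch of the paper’s formula, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # relevance of each key to each query
    weights = softmax(scores)        # attention weights sum to 1 per query
    return weights @ V               # weighted mix of the value vectors

# Toy example: a "sentence" of 4 tokens, each an 8-dimensional vector.
x = np.random.default_rng(0).normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)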

Now that we have covered the origins of LLMs, we will check out some of the earliest LLMs that were created.

Early LLMs

There are many LLMs today and they can be put into a family tree; see Figure 1.1. The figure shows the evolution from word2vec to the most advanced LLMs of 2023: GPT-4 and Bard (now Gemini).

Figure 1.1: Family tree of LLMs from word2vec to GPT-4 and Bard, from Yang2023 with permission

So, that’s all of them, but, for now, we’ll look at the earlier LLMs that led to the most advanced technologies today. We’ll start with GPT.

GPT lineage

The development of GPT is a constantly evolving, iterative process, with each new model building on the strengths and addressing the weaknesses of its ancestors. The GPT series, initiated by OpenAI, has undergone a great deal of evolution, leading to advancements in natural language processing (NLP) and understanding.

GPT-3, the third iteration, brought a significant leap in size and complexity, with an impressive 175 billion parameters. This allowed it to generate fairly human-like text across a wide range of topics [Wiki_GPT3, ProjectPro].

As the GPT series progressed, OpenAI continued to refine and enhance the architecture. In subsequent iterations, GPT-4 and GPT-4 Turbo have further pushed back the boundaries of what these LLMs can achieve. The iterative development process focuses on increasing model size and improving fine-tuning capabilities, enabling more nuanced and contextually relevant outputs.

Further to this, there are more modalities, such as GPT-4 with vision and text-to-speech.

GPT model iteration is not solely about scaling up the number of parameters; it also involves addressing the limitations observed in earlier versions. Feedback from user interactions, research findings, and technological advancements contribute to the iterative nature of the GPT series. OpenAI is constantly working to reduce the amount of inaccurate information and incoherent outputs (hallucinations) that its chatbots produce. Also, each iteration of the chatbot takes on board the lessons learned from real-world applications and user feedback.

GPT models are trained and fine-tuned on very large, diverse datasets to make sure the chatbots can adapt to many different contexts, industries, and user requirements. The iterative development approach ensures that later GPT models are better equipped to understand and generate human-like text, making them extremely valuable tools for a huge number of applications, including content creation such as blogs, scripts for videos, and copywriting (writing the text in adverts) as well as conversational agents (chatbots and AI assistants).

The way GPT models are developed iteratively shows OpenAI’s commitment to continuous improvement and innovation in the field of LLMs, allowing even more sophisticated and capable models to be built from these models in the future.

Here are the dates for when the different versions of GPT were launched:

GPT was first launched in June 2018
GPT-2 was released in February 2019
GPT-3 in 2020
GPT-3.5 in 2022, with ChatGPT following in November 2022

There will be more on the GPT family later, in the GPT-4 section.

Here, we will detail the architecture of LLMs and how they operate.

BERT

To comprehend the roots and development of Bidirectional Encoder Representations from Transformers (BERT), we must know more about the intricate and fast-moving landscape of neural networks. Without hyperbole, BERT was a seriously important innovation in NLP, part of the ongoing evolution of AI. BERT was the state of the art for a wide range of NLP tasks in October 2018, when it was released [Gemini]. This included question answering, sentiment analysis, and text summarization.

BERT also paved the way for the later research and development of LLMs, playing a pivotal role in their evolution; being open source, it helped to speed up LLM advancement.

BERT takes some of its DNA from RNNs (mentioned in the Origins of LLMs section), the neural nets that loop back on themselves to create a kind of memory, albeit a rather limited one.

The invention of the first transformer architecture was key to the origin of BERT. The creation of BERT as a bidirectional encoder (these go backward and forward along a sentence) drew inspiration from the transformer’s attention-based mechanism, allowing it to capture contextual relationships between words in both directions within a sentence.

So, BERT’s attention is bidirectional (left-to-right and right-to-left context). At its creation, this was unique, and it enabled BERT to gain a more comprehensive understanding of nuanced language semantics.
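You can see that bidirectionality in action with masked-word prediction, the task BERT was trained on. Here is a minimal sketch using the Hugging Face transformers library (bert-base-uncased is the original published checkpoint):

from transformers import pipeline

# BERT fills in the blank using context on BOTH sides of the [MASK] token.
fill = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill("The doctor wrote a [MASK] for the patient."):
    print(prediction["token_str"], round(prediction["score"], 3))

A left-to-right model would only see “The doctor wrote a”; BERT also uses “for the patient,” which points strongly toward words such as “prescription.”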

While BERT’s foundations are in transformer architecture, its characteristics have evolved with further research and development, though it is not currently in development. Each iteration of BERT refined and expanded its capabilities.

The BERT LLM was a stage of the ongoing innovation in AI. BERT’s ability to understand language bidirectionally, drawing insights from both preceding and succeeding words, is part of the endeavors taken to achieve the creation of an AI with a sufficiently deep awareness of the intricacies of natural language.

Figure 1.2: Architecture of BERT, a bidirectional encoder (reproduced from GeekCultureBERT)

LaMDA

Understanding the ancestry of Language Model for Dialogue Applications (LaMDA) involves tracing the roots of its architectural design and the evolutionary path it followed in the landscape of NLP. LaMDA, like its counterparts, emerges from a family of models that have collectively revolutionized how machines comprehend and generate human-like text.

RNNs, mentioned in this chapter’s first section, play a pivotal role in LaMDA’s family tree.

The breakthrough came with the invention of transformer architectures, and LaMDA owes a significant debt to the transformative Attention Is All You Need paper [Vaswani, 2017]. This paper laid the groundwork for a novel approach, moving away from sequential processing to a more parallelized and attention-based mechanism.

The LaMDA LLM inherits its core architecture from the transformer family and was developed by Google. These models learn very well how words in a sentence relate to each other. This allows a transformer to have a richer understanding of language. This change from using traditional processing in sequence was a paradigm shift in NLP, enabling LaMDA to more effectively grasp nuanced interactions and dependencies within texts.

While the origins lie in the transformer architecture, LaMDA’s unique characteristics may have been fine-tuned and evolved through subsequent research and development efforts. LaMDA’s lineage is not just a linear progression but a family tree, a branching exploration of many possibilities, with each iteration refining and expanding its capabilities. In Figure 1.1, LaMDA is near ERNIE 3.0, Gopher, and PaLM on the right of the main, vertical blue branch.

Simply put, LaMDA is a product of ongoing innovation and refinement in the field of AI, standing on the shoulders of earlier models and research breakthroughs. Its ability to comprehend and generate language is deeply rooted in an evolutionary process of learning from vast amounts of text data, mimicking the way humans process and understand language on a grand, digital scale.

LaMDA was launched in May 2021.

LLaMA’s family tree

LLaMA is the AI brainchild of Meta AI. It might not be one you’ve heard the most about but its lineage holds stories of innovation and evolution, tracing a fascinating path through the history of AI communication.

Like the other chatbot LLMs, LLaMA’s roots are also in transformer architectures. These models rely on intricate attention mechanisms, allowing them to analyze relationships between words, not just their sequence.

Trained on massive datasets of text and code, LLaMA learned to generate basic responses, translate languages, and even write different kinds of creative text formats.

However, like newborn foals, these early models were limited. They stumbled with complex contexts, lacked common-sense reasoning, and sometimes sputtered out nonsensical strings.

Yet their potential was undeniable. The ability to learn and adapt from data made them valuable tools for researchers. Meta AI nurtured these nascent models, carefully tweaking their architecture and feeding them richer datasets. They delved deeper into the understanding of human language, acquiring skills such as factual grounding, reasoning, and the ability to engage in multi-turn conversations (Wiki_llama).

The Llama family tree is not a linear progression but, rather, a family of multiple branches of exploration. Different versions explored specific avenues: Code Llama focused on code generation, while Megatron-Turing NLG 530B was trained on filling in missing words, reading comprehension, and common-sense reasoning, among other things (CodeLlama 2023, Megatron-Turing 2022).

For an idea of how LLaMA fits into the evolutionary tree, see Figure 1.1 at the top left of the vertical blue branch, near Bard (Gemini).

Each experiment, each successful leap forward, contributed valuable DNA to future generations.

Why the name Megatron-Turing NLG 530B? Megatron because it represents a powerful hardware and software framework; Turing to honor Alan Turing, a founding father of computer science and AI; NLG stands for natural language generation; and it has 530 billion parameters.

Meta AI continues to shepherd the Llama family, and the future promises more exciting developments.

Llama LLM was launched in February 2023, while Megatron-Turing NLG 530B was released in January 2022.

Now that we have covered the origins and explored the early stages of LLMs, let us fast-forward and talk about modern LLMs in the next section.

Exploring modern LLMs

After the explosive take-off of ChatGPT in late 2022, with 1 million active users in 5 days and 100 million active users in January 2023 (about 2 months), 2023 was a pretty hot year for LLMs, AI research, and the use of AI in general.

Most tech companies have worked on their own LLMs or transformer models to use and make publicly available. Many companies, organizations, and individuals (students included) have used LLMs for a multitude of tasks. OpenAI keeps updating its GPT family and Google keeps updating its Bard version. Bard became Gemini in February 2024, so all references to Bard have changed to Gemini. Many companies use ChatGPT or GPT-4 as the core of their offering, just creating a wrapper and selling it.

This might change as OpenAI keeps adding modalities (speech, image, etc.) to the GPTs, and it has even created a marketplace platform where users can build and sell their own GPT agents right on OpenAI’s servers. This was launched in early January 2024 for paid users ($20/month before VAT). We’ll cover some of the latest LLMs that companies have worked on in the following sections.

GPT-4

GPT-4 Turbo, OpenAI’s latest hot chatbot, is another big upgrade. It’s the GPT-4 you know, but on steroids, with 10 times more memory and a newfound understanding of images.

If GPT-4 was a gifted writer, GPT-4 Turbo is a multimedia polymath. It can not only spin captivating stories and poems but also decipher images, paint vivid digital landscapes, and even caption photos with witty remarks. Forget outdated information – Turbo’s knowledge base refreshes constantly, keeping it as sharp as a tack on current events.

But it’s not just about flashy tricks. Turbo is a stickler for facts: it taps into external knowledge bases and employs sophisticated reasoning to make its responses more accurate and reliable. It aims to leave biased or misleading outputs in the past, striving for truth and clarity, which makes it a more trustworthy companion for learning and exploration.

The best part? OpenAI isn’t keeping this powerhouse locked away. They’ve crafted an API and developer tools, inviting programmers and innovators to customize Turbo for specific tasks and domains. This democratization of advanced language processing opens doors to a future where everyone, from artists to scientists, can harness the power of language models to create, analyze, and understand the world around them.

GPT-4 Turbo is probably widely considered the pinnacle of technology at the moment, showing us the breathtaking potential of LLMs. It’s not just a language model; it’s a glimpse into a future where machines understand and interact with us like never before. So, buckle up! The future of language is here, and it’s powered by GPT-4 Turbo.

GPT-4 was launched in March 2023 and GPT-4 Turbo in November 2023 (Wiki_GPT4, OpenAI_GPT4Turbo, Gemini).

GPT-4o or GPT-4 omni was released in May 2024, and it can understand multiple formats of data. Omni is faster than previous models and can respond to speech in 0.32 seconds on average, similar to human response times, while Turbo takes about 5.4 seconds to respond in Voice Mode.

This is partly because Turbo’s Voice Mode uses a pipeline of three models: a simple model transcribes the audio into text, GPT-4 Turbo processes that text, and a third model converts the text response back into audio. Omni, by contrast, is a single model that understands audio, video, and text directly. The three-model pipeline is slower than omni, and a lot of information is lost to GPT-4 Turbo in the transcription step.

GPT-4o is much better than GPT-4 Turbo in non-English human languages.

The Omni API is also half the cost of Turbo (OpenAI-GPT-4o)!

GPT-4o does very well on code generation versus Claude 3 Opus and Gemini 1.5 Pro. Claude is moderate, Gemini is judged to be very good, and GPT-4o is excellent [encord].

GPT-4 architecture

OpenAI has not released the architecture and full details of GPT-4, which remain proprietary for now, but we can piece together elements from similar work.

GPT-4 is believed to have 1.75 trillion parameters (1.75 million million) (MotiveX_Gemini).

The vision transformer will likely involve some encoder-decoder architecture: image and video inputs for the encoder, then the decoder will generate output such as text descriptions or captions as well as images (Gemini).

It will have an attention mechanism because “attention is all you need.”

The vision components will probably use multi-head attention to process various aspects of the input simultaneously. There should also be positional encoding, image pre-processing layers, and modality fusion.

Modality fusion is where the vision capabilities are combined with the faculties to process text. From this, it would need to generate a unified understanding of the inputs or the scene given to it.

So, GPT-4 can understand images, and it’s believed that it uses a combination of Vision Transformer (ViT) and Flamingo visual language models.

Figure 1.3 shows the architecture of ViT (reproduced from Wagh).

Figure 1.3: This is what the internal workings of ViT involve (reproduced from Wagh)

So, the inner workings of GPT-4 that handle vision processing likely involve visual transformers as shown in the preceding figure, along with the text processors described in the How an LLM processes a piece of text subsection.

You can find out more about ViT here: https://github.com/lucidrains/vit-pytorch.
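As a taste of that repository, here is a minimal usage sketch adapted from its README; the hyperparameters are purely illustrative and certainly not GPT-4’s, which remain unpublished:

import torch
from vit_pytorch import ViT

# Illustrative settings only; GPT-4's real vision configuration is not public.
vit = ViT(
    image_size=256,    # 256x256 input images
    patch_size=32,     # split each image into 32x32 patches ("visual tokens")
    num_classes=1000,  # e.g., ImageNet classes
    dim=1024,          # embedding dimension
    depth=6,           # number of transformer layers
    heads=16,          # attention heads per layer
    mlp_dim=2048,
)

img = torch.randn(1, 3, 256, 256)  # one random RGB image
preds = vit(img)                   # (1, 1000) class scores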

LLaMA-2

The latest official LLaMA, LLaMA-2, is capable of holding complicated conversations, generating various creative text formats, and even adapting its responses to specific user personalities.

OpenLLaMA is an open source version of LLaMA released by Open LM Research (Watson 2023, OpenLMR, Gemini). OpenLLaMA has several versions, each trained on a different dataset, but the training process was very similar to the original LLaMA’s. Model weights can be found on the Hugging Face Hub and accessed without the need for any additional permission. The Hugging Face page for OpenLLaMA is here: https://huggingface.co/docs/transformers/en/model_doc/open-llama.

OpenLLaMA models serve as benchmarks for LLM research. Their open source nature makes it possible to compare them with other models, and this is made easier because both PyTorch and TensorFlow formats are available.

LLaMA-2 was released in July 2023, and OpenLLaMA in June 2023. In early 2024, the rumors were that LLaMA-3 would be released within the year.
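As a sketch of how that access works, here is how you might load an OpenLLaMA checkpoint with the transformers library. The model ID below is assumed from the Open LM Research listings on the Hub (check there for current names), and the model card recommends the slow tokenizer (use_fast=False):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openlm-research/open_llama_3b"  # assumed checkpoint name; see the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))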

Gemini (formerly Bard)

Google’s Gemini is a chatbot LLM with access to the internet and just requires a Google login. Technically, Gemini is the face and the brain is whatever Google slots in.

Previously, Bard was powered by PaLM 2.

As of writing (early February 2024), Bard is powered by the Gemini family of models, of which there are three versions: Nano, Pro, and Ultra (Nano is for mobile devices). Because Bard is powered by Gemini Pro, the product’s name was changed to Gemini. There may soon be a paid version.

Gemini was released in March 2023 (Wiki_Gemini).

Gemini has 142.4 million users, 62.6% of which are in the USA (AnswerIQ).

The architecture of Gemini

Gemini is one of the LLMs and AIs developed and used by Google/Alphabet. Let’s take a peek under the hood to understand what makes Gemini tick!

Gemini is trained on a vast library of the world’s books, articles, and internet chatter. 1.56 trillion words are in the Infiniset dataset of Google Gemini; that’s 750 GB of data. Gemini has 137 billion parameters, which are the neural network weights (ChatGPT has 175 billion parameters/weights) (ProjectPro).

In November 2023, Bard got an upgrade and started to be powered by Gemini, a new AI system (SkillLeapAI). Previously, Bard was powered by LaMDA from March 2023, then PaLM 2 from May 2023.

There are three models, Gemini Nano, Gemini Pro, and Gemini Ultra. As of 19th January 2024, Gemini is powered by Gemini Ultra, which was launched in December 2023.

Figure 1.4 shows the architecture of Gemini (GeminiTeam).

Figure 1.4: Bard/Gemini architecture, from the DeepMind GeminiTeam (GeminiTeam)

Gemini can deal with combinations of text, images, audio, and video inputs, which are represented as different colors here. Outputs can be text and images combined.

The transition to Gemini Ultra signifies a significant leap in Gemini’s capabilities, offering higher performance, greater efficiency, and a wider range of potential applications (Gemini). Bard/Gemini Ultra has a complex architecture that is like a sophisticated language processing factory, with each component playing a crucial role in understanding your questions and crafting the perfect response.

The key component is the transformer decoder, the brain of the operation. It analyzes the incoming text, dissecting each word’s meaning and its connection to others. It’s like a skilled translator, deciphering the message you send and preparing to respond fluently.

The Gemini Ultra multimodal encoder can handle more than just text. Images, audio, and other data types can be processed, providing a richer context for the decoder. This allows Gemini to interpret complex situations, such as describing an image you send or composing music based on your mood.

To polish the decoder’s output, pre-activation and post-activation transformers come into play. These additional layers refine and smooth the response, ensuring it’s clear, grammatically correct, and reads like natural, human language. To reduce hallucination, the factual grounding module anchors the responses in the real world. Just like a reliable teacher, it ensures Gemini’s information is accurate and unbiased, grounding its creativity in a strong foundation of truth. Beyond basic understanding, Gemini Ultra also has reasoning abilities: it can answer complex questions, draw logical conclusions, and even solve problems.

The Gemini implementation also has a little link to Google to help users fact-check its responses: at the bottom of the output, above the input window, Google enables you to double-check the response.

Figure 1.5: Gemini’s Google search button to fact-check the output it gives you

Click this and it runs a Google search, outputting some search results and a guide to what you’re seeing.

Figure 1.6: Google search based on its output

Figure 1.7 shows what the highlighting means.

Figure 1.7: Understanding the results of the Google search to help fact-check

On your Gemini screen, you’ll see various passages highlighted in brown or green. Green-highlighted text is backed by agreeing search results, brown-highlighted text disagrees with the sources found, and no highlight means there wasn’t enough information to confirm it either way.

This is just a simplified glimpse into Gemini Ultra’s architecture and functioning. With its massive parameter count, self-attention mechanisms, and fine-tuning capabilities, it’s a constantly evolving language maestro, pushing the boundaries of what LLMs can achieve.

Amazon Olympus

Amazon has developed an enormous new LLM. It’s reportedly a hulking beast, dwarfing even OpenAI’s GPT-4 in sheer size. But this isn’t just a power contest; Olympus aims for something more: a significant leap in coherence, reasoning, and factual accuracy. Amazon’s chatbot, Metis, is powered by Olympus: https://happyfutureai.com/amazons-metis-a-new-ai-chatbot-powered-by-olympus-llm/.

With no half-baked ideas, Olympus digs deep, thinks logically, and double-checks its facts before uttering a word. Amazon is purportedly working to reduce bias and misinformation. This LLM strives for high levels of wisdom and reliability.

It’s not just about bragging rights for Amazon. Olympus represents a potential turning point for language models.

The aim is to be able to tackle complex tasks with pinpoint accuracy, grasp subtle nuances of meaning, and engage in intelligent, fact-based conversations with other AI.

Olympus will, hopefully, be a more thoughtful companion capable of deeper understanding and insightful exchange.

Olympus may not be ready to join your book club just yet, but its story is worth watching. Hopefully, Olympus will be a needed advancement for LLMs and not hallucinate, only producing truth and changing what LLMs can do.

Amazon Olympus should have around two trillion parameters (weights and biases) (Life_Achritecture).

Amazon Olympus is expected in the second half of 2024 but not much information has come out since November 2023.

Now that we have introduced many of the modern LLMs, let’s look at how they work, including using an example piece of text.

How Transformers work

Moving on to the general transformers, Figure 1.8 shows the structure of a Transformer:

Figure 1.8: Architecture of a Transformer: an encoder for the inputs and a decoder for the outputs (reproduced from Zahere)

You can see that it has an encoder and a decoder. The encoder learns the patterns in the data and the decoder tries to recreate them.

The encoder has multiple neural network layers. In transformers, each layer uses self-attention, allowing the encoder to understand how the different parts of the sentence fit together and understand the context.

Here is a quick version of the transformer process:

Encoder network:

Uses multiple layers of neural networks.

Each layer employs self-attention to understand relationships between sentence parts and context.

Creates a compressed representation of the input.

Decoder network:

Utilizes the encoder’s representation for generating new outputs.

Employs multiple layers with cross-attention for information exchange with the encoder.

Generates meaningful outputs such as translations, summaries, or answers based on input.

Encoder-decoder partnership:

Combined, they power the transformer for various tasks with high accuracy and flexibility.

For example, Microsoft Bing leverages GPT-4, a transformer model, to understand user intent and context beyond keywords for delivering relevant search results.

Beyond keywords:

Bing transforms from a search engine to an AI-powered copilot using GPT-4.

It interprets questions and requests by analyzing context and intent, not just keywords.

For example, instead of only providing ingredient lists, it recommends personalized recipes considering dietary needs and skill levels.

From links to understanding:

Bing evolves beyond finding links to comprehending user needs and delivering relevant, helpful information.
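PyTorch ships this exact encoder-decoder arrangement as a ready-made building block, which makes for a compact sketch of the two halves working together (toy dimensions, with random tensors standing in for embedded tokens):

import torch
import torch.nn as nn

# Encoder-decoder transformer: the decoder cross-attends to the encoder's output.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 1, 512)  # "input sentence": 10 tokens, batch of 1, 512-dim vectors
tgt = torch.rand(7, 1, 512)   # output generated so far: 7 tokens

out = model(src, tgt)         # decoder output, guided by the encoder's representation
print(out.shape)              # torch.Size([7, 1, 512])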

Next is the detailed version of the Transformer process.

How an LLM processes a piece of text

The encoder produces a compressed representation of the input. This allows the decoder to not only consider its own outputs but also look back at the encoder’s representation, which contains a representation of the whole input sequence for guidance. This is used by the decoder for each step of its output generation.

The decoder uses output from the encoder to generate a new output sequence. Because of Transformers, modern LLMs can hold entire sentences or paragraphs in their attention, not just one word at a time like RNNs.

Again, this part of the network has many layers but, this time, there is cross-attention, which links the decoder to the encoder.

This back-and-forth conversation between the decoder and the encoder’s compressed knowledge empowers the decoder to generate meaningful and relevant outputs, such as translating a sentence to another language, summarizing a paragraph, or answering a question based on the input.

Together, the encoder and decoder form the powerhouse of the transformer, enabling it to perform a wide range of tasks with remarkable accuracy and flexibility.

Microsoft’s Bing search engine uses GPT-4 to deliver more relevant search results, understanding your intent and context beyond just keywords.

Bing has gone from a search engine to an AI-powered copilot with the help of GPT-4. This powerful language model acts as Bing’s brain, understanding your questions and requests not just through keywords, but by analyzing the context and intent.

You can, for example, ask for a recipe instead of just ingredients; GPT-4 scours the web, considers your dietary needs and skill level, and then presents a personalized selection. It’s like having a knowledgeable friend helping you navigate the vast ocean of information. So, Bing isn’t just about finding links anymore; it’s about understanding what you truly need and delivering it in a way that’s relevant and helpful (https://www.bing.com/).

The whole process of getting a paragraph into an LLM goes like this:

Cleaning
Tokenization
Word-to-number conversion (words given indices: 1, 2, 3, 4…)
Numbers are turned into vectors
Contextual embedding
Context vectors are formed
Attention vectors are formed and fed into final blocks
Subsequent words are predicted

(ChatGPT, Gemini, Panuganty, Aakanksha).
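Here is a minimal sketch of the tokenization and word-to-number steps using the GPT-2 tokenizer from the transformers library (real LLMs work on subword tokens rather than whole words; inside the model, an embedding layer then turns each index into a vector):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Transformers certainly have transformed our culture!"
ids = tokenizer.encode(text)                 # tokenization + token-to-number conversion
print(ids)                                   # a list of integer indices
print(tokenizer.convert_ids_to_tokens(ids))  # the subword tokens behind those numbers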

With this framework in your subconscious, we can go