
Conversational AI with Rasa

Build, test, and deploy AI-powered, enterprise-grade virtual assistants and chatbots

Xiaoquan Kong

Guan Wang

BIRMINGHAM—MUMBAI

Conversational AI with Rasa

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Devika Battike

Senior Editor: Mohammed Yusuf Imaratwale

Content Development Editor: Nazia Shaikh

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Aparna Ravikumar Nair

Proofreader: Safis Editing

Indexer: Sejal Dsilva

Production Designer: Joshua Misquitta

First published: October 2021

Production reference: 1260821

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80107-705-7

www.packt.com

To my parents, for their unwavering devotion. To my wife, for her support behind the scenes. In addition, thanks to Google for providing the Google Cloud credits to support this work.

– Xiaoquan Kong

To my mom and dad. To my wife and kids. Thank you!

– Guan Wang

Foreword

Conversational AI combines ideas from linguistics, human-computer interaction, artificial intelligence, and machine learning to develop voice and chat assistants for a near-infinite set of use cases. Since 2016 there has been a surge in interest in this field, driven by the widespread adoption of mobile chat applications. The coronavirus pandemic accelerated this trend, with almost all one-on-one interactions becoming digital.

2016 was also the year Rasa was first released and we saw the first community contributions come in on GitHub. Open source communities live and die by their users and contributors, and this is doubly true for Rasa, where our global community builds assistants in hundreds of human languages. Xiaoquan Kong and Guan Wang have been leading members of our community for years, and I am grateful for their many contributions, not least Xiaoquan's efforts to ensure Rasa has robust support for building assistants in Mandarin. I've been eagerly awaiting the publication of this book.

Conversational AI with Rasa covers precisely the topics required to become proficient at building real-world applications with Rasa. Aside from covering the fundamentals of natural language understanding and dialogue management, the book emphasizes the real-world context of building great products. In the first chapter, you are challenged to think about whether a conversational experience is even the right one to build. The book also covers the essential process of Conversation-Driven Development, without which many assistants get built but fail to serve their intended users. Additionally, readers are taught practical skills like debugging an assistant, writing tests, and deploying an assistant to production.

This book will be of great use for anyone starting out as a Rasa developer, and I'm sure many existing Rasa developers will discover things they didn't know.

Alan Nichol

Co-founder and CTO, Rasa

Contributors

About the authors

Xiaoquan Kong is a machine learning expert specializing in NLP applications. He has extensive experience in leading teams to build NLP platforms for several Fortune Global 500 companies. He is a Google Developer Expert in machine learning and has been actively involved in contributing to TensorFlow for many years. He has also actively contributed to the development of the Rasa framework since its early stages and became a Rasa Superhero in 2018. He manages the Rasa Chinese community and has also participated in the Chinese localization of TensorFlow documents as a technical reviewer.

Guan Wang is currently working on AI applications and research for the insurance industry. Prior to that, he worked as a machine learning researcher for several industry AI labs. He was raised and educated in mainland China and lived in Hong Kong for 10 years before relocating to Singapore in 2020. Guan holds BSc degrees in physics and computer science from Peking University, and an MPhil degree in physics from HKUST. Guan is an active tech blogger and community contributor to open source projects including Rasa, receiving more than 10,000 stars for his own projects on GitHub.

About the reviewers

Harin Joshi's journey in chatbot development started with an internship at ImpactGuru, India's fourth largest crowdfunding platform. There, he developed two chatbots and a machine learning module, and was awarded Intern of the Month for this work. Thereafter, he became associated with the Co-learning Lounge AI community and developed a chatbot as educational content. Currently, he is working for the QuickGHY start-up as a chatbot developer.

I would like to thank my parents for always being there no matter what. I am also very grateful for my friends, who stood by me when I needed them at different stages of my life. Lastly, I would like to thank all the readers of this book: you are definitely going to learn a lot about Rasa and its functionalities.

Pratik Kotian is a conversational AI engineer with 5 years of experience in building conversational AI agents and designing products related to conversational design. He is working as a machine learning engineer (specializing in conversational AI) at Quantiphi, an AI company and recognized Google Partner. He has also worked with Packt on reviewing The TensorFlow Workshop.

I would like to thank my family and friends, who are always supportive and have always believed in me and my talents. It's because of them that I am doing well in my career and helping others to build great conversational bots.

Table of Contents

Preface

Section 1: The Rasa Framework

Chapter 1: Introduction to Chatbots and the Rasa Framework

Technical requirements

What is ML?

Supervised learning (SL)

Stages of machine learning

Performance metrics

Overfitting and underfitting

Transfer learning (TL)

Introduction to Natural Language Processing (NLP)

Evolution of modern NLP

Basic tasks of NLP

Chatbot basics

Is a chatbot really necessary?

Introduction to chatbot architecture

Introduction to the Rasa framework

Why Rasa?

System architecture

Installing Rasa

The pipeline of a Rasa project

Rasa command line

Creating a sample project

Summary

Further reading

Chapter 2: Natural Language Understanding in Rasa

Technical requirements

The format of NLU training data

The intent field – storing NLU samples

The synonym field – storing synonyms and aliases

The lookup field – providing extra features by using lookup tables

The regex field – providing extra features by using regular expressions

Using regex and lookup

Overview of Rasa NLU components

Language model components

Tokenizer components

Featurizer components

Entity extraction components

Intent classifier components

Handling frequently asked questions by using a response selector

Configuring your Rasa NLU via a pipeline

What is a pipeline?

Configuring a pipeline

The output of Rasa NLU

The intent field – the purpose of the user's utterance

The entities field – key parameters of user's utterance

Other possible fields

Training and running Rasa NLU

Training our models

Testing models from the command line

Starting the Rasa NLU service

Practice – building the NLU part of a medical bot

What are the features of our bot?

How can we implement our bot in Rasa?

Summary

Chapter 3: Rasa Core

Technical requirements

Understanding the universe of your bot (domain)

Intents and entities

Slots

All possible actions the bot can take (actions)

All the predefined replies to users (responses)

Configuring sessions

Training data for dialogue management (stories)

User messages

Bot actions and events

Auxiliary features (checkpoints and OR statements)

Data augmentation (creating longer stories automatically)

Reacting to user input (action)

Response actions

Form actions

Built-in actions

Custom actions

Understanding the memory of your bot (slots)

The influences of slots on the conversation

Slot types

Automatic slot filling

Setting initial values for slots

Understanding the decision-maker of your bot (policies)

Configuring policies

Built-in policies

Policy priority

Connecting with other services via endpoints

Building custom actions using Rasa SDK

Installing the Rasa SDK package

Writing custom actions

Tracker objects (tracking the states of conversations)

Event objects (records for changes in conversations)

Running custom actions

Using channels to communicate with instant messaging software

Building a tell-the-time bot

Defining the features that our bot should provide

How can we implement those features?

Training models, serving models, and making inferences

Summary

Section 2: Rasa in Action

Chapter 4: Handling Business Logic

Technical requirements

The fallback mechanism in Rasa

Handling fallback in NLU

Handling fallback in policy

Making intents trigger actions

Triggering actions by using built-in intents

Triggering actions by using custom intents

Using forms to complete tasks

Defining a form

Activating a form

Executing a form task

Practice – building a weather forecast chatbot

Designing the features of this bot

Implementing the bot step by step

Training models via the command line

Running the dialogue system

Extending this project

Summary

Chapter 5: Working with Response Selector to Handle Chitchat and FAQs

Technical requirements

Defining retrieval intents – the questions users want to ask

Defining responses – the answers to the questions

Updating the configuration to use ResponseSelector

Learning by doing – building an FAQ bot

What are the features of our bot?

How can we implement it?

Summary

Chapter 6: Knowledge Base Actions to Handle Question Answering

Technical requirements

Why do we need knowledge base actions?

How do you use knowledge base actions?

Creating a knowledge base

Creating a custom knowledge base action

Defining NLU data and stories to perform queries from users

How do knowledge base actions work?

How do you customize knowledge base actions?

Modifying ActionQueryKnowledgeBase to customize the behavior

Customizing InMemoryKnowledgeBase

Building your own knowledge base

Learning by doing – building a knowledge-based music query chatbot

What are the features of our bot?

How do we implement the bot?

Supporting the Neo4j knowledge base

Summary

Chapter 7: Entity Roles and Groups for Complex Named Entity Recognition

Technical requirements

Why do we need entity roles and entity groups?

Using entity roles to distinguish semantic roles in entities of the same type

Using entity groups to divide entities into groups

Configuring Rasa to use entity roles and groups

Updating the entities setting for roles and groups

Updating forms and stories for roles and groups

Components supporting entity roles and entity groups

Learning by doing – building a ticket and drink booking bot

What are the features of our bot?

How can we implement it?

Summary

Chapter 8: Working Principles and Customization of Rasa

Understanding Rasa's NLU module

How does the NLU training work?

How does NLU inference work?

Understanding how Rasa policies work

Converting trackers to training data

How does policy training work?

How does policy inference work?

Writing Rasa extensions

Writing pipeline and policy extensions

Writing custom slot types

Writing extensions for other functionalities

Practice – Creating your own custom English tokenizer

Summary

Section 3: Best Practices

Chapter 9: Testing and Production Deployment

Testing Rasa projects

Validating data and stories

Evaluating the NLU performance

Evaluating dialogue management performance

Deploying your Rasa assistant to production

When to deploy

Deployment options

Model storage

Tracker stores

Lock stores

High-performance settings for Rasa servers and action servers

Summary

Chapter 10: Conversation-Driven Development and Interactive Learning

Introduction to CDD

Introduction to Rasa X

Installing Rasa X

Using Rasa X

Performing interactive learning

Saving the interactive learning data and exiting

Summary

Chapter 11: Debugging, Optimization, and Community Ecosystem

Debugging Rasa systems

Wrong prediction of results

Code errors

Optimizing Rasa systems

Understanding the community ecosystem of Rasa

Data generation tool – Chatito

Data generation tool – Chatette

Data labeling tool – Doccano

Language-specific libraries

Summary

Other Books You May Enjoy

Preface

The Rasa framework enables developers to create industrial-strength chatbots using state-of-the-art natural language processing (NLP) and machine learning technologies quickly, all in open source.

Conversational AI with Rasa starts by showing you how the two main components at the heart of Rasa work – Rasa NLU and Rasa Core. You’ll then learn how to build, configure, train, and serve different types of chatbots from scratch by using the Rasa ecosystem. As you advance, you’ll use form-based dialogue management, work with the response selector for chitchat and FAQ-like dialogues, make use of knowledge base actions to answer questions for dynamic queries, and more. Furthermore, you’ll understand how to customize the Rasa framework, use conversation-driven development patterns and tools to develop chatbots, explore what your bot can do, and easily fix any mistakes it makes by using interactive learning. Finally, you’ll get to grips with deploying the Rasa system to a production environment with high performance and high scalability and cover best practices for building an efficient and robust chat system.

By the end of this book, you’ll be able to build and deploy your own chatbots using Rasa, addressing the common pain points encountered in the chatbot life cycle.

Who this book is for

This book is for NLP professionals and machine learning and deep learning practitioners who have knowledge of NLP and want to build chatbots with Rasa. Anyone with beginner-level knowledge of NLP and deep learning will be able to get the most out of the book.

What this book covers

Chapter 1, Introduction to Chatbots and the Rasa Framework, introduces all the fundamental knowledge pertaining to chatbots and the Rasa framework, including machine learning, NLP, chatbots, and Rasa basics.

Chapter 2, Natural Language Understanding in Rasa, covers Rasa NLU’s architecture and configuration methods, and how to perform training and inference.

Chapter 3, Rasa Core, introduces how to implement dialogue management in Rasa.

Chapter 4, Handling Business Logic, explains how Rasa gives developers great flexibility in handling different business logic. This chapter introduces how we can use these features to handle complex business logic more elegantly and efficiently.

Chapter 5, Working with Response Selector to Handle Chitchat and FAQs, explains how to define questions and their corresponding answers and how to configure Rasa to automatically identify the query and give the corresponding answer.

Chapter 6, Knowledge Base Actions to Handle Question Answering, describes how to create a knowledge base that will be used to answer questions. You will also learn how to customize knowledge base actions, how referential resolution (mapping a mention to an object) works, and how to create your own knowledge base.

Chapter 7, Entity Roles and Groups for Complex Named Entity Recognition, explains how entity roles and entity groups solve the complex NER problem, and how to define training data, configure pipelines, and write stories for entity roles and entity groups.

Chapter 8, Working Principles and Customization of Rasa, introduces the working principles behind Rasa and how we can extend and customize Rasa.

Chapter 9, Testing and Production Deployment, explains how to test Rasa applications and how to deploy Rasa applications in production environments.

Chapter 10, Conversation-Driven Development and Interactive Learning, introduces conversation-driven development and Rasa X to develop chatbots more effectively. We will also introduce how to use interactive learning to quickly find and fix problems.

Chapter 11, Debugging, Optimization, and Community Ecosystem, explains how to debug and optimize Rasa applications. We will also introduce some tools to help developers build chatbots effectively.

To get the most out of this book

You will need a version of Rasa 2.x installed on your computer—the latest version if possible. All code examples have been tested using Rasa 2.8.1 on Ubuntu 20.04 LTS. However, they should work with future version releases, too.

You should install Rasa with the following command: pip install rasa[transformers]. This command will install the transformers library, which provides the components we need in the code.

You will also need to install the pyowm Python package to run the code present in Chapter 4, Handling Business Logic, as well as Docker and version 4.1 of the neo4j Python package to run the code for the custom knowledge base part of Chapter 6, Knowledge Base Actions to Handle Question Answering.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section).

The versions of Rasa change quickly, and the related knowledge base and documents are also rapidly updated. We recommend that you frequently read Rasa’s documentation to understand the changes.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Conversational-AI-with-RASA. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801077057_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The following example demonstrates post-mortem debugging using the pdb command."

A block of code is set as follows:

version: "2.0"

language: en

pipeline:

- name: WhitespaceTokenizer

- name: LanguageModelFeaturizer

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

WebChat.default.init({ selector: "#webchat", initPayload: "Hello",

Any command-line input or output is written as follows:

python -m pdb -c continue <XXX>/rasa/__main__.py train

Bold: Indicates a new term, an important word, or words that you see on screen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "Click on the Cancel button."

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share your thoughts

Once you've read Conversational AI with Rasa, we'd love to hear your thoughts! Please visit https://packt.link/r/1801077053 to leave a review for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

Section 1: The Rasa Framework

In this section, you will learn about the core concepts of machine learning, natural language processing, dialogue systems, and Rasa. All these foundational concepts will prepare you for subsequent learning.

This section comprises the following chapters:

Chapter 1, Introduction to Chatbots and the Rasa Framework

Chapter 2, Natural Language Understanding in Rasa

Chapter 3, Rasa Core

Chapter 1: Introduction to Chatbots and the Rasa Framework

In this first chapter, we will introduce chatbots and the Rasa framework. Knowledge of these is important because they will be used in later chapters. We will split that fundamental knowledge into four pieces, of which the first three are machine learning (ML), natural language processing (NLP), and chatbots. This is the theory and concept part of the fundamentals. With these in place, you will know in theory how to build a chatbot.

The last piece is Rasa basics. We will introduce the key technology of this book: the Rasa framework and its basic usage.

In particular, we will cover the following topics:

What is ML?

Introduction to NLP

Chatbot basics

Introduction to the Rasa framework

Technical requirements

Rasa is a Python-based framework. To install it, you need a Python development environment; Python can be downloaded from https://python.org/downloads/. At the time of writing, Rasa only supports Python 3.6, 3.7, and 3.8, so please be careful to choose the correct Python version when you set up your development environment.
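If you want to double-check your interpreter before installing Rasa, a quick sketch like the following can help; the list of version tuples simply mirrors the supported versions mentioned above:

# Minimal sketch: verify the interpreter version before installing Rasa 2.x.
import sys

if sys.version_info[:2] not in [(3, 6), (3, 7), (3, 8)]:
    raise SystemExit(
        f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
        "please use Python 3.6, 3.7, or 3.8 for Rasa 2.x."
    )
print("Python version OK for Rasa 2.x")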

You can find all the code for this chapter in the ch01 directory of the GitHub repository, at https://github.com/PacktPublishing/Conversational-AI-with-RASA.

What is ML?

ML and artificial intelligence (AI) have almost become buzzwords in recent years. Everyone must have heard about AI in the news after AlphaGo from Google beat the best Go player in the world. There is no doubt that ML is now one of the most popular and advanced areas of research and applications. So, what exactly is ML?

Let's imagine that we are building an application to automatically recognize rock/paper/scissors based on video inputs from a camera. The hand gesture from the user will be recognized by the computer as one of rock/paper/scissors.

Let's look at the differences between ML and traditional programming in solving this problem.

In traditional programming, the working process usually goes like this:

Software development: Product managers and software engineers work together to understand business requirements and transform them into detailed business rules. Then, software engineers write the code to transform business rules into computer programs. This stage is shown as process 1 in the following diagram.

Software usage: Computer software transforms users' input to output. This stage is shown as process 2 in the following diagram:

Figure 1.1 – Traditional programming working pattern

Let's go back to our rock/paper/scissors application. If we used a traditional programming methodology, it would be very difficult to recognize the positions of the hands and the boundaries of the fingers, not to mention that even the same gesture can take many different forms, depending on the position of the hand, different sizes and shapes of hands and fingers, different skin colors, and so on. To solve all these problems, the source code would become very cumbersome, the logic would become very complicated, and it would be almost impossible to maintain and update the solution. In reality, probably no one could accomplish this target with a traditional programming methodology.

On the other hand, in ML, the working process usually follows this pattern:

Software development: The ML algorithm infers hidden business rules by learning from training data and encodes the business rules into models with lots of weight parameters. Process 1 in the following diagram shows the data flow.

Software usage: The model transforms users' input to output. In the following diagram, process 2 corresponds to this stage:

Figure 1.2 – Programming working pattern driven by ML

There are a few types of ML algorithms: supervised learning (SL), unsupervised learning (UL), and reinforcement learning (RL). In NLP, the most useful and most common algorithms belong to SL, so let's focus on this type of learning.

Supervised learning (SL)

An SL algorithm builds a mathematical model of a set of data that contains both the inputs (x) and the expected outputs (y). The algorithm's input data is also known as training data, composed of a set of training examples. The SL algorithm learns a function or a mapping from inputs to outputs of training data. Such a function or mapping is called a model. A model can be used to predict outputs associated with new inputs.

The algorithm used for our rock/paper/scissors application is an SL algorithm. More specifically, this is a classification task. Classification is a task that requires algorithms to learn how to assign (limited) class labels to examples—for example, classifying emails as "spam" or "non-spam" is a classification task. More specifically, it divides data into two categories, so it is a binary classification task. The rock/paper/scissors application in this example divides the picture into three categories, so, to be more specific, it belongs to a multi-class classification task. The opposite of a classification task is a regression task, which predicts a continuous quantity output for each example—for example, predicting future house prices in a certain area is a regression task.
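To make the idea of learning a mapping from inputs (x) to outputs (y) concrete, here is a minimal multi-class classification sketch using scikit-learn; the library and the toy two-number "features" standing in for image pixels are assumptions for illustration only, not part of the Rasa stack used later in this book:

# A minimal supervised-learning sketch: learn a mapping from inputs (x) to labels (y).
# The two-dimensional "features" below are toy stand-ins for image pixels.
from sklearn.linear_model import LogisticRegression

X_train = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.6, 0.4]]
y_train = ["rock", "rock", "paper", "paper", "scissors", "scissors"]

model = LogisticRegression()          # the SL algorithm
model.fit(X_train, y_train)           # learn the mapping from the training data

print(model.predict([[0.15, 0.85]]))  # predict a label for a new, unseen input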

Our application's training data contains the data (the image) and a label (one of rock/paper/scissors), which are the input and output (I/O) of the SL algorithm. The data consists of many pictures. As the example in the following screenshot shows, each picture is simply a big matrix of pixel values for the algorithm to consume, and the label of the picture is rock or paper or scissors for the hand gesture in the picture:

Figure 1.3 – Data and label

Now that we understand what an SL algorithm is, in the next section we will cover the general process of ML.

Stages of machine learning

There are three basic stages of applying ML algorithms: training, inference, and evaluation. Let's look at these stages in more detail here:

Training stage: The training stage is when the algorithms learn knowledge or business rules from training data. As shown in process 1 in Figure 1.2, the input of the training stage is training data, and the output of the training stage is the model.

Inference stage: The inference stage is when we use a model to compute the output label for new input data. The input of this stage is new input data without labels, and the output is the most likely label.

Evaluation stage: In a serious application, we always want to know how good a model is before we use it in production. This stage is called evaluation. The evaluation stage will measure the model's performance in various ways and can help users to compare models.
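As a concrete illustration of these three stages, the following sketch uses scikit-learn and its bundled Iris dataset; both are assumed here purely for illustration:

# The three stages on a toy dataset: training, inference, and evaluation.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)                  # training stage: learn from labeled data
predictions = model.predict(X_test)          # inference stage: label new inputs
print(accuracy_score(y_test, predictions))   # evaluation stage: measure performance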

In the next section, we will introduce how to measure model performance.

Performance metrics

In NLP, most problems can be viewed as classification problems. A key concept in classification performance is a confusion matrix, on which almost all other performance metrics are based.

A confusion matrix is a table of the model predictions versus the ground-truth labels.

Let me give you a specific example. Assume we are building a binary classifier to determine whether an image is a cat image or not. When the image is a cat image, we call it a positive. Remember—we are building an application to detect cats, so a cat image is a positive result for our system, and if it is not a cat image (in our case, it's a dog image), we call it a negative. Our test data has 10 images. The real labels of the test data are listed as follows, where the cat image represents a cat and the dog image represents a dog:

Figure 1.4 – The real label of test data

The prediction result of our model is shown here:

Figure 1.5 – The prediction result of our model on test data

The confusion matrix of our case would look like this:

Figure 1.6 – The confusion matrix of our case

In this confusion matrix, there are five cat images, and the model predicts that one of them is a dog. This is an error, and we call it a false negative (FN) because the model says it is a negative result, but that is actually incorrect. And in the five dog images, the model predicts that two of these are cats. This is another error, and we call it a false positive (FP) because the model says it is a positive result but it's actually incorrect. All correct predictions belong to one of two cases: cats-to-cats prediction, which we call a true positive (TP), and dogs-to-dogs prediction, which we call a true negative (TN).

So, the preceding confusion matrix can be viewed as an instance of the following abstract confusion matrix:

Figure 1.7 – The confusion matrix in abstract terms

Many important performance metrics are derived from a confusion matrix. Here, we will introduce some of the most important ones, as follows:

Accuracy (ACC)

Recall

Precision

F1 score
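In terms of the confusion matrix entries TP, TN, FP, and FN introduced above, these metrics have the following standard definitions:

\[
\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN},
\]
\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]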

Among the preceding metrics, the F1 score combines the advantages of recall and precision, so it is currently the most commonly used metric.
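Applying these definitions to the confusion matrix above (TP = 4, FN = 1, FP = 2, TN = 3, taken from the cat/dog example), a short Python sketch reproduces the numbers:

# Metrics for the cat/dog example above: TP = 4, FN = 1, FP = 2, TN = 3.
TP, FN, FP, TN = 4, 1, 2, 3

accuracy = (TP + TN) / (TP + TN + FP + FN)          # (4 + 3) / 10 = 0.7
recall = TP / (TP + FN)                             # 4 / 5 = 0.8
precision = TP / (TP + FP)                          # 4 / 6 ≈ 0.667
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.727

print(accuracy, recall, precision, round(f1, 3))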

In the next section, we will talk about the root cause of poor performance (the performance metrics being low): overfitting and underfitting.

Overfitting and underfitting

Generally speaking, there are two types of errors found in ML models: overfitting and underfitting.

When a model performs poorly on the training data, we call it underfitting. Common reasons that can lead to underfitting include the following:

The algorithm is too simple. It does not have enough power to capture the complexity of the training data. For algorithms based on neural networks, this means there are too few hidden layers.

The network architecture or the features used for training are not suitable for the task—for example, models based on bag-of-words (BoW) are not suitable for complex NLP tasks. In these tasks, the order of words is critical, but a BoW model completely discards this information.

Training a model for too few epochs (an epoch is a full training pass over the entire training data so that each example has been seen once) or at too low a learning rate (a scalar used to train a model via gradient descent, which determines the degree of weight changes).

Using too high a regularization rate (a scalar that controls the penalty on a model's complexity; the penalty reduces the model's fitting power) to train the model.

When a model performs very well on the training data but performs poorly on new data that it has never seen before, we call this overfitting. Overfitting means the algorithm has the ability to fit the training data well, but it does not generalize well to samples that are not in the training data. Generalization is the most important feature of ML. It means that algorithms learn some key concepts from the training data rather than just simply remembering them. When overfitting happens, it shows that the model is more likely to have remembered what it saw in training than to have learned from it, so it performs very well on the training data; but since it has not seen the new data before and has not learned the concepts well, it performs poorly on the new data. ML scientists have already developed various methods against overfitting, such as adding more training data, regularization, dropout, and early stopping.
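A simple, practical way to spot both failure modes is to compare performance on the training data with performance on held-out data; the following scikit-learn sketch (again assumed purely for illustration) shows the idea:

# Compare training accuracy with held-out accuracy to diagnose fit problems:
# low training accuracy suggests underfitting; a large gap between high training
# accuracy and low held-out accuracy suggests overfitting.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier()   # an unconstrained tree can easily overfit
model.fit(X_train, y_train)

print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))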

In the next section, we will introduce TL, which is very useful when the training data is insufficient (this is a common situation).

Transfer learning (TL)

TL is a method whereby a model can use knowledge from another model that was trained for a different task.

TL is popular in the chatbot domain. There are many reasons for this, and some of them are listed here:

TL needs less training data: In the chatbot domain, there usually is not much training data. When using a traditional ML method to train a model, it usually does not perform well due to the lack of training data. With TL, we can achieve much better performance on the same amount of training data. The less data you have, the bigger the performance gain you can get.

TL makes training faster: TL only needs a few training epochs to fine-tune a model for a new task. Generally, it is much faster than the traditional ML method and makes the whole development process more efficient.
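As a rough sketch of reusing pretrained knowledge, the following uses the Hugging Face transformers library (pulled in by the rasa[transformers] install mentioned in the Preface) to turn a sentence into features on which a small downstream classifier could be trained with little labeled data; the model name bert-base-uncased is just a common example, not something prescribed by this book:

# A minimal transfer-learning sketch: reuse a pretrained language model as a
# feature extractor instead of training a text encoder from scratch.
from transformers import AutoModel, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("book a table for two", return_tensors="pt")
with torch.no_grad():                              # no fine-tuning here: reuse the pretrained weights as-is
    outputs = model(**inputs)

features = outputs.last_hidden_state.mean(dim=1)   # one feature vector per sentence
print(features.shape)                              # e.g. torch.Size([1, 768])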

Now that we understand what ML is, in the next section we will cover the basics of NLP.

Introduction to Natural Language Processing (NLP)

NLP is a subfield of linguistics and ML, concerned with interactions between computers and humans via text or speech.

Let's start with a brief history of NLP.

Evolution of modern NLP

Before 2013, there was no unified method for NLP. This was because two problems had not been solved well.

The first problem relates to how we represent textual information during the computing process.

Time-series data such as voice can be represented as signals and waves. Image information is given by pixel positions and pixel values. However, there were no intuitive ways to digitize text. There were some preliminary methods, such as one-hot encoding to represent each word or phrase and BoW to represent sentences and paragraphs, but it became quite obvious that these were not ideal ways to deal with text.

After one-hot encoding, the dimension of each vector is the size of the entire vocabulary, with all values being 0 except for a single value of 1 that represents the position of that word. Such sparse vectors waste a lot of space and, at the same time, give no indication of the semantic meaning of the word itself—any two different words will always be orthogonal to each other.

A BoW model simply counts the frequency of each word that appears in the text and ignores the dependency and order of the words in the context.
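A tiny sketch makes both representations concrete; the three-word vocabulary is invented purely for illustration:

# One-hot encoding and bag-of-words (BoW) for a toy vocabulary.
from collections import Counter

vocabulary = ["rasa", "builds", "chatbots"]

# One-hot: each word becomes a vector as long as the vocabulary,
# all zeros except a single 1 at the word's position.
one_hot = {
    word: [1 if i == idx else 0 for i in range(len(vocabulary))]
    for idx, word in enumerate(vocabulary)
}
print(one_hot["rasa"])      # [1, 0, 0]
print(one_hot["chatbots"])  # [0, 0, 1] -- orthogonal to every other word vector

# BoW: count word frequencies and discard word order entirely.
counts = Counter("rasa builds chatbots rasa".split())
bow = [counts[word] for word in vocabulary]
print(bow)                  # [2, 1, 1] -- "chatbots builds rasa rasa" gives the same vector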

The second problem relates to how we can build models for text.