Description

Artificial intelligence and machine learning are the technologies of the future, and this is the perfect time to tap into their potential and add value to your business. Machine Learning for Emotion Analysis in Python helps you employ these cutting-edge technologies in your customer feedback system and in turn grow your business exponentially.

With this book, you’ll take your foundational data science skills and grow them in the exciting realm of emotion analysis. By following a practical approach, you’ll turn customer feedback into meaningful insights assisting you in making smart and data-driven business decisions.

The book will help you understand how to preprocess data, build a serviceable dataset, and ensure top-notch data quality. Once you’re set up for success, you’ll explore complex ML techniques, uncovering the concepts of deep neural networks, support vector machines, conditional probabilities, and more. Finally, you’ll acquire practical knowledge using in-depth use cases showing how the experimental results can be transformed into real-life examples and how emotion mining can help track short- and long-term changes in public opinion.

By the end of this book, you’ll be well-equipped to use emotion mining and analysis to drive business decisions.

The e-book can be read in Legimi apps or in any app that supports the following format:

EPUB

Page count: 531




Machine Learning for Emotion Analysis in Python

Build AI-powered tools for analyzing emotion using natural language processing and machine learning

Allan Ramsay

Tariq Ahmad

BIRMINGHAM—MUMBAI

Machine Learning for Emotion Analysis in Python

Copyright © 2023 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Gebin George

Publishing Product Manager: Dinesh Chaudhary, Tejashwini R

Book Project Manager: Kirti Pisat

Content Development Editor: Manikandan Kurup

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Proofreader: Safis Editing

Indexer: Tejal Daruwale Soni

Production Designer: Prafulla Nikalje

DevRel Marketing Coordinator: Vinishka Kalra

First published: September 2023

Production reference: 1310823

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB

ISBN 978-1-80324-068-8

www.packtpub.com

Less than a decade ago, I was a software developer immersed in code, seeking an intellectual challenge. Little did I know that I would soon embark on a journey leading me down a path of exploration, growth, and self-discovery. And now, barely a blink of an eye later, here I am writing the Dedication section of my first book.

– Tariq Ahmad

Contributors

About the authors

Allan Ramsay is an emeritus professor of formal linguistics in the School of Computer Science at the University of Manchester, having previously been a professor of artificial intelligence at University College Dublin. He has published over 140 books and research articles on all aspects of natural language processing, from using deep neural nets for identifying accents in spoken Arabic to using higher-order theorem proving to reason about people’s intentions when they make jokes and use sarcasm. Much of his research has come from problems posed by Ph.D. students – it is easy to carry out research in areas you are familiar with, but taking on ideas suggested by Ph.D. students broadens your view of the discipline and enriches your understanding of areas you thought you understood.

I am grateful to Tariq Ahmad for suggesting that emotion mining might be an interesting area to investigate. This book is the result of our collaboration in this area!

Tariq Ahmad has been working in the IT industry since 1994. He specializes in .NET and Python and has worked for KPMG and Sybase. He lives in England and currently works for a leading consulting company, helping clients understand and harness the power of NLP by combining his knowledge of NLP algorithms, techniques, and tools with his strong consulting skills to guide clients through the complex landscape of NLP. Prior to this, Tariq was a senior developer for a company that specialized in providing software solutions and services to the public sector.

First and foremost, I am eternally grateful to the Almighty. Thanks also to Professor Ramsay, Packt, my family, friends, and mentors, and all the people who, in one way or another, contributed to the completion of this work.

About the reviewers

Muhammed Sahal is a doctor of pharmacy by education. He has been active in the open source realm and currently serves as a lead machine learning engineer for the community. He has worked on various use cases in healthcare. He is also an incoming data science candidate at IIT Madras, which he believes will increase the impact he can make.

Anna Astori holds a master’s degree in computational linguistics and artificial intelligence from Brandeis University. Over the years, Anna has worked on multiple large-scale machine learning and data science applications for companies such as Amazon and Decathlon. Anna is an AWS Certified Developer and solutions architect. She speaks at conferences and on podcasts, reviews talk proposals for tech conferences, and writes about Python and machine learning for curated publications on Medium. She is currently a co-director of the Women Who Code Boston network.

Sumanas Sarma is the Head of AI at Captur, a Computer Vision AI company. With a foundational 15-year journey in the tech industry, he initially focused on safety-critical systems as a software engineer. This experience was pivotal as he ventured into fintech, grappling with its unique challenges. Seeking a deeper expertise in machine learning, Sumanas pursued an MSc at Queen Mary University of London. Over the past eight years, he has immersed himself in diverse ML domains. He customised large language models (LLMs) for enterprise use in the NLP space, and in Retail AI, he developed models for dynamic pricing, price elasticity, and product recommendations. Today, he steers cutting-edge projects in his pivotal role at Captur.

Table of Contents

Preface

Part 1: Essentials

1

Foundations

Emotions

Categorical

Dimensional

Sentiment

Why emotion analysis is important

Introduction to NLP

Phrase structure grammar versus dependency grammar

Rule-based parsers versus data-driven parsers

Semantics (the study of meaning)

Introduction to machine learning

Technical requirements

A sample project

Logistic regression

Support vector machines (SVMs)

K-nearest neighbors (k-NN)

Decision trees

Random forest

Neural networks

Making predictions

A sample text classification problem

Summary

References

Part 2: Building and Using a Dataset

2

Building and Using a Dataset

Ready-made data sources

Creating your own dataset

Data from PDF files

Data from web scraping

Data from RSS feeds

Data from APIs

Other data sources

Transforming data

Non-English datasets

Evaluation

Summary

References

3

Labeling Data

Why labeling must be high quality

The labeling process

Best practices

Labeling the data

Gold tweets

The competency task

The annotation task

Buy or build?

Results

Inter-annotator reliability

Calculating Krippendorff’s alpha

Debrief

Summary

References

4

Preprocessing – Stemming, Tagging, and Parsing

Readers

Word parts and compound words

Tokenizing, morphology, and stemming

Spelling changes

Multiple and contextual affixes

Compound words

Tagging and parsing

Summary

References

Part 3: Approaches

5

Sentiment Lexicons and Vector-Space Models

Datasets and metrics

Sentiment lexicons

Extracting a sentiment lexicon from a corpus

Similarity measures and vector-space models

Vector spaces

Calculating similarity

Latent semantic analysis

Summary

References

6

Naïve Bayes

Preparing the data for sklearn

Naïve Bayes as a machine learning algorithm

Naively applying Bayes’ theorem as a classifier

Multi-label datasets

Summary

References

7

Support Vector Machines

A geometric introduction to SVMs

Using SVMs for sentiment mining

Applying our SVMs

Using a standard SVM with a threshold

Making multiple SVMs

Summary

References

8

Neural Networks and Deep Neural Networks

Single-layer neural networks

Multi-layer neural networks

Summary

References

9

Exploring Transformers

Introduction to transformers

How data flows through the transformer model

Input embeddings

Positional encoding

Encoders

Decoders

Linear layer

Softmax layer

Output probabilities

Hugging Face

Existing models

Transformers for classification

Implementing transformers

Google Colab

Single-emotion datasets

Multi-emotion datasets

Summary

References

10

Multiclassifiers

Multilabel datasets are hard to work with

Confusion matrices

Using “neutral” as a label

Thresholds and local thresholds

Multiple independent classifiers

Summary

Part 4: Case Study

11

Case Study – The Qatar Blockade

The case study

Short-term changes

Long-term changes

Proportionality revisited

Summary

Index

Other Books You May Enjoy

Part 1: Essentials

This part introduces natural language processing (NLP), sentiment analysis (SA), and emotion analysis (EA). You will learn about the basic concepts behind SA and EA, why and how they differ, and why EA is so challenging. There is also an introduction to some of the tools that will be used. This part will also look at other approaches to multi-emotion classification and discuss emotions from a psychological point of view. Finally, there will be a discussion of why EA is important, its benefits, and its uses.

This part has the following chapter:

Chapter 1, Foundations

1

Foundations

Emotions play a key role in our daily lives. Some people define them as the reactions that we as human beings experience as a response to events or situations, some describe them simply as a class of feelings, and others say they describe physiological states and are generated subconsciously. Psychologists describe emotions as “a complex state of feeling that results in physical and psychological changes that influence thought and behavior.” So, it appears that although we feel emotions, they are much harder to describe.

Our brains play a crucial role when creating and processing emotions. Historically, it was believed that each emotion was located in a specific part of the brain. However, research has shown that there is no single region of the brain that’s responsible for processing emotions – several brain regions are activated when emotions are being processed. Furthermore, different parts of the brain can generate the same emotion and different parts can also contribute to generating an emotion.

The reality may even be that emotion and sentiment are experiences that result from combined influences of biological, cognitive, and social aspects. Whatever the case, emotions matter because they help us decide what actions to do, how to negotiate tricky situations, and, at a basic level, how to survive. Different emotions rule our everyday lives; for example, we make decisions based on whether we are happy, angry, or sad, and we choose our daily pastimes and routines based on the emotions they facilitate. So, emotions are important, and understanding them may make our lives easier.

In this chapter, you will learn about the main concepts and differences between sentiment analysis and emotion analysis, and also understand why emotion analysis is important in the modern world. By combining this with a basic introduction to natural language processing (NLP) and machine learning, we will lay the foundations for successfully using these techniques for emotion analysis.

In this chapter, we’ll cover the following topics:

Emotions

Sentiment

Why is emotion analysis important?

Introduction to natural language processing

Introduction to machine learning

Emotions

This book is about writing programs that can detect emotions expressed in texts, particularly informal texts. Emotions play a crucial role in our daily lives. They impact how we feel, how we think, and how we behave. Consequently, it stands to reason that they impact the decisions we make. If this is the case, then being able to detect emotions from written text (for example, social media posts) is a useful thing to do because the impact it would have on many practical everyday applications in sectors such as marketing, industry, health, and security would be huge.

However, while it is clear that we all experience emotions and that they play a significant role in our plans and actions, it is much less clear what they are. Given that we are about to embark on a detailed study of how to write programs to detect them, it is perhaps worth beginning by investigating the notion of what an emotion is and looking at the various theories that attempt to pin them down. This is a topic that has fascinated philosophers and psychologists from antiquity to the present day, and it is still far from settled. We will briefly look at a number of the most prominent theories and approaches. This overview will not lead us to a definitive view, but before we start trying to identify them in written texts, we should at least become aware of the problems that people still have in pinning them down.

Darwin believed that emotions allowed humans and animals to survive and reproduce. He argued that they evolved, were adaptive, and that all humans, and even other animals, expressed emotion through similar behaviors. He believed that emotions had an evolutionary history that could be traced across cultures and species. Today, psychologists agree that emotions such as fear, surprise, disgust, happiness, and sadness can be regarded as universal regardless of culture.

The James-Lange theory proposes that our physical responses are responsible for our emotions. For example, if someone jumps out at you from behind a bush, your heart rate will increase, and it is this increase that causes you to feel fear. The facial-feedback theory builds on this idea and suggests that physical activity influences emotion; for example, if you smile, you will likely feel happier than if you did not. The Cannon-Bard theory, however, rejects James-Lange, suggesting instead that people experience emotional and physical responses simultaneously. The Schachter-Singer theory is a cognitive theory of emotion that suggests that our thoughts are responsible for emotions, and, similarly, cognitive appraisal theory suggests that thinking must come before experiencing an emotion. For instance, the brain might interpret a situation as threatening, and hence fear is experienced.

To try to obtain a deeper understanding of emotions, let’s look at the three main theories of emotion:

Physiological: Psychologists have the view that emotions are formed when a bodily response is triggered by a stimulus, so as the individual experiences physiological changes, this is also experienced as an emotion

Neurological: Biologists claim that hormones (for example, estrogen, progesterone, and testosterone) that are produced by the body's glands impact the chemistry and circuitry of the brain, and these lead to emotional responses

Cognitive: Cognitive scientists believe that thoughts and other mental activities play a crucial role in forming emotions

In all likelihood, all three theories are probably valid to some extent. It has also been postulated that instead of thinking of these as mutually exclusive, it is more likely that they are complementary and that each explains and accounts for a different aspect of what we think of as an emotion.

Although emotions have been studied for many decades, it is probably fair to say that we still do not fully understand them.

Humans can experience a huge number of emotions, but only a handful are considered basic. However, the number of emotions considered in emotion analysis research is not always limited to just these basic emotions. Furthermore, it is not straightforward to demarcate emotions, and hence boundaries are very rarely clearly defined.

We will now consider what are known as the primary emotions. These have been described as a reaction to an event or situation, or the immediate, strong first reaction experienced when something happens. There has been much research on identifying these primary emotions, but there is still no general agreement, and different models have been suggested by eminent researchers such as Ekman, Plutchik, and Parrott. Some emotions, such as anger, fear, joy, and surprise, are universally agreed upon. However, the same is not true for others, with disagreement both on which emotions count as basic and on how many there are. Although there is, again, no consensus on which model best covers the basic emotions, the models proposed by Ekman and Plutchik are the most commonly used. There are two popular approaches: categorical and dimensional.

Categorical

Ekman is an advocate of the categorical theory, which suggests that emotions arise from separate neural systems. This approach also suggests that there are a limited number of primary, distinct emotions, such as anger, anxiety, joy, and sadness. Ekman suggested that primary emotions must have a distinct facial expression that is recognizable across all cultures. For example, the corners of the lips being turned down demonstrates sadness – and this facial expression is recognized universally as portraying sadness. Similarly, smiling with teeth exposed and the corners of the mouth pointing upwards is universally recognized as joy.

Amazingly, people blind from birth use the same facial expressions when expressing sadness and joy. They have never seen these facial expressions, so it is impossible that these expressions were learned. It is much more likely that these are an integral part of human nature. Using this understanding of distinct, universal facial expressions, Ekman proposed six primary emotions (Ekman, 1993):

Anger

Disgust

Fear

Joy

Sadness

Surprise

Ekman suggested that these basic emotions were biologically primitive and had evolved to increase the reproductive fitness of animals, and that all other emotions were combinations of these six primary emotions. Later, Ekman expanded this list to include other emotions that he considered basic, such as embarrassment, excitement, contempt, shame, pride, satisfaction, and amusement.

Another of the most influential works in the area of emotions is Plutchik’s psychoevolutionary theory of emotion. Plutchik proposed eight primary emotions (Plutchik, 2001):

Anger

Anticipation

Disgust

Fear

Joy

Sadness

Surprise

Trust

From this theory, Plutchik developed a Wheel of Emotions (see Figure 1.1). This wheel was developed to help understand the nuances of emotion and how emotions contrast. It has eight sectors representing the eight emotions. Emotions intensify as they move from outside toward the center of the wheel. For example, annoyance increases to anger and then further increases to outright rage. Each sector of the circle has an opposite emotion that is placed directly opposite in the wheel. For example, the opposite of sadness is joy, and the opposite of anger is fear. It also shows how different emotions can be combined.

Figure 1.1 – Plutchik’s Wheel of Emotions

Although Ekman and Plutchik’s theories are the most common, there are other works, but there is little agreement on what the basic emotions are. However, in the area of emotion analysis research, Ekman and Plutchik’s models are the most often used classification schemes.
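To make the comparison between the two inventories concrete, here is a minimal Python sketch that represents them as sets and computes where they agree. The lowercase label spellings are our own convention, not fixed names from either theory:

```python
# Ekman's six basic emotions (Ekman, 1993) and Plutchik's eight primary
# emotions (Plutchik, 2001), written as plain Python sets.
EKMAN = {"anger", "disgust", "fear", "joy", "sadness", "surprise"}
PLUTCHIK = {"anger", "anticipation", "disgust", "fear", "joy",
            "sadness", "surprise", "trust"}

common = EKMAN & PLUTCHIK         # emotions the two models agree on
plutchik_only = PLUTCHIK - EKMAN  # where Plutchik goes further

print(sorted(common))
print(sorted(plutchik_only))  # ['anticipation', 'trust']
```

The intersection recovers exactly Ekman's six emotions, showing that the two models differ only in Plutchik's additional anticipation and trust.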

Dimensional

The dimensional approach posits that to understand emotional experiences, the fundamental dimensions of valence (the goodness or badness of the emotion) and arousal (the intensity of the emotion) are vital. This approach suggests that a common and interconnected neurophysiological system is responsible for all affective states. Every emotion can then be defined in terms of these two measures, so emotional experience can be viewed as a continuous two-dimensional space, with dimensions of valence and arousal, where each point in the plane corresponds to a distinct emotional state.

Figure 1.2 – Russell’s circumplex model

The most common dimensional model is Russell's circumplex model (Russell, 1980; see Figure 1.2). The model posits that emotions are made up of two core dimensions: valence and arousal. Figure 1.2 shows that valence ranges from −1 (unpleasant) to 1 (pleasant), and arousal likewise ranges from −1 (calm) to 1 (excited). Each emotion is then a linear combination of these two dimensions. For example, anger is an unpleasant emotional state (a negative valence) with high intensity (a positive arousal). Other basic emotions can be seen in Figure 1.2 with their approximate positions in the two-dimensional space.

Some emotions have similar arousal and valence (for example, grief and rage). Hence, a third dimension (control) has also been suggested that can be used to distinguish between these. Control ranges from no control to full control. So, the entire range of human emotions can be represented as a set of points in the three-dimensional space using these three dimensions.

The dimensional model has a poorer resolution of emotions; that is, it is harder to distinguish between emotions that lie close together in the space (such as grief and rage). The categorical model is simpler to understand, but some emotions fall outside its set of basic emotions.

Most emotion analysis research uses a categorical perspective; there seems to be a lack of research using the dimensional approach.
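The idea of emotions as points in valence-arousal space can be sketched in a few lines of Python. The coordinates below are illustrative guesses loosely inspired by the circumplex layout, not values taken from Russell's paper; a real system would use empirically derived positions:

```python
import math

# Illustrative (valence, arousal) coordinates. The exact values are
# invented for this sketch, not taken from Russell (1980).
EMOTION_COORDS = {
    "joy":     ( 0.8,  0.5),
    "anger":   (-0.6,  0.8),
    "fear":    (-0.7,  0.7),
    "sadness": (-0.7, -0.4),
    "calm":    ( 0.6, -0.6),
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Return the label whose point is closest in valence-arousal space."""
    return min(
        EMOTION_COORDS,
        key=lambda label: math.dist((valence, arousal), EMOTION_COORDS[label]),
    )

# A high-arousal, strongly negative point lands nearest to anger
print(nearest_emotion(-0.62, 0.78))  # anger
```

The third, control dimension mentioned above could be accommodated simply by extending each tuple to three coordinates; `math.dist` works for any dimensionality.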

Sentiment

There is a second, closely related term: sentiment. The terms sentiment and emotion are often used in an ad hoc manner, with different writers using them almost interchangeably. Given the difficulty we have found in working out what emotions are, and in deciding exactly how many emotions there are, having yet another ill-defined term is not exactly helpful. To clarify the situation, note that when people work on sentiment mining, they generally make use of a simple, limited classification scheme with positive, negative, and neutral cases. This is a much simpler scheme to process, and it yields results that are easier to understand. In some ways, emotion analysis may be regarded as an upgrade to sentiment analysis: a more complex approach that goes beyond simple positive and negative markers and instead tries to determine specific emotions (anger, joy, sadness). This may be more useful, but it also involves much more effort, time, and cost. Emotion and sentiment are, thus, not the same. An emotion is a complex psychological state, whereas a sentiment is a mental attitude that is created through the very existence of the emotion.

For us, sentiment refers exclusively to an expressed opinion that is positive, negative, or neutral. There is some degree of overlap here because, for example, emotions such as joy and love could both be considered positive sentiments. It may be that the terms simply have different granularity – in the same way that ecstasy, joy, and contentment provide a fine-grained classification of a single generic emotion class that we might call happiness, happiness and love are a fine-grained classification of the general notion of feeling positive. Alternatively, it may be that sentiment is the name for one of the axes in the dimensional model – for example, the valence axis in Russell’s analysis. Given the range of theories of emotion, it seems best to just avoid having another term for much the same thing. In this book, we will stick to the term emotion; we will take an entirely pragmatic approach by accepting some set of labels from an existing theory such as Plutchik’s or Russell’s as denoting emotions, without worrying too much about what it is that they denote. We can all agree that I hate the people who did that and I wish they were all dead expresses hate and anger, and that it is overall negative, even if we’re not sure what hate and anger are or what the scale from negative to positive actually measures.
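One way to see the difference in granularity is to collapse emotion labels down to sentiment labels. The mapping below is a hypothetical, illustrative choice (treating surprise and anticipation as neutral is our own call, not a standard):

```python
# A hypothetical many-to-one mapping from fine-grained emotion labels
# (Plutchik-style) down to the coarse classes used in sentiment mining.
EMOTION_TO_SENTIMENT = {
    "joy": "positive", "trust": "positive",
    "anger": "negative", "fear": "negative",
    "sadness": "negative", "disgust": "negative",
    "anticipation": "neutral", "surprise": "neutral",
}

def to_sentiment(emotion: str) -> str:
    """Collapse an emotion label to positive/negative/neutral."""
    return EMOTION_TO_SENTIMENT.get(emotion, "neutral")

# Collapsing loses information: anger and sadness become indistinguishable.
print(to_sentiment("anger"), to_sentiment("sadness"))  # negative negative
```

Because the mapping is many-to-one, it can always be computed from emotion labels, but never recovered from sentiment labels, which is the sense in which emotion analysis is the finer-grained task.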

Now that we know a bit more about what emotion is and how it is categorized and understood, it is essential to understand why emotion analysis is an important topic.

Why emotion analysis is important

The amount of data generated daily from online sources such as social media and blogs is staggering. In 2019, Forbes estimated this to be around 2.5 quintillion bytes of data, though this figure is more than likely even higher now. Due to this, much research has focused on using this data for analysis and for gaining hitherto unknown insights (for example, predicting flu trends and disease outbreaks using Twitter (now known as "X") data).

Similarly, people are increasingly expressing their opinions online, and many of these opinions are, explicitly or implicitly, highly emotional (for example, I love summer). Nowadays, social network platforms such as Facebook, LinkedIn, and Twitter are at the hub of everything we do. Twitter is one of the most popular social network platforms, with more than 300 million monthly active users. Twitter is used by people from all walks of life: celebrities, movie stars, politicians, sports stars, and everyday people. Users post short messages, known as tweets, and, every day, millions share their opinions about themselves, news, sports, movies, and other topics. Consequently, this makes platforms such as Twitter rich sources of data for public opinion mining and sentiment analysis.

As we have seen, emotions play an important role in human intelligence, decision-making, social interaction, perception, memory, learning, creativity, and much, much more.

Emotion analysis is the process of recognizing the emotions that are expressed in texts (for example, social media posts). It is a complex task because user-generated content, such as tweets, typically has the following characteristics:

Written in natural language

Often unstructured, informal, and misspelled

Can contain slang and made-up words

Can contain emojis and emoticons where their usage does not always correspond to the reason for their original creation (for example, using the pizza emoji to express love)

Furthermore, it is also entirely possible to express emotion without using any obvious emotion markers.
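As a toy illustration of the kind of normalization such text calls for, the sketch below lowercases, strips URLs and @/# markers, and glosses one known emoji. The regexes and the tiny emoji table are illustrative assumptions, not a production tokenizer:

```python
import re

# A minimal normalization pass for tweet-like text.
EMOJI_GLOSSES = {"🍕": "love"}  # per the pizza-emoji example above

def normalize(tweet: str) -> str:
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", "", tweet)   # drop URLs
    tweet = re.sub(r"[@#](\w+)", r"\1", tweet)   # strip @-mention/#hashtag markers
    for emoji, gloss in EMOJI_GLOSSES.items():   # gloss known emojis
        tweet = tweet.replace(emoji, f" {gloss} ")
    return re.sub(r"\s+", " ", tweet).strip()    # collapse whitespace

print(normalize("I 🍕 #summer!! http://t.co/xyz"))  # i love summer!!
```

Even this simple pass shows why the emoji problem is hard: the gloss table has to be curated by hand, because the intended meaning of an emoji often differs from its literal depiction.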

One of the big unsolved problems in emotion analysis is detecting emotions such as anticipation, pessimism, and sarcasm. Consider the following tweet:

We lost again. Great.

We humans are fairly good at drilling down to the implied meaning and would understand that the user was being sarcastic. We know that a team losing again is not a good thing, and by making use of this understanding, we can easily identify what was really meant.

The problem is that simply considering each word that has sentiment in isolation will not do a good job. Instead, further rules must be applied to understand the context of the word. These rules will help the analyzer differentiate between sentences that might contain similar words but have completely different meanings. However, even with these rules, analyzers will still make mistakes.
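A minimal sketch shows the failure mode. The tiny lexicon below is invented for illustration (real sentiment lexicons contain thousands of scored words); with it, a naive word-by-word scorer rates the sarcastic tweet above as positive:

```python
# A toy bag-of-words sentiment scorer that ignores all context.
LEXICON = {"great": 1.0, "love": 1.0, "lost": -0.5, "hate": -1.0}

def naive_score(text: str) -> float:
    """Sum per-word scores, treating each word in isolation."""
    words = text.lower().replace(".", " ").split()
    return sum(LEXICON.get(word, 0.0) for word in words)

# The sarcastic tweet comes out *positive*, because "Great" outweighs
# "lost" when each word is scored independently.
print(naive_score("We lost again. Great."))  # 0.5
```

This is precisely the context problem described above: the words themselves score positive, and only rules (or models) that look beyond individual words can recover the sarcastic, negative reading.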

Social media is now viewed as a valuable resource, so organizations are showing an increased interest in social media monitoring to analyze the massive volume of free-form, short, user-generated text from social media sites. Exploiting this data allows organizations to gain insights into their customers' opinions, concerns, and needs regarding their products and services.

Due to its real-time nature, governments are also interested in using social media to identify threats and monitor and analyze public responses to current events.

Emotion analysis has many interesting applications:

Marketing: Lots of Twitter users follow brands (for example, Nike), so there are many marketing opportunities. Twitter can help spread awareness of a brand, generate leads, drive traffic to sites, build a customer base, and more. Some of the biggest marketing campaigns of previous years include #ShareACoke by Coca-Cola, #WantAnR8 by Audi, and #BeTheFastest by Virgin Media.

Stock markets: Academics have attempted to use Twitter to anticipate trends in financial markets. In 2013, the Associated Press Twitter account posted a (false) tweet stating that there had been explosions in the White House and that Obama was injured. The post was debunked very quickly, but the stock markets still took a nosedive, resulting in hundreds of billions of dollars changing hands.

Social studies: Millions of people regularly interact with the world by tweeting, providing invaluable insights into their feelings, actions, routines, emotions, and behavior. This vast amount of public communication can be used to generate forecasts of various types of events. For example, large-scale data analysis of social media has demonstrated that not only did Brexit supporters have a more powerful, emotional message, but they were also more effective in their use of social media. They routinely outmuscled their rivals and had more vocal and active supporters across nearly all social media platforms. This led to the activation of a greater number of Leave supporters and enabled them to dominate social media platforms, thus influencing many undecided voters.

Gaining an understanding of emotions is also important for organizations seeking insights into public opinion about their products and services. However, it is also important to automate this process so that decisions can be made and actions can be taken in real time. For example, analysis techniques can automatically process thousands of reviews about a particular product and extract insights that show whether consumers are satisfied with the product or service. This can be done with sentiment or emotion, although emotion may be more useful because it is more granular.

Research has shown that tweets posted by dissatisfied users are shared more often and spread faster and wider than other types of tweets. Therefore, organizations have to provide customer service beyond the old-fashioned agent at the end of a phone line. Due to this, many organizations today also provide social media-based customer support in an attempt to head off bad reviews and give a good impression. Nowadays, there is so much consumer choice, and it is so easy for customers to switch to competitors, that it is vitally important for organizations to retain and increase their customer base. Hence, the quicker an organization reacts to a bad post, the better chance it has of retaining the customer. Furthermore, there is no better advertising than word of mouth, such as that generated by happy customers. Emotion analysis is one way to quickly analyze hundreds of tweets, find the ones where customers are unhappy, and use this to drive processes that attempt to resolve the problem before the customer becomes too unhappy and decides to take their business elsewhere. Emotion analysis not only requires data but also generates a lot of data. This data can be further analyzed to determine, for example, what the top items on user wishlists are, or what the top user gripes are. These findings can then be used to drive the next iteration or version of the product or service.

Although sentiment analysis and emotion analysis are not mutually exclusive and can be used in conjunction, the consensus is that sentiment analysis is not adequate for classifying something as complex, multi-layered, and nuanced as emotion. Simply taking the whole range of emotions and considering them as only positive, negative, or neutral runs the considerable risk of missing out on deeper insights and understandings.

Emotion analysis also provides more in-depth insights. Understanding why someone ignored or liked a post requires more than just a sentiment score, and so does turning that understanding into actionable insights.
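To make the contrast concrete, here is a toy sketch (my own illustration, not a method from this book) in which a single polarity score and a set of emotion labels are computed from the same review. The tiny word lists are purely illustrative assumptions:

```python
# Toy contrast between coarse sentiment polarity and finer-grained emotion
# labels, using a tiny hand-made lexicon (the word lists are illustrative
# assumptions, not a real resource).
EMOTION_LEXICON = {
    "furious": "anger", "refund": "anger",
    "love": "joy", "brilliant": "joy",
    "worried": "fear", "unsure": "fear",
}
NEGATIVE = {"furious", "refund", "worried", "unsure"}
POSITIVE = {"love", "brilliant"}

def sentiment(text):
    """Return a crude polarity score: positive minus negative word counts."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def emotions(text):
    """Return the set of emotion labels triggered by words in the text."""
    return {EMOTION_LEXICON[w] for w in text.lower().split() if w in EMOTION_LEXICON}

review = "furious about the battery and worried it will fail again"
print(sentiment(review))  # negative overall
print(emotions(review))   # but the labels say *how* it is negative
```

Both outputs agree that the review is negative, but only the emotion labels distinguish anger from fear: exactly the granularity that a bare sentiment score throws away.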

Emotion analysis is a sub-field of NLP, so it makes sense to gain a better understanding of that next.

Introduction to NLP

Sentiment mining is about finding the sentiments that are expressed by natural language texts – often quite short texts such as tweets and online reviews, but also larger items such as newspaper articles. There are many other ways of getting computers to do useful things with natural language texts and spoken language: you can write programs that can have conversations (with people or with each other), you can write programs to extract facts and events from articles and stories, you can write programs to translate from one language to another, and so on. These applications all share some basic notions and techniques, but they each lay more emphasis on some topics and less on others. In Chapter 4, Preprocessing – Stemming, Tagging, and Parsing, we will look at the things that matter most for sentiment mining, but we will give a brief overview of the main principles of NLP here. As noted, not all of the stages outlined here are needed for every application, but it is nonetheless useful to have a picture of how everything fits together when considering specific subtasks later.

We will start with a couple of basic observations:

Natural language is linear. The fundamental form of language is speech, which is necessarily linear. You make one sound, and then you make another, and then you make another. There may be some variation in the way you make each sound – louder or softer, with a higher pitch or a lower one, quicker or slower – and this may be used to overlay extra information on the basic message, but fundamentally, spoken language is made up of a sequence of identifiable units, namely sounds; and since written language is just a way of representing spoken language, it too must be made up of a sequence of identifiable units.

Natural language is hierarchical. Smaller units are grouped into larger units, which are grouped into larger units, and so on. Consider the sentence smaller units are grouped into larger units. In the written form of English, for instance, the smallest units are characters; these are grouped into morphemes (meaning-bearing word-parts), as small er unit s are group ed into large er unit s; morphemes are grouped into words (small-er unit-s are group-ed into large-er unit-s); words are grouped into base-level phrases ([small-er unit-s] [are group-ed] [into] [large-er unit-s]); and these are grouped into higher-level phrases ([[small-er unit-s] [[are group-ed] [[into] [large-er unit-s]]]]).

These two properties hold for all natural languages. All natural languages were spoken before they were written (some widely spoken languages have no universally accepted written form!), and hence are fundamentally linear. But they all express complex hierarchical relations, and hence to understand them, you have to be able to find the ways that smaller units are grouped into larger ones.
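These two observations can be made concrete in a few lines of Python. A hierarchical grouping (here, one hand-chosen bracketing of the sentence from the text, written as nested lists) can always be flattened back into the linear sequence of units:

```python
# The hierarchical grouping of "smaller units are grouped into larger units"
# written as nested lists (one possible bracketing, chosen for illustration).
sentence = [["smaller", "units"],
            [["are", "grouped"], [["into"], ["larger", "units"]]]]

def flatten(node):
    """Recover the linear word sequence from the hierarchical structure."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node:
        words.extend(flatten(child))
    return words

print(" ".join(flatten(sentence)))
```

Parsing is the reverse, much harder, direction: recovering the intended hierarchy from the linear sequence, given that the same linear string is compatible with many different hierarchies.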

What the bottom-level units are like, and how they are grouped, differs from language to language. The sounds of a language are made by moving your articulators (tongue, teeth, lips, vocal cords, and various other things) around while trying to expel air from your lungs. The sound that you get by closing and then opening your lips with your vocal cords tensed (/b/, as in the English word bat) is different from the sound you get by doing the same things with your lips while your vocal cords are relaxed (/p/, as in pat). Different languages use different combinations – Arabic doesn’t use /p/ and English doesn’t use the sound you get by closing the exit from the chamber containing the vocal cords (a glottal stop): the combinations that are used in a particular language are called its phonemes. Speakers of a language that don’t use a particular combination find it hard to distinguish words that use it from ones that use a very similar combination, and very hard to produce that combination when they learn a language that does.

To make matters worse, the relationship between the bottom-level units in spoken language and written language can vary from language to language. The phonemes of a language can be represented in the written form of that language in a wide variety of ways. The written form may make use of graphemes, which are combinations of ways of making a shape out of strokes and marks (so, the letter A in its various typefaces and styles is always written by producing two near-vertical more-or-less-straight lines joined at the top with a cross-piece about half-way up), just as phonemes are combinations of ways of making a sound; a single phoneme may be represented by one grapheme (the short vowel /a/ from pat is represented in English by the character a) or by a combination of graphemes (the sound /sh/ from should is represented by the pair of graphemes s and h); a sound may have no representation in the written form (Arabic text omits short vowels and some other distinctions between phonemes); or there may simply be no connection between the written form and the way it is pronounced (as in written Chinese or Japanese kanji symbols). Given that we are going to be largely looking at text, we can at least partly ignore the wide variety of ways that written and spoken language are related, but we will still have to be aware that different languages combine the basic elements of the written forms in completely different ways to make up words.

The bottom-level units of a language, then, are either identifiable sounds or identifiable marks. These are combined into groups that carry meaning – morphemes. A morpheme can carry quite a lot of meaning; for example, cat (made out of the graphemes c, a, and t) denotes a small mammal with pointy ears and an inscrutable outlook on life, whereas s just says that you’ve got more than one item of the kind you are thinking about, so cats denotes a group of several small mammals with pointy ears and an opaque view of the world. Morphemes of the first kind are sometimes called lexemes, with a single lexeme combining with one or more other morphemes to express a concept (so, the French lexeme noir (black) might combine with e (feminine) and s (plural) to make noires – several black female things). Morphemes that add information to a lexeme, such as about how many things were involved or when an event happened, are called inflectional morphemes, whereas ones that radically change their meaning (for example, an incomplete solution to a problem is not complete) are called derivational morphemes, since they derive a new concept from the original. Again, most languages make use of inflectional and derivational morphemes to enrich the basic set of lexemes, but exactly how this works varies from language to language. We will revisit this at some length in Chapter 5, Sentiment Lexicons and Vector Space Models, since finding the core lexemes can be significant when we are trying to assign emotions to texts.
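As a toy illustration of the lexeme-plus-inflection idea (not a real stemmer; the suffix list is a deliberately crude assumption), a few lines of Python can strip common English inflectional suffixes:

```python
# A deliberately naive inflectional-suffix stripper, just to illustrate the
# lexeme + morpheme split described in the text. Real systems use proper
# stemmers or morphological analyzers; this suffix list is an assumption.
SUFFIXES = ["ing", "ed", "er", "s"]

def split_morphemes(word):
    """Split a word into a candidate lexeme and one inflectional suffix."""
    for suffix in SUFFIXES:
        # Require a reasonably long remainder so that short words survive intact.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)], suffix
    return word, ""

for w in ["units", "grouped", "cats", "cat"]:
    print(w, "->", split_morphemes(w))
```

A naive splitter like this soon goes wrong (it would happily split this into thi + s), which is why Chapter 4 returns to proper stemming and tagging.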

A lexeme plus a suitable set of morphemes is often referred to as a word. Words are typically grouped into larger tree-like structures, with the way that they are grouped carrying a substantial part of the message conveyed by the text. In the sentence John believes that Mary expects Peter to marry Susan, for instance, Peter to marry Susan is a group that describes a particular kind of event, Mary expects [Peter to marry Susan] is a group that describes Mary’s attitude to this event, and John believes [that Mary expected [Peter to marry Susan]] is a group that describes John’s view of Mary’s expectation.

Yet again, different languages carry out this kind of grouping in different ways, and there are numerous ways of approaching the task of analyzing the grouping in particular cases. This is not the place for a review of all the grammatical theories that have ever been proposed to analyze the ways that words get grouped together or of all the algorithms that have ever been proposed for applying those theories to specific cases (parsers), but there are a few general observations that are worth making.

Phrase structure grammar versus dependency grammar

In some languages, groups are mainly formed by merging adjacent groups. The previous sentence, for instance, can be analyzed if we group it as follows:

In some languages groups are mainly formed by merging adjacent groups

In [some languages]np groups are mainly formed by merging [adjacent groups]np

[In [some languages]]pp groups are mainly formed by [merging [adjacent groups]]vp

[In [some languages]]pp groups are mainly formed [by [merging [adjacent groups]]]pp

[In [some languages]]pp groups are mainly [formed [by [merging [adjacent groups]]]]vp

[In [some languages]]pp groups are [mainly [formed [by [merging [adjacent groups]]]]]vp

[In [some languages]]pp groups [are [mainly [formed [by [merging [adjacent groups]]]]]]vp

[In [some languages]]pp [groups [are [mainly [formed [by [merging [adjacent groups]]]]]]]s

[[In [some languages]][groups [are [mainly [formed [by [merging [adjacent groups]]]]]]]]s

This tends to work well for languages where word order is largely fixed – no languages have completely fixed word order (for example, the preceding sentence could be rewritten as Groups are mainly formed by merging adjacent groups in some languages with very little change in meaning), but some languages allow more freedom than others. For languages such as English, analyzing the relationships between words in terms of adjacent phrases – that is, by using a phrase structure grammar – works quite well.
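The stepwise bracketing above ends in a labeled tree, which can be represented directly in Python. The sketch below (the labels follow the worked example; the (label, children...) representation itself is just one convenient choice, not the book's code) renders such a tree back into the [ ... ]np-style notation used above:

```python
# Labeled phrase structure trees as (label, children...) tuples, following
# the bracketing worked out step by step in the text.
tree = ("s",
        ("pp", "In", ("np", "some", "languages")),
        ("s", "groups",
         ("vp", "are",
          ("vp", "mainly",
           ("vp", "formed",
            ("pp", "by", ("vp", "merging", ("np", "adjacent", "groups"))))))))

def bracketed(node):
    """Render a (label, children...) tree in the [ ... ]label notation."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "[" + " ".join(bracketed(c) for c in children) + "]" + label

print(bracketed(tree))
```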

For languages where words and phrases are allowed to move around fairly freely, it can be more convenient to record pairwise relationships between words. The following tree describes the same sentence using a dependency grammar – that is, by assigning a parent word to every word (apart from the full stop, which we are taking to be the root of the tree):

Figure 1.3 – Analysis of “In some languages, groups are mainly formed by merging adjacent groups” using a rule-based dependency parser

There are many variations of phrase structure grammar and many variations of dependency grammar. Roughly speaking, dependency grammar provides an easier handle on languages where words can move around very freely, while phrase structure grammar makes it easier to deal with invisible items such as the subject of merging in the preceding example. The difference between the two is, in any case, less clear than it might seem from the preceding figure: a dependency tree can easily be transformed into a phrase structure tree by treating each subtree as a phrase, and a phrase structure tree can be transformed into a dependency tree if you can specify which item in a phrase is its head – for example, in the preceding phrase structure tree, the head of a group labeled as nn is its noun and the head of a group labeled as np is the head of nn.
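The subtree-to-phrase transformation just mentioned is easy to sketch in Python. The head indices below are a plausible hand analysis of the example sentence (an assumption for illustration, not the exact output of the parser in the figure):

```python
# A dependency tree stored as one head index per word (0 marks the root).
# The head assignments are a plausible hand analysis, used for illustration.
words = ["In", "some", "languages", "groups", "are", "mainly",
         "formed", "by", "merging", "adjacent", "groups"]
# 1-based indexing: the head of word i is heads[i - 1]; "formed" is the root.
heads = [7, 3, 1, 7, 7, 7, 0, 7, 8, 11, 9]

def subtree(i):
    """Return the words in the subtree rooted at word i (its 'phrase')."""
    span = {i}
    changed = True
    while changed:  # grow the span until no more dependents can be added
        changed = False
        for j in range(1, len(words) + 1):
            if heads[j - 1] in span and j not in span:
                span.add(j)
                changed = True
    return [words[k - 1] for k in sorted(span)]

print(subtree(1))  # the phrase headed by "In"
print(subtree(8))  # the phrase headed by "by"
```

Each word's subtree comes out as a contiguous phrase here, and treating every such subtree as a phrase is precisely the dependency-to-phrase-structure transformation described above.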

Rule-based parsers versus data-driven parsers

As well as having a theory of how to describe the structure of a piece of text, you need a program that applies that theory to specific texts – a parser. There are two ways to approach the development of a parser:

Rule-based: You can try to devise a set of rules that describe the way that a particular language works (a grammar), and then implement a program that tries to apply these rules to the texts you want analyzed. Devising such rules is difficult and time-consuming, and programs that try to apply them tend to be slow and fail if the target text does not obey the rules.

Data-driven: You can somehow produce a set of analyses of a large number of texts (a treebank), and then implement a program that extracts patterns from these analyses. Producing a treebank is difficult and time-consuming – you need hundreds of thousands of examples, and the trees all have to be consistently annotated. If the annotation is to be done by people, they have to be given consistent guidelines that cover every example they will see (which is, in effect, a grammar); and if it is not done by people, then you must already have an automated way of doing it – that is, a parser!

Both approaches have advantages and disadvantages: when considering whether to use a dependency grammar or a phrase structure grammar and then when considering whether to follow a rule-based approach or a data-driven one, there are several criteria to be considered. Since no existing system optimizes all of these, you should think about which ones matter most for your application and then decide which way to go:

Speed: The first criterion to consider is the speed at which the parser runs. Some parsers can become very slow when faced with long sentences. The worst-case complexity of the standard chart-parsing algorithm for rule-based approaches is O(N³), where N is the length of the sentence, which means that for long sentences, the algorithm can take a very long time. Some other algorithms have much better complexity than this (the MALT (Nivre et al., 2006) and MST (McDonald et al., 2005) parsers, for instance, are linear in the length of the sentence), while others have much worse. If two parsers are equally good according to all the other criteria, then the faster one will be preferable, but there will be situations where one (or more) of the other criteria is more important.

Robustness: Some parsers, particularly rule-based ones, can fail to produce any analysis at all for some sentences. This will happen if the input is ungrammatical, but it will also happen if the rules are not a complete description of the language. A parser that fails to produce an analysis for a perfectly grammatical input sentence is less useful than one that can analyze every grammatically correct sentence of the target language. It is less clear that parsers that will do something with every input sentence are necessarily more useful than ones that will reject some sentences as being ungrammatical. In some applications, detecting ungrammaticality is a crucial part of the task (for example, in language learning programs), but in any case, assigning an analysis to an ungrammatical sentence cannot be either right or wrong, and hence any program that makes use of such an analysis cannot be sure that it is doing the right thing.

Accuracy: A parser that assigns the right analysis to every input text will generally be more useful than one that does not. This does, of course, beg the question of how to decide what the right analysis is. For data-driven parsers, it is impossible to say what the right analysis of a sentence that does not appear in the treebank is. For rule-based parsers, any analysis that is returned will be right in the sense that it obeys the rules. So, if an analysis looks odd, you have to work out how the rules led to it and revise them accordingly.
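The O(N³) worst case mentioned under Speed comes from the three nested loops of the standard chart (CKY) algorithm. Here is a minimal sketch using a toy grammar in Chomsky normal form (the grammar and vocabulary are illustrative assumptions, not a serious description of English):

```python
# Minimal CKY chart parser for a toy grammar in Chomsky normal form,
# illustrating the O(N^3) behavior: loops over span length, start position,
# and split point each range over the sentence length N.
from collections import defaultdict

# Binary rules (A -> B C, stored as {(B, C): A}) and lexical rules (A -> word).
binary = {("np", "vp"): "s", ("det", "n"): "np", ("v", "np"): "vp"}
lexical = {"the": "det", "dog": "n", "cat": "n", "saw": "v"}

def cky(words):
    n = len(words)
    chart = defaultdict(set)  # chart[(i, j)] = categories spanning words[i:j]
    for i, w in enumerate(words):
        chart[(i, i + 1)].add(lexical[w])
    for length in range(2, n + 1):          # O(N) span lengths
        for i in range(0, n - length + 1):  # O(N) start positions
            j = i + length
            for k in range(i + 1, j):       # O(N) split points
                for b in chart[(i, k)]:
                    for c in chart[(k, j)]:
                        if (b, c) in binary:
                            chart[(i, j)].add(binary[(b, c)])
    return chart[(0, n)]

print(cky("the dog saw the cat".split()))  # contains 's' if grammatical
```

The sentence is accepted exactly when the start symbol appears in the chart entry for the whole span; a rule-based parser built this way rejects inputs its grammar cannot cover, which is the robustness trade-off discussed above.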

There is a trade-off between accuracy and robustness. A parser that fails to return any analysis at all in complex cases will produce fewer wrong analyses than one that tries to find some way of interpreting every input text: the one that simply rejects some sentences will have lower recall but may have higher precision, and that can be a good thing. It may be better to have a system that says Sorry, I didn’t quite understand what you just said than one that goes ahead with whatever it is supposed to be doing based on an incorrect interpretation.

Sensitivity and consistency: Sometimes, sentences that look superficially similar have different underlying structures. Consider the following examples:

1(a) I want to see the queen
1(b) I went to see the queen

1(a) is the answer to What do you want? and 1(b) is the answer to Why did you go? If the structures that are assigned to these two sentences do not reflect the different roles for to see the queen, then it will be impossible to make this distinction:

Figure 1.4 – Trees for 1(a) and 1(b) from the Stanford dependency parser (Dozat et al., 2017)

2(a) One of my best friends is watching old movies
2(b) One of my favorite pastimes is watching old movies

Figure 1.5 – Trees for 2(a) and 2(b) from the Stanford dependency parser

The Stanford dependency parser (SDP) trees both say that the subject (One of my best friends, One of my favorite pastimes) is carrying out the action of watching old movies – it is sitting in its most comfortable armchair with the curtains drawn and the TV on. The first of these makes sense, but the second doesn’t: pastimes don’t watch old movies. What we need is an equational analysis that says that One of my favorite pastimes and watching old movies are the same thing, as in Figure 1.6:

Figure 1.6 – Equational analysis of “One of my favorite pastimes is watching old movies”

Spotting that 2(b) requires an analysis like this, where my favorite pastime is the predication in an equational use of be rather than the agent of a watching-old-movies event, requires more detail about the words in question than is usually embodied in a treebank.

It can also happen that sentences that look superficially different have very similar underlying structures:

3(a) Few great tenors are poor
3(b) Most great tenors are rich

This time, the SDP assigns quite different structures to the two sentences:

Figure 1.7 – Trees for 3(a) and 3(b) from the SDP

The analysis of 3(b) assigns most as a modifier of great, whereas the analysis of 3(a) assigns few as a modifier of tenors. Most can indeed be used for modifying adjectives, as in He is the most annoying person I know, but in 3(b), it is acting as something more like a determiner, just as few is in 3(a).

4(a) There are great tenors who are rich
4(b) Are there great tenors who are rich?

It is clear that 4(a) and 4(b) should have almost identical analyses – 4(b) is just 4(a) turned into a question. Again, this can cause problems for treebank-based parsers:

Figure 1.8 – Trees for 4(a) and 4(b) from MALTParser

The analysis in Figure 1.8 for 4(a) makes are the head of the tree, with there, great tenors who are rich, and the full stop as daughters, whereas 4(b) is given tenors as its head, with are, there, great, who are rich, and ? as daughters. It would be difficult, given these analyses, to see that 4(a) is the answer to 4(b)!

Treebank-based parsers frequently fail to cope with issues of the kind raised by the examples given here. The problem is that the treebanks on which they are trained tend not to include detailed information about the words that appear in them – that went is an intransitive verb and want requires a sentential complement, that friends are human and can therefore watch old movies while pastimes are events, and can therefore be equated with the activity of watching something, or that most can be used in a wide variety of ways.

It is not possible to say that all treebank-based parsers suffer from these problems, but several very widely used ones (the SDP, the version of MALT distribute