Natural language processing (NLP) uses machine learning to extract information from unstructured data. This book will help you to move quickly from business questions to high-performance models in production.
To start with, you'll understand the importance of NLP in today’s business applications and learn the features of Amazon Comprehend and Amazon Textract to build NLP models using Python and Jupyter Notebooks. The book then shows you how to integrate AI in applications for accelerating business outcomes with just a few lines of code. Throughout the book, you'll cover use cases such as smart text search, setting up compliance and controls when processing confidential documents, real-time text analytics, and much more to understand various NLP scenarios. You'll deploy and monitor scalable NLP models in production for real-time and batch requirements. As you advance, you'll explore strategies for including humans in the loop for different purposes in a document processing workflow. Moreover, you'll learn best practices for auto-scaling your NLP inference for enterprise traffic.
Whether you're new to ML or an experienced practitioner, by the end of this NLP book, you'll have the confidence to use AWS AI services to build powerful NLP applications.
You can read this e-book in Legimi apps or in any app that supports the following format:
Page count: 483
Publication year: 2021
Derive strategic insights from unstructured data with Amazon Textract and Amazon Comprehend
Mona M
Premkumar Rangarajan
BIRMINGHAM—MUMBAI
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Sunith Shetty
Senior Editor: David Sugarman
Content Development Editor: Priyanka Soam
Technical Editor: Devanshi Ayare
Copy Editor: Safis Editing
Project Coordinator: Aparna Ravikumar Nair
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Sinhayna Bais
First published: November 2021
Production reference: 2191121
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
978-1-80181-253-5
www.packt.com
Wisdom is knowing I am nothing. Love is knowing I am everything. Between the two, my life moves. – Shri Nisargadatta Maharaj
We dedicate this book to our dear fathers, for it is their wisdom and love that guides us from within. We may miss their physical presence, but we are blessed with the light of their true essence in every moment of our lives.
We also wanted to take a few moments to express our sincere gratitude to our families, friends, and well-wishers for their continued support, without whom this book would not have been possible.
I, Premkumar Rangarajan, would like to thank my wife, Sapna Mohan Kumar, for her patience with me, her words of encouragement during creative blocks, and her skillful ability to know exactly when to leave me alone; my son, Harivatsa Premkumar, for his faith in me; my mother, Prabhavathy Rangarajan, for her constant love and guidance; my brother, Arun Rangarajan, for his unwavering support; my niece Anya Arun and my nephew Satvik Arun for their playful encouragement; and all my family and friends, for being there for me during this very tough 2021.
I, Mona M, would like to thank my grandparents, Nawal Kishore Prasad and Indu Kumari Sinha, for their constant encouragement to write this book; my mother, Punam Kumari, for never losing faith in me even in the most difficult situation of my life. Also, to my aunts (Nirupama Kaushik and Anupama Kunwar), for always giving me the strength to stay positive, and all the rest of my family and friends, for being there for me during this very tough 2021.
For decades, very few organizations could master the arcane field of Machine Learning (ML). A fascinating mix of math, computer science, software engineering, and IT, ML required a collection of skills and resources that were simply not available outside of very large companies or research labs.
This all changed about 10 years ago with the availability of commodity compute and storage, open source libraries, and the omnipresence of digital data. Indeed, the near-simultaneous emergence of tools such as Amazon EC2, Amazon S3, scikit-learn, and Theano quickly made ML much more accessible and cost-effective. Just a few years later, research teams demonstrated that Graphics Processing Units (GPUs) could be used to massively accelerate neural networks, giving this ancient and impractical technology a new lease of life, and kicking off a Deep Learning (DL) frenzy that has yet to slow down.
Initially, Computer Vision (CV) stole the limelight, amazing us all with ever more sophisticated applications. Meanwhile, Natural Language Processing (NLP) progressed as well, although in a quieter manner. For a while, tasks such as translation, sentiment analysis, and searching didn't look as exciting and flashy as autonomous driving. And then, transformer models burst onto the scene, delivering stunning, state-of-the-art results on NLP tasks and rejuvenating the whole field.
Today, NLP use cases are ubiquitous. Many organizations have accumulated mountains of text documents, including invoices, contracts, forms, reports, emails, web pages, and more. The sheer volume and diversity of these documents make it very challenging to process them efficiently so as to extract precious business insights that can help improve business performance and customer experience.
Some teams decide to build their own NLP solutions using ML libraries and their preferred flavor of IT infrastructure. On top of the ML work, this also requires deploying and managing production environments, with their cohort of challenges: security, monitoring, high availability, scaling, and so on. All of this is important work (who wants to skimp on security?), but it takes valuable time and resources away from the project without creating any actual business value.
This is precisely the problem that AWS AI Services solves. You can extract business insights from your text documents without having to train models or manage any infrastructure. In fact, you don't even need to know the first thing about ML! The answer is literally an API call away, and any developer can start using these services in minutes. Many AWS customers have deployed these services in production in a couple of days, if not hours. They're that simple, and they provide the out-of-the-box security and robustness that is commonly associated with AWS infrastructure.
Authors Mona and Prem have worked with diverse AWS customers for years, and they've distilled this experience in their book, the first of its kind. Not only will you learn how AWS APIs work, but also, and most importantly, how to combine them to implement powerful NLP workflows, such as automating document processing, understanding the voice of your customers, or building a solution to monetize content. I highly recommend this book to every developer interested in adding NLP capabilities to their applications with just a few lines of code. So, turn the page, start learning, and build great apps!
Julien Simon
Global AI and ML Evangelist, AWS
Mona M is an AI/ML customer engineer at Google. She is a highly skilled IT professional, with more than 10 years' experience in software design, development, and integration across diverse work environments. As an AWS solutions architect, her role is to ensure customer success in building applications and services on the AWS platform. She is responsible for crafting a highly scalable, flexible, and resilient cloud architecture that addresses customer business problems. She has published multiple blogs on AI and NLP on the AWS AI channel along with research papers on AI-powered search solutions.
Premkumar Rangarajan is an enterprise solutions architect, specializing in AI/ML at Amazon Web Services. He has 25 years of experience in the IT industry in a variety of roles, including delivery lead, integration specialist, and enterprise architect. He has significant architecture and management experience in delivering large-scale programs across various industries and platforms. He is passionate about helping customers solve ML and AI problems.
Hitesh Hinduja is an ardent AI enthusiast working as a Senior Manager in AI at Ola Electric, where he leads a team of 20+ people in the areas of ML, statistics, CV, NLP, and reinforcement learning. He has filed 14+ patents in India and the US and has numerous research publications to his name. Hitesh has been associated in research roles at India's top B-schools: the Indian School of Business, Hyderabad, and the Indian Institute of Management, Ahmedabad. He is also actively involved in training and mentoring and has been invited to be a guest speaker by various corporations and associations across the globe.
Egor Pushkin is a technical leader responsible for natural language processing and understanding efforts within the AWS Languages organization. His specialty is the design of highly scalable and reliable services backed by ML/NLP tech. Before joining AWS, he focused on location-sharing technology and built systems deployed to over a billion devices worldwide. Prior to his years in the industry, he pursued an academic career, studying the processing of multispectral satellite images with the use of neural networks.
In this section, we introduce the NLP construct and the business value of using NLP, leading to an overview of the AWS AI stack, along with the key NLP services.
This section comprises the following chapters:
Chapter 1, NLP in the Business Context and Introduction to AWS AI Services

Chapter 2, Introducing Amazon Textract

Chapter 3, Introducing Amazon Comprehend

Natural language processing, or NLP, is quite popular in the scientific community, but the value of using this Artificial Intelligence (AI) technique to gain business benefits is not immediately obvious to mainstream users. Our focus will be to raise awareness and educate you on the business context of NLP, provide examples of the proliferation of data in unstructured text, and show how NLP can help derive meaningful insights to inform strategic decisions within an enterprise.
In this introductory chapter, we will be establishing the basic context to familiarize you with some of the underlying concepts of AI and Machine Learning (ML), the types of challenges that NLP can help solve, common pitfalls when building NLP solutions, and how NLP works and what it's really good at doing, with examples.
In this chapter, we will cover the following:
Introducing NLP

Overcoming the challenges in building NLP solutions

Understanding why NLP is becoming mainstream

Introducing the AWS ML stack

Language is as old as civilization itself, and no other communication tool is as effective as the spoken or written word. In their childhood days, the authors were enamored with The Arabian Nights, a centuries-old collection of stories from India, Persia, and Arabia. In one famous story, Ali Baba and the Forty Thieves, Ali Baba is a poor man who discovers a thieves' den containing hoards of treasure hidden in a cave that can only be opened by saying the magic words open sesame. In the authors' experience, this was the first recollection of a voice-activated application. Though purely a work of fiction, it was indeed an inspiration to explore the art of the possible.
In the last two decades, the popularity of the internet and the proliferation of smart devices have fueled significant technological advancements in digital communications. In parallel, the long-running research effort to develop AI made rapid strides with the advent of ML. Arthur Lee Samuel coined the term machine learning in 1959 and helped make it mainstream in the field of computer science by creating a checkers-playing program that demonstrated how computers can be taught.
The concept that machines can be taught to mimic human cognition, though, was popularized a little earlier, in 1950, by Alan Turing in his paper Computing Machinery and Intelligence. This paper introduced the Turing Test, a variation of a common party game of the time. The purpose of the test was for an interrogator to ask questions and compare responses from a human participant and a computer. The trick was that the interrogator was not aware which was which, since all three were isolated in different rooms. If the interrogator was unable to differentiate the two participants because their responses matched closely, the Turing Test had successfully validated that the computer possessed AI.
Of course, the field of AI has progressed leaps and bounds since then, largely due to the success of ML algorithms in solving real-world problems. An algorithm, at its simplest, is a programmatic function that converts inputs to outputs based on conditions. In contrast to regular programmed algorithms, ML algorithms can alter their processing based on the data they encounter. There are different ML algorithms to choose from based on requirements, for example, Extreme Gradient Boosting (XGBoost), a popular algorithm for regression and classification problems, Exponential Smoothing (ETS), for statistical time series forecasting, Single Shot MultiBox Detector (SSD), for computer vision problems, and Latent Dirichlet Allocation (LDA), for topic modeling in NLP problems.
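None of these libraries is needed to see the core difference just described. The following pure-Python sketch (our illustration, not from the book's code samples) contrasts a hand-coded rule, whose threshold is fixed, with a minimal "learning" step that derives its threshold from training data:

```python
# Illustrative only: a fixed, hand-coded rule versus a rule whose threshold
# is "learned" from data (real problems use libraries such as XGBoost).

def fixed_rule(x):
    """A regular programmed algorithm: the threshold is hard-coded."""
    return 1 if x > 10 else 0

def learn_threshold(samples):
    """A minimal learning step: put the threshold midway between the mean
    of the positive examples and the mean of the negative examples."""
    pos = [x for x, label in samples if label == 1]
    neg = [x for x, label in samples if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

# "Training data": (value, label) pairs the algorithm adapts to
training = [(1.0, 0), (2.0, 0), (3.0, 0), (40.0, 1), (50.0, 1), (60.0, 1)]
threshold = learn_threshold(training)  # 26.0, derived from the data

def learned_rule(x):
    return 1 if x > threshold else 0
```

Feed the learner different data and the threshold moves with it; the fixed rule, by contrast, never changes.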
For more complex problems, ML has evolved into deep learning with the introduction of Artificial Neural Networks (ANNs), which have the ability to solve highly challenging tasks by learning from massive volumes of data. For example, AWS DeepComposer (https://aws.amazon.com/deepcomposer/), an ML service from Amazon Web Services (AWS), educates developers with music as a medium of instruction. One of the ML models that DeepComposer uses is trained with a type of neural network called the Convolutional Neural Network (CNN) to create new and unique musical compositions from a simple input melody using AutoRegressive (AR) techniques:
Figure 1.1 – Composing music with AWS DeepComposer and ML
A piano roll is an image representation of music, and AR-CNN considers music generation as a sequence of these piano roll images:
Figure 1.2 – Piano roll representation of music
The democratization of advanced technologies, the potential to solve many types of problems, and the breadth and depth of capabilities in AWS have spurred broad adoption of ML across organizations of all sizes and industries. Yet ML is only a subset of what is possible today with AI. According to one report (https://www.gartner.com/en/newsroom/press-releases/2019-01-21-gartner-survey-shows-37-percent-of-organizations-have, accessed on March 23, 2021), AI adoption grew by 270% between 2015 and 2019, and it is continuing to grow at a rapid pace. AI is no longer a peripheral technology available only to those enterprises with the economic resources to afford high-performance computers. Today, AI is a mainstream option for organizations looking to add cognitive intelligence to their applications to accelerate business value. For example, ExxonMobil, in partnership with Amazon, created an innovative and efficient way for customers to pay at gas stations. The Alexa pay for gas skill uses the car's Alexa-enabled device or your smartphone's Alexa app to communicate with the gas pump to manage the payment. The authors paid a visit to a local ExxonMobil gas station to try it out, and it was an awesome experience. For more details, please refer to https://www.exxon.com/en/amazon-alexa-pay-for-gas.
AI addresses a broad spectrum of tasks similar to human intelligence, both sensory and cognitive. Typically, these are grouped into categories, for example, computer vision (mimics human vision), NLP (mimics human speech, writing, and auditory processes), conversational interfaces (such as chatbots, which mimic dialogue-based interactions), and personalization (mimics human intuition). For example, C-SPAN, a broadcaster that reports on proceedings at the US Senate and the House of Representatives, uses Amazon Rekognition (a computer vision-based image and video analysis service) to identify who is speaking or on camera at any given time. With Amazon Rekognition, C-SPAN was able to index twice as much content as it could previously. In addition, AWS offers AI services for intelligent search, forecasting, fraud detection, anomaly detection, predictive maintenance, and much more, which is why AWS was named the leader in the first Gartner Magic Quadrant for Cloud AI.
While language is inherently structured and well defined, the usage and interpretation of language are subjective and can introduce unintended shifts in meaning that you need to be cognizant of when building natural language solutions. Consider, for example, the Telephone Game, which shows how conversations are involuntarily embellished, resulting in an entirely different version compared to how they began. Each participant repeats exactly what they think they heard, not what was actually said. It is fun when played as a party game but may have more serious repercussions in real life. Computers, too, will repeat what they heard, based on how their underlying ML model interprets language.
To understand how small incremental changes can completely change the meaning, let's look at another popular game: Word Ladder (https://en.wikipedia.org/wiki/Word_ladder). The objective is to convert one word into a different word, often one with the opposite meaning, in as few steps as possible with only one letter in the word changing in one step.
An example is illustrated in the following table:
Figure 1.3 – The Word Ladder game
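To make the one-letter-change rule concrete, here is a short, hypothetical Python sketch of Word Ladder as a breadth-first search over a toy dictionary (the classic COLD-to-WARM ladder; the word list is our own, not from the book):

```python
# A hypothetical Word Ladder solver: breadth-first search over words that
# differ by exactly one letter. The tiny dictionary is illustrative.
from collections import deque

def one_letter_apart(a, b):
    """True when a and b have the same length and differ in exactly one spot."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def word_ladder(start, goal, words):
    """Return the shortest chain of one-letter changes from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for word in words:
            if word not in seen and one_letter_apart(path[-1], word):
                seen.add(word)
                queue.append(path + [word])
    return []  # no ladder exists

print(word_ladder("cold", "warm", {"cord", "card", "ward", "warm"}))
# -> ['cold', 'cord', 'card', 'ward', 'warm']
```

Each step changes a single letter, yet four steps turn "cold" into its opposite, which is exactly the point about small incremental changes.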
Adapting AI to work with natural language resulted in a group of capabilities that primarily deal with computational emulation of cognitive and sensory processes associated with human speech and text. There are two main categories that applications can be grouped into:
Natural Language Understanding (NLU), for voice-based applications such as Amazon Alexa, and speech-to-text/text-to-speech conversions

NLP, for the interpretation of context-based insights from text

With NLU, applications that hitherto needed multiple and sometimes cumbersome interfaces, such as a screen, keyboard, and mouse, to enable computer-to-human interactions can work just as efficiently with voice alone.
In Stanley Kubrick's 1968 movie 2001: A Space Odyssey (spoiler alert!), an artificially intelligent computer known as the HAL 9000 uses vision and voice to interact with the humans on board, and in the course of the movie, develops a personality, refuses to accept that it can be in error, and attempts to kill the humans when it discovers their plot to shut it down. Fast forward to now, 20 years after the future depicted in the movie, and we have made significant progress in language understanding and processing, but thankfully not to the extreme extent dramatized in the movie's plot.
Now that we have a good understanding of the context in which NLP has developed and how it can be used, let's try examining some of the common challenges you might face while developing NLP solutions.
We read earlier that the main difference between the algorithms used for regular programming and those used for ML is the ability of ML algorithms to modify their processing based on the input data fed to them. In the NLP context, as in other areas of ML, these differences add significant value and accelerate enterprise business outcomes. Consider, for example, a book publishing organization that needs to create an intelligent search capability displaying book recommendations to users based on topics of interest they enter.
In a traditional world, you would need multiple teams to go through the entire book collection, read books individually, identify keywords, phrases, topics, and other relevant information, create an index to associate book titles, authors, and genres to these keywords, and link this with the search capability. This is a massive effort that takes months or years to set up based on the size of the collection, the number of people, and their skill levels, and the accuracy of the index is prone to human error. As books are updated to newer editions, and new books are added or removed, this effort would have to be repeated incrementally. This is also a significant cost and time investment that may deter many unless that time and those resources have already been budgeted for.
To bring in a semblance of automation in our previous example, we need the ability to digitize text from documents. However, this is not the only requirement, as we are interested in deriving context-based insights from the books to power a recommendations index for a reader. And if we are talking about, for example, a publishing house such as Packt, with 7,500+ books in its collection, we need a solution that not only scales to process large numbers of pages, but also understands relationships in text, and provides interpretations based on semantics, grammar, word tokenization, and language to create smart indexes. We will cover a detailed walkthrough of this solution, along with code samples and demo videos, in Chapter 5, Creating NLP Search.
Today's enterprises are grappling with leveraging meaningful insights from their data primarily due to the pace at which it is growing. Until a decade or so, most organizations used relational databases for all their data management needs, and some still do even today. This was fine because the data volume need was in single-digit terabytes or less. In the last few years, the technology landscape has witnessed a significant upheaval with smartphones becoming ubiquitous, the large-scale proliferation of connected devices (in the billions), the ability to dynamically scale infrastructure in size and into new geographies, and storage and compute costs becoming cheaper due to the democratization offered by the cloud. All of this means applications get used more often, have much larger user bases, more processing power, and capabilities, can accelerate their pace of innovation with faster go-to-market cycles, and as a result, have a need to store and manage petabytes of data. This, coupled with application users demanding faster response times and higher throughput, has put a strain on the performance of relational databases, fueling a move toward purpose-built databases such as Amazon DynamoDB, a key-value and document database that delivers single-digit millisecond latency at any scale.
While this move signals a positive trend, what is more interesting is how enterprises utilize this data to gain strategic insights. After all, data is only as useful as the information we can glean from it. We see many organizations, while accepting the benefits of purpose-built tools, implementing these changes in silos. So, there are varying levels of maturity in properly harnessing the advantages of data. Some departments use an S3 data lake (https://aws.amazon.com/products/storage/data-lake-storage/) to source data from disparate sources and run ML to derive context-based insights, others are consolidating their data in purpose-built databases, while the rest are still using relational databases for all their needs.
You can see a basic explanation of the main components of a data lake in the following Figure 1.4, An example of an Amazon S3 data lake:
Figure 1.4 – An example of an Amazon S3 data lake
Let's see how NLP can continue to add business value in this situation by referring back to our book publishing example. Suppose we successfully built our smart indexing solution, and now we need to update it with book reviews received via Twitter feeds. The searchable index should provide book recommendations based on review sentiment (for example, don't recommend a book if reviews are negative > 50% in the last 3 months). Traditionally, business insights are generated by running a suite of reports on behemoth data warehouses that collect, mine, and organize data into marts and dimensions. A tweet may not even be under consideration as a data source. These days, things have changed and mining social media data is an important aspect of generating insights. Setting up business rules to examine every tweet is a time-consuming and compute-intensive task. Furthermore, since a tweet is unstructured text, a slight change in semantics may impact the effectiveness of the solution.
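As a hypothetical illustration of the review-sentiment rule just mentioned, the sketch below applies the "more than 50% negative in the last 3 months" threshold to a list of dated sentiment labels. In practice, the labels would come from a sentiment analysis service such as Amazon Comprehend; here they are hardcoded:

```python
# Illustrative rule: drop a book from recommendations when more than 50% of
# its reviews in the last ~3 months (90 days) are negative. The sentiment
# labels and dates below are made up for the example.
from datetime import date, timedelta

def recommendable(reviews, today, window_days=90, max_negative_share=0.5):
    """reviews is a list of (review_date, sentiment) pairs, where sentiment
    is one of 'POSITIVE', 'NEGATIVE', 'NEUTRAL', or 'MIXED'."""
    cutoff = today - timedelta(days=window_days)
    recent = [sentiment for review_date, sentiment in reviews
              if review_date >= cutoff]
    if not recent:
        return True  # no recent signal: keep the book in the index
    negative_share = sum(s == "NEGATIVE" for s in recent) / len(recent)
    return negative_share <= max_negative_share

reviews = [
    (date(2021, 9, 1), "NEGATIVE"),
    (date(2021, 9, 15), "NEGATIVE"),
    (date(2021, 10, 1), "POSITIVE"),
]
print(recommendable(reviews, today=date(2021, 10, 2)))  # -> False (2/3 negative)
```

The rule itself is trivial; the hard part, as the text notes, is turning unstructured tweets into reliable sentiment labels in the first place.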
Now, if you consider model training, the infrastructure required to build accurate NLP models typically uses the deep learning architecture called Transformers (please see https://www.packtpub.com/product/transformers-for-natural-language-processing/9781800565791) that use sequence-to-sequence processing without needing to process the tokens in order, resulting in a higher degree of parallelization. Transformer model families use billions of parameters with the training architecture using clusters of instances for distributed learning, which adds to time and costs.
AWS offers AI services that allow you, with just a few lines of code, to add NLP to your applications for the sentiment analysis of unstructured text at an almost limitless scale and immediately take advantage of the immense potential waiting to be discovered in unstructured text. We will cover AWS AI services in more detail from Chapter 2, Introducing Amazon Textract, onward.
In this section, we reviewed some challenges organizations encounter when building NLP solutions, such as the complexity of digitizing paper-based text, the difficulty of understanding patterns across structured and unstructured data, and how resource-intensive these solutions can be. Let's now understand why NLP is an important mainstream technology for enterprises today.
According to this report (https://www.marketsandmarkets.com/Market-Reports/natural-language-processing-nlp-825.html, accessed on March 23, 2021), the global NLP market is expected to grow to USD 35.1 billion by 2026, at a Compound Annual Growth Rate (CAGR) of 20.3% during the forecast period. This is not surprising considering the impact ML is making across every industry (such as finance, retail, manufacturing, energy, utilities, real estate, healthcare, and so on) in organizations of every size, primarily driven by the advent of cloud computing and the economies of scale available.
This article about the Emergence Cycle (https://blogs.gartner.com/anthony_bradley/2020/10/07/announcing-gartners-new-emergence-cycle-research-for-ai/), research into emerging technologies in NLP (based on patents submitted, and looking at technology still in labs or recently released), shows that the most mature usage of NLP is multimedia content analysis. This trend rings true in our experience: based on our discussions with a number of organizations across industries, content analysis to gain strategic insights is a common NLP requirement:
Figure 1.5 – Gartner's NLP Emergence Cycle 2020
For example, in 2020, when the world was struggling with the effects of the pandemic, a number of organizations adopted AI and specifically NLP to power predictions on the virus spread patterns, assimilate knowledge on virus behavior and vaccine research, and monitor the effectiveness of safety measures, to name a few. In April 2020, AWS launched an NLP-powered search site called https://cord19.aws/ using an AWS AI service called Amazon Kendra (https://aws.amazon.com/kendra/). The site provides an easy interface to search the COVID-19 Open Research Dataset using natural language questions. As the dataset is constantly updated based on the latest research on COVID-19, CORD-19 Search, due to its support for NLP, makes it easy to navigate this ever-expanding collection of research documents and find precise answers to questions. The search results provide not only specific text that contains the answer to the question but also the original body of text in which these answers are located:
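As a rough sketch of what such a natural language query looks like programmatically, the snippet below uses the boto3 Kendra client. This is our illustration, not the CORD-19 Search implementation; the index ID is a placeholder, and the excerpts helper simply reshapes the response:

```python
# Our illustration of a natural language query against an Amazon Kendra index;
# the index ID below is a placeholder, not a real resource.

def excerpts(result_items):
    """Reshape Kendra ResultItems into (result type, excerpt text) pairs."""
    return [
        (item["Type"], item["DocumentExcerpt"]["Text"])
        for item in result_items
    ]

def ask(index_id, question, region="us-east-1"):
    import boto3  # imported lazily; actually running this needs AWS credentials
    kendra = boto3.client("kendra", region_name=region)
    response = kendra.query(IndexId=index_id, QueryText=question)
    return excerpts(response["ResultItems"])

# Example (requires a live index):
# ask("11111111-2222-3333-4444-555555555555",
#     "What is the incubation period of the virus?")
```

As with CORD-19 Search, each result carries both the answer excerpt and a pointer to the document it came from.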
Figure 1.6 – CORD-19 search results
Fred Hutchinson Cancer Research Center is a research institute focused on curing cancer by 2025. Matthew Trunnell, Chief Information Officer of Fred Hutchinson Cancer Research Center, has said the following:
"The process of developing clinical trials and connecting them with the right patients requires research teams to sift through and label mountains of unstructured clinical record data. Amazon Comprehend Medical will reduce this time burden from hours to seconds. This is a vital step toward getting researchers rapid access to the information they need when they need it so they can find actionable insights to advance lifesaving therapies for patients."
For more details and usage examples of Amazon Comprehend and Amazon Comprehend Medical, please refer to Chapter 3, Introducing Amazon Comprehend.
So, how can AI and NLP help us cure cancer or prepare for a pandemic? It's about recognizing patterns where none seem to exist. Unstructured text, such as documents, social media posts, and email messages, is similar to the treasure waiting in Ali Baba's cave. To understand why, let's briefly look at how NLP works.
NLP models train by learning what are called word embeddings, which are vector representations of words in large collections of documents. These embeddings capture semantic relationships and word distributions in documents, thereby helping to map the context of a word based on its relationship to other words in the document. The two common training architectures for learning word embeddings are Skip-gram and Continuous Bag of Words (CBOW). In Skip-gram, the embeddings of the input word are used to derive the distribution of the related words to predict the context, and in CBOW, the embeddings of the related words are used to predict the word in the middle. Both are neural network-based architectures and work well for context-based analytics use cases.
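The underlying intuition, that words appearing in similar contexts get similar vectors, can be sketched without any neural network. The toy example below is our own illustration (real Skip-gram and CBOW embeddings are learned neurally, for example with gensim's Word2Vec, where sg=1 selects Skip-gram and sg=0 selects CBOW); it builds count-based context vectors from a three-sentence corpus and compares them with cosine similarity:

```python
# Toy illustration of "similar context => similar vector": count the words
# seen within a window of each word, then compare the count vectors.
from collections import defaultdict
from math import sqrt

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "stocks fell on the news",
]

def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within +/- window of it."""
    vectors = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vectors = context_vectors(corpus)
# "cat" and "dog" share their contexts ("the ... sat on"); "stocks" does not,
# so cosine(cat, dog) comes out higher than cosine(cat, stocks).
```

Neural embeddings refine this idea by learning dense, low-dimensional vectors instead of raw counts, but the "context determines meaning" principle is the same.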
Now that we understand the basics of NLP (analyzing patterns in text by converting words to their vector representations), we can see why training models on text from disparate data sources often yields unique insights: because we are using numbers to find relationships in text, patterns emerge that were previously hidden when viewed within a narrower context. For example, The Rubber Episode in the Amazon Prime TV show This Giant Beast That Is The Global Economy shows how a fungal disease has the potential to devastate the global economy, even though at first there might appear to be no link between the two. According to the US National Library of Medicine, natural rubber accounts for 40% of the world's rubber consumption, and the South American Leaf Blight (SALB) fungal disease has the potential to spread worldwide and severely inhibit rubber production. Airplanes can't land without rubber, and its uses are so myriad that a shortage would have unprecedented implications for the economy. This is exactly the kind of hidden pattern that ML and NLP models are so good at finding across vast text corpora.
Before AWS and cloud computing revolutionized access to advanced technologies, setting up NLP models for text analytics was challenging to say the least. The most common reasons were as follows:
- Lack of skills: Expertise in identifying data, feature engineering, building models, training, and tuning are all tasks that require a unique combination of skills, including software engineering, mathematics, statistics, and data engineering, that only a few practitioners have.

- Initial infrastructure setup cost: ML training is an iterative process, often requiring a trial-and-error approach to tune the models to get the desired accuracy. Furthermore, training and inference may require GPU acceleration based on the volume of data and the number of requests, requiring a high initial investment.

- Scalability with the current on-premises environment: Running ML training and inference on on-premises servers constrains the elasticity required to scale compute and storage based on model size, data volumes, and the inference throughput needed. For example, training large-scale transformer models may require massively parallel clusters, and capacity planning for such scenarios is challenging.

- Availability of tools to orchestrate the various moving parts of NLP training: As mentioned before, the ML workflow comprises many tasks, such as data discovery, feature engineering, algorithm selection, and model building, which includes training and fine-tuning the models several times, before finally deploying those models into production. Furthermore, getting an accurate model is a highly iterative process. Each of these tasks requires purpose-built tools and expertise to achieve the level of efficiency needed for good models, which is not easy.

Not anymore. The AWS AI services for natural language enable you to add speech and text intelligence to applications using API calls rather than developing and training models yourself. These services provide the ability to convert speech to text with Amazon Transcribe (https://aws.amazon.com/transcribe/) or text to speech with Amazon Polly (https://aws.amazon.com/polly/).
For NLP requirements, Amazon Textract (https://aws.amazon.com/textract/) enables applications to read and process handwritten and printed text from images and PDF documents, and with Amazon Comprehend (https://aws.amazon.com/comprehend/), applications can quickly analyze text and find insights and relationships with no prior ML training. For example, Assent, a supply chain data management company, used Amazon Textract to read forms, tables, and free-form text, and Amazon Comprehend to derive business-specific entities and values from the text. In this book, we will be walking you through how to use these services for some popular workflows. For more details, please refer to Chapter 4, Automating Document Processing Workflows.
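To preview what a "no prior ML training" integration looks like, the sketch below works with the response shape returned by Amazon Comprehend's DetectEntities API (a dictionary with an Entities list of Text/Type/Score items). The sample response values and the 0.9 confidence threshold are assumptions for illustration; in a real application the response would come from a call such as boto3.client("comprehend").detect_entities(Text=document_text, LanguageCode="en"):

```python
# Sketch: filtering Amazon Comprehend entity detections by confidence score.
# `sample_response` mimics the DetectEntities response shape; in practice
# you would obtain it via:
#     boto3.client("comprehend").detect_entities(Text=..., LanguageCode="en")

def high_confidence_entities(response, threshold=0.9):
    """Keep only entities detected with a confidence score at or above threshold."""
    return [(entity["Text"], entity["Type"])
            for entity in response["Entities"]
            if entity["Score"] >= threshold]

sample_response = {
    "Entities": [
        {"Text": "Assent", "Type": "ORGANIZATION", "Score": 0.98,
         "BeginOffset": 0, "EndOffset": 6},
        {"Text": "Ottawa", "Type": "LOCATION", "Score": 0.55,
         "BeginOffset": 20, "EndOffset": 26},
    ]
}

print(high_confidence_entities(sample_response))
# [('Assent', 'ORGANIZATION')]
```

The same pattern applies to Comprehend's other detection APIs (sentiment, key phrases, PII): the service returns scored results, and your application decides what confidence level is acceptable for its use case.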
In this section, we saw some examples of NLP's significance in solving real-world challenges and clarified what exactly it means. We learned that finding patterns in data can bring new meaning to light, and that NLP models are very good at deriving these patterns. We then reviewed some technology challenges in NLP implementations and took a brief look at the AWS AI services. In the next section, we will introduce the AWS ML stack and provide a brief overview of each of its layers.
The AWS ML services and features are organized into three layers of the stack, keeping in mind that some developers and data scientists are expert ML practitioners who are comfortable working with ML frameworks, algorithms, and infrastructure to build, train, and deploy models.
For these experts, the bottom layer of the AWS ML stack offers powerful CPU and GPU compute instances (the Amazon EC2 P4 instances, https://aws.amazon.com/ec2/instance-types/p4/, offer the highest performance for ML training in the cloud today) and support for major ML frameworks, including TensorFlow, PyTorch, and MXNet, which customers can use to build models with Amazon SageMaker as a managed experience, or using deep learning AMIs and containers on Amazon EC2 instances.
You can see the three layers of the AWS ML stack in the next figure. For more details, please refer to https://aws.amazon.com/machine-learning/infrastructure/:
To make ML more accessible and expansive, at the middle layer of the stack, Amazon SageMaker is a fully managed ML platform that removes the undifferentiated heavy lifting at each step of the ML process. Launched in 2018, SageMaker is one of the fastest-growing services in AWS history and is built on Amazon's two decades of experience in building real-world ML applications. With SageMaker Studio, developers and data scientists have the first fully integrated development environment designed specifically for ML. To learn how to build ML models using Amazon SageMaker, refer to Julien Simon's book, Learn Amazon SageMaker, also published by Packt (https://www.packtpub.com/product/learn-amazon-sagemaker/9781800208919):
Figure 1.7 – A tabular list of Amazon SageMaker features for each step of the ML workflow
For customers who are not interested in dealing with models and training, at the top layer of the stack, the AWS AI services provide pre-trained models with easy integration by means of API endpoints for common ML use cases including speech, text, vision, recommendations, and anomaly detection:
Figure 1.8 – AWS AI services
Alright, it's time we started getting technical. Now that we understand how cloud computing played a major role in bringing ML and AI to the mainstream, and how adding NLP to your applications can accelerate business outcomes, let's dive deep into the NLP services: Amazon Textract for document analysis and Amazon Comprehend for advanced text analytics.
Ready? Let's go!!
In this chapter, we introduced NLP by tracing the origins of AI, how it evolved over the last few decades, and how the application of AI became mainstream with the significant advances made with ML algorithms. We reviewed some examples of these algorithms, along with an example of how they can be used. We then pivoted to AI trends and saw how AI adoption grew exponentially over the last few years and has become a key technology in accelerating enterprise business value.
We read a cool example of how ExxonMobil uses Alexa at their gas stations, and delved into how AI was created to mimic human cognition and the broad categories of its applicability, such as text, speech, and vision. We saw how AI in natural language has two main areas of usage: NLU for voice-based use cases and NLP for deriving insights from text.
In analyzing how enterprises build NLP models today, we reviewed some of the common challenges, such as digitizing paper-based text, collecting data from disparate sources, and understanding patterns in data, saw how resource-intensive these solutions can be, and learned how to mitigate those challenges.
