Natural Language Processing with TensorFlow - Thushan Ganegedara - E-Book

Natural Language Processing with TensorFlow E-Book

Thushan Ganegedara

Description

Natural language processing (NLP) supplies the majority of data available to deep learning applications, while TensorFlow is the most important deep learning framework currently available. Natural Language Processing with TensorFlow brings TensorFlow and NLP together to give you invaluable tools to work with the immense volume of unstructured data in today’s data streams, and apply these tools to specific NLP tasks.
Thushan Ganegedara starts by giving you a grounding in NLP and TensorFlow basics. You'll then learn how to use Word2vec, including advanced extensions, to create word embeddings that turn sequences of words into vectors accessible to deep learning algorithms. Chapters on classical deep learning algorithms, like convolutional neural networks (CNN) and recurrent neural networks (RNN), demonstrate important NLP tasks such as sentence classification and language generation. You will learn how to apply high-performance RNN models, like long short-term memory (LSTM) cells, to NLP tasks. You will also explore neural machine translation and implement a neural machine translator.
After reading this book, you will have an understanding of NLP, the skills to apply TensorFlow in deep learning NLP applications, and the ability to perform specific NLP tasks.

Page count: 580

Year of publication: 2018




Natural Language Processing with TensorFlow


Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Acquisition Editor: Frank Pohlmann

Project Editor: Radhika Atitkar

Content Development Editor: Chris Nelson

Technical Editor: Bhagyashree Rai

Copy Editor: Tom Jacob

Proofreader: Safis Editing

Indexer: Rekha Nair

Graphics: Tom Scaria

Production Coordinator: Nilesh Mohite

First published: May 2018

Production reference: 2310518

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78847-831-1

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Learn better with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Thushan Ganegedara is currently a third year Ph.D. student at the University of Sydney, Australia. He is specializing in machine learning and has a liking for deep learning. He lives dangerously and runs algorithms on untested data. He also works as the chief data scientist for AssessThreat, an Australian start-up. He got his BSc. (Hons) from the University of Moratuwa, Sri Lanka. He frequently writes technical articles and tutorials about machine learning. Additionally, he also strives for a healthy lifestyle by including swimming in his daily schedule.

I would like to thank my parents, my siblings, and my wife for the faith they had in me and the support they have given, also all my teachers and my Ph.D advisor for the guidance he provided me with.

About the reviewers

Motaz Saad holds a Ph.D. in computer science from the University of Lorraine. He loves data and he likes to play with it. He has over 10 years of professional experience in NLP, computational linguistics, data science, and machine learning. He currently works as an assistant professor at the faculty of information technology, IUG.

Dr Joseph O'Connor is a data scientist with a deep passion for deep learning. His company, Deep Learn Analytics, a UK-based data science consultancy, works with businesses to develop machine learning applications and infrastructure from concept to deployment. He was awarded a Ph.D. from University College London for his work analyzing data on the MINOS high-energy physics experiment. Since then, he has developed ML products for a number of companies in the private sector, specializing in NLP and time series forecasting. You can find him at http://deeplearnanalytics.com/.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Preface

In the digital information age that we live in, the amount of data has grown exponentially, and it is growing at an unprecedented rate as we read this. Most of this data is language-related data (textual or verbal), such as emails, social media posts, phone calls, and web articles. Natural Language Processing (NLP) leverages this data efficiently to help humans in their businesses or day-to-day tasks. NLP has already revolutionized the way we use data to improve both businesses and our lives, and will continue to do so in the future.

One of the most ubiquitous use cases of NLP is Virtual Assistants (VAs), such as Apple's Siri, Google Assistant, and Amazon Alexa. Whenever you ask your VA for "the cheapest rates for hotels in Switzerland," a complex series of NLP tasks is triggered. First, your VA needs to understand (parse) your request (for example, learn that it needs to retrieve hotel rates, not dog parks). Another decision the VA needs to make is "what is cheap?". Next, the VA needs to rank the cities in Switzerland (perhaps based on your past traveling history). Then, the VA might crawl websites such as Booking.com and Agoda.com to fetch the hotel rates in Switzerland and rank them by analyzing both the rates and reviews for each hotel. As you can see, the results that appear within a few seconds are the product of a very intricate series of complex NLP tasks.

So, what makes such NLP tasks so versatile and accurate for our everyday tasks? The underpinning elements are "deep learning" algorithms. Deep learning algorithms are essentially complex neural networks that can map raw data to a desired output without requiring any sort of task-specific feature engineering. This means that you can provide a hotel review of a customer and the algorithm can answer the question "How positive is the customer about this hotel?", directly. Also, deep learning has already reached, and even exceeded, human-level performance in a variety of NLP tasks (for example, speech recognition and machine translation).

By reading this book, you will learn how to solve many interesting NLP problems using deep learning. These problems range from learning the semantics of words, to generating fresh new stories, to performing language translation just by looking at bilingual sentence pairs. So, if you want to be an influencer who changes the world, studying NLP is critical. All of the technical chapters are accompanied by exercises, including step-by-step guidance for readers to implement these systems. For all of the exercises in the book, we will be using Python with TensorFlow, a popular distributed computation library that makes implementing deep neural networks very convenient.

Who this book is for

This book is for aspiring beginners who are seeking to transform the world by leveraging linguistic data. This book will provide you with a solid practical foundation for solving NLP tasks. In this book, we will cover various aspects of NLP, focusing more on the practical implementation than the theoretical foundation. Having sound practical knowledge of solving various NLP tasks will help you to have a smoother transition when learning the more advanced theoretical aspects of these methods. In addition, a solid practical understanding will help when performing more domain-specific tuning of your algorithms, to get the most out of a particular domain.

What this book covers

Chapter 1, Introduction to Natural Language Processing, embarks us on our journey with a gentle introduction to NLP. In this chapter, we will first look at the reasons we need NLP. Next, we will discuss some of the common subtasks found in NLP. Thereafter, we will discuss the two main eras of NLP—the traditional era and the deep learning era. We will gain an understanding of the characteristics of the traditional era by working through how a language modeling task might have been solved with traditional algorithms. Then, we will discuss the deep learning era, where deep learning algorithms are heavily utilized for NLP. We will also discuss the main families of deep learning algorithms. We will then discuss the fundamentals of one of the most basic deep learning algorithms—a fully connected neural network. We will conclude the chapter with a road map that provides a brief introduction to the coming chapters.

Chapter 2, Understanding TensorFlow, introduces you to the Python TensorFlow library—the primary platform we will implement our solutions on. We will start by writing code to perform a simple calculation in TensorFlow. We will then discuss how things are executed, starting from running the code to getting results. Thereby, we will understand the underlying components of TensorFlow in detail. We will further strengthen our understanding of TensorFlow with a colorful analogy of a restaurant and see how orders are fulfilled. Later, we will discuss more technical details of TensorFlow, such as the data structures and operations (mostly related to neural networks) defined in TensorFlow. Finally, we will implement a fully connected neural network to recognize handwritten digits. This will help us to understand how an end-to-end solution might be implemented with TensorFlow.

Chapter 3, Word2vec – Learning Word Embeddings, begins by discussing how to solve NLP tasks with TensorFlow. In this chapter, we will see how neural networks can be used to learn word vectors or word representations. Word vectors are also known as word embeddings. Word vectors are numerical representations of words that have similar values for similar words and different values for different words. First, we will discuss several traditional approaches to achieving this, which include using a large human-built knowledge base known as WordNet. Then, we will discuss the modern neural network-based approach known as Word2vec, which learns word vectors without any human intervention. We will first understand the mechanics of Word2vec by working through a hands-on example. Then, we will discuss two algorithmic variants for achieving this—the skip-gram and continuous bag-of-words (CBOW) model. We will discuss the conceptual details of the algorithms, as well as how to implement them in TensorFlow.

Chapter 4, Advanced Word2vec, takes us on to more advanced topics related to word vectors. First, we will compare skip-gram and CBOW to see whether a winner exists. Next, we will discuss several extensions that can be used to improve the performance of the Word2vec algorithms. Then, we will discuss a more recent and powerful word embedding learning algorithm, the GloVe (global vectors) algorithm. Finally, we will look at word vectors in action, in a document classification task. In that exercise, we will see that word vectors are powerful enough to represent the topic (for example, entertainment and sport) that the document belongs to.

Chapter 5, Sentence Classification with Convolutional Neural Networks, discusses convolutional neural networks (CNNs), a family of neural networks that excels at processing spatial data such as images or sentences. First, we will develop a solid high-level understanding of CNNs by discussing how they process data and what sort of operations are involved. Next, we will dive deep into each of the operations involved in the computations of a CNN to understand the underpinning mathematics of a CNN. Finally, we will walk through two exercises. First, we will classify handwritten digit images with a CNN. We will see that CNNs are capable of reaching a very high accuracy quickly for this task. Next, we will explore how CNNs can be used to classify sentences. Particularly, we will ask a CNN to predict whether a sentence is about an object, person, location, and so on.

Chapter 6, Recurrent Neural Networks, is about a powerful family of neural networks that can model sequences of data, known as recurrent neural networks (RNNs). We will first discuss the mathematics behind RNNs and the update rules that are used to update RNNs over time during learning. Then, we will discuss different variants of RNNs and their applications (for example, one-to-one RNNs and one-to-many RNNs). Next, we will go through an exercise where RNNs are used for a text generation task. In this, we will train the RNN on folk stories and ask the RNN to produce a new story. We will see that RNNs are poor at persisting long-term memory. Finally, we will discuss a more advanced variant of RNNs, which we will call RNN-CF, which is able to persist memory for longer.

Chapter 7, Long Short-Term Memory Networks, allows us to explore more powerful techniques that are able to remember for a longer period of time, having found out that RNNs are poor at retaining long-term memory. We will discuss one such technique in this chapter: Long Short-Term Memory Networks (LSTMs). LSTMs are more powerful and have been shown to outperform other sequential models in many time-series tasks. We will first investigate the underlying mathematics and the update rules of the LSTM, along with a colorful example that illustrates why each computation matters. Then, we will look at how LSTMs can persist memory for longer. Next, we will discuss how we can improve the prediction capabilities of LSTMs further. Finally, we will discuss several variants of LSTMs that have a more complex structure (LSTMs with peephole connections), as well as a method that tries to simplify LSTMs, namely gated recurrent units (GRUs).

Chapter 8, Applications of LSTM – Generating Text, extensively evaluates how LSTMs perform in a text generation task. We will qualitatively and quantitatively measure how good the text generated by LSTMs is. We will also conduct comparisons between LSTMs, LSTMs with peephole connections, and GRUs. Finally, we will see how we can bring word embeddings into the model to improve the text generated by LSTMs.

Chapter 9, Applications of LSTM – Image Caption Generation, moves us on to multimodal data (that is, images and text) after coping with textual data. In this chapter, we will investigate how we can automatically generate descriptions for a given image. This involves combining a feed-forward model (that is, a CNN) with a word embedding layer and a sequential model (that is, an LSTM) in a way that forms an end-to-end machine learning pipeline.

Chapter 10, Sequence to Sequence Learning – Neural Machine Translation, is about implementing a neural machine translation (NMT) model. Machine translation is where we translate a sentence/phrase from a source language into a target language. We will first briefly discuss what machine translation is. This will be followed by a section about the history of machine translation. Then, we will discuss the architecture of modern neural machine translation models in detail, including the training and inference procedures. Next, we will look at how to implement an NMT system from scratch. Finally, we will explore ways to improve standard NMT systems.

Chapter 11, Current Trends and Future of Natural Language Processing, the final chapter, focuses on the current and future trends of NLP. We will discuss the latest discoveries related to the systems and tasks we discussed in the previous chapters. This chapter will cover most of the exciting novel innovations, as well as giving you in-depth intuition to implement some of the technologies.

Appendix, Mathematical Foundations and Advanced TensorFlow, will introduce the reader to various mathematical data structures (for example, matrices) and operations (for example, matrix inverse). We will also discuss several important concepts in probability. We will then introduce Keras, a high-level library that uses TensorFlow underneath. Keras makes implementing neural networks simpler by hiding some of the details in TensorFlow, which some might find challenging. Concretely, we will see how we can implement a CNN with Keras, to get a feel for how to use Keras. Next, we will discuss how we can use the seq2seq library in TensorFlow to implement a neural machine translation system with much less code than we used in Chapter 10, Sequence to Sequence Learning – Neural Machine Translation. Finally, we will walk you through a guide aimed at teaching you to use TensorBoard to visualize word embeddings. TensorBoard is a handy visualization tool that is shipped with TensorFlow. It can be used to visualize and monitor various variables in your TensorFlow client.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected], and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book we would be grateful if you would report this to us. Please visit, http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Tasks of Natural Language Processing

NLP has a multitude of real-world applications. A good NLP system is one that performs many NLP tasks. When you search for today's weather on Google or use Google Translate to find out how to say, "How are you?" in French, you rely on a subset of such tasks in NLP. We will list some of the most ubiquitous tasks here, and this book covers most of these tasks:

Tokenization: Tokenization is the task of separating a text corpus into atomic units (for example, words). Although it may seem trivial, tokenization is an important task. For example, in the Japanese language, words are not delimited by spaces or punctuation marks.

Word-sense Disambiguation (WSD): WSD is the task of identifying the correct meaning of a word. For example, in the sentences "The dog barked at the mailman" and "Tree bark is sometimes used as a medicine", the word bark has two different meanings. WSD is critical for tasks such as question answering.

Named Entity Recognition (NER): NER attempts to extract entities (for example, person, location, and organization) from a given body of text or a text corpus. For example, the sentence "John gave Mary two apples at school on Monday" will be transformed to [John]name gave [Mary]name [two]number apples at [school]organization on [Monday]time. NER is an important topic in fields such as information retrieval and knowledge representation.

Part-of-Speech (PoS) tagging: PoS tagging is the task of assigning words to their respective parts of speech. The tags can either be basic, such as noun, verb, adjective, adverb, and preposition, or more granular, such as proper noun, common noun, phrasal verb, verb, and so on.

Sentence/Synopsis classification: Sentence or synopsis (for example, movie reviews) classification has many use cases such as spam detection, news article classification (for example, political, technology, and sport), and product review ratings (that is, positive or negative). This is achieved by training a classification model with labeled data (that is, reviews annotated by humans, with either a positive or negative label).

Language generation: In language generation, a learning model (for example, a neural network) is trained with text corpora (large collections of textual documents) and then predicts the new text that follows. For example, language generation can output an entirely new science fiction story by using existing science fiction stories for training.

Question Answering (QA): QA techniques possess a high commercial value, and such techniques are found at the foundation of chatbots and VAs (for example, Google Assistant and Apple Siri). Chatbots have been adopted by many companies for customer support. Chatbots can be used to answer and resolve straightforward customer concerns (for example, changing a customer's monthly mobile plan), which can be solved without human intervention. QA touches upon many other aspects of NLP, such as information retrieval and knowledge representation. Consequently, all this makes developing a QA system very difficult.

Machine Translation (MT): MT is the task of transforming a sentence/phrase from a source language (for example, German) to a target language (for example, English). This is a very challenging task, as different languages have highly different morphological structures, which means that it is not a one-to-one transformation. Furthermore, word-to-word relationships between languages can be one-to-many, one-to-one, many-to-one, or many-to-many. This is known as the word alignment problem in MT literature.
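To make the first task in the list concrete, here is a minimal sketch of a whitespace-and-punctuation tokenizer. The function name tokenize and the regular expression are our own illustrative choices, not the tokenizer used in the book's exercises, and the approach assumes a language such as English where words are separated by spaces.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens and standalone punctuation marks."""
    # \w+(?:'\w+)? keeps contractions such as "don't" together;
    # [^\w\s] emits every punctuation mark as its own token.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text.lower())


tokens = tokenize("The dog barked at the mailman.")
print(tokens)  # ['the', 'dog', 'barked', 'at', 'the', 'mailman', '.']
```

Because this sketch relies on spaces and punctuation to find word boundaries, it would not split text written in a language such as Japanese into words, which is exactly why tokenization is less trivial than it looks.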

Finally, to develop a system that can assist a human in day-to-day tasks (for example, VA or a chatbot) many of these tasks need to be performed together. As we saw in the previous example where the user asks, "Can you show me a good Italian restaurant nearby?" several different NLP tasks, such as speech-to-text conversion, semantic and sentiment analyses, question answering, and machine translation, need to be completed. In Figure 1.1, we provide a hierarchical taxonomy of different NLP tasks categorized into several different types. We first have two broad categories: analysis (analyzing existing text) and generation (generating new text) tasks. Then we divide analysis into three different categories: syntactic (language structure-based tasks), semantic (meaning-based tasks), and pragmatic (open problems difficult to solve):

Figure 1.1: A taxonomy of the popular tasks of NLP categorized under broader categories

Having understood the various tasks in NLP, let us now move on to understand how we can solve these tasks with the help of machines.

The deep learning approach to Natural Language Processing

I think it is safe to assume that deep learning revolutionized machine learning, especially in fields such as computer vision, speech recognition, and of course, NLP. Deep models created a wave of paradigm shifts in many fields of machine learning, as deep models learn rich features from raw data instead of relying on limited human-engineered features. This consequently made tedious and expensive feature engineering obsolete. With this, deep models made the traditional workflow more efficient, as they perform feature learning and task learning simultaneously. Moreover, due to the massive number of parameters (that is, weights) in a deep model, it can encompass significantly more features than a human would have engineered. However, deep models are considered a black box due to their poor interpretability. For example, understanding how deep models learn and what features they learn for a given problem remains an open problem.

A deep model is essentially an artificial neural network that has an input layer, many interconnected hidden layers in the middle, and finally, an output layer (for example, a classifier or a regressor). As you can see, this forms an end-to-end model from raw data to predictions. The hidden layers in the middle give deep models their power, as they are responsible for learning good features from raw data, which eventually lets the model succeed at the task at hand.

History of deep learning

Let's briefly discuss the roots of deep learning and how the field evolved to be a very promising technique for machine learning. In 1960, Hubel and Wiesel performed an interesting experiment and discovered that a cat's visual cortex is made of simple and complex cells, and that these cells are organized in a hierarchical form. Also, these cells react differently to different stimuli. For example, simple cells are activated by variously oriented edges, while complex cells are insensitive to spatial variations (for example, the exact position of the edge). This kindled the motivation for replicating a similar behavior in machines, giving rise to the concept of deep learning.

In the years that followed, neural networks gained the attention of many researchers. In 1965, a neural network trained by a method known as the Group Method of Data Handling (GMDH) and based on the famous Perceptron by Rosenblatt was introduced by Ivakhnenko and others. Later, in 1979, Fukushima introduced the Neocognitron, which laid the base for one of the most famous variants of deep models: Convolutional Neural Networks. Unlike perceptrons, which always took in a 1D input, a neocognitron was able to process 2D inputs using convolution operations.

Artificial neural networks are trained by backpropagating the error signal: the network parameters are optimized using gradients obtained by computing a Jacobian matrix from one layer to the layer before it. However, the problem of vanishing gradients strictly limited the potential number of layers (depth) of the neural network. The vanishing gradients phenomenon is where the gradients of the layers closer to the inputs become very small. This happens because the chain rule is applied to compute the gradients (the Jacobian matrices) of the lower-layer weights, so many factors are multiplied together, and when most of these factors are small, their product shrinks rapidly with depth. This in turn limited the plausible maximum depth of classical neural networks.
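The effect of the chain rule can be illustrated with a small numerical sketch. It makes strong simplifying assumptions that are ours, not the book's: every layer is a single sigmoid unit, every weight is 1, and every pre-activation is 0, which is the point where the sigmoid derivative is at its maximum of 0.25. The name gradient_scale is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; it never exceeds 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_scale(depth, pre_activation=0.0, weight=1.0):
    """Factor by which the error signal is scaled after passing back through
    `depth` sigmoid layers in a scalar chain (chain rule = product of factors)."""
    scale = 1.0
    for _ in range(depth):
        scale *= weight * sigmoid_grad(pre_activation)
    return scale


for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))
```

Even in this best case for the sigmoid, the factor falls by a quarter with every extra layer, so the layers closest to the input receive almost no learning signal once the network has more than a handful of layers.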

Then in 2006, it was found that pretraining a deep neural network by minimizing the reconstruction error (obtained by trying to compress the input to a lower dimensionality and then reconstructing it back into the original dimensionality) for each layer of the network provides a good initial starting point for the weights of the neural network; this allows a consistent flow of gradients from the output layer to the input layer. This essentially allowed neural network models to have more layers without the ill-effects of the vanishing gradient. Also, these deeper models were able to surpass traditional machine learning models in many tasks, mostly in computer vision (for example, test accuracy on the MNIST handwritten digit dataset). With this breakthrough, deep learning became the buzzword in the machine learning community.

Things started gaining momentum in 2012, when AlexNet (a deep convolutional neural network created by Alex Krizhevsky (http://www.cs.toronto.edu/~kriz/), Ilya Sutskever (http://www.cs.toronto.edu/~ilya/), and Geoff Hinton) won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012, reducing the error by roughly 10 percentage points compared with the previous best. During this time, advances were made in speech recognition, wherein state-of-the-art speech recognition accuracies were reported using deep neural networks. Furthermore, people began realizing that Graphics Processing Units (GPUs) enable more parallelism, which allows for faster training of larger and deeper networks compared with Central Processing Units (CPUs).

Deep models were further improved with better model initialization techniques (for example, Xavier initialization), making the time-consuming pretraining redundant. Also, better nonlinear activation functions, such as Rectified Linear Units (ReLUs), were introduced, which alleviated the ill-effects of the vanishing gradient in deeper models. Better optimization (or learning) techniques, such as Adam, automatically tweaked individual learning rates of each parameter among the millions of parameters that we have in the neural network model, which rewrote the state-of-the-art performance in many different fields of machine learning, such as object classification and speech recognition. These advancements also allowed neural network models to have large numbers of hidden layers. The ability to increase the number of hidden layers (that is, to make the neural networks deep) is one of the primary contributors to the significantly better performance of neural network models compared with other machine learning models. Furthermore, better intermediate regularizers, such as batch normalization layers, have improved the performance of deep nets for many tasks.
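To see what "individual learning rates of each parameter" means in practice, the following is a minimal sketch of a single Adam update written in plain Python. The function name adam_step and the toy gradients are our own; the default hyperparameters are the commonly quoted ones, and a real project would use the optimizer shipped with its deep learning library rather than this sketch.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place; t is the 1-based step counter."""
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1.0 - beta1) * g       # running mean of gradients
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g   # running mean of squared gradients
        m_hat = m[i] / (1.0 - beta1 ** t)             # bias correction for the zero start
        v_hat = v[i] / (1.0 - beta2 ** t)
        # Dividing by sqrt(v_hat) rescales the step separately for each parameter.
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params, m, v


params = [1.0, 1.0]
grads = [200.0, 0.02]  # gradients that differ by four orders of magnitude
m = [0.0, 0.0]
v = [0.0, 0.0]
adam_step(params, grads, m, v, t=1)
print(params)
```

Although the two gradients differ by a factor of 10,000, both parameters move by approximately the learning rate on the first step, whereas plain gradient descent with a single learning rate would move the first parameter 10,000 times further than the second.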

Later, even deeper models such as ResNets, Highway Nets, and Ladder Nets were introduced, some of which had hundreds of layers and many millions of parameters. It was possible to have such an enormous number of layers with the help of various empirically and theoretically inspired techniques. For example, ResNets use shortcut connections to connect layers that are far apart, which minimizes the diminishing of gradients, layer to layer, as discussed earlier.
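The benefit of a shortcut connection can be sketched with the same kind of scalar reasoning used for vanishing gradients. If a layer computes y = x + f(x) instead of y = f(x), its local derivative becomes 1 + f'(x) rather than f'(x). The functions below are hypothetical helpers that only multiply such local derivatives together; the value 0.1 for f'(x) is an arbitrary assumption, and real ResNets operate on tensors with convolutional layers.

```python
def plain_chain_grad(local_grads):
    """Gradient through a stack of plain layers: the product of the local derivatives."""
    grad = 1.0
    for g in local_grads:
        grad *= g
    return grad


def residual_chain_grad(local_grads):
    """Gradient through a stack of layers of the form y = x + f(x):
    every factor is 1 + f'(x) because the shortcut contributes a derivative of 1."""
    grad = 1.0
    for g in local_grads:
        grad *= 1.0 + g
    return grad


local = [0.1] * 20  # assume every layer has a small local derivative
print(plain_chain_grad(local))     # shrinks towards zero
print(residual_chain_grad(local))  # does not shrink
```

In the plain stack the signal all but disappears after 20 layers, while in the residual stack the shortcut keeps every factor at or above 1, so the gradient still reaches the early layers.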

The current state of deep learning and NLP

Many different deep models have seen the light since their inception in the early 2000s. Even though they share a resemblance, such as all of them using nonlinear transformations of the inputs and parameters, the details can vary vastly. For example, a Convolutional Neural Network (CNN) can learn from two-dimensional data (for example, RGB images) as it is, while a multilayer perceptron model requires the input to be unrolled into a one-dimensional vector, causing loss of important spatial information.
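A small sketch shows what is lost when an image is unrolled. The helper names flatten and flat_index are ours, and we assume row-by-row unrolling of a 28 x 28 image (the size of an MNIST digit).

```python
def flatten(image):
    """Unroll a 2D grid (a list of rows) into a 1D list, row by row."""
    return [pixel for row in image for pixel in row]


def flat_index(row, col, width):
    """Position of pixel (row, col) in the flattened vector."""
    return row * width + col


print(flatten([[1, 2], [3, 4]]))  # [1, 2, 3, 4]

width = 28
above = flat_index(10, 5, width)  # a pixel ...
below = flat_index(11, 5, width)  # ... and the pixel directly beneath it
print(below - above)  # 28
```

Two pixels that touch each other in the image end up 28 positions apart in the vector, and nothing in the vector itself tells a multilayer perceptron that they were neighbors. A CNN, by contrast, slides its filters over the 2D grid and therefore sees such neighborhoods directly.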

When processing text, as one of the most intuitive interpretations of text is to perceive it as a sequence of characters, the learning model should be able to do time-series modelling, thus requiring the memory of the past. To understand this, think of a language modelling task; the next word for the word cat should be different from the next word for the word climbed. One such popular model that encompasses this ability is known as a Recurrent Neural Network (RNN). We will see in Chapter 6, Recurrent Neural Networks how exactly RNNs achieve this by going through interactive exercises.

It should be noted that memory is not a trivial capability that is inherent to a learning model. On the contrary, ways of persisting memory must be carefully designed. Also, the term memory should not be confused with the learned weights of a model: a non-sequential deep network looks only at the current input through its learned weights, whereas a sequential model (for example, an RNN) looks at both the learned weights and the previous element of the sequence to predict the next output.
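The distinction between learned weights and memory can be sketched with a single scalar recurrent unit. This is a toy illustration under our own assumptions (one hidden unit, fixed hand-picked weights w_x and w_h, tanh activation), not the RNN implemented later in the book.

```python
import math


def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: the new state depends on the current input
    AND on the previous state (the model's memory)."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run(sequence):
    """Feed a sequence through the recurrent unit, starting from an empty memory."""
    h = 0.0
    for x in sequence:
        h = rnn_step(x, h)
    return h


def feed_forward(x, w_x=0.5, b=0.0):
    """Non-sequential counterpart: the output depends on the current input only."""
    return math.tanh(w_x * x + b)


# Same final input (1.0), different history:
print(run([1.0, 1.0]), run([-1.0, 1.0]))
# The feed-forward unit cannot tell the two situations apart:
print(feed_forward(1.0))
```

The weights are identical in both runs; only the carried-over state differs, and that state is what lets the recurrent unit produce different outputs for the same current input.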

One prominent drawback of RNNs is that they cannot remember more than a few (approximately seven) time steps, thus lacking long-term memory. Long Short-Term Memory (LSTM) networks are an extension of RNNs that incorporate long-term memory. Therefore, LSTMs are often preferred over standard RNNs nowadays. We will peek under the hood in Chapter 7, Long Short-Term Memory Networks, to understand them better.

In summary, we can mainly separate deep networks into two categories: the non-sequential models that deal with only a single input at a time for both training and prediction (for example, image classification) and the sequential models that cope with sequences of inputs of arbitrary length (for example, text generation where a single word is a single input). Then we can categorize non-sequential (also called feed-forward) models into deep networks (fewer than approximately 20 layers) and very deep networks (which can have hundreds of layers). The sequential models are categorized into short-term memory models (for example, RNNs), which can only memorize short-term patterns, and long-term memory models, which can memorize longer patterns. In Figure 1.4, we outline the discussed taxonomy. You are not expected to understand these different deep learning models fully at this point; the figure only illustrates the diversity of deep learning models:

Figure 1.4: A general taxonomy of the most commonly used deep learning methods, categorized into several classes

Understanding a simple deep model – a Fully-Connected Neural Network

Now let's have a closer look at a deep neural network in order to gain a better understanding. Although there are numerous different variants of deep models, let's look at one of the earliest models (dating back to the 1950s and 1960s), known as a Fully-Connected Neural Network (FCNN), sometimes called a multilayer perceptron. Figure 1.5 depicts a standard three-layered FCNN.
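As a rough illustration of what "fully connected" means, the following sketch runs a forward pass through a tiny network with an input layer, one hidden layer, and an output layer. The layer sizes (4, 3, and 2), the sigmoid activation, the random initialization, and all function names are assumptions made for this example; they are not taken from Figure 1.5, and the sketch performs no training.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """Fully connected layer: every output unit is connected to every input."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Create an (n_out x n_in) weight matrix with small random values and zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Pass the input through each layer in turn, from raw data to prediction."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x


rng = random.Random(0)  # fixed seed so that the run is repeatable
layers = [init_layer(4, 3, rng),   # input (4 values) -> hidden (3 units)
          init_layer(3, 2, rng)]   # hidden (3 units) -> output (2 units)
output = forward([0.5, -1.2, 3.0, 0.0], layers)
print(output)
```

Each hidden unit receives all four inputs and each output unit receives all three hidden activations, which is where the name fully connected comes from. Training would consist of adjusting the weights and biases so that the output matches the desired targets.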