Text Analysis with Python - Mamta Mittal - E-Book

Mamta Mittal

Description

Text Analysis with Python: A Research-Oriented Guide is a quick and comprehensive reference on text mining using Python code. The main objective of the book is to equip the reader with the knowledge to apply various machine learning and deep learning techniques to text data. The book is organized into eight chapters, which present the topic in a structured and progressive way.

Key Features
· Introduces the reader to Python programming and data processing
· Introduces the reader to the preliminaries of natural language processing (NLP)
· Covers data analysis and visualization using predefined Python libraries and datasets
· Teaches how to write text mining programs in Python
· Includes text classification and clustering techniques
· Informs the reader about different types of neural networks for text analysis
· Includes advanced analytical techniques such as fuzzy logic and deep learning techniques
· Explains concepts in a simplified and structured way that is ideal for learners
· Includes References for further reading

Text Analysis with Python: A Research-Oriented Guide is an ideal guide for students in data science and computer science courses, and for researchers and analysts who want to work on artificial intelligence projects that require the application of text mining and NLP techniques.

Page count: 176

Year of publication: 2002




Table of Contents
BENTHAM SCIENCE PUBLISHERS LTD.
End User License Agreement (for non-institutional, personal use)
Usage Rules:
Disclaimer:
Limitation of Liability:
General:
PREFACE
CONSENT FOR PUBLICATION
CONFLICT OF INTEREST
ACKNOWLEDGEMENT
Introduction
Abstract
1.1. INTRODUCTION
1.2. NATURAL LANGUAGE
1.2.1. From Linguistics to Natural Language Processing (NLP)
1.2.2. Natural Language Processing (NLP)
1.3. TEXT ANALYSIS
1.3.1. Advantages
1.3.2. Methods & Techniques
1.3.3. Sentiment Analysis (SA)
1.3.4. Topic Modelling
1.3.5. Intent Identification
1.3.6. Keyword Extraction
1.3.7. Entity Recognition
1.3.8. Text Analysis Functionality
1.4. TEXT SUMMARIZATION
1.4.1. Extraction
1.4.2. Abstractive Summarization
1.5. TEXT MINING AND WORKFLOW
1.5.1. Data Recovery
1.5.2. Data Extraction
1.5.3. Data Mining
CONCLUSION
REFERENCES
Introduction to Python
Abstract
2.1. INTRODUCTION
2.2. WORKING ENVIRONMENTS OF PYTHON
Google Colab
Features of Google Colaboratory (COLAB)
2.3. WORKING WITH ANACONDA
Steps to Anaconda Installation
2.4. CREATING THE FIRST PROJECT IN GOOGLE COLAB
2.5. Mathematical Operations
2.6. PYTHON LIBRARIES AND CONCEPTS
Libraries
a). Math and CMath Libraries
b). SciPy Library
c). ScikitLearn Library
d). NumPy Library
2.7. BASIC CONCEPTS IN PYTHON
a). Arrays
b). Data Frames
c). Loops
For loop
While Loop and the Else Branch
Program:
CONCLUSION
REFERENCES
Data Loading and Pre-Processing
Abstract
3.1. INTRODUCTION
3.1. IMPORTING DATASETS
3.2. DATA RESHAPING
3.3. PIVOT AND MELT FUNCTIONS
3.4. STACKING AND UNSTACKING
3.5. DATA PRE-PROCESSING
Outliers
Missing Value Imputation
Handling of Missing Data
Mean Calculation
Deleting of Specific Row
Dummy Variables
One Hot Encoding
3.6. DATA VISUALIZATION
• Matplotlib
• ggplot Visualization
• Geoplot Visualization
• Regression Plots
CONCLUSION
REFERENCES
Text Mining
Abstract
INTRODUCTION
The Steps Followed for Text Mining are:
Why Should we use Text Mining?
Benefits of Text Mining
Text Analysis in Real-Time
Text Mining Applications
Issues in Text Mining
4.1. TEXT MINING WITH PYTHON
Gensim Library
4.2. DATA GATHERING
Reading a Text File
Steps for Reading a Text File in Python
Open() Function
Syntax
open(path to file,mode)
Reading Text File
Close ()
Syntax:close()
Reading a CSV File
Steps
Reading Text from a PDF File
import PyPDF2
Program
4.3. TEXT MINING PRE-PROCESSING TECHNIQUES
4.4. FEATURE SELECTION IN TEXT MINING
Program
Output:
4.5. TEXT SUMMARIZATION
Program
Program:
4.6. TEXT EXTRACTION
4.6.1. Bag of Words
Program:
Limitations of Bag of Words
4.6.2. TF-IDF
Word2vec
Program:
Output
Document Term Matrix
Program:
Output
4.7. TEXT VISUALIZATION
CONCLUSION
REFERENCES
Text Classification in Python
Abstract
5.1. INTRODUCTION
5.2. TEXT CLASSIFICATION
5.3. MACHINE LEARNING-BASED TEXT CLASSIFICATION
Step by Step Explanation
5.4. APPLICATIONS OF TEXT MINING
5.4.1. Email Spam Detection
5.4.2. Social Media Reviews
5.4.3. Google Translator
5.4.4. Text labelling Based on Content
5.5. CLASSIFICATION ALGORITHMS
5.5.1. Naïve Bayes (NB) Classifiers
Case Study: Text Classification With Naïve Bayes
Movie Review Classification Dataset
Step 1. Importing Libraries
Step 2: Importing the Dataset
Step 3: Pre-processing of Text
Step 4: Changing Text into Numbers by Defining Classification Parameters
Step 5: Training and Testing datasets
Step 6: Model training and sentiment prediction
Step 7: Model evaluation
Step 8: Model Saving and Loading
Step 9: Text Prediction
5.5.2. DECISION TREE CLASSIFIERS
Case Study Text Classification with Decision Tree Algorithms
5.5.3. Nearest Neighbour Classifier
How KNN will Work in Text Classifications
Useful Information with KNN
Case Study Text Classification with KNN
5.5.4. Support Vector Machines
From Texts to Vectors
Advantages
Case Study Text Classification with KNN
CONCLUSIONS
Chapter highlights
REFERENCES
Text Clustering in Python
Abstract
6.1. INTRODUCTION
6.2. CLUSTERING PROCESS
6.2.1. Word Clustering
6.2.2. Document Clustering
6.2.3. Term Frequency-Inverse Document Frequency (tf-idf)
6.3. APPLICATIONS OF TEXT CLUSTERING IN REAL-TIME
Identifying Fake News
Spam Filter
Marketing and Sales
Classifying Website Traffic
Identifying Fraudulent or Criminal Activity
Document Analysis
6.4. CLUSTERING ALGORITHMS WITH CODE IMPLEMENTATION
6.4.1. K-means Clustering
Advantages
Disadvantages of k-means Clustering
K means Clustering in Scikit-learn
Step 1 Importing Libraries
Step 2 Data Preparation
Step 3 Elbow Method
6.4.2. Hierarchical Clustering
How it Works
Hierarchical Clustering Applications
Hierarchical Clustering with Scikit-learn
6.4.3. Fuzzy C-means Clustering
Stepwise Approach To Performing fuzzy C-means Clustering
Fuzzy C means Clustering via Scikit-learn
CONCLUSIONS
REFERENCES
Fuzzy Logic in Text Mining Using Python
Abstract
7.1. INTRODUCTION TO FUZZY LOGIC
Steps to be Followed in the Fuzzy System
Fuzzy Membership Functions
Trapezoidal Membership Function
Gaussian Membership Function
Generalised Bell Membership Function
Sigmoid Membership Function
Fuzzy Set Operations
Why do we use Fuzzy Logic?
Uses of Fuzzy Logic in Text Mining
Applications of Fuzzy System
Issues in Fuzzy Logic
7.2. FUZZY LOGIC WITH PYTHON
FuzzyWuzzy Library
7.3. PREPROCESSING
7.4. FEATURE EXTRACTION
Program
Program
7.5. FUZZY CLUSTERING
Fuzzy C-Means Clustering
Steps to Perform the fuzzy C-means Clustering Algorithm
Program
Program
Fuzzy K-Means Clustering
Program
7.6. CLASSIFICATION
Program
Program
7.7. FUZZY ASSOCIATION RULES
Program
Program
7.8. FUZZY VISUALIZATION
CONCLUSION
CHAPTER HIGHLIGHTS
References
Deep Learning for Text Mining
Abstract
8.1. DEEP LEARNING BASICS
Neuron
Activation Functions
Why Deep Learning?
Limitations of Deep Learning
Applications of Deep Learning
8.2. DEEP LEARNING WITH PYTHON
Keras Library
Import Keras
Program
Step 1. Importing the essential python Keras libraries
Program
8.3. FEED FORWARD NEURAL NETWORK
8.4. CONVOLUTION NEURAL NETWORK (CNN)
Program
Program
8.5. Multi-Layer Perceptron (MLP)
Program
8.6. RECURRENT NEURAL NETWORK (RNN)
Text Classification using RNN.
Program
Text Generation using RNN
Program
8.7. LONG SHORT-TERM MEMORY (LSTM)
Text Generation in LSTM
Text Classification using LSTM
CONCLUSION
CHAPTER HIGHLIGHTS
REFERENCES
Text Analysis with Python: A Research Oriented Guide
Authored by
Mamta Mittal
Delhi Skill & Entrepreneurship University,
New Delhi, India
Gopi Battineni
University of Camerino,
Camerino, Italy
Bhimavarapu Usharani
Department of CSE,
Koneru Lakshmaiah Education
Foundation at Vaddeswaram,
Andhra Pradesh, India
&
Lalit Mohan Goyal
Department of Computer Engineering,
J.C. Bose University of Science & Technology,
YMCA Faridabad (Hr.), India

BENTHAM SCIENCE PUBLISHERS LTD.

End User License Agreement (for non-institutional, personal use)

This is an agreement between you and Bentham Science Publishers Ltd. Please read this License Agreement carefully before using the ebook/echapter/ejournal (“Work”). Your use of the Work constitutes your agreement to the terms and conditions set forth in this License Agreement. If you do not agree to these terms and conditions then you should not use the Work.

Bentham Science Publishers agrees to grant you a non-exclusive, non-transferable limited license to use the Work subject to and in accordance with the following terms and conditions. This License Agreement is for non-library, personal use only. For a library / institutional / multi user license in respect of the Work, please contact: [email protected].

Usage Rules:

All rights reserved: The Work is the subject of copyright and Bentham Science Publishers either owns the Work (and the copyright in it) or is licensed to distribute the Work. You shall not copy, reproduce, modify, remove, delete, augment, add to, publish, transmit, sell, resell, create derivative works from, or in any way exploit the Work or make the Work available for others to do any of the same, in any form or by any means, in whole or in part, in each case without the prior written permission of Bentham Science Publishers, unless stated otherwise in this License Agreement.

You may download a copy of the Work on one occasion to one personal computer (including tablet, laptop, desktop, or other such devices). You may make one back-up copy of the Work to avoid losing it.

The unauthorised use or distribution of copyrighted or other proprietary content is illegal and could subject you to liability for substantial money damages. You will be liable for any damage resulting from your misuse of the Work or any violation of this License Agreement, including any infringement by you of copyrights or proprietary rights.

Disclaimer:

Bentham Science Publishers does not guarantee that the information in the Work is error-free, or warrant that it will meet your requirements or that access to the Work will be uninterrupted or error-free. The Work is provided "as is" without warranty of any kind, either express or implied or statutory, including, without limitation, implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the results and performance of the Work is assumed by you. No responsibility is assumed by Bentham Science Publishers, its staff, editors and/or authors for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products instruction, advertisements or ideas contained in the Work.

Limitation of Liability:

In no event will Bentham Science Publishers, its staff, editors and/or authors, be liable for any damages, including, without limitation, special, incidental and/or consequential damages and/or damages for lost data and/or profits arising out of (whether directly or indirectly) the use or inability to use the Work. The entire liability of Bentham Science Publishers shall be limited to the amount actually paid by you for the Work.

General:

Any dispute or claim arising out of or in connection with this License Agreement or the Work (including non-contractual disputes or claims) will be governed by and construed in accordance with the laws of Singapore. Each party agrees that the courts of the state of Singapore shall have exclusive jurisdiction to settle any dispute or claim arising out of or in connection with this License Agreement or the Work (including non-contractual disputes or claims).

Your rights under this License Agreement will automatically terminate without notice and without the need for a court order if at any point you breach any terms of this License Agreement. In no event will any delay or failure by Bentham Science Publishers in enforcing your compliance with this License Agreement constitute a waiver of any of its rights.

You acknowledge that you have read this License Agreement, and agree to be bound by its terms and conditions. To the extent that any other terms and conditions presented on any website of Bentham Science Publishers conflict with, or are inconsistent with, the terms and conditions set out in this License Agreement, you acknowledge that the terms and conditions set out in this License Agreement shall prevail.

Bentham Science Publishers Pte. Ltd. 80 Robinson Road #02-00 Singapore 068898 Singapore Email: [email protected]

PREFACE

This book focuses on the latest research in the field of text mining using Python code. Its main objective is to apply various machine learning and deep learning techniques to textual data. Natural language processing and fuzzy rule generation are also discussed in detail, along with a basic introduction to Python, data handling and data shaping. Various datasets are used to demonstrate text mining techniques in different research domains. The book is beneficial for readers who want to work in fields related to text mining. The authors present the content in a simple and understandable manner through step-by-step implementations of different algorithms. The book teaches text mining concepts from scratch and is organized into eight chapters.

Chapter 1 covers the basics and preliminaries of natural language processing. It gives the basic idea of the text mining workflow, information retrieval and information extraction.

Chapter 2 provides a brief introduction to the Python programming language. It focuses on the core Python language and the important libraries for natural language processing, using different working environments such as Anaconda and Google Colaboratory.

Chapter 3 discusses data analysis, concentrating on the data loading and pre-processing concepts of text mining in Python. It covers importing some predefined Python libraries and visualization techniques that use various Python modules.

Chapter 4 discusses the basics of text mining and how to write Python programs using open-source NLP libraries. The chapter also discusses different text mining techniques such as pre-processing, feature selection, feature extraction, and text summarization, with detailed examples.

Chapter 5 presents details about text classification and text prediction techniques. In this chapter, we use a real movie review dataset and discuss four classifiers in detail, namely naive Bayes, random forest, k-nearest neighbour, and support vector machine.

Chapter 6 presents the details of how to conduct text clustering in Python with unsupervised machine learning techniques. To explain this, we adopted the IRIS dataset, which is well known from the UCI Machine Learning Repository, and present it with Python scripts.

Chapter 7 discusses fuzzy logic, different membership functions, their applications, challenges and how to implement different text mining concepts like pre-processing, feature extraction, clustering, association rules, and classification using the fuzzy membership functions and fuzzy rules.

Chapter 8 provides details about deep learning in text mining using Python. The chapter presents the basics of deep learning, different activation functions, their applications and challenges, and how to write Python programs using deep learning.

CONSENT FOR PUBLICATION

Not applicable.

CONFLICT OF INTEREST

The author declares no conflict of interest, financial or otherwise.

ACKNOWLEDGEMENT

Declared none.

Introduction

Mamta Mittal, Gopi Battineni, Bhimavarapu Usharani, Lalit Mohan Goyal

Abstract

This chapter explains what text data is, defines natural language, and introduces the techniques involved in processing it. It also explains the framework of text mining and its workflow.

Keywords: Data mining, Linguistics, NLP, Sentiment analysis, Text data.

1.1. INTRODUCTION

The text information accessible on the web and on personal computers is increasing quickly, but dealing with that text data is not an easy task. To address this, intelligent algorithms were introduced to recover significant information from data archives [1]. This process is called text mining. Text mining is not a new concept; it is an extension of data mining, and all data mining algorithms can be employed on text data [2]. The difference between the two is that data mining is applied only to structured data, whereas text mining can be applied to both unstructured and semi-structured data. These days, the approach taken in text mining depends on whether we are interested in the context of the text or in its content.

As noted, text mining operates largely on unstructured data. To make this feasible, the data has to be converted into a semi-structured or structured form so that data mining-based machine learning (ML) algorithms can be applied easily [3]. This conversion is done by data pre-processing techniques. Pre-processing is an essential step because it prepares the text data for mining. If it is skipped, data conflicts might occur and it may not be possible to obtain comprehensive analysis results. During pre-processing, punctuation and irrelevant words are removed. Words can be gathered into groups and reduced to their roots (stemming). Missing features can be replaced with average values. Text case can be normalized to a single case. Depending on the needs of the application, various further steps can be applied [4]. After pre-processing, the data is converted into a vector space model, and the different algorithms operate on that vector space model.
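
The pre-processing steps described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used later in the book: the stop-word list, the suffix-stripping rule in `naive_stem`, and the sample sentences are all assumptions made for the example. A real project would typically rely on a library stemmer and a fuller stop-word list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in", "are", "on"}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lower-case, remove punctuation, drop stop words, and stem."""
    text = text.lower()                     # normalize case
    text = re.sub(r"[^a-z\s]", " ", text)   # replace punctuation and digits with spaces
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


def vectorize(docs):
    """Build a simple term-count vector space model over all documents."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


docs = ["The cats are sleeping on the mat.", "A cat sleeps; dogs barked!"]
vocab, vectors = vectorize(docs)
print(vocab)    # ['bark', 'cat', 'dog', 'mat', 'sleep']
print(vectors)  # [[0, 1, 0, 1, 1], [1, 1, 1, 0, 1]]
```

Each document becomes a row of term counts over a shared vocabulary, which is the form that clustering and classification algorithms expect as input.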

Clustering operations, navigation, and visualization can be applied to gather comparable patterns if users are looking for support with text content. Finding connections, identifying relationships, and summarizing texts require content analysis, including data recovery, data extraction, and natural language processing (NLP) [5]. In this chapter, we explore the idea of NLP and its philosophy. In addition, we look at what text data is and its syntax, the text pre-processing steps, and the applications of text analysis in the real world.
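
One common way to decide whether two texts show "comparable patterns" is to compare their term-count vectors with cosine similarity. The sketch below is an illustrative assumption, not a method prescribed by this chapter; the three sample documents and the helper name `cosine_similarity` are invented for the example.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors stored as Counters."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample documents.
docs = {
    "d1": "stock market prices fall",
    "d2": "market prices rise again",
    "d3": "team wins football match",
}
vectors = {name: Counter(text.split()) for name, text in docs.items()}

sim_12 = cosine_similarity(vectors["d1"], vectors["d2"])
sim_13 = cosine_similarity(vectors["d1"], vectors["d3"])
print(sim_12, sim_13)  # 0.5 0.0
```

Documents d1 and d2 share two of their four terms and score 0.5, while d1 and d3 share none and score 0.0, so a clustering step would place the first two together.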

1.2. NATURAL LANGUAGE

A natural language is the way in which we communicate with each other. Both spoken and written communication are forms of natural language. Much of our daily life is filled with text data; consider how often we come across text in the form of menus, e-mails, SMS, signs, webpages, and so on [6]. As a species, we may speak to one another more than we write, and it may be easier to learn to speak than to write. We communicate primarily through voice and text. Given the importance of this type of data, we should have techniques to process and understand natural language, just as we do for other types of data.

The problem with natural language is that it is messy and has few fixed rules. Even so, we can understand each other most of the time. Human languages are, however, highly ambiguous. They are also constantly changing and evolving. Humans are exceptional at producing and understanding language and are capable of expressing and interpreting meaning in a nuanced way. Despite our extraordinary proficiency with language, we are very bad at understanding and describing the rules that govern it.

Working with natural language text remains an unsolved problem, even though half a century of study has been devoted to it. It is hard for the child, who spends years learning a language; it is hard for the adult language learner; it is hard for the researcher who tries to model the relevant phenomena; and it is hard for the engineer who builds systems that deal with natural language output. Hence, Turing made fluent conversation in natural language the centerpiece of his test for intelligence [7].

1.2.1. From Linguistics to Natural Language Processing (NLP)

Linguistics is the scientific study of language, including its grammar, semantics, and phonetics. Classical linguistics involved formulating and evaluating rules of language [8]. In general, however, the difficulties associated with understanding natural language resist clean mathematical formalisms.

The term linguist can refer to anyone who studies language, although a self-described linguist may be more focused on fieldwork. Mathematics is one of the tools of science, and mathematicians working on natural language may refer to their work as mathematical linguistics, focusing on discrete mathematical formalisms and theories that can be applied to natural language.

Computational linguistics, on the other hand, uses the tools of computer science to study linguistics. Because computational tools and thinking have overtaken most fields, yesterday's linguist may be today's computational linguist. Computational linguistics is the study of computer systems for generating and understanding natural language. One natural use of computational linguistics is testing the grammars proposed by theoretical linguists.

By writing and running software, large text datasets can be mined and new and different things can be discovered. During the 1990s, statistical techniques and statistical machine learning largely replaced the classical top-down, rule-based ways of dealing with language, owing to their better results, robustness, and speed. Currently, the field of natural language research is dominated by statistical methods.

Today, data-driven approaches to handling natural language have become so widespread that they should be considered the standard approaches in computational linguistics. The wide availability of and access to electronic data have undoubtedly contributed to this development; another factor might be the perceived brittleness of approaches based mostly on hand-written rules.

Natural language can be handled not only with basic statistical methods but also with inference techniques such as those used in applied machine learning. Understanding natural language involves many aspects of morphology, grammar, semantics, pragmatics, and world knowledge. One of the fundamental requirements for developing successful language systems is the ability to collect and encode all of this knowledge.

1.2.2. Natural Language Processing (NLP)

NLP is the automatic manipulation of natural language, such as text and speech, by software. The study of NLP has existed for more than 50 years and grew out of the field of linguistics with the development of computers. This section discusses the idea behind natural language processing and why it is so significant.

After reading this chapter, the reader should be able to answer the following questions:

• What is NLP, and how does it differ from other types of data?

• What makes it challenging to work with natural language?

• Where did the field of NLP come from, and how is it characterized by current practitioners?

To reflect the more empirical, engineering-oriented approach taken by statistical methods, computational linguistics is often referred to as NLP. Because statistical methods dominate the field, it is also often described as statistical NLP, perhaps to distinguish it from classical computational linguistics techniques.