Text Analysis with Python E-Book

Mamta Mittal

69,68 €

Publisher: Bentham Science Publishers
Category: Science and New Technologies
Language: English

Description

Text Analysis with Python: A Research-Oriented Guide is a quick and comprehensive reference on text mining using Python code. The main objective of the book is to equip the reader with the knowledge to apply various machine learning and deep learning techniques to text data. The book is organized into eight chapters, which present the topic in a structured and progressive way.

Key Features
· Introduces the reader to Python programming and data processing
· Introduces the reader to the preliminaries of natural language processing (NLP)
· Covers data analysis and visualization using predefined Python libraries and datasets
· Teaches how to write text mining programs in Python
· Includes text classification and clustering techniques
· Informs the reader about different types of neural networks for text analysis
· Includes advanced analytical techniques such as fuzzy logic and deep learning techniques
· Explains concepts in a simplified and structured way that is ideal for learners
· Includes References for further reading

Text Analysis with Python: A Research-Oriented Guide is an ideal guide for students in data science and computer science courses, and for researchers and analysts who want to work on artificial intelligence projects that require the application of text mining and NLP techniques.

Details

Number of pages: 176

Year of publication: 2002

Sample

Table of Contents

BENTHAM SCIENCE PUBLISHERS LTD.

End User License Agreement (for non-institutional, personal use)

Usage Rules:

Disclaimer:

Limitation of Liability:

General:

PREFACE

CONSENT FOR PUBLICATION

CONFLICT OF INTEREST

ACKNOWLEDGEMENT

Introduction

Abstract

1.1. INTRODUCTION

1.2. NATURAL LANGUAGE

1.2.1. From Linguistics to Natural Language Processing (NLP)

1.2.2. Natural Language Processing (NLP)

1.3. TEXT ANALYSIS

1.3.1. Advantages

1.3.2. Methods & Techniques

1.3.3. Sentiment Analysis (SA)

1.3.4. Topic Modelling

1.3.5. Intent Identification

1.3.6. Keyword Extraction

1.3.7. Entity Recognition

1.3.8. Text Analysis Functionality

1.4. TEXT SUMMARIZATION

1.4.1. Extraction

1.4.2. Abstractive Summarization

1.5. TEXT MINING AND WORKFLOW

1.5.1. Data Recovery

1.5.2. Data Extraction

1.5.3. Data Mining

CONCLUSION

REFERENCES

Introduction to Python

Abstract

2.1. INTRODUCTION

2.2. WORKING ENVIRONMENTS OF PYTHON

Google Colab

Features of Google Collaboratory (COLAB)

2.3. WORKING WITH ANACONDA

Steps to Anaconda Installation

2.4. CREATING THE FIRST PROJECT IN GOOGLE COLAB

2.5. Mathematical Operations

2.6. PYTHON LIBRARIES AND CONCEPTS

Libraries

a). Math and CMath Libraries

b). SciPy Library

c). ScikitLearn Library

d). NumPy Library

2.7. BASIC CONCEPTS IN PYTHON

a). Arrays

b). Data Frames

c). Loops

For loop

While Loop and the Else Branch

Program:

CONCLUSION

REFERENCES

Data Loading and Pre-Processing

Abstract

3.1. INTRODUCTION

3.1. IMPORTING DATASETS

3.2. DATA RESHAPING

3.3. PIVOT AND MELT FUNCTIONS

3.4. STACKING AND UNSTACKING

3.5. DATA PRE-PROCESSING

Outliers

Missing Value Imputation

Handling of Missing Data

Mean Calculation

Deleting of Specific Row

Dummy Variables

One Hot Encoding

3.6. DATA VISUALIZATION

• Matplotlib

• ggplot Visualization

• Geoplot Visualization

• Regression Plots

CONCLUSION

REFERENCES

Text Mining

Abstract

INTRODUCTION

The Steps Followed for Text Mining are:

Why Should we use Text Mining?

Benefits of Text Mining

Text Analysis in Real-Time

Text Mining Applications

Issues in Text Mining

4.1. TEXT MINING WITH PYTHON

Program:

Gensim Library

Program:

Output:

Program

Output

4.2. DATA GATHERING

Reading a Text File

Steps for Reading a Text File in Python

Open() Function

Syntax

open(path to file,mode)

Reading Text File

Close ()

Syntax:close()

Reading a CSV File

Steps

Reading Text from a PDF File

import PyPDF2

Program

4.3. TEXT MINING PRE-PROCESSING TECHNIQUES

Program:

Output:

Program:

Output

Program:

Output

Program:

Output:

Program:

4.4. FEATURE SELECTION IN TEXT MINING

Program

Output:

4.5. TEXT SUMMARIZATION

Program

Program:

4.6. TEXT EXTRACTION

4.6.1. Bag of Words

Program:

Limitations of Bag of Words

4.6.2. TF-IDF

Program

Output

Program:

Output:

Word2vec

Program:

Output

Document Term Matrix

Program:

Output

4.7. TEXT VISUALIZATION

Program

Output

Program

Output:

Program:

Output

Program

Output

Program

Output

Program

Output

CONCLUSION

REFERENCES

Text Classification in Python

Abstract

5.1. INTRODUCTION

5.2. TEXT CLASSIFICATION

5.3. MACHINE LEARNING-BASED TEXT CLASSIFICATION

Step by Step Explanation

5.4. APPLICATIONS OF TEXT MINING

5.4.1. Email Spam Detection

5.4.2. Social Media Reviews

5.4.3. Google Translator

5.4.4. Text labelling Based on Content

5.5. CLASSIFICATION ALGORITHMS

5.5.1. Naïve Bayes (NB) Classifiers

Case Study: Text Classification With Naïve Bayes

Movie Review Classification Dataset

Step 1. Importing Libraries

Step 2: Importing the Dataset

Step 3: Pre-processing of Text

Step 4: Changing Text into Numbers by Defining Classification Parameters

Step 5: Training and Testing datasets

Step 6: Model training and sentiment prediction

Step 7: Model evaluation

Step 8: Model Saving and Loading

Step 9: Text Prediction

5.5.2. DECISION TREE CLASSIFIERS

Case Study Text Classification with Decision Tree Algorithms

5.5.3. Nearest Neighbour Classifier

How KNN will Work in Text Classifications

Useful Information with KNN

Case Study Text Classification with KNN

5.5.4. Support Vector Machines

From Texts to Vectors

Advantages

Case Study Text Classification with KNN

CONCLUSIONS

Chapter highlights

REFERENCES

Text Clustering in Python

Abstract

6.1. INTRODUCTION

6.2. CLUSTERING PROCESS

6.2.1. Word Clustering

6.2.2. Document Clustering

6.2.3. Term Frequency-Inverse Document Frequency (tf-idf)

6.3. APPLICATIONS OF TEXT CLUSTERING IN REAL-TIME

Identifying Fake News

Spam Filter

Marketing and Sales

Classifying Website Traffic

Identifying Fraudulent or Criminal Activity

Document Analysis

6.4. CLUSTERING ALGORITHMS WITH CODE IMPLEMENTATION

6.4.1. K-means Clustering

Advantages

Disadvantages of k-means Clustering

K means Clustering in Scikit-learn

Step 1 Importing Libraries

Step 2 Data Preparation

Step 3 Elbow Method

6.4.2. Hierarchical Clustering

How it Works

Hierarchical Clustering Applications

Hierarchical Clustering with Scikit-learn

6.4.3. Fuzzy C-means Clustering

Stepwise Approach To Performing fuzzy C-means Clustering

Fuzzy C means Clustering via Scikit-learn

CONCLUSIONS

REFERENCES

Fuzzy Logic in Text Mining Using Python

Abstract

7.1. INTRODUCTION TO FUZZY LOGIC

Steps to be Followed in the Fuzzy System

Fuzzy Membership Functions

Trapezoidal Membership Function

Gaussian Membership Function

Generalised Bell Membership Function

Sigmoid Membership Function

Fuzzy Set Operations

Why do we use Fuzzy Logic?

Uses of Fuzzy Logic in Text Mining

Applications of Fuzzy System

Issues in Fuzzy Logic

7.2. FUZZY LOGIC WITH PYTHON

FuzzyWuzzy Library

Program

7.3. PREPROCESSING

Program

7.4. FEATURE EXTRACTION

Program

7.5. FUZZY CLUSTERING

Fuzzy C-Means Clustering

Steps to Perform the fuzzy C-means Clustering Algorithm

Program

Fuzzy K-Means Clustering

Program

7.6. CLASSIFICATION

Program

7.7. FUZZY ASSOCIATION RULES

Program

7.8. FUZZY VISUALIZATION

Program

Program:

CONCLUSION

CHAPTER HIGHLIGHTS

References

Deep Learning for Text Mining

Abstract

8.1. DEEP LEARNING BASICS

Neuron

Activation Functions

Why Deep Learning?

Limitations of Deep Learning

Applications of Deep Learning

8.2. DEEP LEARNING WITH PYTHON

Keras Library

Import Keras

Program

Step 1. Importing the essential python Keras libraries

Program

8.3. FEED FORWARD NEURAL NETWORK

8.4. CONVOLUTION NEURAL NETWORK (CNN)

Program

8.5. Multi-Layer Perceptron (MLP)

Program

8.6. RECURRENT NEURAL NETWORK (RNN)

Text Classification using RNN.

Program

Text Generation using RNN

Program

8.7. LONG SHORT-TERM MEMORY (LSTM)

Text Generation in LSTM

Text Classification using LSTM

CONCLUSION

CHAPTER HIGHLIGHTS

REFERENCES

Text Analysis with Python: A Research Oriented Guide

Authored by

Mamta Mittal

Delhi Skill & Entrepreneurship University,

New Delhi, India

Gopi Battineni

University of Camerino,

Camerino, Italy

Bhimavarapu Usharani

Department of CSE,

Koneru Lakshmaiah Education

Foundation at Vaddeswaram,

Andhra Pradesh, India

Lalit Mohan Goyal

Department of Computer Engineering,

J.C. Bose University of Science & Technology,

YMCA Faridabad (Hr.), India

BENTHAM SCIENCE PUBLISHERS LTD.

End User License Agreement (for non-institutional, personal use)

This is an agreement between you and Bentham Science Publishers Ltd. Please read this License Agreement carefully before using the ebook/echapter/ejournal (“Work”). Your use of the Work constitutes your agreement to the terms and conditions set forth in this License Agreement. If you do not agree to these terms and conditions then you should not use the Work.

Bentham Science Publishers agrees to grant you a non-exclusive, non-transferable limited license to use the Work subject to and in accordance with the following terms and conditions. This License Agreement is for non-library, personal use only. For a library / institutional / multi user license in respect of the Work, please contact: [email protected].

Usage Rules:

All rights reserved: The Work is the subject of copyright and Bentham Science Publishers either owns the Work (and the copyright in it) or is licensed to distribute the Work. You shall not copy, reproduce, modify, remove, delete, augment, add to, publish, transmit, sell, resell, create derivative works from, or in any way exploit the Work or make the Work available for others to do any of the same, in any form or by any means, in whole or in part, in each case without the prior written permission of Bentham Science Publishers, unless stated otherwise in this License Agreement. You may download a copy of the Work on one occasion to one personal computer (including tablet, laptop, desktop, or other such devices). You may make one back-up copy of the Work to avoid losing it. The unauthorised use or distribution of copyrighted or other proprietary content is illegal and could subject you to liability for substantial money damages. You will be liable for any damage resulting from your misuse of the Work or any violation of this License Agreement, including any infringement by you of copyrights or proprietary rights.

Disclaimer:

Bentham Science Publishers does not guarantee that the information in the Work is error-free, or warrant that it will meet your requirements or that access to the Work will be uninterrupted or error-free. The Work is provided "as is" without warranty of any kind, either express or implied or statutory, including, without limitation, implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the results and performance of the Work is assumed by you. No responsibility is assumed by Bentham Science Publishers, its staff, editors and/or authors for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products instruction, advertisements or ideas contained in the Work.

Limitation of Liability:

In no event will Bentham Science Publishers, its staff, editors and/or authors, be liable for any damages, including, without limitation, special, incidental and/or consequential damages and/or damages for lost data and/or profits arising out of (whether directly or indirectly) the use or inability to use the Work. The entire liability of Bentham Science Publishers shall be limited to the amount actually paid by you for the Work.

General:

Any dispute or claim arising out of or in connection with this License Agreement or the Work (including non-contractual disputes or claims) will be governed by and construed in accordance with the laws of Singapore. Each party agrees that the courts of the state of Singapore shall have exclusive jurisdiction to settle any dispute or claim arising out of or in connection with this License Agreement or the Work (including non-contractual disputes or claims). Your rights under this License Agreement will automatically terminate without notice and without the need for a court order if at any point you breach any terms of this License Agreement. In no event will any delay or failure by Bentham Science Publishers in enforcing your compliance with this License Agreement constitute a waiver of any of its rights. You acknowledge that you have read this License Agreement, and agree to be bound by its terms and conditions. To the extent that any other terms and conditions presented on any website of Bentham Science Publishers conflict with, or are inconsistent with, the terms and conditions set out in this License Agreement, you acknowledge that the terms and conditions set out in this License Agreement shall prevail.

Bentham Science Publishers Pte. Ltd. 80 Robinson Road #02-00 Singapore 068898 Singapore Email: [email protected]

PREFACE

This book focuses on the latest research in the field of text mining using Python code. Its main objective is to apply various machine learning and deep learning techniques to textual data. Natural language processing and fuzzy rule generation are also discussed in detail, along with a basic introduction to Python, data handling, and data shaping. Various datasets are used to demonstrate text mining techniques in different research domains. The book is intended for readers who want to work in fields related to text mining. The authors present the content in a simple and understandable manner through step-by-step implementations of different algorithms. The book teaches text mining concepts from scratch and is organized into eight chapters.

Chapter 1 covers the basics and preliminaries of natural language processing. It introduces the text mining workflow, information retrieval, and information extraction.

Chapter 2 provides a brief introduction to the Python programming language. It focuses on the core Python language and the important libraries for natural language processing, using working environments such as Anaconda and Google Colaboratory.

Chapter 3 discusses data analysis, concentrating on data loading and pre-processing for text mining in Python. It covers importing predefined Python libraries and visualization techniques based on various Python modules.

Chapter 4 discusses the basics of text mining and how to write Python programs using open-source NLP libraries. It also discusses text mining techniques such as pre-processing, feature selection, feature extraction, and text summarization, with detailed examples.

Chapter 5 presents text classification and text prediction techniques. Using a real movie review dataset, it discusses four classifiers in detail: naive Bayes, random forest, k-nearest neighbour, and support vector machine.

Chapter 6 explains how to perform text clustering in Python using unsupervised machine learning techniques. To illustrate this, we use the well-known Iris dataset from the UCI Machine Learning Repository, presented together with Python scripts.

Chapter 7 discusses fuzzy logic, different membership functions, their applications, challenges and how to implement different text mining concepts like pre-processing, feature extraction, clustering, association rules, and classification using the fuzzy membership functions and fuzzy rules.

Chapter 8 covers deep learning for text mining using Python. It presents the basics of deep learning, different activation functions, their applications and challenges, and how to write Python programs that use deep learning.

CONSENT FOR PUBLICATION

Not applicable.

CONFLICT OF INTEREST

The author declares no conflict of interest, financial or otherwise.

ACKNOWLEDGEMENT

Declared none.

Mamta Mittal, Delhi Skill & Entrepreneurship University, New Delhi, India; Gopi Battineni, University of Camerino, Camerino, Italy; Ms Bhimavarapu Usharani, Department of CSE, Koneru Lakshmaiah Education Foundation at Vaddeswaram, Andhra Pradesh, India; & Lalit Mohan Goyal, Department of Computer Engineering, J.C. Bose University of Science & Technology,

Introduction

Mittal Mamta, Battineni Gopi, Usharani Bhimavarapu, Mohan Goyal Lalit

Abstract

This chapter explains what text data is, defines natural language, and describes the techniques used to process it. It also explains the text mining framework and its workflow.

Keywords: Data mining, Linguistics, NLP, Sentiment analysis, Text data.

1.1. INTRODUCTION

The volume of text available on the web and on personal computers is growing quickly, but handling that text is not an easy task. To address this, intelligent algorithms were introduced to recover significant information from data archives [1]. This recovery of information is called text mining. Text mining is not a new concept; it is an extension of data mining, and all data mining algorithms can be applied to text data [2]. The difference between the two is that data mining is applied only to structured data, whereas text mining can be applied to both unstructured and semi-structured data. Today, the approach to text mining depends on whether we are interested in the context of a text or in its content.

As noted, text mining operates largely on unstructured data. To make this feasible, the data must first be converted into a semi-structured or structured form so that data mining-based machine learning (ML) algorithms can be applied easily [3]. This conversion is done by data pre-processing techniques. Pre-processing is an essential step because it prepares the text data for mining. If it is skipped, data conflicts might occur and it may not be possible to attain comprehensive analysis results. During pre-processing, punctuation and irrelevant words are removed. Words can be gathered into groups and stemmed to their roots. Missing features can be replaced with average values. Letter case can be normalized to a single form. Depending on the needs of the application, various further steps can be applied [4]. After pre-processing, the data is converted into a vector space model, on which the different algorithms operate.
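The pre-processing steps and the conversion to a vector space model can be illustrated with a minimal sketch. It is not taken from the book: the stop-word list, the `crude_stem` helper, and the function names are hypothetical stand-ins chosen for illustration, and a real project would more likely rely on an NLP library for stemming and vectorization.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}


def crude_stem(word):
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase the text, drop punctuation and stop words, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def to_vectors(documents):
    """Map each document to a term-count vector over a shared, sorted vocabulary."""
    token_lists = [preprocess(doc) for doc in documents]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


docs = ["The cats are chasing the mice!", "A cat chased a mouse."]
vocabulary, vectors = to_vectors(docs)
print(vocabulary)
print(vectors)
```

Each row of `vectors` is one document expressed in the same coordinate system, which is what allows standard data mining algorithms to be applied to text. The crude stemmer maps "cats" and "cat" to the same term but leaves irregular forms such as "mice" and "mouse" separate, which is one reason real stemmers or lemmatizers are preferred.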

Clustering, navigation, and visualization can be applied to group similar patterns when users need help with text content. Finding connections, identifying relationships, and summarizing texts require content analysis, including data recovery, data extraction, and natural language processing (NLP) [5]. In this chapter, we explore the idea of NLP and its philosophy. We also examine what text data is, its syntax, the text pre-processing steps, and applications of text in the real world.
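One common way to group comparable documents, once they are represented as vectors, is to compare them with cosine similarity. The sketch below is an assumption made for illustration rather than the book's method: the term-count vectors are invented, and `most_similar_pair` is a hypothetical helper name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_pair(vectors):
    """Return (score, i, j) for the pair of vectors with the highest cosine similarity."""
    best = None
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine_similarity(vectors[i], vectors[j])
            if best is None or score > best[0]:
                best = (score, i, j)
    return best


# Invented term-count vectors for three documents over a four-term vocabulary.
vectors = [[2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 3, 1]]
score, i, j = most_similar_pair(vectors)
print(f"Documents {i} and {j} are the most similar (cosine = {score:.3f})")
```

Documents that share many terms score close to 1, while documents with no terms in common score 0; clustering algorithms build on pairwise comparisons of this kind.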

1.2. NATURAL LANGUAGE

A natural language is the way in which humans communicate with each other. Both speech and written text are forms of natural language. Text fills much of our daily lives: menus, e-mails, SMS messages, signs, web pages, and more [6]. As a species, we probably speak to one another more than we write, and it may be easier to learn to speak than to write. We communicate primarily through voice and text. Given the importance of this type of data, we need techniques to use and understand natural language, just as we do for other types of data.

The problem with natural language is that it is messy and has few rules. Even so, we can understand each other most of the time. Human languages are highly ambiguous, and they are constantly changing and evolving. Humans are exceptional at producing and understanding language and can express, perceive, and interpret nuanced meanings. Yet despite our proficiency with language, we are very poor at formally understanding and describing the rules that govern it.

Working with natural language text computationally remains an unsolved problem, despite more than half a century of study. It is hard for the child, who spends years learning a language; it is hard for the adult language learner; it is hard for the researcher who tries to model the relevant phenomena; and it is hard for the engineer who builds systems that deal with natural language input or output. Hence, Turing made fluent conversation in natural language the centerpiece of his test for intelligence [7].

1.2.1. From Linguistics to Natural Language Processing (NLP)

Linguistics is the scientific study of language, including its grammar, semantics, and phonetics. Classical linguistics involved formulating and evaluating rules of language [8]. In general, however, the difficulties of understanding natural language resist clean mathematical formalisms.

The term linguist can refer to anyone who studies language, although a self-described linguist may be more focused on fieldwork. Mathematics is a tool of science, and mathematicians working on natural language may refer to their work as mathematical linguistics, focusing on discrete mathematical formalisms and theories that can be applied to natural language.

Computational linguistics, on the other hand, is the study of linguistics using the tools of computer science. Computational tools and thinking have overtaken most fields, and yesterday's linguist may be today's computational linguist. Computational linguistics is concerned with computer systems for generating and understanding natural language. One natural use of computational linguistics is testing the grammars proposed by theoretical linguists.

By writing and running software, large text datasets can be mined and new and different things can be discovered. During the 1990s, statistical techniques and statistical machine learning largely replaced the classical top-down, rule-based ways of dealing with language because of their better results, robustness, and speed. Today, the field of natural language research is dominated by statistical methods.

Data-driven approaches to handling natural language have become so widespread that they should be considered the standard approach in computational linguistics. The wide availability of electronic data has undoubtedly contributed to this development; another factor may be the perceived brittleness of approaches based largely on hand-crafted rules.

Natural language can be handled not only with basic statistical methods but also with inference techniques such as those used in applied machine learning. Understanding natural language involves many aspects of morphology, grammar, semantics, pragmatics, and world knowledge. The ability to collect and encode all of this knowledge is one of the fundamental requirements for building successful language systems.

1.2.2. Natural Language Processing (NLP)

NLP is the automatic manipulation of natural language, such as text and speech, by software. Over the past 50 years, the study of NLP has grown out of the field of linguistics with the rise of computers. This section discusses the idea behind natural language processing and why it is so significant.

After reading this chapter, the reader should be able to answer the following questions:

• What is NLP, and how does natural language differ from other data types?

• What makes it challenging to work with natural language?

• Where did the field of NLP come from, and how do current practitioners define it?

To reflect the more empirical, engineering-based approach of statistical methods, computational linguistics is often referred to as NLP. Because statistical methods dominate the field, it is also often called statistical NLP, perhaps to distinguish it from classical computational linguistics techniques.
