Text Analysis with Python: A Research-Oriented Guide is a quick and comprehensive reference on text mining using Python code. The main objective of the book is to equip the reader with the knowledge to apply various machine learning and deep learning techniques to text data. The book is organized into eight chapters that present the topic in a structured and progressive way.
Key Features
· Introduces the reader to Python programming and data processing
· Introduces the reader to the preliminaries of natural language processing (NLP)
· Covers data analysis and visualization using predefined Python libraries and datasets
· Teaches how to write text mining programs in Python
· Includes text classification and clustering techniques
· Informs the reader about different types of neural networks for text analysis
· Includes advanced analytical techniques such as fuzzy logic and deep learning techniques
· Explains concepts in a simplified and structured way that is ideal for learners
· Includes references for further reading
Text Analysis with Python: A Research-Oriented Guide is an ideal guide for students in data science and computer science courses, and for researchers and analysts who want to work on artificial intelligence projects that require the application of text mining and NLP techniques.
Page count: 176
Publication year: 2002
This is an agreement between you and Bentham Science Publishers Ltd. Please read this License Agreement carefully before using the ebook/echapter/ejournal (“Work”). Your use of the Work constitutes your agreement to the terms and conditions set forth in this License Agreement. If you do not agree to these terms and conditions then you should not use the Work.
Bentham Science Publishers agrees to grant you a non-exclusive, non-transferable limited license to use the Work subject to and in accordance with the following terms and conditions. This License Agreement is for non-library, personal use only. For a library / institutional / multi user license in respect of the Work, please contact: [email protected].
Bentham Science Publishers does not guarantee that the information in the Work is error-free, or warrant that it will meet your requirements or that access to the Work will be uninterrupted or error-free. The Work is provided "as is" without warranty of any kind, either express or implied or statutory, including, without limitation, implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the results and performance of the Work is assumed by you. No responsibility is assumed by Bentham Science Publishers, its staff, editors and/or authors for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products instruction, advertisements or ideas contained in the Work.
In no event will Bentham Science Publishers, its staff, editors and/or authors, be liable for any damages, including, without limitation, special, incidental and/or consequential damages and/or damages for lost data and/or profits arising out of (whether directly or indirectly) the use or inability to use the Work. The entire liability of Bentham Science Publishers shall be limited to the amount actually paid by you for the Work.
Bentham Science Publishers Pte. Ltd. 80 Robinson Road #02-00 Singapore 068898 Singapore Email: [email protected]
The book focuses on the latest research in the field of text mining using Python code. The main objective of the book is to apply various machine learning and deep learning techniques to textual data. Natural language processing and fuzzy rule generation are discussed in detail, along with a basic introduction to Python, data handling, and data shaping. Several datasets are used to demonstrate text mining techniques in different research domains. The book is intended for readers who want to work in fields related to text mining. The authors present the content in a simple and understandable manner through step-by-step implementations of the different algorithms. The book teaches text mining concepts from scratch and is organized into eight chapters.
Chapter 1 covers the basics and preliminaries of natural language processing. This chapter gives a basic idea of the text mining workflow, information retrieval, and information extraction.
Chapter 2 provides a brief introduction to the Python programming language. This chapter focuses on the core Python language and the important libraries for natural language processing, using different IDEs such as Anaconda and Google Colaboratory.
Chapter 3 discusses data analysis, concentrating on the data loading and pre-processing concepts of text mining in Python, importing predefined Python libraries, and visualization techniques using various Python modules.
Chapter 4 discusses the basics of text mining and writing Python programs using open-source NLP libraries. The chapter also covers different text mining techniques such as pre-processing, feature selection, feature extraction, and text summarization, with detailed examples.
Chapter 5 presents text classification and text prediction techniques. Using a real movie review dataset, the chapter discusses four classifiers in detail: naive Bayes, random forest, k-nearest neighbour, and support vector machine.
Chapter 6 presents how to perform text clustering in Python using unsupervised machine learning techniques. To illustrate this, the well-known Iris dataset from the UCI Machine Learning Repository is used and presented with Python scripts.
Chapter 7 discusses fuzzy logic, different membership functions, their applications and challenges, and how to implement text mining concepts such as pre-processing, feature extraction, clustering, association rules, and classification using fuzzy membership functions and fuzzy rules.
Chapter 8 provides details about deep learning for text mining in Python. The basics of deep learning, different activation functions, their applications and challenges, and how to write Python programs using deep learning are presented and explained clearly.
Not applicable.
The author declares no conflict of interest, financial or otherwise.
Declared none.
In this chapter, the basic idea of what text data is, along with the definition of natural language and the techniques involved in it, is explained. In addition, the framework of text mining and its workflow are described.
The amount of text accessible on the web and on personal computers is increasing quickly, but dealing with that text data is not an easy task. To handle it, intelligent algorithms were introduced to recover significant information from data archives [1]. This process is called text mining. Text mining is not a new concept; it evolved from data mining, and all data mining algorithms can be employed on text data [2]. The difference between data mining and text mining is that data mining is applied only to structured data, whereas text mining can be applied to both unstructured and semi-structured data. In practice, the text mining approach depends on whether we are searching for the context of the text or the content of the text.
As noted, text mining operates largely on unstructured data. In reality, to make this possible, the data has to be converted into a semi-structured or structured form so that data mining-based machine learning (ML) algorithms can be applied easily [3]. This conversion is done by data pre-processing techniques. Pre-processing the text is an essential step in preparing the data for mining; if we skip it, data conflicts might occur and it may not be possible to obtain comprehensive results. Thus, during pre-processing, punctuation and irrelevant words are removed, words can be gathered into groups and stemmed to their roots, missing features can be replaced with average values, and text case can be normalized. Depending on the needs of the application, various further steps can be applied [4]. After pre-processing, the data is converted into a vector space model, on which the different algorithms operate, as illustrated in the sketch below.
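To make these pre-processing steps concrete, the following is a minimal sketch in Python. It assumes the NLTK library is installed and able to download its stop-word list; the function name preprocess and the sample sentence are our own illustrations, not code taken from a later chapter.

import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# One-time setup (assumed): download the NLTK stop-word list.
nltk.download('stopwords', quiet=True)

def preprocess(text):
    # Lower-case the text and strip punctuation.
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Split into tokens (NLTK's word_tokenize could be used for finer tokenization),
    # drop English stop words, and stem the remaining words to their roots.
    stop_words = set(stopwords.words('english'))
    stemmer = PorterStemmer()
    return [stemmer.stem(tok) for tok in text.split() if tok not in stop_words]

print(preprocess('Text mining converts unstructured documents into structured features.'))
# Roughly: ['text', 'mine', 'convert', 'unstructur', 'document', 'structur', 'featur']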
Clustering, navigation, and visualization operations can be applied to group comparable patterns when users are looking for related text content. Finding connections, identifying relationships, and summarizing texts require content analysis, including information retrieval, information extraction, and natural language processing (NLP) [5]. In this chapter, we will explore the idea of NLP and its philosophy. We will also look at what text data is and its syntax, the text pre-processing steps, and applications of text in the real world.
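As a brief illustration of the vector space model and the clustering mentioned above, the following sketch uses scikit-learn, which is assumed to be installed; the toy corpus, the TF-IDF weighting, and the choice of k = 2 are illustrative assumptions rather than prescriptions from the book.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# A tiny illustrative corpus of four short documents.
corpus = [
    'Text mining extracts patterns from documents',
    'Clustering groups similar documents together',
    'Deep learning models classify movie reviews',
    'Neural networks learn features from raw text',
]

# Convert the documents into a TF-IDF vector space model.
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(corpus)

# Group comparable documents with k-means; k=2 is chosen arbitrarily for the toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)  # cluster label assigned to each document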
A natural language is the way in which we speak to each other; both spoken and written communication are natural language. Most of our daily lives are filled with text data, and we can imagine how often we come across it in the form of menus, e-mails, SMS messages, signs, webpages, and so on [6]. As a species, we speak to one another more than we write, and it may be easier to learn how to speak than to write. We communicate primarily through voice and text. Because this type of data is so important, we should have techniques to use and understand natural language, just as we do with other types of information.
The problem with natural language is that it is chaotic and has few firm rules, yet we can still understand each other most of the time. Human languages are quite ambiguous, and they are constantly changing and evolving. Humans are exceptional at producing and understanding language and are capable of expressing and interpreting nuanced meanings. Despite our extraordinary proficiency in language, we are also very bad at understanding and describing the rules that govern it.
Working with natural language in text data is still far from solved, despite the half-century of study devoted to it; natural language is simply difficult to comprehend. It is hard for the child who spends years acquiring a language, hard for the adult student of language, hard for the researcher who designs experiments to demonstrate important phenomena, and hard for the engineer who builds frameworks that work with natural language input. Hence, Turing made open-ended conversation in natural language central to his test of machine understanding [7].
Linguistics is the scientific study of language, including its grammar, semantics, and phonetics. Classical linguistics involved formulating and evaluating rules of language [8]. In general, however, the difficulties associated with understanding natural language resist clean mathematical formalisms.
The term linguist can refer to anyone who studies language, although a self-described linguist may be more focused on fieldwork. Mathematics is part of this science as well: mathematicians working on natural language might refer to their work as mathematical linguistics, focusing on discrete mathematical formalisms and hypotheses that can be applied to natural language.
Computational linguistics, on the other hand, uses computer science to study language. Computational tools and thinking have overtaken most fields, and the modern linguist is likely also a computational linguist. Computational linguistics is the application of computers to the generation and understanding of natural language, and using it to test grammars is a natural application for theoretical linguistics.
By writing and running software, large text datasets can be mined and new and different things can be discovered. Owing to their improved results, robustness, and speed, statistical techniques and statistical machine learning largely replaced the classical rule-based approaches to language during the 1990s. Currently, the field of natural language research is dominated by statistical methods.
Today, data-driven approaches to natural language have become so well established that they should be considered the standard approach to computational linguistics. Undoubtedly, the wide availability of and access to electronic data has contributed to this development; another factor may be the perceived fragility of approaches based mostly on hand-crafted rules.
It is not only basic statistical methods that can be used to deal with natural language, but also inference techniques such as those used in applied machine learning. Understanding natural language involves many aspects of morphology, grammar, semantics, pragmatics, and world knowledge. One of the fundamental requirements for developing successful language frameworks is the ability to collect and encode all of this knowledge.
NLP is the automatic manipulation of natural language, such as text and speech, by software. Over the past 50 years, the study of NLP has grown out of the field of linguistics alongside the development of computers. The idea behind natural language processing and why it is so significant are discussed in this section.
After reading this chapter, the reader will be able to answer the following questions:
• What is NLP, and how does it differ from working with other types of data?
• What makes it challenging to work with natural language?
• Where did the field of NLP come from, and how is it characterized by current practitioners?
To reflect the empirical approach taken by statistical methods and engineering-oriented strategies, computational linguistics is often referred to as NLP. As a result of the statistical dominance of the field, it is also often called statistical NLP, perhaps to distinguish it from other computational linguistics techniques.