Table of Contents
BENTHAM SCIENCE PUBLISHERS LTD.
End User License Agreement (for non-institutional, personal use)
Usage Rules:
Disclaimer:
Limitation of Liability:
General:
List of Contributors
A Comprehensive Study of Natural Language Processing
Abstract
1. INTRODUCTION
2. Emergence of NLP
3. Working Model of NLP
4. Major Application of NLP
5. NLP’s Prime Challenges
CONCLUSION AND FUTURE DIRECTIONS
References
Recent Advancements in Text Summarization with Natural Language Processing
Abstract
1. INTRODUCTION
1.1. Evolution in NLP
1.2. Recent Advancement in NLP
1.3. Applications in NLP
1.4. Role of Natural Language Processing in Text Mining
1.5. Challenges in Handling Text Data
2. REVIEW OF NATURAL LANGUAGE PROCESSING CONCEPTS, TECHNIQUES, TRENDS, AND APPLICATIONS
3. NATURAL LANGUAGE PROCESSING (NLP) IN TEXT SUMMARIZATION
3.1. Nature of Text Summarization According to Input
3.2. Nature of Text Summarization According to Output
3.3. Challenges in Text Summarization Approaches
4. GENERATING SUMMARY USING EXTRACTIVE APPROACH
5. PROPOSED METHODOLOGY
5.1. Steps in Textrank Algorithm
6. RESULTS AND DISCUSSION
6.1. Spacy
6.2. NLTK
6.3. Sumy
6.4. ROUGE Scores as Evaluation Metrics for Generated Summary
CONCLUSION AND FUTURE SCOPE
References
Learning Techniques for Natural Language Processing: An Overview
Abstract
1. INTRODUCTION
1.1. Categorization of NLP
1.2. Natural Language Processing Phases
2. Review of Natural Language Processing
3. Natural Language Techniques
3.1. Popular NLP Techniques
3.1.1. Support Vector Machines
3.1.2. Neural Networks
3.1.3. Deep Learning Models
3.2. Traditional NLP Techniques
3.2.1. Probabilistic Models
3.2.2. N-Gram Models
3.2.3. Hidden Markov Models
3.3. Advanced NLP Techniques
3.3.1. Transfer learning
a. Advantages of Performing Transfer Learning
3.3.2. Domain Adaptation
4. Categorization of NLP Techniques
5. ROLE OF NATURAL LANGUAGE PROCESSING IN LARGE PROJECTS
5.1. Challenges in NLP Learning Techniques
CONCLUSION
References
Natural Language Processing: Basics, Challenges, and Clustering Applications
Abstract
1. INTRODUCTION
2. Review of NLP Challenges
3. NLP Approaches
4. Text Clustering: An essential task in NLP
4.1. Challenges of Text Clustering
5. Computational Methodology for Text Clustering
5.1. Vector Space Model
5.2. Experiments with K-Means
5.3. Using GA
6. Machine Translation and Other NLP Applications
CONCLUSION
References
Hybrid Approach to Text Translation in NLP Using Deep Learning and Ensemble Method
Abstract
1. INTRODUCTION
2. Review of Federated Learning
3. RECENT RESEARCH IN NLP USING DEEP LEARNING
4. PROBLEM IDENTIFICATION
5. PROPOSED SOLUTION
6. Result
7. Discussion
CONCLUSION
References
Deep Learning in Natural Language Processing
Abstract
1. INTRODUCTION
2. NLP Components
2.1. NLU
2.2. NLG
3. Deep Learning for Text Representation
3.1. Word Embeddings
3.2. Sentence and Document Embeddings
4. Deep Learning for Text Classification (TC)
5. Deep Learning for Sequence Labelling
5.1. POS
5.2. NER
5.3. Chunking and Parsing
6. Deep Learning for Text Generation
7. Applications of Deep Learning in NLP
CONCLUSION
References
Deep Learning-Based Text Identification from Hazy Images: A Self-Collected Dataset Approach
Abstract
1. INTRODUCTION
2. Literature Survey
2.1. Image Dehazing Methods
2.2. Text Detection Methods
3. Methodology and Framework
3.1. Algorithm
4. Experimental Setup and Results
4.1. Evaluation Results
4.2. Comparison with Existing Methods
CONCLUSION
References
Deep Learning-based Word Sense Disambiguation for Hindi Language Using Hindi WordNet Dataset
Abstract
1. INTRODUCTION
2. Hindi Wordnet
2.1. The Application Programming Interface for the Hindi Wordnet
3. Literature Review
4. Proposed Approach and Framework
5. Experimental Setup and Results
CONCLUSION
References
The Machine Translation Systems Demystifying the Approaches
Abstract
1. INTRODUCTION
2. APPROACHES TO BILINGUAL MACHINE TRANSLATION
2.1. Rule-based Techniques
2.2. Interlingua Approach
2.3. Example-based Techniques
2.4. Statistical Techniques
2.5. Rule-Based MT vs. Statistical MT
2.6. Soft Computing based Techniques
2.7. Neural Networks-based Techniques
2.8. Fuzzy Theory-based Techniques
2.9. Genetic Algorithms-based Techniques
2.10. Hybrid Techniques
3. POS TAGGING IN MACHINE TRANSLATION
4. BILINGUAL PRE-PROCESSING TECHNIQUES
5. BILINGUAL MORPHOLOGICAL ANALYSES
6. BILINGUAL REVERSE MORPHOLOGICAL ANALYSIS
7. BILINGUAL POST-PROCESSING (PP) TECHNIQUES
CONCLUSION
References
Machine Translation of English to Hindi with the LSTM Seq2Seq Model Utilizing Attention Mechanism
Abstract
1. INTRODUCTION
2. Literature Review
3. Methodology
3.1. Dataset
3.2. Data Preprocessing
3.3. Seq2Seq Model
3.4. LSTM
3.4.1. Uni-LSTM
3.4.2. Bi-LSTM
4. Proposed Method
4.1. LSTM Seq2Seq Model
4.2. Attention Mechanism
5. Material and Experimental Setup
5.1. Dataset and Source
5.2. Data Splitting
5.3. Data Preprocessing
5.4. Training
5.5. Evaluation
6. Result and Discussion
CONCLUSION
References
Natural Language Processing: A Historical Overview, Current Developments, and Future Prospects
Abstract
1. INTRODUCTION
2. Levels of NLP
3. Natural Language Generation
4. History of NLP
5. Related Work
6. Applications of NLP
7. Recent Trends in NLP
8. Future of NLP
8.1. Advanced Language Understanding
8.2. Multilingual and Cross-lingual NLP
8.3. Better Contextual Understanding
8.4. Few-shot and Zero-shot Learning
8.5. Ethical and Responsible AI
8.6. Domain-Specific NLP
8.7. Conversational Agents and Virtual Assistants
8.8. NLP in Unstructured Data Analysis
8.9. Integration with Other Technologies
8.10. Continued Research and Innovation
CONCLUSION
References
Recent Advances in Transfer Learning for Natural Language Processing (NLP)
Abstract
1. INTRODUCTION
2. Key Concepts and Architectures of Transfer Learning
2.1. Fine-tuning: Adapting Pretrained Models to Specific Tasks
2.2. Multi-task Learning: Enhancing Model Performance with Shared Knowledge
2.3. Domain Adaptation: Bridging the Gap between Source and Target Domains
2.4. Knowledge Distillation: Transferring Knowledge from Complex Models to Compact Models
2.5. Zero-shot Learning: Learning to Simplify Unseen Classes
3. Pre-trained Language Models
3.1. GPT-3: A Breakthrough in Language Generation and Understanding
3.2. BERT: Transforming Natural Language Understanding
3.3. RoBERTa: Robustly Optimized BERT Approach
4. Applications of Transfer Learning in NLP
4.1. Text Classification
4.2. Named Entity Recognition
4.3. Question Answering
4.4. Text Summarization
5. Limitations and Challenges of Transfer Learning
5.1. Dataset Biases
5.2. Domain Adaptation
5.3. Generalization Ability
5.4. Model Interpretability
5.5. Recent Challenges in NLP
6. Future Directions and Research Opportunities
6.1. Investigating Transfer Learning in Low-Resource Languages
6.2. Developing Transfer Learning Techniques for Speech and Multimodal NLP Tasks
6.3. Other Potential Research Directions
CONCLUSION
References
Beyond Syntax and Semantics: The Quantum Leap in Natural Language Processing
Abstract
1. INTRODUCTION
2. Background: Applications of Quantum Computing and QNLP
3. NATURAL LANGUAGE PROCESSING
3.1. Tokenization
3.2. Part-of-Speech (POS) Tagging
3.3. Named Entity Recognition (NER)
3.4. Sentiment Analysis
3.5. Parsing
3.6. Machine Translation
3.7. Information Retrieval
4. QUANTUM COMPUTING PRIMER
4.1. Emergence of Quantum Computing
4.2. Quantum Mechanics and Quantum Computing
4.2.1. Wave-particle Duality
4.2.2. Superposition
4.2.3. Coherence
4.2.4. Entanglement
4.2.5. Measurement
4.3. Continuous Variable Quantum Computing (CVQC)
4.4. Gate-based Quantum Computing
4.4.1. Qubits
4.4.2. Bloch Sphere Representation
4.4.3. Measurement
4.4.4. Single Qubit Gates
4.4.5. Two Qubit Gates
4.4.6. Multi-Qubit Gates
4.5. Quantum Circuit
5. Quantum Natural Language Processing (QNLP)
6. QNLP Algorithms
6.1. Quantum Embedding
6.1.1. Basis Embedding
6.1.2. Amplitude Embedding
6.2. Quantum Machine Learning (QML)
6.2.1. Quantum Approximate Optimization Algorithm (QAOA)
6.2.2. Quantum PCA
6.3. Quantum Algorithms for NLP
6.3.1. DisCoCat
6.4. Quantum Language Models
6.5. Quantum Natural Language Understanding (QNLU)
6.6. Quantum Text Compression
7. Challenges and Limitations of QNLP
8. Hybrid Quantum Classical Models
8.1. Variational Quantum Eigensolver
8.1.1. VQE for NLP
8.1.2. Discussion
CONCLUSION
FUTURE DIRECTIONS
References
Text Extraction from Blurred Images through NLP-based Post-processing
Abstract
1. INTRODUCTION
1.1. Overview of Text Extraction From Blurred Images
1.2. Traditional Image Processing Techniques for Text Extraction
2. NLP-Based Post-Processing Techniques for Text Extraction from Blurred Images
2.1. Introduction to Natural Language Processing Techniques
2.2. NLP Pipeline
2.3. Named Entity Recognition for Text Extraction
2.4. Part-of-speech Tagging for Text Extraction
2.5. Machine Learning Algorithms for Enhancing NLP-based Post-Processing Techniques
2.6. Convolutional Neural Networks (CNNs) for Enhancing the Performance of NLP-based Post-processing
3. Case Study: Improving Text Extraction from Blurred Images Using NLP-based Post-processing
3.1. Text Extraction from Blurred Images using Nlp Framework
3.2. Effectiveness of NLP-based Post-processing Techniques
4. Comparison with Traditional Text Extraction Techniques
5. Future Direction and Challenges
CONCLUSION
References
Speech-to-Sign Language Translator Using NLP
Abstract
1. INTRODUCTION
2. Methodology Used
2.1. Methods and Techniques
2.2. Speech to Text Conversion
2.3. Text Analysis
2.3.1. Tokenization
2.3.2. Stop Word Removal
2.3.3. Stemming
2.3.4. Lemmatization
2.3.5. Syntax Tree Generation
2.3.6. POS Tagging
2.4. Sign Language Gesture Generation
2.5. Display the Sign Language Gestures
3. RESULT AND DISCUSSION
CONCLUSION
References
Speech Technologies
Abstract
1. INTRODUCTION
1.1. Application Areas of Speech Technology
2. Speech Recognition
2.1. Speech Analysis
2.2. Feature Extraction
3. Modelling and matching
3.1. Speech Verification
4. Real Time Speech to Text Conversion
4.1. Recognition of Speech
4.2. CAN
4.3. CART
5. Interactive Voice Response Systems
6. Speech Synthesis
6.1. Hidden Markov Model Based Speech Synthesis
7. Speech Analytics
CONCLUSION
References
The Linguistic Frontier: Unleashing the Power of Natural Language Processing in Cybersecurity
Abstract
1. INTRODUCTION
1.1. Natural Language Processing: An Overview
1.2. The Relevance of NLP in Cybersecurity
2. Applications of NLP in Cybersecurity
2.1. Malware Classification and Detection
2.2. Detection of Attacks Based on Social Engineering
2.3. Cyber Threat Intelligence Analysis
2.4. Roboticization of Emergency Procedures
2.5. Conversational AI for Security Operations
2.6. Secure Code Review
2.7. Privacy Policy Analysis
2.8. Cybersecurity Training and Awareness
2.9. Dark Web Analysis
2.10. Predictive Analytical Models for Online Dangers
3. Challenges and Considerations
3.1. Data Quality and Availability
3.2. Domain-Specific Language and Context
3.3. Privacy and Ethical Concerns
3.4. Scalability and Real-time Processing
4. Literature Review
5. Future Research Directions
5.1. Improved NLP Algorithms for Cybersecurity
5.2. Cross-Domain Integration and Collaboration
5.3. Adversarial NLP and Robustness
5.4. Explainable AI and Trustworthiness
CONCLUSION
References
Recent Challenges and Advancements in Natural Language Processing
Abstract
1. INTRODUCTION
2. Components of NLP
2.1. Natural Language Understanding
2.2. Natural Language Generation (NLG)
2.3. Working Process of NLP
3. NLP – Then and Now
4. NLP Techniques
4.1. Tokenization
4.2. Part-of-speech (POS) Tagging
4.3. Named Entity Recognition (NER)
4.4. Sentiment Analysis
4.5. Text Classification
4.6. Machine Translation
4.7. Text Generation
4.8. Information Retrieval
4.9. Text Summarization
4.10. Dependency Parsing
5. NLP Models
5.1. Word2Vec is an NLP Model Widely Employed for Producing Word Embeddings
5.2. GloVe, Another well-known NLP Model
5.3. FastText, an Extension of Word2Vec that Incorporates Subword Information
5.4. Recurrent Neural Networks (RNNs) are an Essential Element in Natural Language Processing (NLP)
5.5. The Transformer Model Introduced An Innovative Architecture For Sequence-To-Sequence Tasks in NLP
5.6. Generative Pre-Trained Transformer
5.7. BERT, or Bidirectional Encoder Representations from Transformers
5.8. XLNet is Another Pre-trained Model that Utilizes Both Autoregressive Framework
6. RESULT AND DISCUSSION
CONCLUSION
References
Federated Learning for Internet of Vehicles: IoV Image Processing, Vision and Intelligent Systems
(Volume 2)
A Handbook of Computational Linguistics: Artificial Intelligence in Natural Language Processing
Edited by
Youddha Beer Singh
Galgotias College of Engineering and Technology
Greater Noida
India
Aditya Dev Mishra
Galgotias College of Engineering and Technology
Greater Noida
India
Pushpa Singh
GL Bajaj Institute of Technology & Management
Greater Noida
India
&
Dileep Kumar Yadav
BENTHAM SCIENCE PUBLISHERS LTD.
End User License Agreement (for non-institutional, personal use)
This is an agreement between you and Bentham Science Publishers Ltd. Please read this License Agreement carefully before using the book/echapter/ejournal (“Work”). Your use of the Work constitutes your agreement to the terms and conditions set forth in this License Agreement. If you do not agree to these terms and conditions then you should not use the Work.
Bentham Science Publishers agrees to grant you a non-exclusive, non-transferable limited license to use the Work subject to and in accordance with the following terms and conditions. This License Agreement is for non-library, personal use only. For a library / institutional / multi user license in respect of the Work, please contact: [email protected].
Usage Rules:
All rights reserved: The Work is the subject of copyright and Bentham Science Publishers either owns the Work (and the copyright in it) or is licensed to distribute the Work. You shall not copy, reproduce, modify, remove, delete, augment, add to, publish, transmit, sell, resell, create derivative works from, or in any way exploit the Work or make the Work available for others to do any of the same, in any form or by any means, in whole or in part, in each case without the prior written permission of Bentham Science Publishers, unless stated otherwise in this License Agreement.
You may download a copy of the Work on one occasion to one personal computer (including tablet, laptop, desktop, or other such devices). You may make one back-up copy of the Work to avoid losing it.
The unauthorised use or distribution of copyrighted or other proprietary content is illegal and could subject you to liability for substantial money damages. You will be liable for any damage resulting from your misuse of the Work or any violation of this License Agreement, including any infringement by you of copyrights or proprietary rights.
Disclaimer:
Bentham Science Publishers does not guarantee that the information in the Work is error-free, or warrant that it will meet your requirements or that access to the Work will be uninterrupted or error-free. The Work is provided "as is" without warranty of any kind, either express or implied or statutory, including, without limitation, implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the results and performance of the Work is assumed by you. No responsibility is assumed by Bentham Science Publishers, its staff, editors and/or authors for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products instruction, advertisements or ideas contained in the Work.
Limitation of Liability:
In no event will Bentham Science Publishers, its staff, editors and/or authors, be liable for any damages, including, without limitation, special, incidental and/or consequential damages and/or damages for lost data and/or profits arising out of (whether directly or indirectly) the use or inability to use the Work. The entire liability of Bentham Science Publishers shall be limited to the amount actually paid by you for the Work.
General:
Any dispute or claim arising out of or in connection with this License Agreement or the Work (including non-contractual disputes or claims) will be governed by and construed in accordance with the laws of Singapore. Each party agrees that the courts of the state of Singapore shall have exclusive jurisdiction to settle any dispute or claim arising out of or in connection with this License Agreement or the Work (including non-contractual disputes or claims).
Your rights under this License Agreement will automatically terminate without notice and without the need for a court order if at any point you breach any terms of this License Agreement. In no event will any delay or failure by Bentham Science Publishers in enforcing your compliance with this License Agreement constitute a waiver of any of its rights.
You acknowledge that you have read this License Agreement, and agree to be bound by its terms and conditions. To the extent that any other terms and conditions presented on any website of Bentham Science Publishers conflict with, or are inconsistent with, the terms and conditions set out in this License Agreement, you acknowledge that the terms and conditions set out in this License Agreement shall prevail.
Bentham Science Publishers Pte. Ltd.
80 Robinson Road #02-00
Singapore 068898
Singapore
Email: [email protected]
PREFACE
Natural Language Processing (NLP) is one of the fast-growing research areas that benefit the real world in many respects. It gives machines the ability to understand text and audio as efficiently as human beings do. NLP drives the program code behind virtual assistants, voice-operated GPS systems, text-to-speech conversion, and many other applications. With the help of deep learning and computational linguistics, NLP supports the creation of modern computers that understand human language. Deep learning also plays an important role in processing natural language for various regional languages.
Natural Language Processing: A Handbook of Computational Linguistics comprises chapters that present recent research in the form of reviews, surveys, technical articles, and state-of-the-art approaches. Through its numerous chapters, this edited book covers concepts in areas such as recent developments and challenges in NLP, recent applications, learning techniques, text and sentence classification, speech technologies, machine translation, advances in information retrieval, and Indian language technologies. The objective of this book is to give researchers, academicians, and industry experts ideas, directions, and research gaps for further work.
KEY FEATURES
This book aims to provide state-of-the-art theoretical and experimental research on Natural Language Processing using deep learning. Its scope is not limited to academicians and researchers; it also extends to industry experts who work on Natural Language Processing using deep learning. The book should benefit both groups in terms of knowledge and further research, and it can also serve as a reference guide for researchers, students, and engineers working in natural language processing and deep learning.
With the aid of various linguistic, statistical, and machine-learning techniques, text analytics transforms unstructured text data into information that can be analyzed. Organizations may find sentiment analysis intimidating, particularly if they have a sizable customer base, but an NLP tool can comb through consumer interactions, such as social media comments, reviews, or even brand name mentions, to see what is being said.
Language translation is of great assistance when conversing with someone who speaks a different language. Additionally, tools now identify the language of the input text when translating from a different language to your own.
We take smartphone features such as autocorrect, autocomplete, and predictive text for granted because they are so common. Autocomplete and predictive text are comparable to search engines in that they anticipate what you will type and either complete the word you are typing or propose a related one.
This book should motivate readers to work in the field of NLP using deep learning. It may also be used as a reference book by graduate and postgraduate students of computer science, information technology, and electronics and communication engineering.
Youddha Beer Singh
Galgotias College of Engineering and Technology
Greater Noida
India
Aditya Dev Mishra
Galgotias College of Engineering and Technology
Greater Noida
India
Pushpa Singh
GL Bajaj Institute of Technology & Management
Greater Noida
India
&
Dileep Kumar Yadav
Bennett University, Greater Noida
India
List of Contributors
Aastha Tiwari: Department of Computer Science and Engineering, IMS Engineering College, Ghaziabad, India
Aditya Kumar Yadav: Department of Computer Science and Engineering, IMS Engineering College, Ghaziabad, India
Aayushi Chauhan: Department of Computer Science and Engineering, IMS Engineering College, Ghaziabad, India
Archana Verma: School of Computer Applications, Noida Institute of Engineering and Technology, Greater Noida - 201310, Uttar Pradesh, India
Aviral Srivastava: Pennsylvania State University, USA
Ambrish Gangal: KIET Group of Institutions, Ghaziabad, India
Asha Rani Mishra: Department of Computer Science & Technology, G.L Bajaj, Institute of Technology and Management, Greater Noida, India
Abhishek Thakur: Department of EEE, BIT Mesra, India
Anuradha Pillai: Department of Computer Engineering, J.C. Bose University of Science and Technology, YMCA, Faridabad, India
Abhishek Singh: Department of Information Technology, School of Engineering and Technology, CSJM University, Kanpur, India
Ashish Arya: Indian Institute of Information Technology, Sonepat, India
Arti Ranjan: Galgotias College of Engineering Technology, Greater Noida, India
Arti Ranjan: Department of Computer Science Engineering, IGDTUW, New Delhi, India
Bhumica Verma: CSE Department, AKG Engineering College, Ghaziabad, India
Dhiraj Prasad Jaiswal: School of Information Technology, The ICFAI University, Sikkim, India
Digvijay Pandey: Department of Technical Education, I.E.T, Dr A.P.J. Abdul Kalam Technical University (AKTU), Lucknow (U.P), India
Gagan Gurung: School of Information Technology, The ICFAI University, Sikkim, Sikkim, India
Garima Singh: KIET Group of Institutions, Ghaziabad, India
Manjeet Singh: Government College Bhainswal Kalan, Sonipat, Haryana, India
M. Ravinder: Department of Computer Science Engineering, IGDTUW, New Delhi, India
Nitin Sharma: Department of Computer Science and Engineering, IMS Engineering College, Ghaziabad, India
Nitin Sharma: IT Department, AKG Engineering College, Ghaziabad, India
Neha: Manav Rachana University, Faridabad, Haryana, India
Neha Saini: Government College Chhachhrauli, Yamunanagar, Haryana, India
Nidhi Srivastava: Amity Institute of Information Technology, Lucknow, India
Preeti Yadav: M.J.P. Rohilkhand University, U.P., India
Payal Garg: Department of Computer Science & Technology, G.L Bajaj, Institute of Technology and Management, Greater Noida, India
Rohit Vashisht: KIET Group of Institutions, Ghaziabad, India
Richa Singh: KIET Group of Institutions, Muradnagar U.P, Ghaziabad, India
Rekha Kashyap: KIET Group of Institutions, Muradnagar U.P, Ghaziabad, India
Rashmi Kumari: SCSET Bennett University, Greater Noida, Uttar Pradesh, India
Rohit Tanwar: School of Computer Science, University of Petroleum and Energy Studies (UPES), Dehradun (Uttarakhand), India
Rahul Shah: School of Information Technology, The ICFAI University, Sikkim, India
Raghwendra Kishore Singh: National Institute of Technology, Jameshedpur, India
Sandeep Kumar Vishwakarma: J.C. Bose University of Science and Technology, YMCA, Faridabad, India
Shahina Anjum: Department of CSE, IEC College of Engineering and Technology, Greater Noida (U.P), India
Sunil Kumar Yadav: Department of CSE, Accurate Institute of Management and Technology, Greater Noida (U.P), India
Subhajit Ghosh: Department of CSE, IMS Engineering College, Ghaziabad, India
Subhranil Das: School of Business Faculty of Business and Leadership MIT World Peace University, Pune, India
Sandeep Vishwakarma: JCB UST YMCA Faridabad, Haryana, India
Sunil Kumar: UIET CSJM University, Kanpur, India
Shree Harsh Attri: Sharda School of Engineering and Technology, Department of Computer Science and Engineering, Sharda University, Greater Noida, Uttar Pradesh, India
Sonia Deshmukh: KIET Group of Institutions, Ghaziabad, India
Sunil Kumar: Department of Computer Engineering, J.C. Bose University of Science and Technology, Faridabad, India
Sandeep Kumar Vishwakarma: Department of Computer Engineering, J.C. Bose University of Science and Technology, Faridabad, India
Tarun Kumar: School of Computing Science and Engineering, Galgotia University, Greater Noida, Uttar Pradesh, India
Vibhor Harit: Department of Computer Science and Engineering, IMS Engineering College, Ghaziabad, India
Viral Parmar: Pandit Deendayal Energy University, Gandhinagar, Gujarat, India
A Comprehensive Study of Natural Language Processing
Rohit Vashisht1,*, Sonia Deshmukh1, Ambrish Gangal1, Garima Singh1
1 KIET Group of Institutions, Ghaziabad, India
Abstract
Natural Language Processing (NLP) has received a lot of interest in the current era of digitization because of its capacity to represent and analyze human language computationally. Machine translation, email spam detection, information extraction and summarization, and medical and question-answering applications are just a few of the many tasks it is used for today. This chapter outlines the development of NLP from the 1950s to 2023 and the main outcomes of that period. In addition, the fundamental working components of NLP are used to draw an analogy between NLP and the processing done by the human brain. Major NLP applications are explored with examples. Finally, significant challenges and possible future directions in the field are highlighted.
Keywords: Lemmatization, Language processing, Machine learning, Stemming, Tokenization.
*Corresponding author Rohit Vashisht: KIET Group of Institutions, Ghaziabad, India; E-mail: [email protected]
Recent Advancements in Text Summarization with Natural Language Processing
Asha Rani Mishra1,*, Payal Garg1
1 Department of Computer Science & Technology, G.L Bajaj Institute of Technology and Management, Greater Noida, India
Abstract
Computers can now comprehend and interpret human languages thanks to Natural Language Processing (NLP), a subfield of artificial intelligence. NLP is now being used in a variety of fields, including healthcare, banking, marketing, and entertainment. NLP is employed in the healthcare industry for activities like disease surveillance, medical coding, and clinical documentation. NLP may extract relevant data from patient data and clinical notes. Sentiment classification, fraud prevention, and risk management are three areas of finance where NLP is applied. It can identify trends in financial data, spot anomalies that can point to fraud, and examine news stories and social network feeds to learn more about consumer trends and market dynamics. NLP is utilized in marketing for chatbot development, sentiment analysis, and consumer feedback analysis. It can assist in determining the needs and preferences of the consumer, create tailored marketing campaigns, and offer chatbot-based customer care. Speech recognition, language translation, and content suggestion are all uses of NLP in the entertainment industry. In order to suggest movies, TV series, and other material that viewers are likely to love, NLP analyses user behaviour and preferences. It can also translate text between languages and instantly translate audio and video content. It is anticipated that NLP technology will develop further and be used in new fields and use cases. It will soon be a necessary tool for enterprises and organizations in a variety of sectors. In this chapter, we will highlight the overview and adoption of NLP in different applications. Also, this chapter discusses text summarization, an important application of NLP. Different techniques of generating text summaries along with evaluation metrics are the highlights of the chapter.
Keywords: Cosine Similarity, Extractive Summarization, Natural Language Processing (NLP), ROUGE Scores, TF-IDF, TextRank, Text Summarization.
*Corresponding author Asha Rani Mishra: Department of Computer Science & Technology, G.L Bajaj Institute of Technology and Management, Greater Noida, India; E-mail: [email protected]
1. INTRODUCTION
Data is generated by many systems every day in large volumes. Large volumes of text data are present in almost every domain and come from sources such as tweets, articles, reviews, and comments. Text data is unstructured: it does not fit a predefined data model such as the tables of a relational database. Organizations therefore often store text in file systems and access it as needed. Analyzing such data to extract meaningful patterns or insights for business decisions poses many challenges. Most machine learning and data analytics algorithms operate on numeric data. Because textual data follows no fixed structure and has no regular syntax or patterns, mathematical or statistical models cannot be applied to it directly. Natural language processing (NLP), an essential component of artificial intelligence (AI), provides tools for transforming text into representations that a machine can readily interpret.
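One common way to bridge the gap between unstructured text and numeric algorithms is a bag-of-words representation. The following is a minimal sketch using only the Python standard library; the function names `tokenize` and `bag_of_words` and the two sample sentences are illustrative assumptions, not part of the chapter's method.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(documents):
    """Return (vocabulary, vectors) with one count vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    # The vocabulary is the sorted set of all distinct tokens.
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # Each position holds how often that vocabulary term occurs.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

# Hypothetical sample documents, for illustration only.
documents = [
    "The service was great",
    "The food was great, great value",
]
vocabulary, vectors = bag_of_words(documents)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Each document becomes a fixed-length vector of term counts, which numeric models can consume. Word order and context are lost in this representation, which is one reason richer NLP techniques are needed.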
1.1. Evolution in NLP
NLP began to take shape in the 1950s and 1960s, at the dawn of artificial intelligence and computer science. The earliest work produced rule-based systems whose goal was to interpret and produce human language. Because the linguistic comprehension of these systems was limited, research in the 1990s and 2000s shifted to statistical and machine learning-based methods. The availability of enormous volumes of text data and computing capacity led to the development of unsupervised and semi-supervised learning algorithms for NLP in the early 2000s. These techniques made possible large-scale language models that could learn to interpret and produce language in an unsupervised or weakly supervised manner.
NLP experienced substantial breakthroughs in the middle of the 2010s with the advent of deep learning techniques such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). These techniques made it possible to create models that learn to represent language in a continuous vector space, allowing for more precise language comprehension and generation.
Transformer-based models like BERT and GPT have become the preeminent paradigm in NLP in recent years. These models are pretrained on substantial amounts of text data, can be customized for a wide range of NLP tasks, and in many cases achieve state-of-the-art performance. Over the course of its progress, NLP has developed a wide range of applications, including chatbots, sentiment analysis, speech recognition, machine translation, and virtual assistants. We may anticipate even more ground-breaking uses for NLP as research in the field advances.
1.2. Recent Advancement in NLP
The field of NLP has made numerous advancements recently. The following are some of the most important ones:
Large Pretrained Language Models: The creation of large pretrained language models, like GPT-3, which are capable of a variety of language tasks with high accuracy, is one of the most significant developments in NLP. Because these models are trained on vast amounts of text data, they can be customized for particular tasks such as text categorization, sentiment analysis, and automated translation.
Transfer Learning: In transfer learning, a model is first trained on a large amount of data for a specific task and then fine-tuned on a smaller dataset for a different but related task. This technique has been shown to improve performance on a wide range of NLP tasks.
Neural Machine Translation (NMT): This method translates text from one language into another by using neural networks. NMT has been shown to produce more fluent and accurate translations than traditional statistical machine translation methods.
Multilingual NLP: Multilingual NLP refers to the ability of NLP models to process and understand text in numerous languages. Recent developments have made it possible to train models in many languages simultaneously, boosting performance on multilingual tasks.
Explainable AI (XAI): XAI is the term used to describe an AI model's capacity to offer clear, comprehensible justifications for its predictions. Recent developments in NLP have made it possible to create XAI models that can give thorough justifications for their linguistic predictions, enhancing their trustworthiness and interpretability.
Zero-Shot Learning: This method enables NLP models to carry out tasks for which they have not been specifically trained. It does so by drawing on the general knowledge and context obtained during pre-training. Zero-shot learning has been demonstrated to work well for tasks like text classification and machine translation.
Ethical Issues: As NLP is used more frequently in a variety of applications, ethical issues like bias, fairness, and privacy are receiving more attention. Researchers are working to create strategies that reduce bias and help ensure that NLP models are impartial and respectful of user privacy.
1.3. Applications in NLP
Various applications of NLP are shown in Fig. (1). Understanding some of these applications shows how NLP can help complete time-consuming jobs more quickly and effectively while reducing workload.
Fig. (1)
Various Applications of NLP.
Email filtering: Email is used daily for work, study, and a variety of other purposes, and it arrives from many kinds of sources; some messages are work-related, while others are spam or promotional communications. Natural language processing distinguishes between incoming emails that are “important” and those that are “spam” and filters them into the corresponding folders.
Language translation: Technology has turned society into a global village, making it necessary to engage with others who may speak a language that is unfamiliar to us. By translating language together with its sentiment, natural language processing is a great help.
Smart assistants: In the modern world, new smart devices are introduced every day, and technology has produced intelligent personal assistants like Siri, Alexa, and Cortana that can even respond in the same way as humans. Natural Language Processing makes all of this feasible. Decomposing language into its constituent parts of speech, root stems, and other linguistic features helps the computer system comprehend it. NLP aids not only in language comprehension but also in processing meaning and emotions and responding in a human-like manner.
Document analysis: Document analysis is another application of NLP. Companies, colleges, schools, and similar institutions continually have an abundance of data that needs to be kept, organised, and searched. All of this may be accomplished with NLP. Along with performing a keyword search, it organises the results into relevant categories, sparing the user the time and effort of browsing through numerous files to find a certain person's information. It also helps the user make informed decisions about how to handle claims and manage risks.
Predictive text: Predictive text is an application comparable to online search. We use it every time we type on our smartphones. The keyboard suggests possible words as soon as we enter a few letters, and once we have written a few words, it begins to suggest the following word. These predictions may initially be somewhat inaccurate. Over time, however, the keyboard learns from our messages and begins to correctly propose the next word even before we have typed a single letter of it. NLP accomplishes this by giving our smartphones the ability to suggest phrases and learn from our messaging behaviour.
Automatic summarization: The amount of data has expanded along with numerous innovations, and this growth has widened the scope of data processing. Manual data processing, however, takes time and is prone to mistakes. NLP offers a solution: in addition to summarizing the meaning of information, it can also capture its emotional significance. Consequently, summarization becomes faster and less error-prone.
Social media monitoring: Many people have social media accounts, and sharing views, likes, dislikes, and experiences on these platforms reveals a lot about them. We discover details not only about specific people but also about goods and services. The relevant businesses can process this data to learn more about their goods and services so they can enhance or modify them. NLP makes it possible for a computer system to comprehend and analyze unstructured social media data and present the findings in a format that is useful for businesses.
Sentiment analysis: Many discussions and texts carry emotion, including daily interactions, posted material and comments, and book, restaurant, and product reviews. Understanding these feelings is just as crucial as comprehending the meaning of the words themselves. Humans are capable of deciphering the emotional undertones of written and spoken words; with the aid of natural language processing, computers can also capture the sentiment of a document in addition to its literal meaning.
Chatbots: As technology has advanced, everything from studying to shopping, buying tickets, and customer service has gone digital. A chatbot responds immediately rather than making the user wait a long time for a brief answer. NLP equips chatbots with conversational capabilities, enabling more accurate responses to consumers than simple one-word answers. Chatbots have also been useful in areas where human staff are limited or unreliable. NLP-based chatbots can also include a degree of emotional intelligence, which helps them comprehend and address the customer's emotional needs.
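The predictive text behaviour described in the list above can be illustrated with a simple bigram model that counts which word tends to follow another. This is a minimal sketch only: real keyboards use far more elaborate models, and the function names `train_bigram_model` and `suggest_next` and the sample messages are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram_model(messages):
    """Count, for every word, which words follow it in the messages."""
    following = defaultdict(Counter)
    for message in messages:
        words = message.lower().split()
        # Pair each word with its successor: (w0, w1), (w1, w2), ...
        for current_word, next_word in zip(words, words[1:]):
            following[current_word][next_word] += 1
    return following

def suggest_next(model, word, k=3):
    """Return up to k of the most frequent continuations of `word`."""
    continuations = model.get(word.lower(), Counter())
    return [candidate for candidate, _ in continuations.most_common(k)]

# Hypothetical message history, for illustration only.
messages = [
    "see you soon",
    "see you tomorrow",
    "see you soon then",
    "thank you very much",
]
model = train_bigram_model(messages)
print(suggest_next(model, "you", k=1))
print(suggest_next(model, "see"))
```

Calling `train_bigram_model` again on a longer message history updates the counts, which mirrors how suggestions improve as the keyboard sees more of the user's messages.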
1.4. Role of Natural Language Processing in Text Mining
Text analytics, popularly known as ‘Text Mining’, is the set of techniques or processes used to obtain qualitative information and hidden insights or patterns from text data [1]. Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML) can help an end user achieve this goal. Text analytics can be applied by using a collection of machine learning algorithms together with linguistic and statistical techniques. It is used for modelling purposes to extract relevant information from text data. The analysis can be descriptive, predictive, or exploratory in nature.
The purpose of text mining is to transform text documents into a structured format so that they can be used efficiently with AI techniques. Context understanding, as well as semantics, is the foundation of many operations in text mining [2]. Text mining helps to find concepts, hidden patterns, topics, keywords or features, or any other relevant attributes in text data. It performs pre-processing, feature extraction, feature selection, and categorization steps to provide a structured representation of text documents. Text mining can be combined with other methodologies, such as NLP, for better analysis. NLP-based text mining provides a more in-depth understanding with reduced effort. Fig. (2) shows how text mining discovers important or relevant information in text by using NLP methodology to transform the text into data suitable for further analysis. NLP is an important text-mining component that mainly focuses on linguistic analysis.
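The pre-processing, feature extraction, and feature selection steps named above can be sketched as a small pipeline that turns raw documents into a structured table. The sketch below is an assumption-laden illustration: it uses one simple TF-IDF variant (term frequency times the logarithm of inverse document frequency), a tiny hand-picked stop-word list, and hypothetical function names. The categorization step is omitted.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "in", "it"}

def preprocess(text):
    """Pre-processing: lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]

def tf_idf(tokenized_docs):
    """Feature extraction: weight each term by tf * log(N / df)."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    weighted = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted

def select_features(weighted_docs, k):
    """Feature selection: keep the k terms with the largest total weight."""
    totals = Counter()
    for weights in weighted_docs:
        for term, weight in weights.items():
            totals[term] += weight
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(totals, key=lambda term: (-totals[term], term))[:k]

def to_table(weighted_docs, features):
    """Structured representation: one row per document, one column per feature."""
    return [[round(weights.get(f, 0.0), 3) for f in features]
            for weights in weighted_docs]

# Hypothetical documents, for illustration only.
docs = [
    "The delivery was fast and the packaging was good",
    "The delivery was late",
    "Good price and good quality",
]
weighted = tf_idf([preprocess(doc) for doc in docs])
features = select_features(weighted, k=3)
table = to_table(weighted, features)
print(features)
for row in table:
    print(row)
```

The resulting table is the kind of structured representation that downstream models can consume. Libraries such as scikit-learn provide production-grade versions of these steps.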
1.5. Challenges in Handling Text Data
Text data are collected every day in large volumes from many heterogeneous sources. Recently, machine learning and natural language processing have gained a lot of popularity for analyzing text data. For text data, NLP includes several techniques for interpreting human language, which may be statistical, machine learning-based, or rule-based. NLP plays an important role in problems that require understanding the context of sentences, such as finding the meaning of similar words. Various challenges associated with text data are listed below.
Difficulty in semantic/context understanding
Interaction/relationship among entities is complex
Data collection from heterogeneous sources
Complex/high-dimensional data space
Data ambiguity
Information extraction techniques are inefficient because text data is either unstructured or semi-structured
Complex due to big and real-time data
Missing data
Variation in language
Homonyms
Language translation
Dimension reduction
Cannot handle data sparseness
More storage requirement
Extracting information from text data
Fig. (2)