Deep learning algorithms and techniques are found to be useful in various areas, such as automatic machine translation, automatic handwriting generation, visual recognition, fraud detection, and detecting developmental delays in children. “Deep Learning Techniques for Automation and Industrial Applications” presents a concise introduction to the recent advances in this field of artificial intelligence (AI). The broad-ranging discussion covers the algorithms and applications in AI, reasoning, machine learning, neural networks, reinforcement learning, and their applications in various domains like agriculture, manufacturing, and healthcare. Applying deep learning techniques or algorithms successfully in these areas requires a concerted effort, fostering integrative research among experts from diverse disciplines, from data science to visualization.
This book provides state-of-the-art approaches to deep learning covering detection and prediction, as well as future framework development, building service systems, and analytical aspects. For all these topics, various approaches to deep learning, such as artificial neural networks, fuzzy logic, genetic algorithms, and hybrid mechanisms, are explained.
Audience
The book will be useful to researchers and industry engineers working in information technology, data analytics, network security, and manufacturing. Graduate and upper-level undergraduate students in advanced modeling and simulation courses will find this book very useful.
Page count: 379
Year of publication: 2024
Cover
Table of Contents
Series Page
Title Page
Copyright Page
Preface
1 Text Extraction from Images Using Tesseract
1.1 Introduction
1.2 Literature Review
1.3 Development Areas
1.4 Existing System
1.5 Enhancing Text Extraction Using OCR Tesseract
1.6 Unified Modeling Language (UML) Diagram
1.7 System Requirements
1.8 Testing
1.9 Result
1.10 Future Scope
1.11 Conclusion
References
2 Chili Leaf Classification Using Deep Learning Techniques
2.1 Introduction
2.2 Objectives
2.3 Literature Survey
2.4 About the Dataset
2.5 Methodology
2.6 Result
2.7 Conclusion and Future Work
References
3 Fruit Leaf Classification Using Transfer Learning Techniques
3.1 Introduction
3.2 Literature Review
3.3 Methodology
3.4 Conclusion and Future Work
References
4 Classification of University of California (UC), Merced Land-Use Dataset Remote Sensing Images Using Pre-Trained Deep Learning Models
4.1 Introduction
4.2 Motivation and Contribution
4.3 Methodology
4.4 Experiments and Results
4.5 Conclusion
References
5 Sarcastic and Phony Contents Detection in Social Media Hindi Tweets
5.1 Introduction
5.2 Literature Review
5.3 Research Gap
5.4 Objective
5.5 Proposed Methodology
5.6 Expected Outcomes
References
6 Removal of Haze from Synthetic and Real Scenes Using Deep Learning and Other AI Techniques
6.1 Introduction
6.2 Formation of a Haze Model
6.3 Different Techniques of Single-Image Dehazing
6.4 Results and Discussions
6.5 Output for Synthetic Scenes
6.6 Output for Real Scenes
6.7 Conclusions
References
7 HOG and Haar Feature Extraction-Based Security System for Face Detection and Counting
7.1 Introduction
7.2 Literature Survey
7.3 Proposed Work
7.4 Experiments and Results
7.5 Conclusion and Scope of Future Work
References
8 A Comparative Analysis of Different CNN Models for Spatial Domain Steganalysis
8.1 Introduction
8.2 General Framework
8.3 Experimental Results and Analysis
8.4 Conclusion and Discussion
Acknowledgments
References
9 Making Invisible Bluewater Visible Using Machine and Deep Learning Techniques–A Review
9.1 Introduction
9.2 Determination of Groundwater Potential (GWP) Parameters
9.3 GWP Determination: Methods and Techniques
9.4 GWP Output: Applications
9.5 GWP Research Gaps: Future Research Areas
9.6 Conclusion
References
10 Fruit Leaf Classification Using Transfer Learning for Automation and Industrial Applications
10.1 Introduction
10.2 Data Collection and Preprocessing
10.3 Loading a Pre-Trained Model for Fruit Leaf Classification Using Transfer Learning
10.4 Training and Evaluation
10.5 Applications in Automation and Industry
10.6 Conclusion
10.7 Future Work
References
11 Green AI: Carbon-Footprint Decoupling System
11.1 Introduction
11.2 CO₂ Emissions in Sectors
11.3 Heating and Cooking Emissions
11.4 Automobile Systems Emission
11.5 Power Systems Emission
11.6 Total CO₂ Emission
11.7 Green AI With a Control Strategy of Carbon Emission
11.8 Green Software
11.9 Conclusion
11.10 Future Scope and Limitation
References
12 Review of State-of-Art Techniques for Political Polarization from Social Media Network
12.1 Introduction
12.2 Political Polarization
12.3 State-of-the-Art Techniques
12.4 Literature Survey
12.5 Conclusion
References
13 Collaborative Design and Case Analysis of Mobile Shopping Apps: A Deep Learning Approach
13.1 Introduction
13.2 Personalized Interaction Design Framework for Mobile Shopping
13.3 Case Analysis
13.4 Conclusions
References
14 Exploring the Potential of Machine Learning and Deep Learning for COVID-19 Detection
14.1 Introduction
14.2 Supervised Learning Techniques
14.3 Unsupervised Learning Techniques
14.4 Deep Learning Techniques
14.5 Reinforcement Learning Techniques
14.6 Comparison of Machine Learning and Deep Learning Techniques
14.7 Challenges and Limitations
14.8 Conclusion and Future Directions
References
Index
End User License Agreement
Chapter 1
Table 1.1 Precision.
Table 1.2 Accuracy observed for different types of images.
Chapter 2
Table 2.1 The training accuracy, training loss, validation accuracy, and valid...
Table 2.2 The training accuracy, training loss, validation accuracy, and valid...
Chapter 6
Table 6.1 CIEDE2000 for synthetic and real scenes (Figure 6.4 and Figure 6.5).
Table 6.2 SSIM and PSNR values for synthetic scenes (Figure 6.4).
Table 6.3 Average CIEDE2000 values for real scene dataset (45 images).
Table 6.4 Average SSIM and PSNR values for synthetic scenes for outdoor haze d...
Chapter 7
Table 7.1 Confusion matrix of individual counting via face detection.
Chapter 8
Table 8.1 Experimental setup.
Table 8.2 Setting of parameters for the pretrained models.
Table 8.3 Validation and testing accuracies for all the CNN models used.
Chapter 9
Table 9.1 Research questions associated with GWP parameters.
Table 9.2 Research questions associated with GWP methods using RS-GIS and IoT.
Table 9.3 Research questions associated with GWP methods using AIML techniques...
Table 9.4 Research questions associated with GWP results and applications.
Table 9.5 Research questions associated with future direction in GWP research.
Chapter 10
Table 10.1 Case studies of fruit leaf classification in industry using transfe...
Chapter 11
Table 11.1 Country-wise code and total consumption of energy in metric tons.
Table 11.2 Washington, DC: World Resources Institute. Available: https://www.c...
Chapter 12
Table 12.1 The pros and cons of hybrid and single techniques.
Chapter 1
Figure 1.1 Methodology.
Figure 1.2 UML diagram.
Figure 1.3 Architecture.
Chapter 2
Figure 2.1 Example of a healthy leaf.
Figure 2.2 Example of an unhealthy leaf.
Figure 2.3 Methodologies used in this chapter.
Figure 2.4 Training dataset.
Figure 2.5 Image after data augmentation is applied.
Figure 2.6 Graph displaying the results of the CNN model.
Chapter 3
Figure 3.1 A flow chart diagram of the methodology.
Figure 3.2 Sample images from the dataset categorized into diseased and health...
Figure 3.3 Images after data augmentation (horizontal and vertical flip).
Figure 3.4 Accuracy and loss graph: CNN.
Figure 3.5 Accuracy and loss graph: ResNet50.
Figure 3.6 Accuracy and loss graph: InceptionV3.
Figure 3.7 Accuracy and loss graph: VGG19.
Chapter 4
Figure 4.1 Methodology of classification.
Figure 4.2 UC Merced dataset images.
Figure 4.3 Sample way of expanding various deep architectures.
Figure 4.4 Accuracy and loss plots’ comparison of the VGG family.
Figure 4.5 Accuracy and loss plots of ResNet50.
Figure 4.6 Accuracy and loss plots of ResNet101.
Figure 4.7 Accuracy and loss plot of ResNet152.
Figure 4.8 Accuracy and loss plots of MobileNet.
Figure 4.9 Accuracy and loss plots of Inception.
Figure 4.10 Accuracy and loss plots of Xception.
Figure 4.11 Accuracy and loss plots of the DenseNet family.
Figure 4.12 Accuracy and loss plots of NasNet.
Figure 4.13 (a–h) Accuracy and loss plots of the EfficientNet family.
Figure 4.13 (i–p) Accuracy and loss plots of the EfficientNet family.
Figure 4.14 (a–h) Accuracy and loss plots of the EfficientNetV2 family.
Figure 4.14 (i–n) Accuracy and loss plots of the EfficientNetV2 family.
Figure 4.15 Comprehensive analysis of various deep learning models.
Chapter 5
Figure 5.1 Sample sarcastic Hindi tweet.
Figure 5.2 Block diagram of sarcasm detection on Twitter.
Figure 5.3 Feature classification for sarcasm detection.
Figure 5.4 Word2Vec word embedding.
Chapter 6
Figure 6.1 Scattering model.
Figure 6.2 Color attenuation prior flowchart.
Figure 6.3 Encoder layer and decoder layer of the CNN.
Figure 6.4 Synthetic scenes using different techniques.
Figure 6.5 Real hazy scenes using different techniques.
Chapter 7
Figure 7.1 Approaches for face detection.
Figure 7.2 Process of HOG feature extraction.
Figure 7.3 Individual counting via face detection.
Figure 7.4 Individual counting via face detection.
Chapter 8
Figure 8.1 General architecture of pretrained models for steganalysis used in ...
Figure 8.2 The square convolutional kernel.
Figure 8.3 XuNet architecture.
Figure 8.4 VGGNet architecture [40].
Figure 8.5 ResNet-50 architecture [41].
Figure 8.6 EfficientNet-B0 architecture [36].
Figure 8.7 Training and validation accuracies’ plots of all the CNN models use...
Figure 8.8 Plot of accuracies of XuNet without the K₅ preprocessing filter.
Chapter 9
Figure 9.1 Groundwater influencing parameters.
Figure 9.2 GWP influencing parameters and potential data sources.
Figure 9.3 Thematic layers and sample delineated groundwater storage and recha...
Chapter 10
Figure 10.1 Common pre-trained models used in transfer learning.
Figure 10.2 Architecture of VGG-16 Model [10].
Figure 10.3 ResNet-50 architecture [12].
Figure 10.4 An inception module [14].
Figure 10.5 Steps to load a pre-trained model.
Figure 10.6 The training process.
Figure 10.7 The evaluation process.
Figure 10.8 Metrics for measuring model performance.
Chapter 11
Figure 11.1 Artificial intelligence conferences.
Figure 11.2 Map of CO₂ emission.
Figure 11.3 Transform column types.
Figure 11.4 Relation between CO₂ emissions in India and renewable energy data.
Figure 11.5 Count of C_id by country related to the table.
Chapter 12
Figure 12.1 DNNs with mandatory parameter sharing for MTL.
Figure 12.2 Soft parameter sharing for MTL in DNN.
Chapter 13
Figure 13.1 Modular interactive information architecture of a shopping app.
Figure 13.2 Principle interaction path.
Figure 13.3 Linear interactive path.
Figure 13.4 Web page and app of a sample site.
Figure 13.5 Display interface of the eBay app.
Figure 13.6 User interface of a shopping app.
Figure 13.7 Second-level page design of different topics of an app.
Chapter 14
Figure 14.1 Comparison between machine learning and deep learning.
Figure 14.2 Supervised learning.
Figure 14.3 Logistic regression.
Figure 14.4 Decision trees.
Figure 14.5 Support vector machines.
Figure 14.6 Naive Bayes.
Figure 14.7 K-Nearest neighbors.
Figure 14.8 Unsupervised learning.
Figure 14.9 Clustering.
Figure 14.10 Reinforcement learning techniques.
Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106
Publishers at Scrivener
Martin Scrivener ([email protected])
Phillip Carmical ([email protected])
Edited by
Pramod Singh Rathore
Sachin Ahuja
Srinivasa Rao Burri
Ajay Khunteta
Anupam Baliyan
and
Abhishek Kumar
This edition first published 2024 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA
© 2024 Scrivener Publishing LLC
For more information about Scrivener publications please visit www.scrivenerpublishing.com.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
Wiley Global Headquarters
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.
Library of Congress Cataloging-in-Publication Data
ISBN 978-1-394-23424-0
Cover image: Pixabay.com
Cover design by Russell Richardson
Artificial intelligence is the fastest-growing field in computer science. Deep learning algorithms and techniques are found to be useful in different areas, such as automatic machine translation, automatic handwriting generation, visual recognition, fraud detection, and detecting developmental delays in children. Deep Learning Techniques for Automation and Industrial Applications presents a concise introduction to the recent advances in the field of artificial intelligence (AI). The broad-ranging discussion herein covers the algorithms and applications in the areas of AI, reasoning, machine learning, neural networks, reinforcement learning, and their applications in various domains like agriculture and healthcare. Applying deep learning techniques or algorithms successfully in these areas requires a concerted effort, fostering integrative research among experts from diverse disciplines, from data science to visualization.
This book provides state-of-the-art approaches to deep learning in these areas. It covers detection and prediction, as well as future framework development, building service systems, and analytical aspects. For all these topics, various approaches to deep learning, such as artificial neural networks, fuzzy logic, genetic algorithms, and hybrid mechanisms, are explained.
The successful application of deep learning techniques to enable meaningful, cost-effective, personalized cloud security service is a primary and current goal. However, realizing this goal requires effective understanding, application, and amalgamation of deep learning and several other computing technologies to deploy such a system effectively. This book helps to clarify certain key mechanisms of technology to realize a successful system. It enables the processing of very large datasets to help with precise and comprehensive forecasts of risk and delivers recommended actions that improve outcomes for consumers. This is a novel application domain of deep learning and is of prime importance to all of human civilization.
Preparing both undergraduate and graduate students for advanced modeling and simulation courses, this book helps them to carry out effective simulation studies. In addition, graduate students will be able to comprehend and conduct AI and data mining research after completing this book.
The book comprises fourteen chapters. In Chapter 1, images play a crucial role in describing, representing, and conveying information, which aids in human productivity, cost, analysis, and other areas. Text extraction is the method used to convert text in images into plain text. Text extraction is a very challenging problem because of the many changes in these texts’ size, orientation, and alignment, as well as low-resolution/pixelated images, noisy backgrounds, etc. Using the Tesseract OCR engine, we aim to reduce these issues in this project. Tesseract, developed by Google, is an open-source optical character recognition (OCR) engine. OCR technology allows computers to recognize text in images, making it possible to convert images of text into machine-readable text. Tesseract has been trained in a wide variety of languages and scripts, including English, Chinese, Arabic, and more.
Chapter 2 addresses agriculture, a vital part of everyone’s life. Crops and plants play a huge role in sustaining life, and taking care of them is both important and difficult. Thus, to detect disease in plants, this chapter demonstrates how to tell whether a plant is healthy or not. The dataset consists of chili plant leaves collected from a field located in Andhra Pradesh, India. Many image classification models, as well as transfer learning models, are applied. Deep learning models such as CNN and transfer learning models, like InceptionV3 and VGG16, are applied with and without data augmentation.
In Chapter 3, researchers have applied various deep learning and transfer learning methods to accurately predict the disease of a damaged plant, so that it can be cured in its initial stage. The models are trained on an image dataset containing various categories of plants, like mango and pomegranate. The results state that ResNet outperformed Inception, VGG19, and CNN, giving accuracies of 88% and 87.5% for pomegranate and mango, respectively.
In Chapter 4, researchers have compared different deep learning-based classification techniques on a remote sensing image dataset. The dataset has been taken from the UC Merced Land Use Dataset, which contains a total of 21 classes, with every class consisting of 100 images of size 256 × 256. The models used in this study are VGG, ResNet, Inception, DenseNet, and EfficientNet, which are deep convolutional network architectures for image classification with different numbers of layers. To make meaningful comparisons, all models were extended by adding three layers at the end to improve their performance. The performance of the VGG19 model was found to be superior. This model was able to classify almost all images belonging to the 21 classes with an accuracy of 100% on training data and 95.07% on testing data, followed by VGG16 with 93% and ResNet with 91% accuracy on testing data.
Chapter 5 deals with sarcasm. Sarcasm is a sardonic or bitter remark intended to express disrespect or ridicule. In the Hindi language, it often originates from idioms and proverbs and is frequently indirect. Sentiment categorization is easier than sarcasm detection: an idiom may carry a negative sentiment on its surface while its intention is to call an uneducated person knowledgeable among a group of fools. In the present scenario, people on social media platforms like Twitter, Facebook, and WhatsApp succeed in recognizing sarcasm despite interacting with strangers across the world. Sarcasm detection is a challenging task in Natural Language Processing due to the richness of morphology. Detecting sarcasm in Hindi language tweets is a prime task for Natural Language Processing to avoid misconstruing sarcastic statements as literal statements.
Chapter 6 explains how images are the main source for all image processing fields, like surveillance, detection, recognition, and satellite imaging. Good visibility of images captured by sensors is crucial for all computer vision tasks. Sometimes the scene quality is degraded by bad weather conditions like haze, fog, or smoke, making it difficult to obtain the actual information. Haze can be removed from a single-input scene by using dehazing methods. Synthetic haze can be created by a haze generator, and, currently, most image dehazing techniques are applied to synthetic haze. Various single-image dehazing techniques are being developed and tested on real-world scenes that are captured in hazy environments using cameras.
Chapter 7 demonstrates how the framework needs accurate and real-time performance to count how many people are present at a particular moment in a particular frame. Our counting framework automatically detects each person’s face and makes instantaneous decisions to count the number of persons in front of the camera or within a set of images. The work of individual counting can be done in two broad ways: the first is the detection of faces; the second is the counting approach used to track and count people within a frame.
Chapter 8 describes how CNNs can show a resemblance to traditional steganalysis by using filters for feature extraction. Due to the use of content-adaptive steganographic methods, the stego message is hidden mostly in the complex areas of the image and thus cannot be detected with a simple statistical analysis of the image. The stego information in these steganographic methods affects the dependencies between the pixels, introduced through various kinds of noise present in the image. Thus, the difference between the cover and stego image is identified through the noise component rather than the image content. Different researchers have used various preprocessing filters for calculating the noise residuals and passing them to the CNN instead of the images directly. This work employs a content-adaptive steganography method, Highly Undetectable steGO (HUGO), for creating stego images. Furthermore, this work provides a comparative analysis of one of the variants of CNN models specific to steganalysis and various pre-trained computer vision models that apply to steganalysis.
Chapter 9 discusses how groundwater abstraction beyond the safe limit is causing rapid groundwater table depletion, at a rate of 1–2 m/year in many districts. Uncontained and unplanned usage may reduce food production by 20 percent. Due to the significant impact of this imperceptible resource on various aspects of life, the economy, the environment, and society, there is a pressing need to enhance the scientific comprehension, estimation, and administration of groundwater management. A scientific framework for the demarcation of its potential storage and recharge zonal maps, i.e., GWPSZ and GWRZ, can be instrumental in helping urban and rural water committees objectively manage the resource at the regional level.
Chapter 10 explains the process of using pre-trained models to classify different types of fruit leaves accurately. We also discuss the advantages of transfer learning for industrial applications, including improved accuracy, reduced training time, and better utilization of resources. We provide code examples and practical guidance for implementing transfer learning using popular deep learning frameworks like TensorFlow. By the end of this chapter, readers will have a good understanding of how to use transfer learning for fruit leaf classification and how it can be applied in industrial settings.
Chapter 11 reveals that the carbon footprint of these calculations proves to be rather high. Deep learning research can be challenging for academics, students, and researchers, especially those from emerging economies, due to the financial expense of the calculations. In addition to accuracy and related metrics, productivity is included as an evaluation criterion in this chapter’s proposed practical solution. Our objective is to create green AI that significantly outperforms red AI in terms of performance and to advance green AI by reducing its environmental impact.
Chapter 12 explains how networking services have modified the methods and scale of cyberspace communication. In the past decade, social networks have gained significant attention. The Internet and Web 2.0 applications are becoming more affordable for accessing social network sites like Twitter, Facebook, LinkedIn, and Google+. People are becoming more interested in news and opinions on a wide range of issues and are therefore more reliant on social networks.
Chapter 13 engages in an in-depth analysis of the associated ideas and challenges that arise in the process of shopping app interface design. It further demonstrates the functional simplicity, ease of use, and effectiveness that are centered on the experience of the user and built on an interactive model, via the use of classical instances and practical design projects. We then use these theories, together with associated ideas in design psychology, behavior, design aesthetics, and ergonomics, to thoroughly study mobile shopping app interaction design. The chapter presents mobile shopping apps via a “global-local-global” process for repeated personalized interactive design, and it gives useful recommendations and ideas for the construction of mobile and vertical shopping apps. Additionally, the chapter includes analysis and data collection from practical instances.
Chapter 14 presents an extensive review of the application of machine learning and deep learning methods in detecting COVID-19. It emphasizes the significance of early and accurate diagnosis in effectively managing the disease during the COVID-19 pandemic. The chapter encompasses various aspects of machine learning and deep learning techniques, including supervised and unsupervised learning, convolutional neural networks, recurrent neural networks, reinforcement learning, and a comparative analysis of machine learning and deep learning methods for COVID-19 detection.
We are deeply grateful to everyone who helped with this book and greatly appreciate the dedicated support and valuable assistance rendered by Martin Scrivener and the Scrivener Publishing team during its publication.
Pramod Singh Rathore
Department of Computer and Communication Engineering Manipal University Jaipur Rajasthan, India
Dr. Sachin Ahuja
Department of Computer Science Chandigarh University, India
Srinivasa Rao Burri
Senior Software Engineering Manager Western Union Denver, CO
Dr. Ajay Khunteta
Department of Computer Science and Engineering, Poornima University, Jaipur, India
Dr. Anupam Baliyan
Dean of Academic Planning and Research Galgotias University, India
Dr. Abhishek Kumar
Faculty of Engineering, Manipal University, Jaipur, India
Santosh Kumar*, Nilesh Kumar Sharma, Mridul Sharma and Nikita Agrawal
Department of Computer Science, Global Institute of Technology, Jaipur, India
Images play a crucial role in describing, representing, and conveying information, which is essential in many businesses and organizations. Text extraction is the systematic procedure used to convert textual material in images into a simplified plain text format. The task presents a significant difficulty as a result of the many variations in the size, orientation, and alignment of the text. Furthermore, low-resolution, pixelated pictures and noisy backgrounds exacerbate the complexity of the text extraction process. Using the Tesseract OCR engine, we aim to reduce these issues in this project.
Tesseract is developed by Google and is an open-source optical character recognition (OCR) engine. OCR technology allows computers to recognize text in images, making it possible to convert images of text into machine-readable text. Tesseract has been trained in a wide variety of languages and scripts, including English, Chinese, and Arabic.
Tesseract can process images that are rotated, tilted, or skewed and can recognize text written in different scripts, such as English and Arabic. It uses machine learning algorithms to improve its recognition accuracy over time, making it suitable for use in a wide range of applications, including document scanning, archiving, and indexing.
Keywords: Text extraction, LSTM, image text, OCR, Tesseract
Text extraction, also known as optical character recognition (OCR), is the process of automatically extracting text from an image or scanned document and converting it into machine-readable text that can be used for indexing, searching, editing, or storing the document.
OCR software uses advanced algorithms and machine learning techniques to identify and recognize characters and words within the document. The process of text extraction typically involves the following steps:
Pre-processing: In this step, the image or scanned document is prepared for OCR by optimizing its quality. Techniques such as noise reduction, image enhancement, and contrast adjustment are used to improve image readability and reduce errors during OCR.
Layout analysis: The OCR software analyzes the layout of the document to identify the text areas, headings, and other features.
Optical character recognition: In this step, the OCR software reads and recognizes the characters in the document and translates them into machine-readable text.
Post-processing: In this step, the recognized text is refined to correct errors and improve its accuracy. Techniques such as spell-checking, grammar correction, and formatting are used to improve the quality of the output [1, 2].
These steps can be carried out with OCR software such as Tesseract, Google’s open-source OCR engine. Tesseract uses advanced machine learning techniques to improve its recognition accuracy over time and can recognize text in multiple languages and scripts.
The following are the steps involved in text extraction using OCR software (a minimal sketch follows the list):
Scanning or importing the image or document to be processed
Preprocessing the image or document by adjusting its quality, orientation, and size
Performing layout analysis to identify the text areas, headings, and other features
Running OCR on the document to extract the text
Post-processing the recognized text to correct errors and improve its accuracy and formatting
Saving the extracted text in a machine-readable format such as plain text or a searchable PDF.
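The steps above can be illustrated with the pytesseract wrapper around the Tesseract engine. The following is a minimal sketch, assuming Tesseract plus the pytesseract and OpenCV packages are installed; the file names are illustrative placeholders, not part of any described system.

```python
# Minimal sketch of the text extraction steps listed above
# (assumes Tesseract, pytesseract, and OpenCV are installed;
# file names are illustrative).
import cv2
import pytesseract

# Import the image or scanned document to be processed.
image = cv2.imread("scanned_page.png")

# Preprocess: grayscale conversion and a light denoise improve readability.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 3)

# Layout analysis is handled by Tesseract's page segmentation modes;
# --psm 3 performs fully automatic page segmentation.
text = pytesseract.image_to_string(gray, config="--psm 3")

# Save the extracted text in a machine-readable format.
with open("extracted.txt", "w", encoding="utf-8") as f:
    f.write(text)
```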
Furthermore, by training the OCR engine on a specific dataset of languages, it is possible to improve the accuracy of OCR results [3, 4].
Text extraction
Python
OCR
React JS
Flask
Text extraction, also known as optical character recognition (OCR), is a valuable technology because it enables the automated extraction of text from images and scanned documents. Here are some reasons why text extraction is important:
Improved efficiency: Text extraction automates the process of converting image-based text into machine-readable text. This saves time and effort compared to manual data entry, especially when dealing with large volumes of data.
Easy to search and index: Text extraction allows users to convert image-based text into searchable and indexable text. This is especially important when dealing with large collections of documents or when searching for specific information within a document.
Improved accessibility: Text extraction makes it possible to convert image-based text into machine-readable text that assistive technologies like screen readers can read. This improves accessibility for people with visual impairments.
Cost-effective: Text extraction can save organizations money by reducing manual data entry and improving the accuracy of document management processes (DMPs).
Multilingual support: OCR technology has advanced significantly in recent years and now supports the recognition of text in multiple languages and scripts. This means that text extraction can be used in various multilingual applications, such as multilingual document processing [5] or translation [6] (a short sketch follows this list).
Digitization of printed documents: OCR can be used to scan and digitize printed documents such as books, magazines, and newspapers. The resulting digital text can then be stored, searched, and edited on a computer.
Automatic data entry: OCR can be used to automatically extract data from forms, invoices, and receipts, saving time and reducing errors compared to manual data entry.
Handwriting recognition: OCR can be used to recognize handwritten text, enabling applications such as digital note-taking or handwriting-based input for mobile devices.
Text-to-speech: OCR can be used to convert printed text into speech, enabling visually impaired individuals to access written information.
Language translation: OCR can be used to recognize text in one language and automatically translate it into another.
Automatic license plate recognition: OCR can be used to recognize license plates on vehicles, enabling applications such as automatic toll collection or traffic monitoring.
Document classification: OCR can be used to recognize the type of document being scanned, enabling automatic sorting and routing of documents.
Image indexing: OCR can be used to index and search image collections, enabling users to search for images based on the text they contain.
Historical document analysis: OCR can be used to digitize and analyze historical documents, enabling researchers to study the content and context of these documents in new ways.
Passport and ID verification: OCR can be used to read and verify the text on passports and other identification documents, improving security and reducing the risk of fraud.
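For the multilingual support mentioned above, Tesseract exposes its language packs through pytesseract’s lang argument. The sketch below assumes the corresponding traineddata files (e.g., eng, chi_sim, ara) have been installed alongside the engine; the file name is illustrative.

```python
# Sketch of multilingual recognition with pytesseract (assumes the
# eng, chi_sim, and ara Tesseract language packs are installed).
from PIL import Image
import pytesseract

image = Image.open("multilingual_page.png")

# Combining language packs with '+' lets Tesseract recognize
# mixed English, Chinese, and Arabic text in a single pass.
text = pytesseract.image_to_string(image, lang="eng+chi_sim+ara")
print(text)
```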
Text extraction using optical character recognition (OCR) has been a hot research topic for several years. The advancements in OCR technology have made it possible to extract text accurately and quickly from images, which has numerous applications in various industries. In this literature review, we will explore the current state of research in text extraction using Tesseract OCR [7].
Tesseract OCR is an open-source OCR engine developed by Google that has gained popularity due to its accuracy and robustness. Several studies have been conducted to explore the effectiveness of Tesseract OCR in text extraction. One such study by Bhargava et al. (2020) compared the performance of Tesseract OCR with other popular OCR engines, such as Adobe Acrobat, ABBYY FineReader, and OmniPage [8]. The study found that Tesseract OCR outperformed the other OCR engines in terms of accuracy, speed, and ease of use.
In another study by Kumar et al. (2017), the authors proposed a novel method for text extraction using Tesseract OCR [9, 10]. The proposed method involved preprocessing the input image by applying various filters to enhance the quality of the image. The preprocessed image was then fed to the Tesseract OCR engine for text extraction. The results of the study showed that the proposed method improved the accuracy of text extraction significantly.
Moreover, several studies have focused on improving the accuracy of Tesseract OCR through machine learning techniques. For instance, in a study by M. Alzahrani et al. (2020), the authors proposed a machine learning-based approach to improve the accuracy of Tesseract OCR in recognizing handwritten text. The approach involved training a support vector machine (SVM) classifier using a dataset of handwritten characters. The trained SVM was then used to classify the characters extracted by the Tesseract OCR engine. The results showed that the proposed approach significantly improved the accuracy of Tesseract OCR in recognizing handwritten text.
Furthermore, a recent study by Bhargava et al. (2022) proposed a deep learning-based approach for text extraction using Tesseract OCR [11, 12]. The proposed approach involved training a convolutional neural network (CNN) using a dataset of images and corresponding ground truth text. The trained CNN was then used to preprocess the input image before feeding it to the Tesseract OCR engine for text extraction. The results of the study showed that the proposed approach outperformed traditional OCR engines in terms of accuracy and speed [13].
In addition to improving the accuracy of Tesseract OCR, several studies have explored its applications in various industries. For instance, in a study by S. J. Alzahrani et al. (2018), the authors proposed a method for extracting text from bank statements using Tesseract OCR. The proposed method involved preprocessing the input image by removing noise and enhancing the quality of the image. The preprocessed image was then fed to the Tesseract OCR engine for text extraction. The results showed that the proposed method was effective in extracting text from bank statements, which can be used for financial analysis and fraud detection.
Similarly, in a study by R. K. Samal et al. (2017), the authors proposed a method for extracting text from medical images using Tesseract OCR. The proposed method involved preprocessing the input image by removing artifacts and enhancing the quality of the image. The preprocessed image was then fed to the Tesseract OCR engine for text extraction. The results showed that the proposed method was effective in extracting text from medical images, which can be used for medical diagnosis and research.
In conclusion, Tesseract OCR has proven to be an effective tool for text extraction from images. Several studies have explored its effectiveness and applications in various industries. The advancements in machine learning and deep learning techniques have further improved the accuracy of Tesseract OCR in recognizing text from images. The future scope of research in text extraction using Tesseract appears promising, with potential enhancements in areas like multi-language support, complex document formatting, and real-time processing [14–17].
React JS is a front-end JavaScript library used for building user interfaces.
React JS uses a component-based architecture, which makes it easy to build reusable components for the UI.
React JS can be used with other JavaScript libraries and frameworks to create powerful web applications.
Flask is a Python web framework used for building web applications and APIs.
It is a micro-framework, meaning that it provides the minimum set of tools needed to build a web application.
Flask is flexible and allows developers to customize and configure it to meet their needs.
It provides a built-in development server, which makes it easy to get started with building a web application.
Flask is highly extensible and can be used with various Python libraries and tools to build powerful web applications.
Here are some other differences between React JS and Flask to keep in mind:
React JS is a front-end library, while Flask is a back-end web framework.
React JS is written in JavaScript, while Flask is written in Python.
React JS is used for building the user interface, while Flask is used for handling server-side processing and database integration.
React JS is used in combination with other front-end tools and libraries, while Flask is used with other Python tools and libraries.
In terms of using React JS and Flask together, we can build a web application where React JS handles the front-end user interface and Flask handles the back-end server-side processing. This can be done by creating a REST API in Flask and making API calls to it from React JS. Alternatively, Flask can serve the built React JS application as static files and handle the API calls from the same origin.
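As an illustration of this division of labor, here is a minimal sketch of the Flask side; the /api/extract route and the "image" form field are illustrative choices, not a prescribed design. A React JS component would POST an image to this endpoint (for example, with fetch and a FormData body) and render the returned JSON.

```python
# Minimal sketch of a Flask REST API that a React JS front end could
# call (route and field names are illustrative; assumes Flask,
# Pillow, and pytesseract are installed).
from flask import Flask, request, jsonify
from PIL import Image
import pytesseract

app = Flask(__name__)

@app.route("/api/extract", methods=["POST"])
def extract_text():
    # Expect the image under the multipart form field "image".
    uploaded = request.files.get("image")
    if uploaded is None:
        return jsonify({"error": "no image supplied"}), 400
    text = pytesseract.image_to_string(Image.open(uploaded.stream))
    return jsonify({"text": text})

if __name__ == "__main__":
    # Flask's built-in development server, as noted above.
    app.run(debug=True)
```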
The automatic conversion of text in an image into letter codes that can be used in computer and text-processing applications is known as offline handwriting recognition. This process captures a static representation of handwriting. Due to variations in handwriting styles, offline handwriting recognition is a challenging task. Currently, OCR engines primarily focus on machine-printed text, and Intelligent Character Recognition (ICR) on hand-printed text (written in capital letters). Charles et al. have presented a comprehensive review of various techniques used in the design of OCR systems, covering slower techniques that produce accurate results as well as faster techniques that produce less accurate results. Ray Kurzweil invented the first omni-font OCR software in 1974, allowing recognition of any font. This software utilized an advanced matrix method (pattern matching) to compare bitmaps of the template character with bitmaps of the read character and determine the closest match [18–20].
Figure 1.1 Methodology.
However, the existing system has several disadvantages. First, it is sensitive to variations in sizing, which affect the accuracy of the recognition process. Second, individual differences in handwriting styles pose a challenge to the system. Finally, the system produces unreliable, low-confidence results, which limits its effectiveness in practical applications, as described in Figure 1.1 [21].
Text extraction using OCR Tesseract can be a complex task, and achieving 98% accuracy requires careful consideration of several factors, including image preprocessing, feature extraction, model selection, and post-processing. Here are some steps we can follow to design an optimized algorithm for text extraction using OCR Tesseract (a sketch of the early steps follows the list):
Image preprocessing: The first step is to preprocess the input image to enhance the quality of the text. This can include operations such as noise reduction, contrast adjustment, and binarization. The goal is to create an image that maximizes the contrast between the text and the background while minimizing noise and artifacts.
Region of interest (ROI) detection: The next step is to detect the regions of the image that contain text. This can be done using techniques such as connected component analysis, contour detection, or object recognition. The goal is to isolate the regions of the image that contain text and remove any non-text regions.
Text segmentation: Once the regions of interest have been detected, the next step is to segment the text into individual characters or words. This can be done using techniques such as morphological operations, edge detection, or machine learning-based segmentation.
Feature extraction: The next step is to extract features from the segmented text regions. This can include features such as character shape, size, and orientation. The goal is to create a set of features that can be used to distinguish between different characters and improve the accuracy of the OCR algorithm.
Model selection: The next step is to select a suitable OCR model. Tesseract is a popular open-source OCR engine that can be trained on a variety of datasets. The model selection depends on the specific requirements of the project, such as the language of the text, the font style, and the image quality.
Post-processing: The final step is to post-process the OCR output to correct any errors and improve the accuracy.
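The first three steps can be sketched with OpenCV before handing the detected regions to Tesseract. The threshold and kernel values below are illustrative defaults, not tuned parameters from this project.

```python
# Sketch of preprocessing, ROI detection, and segmentation with OpenCV
# feeding Tesseract (parameter values are illustrative, not tuned).
import cv2
import pytesseract

image = cv2.imread("input.png")

# Step 1: preprocessing - grayscale, Gaussian blur, adaptive threshold.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
binary = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 31, 11)

# Step 2: ROI detection - dilate the inverted image so characters merge
# into text blocks, then find the block contours.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
dilated = cv2.dilate(255 - binary, kernel, iterations=1)
contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

# Step 3 onward: keep plausibly text-sized regions and run OCR on each.
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    if w > 20 and h > 10:  # filter out non-text specks
        roi = binary[y:y + h, x:x + w]
        print(pytesseract.image_to_string(roi, config="--psm 7"))
```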
Figure 1.2 is a visual representation of the interactions between a public user and the system itself, showcasing the various use cases or functionalities that the system provides. It helps to illustrate the high-level functional requirements of the system and how different actors interact with it.
Figure 1.2 UML diagram.
Figure 1.3 depicts the architecture of the model. Table 1.1 reports the precision observed for each input type.
Figure 1.3 Architecture.
Table 1.1 Precision.
Type                       Accuracy
Machine Printed            90%
Handwritten (consistent)   76%
Preprocess the input image to enhance the quality of the text.
Convert the image to grayscale.
Apply Gaussian blur to remove noise and apply adaptive thresholding to enhance contrast.
Detect the regions of interest (ROI) that contain text.
Apply morphological operations to remove noise.
Find contours and filter out non-text regions.
Segment the text into individual characters or words.
Apply dilation and erosion to close gaps between characters.
Find individual characters or words using bounding boxes.
Extract features from the segmented text regions.
Compute features such as character shape, size, and orientation.
Select a suitable OCR model such as Tesseract based on the specific requirements of the project.
Initialize the OCR engine with the selected model.
Recognize the text from the segmented regions using the OCR engine.
Apply the OCR algorithm to each segmented region.
Post-process the OCR output to correct any errors and improve accuracy (a sketch of this step follows the list).
Apply spell-checking to correct spelling errors.
Apply language model-based correction to correct grammatical errors.
Apply voting-based consensus to improve accuracy.
Generate the final output in a machine-readable format such as plain text, HTML, or XML.
Provide a user interface to enable users to interact with the system.
Maintain the system to ensure optimal performance and update the OCR model and software as needed.
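The post-processing step can be illustrated with a simple dictionary-based spell check. The sketch below uses Python’s standard difflib module and a tiny inline vocabulary as a stand-in for a real dictionary file; a real system would combine this with the language model and voting strategies mentioned above.

```python
# Sketch of spell-checking post-processing for OCR output using the
# standard library; the inline vocabulary is a stand-in for a real
# dictionary file.
import difflib

VOCABULARY = ["text", "extraction", "with", "the", "ocr", "engine"]

def correct_word(word):
    """Replace an OCR token with its closest dictionary match, if any."""
    if word.lower() in VOCABULARY:
        return word
    matches = difflib.get_close_matches(word.lower(), VOCABULARY,
                                        n=1, cutoff=0.7)
    return matches[0] if matches else word

ocr_output = "Tezt extractiom with the OCR englne"
corrected = " ".join(correct_word(w) for w in ocr_output.split())
print(corrected)  # -> "text extraction with the OCR engine"
```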
React JS
Flask
Text editor/Integrated Development Environment (IDE)
FTP client
Web browser
Graphics editor (Nvidia)
Processor    Intel Pentium IV 2.0 GHz and above
RAM          512 MB and above
Hard disk    80 GB and above
Monitor      CRT or LCD monitor
Keyboard     Normal or multimedia
Mouse        Compatible mouse
Table 1.2 shows the accuracy observed for different types of images.
Table 1.2 Accuracy observed for different types of images.
Type of input image                Total words   Successful   Error   Accuracy (%)
Black text with white background   118           109          9       92
Colored text                       36            26           10      72
The future scope of the OCR Tesseract text extraction model is very promising, with a wide range of potential applications and opportunities for further development. Some of the potential areas for future growth and expansion include: