Deep Learning Techniques for Automation and Industrial Applications
Description

Deep learning algorithms and techniques are found to be useful in various areas, such as automatic machine translation, automatic handwriting generation, visual recognition, fraud detection, and detecting developmental delays in children. “Deep Learning Techniques for Automation and Industrial Applications” presents a concise introduction to the recent advances in this field of artificial intelligence (AI). The broad-ranging discussion covers the algorithms and applications in AI, reasoning, machine learning, neural networks, reinforcement learning, and their applications in various domains like agriculture, manufacturing, and healthcare. Applying deep learning techniques or algorithms successfully in these areas requires a concerted effort, fostering integrative research between experts from diverse disciplines, from data science to visualization.

This book provides state-of-the-art approaches to deep learning covering detection and prediction, as well as future framework development, building service systems, and analytical aspects. For all these topics, various approaches to deep learning, such as artificial neural networks, fuzzy logic, genetic algorithms, and hybrid mechanisms, are explained.

Audience

The book will be useful to researchers and industry engineers working in information technology, data analytics, network security, and manufacturing. Graduate and upper-level undergraduate students in advanced modeling and simulation courses will also find this book very useful.


Table of Contents

Cover

Table of Contents

Series Page

Title Page

Copyright Page

Preface

1 Text Extraction from Images Using Tesseract

1.1 Introduction

1.2 Literature Review

1.3 Development Areas

1.4 Existing System

1.5 Enhancing Text Extraction Using OCR Tesseract

1.6 Unified Modeling Language (UML) Diagram

1.7 System Requirements

1.8 Testing

1.9 Result

1.10 Future Scope

1.11 Conclusion

References

2 Chili Leaf Classification Using Deep Learning Techniques

2.1 Introduction

2.2 Objectives

2.3 Literature Survey

2.4 About the Dataset

2.5 Methodology

2.6 Result

2.7 Conclusion and Future Work

References

3 Fruit Leaf Classification Using Transfer Learning Techniques

3.1 Introduction

3.2 Literature Review

3.3 Methodology

3.4 Conclusion and Future Work

References

4 Classification of University of California (UC), Merced Land-Use Dataset Remote Sensing Images Using Pre-Trained Deep Learning Models

4.1 Introduction

4.2 Motivation and Contribution

4.3 Methodology

4.4 Experiments and Results

4.5 Conclusion

References

5 Sarcastic and Phony Contents Detection in Social Media Hindi Tweets

5.1 Introduction

5.2 Literature Review

5.3 Research Gap

5.4 Objective

5.5 Proposed Methodology

5.6 Expected Outcomes

References

6 Removal of Haze from Synthetic and Real Scenes Using Deep Learning and Other AI Techniques

6.1 Introduction

6.2 Formation of a Haze Model

6.3 Different Techniques of Single-Image Dehazing

6.4 Results and Discussions

6.5 Output for Synthetic Scenes

6.6 Output for Real Scenes

6.7 Conclusions

References

7 HOG and Haar Feature Extraction-Based Security System for Face Detection and Counting

7.1 Introduction

7.2 Literature Survey

7.3 Proposed Work

7.4 Experiments and Results

7.5 Conclusion and Scope of Future Work

References

8 A Comparative Analysis of Different CNN Models for Spatial Domain Steganalysis

8.1 Introduction

8.2 General Framework

8.3 Experimental Results and Analysis

8.4 Conclusion and Discussion

Acknowledgments

References

9 Making Invisible Bluewater Visible Using Machine and Deep Learning Techniques–A Review

9.1 Introduction

9.2 Determination of Groundwater Potential (GWP) Parameters

9.3 GWP Determination: Methods and Techniques

9.4 GWP Output: Applications

9.5 GWP Research Gaps: Future Research Areas

9.6 Conclusion

References

10 Fruit Leaf Classification Using Transfer Learning for Automation and Industrial Applications

10.1 Introduction

10.2 Data Collection and Preprocessing

10.3 Loading a Pre-Trained Model for Fruit Leaf Classification Using Transfer Learning

10.4 Training and Evaluation

10.5 Applications in Automation and Industry

10.6 Conclusion

10.7 Future Work

References

11 Green AI: Carbon-Footprint Decoupling System

11.1 Introduction

11.2 CO2 Emissions in Sectors

11.3 Heating and Cooking Emissions

11.4 Automobile Systems Emission

11.5 Power Systems Emission

11.6 Total CO2 Emission

11.7 Green AI With a Control Strategy of Carbon Emission

11.8 Green Software

11.9 Conclusion

11.10 Future Scope and Limitation

References

12 Review of State-of-Art Techniques for Political Polarization from Social Media Network

12.1 Introduction

12.2 Political Polarization

12.3 State-of-the-Art Techniques

12.4 Literature Survey

12.5 Conclusion

References

13 Collaborative Design and Case Analysis of Mobile Shopping Apps: A Deep Learning Approach

13.1 Introduction

13.2 Personalized Interaction Design Framework for Mobile Shopping

13.3 Case Analysis

13.4 Conclusions

References

14 Exploring the Potential of Machine Learning and Deep Learning for COVID-19 Detection

14.1 Introduction

14.2 Supervised Learning Techniques

14.3 Unsupervised Learning Techniques

14.4 Deep Learning Techniques

14.5 Reinforcement Learning Techniques

14.6 Comparison of Machine Learning and Deep Learning Techniques

14.7 Challenges and Limitations

14.8 Conclusion and Future Directions

References

Index

End User License Agreement

List of Tables

Chapter 1

Table 1.1 Precision.

Table 1.2 Accuracy observed for different types of images.

Chapter 2

Table 2.1 The training accuracy, training loss, validation accuracy, and valid...

Table 2.2 The training accuracy, training loss, validation accuracy, and valid...

Chapter 6

Table 6.1 Ciede2000 for synthetic and real scenes (Figure 6.4 and Figure 6.5).

Table 6.2 SSIM and PSNR values for synthetic scenes (Figure 6.4).

Table 6.3 Average Ciede2000 values for real scene dataset (45 images).

Table 6.4 Average SSIM and PSNR values for synthetic scenes for outdoor haze d...

Chapter 7

Table 7.1 Confusion matrix of individual counting via face detection.

Chapter 8

Table 8.1 Experimental setup.

Table 8.2 Setting of parameters for the pretrained models.

Table 8.3 Validation and testing accuracies for all the CNN models used.

Chapter 9

Table 9.1 Research questions associated with GWP parameters.

Table 9.2 Research questions associated with GWP methods using RS-GIS and IoT.

Table 9.3 Research questions associated with GWP methods using AIML techniques...

Table 9.4 Research questions associated with GWP results and applications.

Table 9.5 Research questions associated with future direction in GWP research.

Chapter 10

Table 10.1 Case studies of fruit leaf classification in industry using transfe...

Chapter 11

Table 11.1 Country-wise code and total consumption of energy in metric tons.

Table 11.2 Washington, DC: World Resources Institute. Available: https://www.c...

Chapter 12

Table 12.1 The pros and cons of hybrid and single techniques.

List of Illustrations

Chapter 1

Figure 1.1 Methodology.

Figure 1.2 UML diagram.

Figure 1.3 Architecture.

Chapter 2

Figure 2.1 Example of a healthy leaf.

Figure 2.2 Example of an unhealthy leaf.

Figure 2.3 Methodologies used in this chapter.

Figure 2.4 Training dataset.

Figure 2.5 Image after data augmentation is applied.

Figure 2.6 Graph displaying the results of the CNN model.

Chapter 3

Figure 3.1 A flow chart diagram of the methodology.

Figure 3.2 Sample images from the dataset categorized into diseased and health...

Figure 3.3 Images after data augmentation (horizontal and vertical flip).

Figure 3.4 Accuracy and loss graph: CNN.

Figure 3.5 Accuracy and loss graph: ResNet50.

Figure 3.6 Accuracy and loss graph: InceptionV3.

Figure 3.7 Accuracy and loss graph: VGG19.

Chapter 4

Figure 4.1 Methodology of classification.

Figure 4.2 UC Merced dataset images.

Figure 4.3 Sample way of expanding various deep architectures.

Figure 4.4 Accuracy and loss plots’ comparison of the VGG family.

Figure 4.5 Accuracy and loss plots of ResNet50.

Figure 4.6 Accuracy and loss plots of ResNet101.

Figure 4.7 Accuracy and loss plot of ResNet152.

Figure 4.8 Accuracy and loss plots of MobileNet.

Figure 4.9 Accuracy and loss plots of Inception.

Figure 4.10 Accuracy and loss plots of Xception.

Figure 4.11 Accuracy and loss plots of the DenseNet family.

Figure 4.12 Accuracy and loss plots of NasNet.

Figure 4.13 (a–h) Accuracy and loss plots of the EfficientNet family.

Figure 4.13 (i–p) Accuracy and loss plots of the EfficientNet family.

Figure 4.14 (a–h) Accuracy and loss plots of the EfficientNetV2 family.

Figure 4.14 (i–n) Accuracy and loss plots of the EfficientNetV2 family.

Figure 4.15 Comprehensive analysis of various deep learning models.

Chapter 5

Figure 5.1 Sample sarcastic Hindi tweet.

Figure 5.2 Block diagram of sarcasm detection on Twitter.

Figure 5.3 Feature classification for sarcasm detection.

Figure 5.4 Word2Vec word embedding.

Chapter 6

Figure 6.1 Scattering model.

Figure 6.2 Color attenuation prior flowchart.

Figure 6.3 Encoder layer and decoder layer of the CNN.

Figure 6.4 Synthetic scenes using different techniques.

Figure 6.5 Real hazy scenes using different techniques.

Chapter 7

Figure 7.1 Approaches for face detection.

Figure 7.2 Process of HOG feature extraction.

Figure 7.3 Individual counting via face detection.

Figure 7.4 Individual counting via face detection.

Chapter 8

Figure 8.1 General architecture of pretrained models for steganalysis used in ...

Figure 8.2 The square convolutional kernel.

Figure 8.3 XuNet architecture.

Figure 8.4 VGGNet architecture [40].

Figure 8.5 ResNet-50 architecture [41].

Figure 8.6 EfficientNet-B0 architecture [36].

Figure 8.7 Training and validation accuracies’ plots of all the CNN models use...

Figure 8.8 Plot of accuracies of XuNet without the K5 preprocessing filter.

Chapter 9

Figure 9.1 Groundwater influencing parameters.

Figure 9.2 GWP influencing parameters and potential data sources.

Figure 9.3 Thematic layers and sample delineated groundwater storage and recha...

Chapter 10

Figure 10.1 Common pre-trained models used in transfer learning.

Figure 10.2 Architecture of VGG-16 Model [10].

Figure 10.3 ResNet-50 architecture [12].

Figure 10.4 An inception module [14].

Figure 10.5 Steps to load a pre-trained model.

Figure 10.6 The training process.

Figure 10.7 The evaluation process.

Figure 10.8 Metrics for measuring model performance.

Chapter 11

Figure 11.1 Artificial intelligence conferences.

Figure 11.2 Map of CO2 emission.

Figure 11.3 Transform column types.

Figure 11.4 Relation between CO2 emissions in India and renewable energy data.

Figure 11.5 Count of C_id by country related to the table.

Chapter 12

Figure 12.1 DNNs with mandatory parameter sharing for MTL.

Figure 12.2 Soft parameter sharing for MTL in DNN.

Chapter 13

Figure 13.1 Modular interactive information architecture of a shopping app.

Figure 13.2 Principle interaction path.

Figure 13.3 Linear interactive path.

Figure 13.4 Web page and app of a sample site.

Figure 13.5 Display interface of the eBay app.

Figure 13.6 User interface of a shopping app.

Figure 13.7 Second-level page design of different topics of an app.

Chapter 14

Figure 14.1 Comparison between machine learning and deep learning.

Figure 14.2 Supervised learning.

Figure 14.3 Logistic regression.

Figure 14.4 Decision trees.

Figure 14.5 Support vector machines.

Figure 14.6 Naive Bayes.

Figure 14.7 K-Nearest neighbors.

Figure 14.8 Unsupervised learning.

Figure 14.9 Clustering.

Figure 14.10 Reinforcement learning techniques.

Guide

Cover Page

Table of Contents

Series Page

Title Page

Copyright Page

Preface

Begin Reading

Index

WILEY END USER LICENSE AGREEMENT


Scrivener Publishing, 100 Cummings Center, Suite 541J, Beverly, MA 01915-6106

Publishers at Scrivener: Martin Scrivener ([email protected]) and Phillip Carmical ([email protected])

Deep Learning Techniques for Automation and Industrial Applications

Edited by

Pramod Singh Rathore

Sachin Ahuja

Srinivasa Rao Burri

Ajay Khunteta

Anupam Baliyan

and

Abhishek Kumar

This edition first published 2024 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA.

© 2024 Scrivener Publishing LLC

For more information about Scrivener publications please visit www.scrivenerpublishing.com.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

Wiley Global Headquarters, 111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.

Library of Congress Cataloging-in-Publication Data

ISBN 978-1-394-23424-0

Cover image: Pixabay.com
Cover design by Russell Richardson

Preface

Artificial intelligence is the fastest-growing field in computer science. Deep learning algorithms and techniques are found to be useful in different areas, such as automatic machine translation, automatic handwriting generation, visual recognition, fraud detection, and detecting developmental delays in children. Deep Learning Techniques for Automation and Industrial Applications presents a concise introduction to the recent advances in the field of artificial intelligence (AI). The broad-ranging discussion herein covers the algorithms and applications in the areas of AI, reasoning, machine learning, neural networks, reinforcement learning, and their applications in various domains like agriculture and healthcare. Applying deep learning techniques or algorithms successfully in these areas requires a concerted effort, fostering integrative research between experts from diverse disciplines, ranging from data science to visualization.

This book provides state-of-the-art approaches to deep learning in these areas. It covers detection and prediction, as well as future framework development, building service systems, and analytical aspects. For all these topics, various approaches to deep learning, such as artificial neural networks, fuzzy logic, genetic algorithms, and hybrid mechanisms, are explained.

The successful application of deep learning techniques to enable meaningful, cost-effective, personalized cloud security service is a primary and current goal. However, realizing this goal requires effective understanding, application, and amalgamation of deep learning and several other computing technologies to deploy such a system effectively. This book helps to clarify certain key mechanisms of technology to realize a successful system. It enables the processing of very large datasets to help with precise and comprehensive forecasts of risk and delivers recommended actions that improve outcomes for consumers. This is a novel application domain of deep learning and is of prime importance to all of human civilization.

Preparing both undergraduate and graduate students for advanced modeling and simulation courses, this book helps them to carry out effective simulation studies. In addition, graduate students will be able to comprehend and conduct AI and data mining research after completing this book.

The book comprises fourteen chapters. Chapter 1 explains how images play a crucial role in describing, representing, and conveying information, which aids in human productivity, cost, analysis, and other areas. Text extraction is the method used to convert text embedded in images into plain text. It is a very challenging problem because of the many variations in text size, orientation, and alignment, as well as low-resolution or pixelated images, noisy backgrounds, etc. Using the Tesseract OCR engine, the chapter aims to reduce these issues. Tesseract, developed by Google, is an open-source optical character recognition (OCR) engine. OCR technology allows computers to recognize text in images, making it possible to convert images of text into machine-readable text. Tesseract has been trained on a wide variety of languages and scripts, including English, Chinese, and Arabic.

Chapter 2 addresses agriculture, a vital part of everyone’s life. Crops and plants play a huge role in sustaining life, and taking care of them is both important and difficult. To detect disease in plants, this chapter demonstrates how to tell whether a plant is healthy or not. The dataset consists of chili plant leaves collected from a field located in Andhra Pradesh, India. Several image classification models, as well as transfer learning models, are applied: a deep learning CNN and transfer learning models, such as InceptionV3 and VGG16, each with and without data augmentation.

In Chapter 3, researchers have applied various deep learning and transfer learning methods to accurately predict the disease of a damaged plant so that it can be treated at an early stage. The models are trained on an image dataset containing various categories of plants, such as mango and pomegranate. The results show that ResNet outperformed Inception, VGG19, and CNN, giving an accuracy of 88% and 87.5% for pomegranate and mango, respectively.

In Chapter 4, researchers have compared different deep learning-based classification techniques on a remote sensing image dataset. The dataset has been taken from the UC Merced Land Use Dataset, which contains a total of 21 classes, with every class consisting of 100 images of size 256 × 256. The models used in this study are VGG, ResNet, Inception, DenseNet, and EfficientNet, which are deep convolutional network architectures for image classification with different numbers of layers. To make meaningful comparisons, all models were extended by adding three layers at the end to improve their performance. The performance of the VGG19 model was found to be superior: it classified almost all images belonging to the 21 classes, with an accuracy of 100% on training data and 95.07% on testing data, followed by VGG16 with 93% and ResNet with 91% accuracy on testing data.

Chapter 5 deals with sarcasm. Sarcasm is a sardonic or bitter remark intended to express disrespect or ridicule. In Hindi, it often originates from idioms and proverbs and is frequently indirect: an idiom may carry a negative literal sentiment while its real intention is, for example, to call an uneducated person knowledgeable among a group of fools. Sentiment categorization is therefore easier than sarcasm detection. In the present scenario, people on social media platforms like Twitter, Facebook, and WhatsApp succeed in recognizing sarcasm despite interacting with strangers across the world. Sarcasm detection is a challenging task in Natural Language Processing due to the richness of Hindi morphology, and detecting sarcasm in Hindi-language tweets is a prime task for avoiding the misconstrual of sarcastic statements as literal ones.

Chapter 6 explains how images are the main source of data for image processing fields like surveillance, detection, recognition, and satellite imaging. Good visibility in images captured by sensors is crucial for all computer vision tasks. Sometimes scene quality is degraded by bad weather conditions like haze, fog, or smoke, making it difficult for computer vision systems to obtain the actual information. Haze can be removed from a single input scene by using dehazing methods. Synthetic haze can be created by a haze generator, and, currently, most image dehazing techniques are applied to synthetic haze. Various single-image dehazing techniques are being developed and tested on real-world scenes captured in hazy environments using cameras.

Chapter 7 demonstrates how the framework needs accurate, real-time performance to count how many people are present at a particular moment in a particular frame. The counting framework automatically detects each person’s face and makes instantaneous decisions to count the number of persons in front of the camera or within a set of images. Individual counting is done in two broad stages: the first is the detection of faces; the second is the counting approach used to track and count people within a frame.

Chapter 8 describes how CNNs can resemble traditional steganalysis by using filters for feature extraction. Due to the use of content-adaptive steganographic methods, the stego message is hidden mostly in the complex areas of the image and thus cannot be detected with a simple statistical analysis of the image. The stego information in these steganographic methods affects the dependencies between the pixels introduced through various kinds of noise present in the image; thus, the difference between the cover and stego image is identified through the noise component rather than the image content. Different researchers have used various preprocessing filters to calculate the noise residuals and pass them to the CNN instead of the images directly. This work employs a content-adaptive steganography method, Highly Undetectable Steganography (HUGO), for creating stego images. Furthermore, it provides a comparative analysis of a CNN model variant designed specifically for steganalysis against various pre-trained computer vision models applied to steganalysis.

Chapter 9 discusses how groundwater abstraction beyond the safe limit is causing a rapid groundwater table depletion at the rate of 1–2 m/year in many districts. Uncontained and unplanned usage may affect food production by 20 percent. Due to the significant impact of this imperceptible resource on various aspects of life, the economy, the environment, and society, there is a pressing need to enhance the scientific comprehension, estimation, and administration of groundwater management. A scientific framework for the demarcation of its potential storage and recharge zonal maps, i.e., GWPSZ and GWRZ, can be instrumental in this regard for urban and rural water committees to objectively manage the resources at the regional level.

Chapter 10 explains the process of using pre-trained models to classify different types of fruit leaves accurately. We also discuss the advantages of transfer learning for industrial applications, including improved accuracy, reduced training time, and better utilization of resources. We provide code examples and practical guidance for implementing transfer learning using popular deep learning frameworks like TensorFlow. By the end of this chapter, readers will have a good understanding of how to use transfer learning for fruit leaf classification and how it can be applied in industrial settings.

Chapter 11 reveals that the carbon footprint of deep learning computation proves to be rather high. Deep learning research can be challenging for academics, students, and researchers, especially those from emerging economies, due to the financial expense of these computations. In addition to accuracy and related metrics, productivity is included as an evaluation criterion in this chapter’s proposed practical solution. The objective is to create green AI that significantly outperforms red AI in terms of performance, and to promote green AI by reducing its environmental impact.

Chapter 12 explains how networking services have modified the methods and scale of cyberspace communication. In the past decade, social networks have gained significant attention. The Internet and Web 2.0 applications are making social network sites like Twitter, Facebook, LinkedIn, and Google+ more affordable to access. People are becoming more interested in information, news, and opinion on a wide range of issues and are therefore more reliant on social networks.

Chapter 13 engages in an in-depth analysis of the associated ideas and challenges that arise in the process of shopping app interface design. It further demonstrates the functional simplicity, ease of use, and effectiveness that is centered on the experience of the user and built on an interactive model, via classical instances and practical design projects. These theories, together with associated ideas in design psychology, behavior, design aesthetics, and ergonomics, are then used to thoroughly study mobile shopping app interaction design. The chapter presents mobile shopping apps via the process of “global-local-global” for repeated personalized interactive design, and it gives useful recommendations and ideas for the construction of mobile and vertical shopping apps. Additionally, the chapter includes analysis and data collection from practical instances.

Chapter 14 presents an extensive review of the application of machine learning and deep learning methods in detecting COVID-19. It emphasizes the significance of early and accurate diagnosis in effectively managing the disease during the COVID-19 pandemic. The chapter encompasses various aspects of machine learning and deep learning techniques, including supervised and unsupervised learning, convolutional neural networks, recurrent neural networks, reinforcement learning, and a comparative analysis of machine learning and deep learning methods for COVID-19 detection.

We are deeply grateful to everyone who helped with this book and greatly appreciate the dedicated support and valuable assistance rendered by Martin Scrivener and the Scrivener Publishing team during its publication.

Pramod Singh Rathore

Department of Computer and Communication Engineering, Manipal University Jaipur, Rajasthan, India

Dr. Sachin Ahuja

Department of Computer Science, Chandigarh University, India

Srinivasa Rao Burri

Senior Software Engineering Manager, Western Union, Denver, CO

Dr. Ajay Khunteta

Department of Computer Science and Engineering, Poornima University, Jaipur, India

Dr. Anupam Baliyan

Dean of Academic Planning and Research, Galgotias University, India

Dr. Abhishek Kumar

Faculty of Engineering, Manipal University, Jaipur, India

1 Text Extraction from Images Using Tesseract

Santosh Kumar*, Nilesh Kumar Sharma, Mridul Sharma and Nikita Agrawal

Department of Computer Science, Global Institute of Technology, Jaipur, India

Abstract

Images play a crucial role in describing, representing, and conveying information, which is essential in many businesses and organizations. Text extraction from images is the systematic procedure of converting textual material embedded in images into a plain text format. The task presents a significant difficulty because of the many variations in the size, orientation, and alignment of the text. Furthermore, low-resolution, pixelated pictures and noisy backgrounds exacerbate the complexity of text extraction. Using the Tesseract OCR engine, we aim to reduce these issues in this project.

Tesseract is developed by Google and is an open-source optical character recognition (OCR) engine. OCR technology allows computers to recognize text in images, making it possible to convert images of text into machine-readable text. Tesseract has been trained in a wide variety of languages and scripts, including English, Chinese, and Arabic.

Tesseract can process images that are rotated, tilted, or skewed and can recognize text written in different scripts, such as English and Arabic. It uses machine learning algorithms to improve its recognition accuracy over time, making it suitable for use in a wide range of applications, including document scanning, archiving, and indexing.

Keywords: Text extraction, LSTM, image text, OCR, Tesseract

1.1 Introduction

Text extraction, also known as optical character recognition (OCR), is the process of automatically extracting text from an image or scanned document and converting it into machine-readable text that can be used for indexing, searching, editing, or storing the document.

OCR software uses advanced algorithms and machine learning techniques to identify and recognize characters and words within the document. The process of text extraction typically involves the following steps:

Pre-processing: In this step, the image or scanned document is prepared for OCR by optimizing its quality. Techniques such as noise reduction, image enhancement, and contrast adjustment are used to improve image readability and reduce errors during OCR.

Layout analysis: The OCR software analyzes the layout of the document to identify the text areas, headings, and other features.

Optical character recognition: In this step, the OCR software reads and recognizes the characters in the document and translates them into machine-readable text.

Post-processing: In this step, the recognized text is refined to correct errors and improve its accuracy. Techniques such as spell-checking, grammar correction, and formatting are used to improve the quality of the output [1, 2].

OCR software, such as Google’s open-source OCR engine Tesseract, makes the aforementioned steps possible. Tesseract uses advanced machine learning techniques to improve its recognition accuracy over time and can recognize text in multiple languages and scripts.

The following are the steps involved in text extraction using OCR software:

Scanning or importing the image or document to be processed

Preprocessing the image or document by adjusting its quality, orientation, and size

Performing layout analysis to identify the text areas, headings, and other features

Running OCR on the document to extract the text

Post-processing the recognized text to correct errors and improve its accuracy and formatting

Saving the extracted text in a machine-readable format such as plain text or a searchable PDF.

Furthermore, by training the OCR engine on a specific dataset of languages, it is possible to improve the accuracy of OCR results [3, 4].
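To make these steps concrete, the following is a minimal sketch of the pipeline in Python. It assumes the pytesseract wrapper and OpenCV are installed; the file names sample.png and output.txt are placeholders:

    import cv2
    import pytesseract

    # Pre-processing: grayscale, denoise, and binarize to maximize
    # the contrast between text and background
    image = cv2.imread("sample.png")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    binary = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 31, 2)

    # OCR: --psm 3 asks Tesseract to perform its own layout analysis
    text = pytesseract.image_to_string(binary, lang="eng", config="--psm 3")

    # Save the extracted text in a machine-readable format
    with open("output.txt", "w", encoding="utf-8") as f:
        f.write(text)

Tesseract’s --psm (page segmentation mode) option controls how much layout analysis the engine performs, and lang selects the trained language data.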

1.1.1 Areas

Text extraction

Python

OCR

React JS

Flask

1.1.2 Why Text Extraction?

Text extraction, also known as optical character recognition (OCR), is a valuable technology because it enables the automated extraction of text from images and scanned documents. Here are some reasons why text extraction is important:

Improved efficiency: Text extraction automates the process of converting image-based text into machine-readable text. This saves time and effort compared to manual data entry, especially when dealing with large volumes of data.

Easy to search and index: Text extraction allows users to convert image-based text into searchable and indexable text. This is especially important when dealing with large collections of documents or when searching for specific information within a document.

Improved accessibility: Text extraction makes it possible to convert image-based text into machine-readable text that assistive technologies like screen readers can read. This improves accessibility for people with visual impairments.

Cost-effective: Text extraction can save organizations money by reducing manual data entry and improving the accuracy of document management processes (DMPs).

Multilingual support: OCR technology has advanced significantly in recent years and now supports the recognition of text in multiple languages and scripts. This means that text extraction can be used in various multilingual applications, such as multilingual document processing [5] or translation [6].

1.1.3 Applications of OCR

Digitization of Printed Documents: OCR can be used to scan and digitize printed documents such as books, magazines, and newspapers. The resulting digital text can then be stored, searched, and edited on a computer.

Automatic data entry: OCR can be used to automatically extract data from forms, invoices, and receipts, saving time and reducing errors compared to manual data entry.

Handwriting recognition: OCR can be used to recognize handwritten text, enabling applications such as digital note-taking or handwriting-based input for mobile devices.

Text-to-speech: OCR can be used to convert printed text into speech, enabling visually impaired individuals to access written information.

Language translation: OCR can be used to recognize text in one language and automatically translate it into another.

Automatic license plate recognition: OCR can be used to recognize license plates on vehicles, enabling applications such as automatic toll collection or traffic monitoring.

Document classification: OCR can be used to recognize the type of document being scanned, enabling automatic sorting and routing of documents.

Image indexing: OCR can be used to index and search image collections, enabling users to search for images based on the text they contain.

Historical document analysis: OCR can be used to digitize and analyze historical documents, enabling researchers to study the content and context of these documents in new ways.

Passport and ID verification: OCR can be used to read and verify the text on passports and other identification documents, improving security and reducing the risk of fraud.

1.2 Literature Review

Text extraction using optical character recognition (OCR) has been a hot research topic for several years. The advancements in OCR technology have made it possible to extract text accurately and quickly from images, which has numerous applications in various industries. In this literature review, we will explore the current state of research in text extraction using Tesseract OCR [7].

Tesseract OCR is an open-source OCR engine developed by Google that has gained popularity due to its accuracy and robustness. Several studies have been conducted to explore the effectiveness of Tesseract OCR in text extraction. One such study by Bhargava et al. (2020) compared the performance of Tesseract OCR with other popular OCR engines, such as Adobe Acrobat, ABBYY FineReader, and OmniPage [8]. The study found that Tesseract OCR outperformed the other OCR engines in terms of accuracy, speed, and ease of use.

In another study by Kumar et al. (2017), the authors proposed a novel method for text extraction using Tesseract OCR [9, 10]. The proposed method involved preprocessing the input image by applying various filters to enhance the quality of the image. The preprocessed image was then fed to the Tesseract OCR engine for text extraction. The results of the study showed that the proposed method improved the accuracy of text extraction significantly.

Moreover, several studies have focused on improving the accuracy of Tesseract OCR through machine learning techniques. For instance, in a study by M. Alzahrani et al. (2020), the authors proposed a machine learning-based approach to improve the accuracy of Tesseract OCR in recognizing handwritten text. The approach involved training a support vector machine (SVM) classifier using a dataset of handwritten characters. The trained SVM was then used to classify the characters extracted by the Tesseract OCR engine. The results showed that the proposed approach significantly improved the accuracy of Tesseract OCR in recognizing handwritten text.

Furthermore, a recent study by Bhargava et al. (2022) proposed a deep learning-based approach for text extraction using Tesseract OCR [11, 12]. The proposed approach involved training a convolutional neural network (CNN) using a dataset of images and corresponding ground truth text. The trained CNN was then used to preprocess the input image before feeding it to the Tesseract OCR engine for text extraction. The results of the study showed that the proposed approach outperformed traditional OCR engines in terms of accuracy and speed [13].

In addition to improving the accuracy of Tesseract OCR, several studies have explored its applications in various industries. For instance, in a study by S. J. Alzahrani et al. (2018), the authors proposed a method for extracting text from bank statements using Tesseract OCR. The proposed method involved preprocessing the input image by removing noise and enhancing the quality of the image. The preprocessed image was then fed to the Tesseract OCR engine for text extraction. The results showed that the proposed method was effective in extracting text from bank statements, which can be used for financial analysis and fraud detection.

Similarly, in a study by R. K. Samal et al. (2017), the authors proposed a method for extracting text from medical images using Tesseract OCR. The proposed method involved preprocessing the input image by removing artifacts and enhancing the quality of the image. The preprocessed image was then fed to the Tesseract OCR engine for text extraction. The results showed that the proposed method was effective in extracting text from medical images, which can be used for medical diagnosis and research.

In conclusion, Tesseract OCR has proven to be an effective tool for text extraction from images. Several studies have explored its effectiveness and applications in various industries. The advancements in machine learning and deep learning techniques have further improved the accuracy of Tesseract OCR in recognizing text from images. The future scope of research in text extraction using Tesseract appears promising, with potential enhancements in areas like multi-language support, complex document formatting, and real-time processing [14–17].

1.3 Development Areas

1.3.1 React JavaScript (JS)

React JS is a front-end JavaScript library used for building user interfaces.

React JS uses a component-based architecture, which makes it easy to build reusable components for the UI.

React JS can be used with other JavaScript libraries and frameworks to create powerful web applications.

1.3.2 Flask

Flask is a Python web framework used for building web applications and APIs.

It is a micro-framework, meaning that it provides the minimum set of tools needed to build a web application.

Flask is flexible and allows developers to customize and configure it to meet their needs.

It provides a built-in development server, which makes it easy to get started with building a web application.

Flask is highly extensible and can be used with various Python libraries and tools to build powerful web applications.

Here are some other differences between React JS and Flask to keep in mind:

React JS is a front-end library, while Flask is a back-end web framework.

React JS is written in JavaScript, while Flask is written in Python.

React JS is used for building the user interface, while Flask is used for handling server-side processing and database integration.

React JS is used in combination with other front-end tools and libraries, while Flask is used with other Python tools and libraries.

In terms of using React JS and Flask together, we can build a web application where React JS handles the front-end user interface and Flask handles the back-end server-side processing. This can be done by creating a REST API in Flask and making API calls to it from React JS. Alternatively, Flask can serve the built React application as static files and expose its API endpoints alongside them, so the whole application is delivered from a single server.
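As an illustration of this division of labor, here is a minimal sketch of a Flask endpoint that a React JS front end could send an image to; the route name /api/extract and the form field name image are illustrative choices, not fixed conventions:

    from flask import Flask, request, jsonify
    from PIL import Image
    import pytesseract

    app = Flask(__name__)

    @app.route("/api/extract", methods=["POST"])
    def extract_text():
        # The React client sends the image as multipart form data under "image"
        file = request.files.get("image")
        if file is None:
            return jsonify({"error": "no image uploaded"}), 400
        text = pytesseract.image_to_string(Image.open(file.stream))
        return jsonify({"text": text})

    if __name__ == "__main__":
        app.run(debug=True)

On the React JS side, the client would place the chosen file in a FormData object and POST it to this route with fetch or a similar HTTP client.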

1.4 Existing System

The automatic conversion of text in an image into letter codes that can be used in computer and text-processing applications is known as offline handwriting recognition. This process captures a static representation of handwriting. Due to variations in handwriting styles, offline handwriting recognition is a challenging task. Currently, OCR engines primarily focus on machine-printed text, and Intelligent Character Recognition (ICR) on hand-printed text (written in capital letters). Charles et al. have presented a comprehensive review of various techniques used in the design of OCR systems. The paper discusses slow techniques that provide accurate results and fast techniques that provide less accurate results. Ray Kurzweil developed the first omni-font OCR software in 1974, allowing recognition of text in almost any font. This software utilized an advanced matrix method (pattern matching) to compare bitmaps of the template character with the bitmaps of the read character and determine the closest match [18–20].

Figure 1.1 Methodology.

However, the existing system has several disadvantages. Firstly, it is sensitive to variations in sizing, which affect the accuracy of the recognition process. Secondly, individual differences in handwriting styles pose a challenge to the system. Finally, the system produces inefficient, low-confidence results, which limits its effectiveness in practical applications, as described in Figure 1.1 [21].

1.5 Enhancing Text Extraction Using OCR Tesseract

Text extraction using OCR Tesseract can be a complex task, and achieving 98% accuracy requires careful consideration of several factors, including image preprocessing, feature extraction, model selection, and post-processing. Here are some steps we can follow to design an optimized algorithm for text extraction using OCR Tesseract:

Image preprocessing: The first step is to preprocess the input image to enhance the quality of the text. This can include operations such as noise reduction, contrast adjustment, and binarization. The goal is to create an image that maximizes the contrast between the text and the background while minimizing noise and artifacts.

Region of interest (ROI) detection: The next step is to detect the regions of the image that contain text. This can be done using techniques such as connected component analysis, contour detection, or object recognition. The goal is to isolate the regions of the image that contain text and remove any non-text regions (see the code sketch after this list).

Text segmentation: Once the regions of interest have been detected, the next step is to segment the text into individual characters or words. This can be done using techniques such as morphological operations, edge detection, or machine learning-based segmentation.

Feature Extraction: The next step is to extract features from the segmented text regions. This can include features such as character shape, size, and orientation. The goal is to create a set of features that can be used to distinguish between different characters and improve the accuracy of the OCR algorithm.

Model Selection: The next step is to select a suitable OCR model. Tesseract is a popular open-source OCR engine that can be trained on a variety of datasets. The model selection depends on the specific requirements of the project, such as the language of the text, the font style, and the image quality.

Post-Processing: The final step is to post-process the OCR output to correct any errors and improve the accuracy.
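As a rough sketch of the preprocessing and ROI-detection steps above, the following uses OpenCV; the threshold method, kernel size, and minimum area are illustrative values that would need tuning for a given dataset:

    import cv2

    def find_text_regions(path, min_area=100):
        """Return bounding boxes (x, y, w, h) of likely text regions."""
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        # Otsu binarization, inverted so text pixels become white
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        # A wide rectangular kernel merges neighboring characters into word blobs
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
        dilated = cv2.dilate(binary, kernel, iterations=1)
        contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        # Discard contours too small to plausibly contain text
        return [cv2.boundingRect(c) for c in contours
                if cv2.contourArea(c) >= min_area]

Each returned box can then be cropped and passed to the OCR engine individually.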

1.6 Unified Modeling Language (UML) Diagram

1.6.1 Use Case Diagram

A use case diagram is a visual representation of the interactions between a public user and the system itself, showcasing the various use cases or functionalities that the system provides. Figure 1.2 helps to illustrate the high-level functional requirements of the system and how the different actors interact with it.

Figure 1.2 UML diagram.

1.6.2 Model Architecture

Figure 1.3 depicts the model architecture. Table 1.1 reports the precision observed for each input type, with type and accuracy attributes.

Figure 1.3 Architecture.

Table 1.1 Precision.

Type                       Accuracy
Machine Printed            90%
Handwritten (consistent)   76%

1.6.3 Pseudocode

1. Preprocess the input image to enhance the quality of the text:
   - Convert the image to grayscale.
   - Apply Gaussian blur to remove noise and adaptive thresholding to enhance contrast.
2. Detect the regions of interest (ROI) that contain text:
   - Apply morphological operations to remove noise.
   - Find contours and filter out non-text regions.
3. Segment the text into individual characters or words:
   - Apply dilation and erosion to close gaps between characters.
   - Find individual characters or words using bounding boxes.
4. Extract features from the segmented text regions, such as character shape, size, and orientation.
5. Select a suitable OCR model, such as Tesseract, based on the specific requirements of the project, and initialize the OCR engine with the selected model.
6. Recognize the text by applying the OCR engine to each segmented region.
7. Post-process the OCR output to correct errors and improve accuracy:
   - Apply spell-checking to correct spelling errors.
   - Apply language model-based correction to correct grammatical errors.
   - Apply voting-based consensus to improve accuracy (a sketch of this step follows the list).
8. Generate the final output in a machine-readable format such as plain text, HTML, or XML.
9. Provide a user interface to enable users to interact with the system.
10. Maintain the system to ensure optimal performance, updating the OCR model and software as needed.
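The spell-checking part of the post-processing step (step 7 above) can be sketched with Python’s standard library alone; the short WORD_LIST here is a stand-in for a full dictionary, and the similarity cutoff is an illustrative value:

    import difflib
    import re

    WORD_LIST = ["tesseract", "extraction", "text", "image", "engine"]

    def correct_word(word, cutoff=0.6):
        # Replace an OCR-garbled word with its closest dictionary match, if close enough
        if word.lower() in WORD_LIST:
            return word
        match = difflib.get_close_matches(word.lower(), WORD_LIST, n=1, cutoff=cutoff)
        return match[0] if match else word

    def postprocess(text):
        # Correct alphabetic tokens; pass punctuation and whitespace through untouched
        return "".join(correct_word(tok) if tok.isalpha() else tok
                       for tok in re.findall(r"[A-Za-z]+|[^A-Za-z]+", text))

    print(postprocess("texl extracled by tesseracl"))  # close misspellings are repaired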

1.7 System Requirements

1.7.1 Software Requirements

React JS

Flask

Text editor/Integrated Development Environment (IDE)

FTP client

Web browser

Graphics editor (Nvidia)

1.7.2 Hardware Requirements

Processor: Intel Pentium IV 2.0 GHz and above
RAM: 512 MB and above
Hard disk: 80 GB and above
Monitor: CRT or LCD monitor
Keyboard: Normal or multimedia
Mouse: Compatible mouse

1.8 Testing

1.9 Result

Table 1.2 shows the accuracy observed for different types of images.

Table 1.2 Accuracy observed for different types of images.

Type of input image                Total words   Successful   Error   Accuracy (%)
Black text with white background   118           109          9       92
Colored text                       36            26           10      72

1.10 Future Scope

The future scope of the OCR Tesseract text extraction model is very promising, with a wide range of potential applications and opportunities for further development. Some of the potential areas for future growth and expansion include: