This book is a detailed reference guide on deep learning and its applications. It aims to provide a basic understanding of deep learning and its different architectures that are applied to process images, speech, and natural language. It explains basic concepts and many modern use cases through fifteen chapters contributed by computer science academics and researchers. By the end of the book, the reader will become familiar with different deep learning approaches and models, and understand how to implement various deep learning algorithms using multiple frameworks and libraries.
This book is divided into three parts. The first part explains the basic working principles, history, evolution, and challenges associated with deep learning. The basic mathematical concepts and hardware requirements for deep learning implementation, along with some of its popular frameworks for medical applications, are also covered.
The second part is dedicated to sentiment analysis using deep learning and machine learning techniques. This section of the book covers the experimentation and application of deep learning techniques and architectures in real-world applications. It details the salient approaches, issues, and challenges in building ethically aligned machines. An approach inspired by traditional Eastern thought and wisdom is also presented.
The final part covers artificial intelligence approaches used to explain machine learning models, enhancing their transparency for the benefit of users. This section includes a review and detailed description of the use of knowledge graphs in generating explanations for black-box recommender systems, as well as a review of ethical system design and a model for sustainable education. An additional chapter demonstrates how a semi-supervised machine learning technique can be used for cryptocurrency portfolio management.
The book is a timely reference for academicians, professionals, researchers and students at engineering and medical institutions working on artificial intelligence applications.
This is an agreement between you and Bentham Science Publishers Ltd. Please read this License Agreement carefully before using the ebook/echapter/ejournal (“Work”). Your use of the Work constitutes your agreement to the terms and conditions set forth in this License Agreement. If you do not agree to these terms and conditions then you should not use the Work.
Bentham Science Publishers agrees to grant you a non-exclusive, non-transferable limited license to use the Work subject to and in accordance with the following terms and conditions. This License Agreement is for non-library, personal use only. For a library / institutional / multi user license in respect of the Work, please contact: [email protected].
Bentham Science Publishers does not guarantee that the information in the Work is error-free, or warrant that it will meet your requirements or that access to the Work will be uninterrupted or error-free. The Work is provided "as is" without warranty of any kind, either express or implied or statutory, including, without limitation, implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the results and performance of the Work is assumed by you. No responsibility is assumed by Bentham Science Publishers, its staff, editors and/or authors for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, advertisements or ideas contained in the Work.
In no event will Bentham Science Publishers, its staff, editors and/or authors, be liable for any damages, including, without limitation, special, incidental and/or consequential damages and/or damages for lost data and/or profits arising out of (whether directly or indirectly) the use or inability to use the Work. The entire liability of Bentham Science Publishers shall be limited to the amount actually paid by you for the Work.
Bentham Science Publishers Pte. Ltd. 80 Robinson Road #02-00 Singapore 068898 Singapore Email: [email protected]
Machine Learning has proved its usefulness in many applications in Image Processing and Computer Vision, Medical Imaging, Satellite Imaging, Remote Sensing, Surveillance, etc., over the past decade. At the same time, Machine Learning, and particularly Artificial Neural Networks, has evolved and demonstrated excellent performance over traditional machine learning algorithms. These methods are known as Deep Learning.
Nowadays, Deep Learning has become the researcher's first choice over traditional machine learning due to its state-of-the-art performance on speech, image, and text processing. Deep learning algorithms provide efficient solutions to problems ranging from image and speech processing to text processing. Research on deep learning is enriched day by day as new learning models emerge.
Deep learning models have significantly impacted the speech, image, and text domains and raised the performance bar substantially in many standard evaluations. Moreover, new challenges that older systems could not have handled are now easily tackled using deep learning. However, it is challenging to comprehend, let alone guide, the learning process in deep neural networks; there is an air of uncertainty about exactly what and how these networks learn.
This book aims to provide the audience with a basic understanding of deep learning and its different architectures. Background knowledge of machine learning helps explore various aspects of deep learning. By the end of the book, I hope that the reader understands different deep learning approaches, models, pre-trained models, and gains familiarity with implementing various deep learning algorithms using multiple frameworks and libraries.
Machine Learning has proved its usefulness in many applications in the domains of Image Processing and Computer Vision, Medical Imaging, Satellite Imaging, Remote Sensing, Surveillance, etc., over the past decade. At the same time, Machine Learning methods themselves have evolved, particularly deep learning methods, which have demonstrated significant performance gains over traditional machine learning algorithms.
Today, Deep Learning has become researchers' first choice over traditional machine learning due to its state-of-the-art performance in many applications in the domains of speech, image, and text processing. Deep learning algorithms provide efficient solutions to problems ranging from vision and speech to text processing. Research on deep learning is enriched day by day as new learning models emerge.
This book contains three major parts. Part one covers the fundamentals, theory, and architectures of Deep Learning. It provides a detailed description of the theory, frameworks, and non-conventional approaches to deep learning, covers the foundational mathematics essential for understanding these frameworks, and describes the various kinds of models found in practice.
Chapter 1 contains the basic working principles, history, evolution, and challenges associated with deep learning. We also cover some basic mathematical concepts, the hardware requirements for deep learning implementation, and some of its popular software frameworks. We start with neural networks, focusing on their basics, including input/output layers, hidden layers, and how networks learn through forward and backpropagation. We also cover standard multilayer perceptron networks and their building blocks, and include a review of machine learning concepts in general and deep learning in particular to build a foundation for this book. Chapters 2–7 are based on applying artificial intelligence to medical images with various deep learning approaches. These chapters also cover the application of Deep Learning in lung cancer detection, medical imaging, and COVID-19 analysis.
The second part, Chapters 8–10, is dedicated to sentiment analysis using deep learning and machine learning techniques. This section of the book covers the experimentation and application of deep learning techniques and architectures in real-world applications. It details the salient approaches, issues, and challenges in building ethically aligned machines. An approach inspired by traditional Eastern thought and wisdom is also presented.
The third part, Chapters 11–15, covers miscellaneous topics, focusing on the different artificial intelligence approaches used to explain machine learning models and thereby enhance transparency between the user and the model. It provides a review and detailed description of the use of knowledge graphs in generating explanations for black-box recommender systems, and of elaborative education ecosystems for sustainable quality education. It also shows how reinforcement learning, a semi-supervised learning technique, can be applied to portfolio management.
Recently, deep learning (DL) computing has become more popular in the machine learning (ML) community, and it is now the most widely used computational approach in the field of ML. It can solve many complex problems, cognitive tasks, and matching problems without human intervention. Whereas traditional ML struggles with very large amounts of data, DL handles them easily. In the last few years, the field of DL has witnessed success in a range of applications, and DL has outperformed other approaches in many application domains, e.g., robotics, bioinformatics, agriculture, cybersecurity, natural language processing (NLP), medical information processing, etc. Although there are various reviews on the state of the art in DL, they each concentrate on a single aspect of it, resulting in a general lack of understanding. There is a need to provide a better starting point for comprehending DL. This paper aims to provide a more comprehensive overview of DL, including current advancements. It discusses the importance of DL and introduces DL approaches and networks. It then explains convolutional neural networks (CNNs), the most widely used DL network type, and their subsequently evolved models, starting with LeNet-5 and AlexNet, continuing with GoogLeNet and ResNet, and ending with the High-Resolution Network. This paper also discusses difficulties and solutions to help researchers recognize research gaps for DL applications.
In the last decade, machine learning (ML) models [1-3] have been widely used in every field and applied in versatile applications such as classification, image/video retrieval, text mining, multimedia, anomaly detection, attack detection, video recommendation, and image classification. Nowadays, deep learning (DL) is frequently employed in comparison to other machine learning methods. DL is a form of representation learning. The unpredictable expansion of DL and distributed learning necessitates ongoing study. Deep and distributed learning studies continue to emerge as a result of unanticipated advances in data availability and huge advancements in hardware technologies such as High-Performance Computing (HPC). DL is based on Neural Networks (NNs) and outperforms its predecessors. DL also employs transformations and graph technology to create multi-layer learning models. In fields such as Natural Language Processing (NLP), data processing, visual data processing, and audio and speech processing, the most recent DL techniques have achieved extraordinary performance. The representation of the input data is often what determines the success of an ML approach: a proper data representation outperforms a poor one. Thus, for many years, feature engineering has been a prominent study topic in ML. This approach builds features from raw data, but it involves a lot of human effort and is quite field-specific. Examples of such hand-crafted features are the scale-invariant feature transform (SIFT), the histogram of oriented gradients (HOG), and the bag of words (BoW).
DL algorithms automatically extract features, which helps researchers obtain discriminative features with minimal human effort and field knowledge. A multi-layer data representation architecture extracts low-level features in the first layers, while the last layers extract high-level features. Artificial Intelligence (AI) underlies all of these technologies, including ML, DL, and NLP, which process data for particular applications much as the human brain's basic sensory regions do. The human brain can automatically derive data representations from different scenes: the input is the incoming scene information and the output is the classified objects. DL mimics this working of the human brain, which accentuates its key advantage.
Due to its significant success, DL is presently one of the most important research trends in ML. Architectures, issues, computational tools, the evolution matrix, and applications are all significant elements of DL. Among DL networks, convolutional neural networks (CNNs) are the most widely employed, because a CNN automatically finds the key features. Therefore, we delve deep into CNNs by presenting their core elements and the most prevalent CNN topologies, from AlexNet to GoogLeNet and the High-Resolution Network.
In recent years, several deep learning reviews have dealt solely with one application or issue, such as examining CNN architectures alone or a single application of deep learning. Such applications include autonomous machines, deep learning for plant disease detection and classification, deep learning for security and malicious attack detection, and so on. Table 1 below lists a few domains and applications of DL. Prior to diving into DL applications, it is important to grasp the concepts, problems, and benefits of DL, and learning DL well enough to address research gaps and applications takes a lot of time and research. Our proposal is therefore to conduct an extensive review of DL that provides a better starting point for a comprehensive grasp of DL.
For our review, we focused on open challenges, computational tools, and applications. This review can also be a springboard for further DL discussions.
The review helps individuals learn more about recent breakthroughs in DL research, which will help them grow in the field, and gives researchers greater autonomy in delivering precise alternatives to the field. Our contributions are as follows:
- This review aids researchers and students in gaining comprehensive knowledge about DL.
- We describe the historical overview of neural networks.
- We discuss deep learning approaches using Deep Feedforward Neural Networks, Deep Backward Neural Networks, and CNNs, as well as their concepts, theories, and current architectures.
- We describe the different CNN architectures, such as AlexNet, GoogLeNet, and ResNet.
- We describe deep learning models that use autoencoders, long short-term memory, and a deep belief network architecture.

The rest of the paper is organized as follows. A description of neural networks and their fundamental structure is given in Section 2. Section 3 presents the different neural network architectures. Section 4 provides a detailed study of CNNs and their components, along with the different architectures of CNN models. Section 5 discusses the different DL models with a time-series basis and the deep belief network. Section 6 concludes with a discussion of DL.
Over the years, many people have contributed to the development of neural networks [2, 4, 5]. Given the current spike in interest in DL, it is not surprising that credit for substantial advancements is contested. The following is an objective overview of the most significant contributions. McCulloch and Pitts developed the first mathematical neuron model in 1943; however, this model does not attempt to replicate the biophysical mechanism of an actual neuron and, intriguingly, it omitted learning. Hebb developed the concept of physiologically driven learning in neural networks in 1949; Hebbian learning is an unsupervised neural network learning technique. Rosenblatt introduced the Perceptron in 1957. A perceptron is a single-layer neural network that can be used as a linear classifier; in current ANN terminology, it uses the Heaviside step function as its activation function. Widrow and Hoff introduced the delta-learning rule for learning a perceptron. The delta-learning rule uses gradient descent to update the neurons' weights and is a variation of the backpropagation algorithm. To train neural networks, Ivakhnenko introduced the Group Method of Data Handling (GMDH) in 1968; these networks were the first feedforward multilayer perceptron deep learning networks. In 1971, an 8-layer-deep GMDH network was already in use, and the number of units per layer could be learned rather than predetermined.
A perceptron cannot learn XOR since it is not linearly separable. In 1974, the error backpropagation (BP) algorithm was proposed for learning weights in a supervised manner. Fukushima introduced the Neocognitron in 1980. The Neocognitron is viewed as a deep neural network (DNN) in the same vein as the deep GMDH networks; it can be seen as an ancestor of the Deep Feedforward Neural Networks (D-FFNNs) and has a similar design. In 1982, Hopfield developed the Hopfield Network, which is also known as a content-addressable memory neural network; recurrent neural networks are similar to Hopfield networks. Backpropagation resurfaced in 1986, when it was shown that this learning technique can build meaningful internal representations for broad neural network learning tasks.
Terry Sejnowski created NETtalk in 1987, a programme that improved over time in pronouncing English words. In 1989, backpropagation was first applied to a convolutional neural network (CNN) for learning handwritten digits. In 1991, Hochreiter studied a fundamental issue that arises when training a deep network via backpropagation: the backpropagated signals either shrink or grow without limit, with the decay proportional to the network depth. This is also called the "vanishing or exploding gradient problem." In 1992, pre-training a Recurrent Neural Network (RNN) in an unsupervised way to speed up subsequent supervised learning was suggested as a partial solution; the RNN investigated contained over 1000 layers. In 1995, Wang and Terman introduced oscillatory neural networks.
Image and audio segmentation, as well as time-series generation, are examples of their applications. In 1997, Long Short-Term Memory (LSTM) was proposed by Hochreiter and Schmidhuber; it is a supervised model for learning recurrent neural networks (RNNs). LSTM networks avoid decaying error signals between layers.
In 1998, backpropagation was integrated with a CNN to improve learning: LeNet-5, typically a 7-level convolutional network, was created to classify handwritten numbers on cheques. In 2006, Hinton et al. demonstrated the greedy layer-wise approach for training deep models; this third wave of neural networks popularised the phrase "deep learning."
In 2012, AlexNet, a GPU-trained CNN in the lineage of LeNet-5, won the ImageNet Large Scale Visual Recognition Challenge. In 2014, Goodfellow et al. introduced generative adversarial networks, in which two neural networks compete against each other in a game-like fashion; overall, this creates a generative model that can produce fresh data. Yann LeCun called it the coolest machine learning idea in 20 years. Over the years, architectures have thus evolved from the Hopfield network to CNNs and their many successor CNN architectures. In 2019, Yoshua Bengio, Yann LeCun, and Geoffrey Hinton won the Turing Award for their work on deep neural networks.
Artificial Neural Networks (ANNs) are basic mathematical models based on how the brain works [6]. However, the models discussed below are not biologically realistic. Instead, these models analyse the data. The different neural models are explained as follows:
Any neural network starts from a neuron model; Fig. (1) depicts an artificial neuron. In a neuron model, the input x is weighted by w and a bias b is added [7]. Assume that the input vector x and the weight vector w both lie in $\mathbb{R}^n$, with n equal to the input dimension. The bias term does not always exist and may be removed. The weighted inputs and the bias are summed to form the argument of an activation function $\phi$, giving the neuron model's output:

$y = \phi(z), \quad z = w^{T}x + b$  (1)

The argument z on its own provides a linear discriminant function; the activation function, also called a transfer or unit function, transforms z nonlinearly. The ReLU activation function, $\mathrm{ReLU}(z) = \max(0, z)$, is termed a rectifier and is the most widely used activation in DNNs. The softmax function is

$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$  (2)

Fig. (1)) Artificial Neuron Model.

The softmax maps an n-dimensional x to an n-dimensional y, where y represents a probability for each of the n elements; it is sometimes used as the last layer in a network. The perceptron model uses the Heaviside step function as its activation function. In a NN, the neurons must be connected. A feedforward arrangement in its simplest form is shown in Fig. (2) and Fig. (3), which illustrate the shallow and deep architectures of a NN.
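As a concrete illustration of Eqs. (1) and (2), the following minimal NumPy sketch (added here for illustration, not code from the chapter; the input values and weights are made up) implements a single artificial neuron with a ReLU activation and the softmax function:

```python
import numpy as np

def neuron(x, w, b, activation):
    """Single artificial neuron: y = activation(w^T x + b), cf. Eq. (1)."""
    z = np.dot(w, x) + b           # linear discriminant z = w^T x + b
    return activation(z)

def relu(z):
    """Rectifier: the most widely used activation in DNNs."""
    return np.maximum(0.0, z)

def softmax(x):
    """Maps an n-dimensional vector to a probability vector, cf. Eq. (2)."""
    e = np.exp(x - np.max(x))      # shift for numerical stability
    return e / e.sum()

x = np.array([0.5, -1.0, 2.0])     # example input
w = np.array([0.2, 0.4, -0.1])     # example weights
b = 0.1                            # bias term (optional, as noted in the text)
print(neuron(x, w, b, relu))       # scalar output of the neuron
print(softmax(x))                  # probabilities summing to 1
```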
Fig. (2)) Shallow Architecture of NN.

Fig. (3)) Deep Architecture of NN.

In general, the depth of a network is the number of non-linear transformations between the separated layers, whereas the width of a hidden layer is the number of hidden neurons it contains. Fig. (2) has a single hidden layer, whereas Fig. (3) has three hidden layers; the depths of the shallow and deep architectures are therefore two and four, respectively. Although the boundary is debatable, Feedforward Neural Network (FFNN) topologies with up to two layers are usually called "shallow", and those with more than two hidden layers are typically called "deep".
The activation functions of a feedforward neural network (FNN) may be linear or non-linear. The network contains no cycles that would feed outputs back to the inputs. Equation (3) shows how an MLP obtains its output from its input:
(3)

Equation (3) illustrates the neural network's discriminant function. To train the network, an optimization method is used to find the optimal parameters on the training data with respect to a cost function or error function.
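To make the discriminant function of Eq. (3) concrete, here is a minimal sketch of an MLP forward pass, assuming a single hidden layer with a ReLU non-linearity and a softmax output; the layer sizes and random weights are illustrative assumptions, and in practice the parameters would be found by optimising a cost function as described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 4 inputs, 8 hidden neurons, 3 output classes.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mlp_forward(x):
    """Feedforward pass of a one-hidden-layer MLP: no cycles, input flows to output."""
    h = relu(W1 @ x + b1)            # hidden layer
    return softmax(W2 @ h + b2)      # class probabilities

print(mlp_forward(rng.normal(size=4)))
```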
Recurrent Neural Networks: The RNN family has two subclasses that can be distinguished by their signal-processing characteristics [8]. The first is composed of Finite Recurrent Networks (FRN), whereas the second is composed of Infinite Impulse Recurrent Networks (IIRN). An FRN is a directed acyclic graph (DAG) that can be unrolled and replaced by an FNN, whereas an IIRN is a directed cyclic graph (DCG) that cannot be unrolled.
Hopfield Network: A Hopfield Network is an example of an FRN. It is a fully connected network of McCulloch-Pitts neurons. For a McCulloch-Pitts neuron, the activation function is:

(4)

The update rule of the neurons is:

(5) (6)

Here, $x_i$ is updated either synchronously or asynchronously from the states $x_j$ of the other neurons, and $w_{ij}$ are the weights entering the sign function that determines the new value of $x_i$.
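The following sketch illustrates the kind of update rule Eqs. (5) and (6) describe, assuming the standard formulation with bipolar (±1) states, Hebbian weights, and a sign activation; it is an illustration added here, not code from the chapter:

```python
import numpy as np

def hopfield_store(patterns):
    """Hebbian weight matrix for a fully connected Hopfield network (no self-connections)."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)
    return W / n

def hopfield_update(x, W, steps=5):
    """Asynchronous update: x_i <- sign(sum_j w_ij x_j), one neuron at a time."""
    x = x.copy()
    for _ in range(steps):
        for i in np.random.permutation(len(x)):
            x[i] = 1.0 if W[i] @ x >= 0 else -1.0
    return x

patterns = np.array([[1, -1, 1, -1, 1, -1]], dtype=float)  # one stored pattern
W = hopfield_store(patterns)
noisy = np.array([1, -1, 1, 1, 1, -1], dtype=float)         # corrupted input
print(hopfield_update(noisy, W))                             # recalls the stored pattern
```

This content-addressable behaviour, retrieving a stored pattern from a corrupted one, is exactly what makes the Hopfield network a form of associative memory.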
Boltzmann Machine: A Boltzmann Machine is a noisy Hopfield network with a probabilistic activation function:

(7)

Eq. (7) gives the probability with which the update from Eq. (5) is applied. This model is significant because it was one of the first to use hidden units. Boltzmann Machines are trained with the contrastive-divergence algorithm. They are two-layered neural networks with a visible and a hidden layer.
The edges between the two layers are undirected, which implies that information can flow in both directions. The network is completely connected, meaning every neuron is connected to every other neuron through undirected edges. Fig. (4) shows how to transform the Boltzmann machine into a Restricted Boltzmann Machine (RBM) [9]. The RBM is a basic structure used in many applications and for creating different networks. Table 2 summarises the models and their working nature; it is not a direct comparison, since each model performs differently in different domains.
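As an illustration of the two-layer structure and the contrastive-divergence training mentioned above, here is a sketch of one CD-1 step for a binary RBM; the layer sizes, learning rate, and training data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

n_visible, n_hidden = 6, 3                      # hypothetical layer sizes
W = 0.01 * rng.normal(size=(n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, lr=0.1):
    """One contrastive-divergence step: visible -> hidden -> visible -> hidden."""
    global W, b_v, b_h
    p_h0 = sigmoid(v0 @ W + b_h)                # hidden probabilities given the data
    h0 = (rng.random(n_hidden) < p_h0).astype(float)
    p_v1 = sigmoid(h0 @ W.T + b_v)              # reconstruction of the visible layer
    p_h1 = sigmoid(p_v1 @ W + b_h)              # hidden probabilities given the reconstruction
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b_v += lr * (v0 - p_v1)
    b_h += lr * (p_h0 - p_h1)

v = np.array([1, 0, 1, 1, 0, 1], dtype=float)   # a binary training example
for _ in range(100):
    cd1_step(v)
```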
Fig. (4)) Conversion of Boltzmann Machine to Restricted Boltzmann Machine (RBM).

A deep neural network consists of many layers of neurons [10]. The neurons must constantly learn to tackle tasks, or be applied in different ways, to produce better results; the network learns every time from newly updated information. A deep neural network uses multiple layers of nodes to extract high-level features from incoming data [1, 4], i.e., it transforms the data into something progressively more abstract. The Deep Feedforward Neural Networks (D-FFNN) are explained below.
An FFNN with a single hidden layer containing a finite set of neurons can approximate any continuous function; this is the universal approximation theorem, which, however, does not explain how to learn such a network. A related concern is that the width of the hidden layer can grow exponentially. Interestingly, the universal approximation theorem also holds for FFNNs with a limited number of hidden neurons per layer and numerous hidden layers, so D-FFNNs are employed instead of shallow FFNNs for better learnability. Approximating an unknown function f* is expressed as:
(8)

Here, f is a function from a specific family that depends on the parameters θ, and ɸ is a non-linear activation function for a single layer. For deep hidden layers, ɸ has the composed form below:

(9)

Instead of assuming a precise family of functions for f, a D-FFNN learns the function in Eq. (9) by approximating it with ɸ, which is realised by the n separate hidden layers.
A CNN [4, 11-13] is a special type of FFNN that uses a combination of convolution layers, ReLU activations, and pooling layers. These layers are usually followed by several fully connected FNN layers. In a traditional ANN, each neuron in a layer is linked to all the neurons in the next layer, and each connection is a parameter of the network. In a CNN, by contrast, the layers are not fully connected: neurons are only connected to local receptive fields, which significantly cuts down the number of parameters and the number of operations in the network. All the connections between neurons and their local receptive fields use the same set of weights, and we call this set of weights a kernel.
Kernel: All the neurons attached to their local receptive fields share the same kernel. The results of the neurons' calculations are stored in a matrix called the activation map. Weight sharing refers to the fact that the same kernel weights are reused across the whole input. Consequently, different kernels produce different activation maps, and the number of kernels is a hyper-parameter. The number of weights in the network is proportional to the kernel size, i.e., to the size of the local receptive field. Fig. (5) shows a typical CNN architecture with a 3-channel input. Each channel is connected to a convolution layer and a pooling layer, then again to a convolution layer, a pooling layer, and a merge layer. The merge layer connects to the fully connected (FC) layer, which provides the decision using the softmax function.
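The following NumPy sketch (illustrative only, with a toy image and kernel) shows how a single shared kernel slides over every local receptive field to produce one activation map:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: the same kernel (shared weights) is applied to every
    local receptive field, producing one activation map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy single-channel input
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # one 2x2 kernel = one set of shared weights
print(conv2d(image, kernel).shape)                 # (4, 4) activation map
```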
Fig. (5)) Typical CNN with 3-Channel input.

The softmax equation is given in Eq. (10); it provides the classification scores on which the final decision is based.
$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$  (10)

The different layers in CNN models are explained as follows:
Convolution layer: A convolution layer is a critical component of a convolutional neural network's architecture. A convolutional layer, like a hidden layer in a conventional neural network, seeks to transform the input to a higher level of abstraction. However, rather than relying on full connectivity between the input and hidden neurons, the convolutional layer takes advantage of local connectivity. A convolutional layer slides at least one kernel across the input, convolving each region. The results are stored in activation maps, which are the outputs of the convolutional layer.

Pooling layer: A pooling layer is frequently sandwiched between two convolution layers. Pooling layers attempt to reduce the input dimension while retaining as much information as possible. Additionally, a pooling layer can impart spatial invariance to the network, hence increasing generality. Zero padding, stride, and pooling window size are the hyperparameters of a pooling layer. Like the kernel of a convolutional layer, the pooling layer scans the whole input using the specified pooling window size; with a stride of 2, a window size of 2, and zero padding, the input dimension is halved. Min-pooling, average pooling, and more sophisticated methods such as stochastic pooling and fractional max-pooling are examples of pooling procedures. Max-pooling, which takes the maximum value from each sub-window, is the most commonly used pooling technique, as it efficiently captures image invariance.

Fully connected layer: A fully connected layer is the basic unit of an FFNN. A fully connected layer is frequently added between the penultimate and output layers of a typical CNN to represent non-linear interactions between input features. However, the large number of parameters it introduces has recently been questioned because of the possibility of overfitting, and some CNN architectures use simpler linear layers instead.

A CNN is a common FFNN model designed to recognise visual patterns directly from pixel images with minimal preprocessing [11, 14]. An image database, ImageNet, was proposed for object recognition research, and an annual software challenge called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) tests software's ability to detect and classify objects and scenes. Below, we discuss the CNN architectures of ILSVRC's main competitors.
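To tie these layers together, here is a minimal CNN sketch in PyTorch (the framework choice, layer sizes, input resolution, and class count are assumptions for illustration), stacking convolution, ReLU, max-pooling, and a fully connected layer ending in a softmax:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    """Minimal CNN: convolution -> ReLU -> max-pooling -> fully connected -> softmax."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)   # 3-channel input, 8 kernels
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)        # halves the spatial dimensions
        self.fc = nn.Linear(16 * 8 * 8, n_classes)               # assumes 32x32 input images

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        return F.softmax(self.fc(x), dim=1)                      # class probabilities, cf. Eq. (10)

model = TinyCNN()
scores = model(torch.randn(1, 3, 32, 32))   # one random 3-channel 32x32 image
print(scores.shape)                          # torch.Size([1, 10])
```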
In 1998, LeCun et al. developed LeNet-5, a 7-level convolutional network used to classify digits. Processing higher-resolution images requires larger and more numerous convolutional layers, so this technique is constrained by the available computing resources. The architecture is shown in Fig. (6).
Fig. (6)) LeNet-5 Architecture.

AlexNet: In 2012, AlexNet surpassed all previous competitors by cutting the top-5 error from 26% to 15.3%. Compared with LeNet-5, the AlexNet network was deeper, featured more filters per layer, and used stacked convolutional layers. AlexNet used convolutions, max-pooling, dropout, data augmentation, ReLU activations, and SGD with momentum (Fig. 7). Every fully connected layer and convolutional layer had a ReLU activation function. AlexNet was trained for six days on 2 Nvidia GeForce GTX 580 GPUs. It was designed by the SuperVision group, which included Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton.
Fig. (7)) AlexNet Architecture.

The ILSVRC 2014 competition was won by GoogLeNet (Inception V1), whose near-human performance the challenge organizers were now forced to evaluate. It turned out that beating GoogLeNet's accuracy required some human training: after such training, a human expert achieved a top-5 error rate of 5.1 percent for a single model and 3.6 percent for an ensemble of models.
The network employed a LeNet-inspired CNN but included a new element called the inception module; it also used RMSprop and batch normalization. The inception module uses several small convolutions to reduce the number of parameters: the architecture is a 22-layer deep CNN with 4 million parameters, instead of the 60 million of AlexNet.
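A simplified, illustrative inception-style block (the channel counts are made up, not GoogLeNet's actual configuration) shows how several small parallel convolutions, with 1x1 bottlenecks to reduce parameters, are concatenated along the channel dimension:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Simplified inception module: parallel 1x1, 3x3, and 5x5 convolutions plus pooling,
    concatenated along the channel dimension; 1x1 convolutions act as bottlenecks."""
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),              # bottleneck before the 3x3
            nn.Conv2d(16, 24, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 4, kernel_size=1),               # bottleneck before the 5x5
            nn.Conv2d(4, 8, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 8, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

x = torch.randn(1, 32, 28, 28)
print(InceptionBlock(32)(x).shape)   # torch.Size([1, 56, 28, 28]) = 16 + 24 + 8 + 8 channels
```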
VGGNet was the runner-up at ILSVRC 2014 and was developed by Simonyan and Zisserman. VGGNet consists of 16 convolutional layers with a simple and uniform design: it uses only 3x3 convolutions, but many filters, and was trained for 2–3 weeks on 4 GPUs running continuously. It is now one of the most commonly used methods for extracting features from images. The VGGNet weight configuration is open source and is utilised in many other applications and challenges. VGGNet has 138 million parameters, which can be difficult to manage; the architecture is shown in Fig. (8).
Fig. (8)) VGGNet Architecture.

Finally, at ILSVRC 2015, Kaiming He et al. proposed and developed a novel architecture with "skip connections" and heavy use of batch normalization, called the Residual Neural Network (ResNet).
Fig. (9)) ResNet Architecture.

These skip connections, also called gated units or gated recurrent units, are closely related to recently successful elements of RNNs. Using them, the authors trained a NN with 152 layers that was nonetheless less complex than VGGNet; it achieves a top-5 error rate of 3.57% on this dataset. GoogLeNet has inception modules, while ResNet has residual connections.
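A basic residual block can be sketched as follows (an illustration with assumed channel counts, not the exact ResNet configuration); the skip connection adds the input back to the block's output so that gradients can bypass the convolutions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: the skip connection adds the input back to the
    output of two batch-normalised 3x3 convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)        # skip connection: gradients can flow around the block

x = torch.randn(1, 64, 16, 16)
print(ResidualBlock(64)(x).shape)     # torch.Size([1, 64, 16, 16])
```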
A DBN is a model that combines different forms of NN [1, 15]: it is a hybrid of RBMs and a Deep Feedforward Neural Network (D-FFNN), with the RBMs serving as the input part and the D-FFNN as the output part. The RBMs are commonly stacked, which means they are used sequentially. Because RBMs and D-FFNNs are independent networks with two different learning techniques, this enriches the DBN. The RBMs are typically used to initialise the model in an unsupervised way, after which a supervised technique is used to fine-tune the parameters. These two stages of DBN training are illustrated in Fig. (10).
Fig. (10)) Deep Belief Network.

An autoencoder is an unsupervised NN model for feature selection or dimension reduction [1, 3, 16]. An autoencoder's input and output layers have the same size, and the network is symmetric. An input pattern x is mapped to a new encoding c, from which an output pattern similar to the input pattern is reconstructed, i.e., the encoding c can reproduce x. Autoencoders are built similarly to DBNs: interestingly, the original autoencoder only pre-trained the first half of the network with RBMs and then unrolled the network, creating the second half. Pre-training is followed by fine-tuning, as in a DBN. Fig. (11) shows a typical denoising autoencoder. Autoencoders are unsupervised learning models because they do not need labels.
Fig. (11)) Denoising Autoencoder.

The model has been used successfully for dimensionality reduction. Given enough data, autoencoders can produce a better two-dimensional representation of array data than PCA: PCA uses linear transformations, whereas autoencoders use non-linear transformations, which usually results in improved performance. There are many variants of the model, such as sparse autoencoders, denoising autoencoders, and variational autoencoders.
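A minimal autoencoder sketch (with illustrative sizes, assuming flattened 28x28 inputs) shows the symmetric encoder-decoder structure and the label-free reconstruction objective:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Symmetric autoencoder: input and output layers have the same size, and the
    bottleneck c is a lower-dimensional encoding of the input x."""
    def __init__(self, n_in=784, n_code=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, 128), nn.ReLU(),
            nn.Linear(128, n_code))                  # non-linear dimension reduction
        self.decoder = nn.Sequential(
            nn.Linear(n_code, 128), nn.ReLU(),
            nn.Linear(128, n_in))

    def forward(self, x):
        c = self.encoder(x)                          # encoding c
        return self.decoder(c)                       # reconstruction of x from c

model = Autoencoder()
x = torch.rand(16, 784)                              # e.g. a batch of flattened images
loss = nn.functional.mse_loss(model(x), x)           # unsupervised: no labels needed
loss.backward()
```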
In 1997, Hochreiter and Schmidhuber proposed LSTM networks. LSTM is an RNN variant that alleviates RNN shortcomings such as problems with long-term dependencies [3, 8]; it also prevents gradients from vanishing or exploding. In 1999, an LSTM with a forget gate was introduced, and the LSTM with such feedback links became the standard LSTM network structure, unlike feedforward DFNs. LSTMs can also process sequences of data as opposed to single data points, which makes them excellent for evaluating speech or video data. Fig. (12) shows a typical LSTM cell with a forget gate, an input gate, and an output gate arranged around a single memory cell.
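A short usage sketch (with hypothetical sequence length, feature size, and hidden size) shows how an LSTM processes a data sequence and how its final hidden state can feed a classifier:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 20-step sequences of 10-dimensional features, 32 hidden units.
lstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)
classifier = nn.Linear(32, 2)                 # e.g. a binary decision per sequence

x = torch.randn(4, 20, 10)                    # a batch of 4 sequences
outputs, (h_n, c_n) = lstm(x)                 # gates and cell state are handled internally
logits = classifier(h_n[-1])                  # use the last hidden state for classification
print(logits.shape)                           # torch.Size([4, 2])
```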
Fig. (12)) Long Short Term Memory.

In this paper, we gave an overview of the history of neural networks and of deep learning models. The basic artificial neural network models, such as the shallow FNN and the deep FNN, were discussed, along with brief details of neural network models like the RNN, the Hopfield network, the Boltzmann machine, and the RBM. The architecture of the D-FFNN, with the CNN as a special example, was discussed, as were the applications of different models such as LeNet-5, AlexNet, VGGNet, GoogLeNet/Inception, and ResNet. In summary, this study gave an overview of deep learning models such as Deep Feedforward Neural Networks, Convolutional Neural Networks, Deep Belief Networks, Autoencoders, and Long Short-Term Memory networks. These models form the main architectures of deep learning today, and a fundamental understanding of them is essential for being prepared for future AI breakthroughs.
Not applicable.
The authors declare no conflict of interest, financial or otherwise.
Declared none.