This book is a detailed reference guide on deep learning and its applications. It aims to provide a basic understanding of deep learning and its different architectures that are applied to process images, speech, and natural language. It explains basic concepts and many modern use cases through fifteen chapters contributed by computer science academics and researchers. By the end of the book, the reader will become familiar with different deep learning approaches and models, and understand how to implement various deep learning algorithms using multiple frameworks and libraries.
This book is divided into three parts. The first part explains the basic working principles, history, evolution, and challenges associated with deep learning. The basic mathematical concepts and hardware requirements for deep learning implementation, along with some of its popular frameworks for medical applications, are also covered.
The second part is dedicated to sentiment analysis using deep learning and machine learning techniques. This section of the book covers the experimentation and application of deep learning techniques and architectures in real-world applications. It details the salient approaches, issues, and challenges in building ethically aligned machines. An approach inspired by traditional Eastern thought and wisdom is also presented.
The final part covers artificial intelligence approaches used to explain machine learning models, enhancing their transparency for the benefit of users. This section includes a review and detailed description of the use of knowledge graphs in generating explanations for black-box recommender systems, as well as a review of ethical system design and a model for sustainable education. An additional chapter demonstrates how a semi-supervised machine learning technique can be used for cryptocurrency portfolio management.
The book is a timely reference for academicians, professionals, researchers and students at engineering and medical institutions working on artificial intelligence applications.
This is an agreement between you and Bentham Science Publishers Ltd. Please read this License Agreement carefully before using the ebook/echapter/ejournal (“Work”). Your use of the Work constitutes your agreement to the terms and conditions set forth in this License Agreement. If you do not agree to these terms and conditions then you should not use the Work.
Bentham Science Publishers agrees to grant you a non-exclusive, non-transferable limited license to use the Work subject to and in accordance with the following terms and conditions. This License Agreement is for non-library, personal use only. For a library / institutional / multi user license in respect of the Work, please contact: [email protected].
Bentham Science Publishers does not guarantee that the information in the Work is error-free, or warrant that it will meet your requirements or that access to the Work will be uninterrupted or error-free. The Work is provided "as is" without warranty of any kind, either express or implied or statutory, including, without limitation, implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the results and performance of the Work is assumed by you. No responsibility is assumed by Bentham Science Publishers, its staff, editors and/or authors for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, advertisements or ideas contained in the Work.
In no event will Bentham Science Publishers, its staff, editors and/or authors, be liable for any damages, including, without limitation, special, incidental and/or consequential damages and/or damages for lost data and/or profits arising out of (whether directly or indirectly) the use or inability to use the Work. The entire liability of Bentham Science Publishers shall be limited to the amount actually paid by you for the Work.
Bentham Science Publishers Pte. Ltd. 80 Robinson Road #02-00 Singapore 068898 Singapore Email: [email protected]
Machine Learning has proved its usefulness in many applications in Image Processing and Computer Vision, Medical Imaging, Satellite Imaging, Remote Sensing, Surveillance, etc., over the past decade. At the same time, Machine Learning, and particularly Artificial Neural Networks, has evolved and demonstrated excellent performance over traditional machine learning algorithms. These methods are known as Deep Learning.
Nowadays, Deep Learning has become the researcher's first choice over traditional machine learning due to its state-of-the-art performance on speech, image, and text processing. Deep learning algorithms provide efficient solutions to problems ranging from image and speech processing to text processing. Research on deep learning is enriched day by day as new learning models emerge.
Deep learning models have significantly impacted the speech, image, and text domains and raised the performance bar substantially in many standard evaluations. Moreover, new challenges that older systems could not have handled are now easily tackled using deep learning. However, it is challenging to comprehend, let alone guide, the learning process in deep neural networks; there is an air of uncertainty about exactly what and how these networks learn.
This book aims to provide the audience with a basic understanding of deep learning and its different architectures. Background knowledge of machine learning helps explore various aspects of deep learning. By the end of the book, I hope that the reader understands different deep learning approaches, models, pre-trained models, and gains familiarity with implementing various deep learning algorithms using multiple frameworks and libraries.
Machine Learning has proved its usefulness in many applications in the domains of Image Processing and Computer Vision, Medical Imaging, Satellite Imaging, Remote Sensing, Surveillance, etc., over the past decade. At the same time, Machine Learning methods themselves have evolved, particularly deep learning methods, which have demonstrated significant performance gains over traditional machine learning algorithms.
Today, Deep Learning has become researchers' first choice over traditional machine learning due to its state-of-the-art performance in many applications in the domains of speech, image, and text processing. Deep learning algorithms provide efficient solutions to problems ranging from vision and speech to text processing. Research on deep learning is enriched day by day as new learning models emerge.
This book contains three major parts. Part one covers the fundamentals, theory, and architectures of Deep Learning. It provides a detailed description of the theory, frameworks, and non-conventional approaches to deep learning, covers the foundational mathematics essential for understanding these frameworks, and describes the various kinds of models found in practice.
Chapter 1 contains the basic working principles, history, evolution, and challenges associated with deep learning. We also cover some basic mathematical concepts, the hardware requirements for deep learning implementation, and some of its popular software frameworks. We start with neural networks, focusing on their basics, including input/output layers, hidden layers, and how networks learn through forward and backpropagation. We also cover standard multilayer perceptron networks and their building blocks, and include a review of machine learning concepts in general and deep learning in particular to build a foundation for this book. Chapters 2–7 are based on applying artificial intelligence to medical images with various deep learning approaches. These chapters also cover the application of Deep Learning in lung cancer detection, medical imaging, and COVID-19 analysis.
The second part, Chapters 8–10, is dedicated to sentiment analysis using deep learning and machine learning techniques. This section of the book covers the experimentation and application of deep learning techniques and architectures in real-world applications. It details the salient approaches, issues, and challenges in building ethically aligned machines. An approach inspired by traditional Eastern thought and wisdom is also presented.
The third part, Chapters 11–15, covers miscellaneous topics, focusing on the different artificial intelligence approaches used to explain machine learning models and thereby enhance transparency between the user and the model. It provides a review and detailed description of the use of knowledge graphs in generating explanations for black-box recommender systems, and of elaborative education ecosystems for sustainable quality education. It also shows how reinforcement learning, a semi-supervised learning technique, can be applied to portfolio management.
Recently, deep learning (DL) computing has become more popular in the machine learning (ML) community, and it is now the most widely used computational approach in the field of ML. It can solve many complex problems, cognitive tasks, and matching problems without human intervention. Whereas traditional ML struggles with very large amounts of data, DL handles them easily. In the last few years, the field of DL has witnessed success in a range of applications, and DL has outperformed other approaches in many application domains, e.g., robotics, bioinformatics, agriculture, cybersecurity, natural language processing (NLP), medical information processing, etc. Although there are various reviews on the state of the art in DL, they each concentrate on a single aspect of it, resulting in a general lack of understanding. There is a need to provide a better starting point for comprehending DL. This paper aims to provide a more comprehensive overview of DL, including current advancements. It discusses the importance of DL and introduces DL approaches and networks. It then explains convolutional neural networks (CNNs), the most widely used DL network type, and their subsequently evolved models, starting with LeNet-5 and AlexNet, continuing with GoogLeNet and ResNet, and ending with the High-Resolution Network. This paper also discusses difficulties and solutions to help researchers recognize research gaps for DL applications.
In the last decade, machine learning (ML) models [1-3] have been widely used in every field and applied in versatile applications such as classification, image/video retrieval, text mining, multimedia, anomaly detection, attack detection, video recommendation, and image classification. Nowadays, deep learning (DL) is frequently employed in comparison to other machine learning methods. DL is a form of representation learning. The unpredictable expansion of DL and distributed learning necessitates ongoing study. Deep and distributed learning studies continue to emerge as a result of unanticipated advances in data availability and huge advancements in hardware technologies such as High-Performance Computing (HPC). DL is based on Neural Networks (NNs) and outperforms its predecessors. DL also employs transformations and graph technology to create multi-layer learning models. In fields such as Natural Language Processing (NLP), data processing, visual data processing, and audio and speech processing, the most recent DL techniques have achieved extraordinary performance. The representation of the input data is often what determines the success of an ML approach: a proper data representation outperforms a poor one. Thus, for many years, feature engineering has been a prominent study topic in ML. This approach builds features from raw data, but it involves a lot of human effort and is quite field-specific. Examples of such hand-crafted features are the scale-invariant feature transform (SIFT), the histogram of oriented gradients (HOG), and the bag of words (BoW).
DL algorithms automatically extract features, which helps researchers obtain discriminative features with minimal human effort and field knowledge. A multi-layer data representation architecture extracts low-level features in the first layers, while the last layers extract high-level features. Artificial Intelligence (AI) underlies all of these technologies, including ML, DL, and NLP, which process data for particular applications much as the human brain's basic sensory regions do. The human brain can automatically derive data representations from different scenes: the input is the incoming scene information and the output is the classified objects. DL mimics this working of the human brain, which accentuates its key advantage.
Due to its significant success, DL is presently one of the most important research trends in ML. Architectures, issues, computational tools, the evolution matrix, and applications are all significant elements of DL. Among DL networks, convolutional neural networks (CNNs) are the most widely employed, because a CNN automatically finds the key features. Therefore, we delve deep into CNNs by presenting their core elements and the most prevalent CNN topologies, from AlexNet to GoogLeNet and the High-Resolution Network.
In recent years, several deep learning reviews have dealt solely with one application or issue, such as examining CNN architectures alone or a single application of deep learning. Such applications include autonomous machines, deep learning for plant disease detection and classification, deep learning for security and malicious attack detection, and so on. Table 1 below lists a few domains and applications of DL. Prior to diving into DL applications, it is important to grasp the concepts, problems, and benefits of DL, and learning DL well enough to address research gaps and applications takes a lot of time and research. Our proposal is therefore to conduct an extensive review of DL that provides a better starting point for a comprehensive grasp of DL.
For our review, we focused on open challenges, computational tools, and applications. This review can also be a springboard for further DL discussions.
The review helps individuals learn more about recent breakthroughs in DL research, which will help them grow in the field, and gives researchers greater autonomy in delivering precise alternatives to the field. Our contributions are as follows:
- This review aids researchers and students in gaining comprehensive knowledge about DL.
- We describe the historical overview of neural networks.
- We discuss deep learning approaches using Deep Feedforward Neural Networks, Deep Backward Neural Networks, and CNNs, as well as their concepts, theories, and current architectures.
- We describe the different CNN architectures, such as AlexNet, GoogLeNet, and ResNet.
- We describe deep learning models that use autoencoders, long short-term memory, and a deep belief network architecture.

The rest of the paper is organized as follows. A description of neural networks and their fundamental structure is given in Section 2. Section 3 presents the different neural network architectures. Section 4 provides a detailed study of CNNs and their components, along with the different architectures of CNN models. Section 5 discusses the different DL models with a time-series basis and the deep belief network. Section 6 concludes with a discussion of DL.
Over the years, many people have contributed to the development of neural networks [2, 4, 5]. Given the current spike in interest in DL, it is not surprising that credit for substantial advancements is contested. The following is an objective overview of the most significant contributions. McCulloch and Pitts developed the first mathematical neuron model in 1943; however, this model does not attempt to replicate the biophysical mechanism of an actual neuron and, intriguingly, it omitted learning. Hebb developed the concept of physiologically driven learning in neural networks in 1949; Hebbian learning is an unsupervised neural network learning technique. Rosenblatt introduced the Perceptron in 1957. A perceptron is a single-layer neural network that can be used as a linear classifier; in current ANN terminology, it uses the Heaviside step function as its activation function. Widrow and Hoff introduced the delta-learning rule for learning a perceptron. The delta-learning rule uses gradient descent to update the neurons' weights and is a variation of the backpropagation algorithm. To train neural networks, Ivakhnenko introduced the Group Method of Data Handling (GMDH) in 1968; these networks were the first feedforward multilayer perceptron deep learning networks. In 1971, an 8-layer-deep GMDH network was already in use, and the number of units per layer could be learned rather than predetermined.
A perceptron cannot learn XOR since it is not linearly separable. In 1974, the error backpropagation (BP) algorithm was proposed for learning weights in a supervised manner. Fukushima introduced the Neocognitron in 1980. The Neocognitron is viewed as a deep neural network (DNN) in the same vein as the deep GMDH networks; it can be seen as an ancestor of the Deep Feedforward Neural Networks (D-FFNNs) and has a similar design. In 1982, Hopfield developed the Hopfield Network, which is also known as a content-addressable memory neural network; recurrent neural networks are similar to Hopfield networks. Backpropagation resurfaced in 1986, when it was shown that this learning technique can build meaningful internal representations for broad neural network learning tasks.
Terry Sejnowski created NETtalk in 1987, a programme that improved over time in pronouncing English words. In 1989, backpropagation was first applied to a convolutional neural network (CNN) for learning handwritten digits. In 1991, Hochreiter studied a fundamental issue that arises when training a deep network via backpropagation: the backpropagated signals either shrink or grow without limit, with the decay proportional to the network depth. This is also called the "vanishing or exploding gradient problem." In 1992, pre-training a Recurrent Neural Network (RNN) in an unsupervised way to speed up subsequent supervised learning was suggested as a partial solution; the RNN investigated contained over 1000 layers. In 1995, Wang and Terman introduced oscillatory neural networks.
Image and audio segmentation, as well as time-series generation, are examples of their applications. In 1997, Long Short-Term Memory (LSTM) was proposed by Hochreiter and Schmidhuber; it is a supervised model for learning recurrent neural networks (RNNs). LSTM networks avoid decaying error signals between layers.
In 1998, backpropagation was integrated with a CNN to improve learning: LeNet-5, typically a 7-level convolutional network, was created to classify handwritten numbers on cheques. In 2006, Hinton et al. demonstrated the greedy layer-wise approach for training deep models; this third wave of neural networks popularised the phrase "deep learning."
In 2012, AlexNet, a GPU-trained CNN in the lineage of LeNet-5, won the ImageNet Large Scale Visual Recognition Challenge. In 2014, Goodfellow et al. introduced generative adversarial networks, in which two neural networks compete against each other in a game-like fashion; overall, this creates a generative model that can produce fresh data. Yann LeCun called it the coolest machine learning idea in 20 years. Over the years, architectures have thus evolved from the Hopfield network to CNNs and their many successor CNN architectures. In 2019, Yoshua Bengio, Yann LeCun, and Geoffrey Hinton won the Turing Award for their work on deep neural networks.
Artificial Neural Networks (ANNs) are basic mathematical models based on how the brain works [6]. However, the models discussed below are not biologically realistic. Instead, these models analyse the data. The different neural models are explained as follows:
Any neural network starts from a neuron model; Fig. (1) depicts an artificial neuron. In a neuron model, the input x is weighted by w and a bias b is added [7]. Assume that the input vector x and the weight vector w both lie in $\mathbb{R}^n$, with n equal to the input dimension. The bias term does not always exist and may be removed. The weighted inputs and the bias are summed to form the argument of an activation function $\phi$, giving the neuron model's output:

$y = \phi(z), \quad z = w^{T}x + b$  (1)

The argument z on its own provides a linear discriminant function; the activation function, also called a transfer or unit function, transforms z nonlinearly. The ReLU activation function, $\mathrm{ReLU}(z) = \max(0, z)$, is termed a rectifier and is the most widely used activation in DNNs. The softmax function is

$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$  (2)

Fig. (1)) Artificial Neuron Model.

The softmax maps an n-dimensional x to an n-dimensional y, where y represents a probability for each of the n elements; it is sometimes used as the last layer in a network. The perceptron model uses the Heaviside step function as its activation function. In a NN, the neurons must be connected. A feedforward arrangement in its simplest form is shown in Fig. (2) and Fig. (3), which illustrate the shallow and deep architectures of a NN.
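As a concrete illustration of Eqs. (1) and (2), the following minimal NumPy sketch (added here for illustration, not code from the chapter; the input values and weights are made up) implements a single artificial neuron with a ReLU activation and the softmax function:

```python
import numpy as np

def neuron(x, w, b, activation):
    """Single artificial neuron: y = activation(w^T x + b), cf. Eq. (1)."""
    z = np.dot(w, x) + b           # linear discriminant z = w^T x + b
    return activation(z)

def relu(z):
    """Rectifier: the most widely used activation in DNNs."""
    return np.maximum(0.0, z)

def softmax(x):
    """Maps an n-dimensional vector to a probability vector, cf. Eq. (2)."""
    e = np.exp(x - np.max(x))      # shift for numerical stability
    return e / e.sum()

x = np.array([0.5, -1.0, 2.0])     # example input
w = np.array([0.2, 0.4, -0.1])     # example weights
b = 0.1                            # bias term (optional, as noted in the text)
print(neuron(x, w, b, relu))       # scalar output of the neuron
print(softmax(x))                  # probabilities summing to 1
```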
Fig. (2)) Shallow Architecture of NN.

Fig. (3)) Deep Architecture of NN.

In general, the depth of a network is the number of non-linear transformations between the separated layers, whereas the width of a hidden layer is the number of hidden neurons it contains. Fig. (2) has a single hidden layer, whereas Fig. (3) has three hidden layers; the depths of the shallow and deep architectures are therefore two and four, respectively. Although the boundary is debatable, Feedforward Neural Network (FFNN) topologies with up to two layers are usually called "shallow", and those with more than two hidden layers are typically called "deep".
The activation functions of a feedforward neural network (FNN) may be linear or non-linear. The network contains no cycles that would feed outputs back to the inputs. Equation (3) shows how an MLP obtains its output from its input:
(3)

Equation (3) illustrates the neural network's discriminant function. To train the network, an optimization method is used to find the optimal parameters on the training data with respect to a cost function or error function.
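To make the discriminant function of Eq. (3) concrete, here is a minimal sketch of an MLP forward pass, assuming a single hidden layer with a ReLU non-linearity and a softmax output; the layer sizes and random weights are illustrative assumptions, and in practice the parameters would be found by optimising a cost function as described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 4 inputs, 8 hidden neurons, 3 output classes.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mlp_forward(x):
    """Feedforward pass of a one-hidden-layer MLP: no cycles, input flows to output."""
    h = relu(W1 @ x + b1)            # hidden layer
    return softmax(W2 @ h + b2)      # class probabilities

print(mlp_forward(rng.normal(size=4)))
```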
Recurrent Neural Networks: The RNN family has two subclasses that can be distinguished by their signal-processing characteristics [8]. The first is composed of Finite Recurrent Networks (FRN), whereas the second is composed of Infinite Impulse Recurrent Networks (IIRN). An FRN is a directed acyclic graph (DAG) that can be unrolled and replaced by an FNN, whereas an IIRN is a directed cyclic graph (DCG) that cannot be unrolled.
Hopfield Network: A Hopfield Network is an example of an FRN. It is a fully connected network of McCulloch-Pitts neurons. For a McCulloch-Pitts neuron, the activation function is:

(4)

The update rule of the neurons is:

(5) (6)

Here, $x_i$ is updated either synchronously or asynchronously from the states $x_j$ of the other neurons, and $w_{ij}$ are the weights entering the sign function that determines the new value of $x_i$.
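The following sketch illustrates the kind of update rule Eqs. (5) and (6) describe, assuming the standard formulation with bipolar (±1) states, Hebbian weights, and a sign activation; it is an illustration added here, not code from the chapter:

```python
import numpy as np

def hopfield_store(patterns):
    """Hebbian weight matrix for a fully connected Hopfield network (no self-connections)."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)
    return W / n

def hopfield_update(x, W, steps=5):
    """Asynchronous update: x_i <- sign(sum_j w_ij x_j), one neuron at a time."""
    x = x.copy()
    for _ in range(steps):
        for i in np.random.permutation(len(x)):
            x[i] = 1.0 if W[i] @ x >= 0 else -1.0
    return x

patterns = np.array([[1, -1, 1, -1, 1, -1]], dtype=float)  # one stored pattern
W = hopfield_store(patterns)
noisy = np.array([1, -1, 1, 1, 1, -1], dtype=float)         # corrupted input
print(hopfield_update(noisy, W))                             # recalls the stored pattern
```

This content-addressable behaviour, retrieving a stored pattern from a corrupted one, is exactly what makes the Hopfield network a form of associative memory.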
Boltzmann Machine: A Boltzmann Machine is a noisy Hopfield network with a probabilistic activation function:

(7)

Eq. (7) gives the probability with which the update from Eq. (5) is applied. This model is significant because it was one of the first to use hidden units. Boltzmann Machines are trained with the contrastive-divergence algorithm. They are two-layered neural networks with a visible and a hidden layer.
The edges between the two layers are undirected, which implies that information can flow in both directions. The network is completely connected, meaning every neuron is connected to every other neuron through undirected edges. Fig. (4) shows how to transform the Boltzmann machine into a Restricted Boltzmann Machine (RBM) [9]. The RBM is a basic structure used in many applications and for creating different networks. Table 2 summarises the models and their working nature; it is not a direct comparison, since each model performs differently in different domains.
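As an illustration of the two-layer structure and the contrastive-divergence training mentioned above, here is a sketch of one CD-1 step for a binary RBM; the layer sizes, learning rate, and training data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

n_visible, n_hidden = 6, 3                      # hypothetical layer sizes
W = 0.01 * rng.normal(size=(n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, lr=0.1):
    """One contrastive-divergence step: visible -> hidden -> visible -> hidden."""
    global W, b_v, b_h
    p_h0 = sigmoid(v0 @ W + b_h)                # hidden probabilities given the data
    h0 = (rng.random(n_hidden) < p_h0).astype(float)
    p_v1 = sigmoid(h0 @ W.T + b_v)              # reconstruction of the visible layer
    p_h1 = sigmoid(p_v1 @ W + b_h)              # hidden probabilities given the reconstruction
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b_v += lr * (v0 - p_v1)
    b_h += lr * (p_h0 - p_h1)

v = np.array([1, 0, 1, 1, 0, 1], dtype=float)   # a binary training example
for _ in range(100):
    cd1_step(v)
```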
Fig. (4)) Conversion of Boltzmann Machine to Restricted Boltzmann Machine (RBM).

A deep neural network consists of many layers of neurons [10]. The neurons must constantly learn to tackle tasks, or be applied in different ways, to produce better results; the network learns every time from newly updated information. A deep neural network uses multiple layers of nodes to extract high-level features from incoming data [1, 4], i.e., it transforms the data into something progressively more abstract. The Deep Feedforward Neural Networks (D-FFNN) are explained below.
An FFNN with a single hidden layer containing a finite set of neurons can approximate any continuous function; this is the universal approximation theorem, which, however, does not explain how to learn such a network. A related concern is that the width of the hidden layer can grow exponentially. Interestingly, the universal approximation theorem also holds for FFNNs with a limited number of hidden neurons per layer and numerous hidden layers, so D-FFNNs are employed instead of shallow FFNNs for better learnability. Approximating an unknown function f* is expressed as:
(8)

Here, f is a function from a specific family that depends on the parameters θ, and ɸ is a non-linear activation function for a single layer. For deep hidden layers, ɸ has the composed form below:

(9)

Instead of assuming a precise family of functions for f, a D-FFNN learns the function in Eq. (9) by approximating it with ɸ, which is realised by the n separate hidden layers.
A CNN [4, 11-13] is a special type of FFNN that uses a combination of convolution layers, ReLU activations, and pooling layers. These layers are usually followed by several fully connected FNN layers. In a traditional ANN, each neuron in a layer is linked to all the neurons in the next layer, and each connection is a parameter of the network. In a CNN, by contrast, the layers are not fully connected: neurons are only connected to local receptive fields, which significantly cuts down the number of parameters and the number of operations in the network. All the connections between neurons and their local receptive fields use the same set of weights, and we call this set of weights a kernel.
Kernel: All the neurons attached to their local receptive fields share the same kernel. The results of the neurons' calculations are stored in a matrix called the activation map. Weight sharing refers to the fact that the same kernel weights are reused across the whole input. Consequently, different kernels produce different activation maps, and the number of kernels is a hyper-parameter. The number of weights in the network is proportional to the kernel size, i.e., to the size of the local receptive field. Fig. (5) shows a typical CNN architecture with a 3-channel input. Each channel is connected to a convolution layer and a pooling layer, then again to a convolution layer, a pooling layer, and a merge layer. The merge layer connects to the fully connected (FC) layer, which provides the decision using the softmax function.
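The following NumPy sketch (illustrative only, with a toy image and kernel) shows how a single shared kernel slides over every local receptive field to produce one activation map:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: the same kernel (shared weights) is applied to every
    local receptive field, producing one activation map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy single-channel input
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # one 2x2 kernel = one set of shared weights
print(conv2d(image, kernel).shape)                 # (4, 4) activation map
```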
Fig. (5)) Typical CNN with 3-Channel input.

The softmax equation is given in Eq. (10); it provides the classification scores on which the final decision is based.
$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$  (10)

The different layers in CNN models are explained as follows:
Convolution layer: A convolution layer is a critical component of a convolutional neural network's architecture. A convolutional layer, like a hidden layer in a conventional neural network, seeks to transform the input to a higher level of abstraction. However, rather than relying on full connectivity between the input and hidden neurons, the convolutional layer takes advantage of local connectivity. A convolutional layer slides at least one kernel across the input, convolving each region. The results are stored in activation maps, which are the outputs of the convolutional layer.

Pooling layer: A pooling layer is frequently sandwiched between two convolution layers. Pooling layers attempt to reduce the input dimension while retaining as much information as possible. Additionally, a pooling layer can impart spatial invariance to the network, hence increasing generality. Zero padding, stride, and pooling window size are the hyperparameters of a pooling layer. Like the kernel of a convolutional layer, the pooling layer scans the whole input using the specified pooling window size; with a stride of 2, a window size of 2, and zero padding, the input dimension is halved. Min-pooling, average pooling, and more sophisticated methods such as stochastic pooling and fractional max-pooling are examples of pooling procedures. Max-pooling, which takes the maximum value from each sub-window, is the most commonly used pooling technique, as it efficiently captures image invariance.

Fully connected layer: A fully connected layer is the basic unit of an FFNN. A fully connected layer is frequently added between the penultimate and output layers of a typical CNN to represent non-linear interactions between input features. However, the large number of parameters it introduces has recently been questioned because of the possibility of overfitting, and some CNN architectures use simpler linear layers instead.

A CNN is a common FFNN model designed to recognise visual patterns directly from pixel images with minimal preprocessing [11, 14]. An image database, ImageNet, was proposed for object recognition research, and an annual software challenge called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) tests software's ability to detect and classify objects and scenes. Below, we discuss the CNN architectures of ILSVRC's main competitors.
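To tie these layers together, here is a minimal CNN sketch in PyTorch (the framework choice, layer sizes, input resolution, and class count are assumptions for illustration), stacking convolution, ReLU, max-pooling, and a fully connected layer ending in a softmax:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    """Minimal CNN: convolution -> ReLU -> max-pooling -> fully connected -> softmax."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)   # 3-channel input, 8 kernels
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)        # halves the spatial dimensions
        self.fc = nn.Linear(16 * 8 * 8, n_classes)               # assumes 32x32 input images

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        return F.softmax(self.fc(x), dim=1)                      # class probabilities, cf. Eq. (10)

model = TinyCNN()
scores = model(torch.randn(1, 3, 32, 32))   # one random 3-channel 32x32 image
print(scores.shape)                          # torch.Size([1, 10])
```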
In 1998, LeCun et al. developed LeNet-5, a 7-level convolutional network used to classify digits. Processing higher-resolution images requires larger and more numerous convolutional layers, so this technique is constrained by the available computing resources. The architecture is shown in Fig. (6).
Fig. (6)) LeNet-5 Architecture.

AlexNet: In 2012, AlexNet surpassed all previous competitors by cutting the top-5 error from 26% to 15.3%. Compared with LeNet-5, the AlexNet network was deeper, featured more filters per layer, and used stacked convolutional layers. AlexNet used convolutions, max-pooling, dropout, data augmentation, ReLU activations, and SGD with momentum (Fig. 7). Every fully connected layer and convolutional layer had a ReLU activation function. AlexNet was trained for six days on 2 Nvidia GeForce GTX 580 GPUs. It was designed by the SuperVision group, which included Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton.
Fig. (7)) AlexNet Architecture.

The ILSVRC 2014 competition was won by GoogLeNet (Inception V1), whose near-human performance the challenge organizers were now forced to evaluate. It turned out that beating GoogLeNet's accuracy required some human training: after such training, a human expert achieved a top-5 error rate of 5.1 percent for a single model and 3.6 percent for an ensemble of models.
The network employed a LeNet-inspired CNN but included a new element called the inception module; it also used RMSprop and batch normalization. The inception module uses several small convolutions to reduce the number of parameters: the architecture is a 22-layer deep CNN with 4 million parameters, instead of the 60 million of AlexNet.
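A simplified, illustrative inception-style block (the channel counts are made up, not GoogLeNet's actual configuration) shows how several small parallel convolutions, with 1x1 bottlenecks to reduce parameters, are concatenated along the channel dimension:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Simplified inception module: parallel 1x1, 3x3, and 5x5 convolutions plus pooling,
    concatenated along the channel dimension; 1x1 convolutions act as bottlenecks."""
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),              # bottleneck before the 3x3
            nn.Conv2d(16, 24, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 4, kernel_size=1),               # bottleneck before the 5x5
            nn.Conv2d(4, 8, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 8, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

x = torch.randn(1, 32, 28, 28)
print(InceptionBlock(32)(x).shape)   # torch.Size([1, 56, 28, 28]) = 16 + 24 + 8 + 8 channels
```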
VGGNet was the runner-up at ILSVRC 2014 and was developed by Simonyan and Zisserman. VGGNet consists of 16 convolutional layers with a simple and uniform design: it uses only 3x3 convolutions, but many filters, and was trained for 2–3 weeks on 4 GPUs running continuously. It is now one of the most commonly used methods for extracting features from images. The VGGNet weight configuration is open source and is utilised in many other applications and challenges. VGGNet has 138 million parameters, which can be difficult to manage; the architecture is shown in Fig. (8).
Fig. (8)) VGGNet Architecture.

Finally, at ILSVRC 2015, Kaiming He et al. proposed and developed a novel architecture with "skip connections" and heavy use of batch normalization, called the Residual Neural Network (ResNet).
Fig. (9)) ResNet Architecture.

These skip connections, also called gated units or gated recurrent units, are closely related to recently successful elements of RNNs. Using them, the authors trained a NN with 152 layers that was nonetheless less complex than VGGNet; it achieves a top-5 error rate of 3.57% on this dataset. GoogLeNet has inception modules, while ResNet has residual connections.
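A basic residual block can be sketched as follows (an illustration with assumed channel counts, not the exact ResNet configuration); the skip connection adds the input back to the block's output so that gradients can bypass the convolutions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: the skip connection adds the input back to the
    output of two batch-normalised 3x3 convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)        # skip connection: gradients can flow around the block

x = torch.randn(1, 64, 16, 16)
print(ResidualBlock(64)(x).shape)     # torch.Size([1, 64, 16, 16])
```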
A DBN is a model that combines different forms of NN [1, 15]: it is a hybrid of RBMs and a Deep Feedforward Neural Network (D-FFNN), with the RBMs serving as the input part and the D-FFNN as the output part. The RBMs are commonly stacked, which means they are used sequentially. Because RBMs and D-FFNNs are independent networks with two different learning techniques, this enriches the DBN. The RBMs are typically used to initialise the model in an unsupervised way, after which a supervised technique is used to fine-tune the parameters. These two stages of DBN training are illustrated in Fig. (10).
Fig. (10)) Deep Belief Network.

An autoencoder is an unsupervised NN model for feature selection or dimension reduction [1, 3, 16]. An autoencoder's input and output layers have the same size, and the network is symmetric. An input pattern x is mapped to a new encoding c, from which an output pattern similar to the input pattern is reconstructed, i.e., the encoding c can reproduce x. Autoencoders are built similarly to DBNs: interestingly, the original autoencoder only pre-trained the first half of the network with RBMs and then unrolled the network, creating the second half. Pre-training is followed by fine-tuning, as in a DBN. Fig. (11) shows a typical denoising autoencoder. Autoencoders are unsupervised learning models because they do not need labels.
Fig. (11)) Denoising Autoencoder.

The model has been used successfully for dimensionality reduction. Given enough data, autoencoders can produce a better two-dimensional representation of array data than PCA: PCA uses linear transformations, whereas autoencoders use non-linear transformations, which usually results in improved performance. There are many variants of the model, such as sparse autoencoders, denoising autoencoders, and variational autoencoders.
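A minimal autoencoder sketch (with illustrative sizes, assuming flattened 28x28 inputs) shows the symmetric encoder-decoder structure and the label-free reconstruction objective:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Symmetric autoencoder: input and output layers have the same size, and the
    bottleneck c is a lower-dimensional encoding of the input x."""
    def __init__(self, n_in=784, n_code=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, 128), nn.ReLU(),
            nn.Linear(128, n_code))                  # non-linear dimension reduction
        self.decoder = nn.Sequential(
            nn.Linear(n_code, 128), nn.ReLU(),
            nn.Linear(128, n_in))

    def forward(self, x):
        c = self.encoder(x)                          # encoding c
        return self.decoder(c)                       # reconstruction of x from c

model = Autoencoder()
x = torch.rand(16, 784)                              # e.g. a batch of flattened images
loss = nn.functional.mse_loss(model(x), x)           # unsupervised: no labels needed
loss.backward()
```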
In 1997, Hochreiter and Schmidhuber proposed LSTM networks. LSTM is an RNN variant that alleviates RNN shortcomings such as problems with long-term dependencies [3, 8]; it also prevents gradients from vanishing or exploding. In 1999, an LSTM with a forget gate was introduced, and the LSTM with such feedback links became the standard LSTM network structure, unlike feedforward DFNs. LSTMs can also process sequences of data as opposed to single data points, which makes them excellent for evaluating speech or video data. Fig. (12) shows a typical LSTM cell with a forget gate, an input gate, and an output gate arranged around a single memory cell.
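A short usage sketch (with hypothetical sequence length, feature size, and hidden size) shows how an LSTM processes a data sequence and how its final hidden state can feed a classifier:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 20-step sequences of 10-dimensional features, 32 hidden units.
lstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)
classifier = nn.Linear(32, 2)                 # e.g. a binary decision per sequence

x = torch.randn(4, 20, 10)                    # a batch of 4 sequences
outputs, (h_n, c_n) = lstm(x)                 # gates and cell state are handled internally
logits = classifier(h_n[-1])                  # use the last hidden state for classification
print(logits.shape)                           # torch.Size([4, 2])
```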
Fig. (12)) Long Short Term Memory.

In this paper, we gave an overview of the history of neural networks and of deep learning models. The basic artificial neural network models, such as the shallow FNN and the deep FNN, were discussed, along with brief details of neural network models like the RNN, the Hopfield network, the Boltzmann machine, and the RBM. The architecture of the D-FFNN, with the CNN as a special example, was discussed, as were the applications of different models such as LeNet-5, AlexNet, VGGNet, GoogLeNet/Inception, and ResNet. In summary, this study gave an overview of deep learning models such as Deep Feedforward Neural Networks, Convolutional Neural Networks, Deep Belief Networks, Autoencoders, and Long Short-Term Memory networks. These models form the main architectures of deep learning today, and a fundamental understanding of them is essential for being prepared for future AI breakthroughs.
Not applicable.
The authors declare no conflict of interest, financial or otherwise.
Declared none.