Artificial Intelligence Applications and Reconfigurable Architectures (E-Book)
Description

ARTIFICIAL INTELLIGENCE APPLICATIONS and RECONFIGURABLE ARCHITECTURES

The primary goal of this book is to present the design, implementation, and performance issues of AI applications and the suitability of the FPGA platform.

This book covers the features of modern Field Programmable Gate Array (FPGA) devices, design techniques, and successful implementations pertaining to AI applications. It describes various hardware options available for AI applications, key advantages of FPGAs, and contemporary FPGA ICs with software support. The focus is on exploiting the parallelism offered by FPGAs to meet the heavy computation requirements of AI, either as a complete hardware implementation or as customized hardware accelerators. This is a comprehensive textbook on the subject, covering a broad array of topics like technological platforms for the implementation of AI, capabilities of FPGAs, suppliers’ software tools and hardware boards, and a discussion of implementations done by researchers to encourage the AI community to use and experiment with FPGAs.

Readers will benefit from reading this book because

  • It serves students and researchers at all levels, as it deals with both the basics and the finer details of ecosystem development requirements for intelligent applications with reconfigurable architectures, whereas competing books are more suitable for understanding reconfigurable architectures alone.
  • It focuses on all aspects of machine learning accelerators for the design and development of intelligent applications, not on a single perspective such as reconfigurable architectures for IoT applications only.
  • It is the best solution for researchers seeking to understand how to design and develop various AI, deep learning, and machine learning applications on the FPGA platform.
  • It is the best solution for all types of learners to gain complete knowledge of why reconfigurable architectures are important for implementing computation-heavy AI-ML applications.

Audience

Researchers, industrial experts, scientists, and postgraduate students who are working in the fields of computer engineering, electronics, and electrical engineering, especially those specializing in VLSI and embedded systems, FPGA, artificial intelligence, Internet of Things, and related multidisciplinary projects.

Pages: 306

Publication year: 2023





Table of Contents

Cover

Series Page

Title Page

Copyright Page

Preface

1 Strategic Infrastructural Developments to Reinforce Reconfigurable Computing for Indigenous AI Applications

1.1 Introduction

1.2 Infrastructural Requirements for AI

1.3 Categories in AI Hardware

1.4 Hardware AI Accelerators to Support RC

1.5 Architecture and Accelerator for AI-Based Applications

1.6 Conclusion

References

2 Review of Artificial Intelligence Applications and Architectures

2.1 Introduction

2.2 Technological Platforms for AI Implementation— Graphics Processing Unit

2.3 Technological Platforms for AI Implementation— Field Programmable Gate Array (FPGA)

2.4 Design Implementation Aspects

2.5 Conclusion

References

3 An Organized Literature Review on Various Cubic Root Algorithmic Practices for Developing Efficient VLSI Computing System—Understanding Complexity

3.1 Introduction

3.2 Motivation

3.3 Numerous Cubic Root Methods for Emergent VLSI Computing System—Extraction

3.4 Performance Study and Discussion

3.5 Further Research

3.6 Conclusion

References

4 An Overview of the Hierarchical Temporal Memory Accelerators

4.1 Introduction

4.2 An Overview of Hierarchical Temporal Memory

4.3 HTM on Edge

4.4 Digital Accelerators

4.5 Analog and Mixed-Signal Accelerators

4.6 Discussion

4.7 Open Problems

4.8 Conclusion

References

5 NLP-Based AI-Powered Sanskrit Voice Bot

5.1 Introduction

5.2 Literature Survey

5.3 Pipeline

5.4 Methodology

5.5 Results

5.6 Further Discussion on Classification Algorithms

5.7 Conclusion

Acknowledgment

References

6 Automated Attendance Using Face Recognition

6.1 Introduction

6.2 All Modules Details

6.3 Algorithm

6.4 Proposed Architecture of System

6.5 Conclusion

References

7 A Smart System for Obstacle Detection to Assist Visually Impaired in Navigating Autonomously Using Machine Learning Approach

7.1 Introduction

7.2 Related Research

7.3 Evaluation of Related Research

7.4 Proposed Smart System for Obstacle Detection to Assist Visually Impaired in Navigating Autonomously Using Machine Learning Approach

7.5 Conclusion and Future Scope

References

8 Crop Disease Detection Accelerated by GPU

8.1 Introduction

8.2 Literature Review

8.3 Algorithmic Study

8.4 Proposed System

8.5 Dataset

8.6 Existing Techniques

8.7 Conclusion

References

9 A Relative Study on Object and Lane Detection

9.1 Introduction

9.2 Algorithmic Survey

9.3 YOLO v/s Other Algorithms

9.4 YOLO and Its Version History

9.5 A Survey in Lane Detection Approaches

9.6 Conclusion

References

10 FPGA-Based Automatic Speech Emotion Recognition Using Deep Learning Algorithm

10.1 Introduction

10.2 Related Work

10.3 FPGA Implementation of Proposed SER

10.4 Implementation and Results

10.5 Conclusion and Future Scope

References

11 Hardware Implementation of RNN Using FPGA

11.1 Introduction

11.2 Proposed Design

11.3 Methodology

11.4 PYNQ Architecture and Functions

11.5 Result and Discussion

11.6 Conclusion

References

Index

End User License Agreement

List of Tables

Chapter 1

Table 1.1 Comparison of hardware platforms for AI applications.

Chapter 3

Table 3.1 Tabular summary on related works.

Table 3.2 Performance comparison of various state-of-the-art design.

Table 3.3 Design and verification status for the collected manuscripts.

Chapter 4

Table 4.1 A comparison of the state-of-the-art HTM digital accelerators. One m...

Table 4.2 A comparison of the state-of-the-art analog and mixed-signal HTM acc...

Chapter 5

Table 5.1 Equations used for regression analysis.

Table 5.2 Linear reg hyperparameters.

Table 5.3 Evaluation result.

Table 5.4 Logistic regression.

Table 5.5 Classification evaluation result.

Table 5.6 PCA hyperparameters.

Table 5.7 Threshold parameters.

Chapter 8

Table 8.1 Summary of literature review.

Chapter 9

Table 9.1 Performance overview of the proposed lane detecting methods [20].

Table 9.2 Summary of data collection [20].

Chapter 10

Table 10.1 Specifications of the PYNQ board.

Table 10.2 Hardware cost of proposed SER.

List of Illustrations

Chapter 1

Figure 1.1 A typical reconfigurable computing model.

Figure 1.2 A reconfigurable functional unit.

Figure 1.3 Reconfigurable unit inside CPU or GPU as a functional unit.

Figure 1.4 Reconfigurable functional unit as a coprocessor.

Figure 1.5 Reconfigurable functional unit as a coprocessor.

Figure 1.6 Reconfigurable functional unit as an attached processing unit.

Figure 1.7 Reconfigurable functional unit as an attached processing unit.

Figure 1.8 As a standalone processing unit for highly connected NN.

Figure 1.9 A processing unit for highly connected NN.

Figure 1.10 Typical Design of a hardware accelerator and a programming element...

Figure 1.11 A neural processing unit.

Figure 1.12 Fully connected reconfigurable NN.

Figure 1.13 FPGA-based accelerator for data mining.

Chapter 2

Figure 2.1 Streaming multiprocessor (SM) [20].

Figure 2.2 DPUCZDX8G top-level block diagram [15].

Figure 2.3 DPUCZDX8G hardware architecture [15].

Figure 2.4 AI tensor block architecture [22].

Figure 2.5 The MAC core architecture on Intel Stratix 10 NX FPGA [23].

Chapter 3

Figure 3.1 Curve of dictum.

Chapter 4

Figure 4.1 The state-of-the-art digital, analog, and mixed-signal HTM accelera...

Figure 4.2 The biological neocortex structure including the regions [20] and t...

Figure 4.3 The high-level block diagram of the HTM accelerators proposed by: (...

Figure 4.4 (Left) Pyragrid high-level architecture with multiple HTM region sl...

Figure 4.5 Estimated data movement in the previously proposed HTM digital acce...

Figure 4.6 HTM network scaling topologies used by the state-of-the-art acceler...

Figure 4.7 Normalized latency of the conventional computing platforms (CPU and...

Figure 4.8 Normalized power consumption of the GPU [30], CPU, state-of-the-art...

Chapter 5

Flowchart 5.1 Generic.

Flowchart 5.2 Steps related to web scraping.

Flowchart 5.3 Steps for reading text from an image.

Flowchart 5.4 Steps for connecting MySQL.

Flowchart 5.5 Steps for regression.

Flowchart 5.6 Steps for SVM.

Flowchart 5.7 Spam classifier.

Flowchart 5.8 PCA.

Flowchart 5.9 Steps for anomaly detection.

Flowchart 5.10 NLP pipeline.

Figure 5.1 Web scraping.

Figure 5.2 MySQL connectivity.

Figure 5.3 MySQL window.

Figure 5.4 Image input.

Figure 5.5 Read text.

Figure 5.6 Vocabulary.

Figure 5.7 Tokenized vocabulary.

Figure 5.8 Tokenized text.

Figure 5.9 Contour.

Figure 5.10 Linear regression line.

Figure 5.11 Reg prediction.

Figure 5.12 Errors (highlighted errors).

Figure 5.13 Learning curve.

Figure 5.14 Classified data using logistic regression.

Figure 5.15 Classifier predicted output.

Figure 5.16 Linear Kernel SVM.

Figure 5.17 Gaussian Kernel SVM.

Figure 5.18 Visualization of eigenvectors and eigenvalues.

Figure 5.19 Visualization of principal components.

Figure 5.20 Anomaly data detected.

Figure 5.21 Detected text in English.

Figure 5.22 Detected text in Sanskrit.

Figure 5.23 Text to speech.

Figure 5.24 Dataset.

Figure 5.25 Visualization of dataset.

Figure 5.26 Model results on sepal parameters.

Figure 5.27 Model results on petal parameters.

Figure 5.28 Classification line.

Figure 5.29 Confusion matrix - gradient descent.

Figure 5.30 Confusion matrix - Naive Bayes.

Chapter 6

Figure 6.1 Results of Haar cascade classifier.

Figure 6.2 Contrast adjustment.

Figure 6.3 Comparison of different filters.

Figure 6.4 Final results of image enhancement.

Figure 6.5 Proposed algorithm.

Figure 6.6 Proposed architecture.

Chapter 7

Figure 7.1 Architecture of proposed system.

Figure 7.2 Steps for object detection in histogram.

Figure 7.3 Computing gradient using HOG.

Figure 7.4 Architecture of SSD [11].

Chapter 8

Figure 8.1 Image acquisition.

Figure 8.2 Image preprocessing.

Figure 8.3 Image segmentation.

Figure 8.4 Feature extraction.

Figure 8.5 Farm aid.

Figure 8.6 Disease detecting robot.

Figure 8.7 Proposed system.

Chapter 9

Figure 9.1 Architecture diagram of you only look once (YOLO) convolutional neu...

Figure 9.2 The graph displays the speed/accuracy trade off on the mAP at 0.5 I...

Figure 9.3 Real time systems on PASCAL VOC 2007. Notice that YOLO v1 when comp...

Figure 9.4 YOLO v1 object detection (Source: You only look once: Unified, real...

Figure 9.5 Test detection results on PASCAL VOC2012:YOLOv2 acts equivalent to ...

Figure 9.6 YOLO version 3 conceptual design (Source: YOLO v3 plotting against ...

Figure 9.7 MS COCO object detection graph [8].

Figure 9.8 Source: YOLO v5 vs its other versions.

Figure 9.9 Flow diagram of algorithm for lane detection [3].

Chapter 10

Figure 10.1 Typical SER system.

Figure 10.2 Emotional space of arousal and valence.

Figure 10.3 Categories of speech features.

Figure 10.4 Flow diagram of the proposed scheme.

Figure 10.5 View of the PYNQ board.

Figure 10.6 Performance of proposed SER system.

Chapter 11

Figure 11.1 Flowchart of neural network.

Figure 11.2 OR gate truth table.

Figure 11.3 Flowchart of prediction model.

Figure 11.4 Flow of RNN model.

Figure 11.5 Architecture of RNN model.

Figure 11.6 Architecture with example.

Figure 11.7 One hot encoding.

Figure 11.8 PYNQ Z2 board.

Figure 11.9 Accuracy graph.

Guide

Cover Page

Series Page

Title Page

Copyright Page

Preface

Table of Contents

Begin Reading

Index

WILEY END USER LICENSE AGREEMENT


Scrivener Publishing, 100 Cummings Center, Suite 541J, Beverly, MA 01915-6106

Publishers at Scrivener: Martin Scrivener ([email protected]) and Phillip Carmical ([email protected])

Artificial Intelligence Applications and Reconfigurable Architectures

Edited by

Anuradha D. Thakare

Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Pune, India

and

Sheetal Umesh Bhandari

Department of Electronics and Telecommunication Engineering, Pimpri Chinchwad College of Engineering, Pune, India

This edition first published 2023 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA.

© 2023 Scrivener Publishing LLC

For more information about Scrivener publications please visit www.scrivenerpublishing.com.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

Wiley Global Headquarters, 111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.

Library of Congress Cataloging-in-Publication Data

ISBN 978-1-119-85729-7

Cover image: Pixabay.Com

Cover design by Russell Richardson

Preface

Artificial intelligence (AI) algorithms are gaining importance as the backbone of different fields like computer vision, robotics, finance, biotechnology, etc., which will radically change human life. However, the computational complexity involved in AI algorithms continues to impose a challenge to state-of-the-art computing systems, particularly when the application demands low power, high throughput and low latency. At the same time, the use of field programmable gate arrays (FPGAs) for compute-intensive applications is increasingly prevalent due to the parallelism provided by thousands of configurable logic blocks (CLBs), on-chip processor core, and other resources accessible for digital designing.

This book provides detailed insights into FPGA devices and their suitability for AI applications. In addition to covering the features of modern FPGA devices, design techniques and successful implementations pertaining to AI applications, this book also describes various hardware options available for AI applications, key advantages of FPGAs, and contemporary FPGA ICs with software support. It focuses on exploiting the parallelism offered by FPGAs to meet the heavy computation requirements of AI, either as complete hardware implementations or as customized hardware accelerators. It is a comprehensive textbook on the subject, covering a broad array of topics like technological platforms for implementation of AI, capabilities of FPGAs, and suppliers’ software tools and hardware boards, and it discusses implementations done by researchers to encourage the AI community to use and experiment with FPGAs.

The primary goal of this book is to present the design, implementation and performance issues of AI applications and the suitability of the FPGA platform. Researchers will gain clear insights into the challenges and issues faced in designing AI applications, in addition to research directions for the design and development of FPGA-based systems. With the variety of AI applications covered in this book, the reader will be able to deliver high-performance, low-energy-consumption solutions.

Because of the hybrid nature of the application and implementation discussed, this book makes a few assumptions about the background of the reader and introduces relevant concepts as the need arises, with the main focus on reconfigurable architecture of AI applications. This book is intended for readers across the globe. It can be used for courses like Reconfigurable Architectures for Machine Learning, FPGA for AI-ML Applications, and Hardware Accelerators for DL taught at the undergraduate and postgraduate levels. Also, the book will be useful for researchers working in the AI and FPGA domain, and a large professional audience as well, such as engineers, scientists, those involved in industrial research and development, and academicians. The book is organized into 11 chapters, which are briefly described below.

Chapter 1

presents the strategic infrastructural developments to support indigenous AI applications. It describes the ecosystem required for AI applications, particularly the AI hardware used to accelerate the performance of applications. It shows how accelerators can significantly decrease the time it takes to train and execute an AI model and can be used to implement special AI-based tasks that cannot be run efficiently on a CPU. The chapter also discusses vendors and research laboratories supporting AI infrastructure.

Chapter 2

reviews the latest implementation technologies of AI applications. In this investigation, implementation platforms like GPU and FPGA are examined by the author. The chapter concludes with the comparative benefits of FPGA structures over GPU and suggests a few FPGAs suitable for AI implementation.

Chapter 3

presents the state-of-the-art review work carried out in developing high-performance VLSI computing systems. This work will help in understanding the computational complexity level with respect to simulation, synthesis, implementation, timing analysis and physical design layout for developing algorithms consisting of different operations like addition, multiplication, division, squaring, cubing, square root, cube root, etc.

Chapter 4

provides a comprehensive survey of hierarchical temporal memory (HTM)-based neuromorphic computing systems. This study covers features offered by HTM like system performance when processing spatial and temporal information, power dissipation, and network latency. Furthermore, challenges associated with enabling real-time processing, on-chip learning, system scalability, and reliability are addressed. This study serves as a foundation for selecting proper HTM network architecture and technological solutions for devices with predefined computational capacity, power budget, and footprint area.

Chapter 5

discusses an AI-powered Sanskrit voice bot. The complexity of the algorithm demands a hardware accelerator to further improve the performance of the bot as suggested by the authors.

Chapter 6

presents a face recognition model for an attendance system, developed with OpenCV in Python and supported by the Xilinx ML Suite for FPGA implementation.

Chapter 7

presents a smart system for obstacle detection to assist the visually impaired in navigating autonomously using a machine learning approach. Machine learning algorithms work on the objects captured through the cameras, and audio outputs are used to convey the location of items to blind users. Furthermore, various obstacle detection approaches are discussed that can be used to create and develop an autonomous navigation system to assist the visually impaired.

Chapter 8

presents a crop disease detection system accelerated by GPU. A major problem facing farmers is plants being affected by disease, so to prevent yield losses it is necessary to detect disease in the crop. Manually monitoring crop diseases is very time-consuming and difficult, especially on a large farm, because of the greater workload for the farmer, and therefore cannot always be done accurately. If the disease is nonnative, farmers are often not aware of it. Hence, this work focuses on crop disease detection with the help of image processing techniques and machine learning algorithms like SVM, ANN, and SAS classifiers.

Chapter 9

presents a comparative study of object detection and lane detection algorithms. This work provides a survey of lane detection approaches based on performance analysis of existing approaches like CNN, Hough transform, Gaussian filter and Canny edge detection. The authors evaluate approaches on different datasets covering curved roads, large datasets, rainy days, yellow-white strips, and day and night lighting. A detailed direct comparison of the You Only Look Once (YOLO) algorithm with object detection using color masking is presented.

Chapter 10

presents a case study on deep learning-based speech emotion recognition using the Python productivity for Zynq (PYNQ) open-source framework, implemented on a PYNQ-Z1 FPGA board.

Chapter 11

discusses the hardware implementation of a recurrent neural network (RNN). It is done on the PYNQ-Z2, a development board based on the Zynq XC7Z020 FPGA. The authors conclude that the implemented network is faster than other mobile platforms and will likely evolve into an RNN coprocessor for future devices.

In closing, we, the editors, wish to acknowledge the valuable contributions of the reviewers in improving the quality, coherence, and content of the chapters presented. We would also like to acknowledge the help of all those involved in this book directly or indirectly and, more specifically, the publishing team. Without their support, this book would not have become a reality.

As always, the greatest debt one owes is to one’s colleagues, friends and family. Therefore, we thank our friends who have been a constant source of encouragement throughout this project and shared their technical expertise and offered other kinds of support. Finally, we must thank our family members as they are responsible for this book in even more ways than they know. This book is dedicated to them.

We hope that this book will become part of an ever-evolving knowledge repository. As such, there may be areas that need improvement and inadvertent errors that need correcting. Therefore, we sincerely request that the readers feel free to email their suggestions and feedback on the book to us. We will surely try to incorporate the relevant suggestions in the next edition.

Dr. Anuradha D. Thakare

Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Pune, India

Dr. Sheetal Umesh Bhandari

Department of Electronics & Telecommunication Engineering, Pimpri Chinchwad College of Engineering, Pune, India

1Strategic Infrastructural Developments to Reinforce Reconfigurable Computing for Indigenous AI Applications

Deepti Khurge

Pimpri Chinchwad College of Engineering, Pune, India

Abstract

Artificial intelligence (AI) methodologies have the potential to reform many aspects of human life. The capabilities of AI are continuously evolving, and so is its enterprise adoption. Globally, governments and industries are actively considering where and how to leverage AI. Machine learning (ML) and AI are evolving at a faster rate than silicon can be developed. To exploit the full potential of AI, the appropriate AI infrastructure must be strategically planned. AI solutions will require appropriate hardware, software, and scalable processing models. The ecosystem of AI business applications can hence be seen as a whole.

The need for enterprises to comprehend the correct technology and infrastructure required to implement AI-powered solutions is growing by the day. Significant AI infrastructure elements are networking infrastructure, workloads, data preparation, data management and governance, training, and the Internet of Things (IoT). If the potential in the labor force, academic institutions, and governance standing is identified and leveraged effectively, commercial strategies can lead to an AI breakthrough.

Keywords: Artificial intelligence, reconfigurable computing, GPU, FPGA, ASIC, hardware accelerator

1.1 Introduction

Recently, reconfigurable computing has made significant advancements in the acceleration of AI applications. Reconfigurable computing is a computing architecture that combines the high performance of hardware with the flexibility of software components: after production, devices are reprogrammed for specific applications based on their functionality requirements. It is a significant research field in computer architectures and software systems. By moving the computationally intensive parts of an algorithm onto reconfigurable hardware, many algorithms may be considerably accelerated. Artificial intelligence algorithms and applications have traditionally suffered from the lack of a clear implementation methodology, and researchers have used reconfigurable computing as one means of accelerating computationally intensive and parallel algorithms. There is a need to explore the recent improvements in the tools and methodologies used in reconfigurable computing, which strengthen its applicability to accelerating AI methodologies [1].

Contemporary AI applications in fields such as finance, healthcare, and the military are designed on the grounds of complex artificial neural networks (ANNs), involving heavy computation over huge data, constraints, and recurring layer-to-layer communication [12]. With AI technology advancing significantly, AI algorithms are still developing, and one ANN algorithm can typically serve only one application. Hence, ideal AI hardware must be able to adapt to changing and developing algorithms, support diverse ANNs based on necessities, and switch between ANNs flexibly. Microchips built on reconfigurable computing may be able to efficiently support user-specific computational patterns, computing architectures, and memory hierarchies by allowing runtime configuration in those areas, supporting diverse NNs with high-throughput computation and communication [9, 12].

1.2 Infrastructural Requirements for AI

As AI progresses from experimentation to adoption, it will necessitate a huge investment in computing resources and infrastructure. Because the technology is complex and resource-intensive, system costs will rise. As AI’s need for large volumes of data increases, data has to move to the cloud, so predominantly hybrid cloud solutions will be required to create a concrete infrastructural foundation. These solutions must ensure that the needs of businesses and workloads are met, support the increasing demands required to sustain AI, and do so at an appropriate cost. Organizations require adequate computing resources, including CPUs and GPUs, to effectively exploit the opportunities posed by AI. Basic AI operations can be handled in a CPU-based environment, but deep learning requires many big data sets and the use of scalable machine learning algorithms, for which CPU-based processing may not be adequate. Compared to regular CPUs, GPUs can greatly expedite AI and ML operations. As computing capacity and density grow, demand for high-performance networks and storage will also expand. The following criteria deserve special attention when setting up an ecosystem for AI-based infrastructural development [4, 16].

a. Storage capacity or volume

As the volume of data grows, it is important for any infrastructure to scale storage. Many parameters influence how much storage an application uses, including how much AI it will use and whether it will need to make real-time predictions. For example, a healthcare application that employs AI algorithms to make real-time decisions on disease prediction may require all-flash storage, whereas for VLSI applications slower but much larger storage will suffice. System design must account for the volume of data generated by AI applications: when AI applications are exposed to more data, they make better predictions [4, 6, 7].

b. Networking infrastructure

AI-based systems and algorithms, implemented on devices or in the cloud, are required to deal with huge amounts of data, and infrastructure with large computer networks is responsible for real-time data transmission. As AI strives to satisfy these demands, the load on networking infrastructure will keep rising. Such systems need high bandwidth and very low latency.

c. Security

Applications such as military and healthcare need AI to manage sensitive data. Such data may include patient records, financial information, personal data, and defense-related data, and its compromise would be dangerous for any organization. Data attacks or breaches can lead to pronounced consequences for organizations. A comprehensive security strategy should therefore be adopted for such AI infrastructure.

d. Cost-effective solutions

As AI systems become more complicated, they become more expensive to run, so maximizing the performance of the infrastructure while keeping the costs of these systems under control is critical. With continued growth expected in the number of firms employing AI in the coming years, putting more strain on the network, server, and storage infrastructures that support this technology, cost-effective solutions are desired.

e. High computing capacity

Organizations require sufficient computing resources, such as CPUs and GPUs, to properly utilize the opportunities given by AI. Basic AI workloads can be handled in a CPU-based environment, but deep learning requires many big data sets and the use of scalable neural network techniques, for which CPU-based computation may not be sufficient. Demand for high-performance networks and storage will increase, as will computing capacity and density [6, 7].

Hence, while delivering a high-performance ecosystem for AI-based systems, organizations should adopt strategic development methods that address the needs of the infrastructure [3]: robust security, large storage backups, high-performing computational models, and cost-effective solutions must go hand in hand to develop state-of-the-art technological solutions.

1.3 Categories in AI Hardware

The next important developmental phase in adopting AI solutions is strong hardware support. The hardware should be technologically compatible with the existing infrastructure as well as flexible enough to accommodate new, heuristic methodologies [5, 6].

The hardware used for AI today mainly consists of one or more of the following:

CPU — Central Processing Units

GPU — Graphics Processing Units

FPGA — Field Programmable Gate Arrays

ASIC — Application Specific Integrated Circuits

a. CPU

The CPU is the standard processor used in many devices. Compared to FPGAs and GPUs, the architecture of CPUs has a limited number of cores optimized for sequential serial processing. Arm® processors can be an exception to this because of their robust implementation of Single Instruction Multiple Data (SIMD) architecture, which allows for simultaneous operation on multiple data points, but their performance is still not comparable to GPUs or FPGAs.
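The SIMD idea mentioned above, issuing one instruction that operates on several data lanes at once, can be sketched in plain Python by counting how many instruction issues each execution model needs. The four-lane width and the element-scaling workload are illustrative assumptions, not details from the text.

```python
def scale_serial(data, factor):
    """Scalar model: one multiply instruction issued per element."""
    out, issues = [], 0
    for x in data:
        out.append(x * factor)
        issues += 1
    return out, issues


def scale_simd(data, factor, lanes=4):
    """SIMD model: one instruction issue covers a whole lane-wide chunk."""
    out, issues = [], 0
    for i in range(0, len(data), lanes):
        chunk = data[i:i + lanes]
        out.extend(x * factor for x in chunk)  # conceptually a single vector op
        issues += 1
    return out, issues
```

With 8 elements and 4 lanes, the serial model issues 8 instructions while the SIMD model issues only 2 for the same result. GPUs and FPGAs extend this kind of data-level parallelism to thousands of lanes.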

The limited number of cores limits the ability of a CPU to process, in parallel, the large amounts of data needed to properly run an AI algorithm. The architectures of FPGAs and GPUs are designed with the intensive parallel-processing capabilities required for handling multiple tasks quickly and simultaneously. FPGA and GPU processors can execute an AI algorithm much more quickly than a CPU, which means that an AI application or neural network will learn and react several times faster on an FPGA or GPU than on a CPU.

CPUs do offer some initial pricing advantages. When training small neural networks with a limited dataset, a CPU can be used, but the trade-off is time: the CPU-based system will run much more slowly than an FPGA- or GPU-based system. Another benefit of a CPU-based application is power consumption: compared to a GPU configuration, the CPU delivers better energy efficiency.

b. GPUs

Graphics processing units (GPUs) were originally developed for generating computer graphics, virtual reality training environments and video, which rely on advanced computation and floating-point capability for drawing geometric objects, lighting and color depth. For artificial intelligence to be successful, it needs a lot of data to analyze and learn from, which requires substantial computing power to execute the AI algorithms and move large amounts of data. GPUs can perform these operations because they are specifically designed to quickly process the large data volumes used in rendering video and graphics. Their strong computational abilities have helped make them popular in machine learning and artificial intelligence applications. GPUs are good at parallel processing, the computation of very large numbers of arithmetic operations in parallel [4], which delivers respectable acceleration in applications with repetitive workloads performed in rapid succession. GPUs can also be priced below competing solutions, with the average graphics card having a five-year lifecycle [2, 4].

AI on GPUs does have its limitations. GPUs do not generally deliver as much performance as ASIC designs where the microchip is specifically designed for an AI application. GPUs deliver a lot of computational power at the expense of energy efficiency and heat. Heat can create durability issues for the application, impair performance and limit types of operational environments [2]. The ability to update AI algorithms and add new capabilities is also not comparable to FPGA processors.

c. FPGAs

FPGAs are types of integrated circuits with programmable hardware fabric. This differs from GPUs and CPUs in that the function circuitry inside an FPGA processor is not hard etched. This enables an FPGA processor to be programmed and updated as needed. This also gives designers the ability to build a neural network from scratch and structure the FPGA to best meet their needs.

The reprogrammable, reconfigurable architecture of FPGAs delivers key benefits to the ever-changing AI landscape, allowing designers to quickly test new and updated algorithms quickly. This delivers strong competitive advantages in speeding time to market and cost savings by not requiring the development and release of new hardware [7, 15].

FPGAs deliver a combination of speed, programmability and flexibility that translates into performance efficiencies by reducing the cost and complexities inherent in the development of application-specific integrated circuits (ASICs) [8].

Key advantages FPGAs deliver include:

Excellent performance with reduced latency advantages

: FPGAs provide low latency as well as deterministic latency (DL). A deterministic system will continuously produce the same output from an initial state or given starting condition, and therefore provides a known response time, which is critical for many applications with hard deadlines. This enables faster execution of real-time applications like speech recognition, video streaming and motion recognition [8, 15].

Cost effectiveness

: FPGAs can be reprogrammed after manufacturing for different data types and capabilities, delivering real value over having to replace the application with new hardware [8]. By integrating additional capabilities — like an image processing pipeline — onto the same chip, designers can reduce costs and save board space by using the FPGA for more than just AI. The long product lifecycle of FPGAs can deliver increased utility for an application that can be measured in years or even decades. This characteristic makes them ideal for use in industrial, aerospace, defence, medical and transportation markets.

Energy efficiency

: FPGAs give designers the ability to fine-tune the hardware to match application needs. Conventional processors, such as CPUs, consume a large amount of energy and cannot be customized to suit any one targeted application. GPUs are programmable but require a higher amount of energy. FPGAs offer a midway solution: high programmability and energy efficiency with acceptable throughput for the application.
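The deterministic latency described under the first advantage can be illustrated with a toy model of a fixed-depth hardware pipeline: whatever value enters, it emerges after exactly the same number of clock cycles. The shift-register behaviour and the chosen depth are assumptions for illustration only, not a description of any specific device.

```python
class FixedPipeline:
    """Toy model of a hardware pipeline whose latency is fixed by its depth,
    independent of the data flowing through it."""

    def __init__(self, depth):
        self.stages = [None] * depth

    def tick(self, value=None):
        """Advance one clock cycle, inserting `value` at the first stage and
        returning whatever reaches the last stage."""
        self.stages = [value] + self.stages[:-1]
        return self.stages[-1]


def latency_cycles(depth, value):
    """Count clock cycles until `value` emerges from the pipeline."""
    pipe = FixedPipeline(depth)
    cycles = 1
    out = pipe.tick(value)
    while out != value:
        out = pipe.tick()
        cycles += 1
    return cycles
```

A five-stage pipeline always shows a latency of five cycles regardless of the input value, which is the known response time that hard real-time applications depend on.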

The hardware that implements an AI-based solution is generally expected to have the following properties.

Execution of a huge number of calculations simultaneously rather than sequentially. Performing calculations with low-precision numbers, so that AI algorithms can be implemented effectively using a smaller number of transistors.
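One common reading of the low-precision point above is symmetric 8-bit quantization, as used by many AI accelerators: float weights are mapped to narrow integer codes plus a shared scale factor. This pure-Python sketch of that mapping is an illustration under those assumptions, not an implementation taken from the text.

```python
def quantize_int8(weights):
    """Map float weights onto signed 8-bit codes (symmetric quantization)."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # one shared scale for the whole tensor
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the 8-bit codes."""
    return [v * scale for v in q]
```

Each weight now occupies 8 bits instead of 32, at the cost of a rounding error bounded by half a quantization step, which is why low-precision arithmetic units can be so much smaller and cheaper in silicon.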

Accommodating the complete algorithm in a single AI chip to address the speed of memory access. Using good hardware description languages to efficiently convert AI computer code into executable files on an AI chip [2, 14].

Geometric flexibility, so that the same hardware is readily usable for a variety of jobs.

Considering the above constraints, it is evident that FPGAs can host multiple functions in parallel and can even assign parts of the chip to specific functions, which greatly enhances operational and energy efficiency. The unique architecture of FPGAs places small amounts of distributed memory into the fabric, bringing it closer to the processing. This reduces latency and, more importantly, can reduce power consumption compared to a GPU design. AI chips normally enhance speed and efficiency by packing in a large number of smaller transistors, which are faster and more energy efficient. For AI systems with complex algorithms, however, these features alone prove insufficient for the many identical, predictable, and independent calculations involved.
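The claim that distributed on-chip memory reduces latency can be made concrete with a back-of-the-envelope cost model. The cycle counts below (1 cycle for on-chip block RAM, 100 for external DRAM) are illustrative assumptions, not figures from the text.

```python
def total_access_cycles(n_reads, on_chip_fraction,
                        bram_cycles=1, dram_cycles=100):
    """Total memory-access cycles when a fraction of reads hits on-chip RAM.

    on_chip_fraction: share of reads served by fabric-local block RAM;
    the rest go to external DRAM at a much higher cycle cost.
    """
    on_chip = int(n_reads * on_chip_fraction)
    off_chip = n_reads - on_chip
    return on_chip * bram_cycles + off_chip * dram_cycles
```

Under these assumed costs, moving 90% of 1,000 reads into on-chip memory cuts the total from 100,000 cycles to 10,900, which is the effect the distributed-memory fabric of an FPGA is exploiting.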

d. ASICs

ASICs can be used for both training, which is the initial construction and refinement of an algorithm, and inference, which is applying the algorithm to real-world data. GPUs are best suited for training and FPGAs for inference; an ASIC can provide a versatile solution combining properties of the GPU [4] and the FPGA. ASICs can be customized in the following major forms:

Vision processing units (VPUs), image and vision processors, and coprocessors;

Tensor processing units (TPUs), such as the first TPU developed by Google for its machine learning framework, TensorFlow;

Neural compute units (NCUs), including those from ARM.

1.3.1 Comparing Hardware for Artificial Intelligence