ARTIFICIAL INTELLIGENCE HARDWARE DESIGN
Learn foundational and advanced topics in Neural Processing Unit design with real-world examples from leading voices in the field.
In Artificial Intelligence Hardware Design: Challenges and Solutions, distinguished researchers and authors Drs. Albert Chun Chen Liu and Oscar Ming Kin Law deliver a rigorous and practical treatment of the design of application-specific circuits and systems for accelerating neural network processing. Beginning with a discussion and explanation of neural networks and their development history, the book goes on to describe parallel architectures, streaming graphs for massively parallel computation, and convolution optimization. The authors illustrate in-memory computation through Georgia Tech's Neurocube and Stanford's Tetris accelerator built on the Hybrid Memory Cube, as well as near-memory architecture through the embedded eDRAM designs of the Institute of Computing Technology, Chinese Academy of Sciences, and other institutions. Readers will also find a discussion of 3D neural processing techniques that support multilayer neural networks, as well as information like:
* A thorough introduction to neural networks and their development history, as well as Convolutional Neural Network (CNN) models
* Explorations of various parallel architectures, including the Intel CPU, Nvidia GPU, Google TPU, and Microsoft NPU, emphasizing hardware and software integration for performance improvement
* Discussions of streaming graphs for massively parallel computation with the Blaize GSP and Graphcore IPU
* An examination of how to optimize convolution with the UCLA Deep Convolutional Neural Network accelerator's filter decomposition
Perfect for hardware and software engineers and firmware developers, Artificial Intelligence Hardware Design is an indispensable resource for anyone working with Neural Processing Units in either a hardware or software capacity.
Page count: 196
Year of publication: 2021
Cover
Series Page
Title Page
Copyright Page
Author Biographies
Preface
Acknowledgments
Table of Figures
1 Introduction
1.1 Development History
1.2 Neural Network Models
1.3 Neural Network Classification
1.4 Neural Network Framework
1.5 Neural Network Comparison
Exercise
References
2 Deep Learning
2.1 Neural Network Layer
2.2 Deep Learning Challenges
Exercise
References
3 Parallel Architecture
3.1 Intel Central Processing Unit (CPU)
3.2 NVIDIA Graphics Processing Unit (GPU)
3.3 NVIDIA Deep Learning Accelerator (NVDLA)
3.4 Google Tensor Processing Unit (TPU)
3.5 Microsoft Catapult Fabric Accelerator
Exercise
References
4 Streaming Graph Theory
4.1 Blaize Graph Streaming Processor
4.2 Graphcore Intelligence Processing Unit
Exercise
References
5 Convolution Optimization
5.1 Deep Convolutional Neural Network Accelerator
5.2 Eyeriss Accelerator
Exercise
References
6 In‐Memory Computation
6.1 Neurocube Architecture
6.2 Tetris Accelerator
6.3 NeuroStream Accelerator
Exercise
References
7 Near‐Memory Architecture
7.1 DaDianNao Supercomputer
7.2 Cnvlutin Accelerator
Exercise
References
8 Network Sparsity
8.1 Energy Efficient Inference Engine (EIE)
8.2 Cambricon‐X Accelerator
8.3 SCNN Accelerator
8.4 SeerNet Accelerator
Exercise
References
9 3D Neural Processing
9.1 3D Integrated Circuit Architecture
9.2 Power Distribution Network
9.3 3D Network Bridge
9.4 Power‐Saving Techniques
Exercise
References
Appendix A: Neural Network Topology
Index
End User License Agreement
Chapter 1
Table 1.1 Neural network framework.
Chapter 2
Table 2.1 AlexNet neural network model.
Chapter 3
Table 3.1 Intel Xeon family comparison.
Table 3.2 NVIDIA GPU architecture comparison.
Table 3.3 TPU v1 applications.
Table 3.4 Tensor processing unit comparison.
Chapter 5
Table 5.1 Efficiency loss comparison.
Table 5.2 DNN accelerator performance comparison.
Table 5.3 Eyeriss v2 architectural hierarchy.
Table 5.4 Eyeriss architecture.
Chapter 6
Table 6.1 Neurocube performance comparison.
Chapter 8
Table 8.1 SeerNet system performance comparison.
Chapter 1
Figure 1.1 High‐tech revolution.
Figure 1.2 Neural network development timeline.
Figure 1.3 ImageNet challenge.
Figure 1.4 Neural network model.
Figure 1.5 Regression.
Figure 1.6 Clustering.
Figure 1.7 Neural network top 1 accuracy vs. computational complexity.
Figure 1.8 Neural network top 1 accuracy density vs. model efficiency [14]....
Figure 1.9 Neural network memory utilization and computational complexity [1...
Chapter 2
Figure 2.1 Deep neural network AlexNet architecture [1].
Figure 2.2 Deep neural network AlexNet model parameters.
Figure 2.3 Deep neural network AlexNet feature map evolution [3].
Figure 2.4 Convolution function.
Figure 2.5 Nonlinear activation functions.
Figure 2.6 Pooling functions.
Figure 2.7 Dropout layer.
Figure 2.8 Deep learning hardware issues [1].
Chapter 3
Figure 3.1 Intel Xeon processor E5 2600 family Grantley platform ring architecture.
Figure 3.2 Intel Xeon Processor Scalable family Purley platform mesh architecture.
Figure 3.3 Two‐socket configuration.
Figure 3.4 Four‐socket ring configuration.
Figure 3.5 Four‐socket crossbar configuration.
Figure 3.6 Eight‐socket configuration.
Figure 3.7 Sub‐NUMA cluster domains [3].
Figure 3.8 Cache hierarchy comparison.
Figure 3.9 Intel multiple sockets parallel processing.
Figure 3.10 Intel multiple socket training performance comparison [4].
Figure 3.11 Intel AVX‐512 16 bits FMA operations (VPMADDWD + VPADDD).
Figure 3.12 Intel AVX‐512 with VNNI 16 bits FMA operation (VPDPWSSD).
Figure 3.13 Intel low‐precision convolution.
Figure 3.14 Intel Xeon processor training throughput comparison [2].
Figure 3.15 Intel Xeon processor inference throughput comparison [2].
Figure 3.16 NVIDIA Turing GPU architecture.
Figure 3.17 NVIDIA GPU shared memory.
Figure 3.18 Tensor core 4 × 4 × 4 matrix operation [9].
Figure 3.19 Turing tensor core performance [7].
Figure 3.20 Matrix D thread group indices.
Figure 3.21 Matrix D 4 × 8 elements computation.
Figure 3.22 Different size matrix multiplication.
Figure 3.23 Simultaneous multithreading (SMT).
Figure 3.24 Multithreading schedule.
Figure 3.25 GPU with HBM2 architecture.
Figure 3.26 Eight GPUs NVLink2 configuration.
Figure 3.27 Four GPUs NVLink2 configuration.
Figure 3.28 Two GPUs NVLink2 configuration.
Figure 3.29 Single GPU NVLink2 configuration.
Figure 3.30 NVDLA core architecture.
Figure 3.31 NVDLA small system model.
Figure 3.32 NVDLA large system model.
Figure 3.33 NVDLA software dataflow.
Figure 3.34 Tensor processing unit architecture.
Figure 3.35 Tensor processing unit floorplan.
Figure 3.36 Multiply–Accumulate (MAC) systolic array.
Figure 3.37 Systolic array matrix multiplication.
Figure 3.38 Cost of different numerical format operation.
Figure 3.39 TPU brain floating‐point format.
Figure 3.40 CPU, GPU, and TPU performance comparison [15].
Figure 3.41 Tensor Processing Unit (TPU) v1.
Figure 3.42 Tensor Processing Unit (TPU) v2.
Figure 3.43 Tensor Processing Unit (TPU) v3.
Figure 3.44 Google TensorFlow subgraph optimization.
Figure 3.45 Microsoft Brainwave configurable cloud architecture.
Figure 3.46 Torus network topology.
Figure 3.47 Microsoft Brainwave design flow.
Figure 3.48 The Catapult fabric shell architecture.
Figure 3.49 The Catapult fabric microarchitecture.
Figure 3.50 Microsoft low‐precision quantization [27].
Figure 3.51 Matrix‐vector multiplier overview.
Figure 3.52 Tile engine architecture.
Figure 3.53 Hierarchical decode and dispatch scheme.
Figure 3.54 Sparse matrix‐vector multiplier architecture.
Figure 3.55 (a) Sparse Matrix; (b) CSR Format; and (c) CISR Format.
Chapter 4
Figure 4.1 Data streaming TCS model.
Figure 4.2 Blaize depth‐first scheduling approach.
Figure 4.3 Blaize graph streaming processor architecture.
Figure 4.4 Blaize GSP thread scheduling.
Figure 4.5 Blaize GSP instruction scheduling.
Figure 4.6 Streaming vs. sequential processing comparison.
Figure 4.7 Blaize GSP convolution operation.
Figure 4.8 Intelligence processing unit architecture [8].
Figure 4.9 Intelligence processing unit mixed‐precision multiplication.
Figure 4.10 Intelligence processing unit single‐precision multiplication.
Figure 4.11 Intelligence processing unit interconnect architecture [9].
Figure 4.12 Intelligence processing unit bulk synchronous parallel model.
Figure 4.13 Intelligence processing unit bulk synchronous parallel execution...
Figure 4.14 Intelligence processing unit bulk synchronous parallel inter‐chi...
Chapter 5
Figure 5.1 Deep convolutional neural network hardware architecture.
Figure 5.2 Convolution computation.
Figure 5.3 Filter decomposition with zero padding.
Figure 5.4 Filter decomposition approach.
Figure 5.5 Data streaming architecture with the data flow.
Figure 5.6 DCNN accelerator COL buffer architecture.
Figure 5.7 Data streaming architecture with 1×1 convolution mode.
Figure 5.8 Max pooling architecture.
Figure 5.9 Convolution engine architecture.
Figure 5.10 Accumulation (ACCU) buffer architecture.
Figure 5.11 Neural network model compression.
Figure 5.12 Eyeriss system architecture.
Figure 5.13 2D convolution to 1D multiplication mapping.
Figure 5.14 2D convolution to 1D multiplication – step #1.
Figure 5.15 2D convolution to 1D multiplication – step #2.
Figure 5.16 2D convolution to 1D multiplication – step #3.
Figure 5.17 2D convolution to 1D multiplication – step #4.
Figure 5.18 Output stationary.
Figure 5.19 Output stationary index looping.
Figure 5.20 Weight stationary.
Figure 5.21 Weight stationary index looping.
Figure 5.22 Input stationary.
Figure 5.23 Input stationary index looping.
Figure 5.24 Eyeriss Row Stationary (RS) dataflow.
Figure 5.25 Filter reuse.
Figure 5.26 Feature map reuse.
Figure 5.27 Partial sum reuse.
Figure 5.28 Eyeriss run‐length compression.
Figure 5.29 Eyeriss processing element architecture.
Figure 5.30 Eyeriss global input network.
Figure 5.31 Eyeriss processing element mapping (AlexNet CONV1).
Figure 5.32 Eyeriss processing element mapping (AlexNet CONV2).
Figure 5.33 Eyeriss processing element mapping (AlexNet CONV3).
Figure 5.34 Eyeriss processing element mapping (AlexNet CONV4/CONV5).
Figure 5.35 Eyeriss processing element operation (AlexNet CONV1).
Figure 5.36 Eyeriss processing element operation (AlexNet CONV2).
Figure 5.37 Eyeriss processing element (AlexNet CONV3).
Figure 5.38 Eyeriss processing element operation (AlexNet CONV4/CONV5).
Figure 5.39 Eyeriss architecture comparison.
Figure 5.40 Eyeriss v2 system architecture.
Figure 5.41 Network‐on‐Chip configurations.
Figure 5.42 Mesh network configuration.
Figure 5.43 Eyeriss v2 hierarchical mesh network examples.
Figure 5.44 Eyeriss v2 input activation hierarchical mesh network.
Figure 5.45 Weights hierarchical mesh network.
Figure 5.46 Eyeriss v2 partial sum hierarchical mesh network.
Figure 5.47 Eyeriss v1 neural network model performance [6].
Figure 5.48 Eyeriss v2 neural network model performance [6].
Figure 5.49 Compressed sparse column format.
Figure 5.50 Eyeriss v2 PE architecture.
Figure 5.51 Eyeriss v2 row stationary plus dataflow.
Figure 5.52 Eyeriss architecture AlexNet throughput speedup [6].
Figure 5.53 Eyeriss architecture AlexNet energy efficiency [6].
Figure 5.54 Eyeriss architecture MobileNet throughput speedup [6].
Figure 5.55 Eyeriss architecture MobileNet energy efficiency [6].
Chapter 6
Figure 6.1 Neurocube architecture.
Figure 6.2 Neurocube organization.
Figure 6.3 Neurocube 2D mesh network.
Figure 6.4 Memory‐centric neural computing flow.
Figure 6.5 Programmable neurosequence generator architecture.
Figure 6.6 Neurocube programmable neurosequence generator.
Figure 6.7 Tetris system architecture.
Figure 6.8 Tetris neural network engine.
Figure 6.9 In‐memory accumulation.
Figure 6.10 Global buffer bypass.
Figure 6.11 NN partitioning scheme comparison.
Figure 6.12 Tetris performance and power comparison [7].
Figure 6.13 NeuroStream and NeuroCluster architecture.
Figure 6.14 NeuroStream coprocessor architecture.
Figure 6.15 NeuroStream 4D tiling.
Figure 6.16 NeuroStream roofline plot [8].
Chapter 7
Figure 7.1 DaDianNao system architecture.
Figure 7.2 DaDianNao neural functional unit architecture.
Figure 7.3 DaDianNao pipeline configuration.
Figure 7.4 DaDianNao multi‐node mapping.
Figure 7.5 DaDianNao timing performance (Training) [1].
Figure 7.6 DaDianNao timing performance (Inference) [1].
Figure 7.7 DaDianNao power reduction (Training) [1].
Figure 7.8 DaDianNao power reduction (Inference) [1].
Figure 7.9 DaDianNao basic operation.
Figure 7.10 Cnvlutin basic operation.
Figure 7.11 DaDianNao architecture.
Figure 7.12 Cnvlutin architecture.
Figure 7.13 DaDianNao processing order.
Figure 7.14 Cnvlutin processing order.
Figure 7.15 Cnvlutin zero free neuron array format.
Figure 7.16 Cnvlutin dispatch.
Figure 7.17 Cnvlutin timing comparison [4].
Figure 7.18 Cnvlutin power comparison [4].
Figure 7.19 Cnvlutin2 ineffectual activation skipping.
Figure 7.20 Cnvlutin2 ineffectual weight skipping.
Chapter 8
Figure 8.1 EIE leading nonzero detection network.
Figure 8.2 EIE processing element architecture.
Figure 8.3 Deep compression weight sharing and quantization.
Figure 8.4 Matrix W and vectors a and b are interleaved over four processing elements.
Figure 8.5 Matrix W layout in compressed sparse column format.
Figure 8.6 EIE timing performance comparison [1].
Figure 8.7 EIE energy efficient comparison [1].
Figure 8.8 Cambricon‐X architecture.
Figure 8.9 Cambricon‐X processing element architecture.
Figure 8.10 Cambricon‐X sparse compression.
Figure 8.11 Cambricon‐X buffer controller architecture.
Figure 8.12 Cambricon‐X index module architecture.
Figure 8.13 Cambricon‐X direct indexing architecture.
Figure 8.14 Cambricon‐X step indexing architecture.
Figure 8.15 Cambricon‐X timing performance comparison [4].
Figure 8.16 Cambricon‐X energy efficiency comparison [4].
Figure 8.17 SCNN convolution.
Figure 8.18 SCNN convolution nested loop.
Figure 8.19 PT‐IS‐CP‐dense dataflow.
Figure 8.20 SCNN architecture.
Figure 8.21 SCNN dataflow.
Figure 8.22 SCNN weight compression.
Figure 8.23 SCNN timing performance comparison [5].
Figure 8.24 SCNN energy efficiency comparison [5].
Figure 8.25 SeerNet architecture.
Figure 8.26 SeerNet Q‐ReLU and Q‐max‐pooling.
Figure 8.27 SeerNet quantization.
Figure 8.28 SeerNet sparsity‐mask encoding.
Chapter 9
Figure 9.1 2.5D interposer architecture.
Figure 9.2 3D stacked architecture.
Figure 9.3 3D‐IC PDN configuration (pyramid shape).
Figure 9.4 Conventional PDN Manhattan geometry.
Figure 9.5 Novel PDN X topology.
Figure 9.6 3D network bridge.
Figure 9.7 Neural network layer multiple nodes connection.
Figure 9.8 3D network switch.
Figure 9.9 3D network bridge segmentation.
Figure 9.10 Multiple‐channel bidirectional high‐speed link.
Figure 9.11 Power switch configuration.
Figure 9.12 3D neural processing power gating approach.
Figure 9.13 3D neural processing clock gating approach.
IEEE Press, 445 Hoes Lane, Piscataway, NJ 08854
IEEE Press Editorial Board
Ekram Hossain, Editor in Chief
Jón Atli Benediktsson
Xiaoou Li
Jeffrey Reed
Anjan Bose
Lian Yong
Diomidis Spinellis
David Alan Grier
Andreas Molisch
Saeid Nahavandi
Elya B. Joffe
Sarah Spurgeon
Ahmet Murat Tekalp
Albert Chun Chen Liu and Oscar Ming Kin Law
Kneron Inc., San Diego, CA, USA
Copyright © 2021 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750‐8400, fax (978) 750‐4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762‐2974, outside the United States at (317) 572‐3993 or fax (317) 572‐4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging‐in‐Publication data applied for:
ISBN: 9781119810452
Cover design by Wiley
Cover image: © Rasi Bhadramani/iStock/Getty Images
Albert Chun Chen Liu is Kneron's founder and CEO. He is an Adjunct Associate Professor at National Tsing Hua University, National Chiao Tung University, and National Cheng Kung University. After graduating from National Cheng Kung University in Taiwan, he received scholarships from Raytheon and the University of California to join the UC Berkeley/UCLA/UCSD research programs, and he earned his Ph.D. in Electrical Engineering from the University of California, Los Angeles (UCLA). Before establishing Kneron in San Diego in 2015, he held R&D and management positions at Qualcomm, Samsung Electronics R&D Center, MStar, and Wireless Information.
Albert has been invited to lecture on computer vision technology and artificial intelligence at the University of California and to serve as a technical reviewer for many internationally renowned academic journals. He holds more than 30 international patents in artificial intelligence, computer vision, and image processing, and he has published more than 70 papers. He is a recipient of the 2007 IBM Problem Solving Award, based on the use of the EIP tool suite, and the 2021 IEEE TCAS Darlington Award.
Oscar Ming Kin Law developed his interest in smart robots in 2014. He has successfully integrated deep learning into self-driving cars, smart drones, and robotic arms, and he is currently working on humanoid development. He received a Ph.D. in Electrical and Computer Engineering from the University of Toronto, Canada.
Oscar currently works at Kneron on in-memory computing and smart robot development. He has worked at ATI Technologies, AMD, TSMC, and Qualcomm, where he led various groups for chip verification, standard cell design, signal integrity, power analysis, and Design for Manufacturability (DFM). He has conducted seminars at the University of California, San Diego, the University of Toronto, Qualcomm, and TSMC, and he holds over 60 patents in various areas.
With the breakthrough of the Convolutional Neural Network (CNN) in image classification in 2012, Deep Learning (DL) has successfully solved many complex problems and is widely used in everyday life, including in automotive, finance, retail, and healthcare applications. In 2016, Artificial Intelligence (AI) surpassed human performance when Google's AlphaGo won the Go world championship through Reinforcement Learning (RL). The AI revolution is gradually changing our world, much as the personal computer (1977), the Internet (1994), and the smartphone (2007) did. However, most of the effort has focused on software development rather than on the hardware challenges:
Big input data
Deep neural network
Massive parallel processing
Reconfigurable network
Memory bottleneck
Intensive computation
Network pruning
Data sparsity
This book shows how to resolve these hardware problems through various designs, ranging from the CPU, GPU, and TPU to the NPU. Novel hardware can evolve from these designs for further performance and power improvement:
Parallel architecture
Streaming Graph Theory
Convolution optimization
In‐memory computation
Near‐memory architecture
Network sparsity
3D neural processing
Organization of the Book
Chapter 1 introduces neural networks and discusses their development history.
Chapter 2 reviews the Convolutional Neural Network (CNN) model and describes the function of each layer with examples.
Chapter 3 surveys several parallel architectures: the Intel CPU, Nvidia GPU, Google TPU, and Microsoft NPU. It emphasizes hardware/software integration for performance improvement. The Nvidia Deep Learning Accelerator (NVDLA) open-source project is chosen for FPGA hardware implementation.
Chapter 4 introduces streaming graphs for massively parallel computation through the Blaize GSP and Graphcore IPU. They apply Depth-First Search (DFS) for task allocation and the Bulk Synchronous Parallel (BSP) model for parallel operations.
Chapter 5 shows how to optimize convolution with the University of California, Los Angeles (UCLA) Deep Convolutional Neural Network (DCNN) accelerator's filter decomposition and the Massachusetts Institute of Technology (MIT) Eyeriss accelerator's Row Stationary dataflow.
Chapter 6 illustrates in-memory computation through the Georgia Institute of Technology Neurocube and the Stanford Tetris accelerator, both using the Hybrid Memory Cube (HMC), as well as the University of Bologna NeuroStream accelerator using Smart Memory Cubes (SMC).
Chapter 7 highlights near-memory architecture through the DaDianNao supercomputer from the Institute of Computing Technology (ICT), Chinese Academy of Sciences, and the University of Toronto Cnvlutin accelerator. It also shows how Cnvlutin avoids ineffectual zero operations.
Chapter 8 chooses the Stanford Energy Efficient Inference Engine, the Cambricon-X accelerator from the Institute of Computing Technology (ICT), Chinese Academy of Sciences, the Massachusetts Institute of Technology (MIT) SCNN processor, and the Microsoft SeerNet accelerator to handle network sparsity.
Chapter 9 introduces an innovative 3D neural processing approach with a network bridge to overcome power and thermal challenges. It also resolves the memory bottleneck and supports large neural network processing.
In the English edition, several chapters have been rewritten with more detailed descriptions, and new deep learning hardware architectures have been added. The exercises challenge readers to solve problems beyond the scope of this book. Instructional slides are available upon request.
We shall continue to explore different deep learning hardware architectures (e.g., for Reinforcement Learning) and to work on an in-memory computing architecture with a new high-speed arithmetic approach. Compared with the Google Brain floating-point (bfloat16) format, the new approach offers a wider dynamic range, higher performance, and lower power dissipation. It will be included in a future revision.
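For readers unfamiliar with the bfloat16 layout referenced above (1 sign bit, 8 exponent bits, 7 mantissa bits), the short Python sketch below is our own illustration rather than the authors' new arithmetic approach: it truncates a float32 to bfloat16, which shows why the format preserves float32's dynamic range while giving up mantissa precision. Real hardware typically rounds to nearest even rather than truncating.

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Truncate an IEEE-754 float32 to bfloat16 bits (1 sign, 8 exponent, 7 mantissa bits)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw float32 bit pattern
    return bits >> 16                                    # drop the low 16 mantissa bits

def bf16_bits_to_fp32(b: int) -> float:
    """Expand bfloat16 bits back to float32 by zero-filling the dropped mantissa bits."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

x = 3.14159
b = fp32_to_bf16_bits(x)
# The exponent field is untouched, so the representable range matches float32;
# only the mantissa precision is reduced (3.14159 -> 3.140625 here).
print(f"bf16 bits: 0x{b:04x}, value: {bf16_bits_to_fp32(b)}")
```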
Albert Chun Chen Liu
Oscar Ming Kin Law
First, we would like to thank all who have supported the publication of the book. We are thankful to Iain Law and Enoch Law for the manuscript preparation and project development. We would like to thank Lincoln Lee and Amelia Leung for reviewing the content. We also thank Claire Chang, Charlene Jin, and Alex Liao for managing the book production and publication. In addition, we are grateful to the readers of the Chinese edition for their valuable feedback on improving the content of this book. Finally, we would like to thank our families for their support throughout the publication of this book.
Albert Chun Chen Liu
Oscar Ming Kin Law
1.1 High‐tech revolution
1.2 Neural network development timeline
1.3 ImageNet challenge
1.4 Neural network model
1.5 Regression
1.6 Clustering
1.7 Neural network top 1 accuracy vs. computational complexity
1.8 Neural network top 1 accuracy density vs. model efficiency [14]