An important working resource for engineers and researchers involved in the design, development, and implementation of signal processing systems
The last decade has seen a rapid expansion of the use of field programmable gate arrays (FPGAs) for a wide range of applications beyond traditional digital signal processing (DSP) systems. Written by a team of experts working at the leading edge of FPGA research and development, this second edition of FPGA-based Implementation of Signal Processing Systems has been extensively updated and revised to reflect the latest iterations of FPGA theory, applications, and technology. Written from a system-level perspective, it features expert discussions of contemporary methods and tools used in the design, optimization and implementation of DSP systems using programmable FPGA hardware. And it provides a wealth of practical insights—along with illustrative case studies and timely real-world examples—of critical concern to engineers working in the design and development of DSP systems for radio, telecommunications, audio-visual, and security applications, as well as bioinformatics, Big Data applications, and more. Inside you will find up-to-date coverage of:
FPGA-based Implementation of Signal Processing Systems, 2nd Edition is an indispensable guide for engineers and researchers involved in the design and development of both traditional and cutting-edge data and signal processing systems. Senior-level electrical and computer engineering graduates studying signal processing or digital signal processing also will find this volume of great interest.
Page count: 610
Publication year: 2017
Roger Woods
Queen’s University, Belfast, UK
John McAllister
Queen’s University, Belfast, UK
Gaye Lightbody
University of Ulster, UK
Ying Yi
SN Systems — Sony Interactive Entertainment, UK
This edition first published 2017 © 2017 John Wiley & Sons, Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Roger Woods, John McAllister, Gaye Lightbody and Ying Yi to be identified as the authors of this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons, Ltd., The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data
Names: Woods, Roger, 1963- author. | McAllister, John, 1979- author. | Lightbody, Gaye, author. | Yi, Ying (Electrical engineer), author.
Title: FPGA-based implementation of signal processing systems / Roger Woods, John McAllister, Gaye Lightbody, Ying Yi.
Description: Second edition. | Hoboken, NJ : John Wiley & Sons Inc., 2017. | Revised edition of: FPGA-based implementation of signal processing systems / Roger Woods … [et al.]. 2008. | Includes bibliographical references and index.
Identifiers: LCCN 2016051193 | ISBN 9781119077954 (cloth) | ISBN 9781119077978 (epdf) | ISBN 9781119077961 (epub)
Subjects: LCSH: Signal processing--Digital techniques. | Digital integrated circuits. | Field programmable gate arrays.
Classification: LCC TK5102.5 .F647 2017 | DDC 621.382/2--dc23
LC record available at https://lccn.loc.gov/2016051193
Cover Design: Wiley Cover Image: © filo/Gettyimages; (Graph) Courtesy of the authors
The book is dedicated by the main author to his wife, Pauline, for all her support and care, particularly over the past two years.
The support from staff from the Royal Victoria Hospital and Musgrave Park Hospital is greatly appreciated.
Preface
List of Abbreviations
1 Introduction to Field Programmable Gate Arrays
1.1 Introduction
1.2 Field Programmable Gate Arrays
1.3 Influence of Programmability
1.4 Challenges of FPGAs
Bibliography
2 DSP Basics
2.1 Introduction
2.2 Definition of DSP Systems
2.3 DSP Transformations
2.4 Filters
2.5 Adaptive Filtering
2.6 Final Comments
Bibliography
3 Arithmetic Basics
3.1 Introduction
3.2 Number Representations
3.3 Arithmetic Operations
3.4 Alternative Number Representations
3.5 Division
3.6 Square Root
3.7 Fixed-Point versus Floating-Point
3.8 Conclusions
Bibliography
4 Technology Review
4.1 Introduction
4.2 Implications of Technology Scaling
4.3 Architecture and Programmability
4.4 DSP Functionality Characteristics
4.5 Microprocessors
4.6 DSP Processors
4.7 Graphical Processing Units
4.8 System-on-Chip Solutions
4.9 Heterogeneous Computing Platforms
4.10 Conclusions
Bibliography
5 Current FPGA Technologies
5.1 Introduction
5.2 Toward FPGAs
5.3 Altera Stratix® V and 10 FPGA Family
5.4 Xilinx UltraScale™/Virtex-7 FPGA Families
5.5 Xilinx Zynq FPGA Family
5.6 Lattice iCE40isp FPGA Family
5.7 MicroSemi RTG4 FPGA Family
5.8 Design Strategies for FPGA-based DSP Systems
5.9 Conclusions
Bibliography
6 Detailed FPGA Implementation Techniques
6.1 Introduction
6.2 FPGA Functionality
6.3 Mapping to LUT-Based FPGA Technology
6.4 Fixed-Coefficient DSP
6.5 Distributed Arithmetic
6.6 Reduced-Coefficient Multiplier
6.7 Conclusions
Bibliography
7 Synthesis Tools for FPGAs
7.1 Introduction
7.2 High-Level Synthesis
7.3 Xilinx Vivado
7.4 Control Logic Extraction Phase Example
7.5 Altera SDK for OpenCL
7.6 Other HLS Tools
7.7 Conclusions
Bibliography
8 Architecture Derivation for FPGA-based DSP Systems
8.1 Introduction
8.2 DSP Algorithm Characteristics
8.3 DSP Algorithm Representations
8.4 Pipelining DSP Systems
8.5 Parallel Operation
8.6 Conclusions
Bibliography
9 Complex DSP Core Design for FPGA
9.1 Introduction
9.2 Motivation for Design for Reuse
9.3 Intellectual Property Cores
9.4 Evolution of IP cores
9.5 Parameterizable (Soft) IP Cores
9.6 IP Core Integration
9.7 Current FPGA-based IP cores
9.8 Watermarking IP
9.9 Summary
Bibliography
10 Advanced Model-Based FPGA Accelerator Design
10.1 Introduction
10.2 Dataflow Modeling of DSP Systems
10.3 Architectural Synthesis of Custom Circuit Accelerators from DFGs
10.4 Model-Based Development of Multi-Channel Dataflow Accelerators
10.5 Model-Based Development for Memory-Intensive Accelerators
10.6 Summary
Notes
Bibliography
11 Adaptive Beamformer Example
11.1 Introduction to Adaptive Beamforming
11.2 Generic Design Process
11.3 Algorithm to Architecture
11.4 Efficient Architecture Design
11.5 Generic QR Architecture
11.6 Retiming the Generic Architecture
11.7 Parameterizable QR Architecture
11.8 Generic Control
11.9 Beamformer Design Example
11.10 Summary
Bibliography
12 FPGA Solutions for Big Data Applications
12.1 Introduction
12.2 Big Data
12.3 Big Data Analytics
12.4 Acceleration
12.5 k-Means Clustering FPGA Implementation
12.6 FPGA-Based Soft Processors
12.7 System Hardware
12.8 Conclusions
Bibliography
13 Low-Power FPGA Implementation
13.1 Introduction
13.2 Sources of Power Consumption
13.3 FPGA Power Consumption
13.4 Power Consumption Reduction Techniques
13.5 Dynamic Voltage Scaling in FPGAs
13.6 Reduction in Switched Capacitance
13.7 Final Comments
Bibliography
14 Conclusions
14.1 Introduction
14.2 Evolution in FPGA Design Approaches
14.3 Big Data and the Shift toward Computing
14.4 Programming Flow for FPGAs
14.5 Support for Floating-Point Arithmetic
14.6 Memory Architectures
Bibliography
Index
EULA
Digital signal processing (DSP) is the cornerstone of many products and services in the digital age. It is used in applications such as high-definition TV, mobile telephony, digital audio, multimedia, digital cameras, radar, sonar detectors, biomedical imaging, global positioning, digital radio and speech recognition, to name but a few! The evolution of DSP solutions has been driven by application requirements which, in turn, have only been possible to realize because of developments in silicon chip technology. Currently, a mix of programmable and dedicated system-on-chip (SoC) solutions is required for these applications, and this has therefore been a highly active area of research and development over the past four decades.
The result has been the emergence of numerous technologies for DSP implementation, ranging from simple microcontrollers right through to dedicated SoC solutions which form the basis of high-volume products such as smartphones. With the architectural developments that have occurred in field programmable gate arrays (FPGAs) over the years, it is clear that they should be considered as a viable DSP technology. Indeed, developments made by FPGA vendors would support this view of their technology. Strong commercial pressures are driving the adoption of FPGA technology across a range of applications.
The increasing costs of developing silicon technology implementations have put considerable pressure on the ability to create dedicated SoC systems. In the mobile phone market, volumes are such that dedicated SoC systems are required to meet stringent energy requirements, so application-specific solutions have emerged which vary in their degree of programmability, energy requirements and cost. The need to balance these requirements suggests that many of these technologies will coexist in the immediate future, and indeed many hybrid technologies are starting to emerge. This, of course, creates a considerable interest in using technology that is programmable as this acts to considerably reduce risks in developing new technologies.
Commonly used DSP technologies encompass software programmable solutions such as microcontrollers and DSP microprocessors. With the inclusion of dedicated DSP processing engines, FPGAs have now emerged as a strong DSP technology. Their key advantage is that they enable users to create system architectures in which the resources are best matched to the system’s processing needs. Whilst their memory resources are limited, they offer a very high-bandwidth, on-chip memory capability. Whilst the prefabricated aspect of FPGAs avoids many of the deep problems met when developing SoC implementations, the creation of an efficient implementation from a DSP system description remains a highly convoluted problem which is a core theme of this book.
The book looks to address FPGA-based DSP systems, considering implementation at numerous levels.
Circuit-level optimization techniques that allow the underlying FPGA fabric to be used more intelligently are reviewed first. By considering the detailed underlying FPGA platform, it is shown how system requirements can be mapped to provide an area-efficient, faster implementation. This is demonstrated for a number of DSP transforms and fixed-coefficient filtering.
Architectural solutions can be created from a signal flow graph (SFG) representation. In effect, this requires the user to exploit the highly regular, highly computative, data-independent nature of DSP systems to produce highly parallel, pipelined FPGA-based circuit architectures. This is demonstrated for filtering and beamforming applications.
System solutions are now a challenge as FPGAs have become a heterogeneous platform involving multiple hardware and software components and interconnection fabrics. There is a need for a higher-level system modeling language, e.g. dataflow, which not only facilitates architectural optimizations but also addresses system-level considerations such as interconnection and memory.
The book covers these areas of FPGA implementation, but its key differentiating factor is that it concentrates on the second and third areas listed above, namely the creation of circuit architectures and system-level modeling; this is because circuit-level optimization techniques have been covered in greater detail elsewhere. The work is backed up with the authors’ experiences in implementing practical, real DSP systems and covers numerous examples including an adaptive beamformer based on a QR-based recursive least squares (RLS) filter, finite impulse response (FIR) and infinite impulse response (IIR) filters, a full search motion estimation design and a fast Fourier transform (FFT) system for electronic support measures. The book also considers the development of intellectual property (IP) cores as this has become a critical aspect in the creation of DSP systems. One chapter is given over to describing the creation of such IP cores and another to the creation of an adaptive filtering core.
The book is aimed at working engineers who are interested in using FPGA technology efficiently in signal and data processing applications. The earlier chapters will be of interest to graduates and students completing their studies, taking the readers through a number of simple examples that show the trade-off when mapping DSP systems into FPGA hardware. The middle part of the book contains a number of illustrative, complex DSP system examples that have been implemented using FPGAs and whose performance clearly illustrates the benefit of their use. They provide insights into how to best use the complex FPGA technology to produce solutions optimized for speed, area and power which the authors believe is missing from current literature. The book summarizes over 30 years of learned experience of implementing complex DSP systems undertaken in many cases with commercial partners.
The second edition has been updated and improved in a number of ways. It has been updated to reflect technology evolutions in FPGA technology, to acknowledge developments in programming and synthesis tools, to reflect on algorithms for Big Data applications, and to include improvements to some background chapters. The text has also been updated using relevant examples where appropriate.
Technology update: As FPGAs are linked to silicon technology advances, their architecture continually changes, and this is reflected in Chapter 5. A major change is the inclusion of the ARM® processor core resulting in a shift for FPGAs to a heterogeneous computing platform. Moreover, the increased use of graphical processing units (GPUs) in DSP systems is reflected in Chapter 4.
Programming tools update: Since the first edition was published, there have been a number of innovations in tool developments, particularly in the creation of commercial C-based high-level synthesis (HLS) and open computing language (OpenCL) tools. The material in Chapter 7 has been updated to reflect these changes, and Chapter 10 has been changed to reflect the changes in model-based synthesis tools.
“Big Data” processing: DSP involves processing of data content such as audio, speech, music and video information, but there is now great interest in collating huge data sets from on-line facilities and processing them quickly. As FPGAs have started to gain some traction in this area, a new chapter, Chapter 12, has been added to reflect this development.
The FPGA is a heterogeneous platform comprising complex resources such as hard and soft processors, dedicated blocks optimized for processing DSP functions and processing elements connected by both programmable and fast, dedicated interconnections. The book focuses on the challenges of implementing DSP systems on such platforms with a concentration on the high-level mapping of DSP algorithms into suitable circuit architectures.
The material is organized into three main sections.
Chapter 2 starts with a DSP primer, covering both FIR and IIR filtering, transforms including the FFT and discrete cosine transform (DCT) and concluding with adaptive filtering algorithms, covering both the least mean squares (LMS) and RLS algorithms. Chapter 3 is dedicated to computer arithmetic and covers number systems, arithmetic functions and alternative number representations such as logarithmic number representations (LNS) and coordinate rotation digital computer (CORDIC). Chapter 4 covers the technologies available to implement DSP algorithms and includes microprocessors, DSP microprocessors, GPUs and SoC architectures, including systolic arrays. In Chapter 5, a detailed description of commercial FPGAs is given with a concentration on the two main vendors, namely Xilinx and Altera, specifically their UltraScale™/Zynq® and Stratix® 10 FPGA families respectively, but also covering technology offerings from Lattice and MicroSemi.
This section covers efficient implementation from circuit architecture onto specific FPGA families; creation of circuit architecture from SFG representations; and system-level specification and implementation methodologies from high-level representations. Chapter 6 only briefly covers the efficient implementation of FPGA designs from circuit architecture descriptions, as many of these approaches have already been published; the text covers distributed arithmetic and reduced coefficient multiplier approaches and shows how these have been applied to fixed-coefficient filters and DSP transforms. Chapter 7 covers HLS for FPGA design including new sections to reflect Xilinx’s Vivado HLS tool flow and also Altera’s OpenCL approach. The process of mapping SFG representations of DSP algorithms onto circuit architectures (the starting point in Chapter 6) is then described in Chapter 8. It shows how dataflow graph (DFG) descriptions can be transformed for varying levels of parallelism and pipelining to create circuit architectures which best match the application requirements, backed up with simple FIR and IIR filtering examples.
One of the ways to perform system design is to create predefined designs termed IP cores, which will typically have been optimized using the techniques outlined in Chapter 8. The creation of such IP cores is outlined in Chapter 9 and addresses a key aspect of design productivity by encouraging “design for reuse.” Chapter 10 considers model-based design for heterogeneous FPGAs and focuses on dataflow modeling as a suitable design approach for FPGA-based DSP systems. The chapter outlines how it is possible to include pipelined IP cores via the white box concept using two examples, namely a normalized lattice filter (NLF) and a fixed beamformer example.
The final section of the book, consisting of Chapters 11–13, covers the application of the techniques. Chapter 11 looks at the creation of a soft, highly parameterizable core for RLS filtering, showing how a generic architecture can be created to allow a range of designs to be synthesized with varying performance. Chapter 12 illustrates how FPGAs can be applied to Big Data applications where the challenge is to accelerate some complex processing algorithms. Increasingly FPGAs are seen as a low-power solution, and FPGA power consumption is discussed in Chapter 13. The chapter starts with a discussion on power consumption, highlights the importance of dynamic and static power consumption, and then describes some techniques to reduce power consumption.
The authors have been fortunate to receive valuable help, support and suggestions from numerous colleagues, students and friends, including: Michaela Blott, Ivo Bolsens, Gordon Brebner, Bill Carter, Joe Cavallaro, Peter Cheung, John Gray, Wayne Luk, Bob Madahar, Alan Marshall, Paul McCambridge, Satnam Singh, Steve Trimberger and Richard Walke.
The authors’ research has been funded from a number of sources, including the Engineering and Physical Sciences Research Council, Xilinx, Ministry of Defence, Qinetiq, BAE Systems, Selex and Department of Employment and Learning for Northern Ireland.
Several chapters are based on joint work that was carried out with the following colleagues and students: Moslem Amiri, Burak Bardak, Kevin Colgan, Tim Courtney, Scott Fischaber, Jonathan Francey, Tim Harriss, Jean-Paul Heron, Colm Kelly, Bob Madahar, Eoin Malins, Stephen McKeown, Karen Rafferty, Darren Reilly, Lok-Kee Ting, David Trainor, Richard Turner, Fahad M Siddiqui and Richard Walke.
The authors thank Ella Mitchell and Nithya Sechin of John Wiley & Sons and Alex Jackson and Clive Lawson for their personal interest and help and motivation in preparing and assisting in the production of this work.
1D
One-dimensional
2D
Two-dimensional
ABR
Auditory brainstem response
ACC
Accumulator
ADC
Analogue-to-digital converter
AES
Advanced encryption standard
ALM
Adaptive logic module
ALU
Arithmetic logic unit
ALUT
Adaptive lookup table
AMD
Advanced Micro Devices
ANN
Artificial neural network
AoC
Analytics-on-chip
API
Application program interface
APU
Application processing unit
ARM
Advanced RISC machine
ASIC
Application-specific integrated circuit
ASIP
Application-specific instruction processor
AVS
Adaptive voltage scaling
BC
Boundary cell
BCD
Binary coded decimal
BCLA
Block CLA with intra-group carry ripple
BRAM
Block random access memory
CAPI
Coherent accelerator processor interface
CB
Current block
CCW
Control and communications wrapper
CE
Clock enable
CISC
Complex instruction set computer
CLA
Carry lookahead adder
CLB
Configurable logic block
CNN
Convolutional neural network
CMOS
Complementary metal oxide semiconductor
CORDIC
Coordinate rotation digital computer
CPA
Carry propagation adder
CPU
Central processing unit
CSA
Conditional sum adder
CSDF
Cyclo-static dataflow
CWT
Continuous wavelet transform
DA
Distributed arithmetic
DCT
Discrete cosine transform
DDR
Double data rate
DES
Data Encryption Standard
DFA
Dataflow accelerator
DFG
Dataflow graph
DFT
Discrete Fourier transform
DG
Dependence graph
disRAM
Distributed random access memory
DM
Data memory
DPN
Dataflow process network
DRx
Digital receiver
DSP
Digital signal processing
DST
Discrete sine transform
DTC
Decision tree classification
DVS
Dynamic voltage scaling
DWT
Discrete wavelet transform
E²PROM
Electrically erasable programmable read-only memory
EBR
Embedded Block RAM
ECC
Error correction code
EEG
Electroencephalogram
EPROM
Electrically programmable read-only memory
E-SGR
Enhanced Squared Givens rotation algorithm
EW
Electronic warfare
FBF
Fixed beamformer
FCCM
FPGA-based custom computing machine
FE
Functional engine
FEC
Forward error correction
FFE
Free-form expression
FFT
Fast Fourier transform
FIFO
First-in, first-out
FIR
Finite impulse response
FPGA
Field programmable gate array
FPL
Field programmable logic
FPU
Floating-point unit
FSM
Finite state machine
FSME
Full search motion estimation
GFLOPS
Giga floating-point operations per second
GMAC
Giga multiply-accumulates
GMACS
Giga multiply-accumulate per second
GOPS
Giga operations per second
GPGPU
General-purpose graphical processing unit
GPU
Graphical processing unit
GRNN
General regression neural network
GSPS
Gigasamples per second
HAL
Hardware abstraction layer
HDL
Hardware description language
HKMG
High-K metal gate
HLS
High-level synthesis
I2C
Inter-Integrated circuit
I/O
Input/output
IC
Internal cell
ID
Instruction decode
IDE
Integrated design environment
IDFT
Inverse discrete Fourier transform
IEEE
Institute of Electrical and Electronic Engineers
IF
Instruction fetch
IFD
Instruction fetch and decode
IFFT
Inverse fast Fourier transform
IIR
Infinite impulse response
IM
Instruction memory
IoT
Internet of things
IP
Intellectual property
IR
Instruction register
ITRS
International Technology Roadmap for Semiconductors
JPEG
Joint Photographic Experts Group
KCM
Constant-coefficient multiplication
KM
Kernel memory
KPN
Kahn process network
LAB
Logic array blocks
LDCM
Logic delay measurement circuit
LDPC
Low-density parity-check
LLVM
Low-level virtual machine
LMS
Least mean squares
LNS
Logarithmic number representations
LPDDR
Low-power double data rate
LS
Least squares
lsb
Least significant bit
LTI
Linear time-invariant
LUT
Lookup table
MA
Memory access
MAC
Multiply-accumulate
MAD
Minimum absolute difference
MADF
Multidimensional arrayed dataflow
MD
Multiplicand
ME
Motion estimation
MIL-STD
Military standard
MIMD
Multiple instruction, multiple data
MISD
Multiple instruction, single data
MLAB
Memory LAB
MMU
Memory management unit
MoC
Model of computation
MPE
Media processing engine
MPEG
Motion Picture Experts Group
MPSoC
Multi-processing SoC
MR
Multiplier
MR-DFG
Multi-rate dataflow graph
msb
Most significant bit
msd
Most significant digit
MSDF
Multidimensional synchronous dataflow
MSI
Medium-scale integration
MSPS
Megasamples per second
NaN
Not a Number
NLF
Normalized lattice filter
NRE
Non-recurring engineering
OCM
On-chip memory
OFDM
Orthogonal frequency division multiplexing
OFDMA
Orthogonal frequency division multiple access
OLAP
On-line analytical processing
OpenCL
Open computing language
OpenMP
Open multi-processing
ORCC
Open RVC-CAL Compiler
PAL
Programmable Array Logic
PB
Parameter bank
PC
Program counter
PCB
Printed circuit board
PCI
Peripheral component interconnect
PD
Pattern detect
PE
Processing element
PL
Programmable logic
PLB
Programmable logic block
PLD
Programmable logic device
PLL
Phase locked loop
PPT
Programmable power technology
PS
Processing system
QAM
Quadrature amplitude modulation
QR-RLS
QR recursive least squares
RAM
Random access memory
RAN
Radio access network
RCLA
Block CLA with inter-block ripple
RCM
Reduced coefficient multiplier
RF
Register file
RISC
Reduced instruction set computer
RLS
Recursive least squares
RNS
Residue number representations
ROM
Read-only memory
RT
Radiation tolerant
RTL
Register transfer level
RVC
Reconfigurable video coding
SBNR
Signed binary number representation
SCU
Snoop control unit
SD
Signed digits
SDF
Synchronous dataflow
SDK
Software development kit
SDNR
Signed digit number representation
SDP
Simple dual-port
SERDES
Serializer/deserializer
SEU
Single event upset
SFG
Signal flow graph
SGR
Squared Givens rotation
SIMD
Single instruction, multiple data
SISD
Single instruction, single data
SMP
Shared-memory multi-processors
SNR
Signal-to-noise ratio
SoC
System-on-chip
SOCMINT
Social media intelligence
SoPC
System on programmable chip
SPI
Serial peripheral interface
SQL
Structured query language
SR-DFG
Single-rate dataflow graph
SRAM
Static random access memory
SRL
Shift register lookup table
SSD
Shifted signed digits
SVM
Support vector machine
SW
Search window
TCP
Transmission Control Protocol
TFLOPS
Tera floating-point operations per second
TOA
Time of arrival
TR
Throughput rate
TTL
Transistor-transistor logic
UART
Universal asynchronous receiver/transmitter
ULD
Ultra-low density
UML
Unified modeling language
VHDL
VHSIC hardware description language
VHSIC
Very high-speed integrated circuit
VLIW
Very long instruction word
VLSI
Very large scale integration
WBC
White box component
WDF
Wave digital filter
Electronics continues to make an impact in the twenty-first century and has given birth to the computer industry, mobile telephony and personal digital entertainment and services industries, to name but a few. These markets have been driven by developments in silicon technology as described by Moore’s law (Moore 1965), which is represented pictorially in Figure 1.1. This has seen the number of transistors double every 18 months. Moreover, not only has the number of transistors doubled at this rate, but also the costs have decreased, thereby reducing the cost per transistor at every technology advance.
Figure 1.1 Moore’s law
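As a rough illustration of what this doubling rate implies, the short C sketch below projects transistor counts under a strict 18-month doubling assumption; the 1971 starting value (roughly that of an early microprocessor) and the exact doubling period are illustrative assumptions rather than figures taken from this chapter.

/* Illustrative projection only: assumes a strict doubling every 18 months
 * from an assumed 1971 count of ~2300 transistors (an early microprocessor).
 * Link with the math library (-lm). */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double n0 = 2300.0;                     /* assumed starting count, 1971 */
    for (int year = 1971; year <= 2016; year += 5) {
        double elapsed = year - 1971;             /* years since the starting point */
        double n = n0 * pow(2.0, elapsed / 1.5);  /* one doubling per 18 months */
        printf("%d: ~%.3g transistors\n", year, n);
    }
    return 0;
}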
In the 1970s and 1980s, electronic systems were created by aggregating standard components such as microprocessors and memory chips with digital logic components, e.g. dedicated integrated circuits along with dedicated input/output (I/O) components on printed circuit boards (PCBs). As levels of integration grew, manufacturing working PCBs became more complex, largely due to greater component complexity in terms of the increase in the number of transistors and I/O pins. In addition, the development of multi-layer boards with as many as 20 separate layers increased the design complexity. Thus, the probability of incorrectly connecting components grew, particularly as the task of successfully designing and testing a working system before production came under ever greater time pressure.
The problem became more challenging as system descriptions evolved during product development. Pressure to create systems to meet evolving standards, or that could change after board construction due to system alterations or changes in the design specification, meant that the concept of having a “fully specified” design, in terms of physical system construction and development of processor software code, was becoming increasingly challenging. Whilst the use of programmable processors such as microcontrollers and microprocessors gave the designer some freedom to make alterations in order to correct or modify the system after production, this was limited. Changes to the interconnections of the components on the PCB were restricted to the I/O connectivity of the processors themselves. Thus the attraction of using programmable interconnection or “glue logic” offered considerable potential, and so the concept of field programmable logic (FPL), specifically field programmable gate array (FPGA) technology, was born.
From this unassuming start, though, FPGAs have grown into a powerful technology for implementing digital signal processing (DSP) systems. This emergence is due to the integration of increasingly complex computational units into the fabric along with increasing complexity and number of levels in memory. Coupled with a high level of programmable routing, this provides an impressive heterogeneous platform for improved levels of computing. For the first time ever, we have seen evolutions in heterogeneous FPGA-based platforms from Microsoft, Intel and IBM. FPGA technology has had an increasing impact on the creation of DSP systems. Many FPGA-based solutions exist for wireless base station designs, image processing and radar systems; these are, of course, the major focus of this text.
Microsoft has developed acceleration of the web search engine Bing using FPGAs and shows improved ranking throughput in a production search infrastructure. IBM and Xilinx have worked closely together to show that they can accelerate the reading of data from web servers into databases by applying an accelerated Memcache2; this is a general-purpose distributed memory caching system used to speed up dynamic database-driven searches (Blott and Vissers 2014). Intel have developed a multicore die with Altera FPGAs, and their recent purchase of the company (Clark 2015) clearly indicates the emergence of FPGAs as a core component in heterogeneous computing with a clear target for data centers.
The FPGA concept emerged in 1985 with the XC2064™ FPGA family from Xilinx. At the same time, a company called Altera was also developing a programmable device, later to become the EP1200, which was the first high-density programmable logic device (PLD). Altera’s technology was manufactured using 3-μm complementary metal oxide semiconductor (CMOS) electrically programmable read-only memory (EPROM) technology and required ultraviolet light to erase the programming, whereas Xilinx’s technology was based on conventional static random access memory (SRAM) technology and required an EPROM to store the programming.
The co-founder of Xilinx, Ross Freeman, argued that with continuously improving silicon technology, transistors were going to become cheaper and cheaper and could be used to offer programmability. This approach allowed system design errors which had only been recognized at a late stage of development to be corrected. By using an FPGA to connect the system components, the interconnectivity of the components could be changed as required by simply reprogramming them. Whilst this approach introduced additional delays due to the programmable interconnect, it avoided a costly and time-consuming PCB redesign and considerably reduced the design risks.
At this stage, the FPGA market was populated by a number of vendors, including Xilinx, Altera, Actel, Lattice, Crosspoint, Prizm, Plessey, Toshiba, Motorola, Algotronix and IBM. However, the costs of developing technologies not based on conventional integrated circuit design processes and the need for programming tools saw the demise of many of these vendors and a reduction in the number of FPGA families. SRAM technology has now emerged as the dominant technology largely due to cost, as it does not require a specialist technology. The market is now dominated by Xilinx and Altera, and, more importantly, the FPGA has grown from a simple glue logic component to a complete system on programmable chip (SoPC) comprising on-board physical processors, soft processors, dedicated DSP hardware, memory and high-speed I/O.
The FPGA evolution was neatly described by Steve Trimberger in his FPL2007 plenary talk (see the summary in Table 1.1). The evolution of the FPGA can be divided into three eras. The age of invention was when FPGAs started to emerge and were being used as system components typically to provide programmable interconnect giving protection to design evolutions and variations. At this stage, design tools were primitive, but designers were quite happy to extract the best performance by dealing with lookup tables (LUTs) or single transistors.
Table 1.1 Three ages of FPGAs
Period          Age            Comments
1984–1991       Invention      Technology is limited; FPGAs are much smaller than the application problem size. Design automation is secondary; architecture efficiency is key.
1992–1999       Expansion      FPGA size approaches the problem size. Ease of design becomes critical.
2000–present    Accumulation   FPGAs are larger than the typical problem size. Logic capacity limited by I/O bandwidth.
As highlighted above, there was a rationalization of the technologies in the early 1990s, referred to by Trimberger as the great architectural shakedown. The age of expansion was when the FPGA started to approach the problem size and thus design complexity was key. This meant that it was no longer sufficient for FPGA vendors to just produce place and route tools and it became critical that hardware description languages (HDLs) and associated synthesis tools were created. The final evolution period was the period of accumulation when FPGAs started to incorporate processors and high-speed interconnection. Of course, this is very relevant now and is described in more detail in Chapter 5 where the recent FPGA offerings are reviewed.
This has meant that the FPGA market has grown from nothing in just over 20 years to become a key player in the IC industry, worth some $3.9 billion in 2014 and expected to be worth around $7.3 billion in 2022 (MarketsandMarkets 2016). It has been driven by the growth in the automotive sector, mobile devices in the consumer electronics sector and the number of data centers.
Whilst Moore’s law is presented here as being the cornerstone for driving FPGA evolution and indeed electronics, it also has been the driving force for computing. However, all is not well with computing’s reliance on silicon technology. Whilst the number of transistors continues to double, the scaling of clock speed has not continued at the same rate. This is due to the increase in power consumption, particularly the increase in static power. The issue of the heat dissipation capability of packaging means that computing platform providers such as Intel have limited their processor power to 30 W. This resulted in an adjustment in the prediction for clock rates between 2005 and 2011 (as illustrated in Figure 1.2) as clock rate is a key contributor to power consumption (ITRS 2005).
Figure 1.2 Change in ITRS scaling prediction for clock frequencies
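For context, the reason clock frequency figures so prominently in these power limits is the standard CMOS dynamic power relation, quoted here as a textbook expression rather than a formula introduced in this chapter:

\[ P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f_{\text{clk}} \]

where α is the switching activity, C the switched capacitance, V_DD the supply voltage and f_clk the clock frequency; static (leakage) power adds a further, largely frequency-independent term. Chapter 13 returns to these sources of power consumption in detail.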
In 2005, the International Technology Roadmap for Semiconductors (ITRS) predicted that a 100 GHz clock would be achieved in 2020, but this estimation had to be revised first in 2007 and then again in 2011. This can be seen in current technology: a clock rate of some 30 GHz was expected by 2015 based on the original forecast, but speeds have in fact been restricted to 3–4 GHz. This has meant that clock performance has effectively stalled since 2005 and has generated interest among major computing companies in exploring different architectures that employ FPGA technology (Putnam et al. 2014; Blott and Vissers 2014).
On many occasions, the growth indicated by Moore’s law has led people to argue that transistors are essentially free and therefore can be exploited, as in the case of programmable hardware, to provide additional flexibility. This could be backed up by the observation that the cost of a transistor has dropped from one-tenth of a cent in the 1980s to one-thousandth of a cent in the 2000s. Thus we have seen the introduction of hardware programmability into electronics in the form of FPGAs.
In order to make a single transistor programmable in an SRAM technology, the programmability is controlled by storing a “1” or a “0” on the gate of the transistor, thereby making it conduct or not. This value is held in an SRAM cell which, if it requires six transistors, means that seven transistors are needed to achieve one programmable equivalent in an FPGA. The reality is that in an overall FPGA implementation the penalty is nowhere near as harsh as this, but it has to be taken into consideration in terms of ultimate system cost.
It is the ability to program the FPGA hardware after fabrication that is the main appeal of the technology; this provides a new level of reassurance in an increasingly competitive market where “right first time” system construction is becoming more difficult to achieve. It would appear that that assessment was vindicated in the late 1990s and early 2000s: when there was a major market downturn, the FPGA market remained fairly constant when other microelectronic technologies were suffering. Of course, the importance of programmability has already been demonstrated by the microprocessor, but this represented a new change in how programmability was performed.
The argument developed in the previous section presents a clear advantage of FPGA technology in overcoming PCB design errors and manufacturing faults. Whilst this might have been true in the early days of FPGA technology, evolution in silicon technology has moved the FPGA from being a programmable interconnection technology to being a system component. If the microprocessor or microcontroller is viewed as a programmable system component, then current FPGA devices must also be viewed in this vein, giving us a different perspective on system implementation.
In electronic system design, the main attraction of the microprocessor is that it considerably lessens the risk of system development. As the hardware is fixed, all of the design effort can be concentrated on developing the code. This situation has been complemented by the development of efficient software compilers which have largely removed the need for the designer to create assembly language; to some extent, this can even absolve the designer from having a detailed knowledge of the microprocessor architecture (although many practitioners would argue that this is essential to produce good code). This concept has grown in popularity, and embedded microprocessor courses are now essential parts of any electrical/electronic or computer engineering degree course.
A lot of this process has been down to the software developer’s ability to exploit an underlying processor architecture, the von Neumann architecture. However, this advantage has also been the limiting factor in its application to the topic of this text, namely DSP. In the von Neumann architecture, operations are processed sequentially, which allows relatively straightforward interpretation of the hardware for programming purposes; however, this severely limits the performance in DSP applications which exhibit high levels of parallelism and have operations that are highly data-independent. This cries out for parallel realization, and whilst DSP microprocessors go some way toward addressing this situation by providing concurrency in the form of parallel hardware and software “pipelining,” there is still the concept of one architecture suiting all sizes of the DSP problem.
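To make this concrete, consider the kind of inner loop that dominates many DSP workloads. The short C fragment below is a hypothetical sketch, not an example from the book: the multiplications in the loop are mutually independent and the accumulation can be restructured as an adder tree, so a custom FPGA architecture can compute them in parallel and pipeline the result, whereas a von Neumann processor steps through one multiply-accumulate per iteration.

/* Hypothetical FIR-style inner loop: each product h[i]*x[i] is independent
 * of the others; only the final summation couples them, and in hardware that
 * summation can be built as an adder tree. */
#define N_TAPS 8

float fir_sample(const float h[N_TAPS], const float x[N_TAPS])
{
    float y = 0.0f;
    for (int i = 0; i < N_TAPS; i++) {
        y += h[i] * x[i];   /* candidate for unrolling and pipelining on an FPGA */
    }
    return y;
}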
This limitation is overcome in FPGAs as they allow what can be considered to be a second level of programmability, namely programming of the underlying processor architecture. By creating an architecture that best meets the algorithmic requirements, high levels of performance in terms of area, speed and power can be achieved. This concept is not new as the idea of deriving a system architecture to suit algorithmic requirements has been the cornerstone of application-specific integrated circuit (ASIC) implementations. In high volumes, ASIC implementations have resulted in the most cost-effective, fastest and lowest-energy solutions. However, increasing mask costs and the impact of “right first time” system realization have made the FPGA a much more attractive alternative.
In this sense, FPGAs capture the performance aspects offered by ASIC implementation, but with the advantage of programmability usually associated with programmable processors. Thus, FPGA solutions have emerged which currently offer several hundreds of giga operations per second (GOPS) on a single FPGA for some DSP applications, which is at least an order of magnitude better performance than microprocessors.
In many texts, Moore’s law is used to highlight the evolution of silicon technology, but another interesting viewpoint particularly relevant for FPGA technology is Makimoto’s wave, which was first published in the January 1991 edition of Electronics Weekly. It is based on an observation by Tsugio Makimoto who noted that technology has shifted between standardization and customization. In the 1960s, 7400 TTL series logic chips were used to create applications; and then in the early 1970s, the custom large-scale integration era emerged where chips were created (or customized) for specific applications such as the calculator. The chips were now increasing in their levels of integration and so the term “medium-scale integration” (MSI) was born. The evolution of the microprocessor in the 1970s saw the swing back towards standardization where one “standard” chip was used for a wide range of applications.
The 1980s then saw the birth of ASICs, where designers could overcome the fact that the sequential microprocessor posed severe limitations in DSP applications where higher levels of computation were needed. DSP processors such as the TMS32010 also emerged; these differed from conventional processors as they were based on the Harvard architecture, which has separate program and data memories and separate buses. Even with DSP processors, ASICs offered considerable potential in terms of processing power and, more importantly, power consumption. The development of the FPGA from a “glue component” that allowed other components to be connected together to form a system into a component, or even a system itself, led to its increased popularity.
The concept of coupling microprocessors with FPGAs in heterogeneous platforms was very attractive as this represented a completely programmable platform, with microprocessors to implement the control-dominated aspects of DSP systems and FPGAs to implement the data-dominated aspects. This concept formed the basis of FPGA-based custom computing machines (FCCMs), which in turn formed the basis for “configurable” or reconfigurable computing (Villasenor and Mangione-Smith 1997). In these systems, users could not only implement computationally complex algorithms in hardware, but also use the programmability aspect of the hardware to change the system functionality, allowing the development of “virtual hardware” where hardware could “virtually” implement systems that are an order of magnitude larger (Brebner 1997).
We would argue that there have been two programmability eras. The first occurred with the emergence of the microprocessor in the 1970s, where engineers could develop programmable solutions based on this fixed hardware. The major challenge at this time was the software environments; developers worked with assembly language, and even when compilers and assemblers emerged for C, best performance was achieved by hand-coding. Libraries started to appear which provided basic common I/O functions, thereby allowing designers to concentrate on the application. These functions are now readily available as core components in commercial compilers and assemblers. The need for high-level languages grew, and now most programming is carried out in high-level programming languages such as C and Java, with an increased use of even higher-level environments such as the unified modeling language (UML).
The second era of programmability was ushered in by FPGAs. Makimoto indicates that field programmability is standardized in manufacture and customized in application. This can be thought of as hardware programmability, with the first wave representing programmability in the software domain where the hardware remains fixed. This is a key challenge, as most computer programming tools work on the fixed hardware platform principle, allowing optimizations to be created as there is clear direction on how to improve performance from an algorithmic representation. With FPGAs, the user is given full freedom to define the architecture which best suits the application. However, this presents a problem in that each solution must be handcrafted, and every hardware designer knows the issues in designing and verifying hardware designs!
Some of the trends in the two eras have similarities. In the early days, schematic capture was used to design early circuits, which was synonymous with assembly-level programming. Hardware description languages such as VHSIC Hardware Description Language (VHDL) and Verilog then started to emerge that could be used to produce a higher level of abstraction, with the current aim to have C-based tools such as SystemC and Catapult® from Mentor Graphics as a single software-based programming environment (Very High Speed Integrated Circuit (VHSIC) was a US Department of Defense funded program in the late 1970s and early 1980s with the aim of producing the next generation of integrated circuits). Initially, as with software programming languages, there was mistrust in the quality of the resulting code produced by these approaches.
The establishment of improved, cost-effective synthesis tools is equivalent to the evolution of efficient software compilers for high-level programming languages, and the evolution of library functions allowed a high degree of confidence to be subsequently established; the use of HDLs is now commonplace for FPGA implementation. Indeed, the emergence of intellectual property (IP) cores mirrored the evolution of libraries such as I/O programming functions for software flows; they allowed common functions to be reused as developers trusted the quality of the resulting implementation produced by such libraries, particularly as pressures to produce more code within the same time-span grew. The early IP cores emerged from basic function libraries into complex signal processing and communications functions such as those available from the FPGA vendors and the various web-based IP repositories.
In the early days, FPGAs were seen as glue logic chips used to plug components together to form complex systems. FPGAs then increasingly came to be seen as complete systems in themselves, as illustrated in Table 1.1.
