An important working resource for engineers and researchers involved in the design, development, and implementation of signal processing systems
The last decade has seen a rapid expansion of the use of field programmable gate arrays (FPGAs) for a wide range of applications beyond traditional digital signal processing (DSP) systems. Written by a team of experts working at the leading edge of FPGA research and development, this second edition of FPGA-based Implementation of Signal Processing Systems has been extensively updated and revised to reflect the latest iterations of FPGA theory, applications, and technology. Written from a system-level perspective, it features expert discussions of contemporary methods and tools used in the design, optimization and implementation of DSP systems using programmable FPGA hardware. And it provides a wealth of practical insights—along with illustrative case studies and timely real-world examples—of critical concern to engineers working in the design and development of DSP systems for radio, telecommunications, audio-visual, and security applications, as well as bioinformatics, Big Data applications, and more. Inside you will find up-to-date coverage of:
FPGA-based Implementation of Signal Processing Systems, 2nd Edition is an indispensable guide for engineers and researchers involved in the design and development of both traditional and cutting-edge data and signal processing systems. Senior-level electrical and computer engineering graduates studying signal processing or digital signal processing also will find this volume of great interest.
Page count: 610
Publication year: 2017
Roger Woods
Queen’s University, Belfast, UK
John McAllister
Queen’s University, Belfast, UK
Gaye Lightbody
University of Ulster, UK
Ying Yi
SN Systems — Sony Interactive Entertainment, UK
This edition first published 2017 © 2017 John Wiley & Sons, Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Roger Woods, John McAllister, Gaye Lightbody and Ying Yi to be identified as the authors of this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons, Ltd., The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data
Names: Woods, Roger, 1963- author. | McAllister, John, 1979- author. | Lightbody, Gaye, author. | Yi, Ying (Electrical engineer), author.
Title: FPGA-based implementation of signal processing systems / Roger Woods, John McAllister, Gaye Lightbody, Ying Yi.
Description: Second edition. | Hoboken, NJ : John Wiley & Sons Inc., 2017. | Revised edition of: FPGA-based implementation of signal processing systems / Roger Woods … [et al.]. 2008. | Includes bibliographical references and index.
Identifiers: LCCN 2016051193 | ISBN 9781119077954 (cloth) | ISBN 9781119077978 (epdf) | ISBN 9781119077961 (epub)
Subjects: LCSH: Signal processing--Digital techniques. | Digital integrated circuits. | Field programmable gate arrays.
Classification: LCC TK5102.5 .F647 2017 | DDC 621.382/2--dc23
LC record available at https://lccn.loc.gov/2016051193
Cover Design: Wiley Cover Image: © filo/Gettyimages; (Graph) Courtesy of the authors
The book is dedicated by the main author to his wife, Pauline, for all her support and care, particularly over the past two years.
The support from staff from the Royal Victoria Hospital and Musgrave Park Hospital is greatly appreciated.
Preface
List of Abbreviations
1 Introduction to Field Programmable Gate Arrays
1.1 Introduction
1.2 Field Programmable Gate Arrays
1.3 Influence of Programmability
1.4 Challenges of FPGAs
Bibliography
2 DSP Basics
2.1 Introduction
2.2 Definition of DSP Systems
2.3 DSP Transformations
2.4 Filters
2.5 Adaptive Filtering
2.6 Final Comments
Bibliography
3 Arithmetic Basics
3.1 Introduction
3.2 Number Representations
3.3 Arithmetic Operations
3.4 Alternative Number Representations
3.5 Division
3.6 Square Root
3.7 Fixed-Point versus Floating-Point
3.8 Conclusions
Bibliography
4 Technology Review
4.1 Introduction
4.2 Implications of Technology Scaling
4.3 Architecture and Programmability
4.4 DSP Functionality Characteristics
4.5 Microprocessors
4.6 DSP Processors
4.7 Graphical Processing Units
4.8 System-on-Chip Solutions
4.9 Heterogeneous Computing Platforms
4.10 Conclusions
Bibliography
5 Current FPGA Technologies
5.1 Introduction
5.2 Toward FPGAs
5.3 Altera Stratix® V and 10 FPGA Family
5.4 Xilinx UltraScale™/Virtex-7 FPGA Families
5.5 Xilinx Zynq FPGA Family
5.6 Lattice iCE40isp FPGA Family
5.7 MicroSemi RTG4 FPGA Family
5.8 Design Strategies for FPGA-based DSP Systems
5.9 Conclusions
Bibliography
6 Detailed FPGA Implementation Techniques
6.1 Introduction
6.2 FPGA Functionality
6.3 Mapping to LUT-Based FPGA Technology
6.4 Fixed-Coefficient DSP
6.5 Distributed Arithmetic
6.6 Reduced-Coefficient Multiplier
6.7 Conclusions
Bibliography
7 Synthesis Tools for FPGAs
7.1 Introduction
7.2 High-Level Synthesis
7.3 Xilinx Vivado
7.4 Control Logic Extraction Phase Example
7.5 Altera SDK for OpenCL
7.6 Other HLS Tools
7.7 Conclusions
Bibliography
8 Architecture Derivation for FPGA-based DSP Systems
8.1 Introduction
8.2 DSP Algorithm Characteristics
8.3 DSP Algorithm Representations
8.4 Pipelining DSP Systems
8.5 Parallel Operation
8.6 Conclusions
Bibliography
9 Complex DSP Core Design for FPGA
9.1 Introduction
9.2 Motivation for Design for Reuse
9.3 Intellectual Property Cores
9.4 Evolution of IP cores
9.5 Parameterizable (Soft) IP Cores
9.6 IP Core Integration
9.7 Current FPGA-based IP cores
9.8 Watermarking IP
9.9 Summary
Bibliography
10 Advanced Model-Based FPGA Accelerator Design
10.1 Introduction
10.2 Dataflow Modeling of DSP Systems
10.3 Architectural Synthesis of Custom Circuit Accelerators from DFGs
10.4 Model-Based Development of Multi-Channel Dataflow Accelerators
10.5 Model-Based Development for Memory-Intensive Accelerators
10.6 Summary
Notes
Bibliography
11 Adaptive Beamformer Example
11.1 Introduction to Adaptive Beamforming
11.2 Generic Design Process
11.3 Algorithm to Architecture
11.4 Efficient Architecture Design
11.5 Generic QR Architecture
11.6 Retiming the Generic Architecture
11.7 Parameterizable QR Architecture
11.8 Generic Control
11.9 Beamformer Design Example
11.10 Summary
Bibliography
12 FPGA Solutions for Big Data Applications
12.1 Introduction
12.2 Big Data
12.3 Big Data Analytics
12.4 Acceleration
12.5 k-Means Clustering FPGA Implementation
12.6 FPGA-Based Soft Processors
12.7 System Hardware
12.8 Conclusions
Bibliography
13 Low-Power FPGA Implementation
13.1 Introduction
13.2 Sources of Power Consumption
13.3 FPGA Power Consumption
13.4 Power Consumption Reduction Techniques
13.5 Dynamic Voltage Scaling in FPGAs
13.6 Reduction in Switched Capacitance
13.7 Final Comments
Bibliography
14 Conclusions
14.1 Introduction
14.2 Evolution in FPGA Design Approaches
14.3 Big Data and the Shift toward Computing
14.4 Programming Flow for FPGAs
14.5 Support for Floating-Point Arithmetic
14.6 Memory Architectures
Bibliography
Index
EULA
Digital signal processing (DSP) is the cornerstone of many products and services in the digital age. It is used in applications such as high-definition TV, mobile telephony, digital audio, multimedia, digital cameras, radar, sonar detectors, biomedical imaging, global positioning, digital radio and speech recognition, to name but a few! The evolution of DSP solutions has been driven by application requirements which, in turn, have only been possible to realize because of developments in silicon chip technology. Currently, a mix of programmable and dedicated system-on-chip (SoC) solutions is required for these applications, and this has therefore been a highly active area of research and development over the past four decades.
The result has been the emergence of numerous technologies for DSP implementation, ranging from simple microcontrollers right through to dedicated SoC solutions which form the basis of high-volume products such as smartphones. With the architectural developments that have occurred in field programmable gate arrays (FPGAs) over the years, it is clear that they should be considered as a viable DSP technology. Indeed, developments made by FPGA vendors would support this view of their technology. Strong commercial pressures are driving the adoption of FPGA technology across a range of applications.
The increasing costs of developing silicon technology implementations have put considerable pressure on the ability to create dedicated SoC systems. In the mobile phone market, volumes are such that dedicated SoC systems are required to meet stringent energy requirements, so application-specific solutions have emerged which vary in their degree of programmability, energy requirements and cost. The need to balance these requirements suggests that many of these technologies will coexist in the immediate future, and indeed many hybrid technologies are starting to emerge. This, of course, creates a considerable interest in using technology that is programmable as this acts to considerably reduce risks in developing new technologies.
Commonly used DSP technologies encompass software programmable solutions such as microcontrollers and DSP microprocessors. With the inclusion of dedicated DSP processing engines, FPGAs have now emerged as a strong DSP technology. Their key advantage is that they enable users to create system architectures in which the resources are best matched to the system’s processing needs. Whilst their memory resources are limited, they offer a very high-bandwidth, on-chip memory capability. Whilst the prefabricated aspect of FPGAs avoids many of the deep problems met when developing SoC implementations, the creation of an efficient implementation from a DSP system description remains a highly convoluted problem which is a core theme of this book.
The book looks to address FPGA-based DSP systems, considering implementation at numerous levels.
Circuit-level optimization techniques that allow the underlying FPGA fabric to be used more intelligently are reviewed first. By considering the detailed underlying FPGA platform, it is shown how system requirements can be mapped to provide an area-efficient, faster implementation. This is demonstrated for a number of DSP transforms and fixed-coefficient filtering.
Architectural solutions can be created from a signal flow graph (SFG) representation. In effect, this requires the user to exploit the highly regular, highly computative, data-independent nature of DSP systems to produce highly parallel, pipelined FPGA-based circuit architectures. This is demonstrated for filtering and beamforming applications.
System solutions are now a challenge as FPGAs have become a heterogeneous platform involving multiple hardware and software components and interconnection fabrics. There is a need for a higher-level system modeling language, e.g. dataflow, which not only facilitates architectural optimizations but also addresses system-level considerations such as interconnection and memory.
The book covers these areas of FPGA implementation, but its key differentiating factor is that it concentrates on the second and third areas listed above, namely the creation of circuit architectures and system-level modeling; this is because circuit-level optimization techniques have been covered in greater detail elsewhere. The work is backed up with the authors’ experiences in implementing practical, real DSP systems and covers numerous examples including an adaptive beamformer based on a QR-based recursive least squares (RLS) filter, finite impulse response (FIR) and infinite impulse response (IIR) filters, a full search motion estimation design and a fast Fourier transform (FFT) system for electronic support measures. The book also considers the development of intellectual property (IP) cores as this has become a critical aspect in the creation of DSP systems. One chapter is given over to describing the creation of such IP cores and another to the creation of an adaptive filtering core.
The book is aimed at working engineers who are interested in using FPGA technology efficiently in signal and data processing applications. The earlier chapters will be of interest to graduates and students completing their studies, taking the readers through a number of simple examples that show the trade-off when mapping DSP systems into FPGA hardware. The middle part of the book contains a number of illustrative, complex DSP system examples that have been implemented using FPGAs and whose performance clearly illustrates the benefit of their use. They provide insights into how to best use the complex FPGA technology to produce solutions optimized for speed, area and power which the authors believe is missing from current literature. The book summarizes over 30 years of learned experience of implementing complex DSP systems undertaken in many cases with commercial partners.
The second edition has been updated and improved in a number of ways. It has been updated to reflect technology evolutions in FPGA technology, to acknowledge developments in programming and synthesis tools, to reflect on algorithms for Big Data applications, and to include improvements to some background chapters. The text has also been updated using relevant examples where appropriate.
Technology update: As FPGAs are linked to silicon technology advances, their architecture continually changes, and this is reflected in Chapter 5. A major change is the inclusion of the ARM® processor core resulting in a shift for FPGAs to a heterogeneous computing platform. Moreover, the increased use of graphical processing units (GPUs) in DSP systems is reflected in Chapter 4.
Programming tools update: Since the first edition was published, there have been a number of innovations in tool developments, particularly in the creation of commercial C-based high-level synthesis (HLS) and open computing language (OpenCL) tools. The material in Chapter 7 has been updated to reflect these changes, and Chapter 10 has been changed to reflect the changes in model-based synthesis tools.
“Big Data” processing: DSP involves processing of data content such as audio, speech, music and video information, but there is now great interest in collating huge data sets from on-line facilities and processing them quickly. As FPGAs have started to gain some traction in this area, a new chapter, Chapter 12, has been added to reflect this development.
The FPGA is a heterogeneous platform comprising complex resources such as hard and soft processors, dedicated blocks optimized for processing DSP functions and processing elements connected by both programmable and fast, dedicated interconnections. The book focuses on the challenges of implementing DSP systems on such platforms with a concentration on the high-level mapping of DSP algorithms into suitable circuit architectures.
The material is organized into three main sections.
Chapter 2 starts with a DSP primer, covering both FIR and IIR filtering, transforms including the FFT and discrete cosine transform (DCT) and concluding with adaptive filtering algorithms, covering both the least mean squares (LMS) and RLS algorithms. Chapter 3 is dedicated to computer arithmetic and covers number systems, arithmetic functions and alternative number representations such as logarithmic number representations (LNS) and coordinate rotation digital computer (CORDIC). Chapter 4 covers the technologies available to implement DSP algorithms and includes microprocessors, DSP microprocessors, GPUs and SoC architectures, including systolic arrays. In Chapter 5, a detailed description of commercial FPGAs is given with a concentration on the two main vendors, namely Xilinx and Altera, specifically their UltraScale™/Zynq® and Stratix® 10 FPGA families respectively, but also covering technology offerings from Lattice and MicroSemi.
This section covers efficient implementation from circuit architecture onto specific FPGA families; creation of circuit architecture from SFG representations; and system-level specification and implementation methodologies from high-level representations. Chapter 6 only briefly covers the efficient implementation of FPGA designs from circuit architecture descriptions, as many of these approaches have already been published; the text covers distributed arithmetic and reduced coefficient multiplier approaches and shows how these have been applied to fixed-coefficient filters and DSP transforms. Chapter 7 covers HLS for FPGA design including new sections to reflect Xilinx’s Vivado HLS tool flow and also Altera’s OpenCL approach. The process of mapping SFG representations of DSP algorithms onto circuit architectures (the starting point in Chapter 6) is then described in Chapter 8. It shows how dataflow graph (DFG) descriptions can be transformed for varying levels of parallelism and pipelining to create circuit architectures which best match the application requirements, backed up with simple FIR and IIR filtering examples.
One of the ways to perform system design is to create predefined designs termed IP cores, which will typically have been optimized using the techniques outlined in Chapter 8. The creation of such IP cores is outlined in Chapter 9 and addresses a key aspect of design productivity by encouraging “design for reuse.” Chapter 10 considers model-based design for heterogeneous FPGAs and focuses on dataflow modeling as a suitable design approach for FPGA-based DSP systems. The chapter outlines how it is possible to include pipelined IP cores via the white box concept using two examples, namely a normalized lattice filter (NLF) and a fixed beamformer example.
The final section of the book, consisting of Chapters 11–13, covers the application of the techniques. Chapter 11 looks at the creation of a soft, highly parameterizable core for RLS filtering, showing how a generic architecture can be created to allow a range of designs to be synthesized with varying performance. Chapter 12 illustrates how FPGAs can be applied to Big Data applications where the challenge is to accelerate some complex processing algorithms. Increasingly FPGAs are seen as a low-power solution, and FPGA power consumption is discussed in Chapter 13. The chapter starts with a discussion on power consumption, highlights the importance of dynamic and static power consumption, and then describes some techniques to reduce power consumption.
The authors have been fortunate to receive valuable help, support and suggestions from numerous colleagues, students and friends, including: Michaela Blott, Ivo Bolsens, Gordon Brebner, Bill Carter, Joe Cavallaro, Peter Cheung, John Gray, Wayne Luk, Bob Madahar, Alan Marshall, Paul McCambridge, Satnam Singh, Steve Trimberger and Richard Walke.
The authors’ research has been funded from a number of sources, including the Engineering and Physical Sciences Research Council, Xilinx, Ministry of Defence, Qinetiq, BAE Systems, Selex and Department of Employment and Learning for Northern Ireland.
Several chapters are based on joint work that was carried out with the following colleagues and students: Moslem Amiri, Burak Bardak, Kevin Colgan, Tim Courtney, Scott Fischaber, Jonathan Francey, Tim Harriss, Jean-Paul Heron, Colm Kelly, Bob Madahar, Eoin Malins, Stephen McKeown, Karen Rafferty, Darren Reilly, Lok-Kee Ting, David Trainor, Richard Turner, Fahad M Siddiqui and Richard Walke.
The authors thank Ella Mitchell and Nithya Sechin of John Wiley & Sons and Alex Jackson and Clive Lawson for their personal interest and help and motivation in preparing and assisting in the production of this work.
1D
One-dimensional
2D
Two-dimensional
ABR
Auditory brainstem response
ACC
Accumulator
ADC
Analogue-to-digital converter
AES
Advanced encryption standard
ALM
Adaptive logic module
ALU
Arithmetic logic unit
ALUT
Adaptive lookup table
AMD
Advanced Micro Devices
ANN
Artificial neural network
AoC
Analytics-on-chip
API
Application program interface
APU
Application processing unit
ARM
Advanced RISC machine
ASIC
Application-specific integrated circuit
ASIP
Application-specific instruction processor
AVS
Adaptive voltage scaling
BC
Boundary cell
BCD
Binary coded decimal
BCLA
Block CLA with intra-group carry ripple
BRAM
Block random access memory
CAPI
Coherent accelerator processor interface
CB
Current block
CCW
Control and communications wrapper
CE
Clock enable
CISC
Complex instruction set computer
CLA
Carry lookahead adder
CLB
Configurable logic block
CNN
Convolutional neural network
CMOS
Complementary metal oxide semiconductor
CORDIC
Coordinate rotation digital computer
CPA
Carry propagation adder
CPU
Central processing unit
CSA
Conditional sum adder
CSDF
Cyclo-static dataflow
CWT
Continuous wavelet transform
DA
Distributed arithmetic
DCT
Discrete cosine transform
DDR
Double data rate
DES
Data Encryption Standard
DFA
Dataflow accelerator
DFG
Dataflow graph
DFT
Discrete Fourier transform
DG
Dependence graph
disRAM
Distributed random access memory
DM
Data memory
DPN
Dataflow process network
DRx
Digital receiver
DSP
Digital signal processing
DST
Discrete sine transform
DTC
Decision tree classification
DVS
Dynamic voltage scaling
DWT
Discrete wavelet transform
E²PROM
Electrically erasable programmable read-only memory
EBR
Embedded Block RAM
ECC
Error correction code
EEG
Electroencephalogram
EPROM
Electrically programmable read-only memory
E-SGR
Enhanced Squared Givens rotation algorithm
EW
Electronic warfare
FBF
Fixed beamformer
FCCM
FPGA-based custom computing machine
FE
Functional engine
FEC
Forward error correction
FFE
Free-form expression
FFT
Fast Fourier transform
FIFO
First-in, first-out
FIR
Finite impulse response
FPGA
Field programmable gate array
FPL
Field programmable logic
FPU
Floating-point unit
FSM
Finite state machine
FSME
Full search motion estimation
GFLOPS
Giga floating-point operations per second
GMAC
Giga multiply-accumulates
GMACS
Giga multiply-accumulate per second
GOPS
Giga operations per second
GPGPU
General-purpose graphical processing unit
GPU
Graphical processing unit
GRNN
General regression neural network
GSPS
Gigasamples per second
HAL
Hardware abstraction layer
HDL
Hardware description language
HKMG
High-K metal gate
HLS
High-level synthesis
I2C
Inter-Integrated circuit
I/O
Input/output
IC
Internal cell
ID
Instruction decode
IDE
Integrated design environment
IDFT
Inverse discrete Fourier transform
IEEE
Institute of Electrical and Electronic Engineers
IF
Instruction fetch
IFD
Instruction fetch and decode
IFFT
Inverse fast Fourier transform
IIR
Infinite impulse response
IM
Instruction memory
IoT
Internet of things
IP
Intellectual property
IR
Instruction register
ITRS
International Technology Roadmap for Semiconductors
JPEG
Joint Photographic Experts Group
KCM
Constant-coefficient multiplication
KM
Kernel memory
KPN
Kahn process network
LAB
Logic array blocks
LDCM
Logic delay measurement circuit
LDPC
Low-density parity-check
LLVM
Low-level virtual machine
LMS
Least mean squares
LNS
Logarithmic number representations
LPDDR
Low-power double data rate
LS
Least squares
lsb
Least significant bit
LTI
Linear time-invariant
LUT
Lookup table
MA
Memory access
MAC
Multiply-accumulate
MAD
Minimum absolute difference
MADF
Multidimensional arrayed dataflow
MD
Multiplicand
ME
Motion estimation
MIL-STD
Military standard
MIMD
Multiple instruction, multiple data
MISD
Multiple instruction, single data
MLAB
Memory LAB
MMU
Memory management unit
MoC
Model of computation
MPE
Media processing engine
MPEG
Motion Picture Experts Group
MPSoC
Multi-processing SoC
MR
Multiplier
MR-DFG
Multi-rate dataflow graph
msb
Most significant bit
msd
Most significant digit
MSDF
Multidimensional synchronous dataflow
MSI
Medium-scale integration
MSPS
Megasamples per second
NaN
Not a Number
NLF
Normalized lattice filter
NRE
Non-recurring engineering
OCM
On-chip memory
OFDM
Orthogonal frequency division multiplexing
OFDMA
Orthogonal frequency division multiple access
OLAP
On-line analytical processing
OpenCL
Open computing language
OpenMP
Open multi-processing
ORCC
Open RVC-CAL Compiler
PAL
Programmable Array Logic
PB
Parameter bank
PC
Program counter
PCB
Printed circuit board
PCI
Peripheral component interconnect
PD
Pattern detect
PE
Processing element
PL
Programmable logic
PLB
Programmable logic block
PLD
Programmable logic device
PLL
Phase locked loop
PPT
Programmable power technology
PS
Processing system
QAM
Quadrature amplitude modulation
QR-RLS
QR recursive least squares
RAM
Random access memory
RAN
Radio access network
RCLA
Block CLA with inter-block ripple
RCM
Reduced coefficient multiplier
RF
Register file
RISC
Reduced instruction set computer
RLS
Recursive least squares
RNS
Residue number representations
ROM
Read-only memory
RT
Radiation tolerant
RTL
Register transfer level
RVC
Reconfigurable video coding
SBNR
Signed binary number representation
SCU
Snoop control unit
SD
Signed digits
SDF
Synchronous dataflow
SDK
Software development kit
SDNR
Signed digit number representation
SDP
Simple dual-port
SERDES
Serializer/deserializer
SEU
Single event upset
SFG
Signal flow graph
SGR
Squared Givens rotation
SIMD
Single instruction, multiple data
SISD
Single instruction, single data
SMP
Shared-memory multi-processors
SNR
Signal-to-noise ratio
SoC
System-on-chip
SOCMINT
Social media intelligence
SoPC
System on programmable chip
SPI
Serial peripheral interface
SQL
Structured query language
SR-DFG
Single-rate dataflow graph
SRAM
Static random access memory
SRL
Shift register lookup table
SSD
Shifted signed digits
SVM
Support vector machine
SW
Search window
TCP
Transmission Control Protocol
TFLOPS
Tera floating-point operations per second
TOA
Time of arrival
TR
Throughput rate
TTL
Transistor-transistor logic
UART
Universal asynchronous receiver/transmitter
ULD
Ultra-low density
UML
Unified modeling language
VHDL
VHSIC hardware description language
VHSIC
Very high-speed integrated circuit
VLIW
Very long instruction word
VLSI
Very large scale integration
WBC
White box component
WDF
Wave digital filter
Electronics continues to make an impact in the twenty-first century and has given birth to the computer industry, mobile telephony and personal digital entertainment and services industries, to name but a few. These markets have been driven by developments in silicon technology as described by Moore’s law (Moore 1965), which is represented pictorially in Figure 1.1. This has seen the number of transistors double every 18 months. Moreover, not only has the number of transistors doubled at this rate, but also the costs have decreased, thereby reducing the cost per transistor at every technology advance.
Figure 1.1 Moore’s law
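As a rough illustration of what this doubling rate implies, the short C sketch below projects transistor counts under a strict 18-month doubling assumption; the 1971 starting value (roughly that of an early microprocessor) and the exact doubling period are illustrative assumptions rather than figures taken from this chapter.

/* Illustrative projection only: assumes a strict doubling every 18 months
 * from an assumed 1971 count of ~2300 transistors (an early microprocessor).
 * Link with the math library (-lm). */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double n0 = 2300.0;                     /* assumed starting count, 1971 */
    for (int year = 1971; year <= 2016; year += 5) {
        double elapsed = year - 1971;             /* years since the starting point */
        double n = n0 * pow(2.0, elapsed / 1.5);  /* one doubling per 18 months */
        printf("%d: ~%.3g transistors\n", year, n);
    }
    return 0;
}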
In the 1970s and 1980s, electronic systems were created by aggregating standard components such as microprocessors and memory chips with digital logic components, e.g. dedicated integrated circuits along with dedicated input/output (I/O) components on printed circuit boards (PCBs). As levels of integration grew, manufacturing working PCBs became more complex, largely due to greater component complexity in terms of the increase in the number of transistors and I/O pins. In addition, the development of multi-layer boards with as many as 20 separate layers increased the design complexity. Thus, the probability of incorrectly connecting components grew, particularly as the task of successfully designing and testing a working system before production came under ever greater time pressure.
The problem became more challenging as system descriptions evolved during product development. Pressure to create systems to meet evolving standards, or that could change after board construction due to system alterations or changes in the design specification, meant that the concept of having a “fully specified” design, in terms of physical system construction and development of processor software code, was becoming increasingly challenging. Whilst the use of programmable processors such as microcontrollers and microprocessors gave the designer some freedom to make alterations in order to correct or modify the system after production, this was limited. Changes to the interconnections of the components on the PCB were restricted to the I/O connectivity of the processors themselves. Thus the attraction of using programmable interconnection or “glue logic” offered considerable potential, and so the concept of field programmable logic (FPL), specifically field programmable gate array (FPGA) technology, was born.
From this unassuming start, though, FPGAs have grown into a powerful technology for implementing digital signal processing (DSP) systems. This emergence is due to the integration of increasingly complex computational units into the fabric along with increasing complexity and number of levels in memory. Coupled with a high level of programmable routing, this provides an impressive heterogeneous platform for improved levels of computing. For the first time ever, we have seen evolutions in heterogeneous FPGA-based platforms from Microsoft, Intel and IBM. FPGA technology has had an increasing impact on the creation of DSP systems. Many FPGA-based solutions exist for wireless base station designs, image processing and radar systems; these are, of course, the major focus of this text.
Microsoft has developed acceleration of the web search engine Bing using FPGAs and shows improved ranking throughput in a production search infrastructure. IBM and Xilinx have worked closely together to show that they can accelerate the reading of data from web servers into databases by applying an accelerated Memcache2; this is a general-purpose distributed memory caching system used to speed up dynamic database-driven searches (Blott and Vissers 2014). Intel have developed a multicore die with Altera FPGAs, and their recent purchase of the company (Clark 2015) clearly indicates the emergence of FPGAs as a core component in heterogeneous computing with a clear target for data centers.
The FPGA concept emerged in 1985 with the XC2064™ FPGA family from Xilinx. At the same time, a company called Altera was also developing a programmable device, later to become the EP1200, which was the first high-density programmable logic device (PLD). Altera’s technology was manufactured using 3-μm complementary metal oxide semiconductor (CMOS) electrically programmable read-only memory (EPROM) technology and required ultraviolet light to erase the programming, whereas Xilinx’s technology was based on conventional static random access memory (SRAM) technology and required an EPROM to store the programming.
The co-founder of Xilinx, Ross Freeman, argued that with continuously improving silicon technology, transistors were going to become cheaper and cheaper and could be used to offer programmability. This approach allowed system design errors which had only been recognized at a late stage of development to be corrected. By using an FPGA to connect the system components, the interconnectivity of the components could be changed as required by simply reprogramming them. Whilst this approach introduced additional delays due to the programmable interconnect, it avoided a costly and time-consuming PCB redesign and considerably reduced the design risks.
At this stage, the FPGA market was populated by a number of vendors, including Xilinx, Altera, Actel, Lattice, Crosspoint, Prizm, Plessey, Toshiba, Motorola, Algotronix and IBM. However, the costs of developing technologies not based on conventional integrated circuit design processes and the need for programming tools saw the demise of many of these vendors and a reduction in the number of FPGA families. SRAM technology has now emerged as the dominant technology largely due to cost, as it does not require a specialist technology. The market is now dominated by Xilinx and Altera, and, more importantly, the FPGA has grown from a simple glue logic component to a complete system on programmable chip (SoPC) comprising on-board physical processors, soft processors, dedicated DSP hardware, memory and high-speed I/O.
The FPGA evolution was neatly described by Steve Trimberger in his FPL2007 plenary talk (see the summary in Table 1.1). The evolution of the FPGA can be divided into three eras. The age of invention was when FPGAs started to emerge and were being used as system components typically to provide programmable interconnect giving protection to design evolutions and variations. At this stage, design tools were primitive, but designers were quite happy to extract the best performance by dealing with lookup tables (LUTs) or single transistors.
Table 1.1 Three ages of FPGAs
Period          Age            Comments
1984–1991       Invention      Technology is limited; FPGAs are much smaller than the application problem size. Design automation is secondary; architecture efficiency is key.
1992–1999       Expansion      FPGA size approaches the problem size. Ease of design becomes critical.
2000–present    Accumulation   FPGAs are larger than the typical problem size. Logic capacity limited by I/O bandwidth.
As highlighted above, there was a rationalization of the technologies in the early 1990s, referred to by Trimberger as the great architectural shakedown. The age of expansion was when the FPGA started to approach the problem size and thus design complexity was key. This meant that it was no longer sufficient for FPGA vendors to just produce place and route tools and it became critical that hardware description languages (HDLs) and associated synthesis tools were created. The final evolution period was the period of accumulation when FPGAs started to incorporate processors and high-speed interconnection. Of course, this is very relevant now and is described in more detail in Chapter 5 where the recent FPGA offerings are reviewed.
This has meant that the FPGA market has grown from nothing in just over 20 years to become a key player in the IC industry, worth some $3.9 billion in 2014 and expected to be worth around $7.3 billion in 2022 (MarketsandMarkets 2016). It has been driven by the growth in the automotive sector, mobile devices in the consumer electronics sector and the number of data centers.
Whilst Moore’s law is presented here as being the cornerstone for driving FPGA evolution and indeed electronics, it also has been the driving force for computing. However, all is not well with computing’s reliance on silicon technology. Whilst the number of transistors continues to double, the scaling of clock speed has not continued at the same rate. This is due to the increase in power consumption, particularly the increase in static power. The issue of the heat dissipation capability of packaging means that computing platform providers such as Intel have limited their processor power to 30 W. This resulted in an adjustment in the prediction for clock rates between 2005 and 2011 (as illustrated in Figure 1.2) as clock rate is a key contributor to power consumption (ITRS 2005).
Figure 1.2 Change in ITRS scaling prediction for clock frequencies
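For context, the reason clock frequency figures so prominently in these power limits is the standard CMOS dynamic power relation, quoted here as a textbook expression rather than a formula introduced in this chapter:

\[ P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f_{\text{clk}} \]

where α is the switching activity, C the switched capacitance, V_DD the supply voltage and f_clk the clock frequency; static (leakage) power adds a further, largely frequency-independent term. Chapter 13 returns to these sources of power consumption in detail.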
In 2005, the International Technology Roadmap for Semiconductors (ITRS) predicted that a 100 GHz clock would be achieved in 2020, but this estimation had to be revised first in 2007 and then again in 2011. This can be seen in current technology: a clock rate of some 30 GHz was expected by 2015 based on the original forecast, but speeds have in fact been restricted to 3–4 GHz. This has meant that clock performance has effectively stalled since 2005 and has generated interest among major computing companies in exploring different architectures that employ FPGA technology (Putnam et al. 2014; Blott and Vissers 2014).
On many occasions, the growth indicated by Moore’s law has led people to argue that transistors are essentially free and therefore can be exploited, as in the case of programmable hardware, to provide additional flexibility. This could be backed up by the observation that the cost of a transistor has dropped from one-tenth of a cent in the 1980s to one-thousandth of a cent in the 2000s. Thus we have seen the introduction of hardware programmability into electronics in the form of FPGAs.
In order to make a single transistor programmable in an SRAM technology, the programmability is controlled by storing a “1” or a “0” on the gate of the transistor, thereby making it conduct or not. This value is held in an SRAM cell which, if it requires six transistors, means that seven transistors are needed to achieve one programmable equivalent in an FPGA. The reality is that in an overall FPGA implementation the penalty is nowhere near as harsh as this, but it has to be taken into consideration in terms of ultimate system cost.
It is the ability to program the FPGA hardware after fabrication that is the main appeal of the technology; this provides a new level of reassurance in an increasingly competitive market where “right first time” system construction is becoming more difficult to achieve. It would appear that that assessment was vindicated in the late 1990s and early 2000s: when there was a major market downturn, the FPGA market remained fairly constant when other microelectronic technologies were suffering. Of course, the importance of programmability has already been demonstrated by the microprocessor, but this represented a new change in how programmability was performed.
The argument developed in the previous section presents a clear advantage of FPGA technology in overcoming PCB design errors and manufacturing faults. Whilst this might have been true in the early days of FPGA technology, evolution in silicon technology has moved the FPGA from being a programmable interconnection technology to being a system component. If the microprocessor or microcontroller is viewed as a programmable system component, then current FPGA devices must also be viewed in this vein, giving us a different perspective on system implementation.
In electronic system design, the main attraction of the microprocessor is that it considerably lessens the risk of system development. As the hardware is fixed, all of the design effort can be concentrated on developing the code. This situation has been complemented by the development of efficient software compilers which have largely removed the need for the designer to create assembly language; to some extent, this can even absolve the designer from having a detailed knowledge of the microprocessor architecture (although many practitioners would argue that this is essential to produce good code). This concept has grown in popularity, and embedded microprocessor courses are now essential parts of any electrical/electronic or computer engineering degree course.
A lot of this process has been down to the software developer’s ability to exploit an underlying processor architecture, the von Neumann architecture. However, this advantage has also been the limiting factor in its application to the topic of this text, namely DSP. In the von Neumann architecture, operations are processed sequentially, which allows relatively straightforward interpretation of the hardware for programming purposes; however, this severely limits the performance in DSP applications which exhibit high levels of parallelism and have operations that are highly data-independent. This cries out for parallel realization, and whilst DSP microprocessors go some way toward addressing this situation by providing concurrency in the form of parallel hardware and software “pipelining,” there is still the concept of one architecture suiting all sizes of the DSP problem.
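To make this concrete, consider the kind of inner loop that dominates many DSP workloads. The short C fragment below is a hypothetical sketch, not an example from the book: the multiplications in the loop are mutually independent and the accumulation can be restructured as an adder tree, so a custom FPGA architecture can compute them in parallel and pipeline the result, whereas a von Neumann processor steps through one multiply-accumulate per iteration.

/* Hypothetical FIR-style inner loop: each product h[i]*x[i] is independent
 * of the others; only the final summation couples them, and in hardware that
 * summation can be built as an adder tree. */
#define N_TAPS 8

float fir_sample(const float h[N_TAPS], const float x[N_TAPS])
{
    float y = 0.0f;
    for (int i = 0; i < N_TAPS; i++) {
        y += h[i] * x[i];   /* candidate for unrolling and pipelining on an FPGA */
    }
    return y;
}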
This limitation is overcome in FPGAs as they allow what can be considered to be a second level of programmability, namely programming of the underlying processor architecture. By creating an architecture that best meets the algorithmic requirements, high levels of performance in terms of area, speed and power can be achieved. This concept is not new as the idea of deriving a system architecture to suit algorithmic requirements has been the cornerstone of application-specific integrated circuit (ASIC) implementations. In high volumes, ASIC implementations have resulted in the most cost-effective, fastest and lowest-energy solutions. However, increasing mask costs and the impact of “right first time” system realization have made the FPGA a much more attractive alternative.
In this sense, FPGAs capture the performance aspects offered by ASIC implementation, but with the advantage of programmability usually associated with programmable processors. Thus, FPGA solutions have emerged which currently offer several hundreds of giga operations per second (GOPS) on a single FPGA for some DSP applications, which is at least an order of magnitude better performance than microprocessors.
In many texts, Moore’s law is used to highlight the evolution of silicon technology, but another interesting viewpoint particularly relevant for FPGA technology is Makimoto’s wave, which was first published in the January 1991 edition of Electronics Weekly. It is based on an observation by Tsugio Makimoto who noted that technology has shifted between standardization and customization. In the 1960s, 7400 TTL series logic chips were used to create applications; and then in the early 1970s, the custom large-scale integration era emerged where chips were created (or customized) for specific applications such as the calculator. The chips were now increasing in their levels of integration and so the term “medium-scale integration” (MSI) was born. The evolution of the microprocessor in the 1970s saw the swing back towards standardization where one “standard” chip was used for a wide range of applications.
The 1980s then saw the birth of ASICs, where designers could overcome the fact that the sequential microprocessor posed severe limitations in DSP applications where higher levels of computation were needed. DSP processors such as the TMS32010 also emerged; these differed from conventional processors as they were based on the Harvard architecture, which has separate program and data memories and separate buses. Even with DSP processors, ASICs offered considerable potential in terms of processing power and, more importantly, power consumption. The development of the FPGA from a “glue component” that allowed other components to be connected together to form a system into a component, or even a system itself, led to its increased popularity.
The concept of coupling microprocessors with FPGAs in heterogeneous platforms was very attractive as this represented a completely programmable platform, with microprocessors to implement the control-dominated aspects of DSP systems and FPGAs to implement the data-dominated aspects. This concept formed the basis of FPGA-based custom computing machines (FCCMs), which in turn formed the basis for “configurable” or reconfigurable computing (Villasenor and Mangione-Smith 1997). In these systems, users could not only implement computationally complex algorithms in hardware, but also use the programmability aspect of the hardware to change the system functionality, allowing the development of “virtual hardware” where hardware could “virtually” implement systems that are an order of magnitude larger (Brebner 1997).
We would argue that there have been two programmability eras. The first occurred with the emergence of the microprocessor in the 1970s, where engineers could develop programmable solutions based on this fixed hardware. The major challenge at this time was the software environments; developers worked with assembly language, and even when compilers and assemblers emerged for C, best performance was achieved by hand-coding. Libraries started to appear which provided basic common I/O functions, thereby allowing designers to concentrate on the application. These functions are now readily available as core components in commercial compilers and assemblers. The need for high-level languages grew, and now most programming is carried out in high-level programming languages such as C and Java, with an increased use of even higher-level environments such as the unified modeling language (UML).
The second era of programmability was ushered in by FPGAs. Makimoto indicates that field programmability is standardized in manufacture and customized in application. This can be thought of as hardware programmability, with the first wave representing programmability in the software domain where the hardware remains fixed. This is a key challenge, as most computer programming tools work on the fixed hardware platform principle, allowing optimizations to be created as there is clear direction on how to improve performance from an algorithmic representation. With FPGAs, the user is given full freedom to define the architecture which best suits the application. However, this presents a problem in that each solution must be handcrafted, and every hardware designer knows the issues in designing and verifying hardware designs!
Some of the trends in the two eras have similarities. In the early days, schematic capture was used to design early circuits, which was synonymous with assembly-level programming. Hardware description languages such as VHSIC Hardware Description Language (VHDL) and Verilog then started to emerge that could be used to produce a higher level of abstraction, with the current aim to have C-based tools such as SystemC and Catapult® from Mentor Graphics as a single software-based programming environment (Very High Speed Integrated Circuit (VHSIC) was a US Department of Defense funded program in the late 1970s and early 1980s with the aim of producing the next generation of integrated circuits). Initially, as with software programming languages, there was mistrust in the quality of the resulting code produced by these approaches.
The establishment of improved, cost-effective synthesis tools is equivalent to the evolution of efficient software compilers for high-level programming languages, and the evolution of library functions allowed a high degree of confidence to be subsequently established; the use of HDLs is now commonplace for FPGA implementation. Indeed, the emergence of intellectual property (IP) cores mirrored the evolution of libraries such as I/O programming functions for software flows; they allowed common functions to be reused as developers trusted the quality of the resulting implementation produced by such libraries, particularly as pressures to produce more code within the same time-span grew. The early IP cores emerged from basic function libraries into complex signal processing and communications functions such as those available from the FPGA vendors and the various web-based IP repositories.
In the early days, FPGAs were seen as glue logic chips used to plug components together to form complex systems. FPGAs then increasingly came to be seen as complete systems in themselves, as illustrated in Table 1.1.
