Design for Embedded Image Processing on FPGAs

Bridge the gap between software and hardware with this foundational design reference.

Field-programmable gate arrays (FPGAs) are integrated circuits that can be configured after manufacture. Circuits of this kind play an integral role in processing images, with FPGAs increasingly embedded in digital cameras and other devices that produce visual data for subsequent processing and compression. These uses of FPGAs require design processes that mediate smoothly between the hardware and the processing algorithm. Design for Embedded Image Processing on FPGAs provides a comprehensive overview of these processes and their applications in embedded image processing. Beginning with an overview of image processing and its core principles, this book discusses specific design and computation techniques, with a smooth progression from the foundations of the field to its advanced principles. Readers of the second edition of Design for Embedded Image Processing on FPGAs will also find:

* Detailed discussion of image processing techniques including point operations, histogram operations, linear transformations, and more
* New chapters covering Deep Learning algorithms and Image and Video Coding
* Example applications throughout to ground principles and demonstrate techniques

Design for Embedded Image Processing on FPGAs is ideal for engineers and academics working in the field of Image Processing, as well as graduate students studying Embedded Systems Engineering, Image Processing, Digital Design, and related fields.
Page count: 1292
Year of publication: 2023
Second Edition
Donald G. Bailey
Massey University, Palmerston North, New Zealand
This second edition first published 2024
© 2024 John Wiley & Sons, Ltd
Edition History
John Wiley & Sons (Asia) Pte Ltd (1e, 2011)
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Donald G. Bailey to be identified as the author of this work has been asserted in accordance with law.
Registered Office(s)
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Trademarks
Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Bailey, Donald G. (Donald Graeme), 1962‐ author.
Title: Design for embedded image processing on FPGAs / Donald G. Bailey.
Description: Second edition. | Hoboken, NJ : Wiley, 2024. | Includes index.
Identifiers: LCCN 2023011469 (print) | LCCN 2023011470 (ebook) | ISBN 9781119819790 (cloth) | ISBN 9781119819806 (adobe pdf) | ISBN 9781119819813 (epub)
Subjects: LCSH: Embedded computer systems. | Field programmable gate arrays.
Classification: LCC TK7895.E42 B3264 2024 (print) | LCC TK7895.E42 (ebook) | DDC 621.39/5–dc23/eng/20230321
LC record available at https://lccn.loc.gov/2023011469
LC ebook record available at https://lccn.loc.gov/2023011470
Cover Design: Wiley
Cover Image: © bestfoto77/Shutterstock
Image processing, and in particular embedded image processing, faces many challenges: increasing resolution, increasing frame rates, and the need to operate at low power. These pose significant challenges for implementation on conventional software-based platforms. This leads naturally to considering field-programmable gate arrays (FPGAs) as an implementation platform for embedded imaging applications. Many image processing operations are inherently parallel, and FPGAs provide programmable hardware, which is also inherently parallel. Therefore, it should be as simple as mapping one onto the other, right? Well, yes …and no.
Image processing is traditionally thought of as a software domain task, whereas FPGA-based design is firmly in the hardware domain. There are a lot of tricks and techniques required to create an efficient design. Perhaps the biggest hurdle to an efficient implementation is the need for a hardware mindset. To bridge the gap between software and hardware, it is necessary to think of algorithms not on their own but more in terms of their underlying computational architecture. Implementing an image processing algorithm (or indeed any algorithm) on an FPGA therefore consists of determining the underlying architecture of an algorithm, mapping that architecture onto the resources available within an FPGA, and finally mapping the algorithm onto the hardware architecture. While the mechanics of this process are mostly automated by high-level synthesis tools, the underlying design is not. The tools can take a low-quality design only so far; it is still important to keep in mind the hardware that is being implied by the code and to design the algorithm for the underlying hardware.
Unfortunately, there is limited material available to help those new to the area to get started. While there are many research papers published in conference proceedings and journals, there are only a few that focus specifically on how to map image processing algorithms onto FPGAs. The research papers found in the literature can be classified into several broad groups.
The first focuses on the FPGA architecture itself. Most of these provide an analysis of a range of techniques relating to the structure and granularity of logic blocks, the routing networks, and embedded memories. As well as the FPGA structure, a wide range of topics are covered, including underlying technology, power issues, the effects of process variability, and dynamic reconfigurability. Many of these papers are purely proposals, or relate to prototype FPGAs rather than commercially available chips. While they provide insights as to some of the features which might be available in the next generation of devices, most of the topics within this group are at too low a level.
A second group of papers investigates the topic of reconfigurable computing. Here, the focus is on how an FPGA can be used to accelerate some computationally intensive task or range of tasks. While image processing is one such task considered, most of the research relates more to high‐performance computing rather than low‐power embedded systems. Topics within this group include hardware and software partitioning, hardware and software co‐design, dynamic reconfigurability, communications between an FPGA and central processing unit (CPU), comparisons between the performance of FPGAs, graphics processing units (GPUs) and CPUs, and the design of operating systems and specific platforms for both reconfigurable computing applications and research. Important principles and techniques can be gleaned from many of these papers even though this may not be their primary focus.
The next group of papers considers tools for programming FPGAs and applications, with a focus on improving the productivity of the development process. A wide range of hardware description languages have been proposed, with many modelled after software languages such as C, Java, and even Prolog. Many of these are developed as research tools, with very few making it out of the laboratory to commercial availability. There has also been considerable research on compilation techniques for mapping standard software languages to hardware (high‐level synthesis). Techniques such as loop unrolling, strip mining, and pipelining to produce parallel hardware are important principles that can result in more efficient hardware designs.
The final group of papers focuses on a range of applications, including image processing and the implementation of both image processing operations and systems. Unfortunately, as a result of page limits, many of these papers give the results of the implementation of various systems but present relatively few design details. This is especially so in the case of many papers that describe deep learning systems. Often the final product is described without describing many of the reasons or decisions that led to that design. Many of these designs cannot be recreated without acquiring the specific platform and tools that were used or inferring a lot of the missing details. While some of these details may appear obvious in hindsight, without this knowledge, many are far from obvious just from reading the papers. The better papers in this group tend to have a tighter focus, considering the implementation of a single image processing operation.
So, while there may be a reasonable amount of material available, it is quite diffuse. In many cases, it is necessary to know exactly what you are looking for, or just be lucky to find it. The intention of this book, therefore, is to bring together much of this diverse research (on both FPGA design and image processing) and present it in a systematic way as a reference or guide.
This book is written primarily for those who are familiar with the basics of image processing and want to consider implementing image processing using FPGAs. Perhaps the biggest hurdle is switching from a software mindset to a hardware way of thinking. When we program in software, a good compiler can map the algorithm in the programming language onto the underlying computer architecture relatively efficiently. When programming hardware though, it is not simply a matter of porting the software onto hardware. The underlying hardware architecture needs to be designed as well. In particular, programming hardware usually requires transforming the algorithm into an appropriate parallel architecture, often with significant changes to the algorithm itself. This requires significant design, rather than just decomposition and mapping of the dataflow (as is accomplished by a good high‐level synthesis tool). This book addresses this issue by not only providing algorithms for image processing operations but also discusses both the design process and the underlying architectures that can be used to implement the algorithms efficiently.
This book would also be useful to those with a hardware background, who are familiar with programming and applying FPGAs to other problems, and are considering image processing applications. While many of the techniques are relevant and applicable to a wide range of application areas, most of the focus and examples are taken from image processing. Sufficient detail is given to make many of the algorithms and their implementation clear. However, learning image processing is more than just collecting a set of algorithms, and there are any number of excellent image processing textbooks that provide these.
It is the domain of embedded image processing where FPGAs come into their own. An efficient, low‐power design requires that the techniques of both the hardware engineer and the software engineer be integrated tightly within the final solution.
Although many of the underlying design principles have not changed, the environment has changed quite significantly since the first edition. In general, there has been an increase in the requirements within applications. Resolutions are increasing, with high-definition television (HDTV) becoming ubiquitous, and growing demand for 4K and 8K resolutions (the 8K format has 33.2 Mpixels). Sensor resolutions are also increasing steadily, with sensors having more than 100 Mpixels becoming available. Frame rates have also been increasing, with up to 120 frames per second being part of the standard for ultra-high-definition television (UHDTV). The dynamic range is also increasing, from the commonly used 8 bits per pixel to 12–16 bits. All of these factors lead to more data to be processed, at a faster rate.
Against this, there has been an increasing awareness of power and sustainability issues. As a low‐power computing platform, FPGAs are well placed to address the power concerns in many applications.
The capabilities of FPGAs have improved significantly as technology improvements enable more to be packed onto them. Not only has there been an increase in the amount of programmable logic and on‐chip memory blocks, but FPGAs are becoming more heterogeneous. Many FPGAs now incorporate significant hardened logic blocks, including moderately powerful reduced instruction set computing (RISC) processors, external memory interfacing, and a wide range of communication interfaces. Digital signal processing (DSP) blocks are also improving, with the move towards supporting floating‐point in high‐end devices. Technology improvements have seen significant reductions in the power required.
Even the FPGA market has changed, with the takeover of both Altera and Xilinx by Intel and AMD, respectively. This is an indication that FPGAs are seen as a serious contender for high‐performance computing and acceleration. The competition has not stood still, with both CPUs and GPUs increasing in capability. In particular, a new generation of low‐power GPUs has become available that are more viable for embedded image processing.
High‐level synthesis tools are becoming more mature and address many of the development time issues associated with conventional register transfer level design. The ability to compile to both software and hardware enables more complex algorithms to be explored, with faster debugging. They also allow faster exploration of the design space, enabling efficient designs to be developed more readily. However, the use of high‐level synthesis does not eliminate the need for careful algorithm design.
While the use of FPGAs for image processing has not become mainstream, there has been a lot of activity in this space as the capabilities of FPGAs have improved. The research literature on programming and applying FPGAs in the context of image processing has grown significantly. However, it is still quite diffuse, with most papers focusing on one specific aspect. As researchers have looked at more complex image processing operations, the descriptions of the implementation have become higher level, requiring a lot of reading between the lines, and additional design work to be able to replicate a design.
One significant area that has become mainstream in image processing is the use of deep learning models. Deep learning was not around when the previous edition was written and only started becoming successful in image processing tasks in the early 2010s. Their success has made them a driving application, not only for FPGAs and FPGA architecture but also within computing in general. However, deep learning models pose a huge computational demand on processing, especially for training, but also for deployment. In an embedded vision context, this has made FPGAs a target platform for their deployment. Deep learning is a big topic on its own, so this book is unable to do much more than scratch the surface and concentrate on some of the issues associated with FPGA‐based implementation.
This book aims to provide a comprehensive overview of algorithms and techniques for implementing image processing algorithms on FPGAs, particularly for low‐ and intermediate‐level vision. However, as with design in any field, there is more than one way of achieving a particular task. Much of the emphasis has been placed on stream‐based approaches to implementing image processing, as these can efficiently exploit parallelism when they can be used. This emphasis reflects my background and experience in the area and is not intended to be the last word on the topic.
A broad overview of image processing is presented in Chapter 1, with a brief historical context. Many of the basic image processing terms are defined, and the different stages of an image processing algorithm are identified. The problem of real‐time embedded image processing is introduced, and the limitations of conventional serial processors for tackling this problem are identified. High‐speed image processing must exploit the parallelism inherent in the processing of images; the different types of parallelism are identified and explained.
FPGAs combine the advantages of both hardware and software systems, by providing reprogrammable (hence flexible) hardware. Chapter 2 provides an introduction to FPGA technology. While some of this will be more detailed than is necessary to implement algorithms, a basic knowledge of the building blocks and underlying architecture is important to developing resource efficient solutions. The synthesis process for building hardware on FPGAs is defined, with particular emphasis on the design flow for implementing algorithms. Traditional hardware description languages are compared with high‐level synthesis, with the benefits and limitations of each outlined in the context of image processing.
The process of designing and implementing an image processing application on an FPGA is described in detail in Chapter 3. Particular emphasis is given to the differences between designing for an FPGA‐based implementation and a standard software implementation. The critical initial step is to clearly define the image processing problem that is being tackled. This must be in sufficient detail to provide a specification that may be used to evaluate the solution. The procedure for developing the image processing algorithm is described in detail, outlining the common stages within many image processing algorithms. The resulting algorithm must then be used to define the system and computational architectures. The mapping from an algorithm is more than simply porting the algorithm to a hardware description language. It is necessary to transform the algorithm to make efficient use of the resources available on the FPGA. The final stage is to implement the algorithm by mapping it onto the computational architecture. Several checklists provide a guide and hints for testing and debugging an algorithm on an FPGA.
Four types of constraints on the mapping process are limited processing time, limited access to data, limited system resources, and limited system power. Chapter 4 describes several techniques for overcoming or alleviating these constraints. Timing explores low‐level pipelining, process synchronisation, and working with multiple clock domains. A range of memory and caching architectures are presented for alleviating memory bandwidth. Resource sharing and associated arbitration issues are discussed, along with reconfigurability. The chapter finishes with a section introducing commonly used performance metrics in terms of both system and application performance.
Chapter 5 focuses on the computational aspects of image processing designs. These help to bridge the gap between a software and hardware implementation. Different number representation and number systems are described. Techniques for the computation of elementary functions are discussed, with a particular focus on those that are hardware friendly. Many of these could be considered the hardware equivalent of software libraries for efficiently implementing common functions. Possible FPGA implementations of a range of data structures commonly found in computer vision algorithms are presented.
Any embedded application must interface with the real world. A range of common peripherals is described in Chapter 6, with suggestions on how they may be interfaced to an FPGA. Particular attention is given to interfacing cameras and video output devices. Interfacing with other devices is discussed, including serial communications, off‐chip memory, and serial processors.
The next section of this book describes the implementation of many common image processing operations. Some of the design decisions and alternative ways of mapping the operations onto FPGAs are considered. While reasonably comprehensive, particularly for low‐level image‐to‐image transformations, it is impossible to cover every possible design. The examples discussed are intended to provide the foundation for many other related operations.
Chapter 7 considers point operations, where the output depends only on the corresponding input pixel in the input image(s). Both direct computation and lookup table approaches are described. With multiple input images, techniques such as image averaging and background modelling are discussed in detail. The final sections in this chapter consider the processing of colour and hyperspectral images. Colour processing includes colour space conversion, colour balancing, and colour segmentation.
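As an illustrative sketch (not code from the book), a point operation on 8-bit pixels can be realised either by direct computation or by precomputing a 256-entry lookup table; the example below uses intensity inversion as the assumed operation:

```python
def build_lut(f):
    """Precompute an 8-bit point operation as a 256-entry lookup table."""
    return [f(v) for v in range(256)]

def apply_point_op(image, lut):
    """Apply a point operation pixel by pixel via table lookup.

    Each output pixel depends only on the corresponding input pixel,
    so on an FPGA the table maps naturally onto a small ROM or RAM
    indexed by the streamed pixel value.
    """
    return [[lut[p] for p in row] for row in image]

# Example point operation: intensity inversion, Q = 255 - I
invert = build_lut(lambda v: 255 - v)
```

The trade-off mirrors the one discussed in the chapter: direct computation spends logic on arithmetic every cycle, whereas the lookup table spends a memory block to get a fixed one-cycle evaluation regardless of how complex the function is.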
The implementation of histograms and histogram‐based processing are discussed in Chapter 8. The techniques of accumulating a histogram, and then extracting data from the histogram, are described in some detail. Particular tasks are histogram equalisation, threshold selection, and using histograms for image matching. The concepts of standard 1‐D histograms are extended to multi‐dimensional histograms. The use of clustering for colour segmentation and classification is discussed in some detail. The chapter concludes with the use of features extracted from multi‐dimensional histograms for texture analysis.
Chapter 9 considers a wide range of local filters, both linear and nonlinear. Particular emphasis is given to caching techniques for a stream‐based implementation and methods for efficiently handling the processing around the image borders. Rank filters are described, and a selection of associated sorting network architectures reviewed. Morphological filters are another important class of filters. State machine implementations of morphological filtering provide an alternative to the classic filter implementation. Separability and both serial and parallel decomposition techniques are described that enable more efficient implementations.
Image warping and related techniques are covered in Chapter 10. The forward and reverse mapping approaches to geometric transformation are compared in some detail, with particular emphasis on techniques for stream processing implementations. Interpolation is frequently associated with geometric transformation. Hardware‐based algorithms for bilinear, bicubic, and spline‐based interpolation are described. Related techniques of image registration are also described at the end of this chapter, including a discussion of feature point detection, description, and matching.
Chapter 11 introduces linear transforms, with a particular focus on the fast Fourier transform (FFT), the discrete cosine transform (DCT), and the wavelet transform. Both parallel and pipelined implementations of the FFT and DCT are described. Filtering and inverse filtering in the frequency domain are discussed in some detail. Lifting‐based filtering is developed for the wavelet transform. This can reduce the logic requirements by up to a factor of 4 over a direct finite impulse response implementation.
Image coding is important for image storage or transmission. Chapter 12 discusses the stages within image and video coding and outlines some of the techniques that can be used at each stage. Several of the standards for both still image and video coding are outlined, with an overview of the compression techniques used.
A selection of intermediate‐level operations relating to region detection and labelling is presented in Chapter 13. Standard software algorithms for chain coding and connected component labelling are adapted to give efficient streamed implementation. These can significantly reduce both the latency and memory requirements of an application. Hardware implementations of the distance transform, the watershed transform, and the Hough transform are also presented, discussing some of the key design decisions for an efficient implementation.
Machine learning techniques are commonly used within computer vision. Chapter 14 introduces the key techniques for regression and classification, with a particular focus on FPGA implementation. Deep learning techniques are increasingly being used in many computer vision applications. A range of deep network architectures is introduced, and some of the issues for realising these on FPGAs are discussed.
Finally, Chapter 15 presents a selection of case studies, showing how the material and techniques described in the previous chapters can be integrated within a complete application. These applications briefly show the design steps and illustrate the mapping process at the whole algorithm level rather than purely at the operation level. Many gains can be made by combining operations together within a compatible overall architecture. The applications described are coloured region tracking for a gesture‐based user interface, calibrating and correcting barrel distortion in lenses, development of a foveal image sensor inspired by some of the attributes of the human visual system, a machine vision system for real‐time produce grading, stereo imaging for depth estimation, and face detection.
The contents of this book are independent of any particular FPGA or FPGA vendor, or any particular hardware description language. The topic is already sufficiently specialised without narrowing the audience further! As a result, many of the functions and operations are represented in block schematic form. This enables a language-independent representation and places emphasis on a particular hardware implementation of the algorithm in a way that is portable. The basic elements of these schematics are illustrated in Figure P.1. I is generally used as the input of an image processing operation, with the output image represented by Q.
With some mathematical operations, such as subtraction and comparison, the order of the operands is important. In such cases, the first operand is indicated with a blob rather than an arrow, as shown on the bottom in Figure P.1.
Consider a recursive filter operating on streamed data:

Q_n = Q_{n-1} + k(I_n - Q_{n-1}),   (P.1)

where the subscript n in this instance refers to the nth pixel in the streamed image. At a high level, this can be considered as an image processing operation and represented by a single block, as shown in the top-left of Figure P.1. The low-level implementation is given in the middle-left panel. The input and output, I and Q, are represented by registers (dark blocks, with optional register names in white); the subscripts have been dropped because they are implicit with streamed operation. In some instances, additional control inputs may be shown, for example CE for clock enable and RST for reset. Constants are represented as mid-grey blocks, and other function blocks with light-grey background.
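To make the streamed behaviour concrete, the recursive filter can be modelled in software. This is a minimal Python sketch, assuming the first-order exponential-smoothing form Q_n = Q_{n-1} + k(I_n - Q_{n-1}); each loop iteration stands in for one clock cycle of the streamed implementation:

```python
def recursive_filter(pixels, k):
    """Software model of a first-order recursive filter on a pixel stream.

    One pixel enters per 'clock cycle' (register I) and one filtered
    pixel leaves (register Q), with Q updated as Q + k * (I - Q).
    """
    q = 0.0  # output register, assumed reset to zero
    out = []
    for i in pixels:          # one iteration per streamed pixel
        q = q + k * (i - q)   # the feedback path through register Q
        out.append(q)
    return out
```

In hardware, the single variable q corresponds to the Q register in the middle-left panel of Figure P.1: the feedback term requires the previous output to be held in a register, which is exactly what makes the operation recursive.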
Figure P.1 Conventions used in this book. Top‐left: representation of an image processing operation; middle‐left: a block schematic representation of the function given by Eq. (P.1); bottom‐left: representation of operators where the order of operands is important; right: symbols used for various blocks within block schematics.
When representing logic functions in equations, ∨ is used for logical OR, and ∧ for logical AND. This is to avoid confusion with addition and multiplication.
Donald G. Bailey
Massey University
Palmerston North, New Zealand
I would like to acknowledge all those who have helped me to get me where I currently am in my understanding of field‐programmable gate array (FPGA)‐based design. In particular, I would like to thank my research students (David Johnson, Kim Gribbon, Chris Johnston, Aaron Bishell, Andreas Buhler, Ni Ma, Anoop Ambikumar, Tariq Khan, and Michael Klaiber) who helped to shape my thinking and approach to FPGA development as we struggled together to work out efficient ways of implementing image processing algorithms. This book is as much a reflection of their work as it is of mine.
Most of our early work used Handel‐C and was tested on boards provided by Celoxica. I would like to acknowledge the support provided by Roger Gook and his team, first with Celoxica and later with Agility Design Solutions. Later work was on boards supplied by Terasic. I would like to acknowledge Sean Peng and his team for their on‐going support and encouragement.