Methods and Techniques in Deep Learning: Advancements in mmWave Radar Solutions introduces multiple state-of-the-art deep learning architectures for mmWave radar in a variety of advanced applications, providing a timely and authoritative overview of artificial intelligence (AI)-based processing for various mmWave radar applications. Focusing on practical deep learning techniques, this comprehensive volume explains the fundamentals of deep learning, reviews cutting-edge deep metric learning techniques, describes different typologies of reinforcement learning (RL) algorithms, highlights how domain adaptation (DA) can improve the performance of machine learning (ML) algorithms, and more. Throughout the book, readers are exposed to product-ready deep learning solutions while learning skills relevant to building any industrial-grade, sensor-based deep learning solution. A team of authors with more than 70 filed patents and 100 published papers on AI and sensor processing illustrates how deep learning is enabling a range of advanced industrial, consumer, and automotive applications of mmWave radars. In-depth chapters cover topics including multimodal deep learning approaches, the elemental blocks required to formulate Bayesian deep learning, and how geometric deep learning is used for processing point clouds.
In addition, the book:
* Discusses various advanced applications and how their respective challenges have been addressed using different deep learning architectures and algorithms
* Describes deep learning in the context of computer vision, natural language processing, sensor processing, and mmWave radar sensors
* Demonstrates how deep parametric learning reduces the number of trainable parameters and improves the data flow
* Presents several human-machine interface (HMI) applications such as gesture recognition, human activity classification, human localization and tracking, and in-cabin automotive occupancy sensing

Methods and Techniques in Deep Learning: Advancements in mmWave Radar Solutions is an invaluable resource for industry professionals, researchers, and graduate students working in systems engineering, signal processing, sensors, data science, and AI.
Page count: 506
Year of publication: 2022
Cover
Title Page
Copyright
Dedication
About the Authors
Preface
Acronyms
1 Introduction to Radar Processing and Deep Learning
1.1 Basics of Radar Systems
1.2 FMCW Signal Processing
1.3 Target Detection and Clustering
1.4 Target Tracking
1.5 Target Representation
1.6 Target Recognition
1.7 Training a Neural Network
1.8 Questions to the Reader
References
Note
2 Deep Metric Learning
2.1 Introduction
2.2 Pairwise Methods
2.3 End-to-End Learning
2.4 Proxy Methods
2.5 Advanced Methods
2.6 Application: Gesture Sensing
2.7 Questions to the Reader
References
3 Deep Parametric Learning
3.1 Introduction
3.2 Radar Parametric Neural Network
3.3 Multilevel Wavelet Decomposition Network
3.4 Application: Activity Classification
3.5 Conclusion
3.6 Questions to the Reader
References
4 Deep Reinforcement Learning
4.1 Useful Notation and Equations
4.2 Introduction
4.3 On-Policy Reinforcement Learning
4.4 Off-Policy Reinforcement Learning
4.5 Model-Based Reinforcement Learning
4.6 Model-Free Reinforcement Learning
4.7 Value-Based Reinforcement Learning
4.8 Policy-Based Reinforcement Learning
4.9 Online Reinforcement Learning
4.10 Offline Reinforcement Learning
4.11 Reinforcement Learning with Discrete Actions
4.12 Reinforcement Learning with Continuous Actions
4.13 Reinforcement Learning Algorithms for Radar Applications
4.14 Application: Tracker's Parameter Optimization
4.15 Conclusion
4.16 Questions to the Reader
References
Notes
5 Cross-Modal Learning
5.1 Introduction
5.2 Self-Supervised Multimodal Learning
5.3 Joint Embeddings Learning
5.4 Multimodal Input
5.5 Cross-Modal Learning
5.6 Application: People Counting
5.7 Conclusion
5.8 Questions to the Reader
References
6 Signal Processing with Deep Learning
6.1 Introduction
6.2 Algorithm Unrolling
6.3 Physics-Inspired Deep Learning
6.4 Processing-Specific Network Architectures
6.5 Deep Learning-Aided Signal Processing
6.6 Questions to the Reader
References
7 Domain Adaptation
7.1 Introduction
7.2 Transfer Learning and Domain Adaptation
7.3 Categories of Domain Adaptation
7.4 Domain Adaptation in Radar Processing
7.5 Summary
7.6 Questions to the Reader
References
8 Bayesian Deep Learning
8.1 Learning Theory
8.2 Bayesian Learning
8.3 Bayesian Approximations
8.4 Application: VRU Classification
8.5 Summary
8.6 Questions to the Reader
References
Notes
9 Geometric Deep Learning
9.1 Representation Learning in Graph Neural Network
9.2 Graph Representation Learning
9.3 Applications
9.4 Conclusion
9.5 Questions to the Reader
References
Notes
Index
End User License Agreement
Chapter 2
Table 2.1 Operating parameters of the used radar chipset BGT60TR13C.
Table 2.2 Accuracy and F1-scores.
Table 2.3 Clustering scores of the class clusters in the embedding space aft...
Chapter 3
Table 3.1 Samples per class.
Table 3.2 Model sizes of DSNet and RDCNet.
Table 3.3 Model sizes of 2D SincNet, 2D WCN, and 2D ConvNet.
Table 3.4 Accuracies (in %) and F1-scores (in %) for the evaluated approache...
Table 3.5 Accuracies (in %) and F1-scores (in %) under the presence of a sta...
Chapter 5
Table 5.1 Operating parameters.
Chapter 6
Table 6.1 Operating parameters.
Table 6.2 Comparison of the detection performance of the traditional pipelin...
Chapter 7
Table 7.1 Radar configuration parameters.
Table 7.2 Test accuracy (%) of MDD for FMCW data.
Table 7.3 Minimum and maximum accuracy (%) of MDD for different datasets.
Table 7.4 Average accuracy comparison (%) of the original MDD implementation...
Chapter 8
Table 8.1 Different parametric and nonparametric models with possible algori...
Table 8.2 Similarity indices (SSIM) for simulated micro-Doppler spectra of i...
Table 8.3 Quantitative analysis on the quality of the clustering over pretra...
Table 8.4 Quantitative analysis on the quality of the clustering over unseen...
Chapter 9
Table 9.1 Operating parameters.
Table 9.2 A comparison of the effect of attention at different feature abstr...
Table 9.3 Distribution of the six different target classes in the dataset.
Table 9.4 Comparison of the class-agnostic clustering methods for target loc...
Table 9.5 Comparison of the localization accuracy for the class-agnostic and...
Chapter 1
Figure 1.1 Block diagram of the continuous wave radar front-end and its rece...
Figure 1.2 Illustration of typical modern radar sensors with several identic...
Figure 1.3 Summary of FMCW signal processing pipeline including both pre- an...
Figure 1.4 Pictorial representation of all three transforms, i.e., Fourier, ...
Figure 1.5 A visual summary on state-of-the-art target detection algorithms ...
Figure 1.6 Graphical representation of predict and update stage for an UKF w...
Figure 1.7 Target classification using (a) Doppler signature image and (b) s...
Figure 1.8 Two receive antennas with the angle of arrival and the path len...
Figure 1.9 Illustration of a 3D radar point cloud where coordinate axes are
Figure 1.10 Example of a feedforward neural network with one hidden layer.
Figure 1.11 Illustration of various activation functions: (a) sigmoid functi...
Figure 1.12 Example of a CNN architecture.
Figure 1.13 Example of a LSTM cell.
Figure 1.14 Different types of RNN models: (a) one-to-one, (b) one-to-many, ...
Figure 1.15 Illustration of variational autoencoder architecture depicting: ...
Figure 1.16 Illustration of a vanilla GAN architecture outlining the princip...
Figure 1.17 Transformer network architecture. Source: adapted from [44].
Figure 1.18 Illustration of the gradient descent when (a) the learning rate ...
Figure 1.19 Illustration of underfitting and overfitting of a model.
Chapter 2
Figure 2.1 Taxonomy of the deep metric learning approaches.
Figure 2.2 Overview of metric learning losses between samples. (a) Contrasti...
Figure 2.3 Different types of margins with regards to angular distances and ...
Figure 2.4 Visualization of the explicit intra- and interclass optimization ...
Figure 2.5 Visualization of the feature space after training with the center...
Figure 2.6 Visualization of the direct optimization of Euclidean distances t...
Figure 2.7 Visualization of the normalization issue when considering an open...
Figure 2.8 Optimal 2D LAR loss label positions (circles) and multiplier assi...
Figure 2.9 Overview of performed gestures.
Figure 2.10 Exemplary set of spectrograms for each macrogesture in the data ...
Figure 2.11 Exemplary set of spectrograms for each microgesture in the data ...
Figure 2.12 Exemplary set of filtered spectrograms for each macrogesture in ...
Figure 2.13 Exemplary set of filtered spectrograms for each microgesture in ...
Figure 2.14 Architecture of the used VAE model.
Figure 2.15 Confusion matrix of the triplet model.
Figure 2.16 Confusion matrix of the TVAE model.
Figure 2.17 t-SNE plots of the embedded features resulting from the differen...
Chapter 3
Figure 3.1 Visualization of the same radar data frame in (a) raw time domain...
Figure 3.2 (a) Conventional processing pipeline, involving explicit preproce...
Figure 3.3 Exemplary 2D sinc filter in (a) time and (b) frequency domains.
Figure 3.4 Exemplary 2D Morlet wavelet in (a) time and (b) frequency domains...
Figure 3.5 Target tracking visualized. A person is approaching the radar and...
Figure 3.6 (a) Real part, (b) imaginary part, and (c) frequency response of ...
Figure 3.7 Comparison of (a) time domain ADC data, (b) RDI, and (c) correspo...
Figure 3.8 Illustration of (a) mWDN framework and (b) approximate wavelet di...
Figure 3.9 Proposed parametric CNN learning the filter parameters of 2D sinc...
Figure 3.10 The state-of-art CNN architectures: (a) DSNet with Doppler spect...
Figure 3.11 Confusion matrix of (a) RDCNet and (b) 2D WCN. The values are ro...
Figure 3.12 Validation accuracy over training epochs of five training runs o...
Figure 3.13 Cumulative gain of learned (a) 2D sinc filters, (b) 2D wavelets,...
Figure 3.14 Cumulative gain of learned (a) 2D SincNet, (b) 2D WCN, and (c) u...
Chapter 4
Figure 4.1 Taxonomy of the RL algorithms.
Figure 4.2 Taxonomy of the reinforcement learning algorithms.
Figure 4.3 Traditional and proposed processing pipeline for radar target tra...
Figure 4.4 (a) NIS is higher than 95% confidence score and (b) NIS is lower ...
Figure 4.5 Multivariate Gaussian representing the predicted track distributi...
Figure 4.6 Indoor recording environment example pictures, taken with one of ...
Figure 4.7 Respective network architectures of the actor network, lower tria...
Figure 4.8 Reward development on the test set over the steps of the Phase II...
Figure 4.9 Best-performing scene position predictions, pretrained ( ) vs. Ba...
Figure 4.10 Worst-performing scene position predictions, pretrained ( ) vs. ...
Figure 4.11 Comparison of (a) position prediction for pretrained ( ) vs. b...
Chapter 5
Figure 5.1 MMDL techniques.
Figure 5.2 Proposed self-supervised learning-based architecture.
Figure 5.3 Example of neural visualization of the trained network for differ...
Figure 5.4 Overall architecture of the proposed solution for joint embedding...
Figure 5.5 Multimodal compact bilinear block.
Figure 5.6 Overall architecture of proposed solution for VQA.
Figure 5.7 Overall architecture of the proposed solution for cross-modal lea...
Figure 5.8 (a) Training methodology involving knowledge distillation, (b) in...
Figure 5.9 Micro- and macro-Doppler component-based RAI processing pipeline....
Figure 5.10 (a) Macro-Doppler RAI and (b) micro-Doppler RAI.
Figure 5.11 (a, c) Two and three person scenario camera images and (b, d) CS...
Figure 5.12 (a) Autoencoder learning. (b) Classification layer learning.
Figure 5.13 Order: ground-truth heatmap, macro-Doppler RAI, reconstructed he...
Figure 5.14 (a) Unimodal learning confusion matrix. (b) Cross-modal learning...
Figure 5.15 Proposed architecture.
Figure 5.16 Confusion matrix without knowledge distillation (a) and with (b)...
Chapter 6
Figure 6.1 Visualization of an unrolled general algorithm as block diagram....
Figure 6.2 ISTA algorithm unrolled for iterations.
Figure 6.3 Learned ISTA algorithm for multiple iterations.
Figure 6.4 Sketch of a uniform linear array (ULA) with three receivers, show...
Figure 6.5 Sketch of a generative adversarial network.
Figure 6.6 Doppler spectrogram of trip-and-fall forward of an elderly person...
Figure 6.7 Applications of gesture recognition algorithms: (a) micro- and (b...
Figure 6.8 Traditional processing pipeline for radar target tracking.
Figure 6.9 Traditional signal processing chain for target detection.
Figure 6.10 Visualization of the OS-CFAR process, with the cell under test, ...
Figure 6.11 Visualization of core (circles), edge (crosses), and noise-point...
Figure 6.12 Deep learning processing chain for target detection.
Figure 6.13 Deep learning processing chain for target segmentation.
Figure 6.14 (a) Raw RDI image with four human targets, (b) processed RDI usi...
Chapter 7
Figure 7.1 Transfer learning and domain adaptation.
Figure 7.2 Categories of domain adaptation.
Figure 7.3 Elastic weight consolidation helps in finding a common low-error ...
Figure 7.4 Illustration of a GAN architecture for domain adaptation. The dis...
Figure 7.5 Adversarial DA architecture without the generator. The discrim...
Figure 7.6 Sketch for a reconstruction-based domain adaptation using an auto...
Figure 7.7 Autoencoder and classifier training method according to Rahman et...
Figure 7.8 Range and Doppler spectrogram for boxing while walking.
Figure 7.9 MDD adversarial network.
Chapter 8
Figure 8.1 A comparative illustration of (a) deterministic and (b) Bayesian ...
Figure 8.2 Model estimation plot on the effect of different noise observatio...
Figure 8.3 Illustration of the sampling approach.
Figure 8.4 Illustration of the variational inference approximation method.
Figure 8.5 The choice of the family in variational inference sets both the d...
Figure 8.6 Visual understanding on optimization process of the variational i...
Figure 8.7 Optimization process of the variational inference approach.
Figure 8.8 Illustration of difference between AE (deterministic) and VAE (pr...
Figure 8.9 Illustration of difference between AE (deterministic) and VAE (pr...
Figure 8.10 Dynamic VRU point target models for a pedestrian and a cyclist. ...
Figure 8.11 An overview of evaluated architectures and their relation to eac...
Figure 8.12 Comparison of (a) state-of-the-art algorithm pipeline for contin...
Figure 8.13 A visual illustration of target classification accuracy using t-...
Figure 8.14 (a) Visualization of the uncertainty estimates over different cl...
Figure 8.15 t-SNE plot over the latent embedding for unseen data using TVAE-...
Figure 8.16 (a) A visual understanding on the uncertainty estimates over uns...
Chapter 9
Figure 9.1 An illustration of geometric priors: the input signal (image ) i...
Figure 9.2 An illustration of scale separation, where we can approximate a f...
Figure 9.3 An illustration of feature engineering and different task learnin...
Figure 9.4 An illustration of efficient task-independent feature learning, i...
Figure 9.5 An illustration of convolution CNN where features of neighbors ar...
Figure 9.6 An illustration of attention CNN where features of neighbors are ...
Figure 9.7 An illustration of attention message-passing GNN where arbitrary ...
Figure 9.8 Radar setup.
Figure 9.9 Camera point cloud.
Figure 9.10 Range-Doppler image. Shaded bar shows the intensity of each bin....
Figure 9.11 Radar point cloud with shade showing velocity value in m/s.
Figure 9.12 Overview of used approach for skeleton detection. Given an input...
Figure 9.13 Preprocessing of raw radar data.
Figure 9.14 Network architecture: Radar point cloud in shape first passes ...
Figure 9.15 Left: Generating -dimensional edge features from two -dimens...
Figure 9.16 Confusion matrix.
Figure 9.17 Accuracy comparison among unimodal, cross learning, and cross le...
Figure 9.18 Overview of our method, which is composed of five modules, (i) t...
Figure 9.19 Visualization of the IoU population distribution for all target ...
Figure 9.20 Visualization of the different intermediate outputs of the propo...
Figure 9.21 Combined illustration of the performance of HARadNet. (i) shows ...
Cover
Table of Contents
Title Page
Copyright
Dedication
About the Authors
Preface
Acronyms
Begin Reading
Index
End User License Agreement
IEEE Press
445 Hoes Lane
Piscataway, NJ 08854
IEEE Press Editorial Board
Sarah Spurgeon, Editor in Chief
Jón Atli Benediktsson
Anjan Bose
Adam Drobot
Peter (Yong) Lian
Andreas Molisch
Saeid Nahavandi
Jeffrey Reed
Thomas Robertazzi
Diomidis Spinellis
Ahmet Murat Tekalp
Avik Santra
Souvik Hazra
Lorenzo Servadei
Thomas Stadelmayer
Michael Stephan
Anand Dubey
Infineon Technologies, Munich, Germany
Copyright © 2023 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data
Names: Santra, Avik, author.
Title: Methods and techniques in deep learning : advancements in mmwave radar solutions / Avik Santra, Souvik Hazra, Lorenzo Servadei, Thomas Stadelmayer, Michael Stephan, Anand Dubey, Infineon Technologies, Munich, Germany.
Description: Hoboken, New Jersey : John Wiley & Sons, Inc., [2023] | Includes bibliographical references and index.
Identifiers: LCCN 2022036520 (print) | LCCN 2022036521 (ebook) | ISBN 9781119910657 (hardback) | ISBN 9781119910664 (adobe pdf) | ISBN 9781119910671 (epub)
Subjects: LCSH: Millimeter wave radar–Data processing. | Radar targets–Identification–Data processing. | Radar receiving apparatus–Data processing. | Deep learning (Machine learning)
Classification: LCC TK6592.M55 S26 2023 (print) | LCC TK6592.M55 (ebook) | DDC 621.38480285–dc23/eng/20220929
LC record available at https://lccn.loc.gov/2022036520
LC ebook record available at https://lccn.loc.gov/2022036521
Cover Design: Wiley
Cover Image: © Yurchanka Siarhei/Shutterstock
This book is dedicated to our respective families.
Avik Santra
Avik Santra received his B.S. degree in electronics and communications engineering from West Bengal University of Technology. He then received his M.S. degree in signal processing with first-class distinction from the Indian Institute of Science and a Ph.D. degree in electrical, electronics, and informatics from Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany. He is currently heading the advanced AI team responsible for developing signal processing and machine learning algorithms and system solutions for radars and depth sensors at Infineon, Germany. Earlier in his career, he worked as a system design engineer for LTE chipsets at Broadcom Communications, developing and implementing calibration algorithms for LTE chipsets. Subsequently, he worked as a research engineer developing system concepts and representative demonstrators of next-generation long-range radars and data analytics at Airbus. He has received several spot awards for project excellence in multiple forums. He has been an invited speaker at various conferences and workshops and has delivered several tutorials on deep learning and signal processing topics. He is a reviewer at various IEEE and Elsevier journals and a recipient of several outstanding reviewer awards. He has been lead guest editor at IEEE Sensors Journal and associate editor at Elsevier Machine Learning with Applications. He is the coauthor of the book Deep Learning Applications of Short-Range Radars, published by Artech House in September 2020. He has filed over 70 US/EU patents and published over 55 research papers related to deep learning and signal processing topics. He is a senior member of IEEE.
Souvik Hazra
Souvik Hazra received his B. Tech degree in electrical engineering from KIIT University in 2017 and then received his MS degree in data science and engineering from EURECOM and IMT, France, in 2019. He is currently working as a senior staff machine learning engineer at Infineon Technologies AG, Munich, where he is responsible for the overall development of machine learning and signal processing solutions for radars and microphones. Earlier in his career, he has worked as a research intern at Airbus and CCAF, University of Cambridge, on various deep learning topics. He has been invited as a speaker at various summits and has been a reviewer at various IEEE journals and conferences. He is the coauthor of the book Deep Learning Applications of Short-Range Radars, published by Artech House in September 2020. Besides his full-time job at Infineon, he is pursuing his PhD degree at Friedrich-Alexander-University (FAU), Erlangen.
Lorenzo Servadei
Dr. Lorenzo Servadei is a senior staff machine learning engineer at Infineon Technologies AG. His main interests are methods of reinforcement learning applied to quantum computing design, signal processing, and design automation of microchips. He obtained a PhD in computer science from a collaboration between Infineon Technologies AG and Johannes Kepler University of Linz.
His PhD focused on the use of reinforcement learning methods for hardware design optimization. To this end, he researched and applied methods of combinatorial optimization for the improvement of power, performance, and area (PPA) on digital hardware. In particular, he developed combinatorial reinforcement learning (RL) algorithms that gradually improve the positioning and connection of subcomponents within the hardware schematic. During his PhD, he collaborated with and published two journal papers with Professor Hochreiter, inventor of Long Short-Term Memory (LSTM) networks. Additionally, he spent several months at Duke University, working on machine learning contributions to hardware security. Dr. Lorenzo Servadei has also served as a machine learning trainer for Infineon Technologies AG, helping to grow the artificial intelligence community within the company at different sites around the world. He is currently an IEEE member, a senior lecturer, Habilitand, and a post-doc in the Department of Electrical and Computer Engineering at the Technical University of Munich.
Thomas Stadelmayer
Thomas Stadelmayer was born in Regensburg, Bavaria, Germany, in 1994. He studied computational engineering at Friedrich-Alexander University (FAU) Erlangen-Nuremberg and graduated with a bachelor's degree in 2015 and a master's degree in 2018. He then worked as a research assistant in the Circuits, Systems and Hardware Test (CST) research group at the Institute of Electrical Engineering, FAU Erlangen-Nuremberg. His research interests include digital signal processing and machine learning for short-range and indoor radar applications. During his research, he worked in close collaboration with Infineon Technologies on various radar applications based on machine learning, such as hand gesture recognition and person localization. He is particularly interested in combining classical signal processing and machine learning to obtain more interpretable neural networks. He is also interested in deep metric learning for detecting outliers or unknown motion to make applications more robust in real-world environments. He has contributed to his research area with scientific publications and several patent applications. He joined Infineon Technologies in February 2022 in the Advanced Artificial Intelligence group. His task is to improve current signal processing-based radar algorithms using artificial intelligence to overcome application limitations while also exploring new applications enabled by artificial intelligence for short-range radars. He also builds proof-of-concept demonstrators and works closely with academic partners. Besides his work at Infineon, he is pursuing his PhD degree.
Michael Stephan
Michael Stephan was born in Forchheim, Bavaria, Germany, in 1995. He received his bachelor's degree in electrical engineering and master's degree in advanced signal processing and communications engineering from the Friedrich-Alexander-University Erlangen-Nuremberg in 2017 and 2019, respectively. During his studies, he was a visiting scientist at Nokia Bell Labs, Holmdel, New Jersey, USA, where he looked into the RF-chains for Hybrid MIMO Precoders. He wrote his master thesis at the Poly-Grames Research Center in Montréal, Canada, about algorithmically reducing the effect of coupling on the angle of arrival estimation performance of MIMO FMCW radar and also completed an internship at Infineon Technologies AG in Linz, Austria, marking his first contact with deep learning for indoor target localization using FMCW radar sensors. He is currently pursuing his PhD degree with the Friedrich-Alexander-University Erlangen-Nuremberg at the Institute for Electronics Engineering in cooperation with Infineon Technologies AG, Neubiberg, Germany. Coming from a signal processing perspective, his current research focuses on various deep learning topics with application in mmWave radar signal processing for indoor localization and tracking in real-world environments. He has written numerous publications and filed multiple patents on deep learning applications for radar processing. Recently, his research focuses on explicitly using traditional signal processing knowledge during the neural network training process to achieve better generalization and performance in the low-data regime.
Anand Dubey
Anand Dubey was born in Mirzapur, Uttar Pradesh, India, in 1990. He studied electronics and communication engineering at Jaypee Institute of Information Technology University (JIITU) for his bachelor's degree in 2012 and automotive software engineering at Technical University of Chemnitz for his master's degree in 2018. Later, he worked as a research assistant in the Circuits, Systems and Hardware Test (CST) research group at the Institute of Electrical Engineering, FAU Erlangen-Nuremberg. His research interests include digital signal processing and machine learning for automotive radar applications. During his research, he worked on an application to detect and classify pedestrians and cyclists using their motion and spatial signatures. He is particularly interested in combining statistical signal processing and Bayesian machine learning to obtain more interpretable and reliable neural networks. He is also interested in the domain of geometric learning where data are sparse and correlated. He has contributed to his research area with several scientific publications. He joined Infineon Technologies in January 2022, and his task is to investigate and propose novel signal processing pipelines for speech enhancement using Bayesian machine learning algorithms. He is also exploring areas of tiny machine learning algorithms for low-powered, microcontroller units.
Radar has evolved from a complex, high-end military technology into a relatively simple, low-end solution penetrating the industrial, automotive, and consumer market segments and offering a plethora of applications. This rapid evolution has been driven by advancements in silicon and by deep learning algorithms that exploit the full potential of sensor data. The advent of deep learning has transformed many fields and produced state-of-the-art solutions in computer vision, natural language processing, speech processing, and beyond. However, the application of deep learning algorithms to radar is still, by and large, at a nascent stage. This book presents the theoretical concepts behind several advanced deep learning techniques and highlights how they enable applications that were not otherwise possible.
This book presents cutting-edge artificial intelligence (AI)-based processing using advanced deep learning for short-range radar. AI is a dominant topic across industrial sectors and has led to disruptions in fields such as computer vision, natural language processing, speech processing, and medical imaging. However, the application of AI to radar is relatively new and unexplored. In this book, we present the cutting-edge deep learning processing that we have developed, and continue to develop, at Infineon Technologies. The book covers how advanced deep learning concepts enable applications ranging from the industrial sector and consumer space to the emerging automotive industry. It presents examples of several human-machine interface applications such as gesture recognition and sensing, human activity classification, people counting, and people localization and tracking, along with automotive target detection, localization, and classification.
Chapter 1 introduces the fundamentals of deep learning, its evolution over time, and the different facets that make deep learning so powerful. This chapter introduces the various components of conventional convolutional neural networks, recurrent neural networks, and fully connected layers in relation to various tasks such as classification, localization, segmentation, or translation. Chapter 2 presents deep metric learning with an intensive overview of the state-of-the-art algorithms and how open-set classification tasks are handled using metric learning. Then, a short-range radar application that aims to classify among a set of predefined hand gestures amid random unknown motions is presented.
Chapter 3 introduces deep parametric learning, where the preprocessing pipeline can be integrated into a deep neural network and made data driven, thus enhancing the performance to be task specific as well as making the architecture compact. Chapter 4 introduces deep reinforcement learning, where the learning algorithm depends on the sum of rewards produced by a policy interacting with an environment. We review the basics of deep reinforcement learning and then present an overview of different typologies of deep reinforcement learning algorithms. We present the efficacy of deep parametric learning with an activity classification application, and for reinforcement learning, we show how it helps to adaptively update the parameters of a tracker as a function of the target dynamics.
Chapter 5 introduces cross-modal learning algorithms by giving an overview of the state-of-the-art approaches; we then present two cross-modal learning approaches that improve radar-based people counting solutions in comparison with unimodal learning approaches. In Chapter 6, we present signal processing-led learning, giving an overview of different model-based approaches for incorporating expert knowledge into deep learning methodologies. We present the advantages of signal processing-driven deep learning for a radar-based target detection and segmentation use case.
Chapter 7 presents domain adaptation, wherein a model is trained on a source data distribution and then deployed on a different target data distribution. Transfer learning and fine-tuning are subsets of domain adaptation; here, we present an overview of existing techniques and apply them to specific applications of human activity classification. Chapter 8 presents Bayesian deep learning, starting with an overview of the history of learning theory for deterministic and Bayesian neural networks, followed by the different elemental blocks required to formulate Bayesian deep learning, and then a practical application demonstrating the efficacy of Bayesian deep learning for automotive radar. Chapter 9 introduces geometric deep learning, starting with an overview, followed by the need to capture and learn underlying patterns in complex non-Euclidean data structures. Subsequently, practical applications are demonstrated using automotive radar point clouds for automotive target classification and for long-range gesture sensing.
This book is intended for graduate students, academic researchers, and industry practitioners working with deep learning who strive to apply deep learning techniques to mmWave radars or depth sensors. It is written with beginners through advanced researchers in mind and assumes sufficient knowledge of linear algebra and engineering mathematics. Each chapter ends with questions to assess the reader's understanding. The book covers the theoretical foundation of each deep learning algorithm or paradigm and also presents the adaptation of each algorithm to a specific mmWave radar application. It covers advanced concepts such as deep metric learning, parametric learning, reinforcement learning, cross-modal learning, signal processing-led architectures, domain adaptation, and geometric deep learning. While each chapter is independent of the others, it is suggested that an early researcher read the first introductory chapter, which introduces basic radar signal processing and deep learning, before reading the specific deep learning chapters.
The authors would like to express their heartfelt gratitude to their PhD supervisors Prof. Robert Weigel and Prof. Robert Wille for their constant guidance and support. We look at them with great respect for their profound knowledge and experience, their unparalleled teaching and problem-solving skills, and their relentless pursuit of perfection, which is something we try to emulate all the time and have tried to emulate in this book.
We are thankful to our department head, Gerhard Martin, for being extremely supportive and encouraging us all the time to give our best. We would also like to greatly thank Dr. Christian Mandl, who has been a lighthouse of inspiration for us, guiding us with his accurate understanding of technical concepts along with his leadership skills. We would also like to thank Dr. Ashutosh Pandey for his technical guidance, unparalleled knowledge, and relentless striving for excellence.
The authors would also like to thank their editors and reviewers for their encouragement, reviews, and suggestions to improve this book.
BDL
Bayesian deep learning
BIC
Bayesian information criterion
BNN
Bayesian neural networks
CFEL
Complex frequency extraction layer
CV
Computer vision
CNN
Convolutional neural network
CM
Confusion matrix
CFAR
Constant false alarm rate
DDPG
Deep deterministic policy gradient
DRL
Deep reinforcement learning
DBSCAN
Density-based spatial clustering of applications with noise
DSP
Digital signal processor
DoA
Direction of arrival
DA
Domain adaptation
d-SNE
Domain adaptation using stochastic neighborhood embedding
DGCNN
Dynamic graph CNN
EKF
Extended Kalman filter
FADA
Few-shot adversarial domain adaptation
FMCW
Frequency-modulated continuous-wave
GMM
Gaussian mixture model
GP
Gaussian process
GRV
Gaussian random variable
GAN
Generative adversarial network
GRL
Gradient reversal layer
GAE
Graph autoencoder
GNN
Graph neural network
IID
Independent and identically distributed
ISTA
Iterative shrinkage-thresholding algorithm
K-NN
k-nearest neighbor
LAR
Label-aware ranked
LIDAR
Laser imaging detection and ranging
lse
log-sum-exp
LSTM
Long short-term memory
ML
Machine learning
MDD
Margin disparity discrepancy
MCMC
Markov chain Monte Carlo
MMD
Maximum mean discrepancy
MH
Metropolis–Hastings algorithm
mLSTM
multifrequency long short-term memory
mWDN
multilevel wavelet decomposition network
MCB
Multimodal compact bilinear
MMDL
Multimodal deep learning
NN
Neural network
NMS
Nonmaxima suppression
OKS
Object keypoint similarity
OS-CFAR
Ordered statistics CFAR
RAI
Range-angle image
RDI
Range-Doppler image
RGNN
Recurrent graph NN
RNN
Recurrent neural network
RL
Reinforcement learning
RCF
Residual classification
SPKF
Sigma-point Kalman filter
STGNN
Spatial-temporal graph neural networks
SVM
Support vector machine
TL
Transfer learning
TRPO
Trust region policy optimization
VAE
Variational auto-encoder
VBGM
Variational Bayesian Gaussian mixture model
VI
Variational inference
VQA
Visual question answering
VRU
Vulnerable road users
UMAP
Uniform manifold approximation and projection
UKF
Unscented Kalman filter
At the end of this chapter, the reader will have an understanding of:
How radar data cubes are processed to extract the range, velocity, and angle of multiple detected targets, and how these targets are tracked over time.
The different target representations used for radar target recognition.
The deep learning architectures used for radar target recognition.
Radar is an acronym that stands for radio detection and ranging. It is essentially an electromagnetic system used to detect the presence of one or more targets of interest and estimate their range, angle, and velocity relative to the radar. Beyond measuring a target's location and velocity, modern radars can also recognize the type of target from the reflected radar signals. The main advantage of radar over infrared and optical sensors is its ability to detect distant targets under difficult weather conditions and to determine their spatial location precisely while tracking them over time. The general working principle and the signal processing fundamentals are explained in the following sections.
The radar system generally consists of a transmitter that produces an electromagnetic signal, which is radiated into space by the transmit antenna. When this signal strikes an object, it is reflected or re-radiated in many directions. This reflected echo signal is received by the receive antenna, which delivers it to the receiver circuitry, where it is processed to detect the target and localize it over time along with certain characteristics of the target. A simplified version of a typical continuous wave radar front-end with the most important building blocks can be seen in Figure 1.1. The chosen waveform is generated by a local oscillator (LO) and transmitted via the transmit (Tx) antenna. The receive (Rx) antenna then captures the incoming signal reflections from the target at a distance. After amplifying the received signal, it is mixed with the original transmitted waveform and passed through subsequent analog bandpass filtering (BPF). This removes any high-frequency components that could cause aliasing, as well as low-frequency components from direct coupling of the LO signal into the receiver. After mixing and filtering, the signal has been shifted to an intermediate frequency (IF) and is referred to as the IF signal. The IF bandwidth is determined by the upper cut-off frequency of the bandpass filter, which is typically on the order of tens of kHz to a few MHz.
Figure 1.1 Block diagram of the continuous wave radar front-end and its receive chain, including the mixer, bandpass filter, and analog-to-digital converter. The digitized samples are stored in a data matrix. The radar in this case is sensing a human target in the field of view.
To detect and differentiate multiple targets along the range, relative velocity, and azimuth-elevation angle dimensions, the linear frequency-modulated continuous-wave (FMCW) signal is used as the most standard sensing waveform [1]. Usually, consecutive identical chirps are transmitted within a frame with a predefined time spacing referred to as the chirp repetition time. The received IF signal is arranged in a two-dimensional matrix; the intra-chirp time, i.e., within a chirp, is referred to as fast-time, while the inter-chirp time, i.e., across chirps, is referred to as slow-time. If the target is static, the round-trip delay in the received signal manifests as a frequency offset along the fast-time dimension after down-mixing at the receiver. But if the target or the radar is not stationary, the received signal will have an additional frequency offset caused by the Doppler shift, manifested across the slow-time dimension.
Figure 1.2 shows the concept of FMCW modulation in detail. The LO generates a chirp signal with a given starting frequency, bandwidth, and duration, and a resulting sweep rate equal to the bandwidth divided by the chirp duration. By taking the time integral over the Tx frequency, the instantaneous phase is calculated as shown in Eq. (1.1), where the constant term corresponds to the initial phase of the LO:
Figure 1.2 Illustration of a typical modern radar sensor transmitting several identical chirps within a frame; the digitized IF samples are then stored chirp-wise in a data matrix for coherent processing.
Assuming unity amplitude for a single chirp, the transmit signal can be formulated by
If this transmit signal gets reflected by some object, also referred to as a target, the reflection will be received at the radar with a time delay, which is proportional to the target's distance from the radar. Additionally, signals of multiple reflections, as from extended targets, are superimposed on each other at the receiver. For an arbitrary number of point targets composing a spatially distributed target, the received signal can thus be expressed as follows:
where the additive term represents thermal receiver noise or clutter and τ_k is the round-trip delay to the k-th target located at distance r_k and moving with a relative radial velocity v_k. As a result, the delay can be described as τ_k = 2(r_k + v_k t)/c, where c is the speed of light. For ease of notation, the noise term is dropped in all the following considerations. The received and amplified signal is mixed with the original transmitted signal. As discussed before, both the transmitted and received signals follow a cosine waveform. Thus, the down-mixed signal can be decomposed into two components using trigonometric identities.
Here, the first component contains the difference of the Tx and Rx signal frequencies, and the second contains the sum frequencies. The sum component is removed by the subsequent BPF, and the resulting IF signal is obtained as follows:
This shows that the intermediate received signal contains both a distance-dependent frequency and a speed-dependent frequency shift, which are functions of the modulation parameters. These include the chirp duration, chirp repetition time, sweep bandwidth, and number of chirps in a frame as the main configuration parameters for the design of an FMCW waveform. As a result, these parameters control the range and Doppler resolution, as presented in Eqs. (1.6) and (1.7), respectively. The maximum observable range and the maximum unambiguous Doppler are given in Eqs. (1.8) and (1.9).
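The standard textbook forms of these design relations can be sketched numerically as follows; the exact notation of Eqs. (1.6)-(1.9) may differ, and all parameter values below are illustrative assumptions rather than values from the text:

```python
import numpy as np

# Standard FMCW design relations (textbook forms). Illustrative parameters:
c = 3e8            # speed of light (m/s)
f0 = 60e9          # carrier frequency (Hz)
lam = c / f0       # wavelength (m)
B = 1e9            # sweep bandwidth (Hz)
Tc = 128e-6        # chirp duration (s)
Tprt = 150e-6      # chirp repetition time (s)
Nc = 64            # chirps per frame
fs = 2e6           # IF sampling rate (Hz)
S = B / Tc         # sweep rate (Hz/s)

delta_r = c / (2 * B)               # range resolution: set by bandwidth
delta_v = lam / (2 * Nc * Tprt)     # velocity resolution: set by frame duration
r_max = fs * c / (2 * S)            # max range: limited by the IF bandwidth
v_max = lam / (4 * Tprt)            # max unambiguous radial velocity

print(delta_r)   # 0.15 m for B = 1 GHz
```

Note the trade-offs these relations encode: a wider sweep improves range resolution, while a shorter chirp repetition time raises the unambiguous velocity at the cost of Doppler resolution for a fixed number of chirps.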
The IF signal, obtained from Eq. (1.5), is then digitized by an analog-to-digital converter with a fixed sampling period at discrete time instants; consequently, the discrete-time signal contains a fixed number of samples per chirp. Typically, modern short-range radar sensors rapidly transmit several identical chirps in a so-called chirp sequence modulation. The digitized IF samples are then stored chirp-wise in a data matrix for coherent processing. Figure 1.3 summarizes the multistage FMCW signal processing pipeline, in which the data matrix is first preprocessed in the time domain for removal of spectral leakage or static targets, followed by interference mitigation. The preprocessed matrix is then transformed to the frequency domain for target detection. Once a target is detected, the measurement is fed into a tracking algorithm for temporal smoothing. At the end, the tracked target's features are extracted from its motion or spatial signature in the form of images or point clouds, respectively, which are used for target recognition.
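As a toy illustration of the fast-time processing just described, the following sketch simulates the digitized IF signal of a single static point target and recovers its range with an FFT. All parameters (60 GHz carrier, 1 GHz sweep, a 5 m target) are illustrative assumptions, not values from the text:

```python
import numpy as np

# Illustrative FMCW parameters (assumed, not from the text).
c = 3e8          # speed of light (m/s)
B = 1e9          # sweep bandwidth (Hz)
Tc = 128e-6      # chirp duration (s)
S = B / Tc       # sweep rate (Hz/s)
fs = 2e6         # ADC sampling rate (Hz)

t = np.arange(0, Tc, 1 / fs)   # fast-time axis within one chirp
r = 5.0                        # true target range (m)
tau = 2 * r / c                # round-trip delay (s)

# After down-mixing and low-pass filtering, the IF signal of a static point
# target is a single tone at the beat frequency f_b = S * tau = 2*S*r/c.
f_b = S * tau
s_if = np.cos(2 * np.pi * f_b * t)

# A range FFT over fast-time locates the target: peak bin k maps to range
# r = k * c * fs / (2 * S * N).
N = len(t)
spectrum = np.abs(np.fft.rfft(s_if))
k_peak = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin
r_est = k_peak * c * fs / (2 * S * N)
print(r_est)  # close to the true 5 m, quantized to the range-bin grid
```

The small residual error comes from quantization to the FFT bin grid, which motivates the windowing and zero-padding discussion later in the chapter.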
Figure 1.3 Summary of FMCW signal processing pipeline including both pre- and postprocessing over chirp matrix for target detection, tracking, and classification.
The frequency domain analysis for range-Doppler processing is explained in the following section.
As indicated by Eqs. (1.6) and (1.7), both the range and the radial velocity information of the target are functions of frequency shifts in the received signal. As a result, frequency-domain analysis is used to determine the respective target parameters instead of time-domain analysis. In contrast to time-domain signals, where signal changes over time (amplitude or power) can be observed, frequency-domain analysis reveals how much of the signal lies within each frequency band over a range of frequencies, including the change in phase information. The most common frequency-domain transform methods are the Fourier transform, the short-time Fourier transform (STFT), and wavelet transforms. All three transforms are inner products of a family of basis functions with a time-domain signal. The parameterization and the basis functions determine the properties of the transforms.
Before delving into details, Figure 1.4 illustrates all three transforms pictorially. While the discrete Fourier transform (DFT) is the classical technique to represent time signals in the frequency domain, it fails to detect time-variant frequency effects, which are important for extended targets. As an alternative, the DFT is modified by shortening the time window for each DFT, leading to the short-time Fourier transform (STFT), which improves resolution in time but at the cost of lower resolution in the frequency domain, as seen in Figure 1.4. While the DFT has no temporal resolution and the STFT has a fixed resolution over the complete time-frequency plane, the wavelet transform can adapt both the time and frequency dimensions, resulting in high frequency resolution at low frequencies while maintaining good time localization at high frequencies.
Figure 1.4 Pictorial representation of all three transforms, i.e., Fourier, STFT, and wavelets over analog-to-digital conversion (ADC) sampled chirp sequence data.
While the Fourier series is used for oscillating or repetitive signals, the Fourier transform is used for nonrepetitive signals. Thus, the Fourier transform can be formulated as a special case of the Fourier series in the limit of an infinite time period. For the standard Fourier transform, the basis functions are simply the complex sinusoidal oscillations
where the integration variable is the time axis of the signal and the single frequency parameter determines the basis function in the family; there is one basis function for every frequency. The Fourier transform of the signal is then simply the inner product, written as an integral
The negative sign in the exponential comes from the complex conjugation in the general inner product definition:
Since we are processing the down-sampled raw radar analog-to-digital conversion (ADC) data matrix, all signals are considered discrete in time and in value (unlike continuous signals). Thus, the discrete-time Fourier transform (DTFT) is performed as follows:
where s[n] is the discrete-time signal and S(ω) the continuous spectrum. Consequently, as S(ω) is continuous in nature, the DTFT cannot be processed directly on a digital machine. Therefore, a discrete spectrum is required, which is usually obtained using the discrete Fourier transform (DFT), denoted by S[k]. This is done by sampling the spectrum of the DTFT at N equally spaced frequency points, where N corresponds to the total number of samples retrieved from the DTFT:
The above equation shows the two fundamental mathematical operations that are carried out for every sample of the input signal: multiplication and addition. The DFT is an iterative operation and requires high computational effort. The fast Fourier transform (FFT) is an algorithm to compute the DFT efficiently. This is usually done using the Cooley–Tukey algorithm; however, many other algorithms exist.
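To make the DFT/FFT relationship concrete, a naive O(N^2) DFT can be implemented directly as a matrix of complex sinusoidal basis functions and checked against NumPy's FFT; this is a sketch for illustration, not the book's code:

```python
import numpy as np

# The direct DFT is O(N^2); the FFT computes the same result in O(N log N).
def naive_dft(x):
    N = len(x)
    n = np.arange(N)
    # W[k, n] = exp(-j 2*pi*k*n / N): one basis row per output frequency bin
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)
    return W @ x

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
assert np.allclose(naive_dft(x), np.fft.fft(x))  # identical spectra
```

The multiply-accumulate structure of the matrix product is exactly the "multiplication and addition" per sample noted above; the FFT merely factorizes this matrix to avoid redundant work.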
The FFT operation is applied over the down-sampled raw radar ADC data matrix, with the samples of each chirp stored along the fast-time dimension and consecutive chirps stored along the columns, referred to as the slow-time dimension. Prior to the FFT operation, as seen in Figure 1.3, time-domain preprocessing can be applied as an optional step. This helps to remove Tx–Rx leakage and clutter noise from the down-sampled raw radar ADC data matrix and is done by mean subtraction across fast-time and slow-time, respectively. This process is commonly known as moving target indicator (MTI) processing. Furthermore, an optional interference detection and mitigation method can be applied to reduce the effect of interference and noise. By calculating the two-dimensional fast Fourier transform (FFT) on this data matrix, fast-time is converted to range frequency and slow-time to Doppler frequency. This operation yields the range-Doppler spectrum, with dimensions indexed by range frequency and Doppler frequency.
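A minimal sketch of this MTI-plus-2D-FFT stage on a synthetic fast-time by slow-time matrix might look as follows; the target and clutter frequencies are arbitrary illustrative choices:

```python
import numpy as np

# Fast-time x slow-time chirp matrix -> MTI -> 2D FFT -> range-Doppler image.
NS, NC = 64, 32                       # samples per chirp, chirps per frame
n = np.arange(NS)[:, None]            # fast-time index
m = np.arange(NC)[None, :]            # slow-time index

# Synthetic moving target: beat frequency along fast-time (range bin 10),
# Doppler phase progression along slow-time, plus static clutter at bin 20.
target = np.cos(2 * np.pi * (10 / NS) * n + 2 * np.pi * (5 / NC) * m)
clutter = 0.5 * np.cos(2 * np.pi * (20 / NS) * n) * np.ones_like(m, dtype=float)
X = target + clutter

# MTI: subtract the mean over slow-time so static (zero-Doppler) returns vanish.
X_mti = X - X.mean(axis=1, keepdims=True)

# 2D FFT: fast-time -> range bins, slow-time -> Doppler bins (shifted to center).
rdi = np.abs(np.fft.fftshift(np.fft.fft2(X_mti), axes=1))
rng_bin, dop_bin = np.unravel_index(np.argmax(rdi), rdi.shape)
print(rng_bin)  # the moving target's range bin (10, or its mirror image 54)
```

After MTI, the clutter row of the range-Doppler image is empty, while the moving target survives because its slow-time phase progression averages to zero under the mean subtraction.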
In principle, the FFT assumes that the signal contains a continuous spectrum corresponding to one period of a periodic signal. However, the measured raw data matrix may not contain an integer number of periods. Therefore, the finiteness of the measurement may result in a discontinuity at the endpoints of the waveform in comparison to the original continuous-time signal and can introduce sharp transitions into the consecutive measured signals. These artificial discontinuities lead to additional high-frequency components not present in the original signal. This phenomenon is called spectral leakage: energy at one frequency leaks into other frequencies. It causes sharp peaks in the frequency spectrum to spread into wider lobes and leads to ambiguity.
These effects are minimized using a technique called windowing. The windowing function reduces the amplitude of the discontinuities at the boundaries of each finite sequence acquired by the ADC. It consists of an amplitude envelope that is multiplied elementwise with the original ADC matrix and is designed to vary smoothly and gradually toward zero at the edges. This makes the endpoints of the measurement similar and therefore results in a continuous waveform without sharp transitions. In addition, zero-padding along either dimension of the data matrix is generally done up to a length that is a power of 2. This interpolates the coarse spectrum to make it smoother but does not reveal extra information from the spectrum; to improve the resolution of the spectrum, the length of the recorded signal needs to be increased. The finite recording itself can also be interpreted as windowing, i.e., a time-domain multiplication of a rectangular function with the original signal.
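The effect of windowing and zero-padding can be demonstrated on a tone with a non-integer number of periods; this is a sketch with illustrative values, and the Hann window is one common choice among many:

```python
import numpy as np

# A tone with 10.3 cycles over the record has no integer number of periods,
# so the rectangular-window spectrum leaks into neighboring bins.
N = 128
n = np.arange(N)
x = np.cos(2 * np.pi * 10.3 * n / N)

rect = np.abs(np.fft.rfft(x))                  # implicit rectangular window
hann = np.abs(np.fft.rfft(x * np.hanning(N)))  # smooth taper to zero at edges

# Far from the peak (e.g. bin 40), leakage is much lower with the window.
assert hann[40] < rect[40]

# Zero-padding to 4N interpolates the spectrum: 4x more bins over the same
# bandwidth, but no new resolution.
padded = np.abs(np.fft.rfft(x * np.hanning(N), n=4 * N))
print(len(padded), len(hann))  # 257 vs 65 bins over the same frequency span
```

The windowed spectrum trades a slightly wider main lobe for strongly suppressed sidelobes, which is exactly the leakage-versus-resolution trade-off described above.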
Since the FFT is performed over the complete ADC-sampled chirp matrix, it averages out signal frequency components over time. This approach is good for localization and detection of reflections from targets but fails to extract spatial or motion information (commonly termed signatures) for extended targets like humans.1 As a result, the FFT is modified by adding a time dimension to the basis function parameters: the infinitely long complex exponential is multiplied with a window to localize it. This transform is known as the STFT, whose basis functions are then
where the window function vanishes outside some interval and the time-frequency coordinates determine the basis function in the family. The inner product is formulated as follows:
The advantage of the STFT is that it can capture the frequency content of a target's signature over time, i.e., range or micro-Doppler signatures. This information can be treated as a unique signature for target recognition or attribute recognition. However, it is challenging to find a trade-off between time and frequency resolution when calculating the STFT; this trade-off is determined by the choice of the window function and the sampling frequency.
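A minimal STFT, sketched here with a hand-rolled sliding-window FFT rather than any particular library routine, shows how a time-varying frequency (a stand-in for a micro-Doppler signature) becomes visible; all parameters are illustrative:

```python
import numpy as np

# STFT = windowed FFTs over sliding segments; columns are time, rows frequency.
def stft(x, win_len=64, hop=16):
    w = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    return np.stack([np.abs(np.fft.rfft(x[s:s + win_len] * w))
                     for s in starts], axis=1)

fs = 1000.0
t = np.arange(0, 1, 1 / fs)
# Instantaneous frequency sweeps 50 -> 150 Hz, mimicking a time-varying
# Doppler signature that a single long FFT would smear out.
x = np.cos(2 * np.pi * (50 * t + 50 * t ** 2))

S = stft(x)
early = np.argmax(S[:, 0])    # dominant frequency bin in the first segment
late = np.argmax(S[:, -1])    # dominant frequency bin in the last segment
assert late > early           # the spectrogram tracks the rising frequency
```

Shortening `win_len` sharpens the time localization but coarsens the frequency bins, which is the window-choice trade-off noted in the text.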
In contrast to the DFT and STFT, the wavelet transform adapts the window size to the frequency under a constant relative bandwidth (constant-Q) constraint. It is designed in a scale-invariant approach that does not even need a complex modulation basis function. The working principle of wavelet transforms can be understood through a generic basis function that is localized and oscillates with zero mean, i.e., its integral over the complete space is zero. This basis function is referred to as the mother wavelet. The advantage of such a basis function (wavelet) is that the trade-off between localization (time) and oscillation (frequency) resolution can be reduced, which is a constraint in the STFT. Therefore, the family of basis functions can be summarized as
where the mother wavelet is shifted in time and dilated by the scale parameter. Thus, the inner product becomes
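The scale behavior can be sketched with a Morlet mother wavelet, a common choice (the book may use a different wavelet), implemented directly with NumPy; scales and signal values are illustrative:

```python
import numpy as np

# Complex Morlet mother wavelet: an oscillation under a Gaussian envelope,
# approximately zero-mean. Scaling it stretches/compresses the same prototype.
def morlet(t, scale, w0=6.0):
    u = t / scale
    return np.exp(1j * w0 * u) * np.exp(-u ** 2 / 2) / np.sqrt(scale)

def cwt(x, scales, fs):
    t = (np.arange(len(x)) - len(x) / 2) / fs
    out = []
    for s in scales:
        psi = morlet(t, s)
        # correlate the signal with the scaled, conjugated wavelet
        out.append(np.convolve(x, np.conj(psi)[::-1], mode="same"))
    return np.abs(np.stack(out))

fs = 256.0
t = np.arange(0, 2, 1 / fs)
x = np.cos(2 * np.pi * 10 * t)           # 10 Hz tone

scales = np.array([0.5, 0.1, 0.02])      # large scale = low center frequency
C = cwt(x, scales, fs)
# The Morlet center frequency at scale s is roughly w0 / (2*pi*s), so
# scale 0.1 (~9.5 Hz) matches the 10 Hz tone and responds strongest.
mid = len(t) // 2
best = int(np.argmax(C[:, mid]))
assert best == 1
```

Because the window length shrinks with the scale, high-frequency content is localized sharply in time while low-frequency content gets fine frequency resolution, which is the adaptivity claimed above.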
As illustrated in Figure 1.3, irrespective of the target representation, i.e., images or point clouds, the most standard target detection has two stages: detection followed by clustering. A simple approach to target detection is peak detection, i.e., determining whether the sensed bin has a higher amplitude than its neighboring bins.
Alternatively, a constant false alarm rate (CFAR) detector is used for detection of the targets. The CFAR detector calculates a detection threshold for each bin by estimating the varying noise power from neighboring reference cells, as shown in Eq. (1.20): the threshold is the estimated noise power, computed over the total number of neighboring reference cells, multiplied by a scaling factor chosen to achieve the desired constant false alarm rate. Equation (1.20) represents cell-averaging (CA)-CFAR. The drawback of CA-CFAR is that it occludes weak targets near a strong target: the strong return raises the noise threshold and masks them out. As an alternative, ordered statistics (OS)-CFAR is used for detection, where the k-th ordered value of the reference cells is selected as the noise estimate instead of averaging over all reference cells.
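A one-dimensional CA-CFAR sketch over a synthetic noise-power profile may clarify the mechanics; the guard/reference cell counts and the scaling factor are illustrative assumptions:

```python
import numpy as np

# CA-CFAR: for each cell under test, threshold = alpha * mean power of the
# surrounding reference cells, with guard cells excluded around the test cell.
def ca_cfar(power, n_ref=8, n_guard=2, alpha=4.0):
    N = len(power)
    detections = np.zeros(N, dtype=bool)
    for i in range(n_ref + n_guard, N - n_ref - n_guard):
        left = power[i - n_guard - n_ref:i - n_guard]
        right = power[i + n_guard + 1:i + n_guard + 1 + n_ref]
        noise = np.mean(np.concatenate([left, right]))  # local noise estimate
        detections[i] = power[i] > alpha * noise
    return detections

rng = np.random.default_rng(1)
power = rng.exponential(1.0, size=128)   # square-law detected noise
power[64] += 50.0                        # strong point target at bin 64

det = ca_cfar(power)
assert det[64]                           # the target is detected
```

Replacing the mean with the k-th ordered statistic of the concatenated reference cells turns this into OS-CFAR, which is less easily inflated by a strong neighboring target.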
In contrast to rigid targets like cars or trucks, a human as a target of interest exhibits micro-motions, which result in different velocity components in the received signal, commonly known as micro-Doppler components [2, 3]. This leads to a spread of detections across the Doppler dimension. Also, with the use of higher sweep bandwidths, the range resolution of the radar is on the order of a few centimeters. As a result, the reflections from a target are not received as a point-target reflection but are spread across multiple range bins. Such targets are also known as range-Doppler extended targets or doubly spread targets. Consequently, all detections from a target need to be clustered into one, necessitating a clustering algorithm as the second stage. This also reduces the computational complexity of the target-tracking algorithm, which after clustering tracks a single set of target parameters instead of a nonclustered group of target parameters. Density-based spatial clustering of applications with noise (DBSCAN) [4] is used as the state-of-the-art algorithm. Unlike most algorithms, DBSCAN performs clustering in one pass without prior knowledge of the number of clusters and is robust to outliers (noise). The input hyperparameters required by DBSCAN are a minimum number of points and a minimum distance between neighboring points to be part of a cluster [5]. Given a set of target detections from the same or multiple targets in 2D space, DBSCAN groups detections that are closely packed together, while removing as outliers detections that lie alone in low-density regions. To do this, DBSCAN classifies each point as either a core point, an edge point, or noise. A point is defined as a core point if it has at least the minimum number of neighbors, i.e., points within the minimum distance. An edge point has fewer neighbors than the minimum, but at least one of its neighbors is a core point.
All points that have fewer than the minimum number of neighbors and no core point as a neighbor do not belong to any cluster and are classified as outliers and ignored. The two-stage approach has limitations in both stages. While OS-CFAR fails to detect targets in the case of clutter, multipath reflections, or interference for a fixed threshold, DBSCAN fails to form clusters for sparse range-Doppler images (RDIs) and is also very sensitive to its hyperparameters. As a result, it may lead to false detections, missed detections of real targets, or target splits resulting in multiple targets (when in reality there is only one) in RDIs. However, the recent advancement of deep neural networks (DNNs) and their application to target segmentation make DNNs an ideal algorithm for this problem. Unlike fixed rule-based methods, DNNs are capable of learning low-level to high-level representations. The problem of two-stage target detection is treated in the literature as a binary image segmentation problem, where the target's cluster is considered foreground and the remaining information in the range-Doppler map background, as illustrated in Figure 1.5b. In [6, 7], the authors successfully demonstrated single-stage target detection on RDIs while suppressing the effect of scattering from extended targets, multipath reflections, and ghost targets. Further, in
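The core/edge/noise definitions above can be sketched with a minimal from-scratch DBSCAN over synthetic 2D detections; the eps and min_pts values and the synthetic targets are arbitrary illustrative choices:

```python
import numpy as np

# Minimal DBSCAN over 2D detections (e.g. range-Doppler coordinates).
# eps: neighborhood radius; min_pts: minimum neighbors for a core point.
def dbscan(points, eps=1.0, min_pts=3):
    n = len(points)
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    neighbors = [np.flatnonzero(d[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)              # -1 = noise / unvisited
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                     # not an unvisited core point
        labels[i] = cluster              # grow a new cluster (breadth-first)
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:   # core point: expand further
                    queue.extend(neighbors[j])
        cluster += 1
    return labels

rng = np.random.default_rng(2)
a = rng.normal([0, 0], 0.2, size=(20, 2))    # detections from target 1
b = rng.normal([5, 5], 0.2, size=(20, 2))    # detections from target 2
outlier = np.array([[10.0, -10.0]])          # lone clutter detection
pts = np.vstack([a, b, outlier])

labels = dbscan(pts, eps=1.0, min_pts=3)
assert len(set(labels[:20])) == 1            # target 1 collapses to one cluster
assert labels[-1] == -1                      # the clutter point stays noise
```

The sketch also exposes the hyperparameter sensitivity noted in the text: shrinking eps or raising min_pts can split a sparse extended target into several clusters or discard it entirely.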