Federated Learning for Future Intelligent Wireless Networks

Explore the concepts, algorithms, and applications underlying federated learning.

In Federated Learning for Future Intelligent Wireless Networks, a team of distinguished researchers deliver a robust and insightful collection of resources covering the foundational concepts and algorithms powering federated learning, as well as explanations of how they can be used in wireless communication systems. The editors have included works that examine how communication resource provision affects federated learning performance, accuracy, convergence, scalability, and security and privacy. Readers will explore a wide range of topics that show how federated learning algorithms, concepts, and design and optimization issues apply to wireless communications. Readers will also find:

* A thorough introduction to the fundamental concepts and algorithms of federated learning, including horizontal, vertical, and hybrid FL
* Comprehensive explorations of wireless communication network design and optimization for federated learning
* Practical discussions of novel federated learning algorithms and frameworks for future wireless networks
* Expansive case studies in edge intelligence, autonomous driving, IoT, MEC, blockchain, and content caching and distribution

Perfect for electrical and computer science engineers, researchers, professors, and postgraduate students with an interest in machine learning, Federated Learning for Future Intelligent Wireless Networks will also benefit regulators and institutional actors responsible for overseeing and making policy in the area of artificial intelligence.



Table of Contents

Cover

Table of Contents

Series Page

Title Page

Copyright

About the Editors

Preface

1 Federated Learning with Unreliable Transmission in Mobile Edge Computing Systems

1.1 System Model

1.2 Problem Formulation

1.3 A Joint Optimization Algorithm

1.4 Simulation and Experiment Results

Bibliography

Note

2 Federated Learning with non‐IID data in Mobile Edge Computing Systems

2.1 System Model

2.2 Performance Analysis and Averaging Design

2.3 Data Sharing Scheme

2.4 Simulation Results

Bibliography

Note

3 How Many Resources Are Needed to Support Wireless Edge Networks

3.1 Introduction

3.2 System Model

3.3 Wireless Bandwidth and Computing Resources Consumed for Supporting FL‐Enabled Wireless Edge Networks

3.4 The Relationship between FL Performance and Consumed Resources

3.5 Discussions of Three Cases

3.6 Numerical Results and Discussion

3.7 Conclusion

3.8 Proof of Corollary 3.2

3.9 Proof of Corollary 3.3

References

4 Device Association Based on Federated Deep Reinforcement Learning for Radio Access Network Slicing

4.1 Introduction

4.2 System Model

4.3 Problem Formulation

4.4 Hybrid Federated Deep Reinforcement Learning for Device Association

4.5 Numerical Results

4.6 Conclusion

Acknowledgment

References

5 Deep Federated Learning Based on Knowledge Distillation and Differential Privacy

5.1 Introduction

5.2 Related Work

5.3 System Model

5.4 The Implementation Details of the Proposed Strategy

5.5 Performance Evaluation

5.6 Conclusions

Bibliography

6 Federated Learning‐Based Beam Management in Dense Millimeter Wave Communication Systems

6.1 Introduction

6.2 System Model

6.3 Problem Formulation and Analysis

6.4 FL‐Based Beam Management in UDmmN

6.5 Performance Evaluation

6.6 Conclusions

Bibliography

7 Blockchain‐Empowered Federated Learning Approach for An Intelligent and Reliable D2D Caching Scheme

7.1 Introduction

7.2 Related Work

7.3 System Model

7.4 Problem Formulation and DRL‐Based Model Training

7.5 Privacy‐Preserved and Secure BDRFL Caching Scheme Design

7.6 Consensus Mechanism and Federated Learning Model Update

7.7 Simulation Results and Discussions

7.8 Conclusion

References

8 Heterogeneity‐Aware Dynamic Scheduling for Federated Edge Learning

8.1 Introduction

8.2 Related Works

8.3 System Model for FEEL

8.4 Heterogeneity‐Aware Dynamic Scheduling Problem Formulation

8.5 Dynamic Scheduling Algorithm Design and Analysis

8.6 Evaluation Results

8.7 Conclusions

8.A Appendices

References

Note

9 Robust Federated Learning with Real‐World Noisy Data

9.1 Introduction

9.2 Related Work

9.3 FedCorr

9.4 Experiments

9.5 Further Remarks

Bibliography

10 Analog Over‐the‐Air Federated Learning: Design and Analysis

10.1 Introduction

10.2 System Model

10.3 Analog Over‐the‐Air Model Training

10.4 Convergence Analysis

10.5 Numerical Results

10.6 Conclusion

Bibliography

11 Federated Edge Learning for Massive MIMO CSI Feedback

11.1 Introduction

11.2 System Model

11.3 FEEL for DL‐Based CSI Feedback

11.4 Simulation Results

11.5 Conclusion

Bibliography

Note

12 User‐Centric Decentralized Federated Learning for Autoencoder‐Based CSI Feedback

12.1 Autoencoder‐Based CSI Feedback

12.2 User‐Centric Online Training for AE‐Based CSI Feedback

12.3 Multiuser Online Training Using Decentralized Federated Learning

12.4 Numerical Results

12.5 Conclusion

Bibliography

Index

End User License Agreement

List of Tables

Chapter 6

Table 6.1 Summary of key notations.

Table 6.2 Key simulation parameters.

Table 6.3 Structure of the neural network.

Chapter 8

Table 8.1 Unbalanced data sizes for devices, with MNIST and CIFAR‐10 dat...

Table 8.2 Parameter settings related to the communication, computation, and...

Table 8.3 Test accuracy comparisons among different algorithms.

Chapter 9

Table 9.1 List of datasets used in our experiments.

Table 9.2 Average (five trials) and standard deviation of the best test acc...

Table 9.3 Average (five trials) and standard deviation of the best test accu...

Table 9.4 Best test accuracies on Clothing1M with IID setting.

Table 9.5 Average (five trials) and standard deviation of the best test acc...

Table 9.6 A comparison of communication efficiency for different methods on...

Table 9.7 Ablation study results (average and standard deviation of five tr...

Chapter 11

Table 11.1 Channel generation settings.

Chapter 12

Table 12.1 in dB of multiuser user‐centric online training using DFL.

Table 12.2 in dB of multiuser user‐centric online training using DFL with...

List of Illustrations

Chapter 1

Figure 1.1 The system model of wireless FL.

Figure 1.2 The test accuracy of optimized FL with Algorithm 1.3 trained by M...

Figure 1.3 Test accuracy and energy consumption for FL followed by different...

Figure 1.4 Test accuracy and energy consumption for FL followed by different...

Chapter 2

Figure 2.1 Test accuracy performance with different data distribution diverg...

Figure 2.2 Test accuracy performance with balanced and unbalanced datasets. ...

Figure 2.3 Performance of FL with optimized data sharing scheme, (a) test ac...

Chapter 3

Figure 3.1 FL‐enabled wireless edge networks.

Figure 3.2 The process of a communication round.

Figure 3.3 Comparison of the probability of successful transmission in the u...

Figure 3.4 Comparison of the probability of successful transmission in the d...

Figure 3.5 Comparison of bandwidth consumption in the uplink.

Figure 3.6 Comparison of bandwidth consumption in the downlink.

Figure 3.7 Comparison of the computing resource consumption.

Figure 3.8 Local training during each communication round.

Figure 3.9 Convergence.

Figure 3.10 Comparison of the global accuracy loss.

Figure 3.11 The relationship between training accuracy and testing accuracy....

Figure 3.12 The relationship between available bandwidth in the uplink and g...

Figure 3.13 The relationship between available computing resources and local...

Figure 3.14 The trade‐off between the computing resources and bandwidth.

Chapter 4

Figure 4.1 The NS‐based mobile network model.

Figure 4.2 The hybrid federated deep reinforcement learning architecture‐bas...

Figure 4.3 The process of HDRL.

Figure 4.4 The relationship between the number of communication rounds and t...

Figure 4.5 Convergence of HDRL.

Figure 4.6 Partial convergence curve of Figure 4.5 within .

Figure 4.7 The performance of the total long‐term reward.

Figure 4.8 Comparison of network throughput as a function of the number of d...

Figure 4.9 Comparison of handoff cost of four schemes.

Figure 4.10 Comparison of communication efficiency of four schemes.

Chapter 5

Figure 5.1 Knowledge distillation‐based federated learning data fusion archi...

Figure 5.2 Accuracy on Mnist with different differential privacy protection ...

Figure 5.3 Accuracy on EMnist with different differential privacy protection...

Chapter 6

Figure 6.1 An illustration of ultradense mmWave network.

Figure 6.2 Beam management operations in time slot .

Figure 6.3 Beam management based on FL (BMFL) in UDmmN.

Figure 6.4 The process of a communication round.

Figure 6.5 Procedure of Wave‐assisted mmWave association.

Figure 6.6 Convergence of the BMFL in UDmmN.

Figure 6.7 Comparisons of user coverage for the BMFL, BFS, and EDB, versus (...

Figure 6.8 Comparisons of network throughput for the BMFL, BFS, and EDB, giv...

Figure 6.9 Comparisons of user coverage for the BMFL, BMCL, and BMDL, versus...

Figure 6.10 Comparisons of network throughput for the BMFL, BMCL, and BMDL, ...

Chapter 7

Figure 7.1 Three different caching schemes in the communication community, (...

Figure 7.2 Blockchain and federated learning framework of the D2D caching sy...

Figure 7.3 Blockchain‐empowered federated model update flow chart.

Figure 7.4 Training process of BDRFL under different learning rate.

Figure 7.5 Training process of BDRFL with different DQN‐based methods.

Figure 7.6 The reduced value of average latency under different UE numbers. ...

Chapter 8

Figure 8.1 An illustration of FEEL with heterogeneous devices.

Figure 8.2 Average scheduled data size per round versus .

Figure 8.3 Average scheduled data size per round versus .

Figure 8.4 Completion time of global aggregation versus rounds.

Figure 8.5 Average energy consumption per device versus device index.

Figure 8.6 Learning performance with MNIST dataset,  s,  J. (a) Test accur...

Figure 8.7 Learning performance with MNIST dataset,  s,  J. (a) Test accur...

Figure 8.8 Learning performance with CIFAR‐10 dataset,  s,  J. (a) Test ac...

Figure 8.9 Learning performance with CIFAR‐10 dataset,  s,  J. (a) Test ac...

Chapter 9

Figure 9.1 An overview of FedCorr, organized into three stages. Algorithm st...

Figure 9.2 Depiction of non‐IID partitions for different parameters.

Figure 9.3 Empirical evaluation of LID score (a) and cumulative LID score (b...

Figure 9.4 Best test accuracies of three FL methods combined with FedCorr on...

Chapter 10

Figure 10.1 An illustration of the model training process. In each round, ag...

Figure 10.2 The examples of the ‐stable random variables. (a) Plots the pro...

Figure 10.3 Simulation results of the training loss of MLP on the MNIST data...

Figure 10.4 Simulation results of the training loss of MLP on the MNIST data...

Figure 10.5 Simulation results for training on the MNIST data set, under dif...

Chapter 11

Figure 11.1 Illustration of DL‐based CSI feedback and the detailed architect...

Figure 11.2 Illustration of FEEL‐based training framework for DL‐based CSI f...

Figure 11.3 Position of UEs in UMi.

Figure 11.4 Performance of IL‐, CL‐, and FEEL‐based training frameworks for ...

Figure 11.5 Performance of FEEL‐based training framework for DL‐based CSI fe...

Figure 11.6 Performance of FEEL‐based training framework for DL‐based CSI fe...

Chapter 12

Figure 12.1 Illustration of the AE‐based CSI feedback.

Figure 12.2 Illustration of the user‐centric online training for AE‐based CS...

Figure 12.3 Illustration of the framework, which edits the CSI before compre...

Figure 12.4 Illustration of the framework, which directly edits the codeword...

Figure 12.5 Illustration of the framework, which maintains the original enco...

Figure 12.6 Illustration of the contact network among the neighbored UE.

Figure 12.7 Four different network topologies. (a) Line, (b) ring, (c) multi...

Figure 12.8 ...

Figure 12.9 (dB) against when the UE moves to a new area.

Guide

Cover

Table of Contents

Series Page

Title Page

Copyright Page

About the Editors

Preface

Begin Reading

Index

End User License Agreement


IEEE Press

445 Hoes Lane

Piscataway, NJ 08854

IEEE Press Editorial Board

Sarah Spurgeon, Editor in Chief

Jón Atli Benediktsson

Anjan Bose

James Duncan

Amin Moeness

Desineni Subbaram Naidu

Behzad Razavi

Jim Lyke

Hai Li

Brian Johnson

Jeffrey Reed

Diomidis Spinellis

Adam Drobot

Tom Robertazzi

Ahmet Murat Tekalp

Federated Learning for Future Intelligent Wireless Networks

Edited by

 

Yao Sun

University of Glasgow, UK

 

Chaoqun You

Singapore University of Technology and Design, Singapore

 

Gang Feng

University of Electronic Science and Technology of China, China

 

Lei Zhang

University of Glasgow, UK

Copyright © 2024 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750‐8400, fax (978) 750‐4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at http://www.wiley.com/go/permission.

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762‐2974, outside the United States at (317) 572‐3993 or fax (317) 572‐4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging‐in‐Publication Data Applied for:

[Hardback ISBN: 9781119913894]

Cover Design: Wiley

Cover Image: © Blue Planet Studio/Shutterstock

About the Editors

Yao Sun is currently a lecturer with the James Watt School of Engineering, University of Glasgow, Glasgow, United Kingdom. He has extensive research experience and has published widely in wireless networking research. He won the IEEE Communications Society TAOS Best Paper Award at ICC 2019, the IEEE IoT Journal Best Paper Award in 2022, and the Best Paper Award at the 22nd ICCT. He has been a guest editor for special issues of several international journals. He has served as TPC Chair for UCET 2021 and as a TPC member for a number of international flagship conferences, including ICC 2022, VTC Spring 2022, GLOBECOM 2020, and WCNC 2019. His research interests include intelligent wireless networking, semantic communications, blockchain systems, and resource management in next‐generation mobile networks. He is a senior member of IEEE.

Chaoqun You is a postdoctoral research fellow at the Singapore University of Technology and Design (SUTD). She received the BS degree in communication engineering and the PhD degree in communication and information systems from the University of Electronic Science and Technology of China (UESTC) in 2013 and 2020, respectively. She was a visiting student at the University of Toronto from 2015 to 2017. Her current research interests include mobile edge computing, network virtualization, O‐RAN, federated learning, and 6G.

Gang Feng received his BEng and MEng degrees in electronic engineering from the University of Electronic Science and Technology of China (UESTC) in 1986 and 1989, respectively, and his PhD degree in information engineering from the Chinese University of Hong Kong in 1998. He is currently a professor with the National Key Laboratory of Wireless Communications, UESTC. His research interests include resource management in wireless networks and next‐generation cellular networks. Dr. Feng is a senior member of IEEE.

Lei Zhang is a Professor of Trustworthy Systems at the University of Glasgow. He has combined academic and industrial research experience on wireless communications and networks and on distributed systems for IoT, blockchain, and autonomous systems. His 20 patents have been granted or filed in more than 30 countries and regions. He has published 3 books and more than 150 papers in peer‐reviewed journals, conferences, and edited books. He received the IEEE Internet of Things Journal Best Paper Award 2022, the IEEE ComSoc TAOS Technical Committee Best Paper Award 2019, and the IEEE ICEICT'21 Best Paper Award.

Preface

Network edge intelligence has been considered one of the key missing components in the existing 5G network and is widely recognized as one of the most sought‐after functions for next‐generation 6G communication systems. Nowadays, there are more than 10 billion Internet‐of‐Things (IoT) devices and 5 billion smartphones equipped with artificial intelligence (AI)‐empowered computing modules such as AI chips and GPUs. On the one hand, the user equipment (UE) can potentially be deployed as computing nodes to process certain emerging service tasks, such as crowdsensing and collaborative tasks, which paves the way for applying AI in edge networks. On the other hand, in the paradigm of machine learning (ML), the powerful computing capability of these UEs can decouple ML from acquiring, storing, and training data in data centers, as conventional methods do.

Federated learning (FL) has been widely acknowledged as one of the most essential enablers to bring network edge intelligence into reality, as it enables collaborative training of ML models while enhancing individual user privacy and data security. Empowered by the growing computing capabilities of UEs, FL trains ML models locally on each device, so the raw data never leaves the device. Specifically, FL uses an iterative approach that requires a number of global iterations to achieve a target global model accuracy. In each global iteration, UEs perform a number of local iterations to reach a given local model accuracy. As a result, implementing FL at edge networks can also decrease the cost of transmitting raw data, relieve the burden on backbone networks, and reduce the latency of real‐time decisions.
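To make this iterative structure concrete, the following is a minimal Python sketch of a generic FL training loop with local SGD and weighted global averaging. It is an illustration only, not the procedure of any particular chapter; the toy least-squares model and the names local_update and federated_round are assumptions introduced for this example.

```python
import numpy as np

def local_update(w, data, lr=0.1, local_iters=5):
    """A few local SGD steps on one device (toy least-squares loss)."""
    X, y = data
    for _ in range(local_iters):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error / 2
        w = w - lr * grad
    return w

def federated_round(w_global, datasets):
    """One global iteration: local training on every UE, then weighted averaging."""
    local_models = [local_update(w_global.copy(), d) for d in datasets]
    sizes = np.array([len(d[1]) for d in datasets], dtype=float)
    return np.average(local_models, axis=0, weights=sizes)

# Toy example: three devices sharing one linear model
rng = np.random.default_rng(0)
datasets = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(3)
for _ in range(10):                          # global iterations
    w = federated_round(w, datasets)
```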

This book explores recent advances in the theory and practice of FL, especially when it is applied to wireless communication systems. Specifically, the book covers the following aspects:

1) principles and fundamentals of FL;

2) performance analysis of FL in wireless communication systems;

3) how future wireless networks (say, 6G networks) enable FL, as well as how FL frameworks and algorithms can be optimized when applied to wireless networks (6G);

4) FL applications to vertical industries and some typical communication scenarios.

Chapter 1 investigates the optimization design of FL in the edge network. First, an optimization problem is formulated to manage the trade‐off between model accuracy and training cost. Second, a joint optimization algorithm is designed to optimize the model compression, sample selection, and user selection strategies, which can approach a stationary optimal solution in a computationally efficient way. Finally, the performance of the proposed optimization scheme is evaluated by numerical simulation and experiment results, which show that both the accuracy loss and the cost of FL in the edge network can be reduced significantly by employing the proposed algorithm.

Chapter 2 studies a non‐IID data model for FL, derives a theoretical upper bound, and redesigns the federated averaging scheme to reduce the weight difference. To further mitigate the impact of non‐IID data, a data‐sharing scheme is designed to jointly minimize the accuracy loss, energy consumption, and latency under the constrained resources of edge systems. A computation‐efficient algorithm is then proposed to approach the optimal solution, and experiment results are provided to evaluate the proposed schemes.

Chapter 3 theoretically analyzes the performance and cost of running FL, which is imperative for deeply understanding the relationship between FL performance and multidimensional resources. In this chapter, we construct an analytical model to investigate the relationship between the FL model accuracy and the consumed resources in FL‐enabled wireless edge networks. Based on the analytical model, we explicitly quantify the model accuracy, computing resources, and communication resources. Numerical results validate the effectiveness of our theoretical modeling and analysis and demonstrate the trade‐off between the communication and computing resources required to achieve a certain model accuracy.

Chapter 4 proposes an efficient device association scheme for radio access network (RAN) slicing by exploiting a federated reinforcement learning framework, with the aim of improving network throughput while guaranteeing user privacy and data security. Specifically, we use deep reinforcement learning to train local models on UEs under a hybrid FL framework, where horizontal FL is employed for parameter aggregation at the BS, while vertical FL is employed for access selection aggregation at the encrypted party. Numerical results show that our proposed scheme can achieve significant performance gains in terms of network throughput and communication efficiency in comparison with some known state‐of‐the‐art solutions.

Chapter 5 proposes a deep FL algorithm that utilizes knowledge distillation and differential privacy to safeguard privacy during the data fusion process. Our approach involves adding Gaussian noise at different stages of knowledge distillation‐based FL to ensure privacy protection. Our experimental results demonstrate that this strategy provides better privacy preservation while achieving high‐precision IoT data fusion.

Chapter 6 presents a novel systematic beam control scheme to tackle the formulated beam management problem, which is difficult due to the nonconvex objective function. The double deep Q‐network (DDQN) under an FL framework is employed to solve the above optimization problem, thereby enabling adaptive and intelligent beam management in mmWave networks. In the proposed beam management scheme based on federated learning (BMFL), the non‐raw‐data aggregation can theoretically protect user privacy while reducing handoff costs. Moreover, a data cleaning technique is used before the local model training, with the aim of further strengthening the privacy protection while improving the learning convergence speed. Simulation results demonstrate the performance gain of the proposed BMFL scheme.

Chapter 7 proposes a double‐layer blockchain‐based deep reinforcement federated learning (BDRFL) scheme to ensure privacy‐preserved and caching‐efficient D2D networks. In BDRFL, a double‐layer blockchain is utilized to further enhance data security. Simulation results first verify the convergence of BDRFL‐based algorithm and then demonstrate that the download latency of the BDRFL‐based caching scheme can be significantly reduced under different types of attacks when compared with some existing caching policies.

Chapter 8 aims to design a dynamic scheduling policy that exploits spectrum flexibility for heterogeneous federated edge learning (FEEL), so as to facilitate distributed intelligence in edge networks. This chapter formulates a heterogeneity‐aware dynamic scheduling problem to minimize the global loss function, taking into account stragglers and limited device energy. By solving the formulated problem, we propose a dynamic scheduling algorithm (DISCO) to make an intelligent decision on the set and order of scheduled devices in each communication round. Theoretical analysis reveals that, under certain conditions, learning performance and energy constraints can be guaranteed by DISCO. Finally, we demonstrate the superiority of DISCO through numerical and experimental results.

Chapter 9 discusses FedCorr, a general multistage framework to tackle heterogeneous label noise in FL, which does not make any assumptions on the noise models of local clients while still maintaining client data privacy. Both theoretical analysis and experiment results demonstrate the performance gain of this novel FL framework.

Chapter 10 provides a general overview of the analog over‐the‐air federated learning (AirFL) system. Specifically, we illustrate the general system architecture and highlight the salient feature of AirFL, which adopts analog transmissions for fast (but noisy) aggregation of intermediate parameters. Then, we establish a new convergence analysis framework that takes into account the effects of fading and interference noise. Our analysis unveils the impact of the intrinsic properties of wireless transmissions on the convergence performance of AirFL. The theoretical findings are corroborated by extensive simulations.

Chapter 11 investigates a FEEL‐based training framework for DL‐based channel state information (CSI) feedback. In FEEL, each UE trains an autoencoder network locally and exchanges model parameters via the base station. Therefore, data privacy is better protected than with centralized learning, because the local CSI datasets do not need to be uploaded. Neural network parameter quantization is then introduced into the FEEL‐based training framework to reduce communication overhead. The simulation results indicate that the proposed FEEL‐based training framework can achieve performance comparable to centralized learning.

Chapter 12 proposes a user‐centric online training strategy in which the UE collects CSI samples in a stable area and adjusts the pretrained encoder online to further improve CSI reconstruction accuracy. Moreover, the proposed online training framework is extended to the multiuser scenario to further improve performance. The key idea is to adopt decentralized FL without BS participation to enable the sharing of channel knowledge among UEs, which is called crowd intelligence. Simulation results show that the decentralized FL‐aided framework achieves higher feedback accuracy than the AE without online training.

November 2023

Yao Sun

Chaoqun You

Gang Feng

Lei Zhang

1Federated Learning with Unreliable Transmission in Mobile Edge Computing Systems

Chenyuan Feng1, Daquan Feng1, Zhongyuan Zhao2, Howard H. Yang3, and Tony Q. S. Quek4

1Shenzhen Key Laboratory of Digital Creative Technology, The Guangdong Province Engineering Laboratory for Digital Creative Technology, The Guangdong‐Hong Kong Joint Laboratory for Big Data Imaging and Communication, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, Guangdong, China

2State Key Laboratory of Networking and Switching Technology, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China

3Zhejiang University/University of Illinois at Urbana‐Champaign Institute, Zhejiang University, The College of Information Science and Electronic Engineering, Haining, Zhejiang, China

4Information Systems Technology and Design Pillar, Singapore University of Technology and Design, Singapore

1.1 System Model

Consider the deployment of FL in an MEC scenario, which consists of an edge access point and multiple users . An edge computing server is equipped with , while a local computing unit is equipped with , . As shown in Figure 1.1, the edge computing server and local computing units can act as the computing server and the clients, respectively, which can interact with each other via the wireless channels between and .

As introduced previously, FL can be implemented by updating the local models and the global model iteratively. In particular, we focus on the th iteration, which can be introduced as follows.

1.1.1 Local Model Training

In this phase, each user updates its local model independently based on its locally collected data. Without loss of generality, we focus on a specific user , whose local model can be updated as follows:

(1.1)

Figure 1.1 The system model of wireless FL.

where and denote the update results of 's local model during the th and ‐th iterations, respectively, denotes the training dataset for updating , which is randomly selected from , , denotes the local dataset located at , is the learning rate of the th iteration, and is the gradient of loss function with respect to . In this chapter, the loss function is defined as the empirical risk with respect to , which can be defined as follows:

(1.2)

where denotes the loss function of the data element , and denotes the size of .
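As a concrete illustration of the local update in (1.1) and the empirical risk in (1.2), here is a hedged Python sketch using a toy logistic loss. The original notation is not reproduced in this extraction, so the names loss, grad_loss, empirical_risk, and local_sgd_step are placeholders introduced only for this example.

```python
import numpy as np

def empirical_risk(w, batch, loss):
    """Empirical risk in the spirit of (1.2): average per-sample loss over the batch."""
    return np.mean([loss(w, x, y) for x, y in batch])

def local_sgd_step(w_prev, batch, grad_loss, lr):
    """Local update in the spirit of (1.1): one SGD step on a randomly drawn mini-batch."""
    grad = np.mean([grad_loss(w_prev, x, y) for x, y in batch], axis=0)
    return w_prev - lr * grad

# Toy logistic-regression loss and its gradient (illustrative only)
loss = lambda w, x, y: np.log(1 + np.exp(-y * (w @ x)))
grad_loss = lambda w, x, y: -y * x / (1 + np.exp(y * (w @ x)))

rng = np.random.default_rng(1)
batch = [(rng.normal(size=4), rng.choice([-1.0, 1.0])) for _ in range(32)]
w = local_sgd_step(np.zeros(4), batch, grad_loss, lr=0.05)
risk = empirical_risk(w, batch, loss)
```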

1.1.2 Update Result Feedback via the Wireless Channels

When the local model training procedure is accomplished, should transmit its update result to via the wireless channel. In existing works, the server randomly selects the users, since it is assumed that the communications between the computing server and the clients are ideal. However, this cannot be ensured in MEC systems due to unreliable wireless transmission conditions, which cause accuracy loss in FL models. Therefore, in this chapter, only users with high communication reliability and low model accuracy loss are scheduled to participate in each iteration of global model averaging. In particular, the scheduling status of for the th iteration of global averaging is characterized by a Boolean variable , i.e.,

(1.3)

If is scheduled, its update result can be modeled as a vector, which usually has a high dimension, especially for deep neural network models. Therefore, to improve the efficiency of the update result feedback, model sparsification and parameter quantization techniques should be employed to compress . As introduced previously, can be transformed into a sparse form via model sparsification, which can be expressed as follows:

(1.4)

where denotes a sparsification matrix for .

Next, each element of is quantized independently by employing uniform quantization. The quantization error can be approximated as additive Gaussian noise that is independent of . Then the quantized parameter vector can be expressed as

(1.5)

where denotes a quantization noise vector, i.e., , and denotes the covariance matrix. Due to the use of independent quantization, the elements of are mutually independent, i.e., , , , where and denote the th and th elements of , respectively. Therefore, is a diagonal matrix, which can be denoted as .
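The following sketch illustrates one plausible realization of the sparsification and uniform quantization steps just described, with the quantization error treated as additive noise of per-element variance delta**2 / 12 (a diagonal covariance, consistent with the independence assumption above). Top-k sparsification and the 4-bit quantizer are assumptions made for this example, not the specific design of the chapter.

```python
import numpy as np

def sparsify_topk(w, k):
    """Keep the k largest-magnitude entries (one simple choice of sparsification)."""
    s = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    s[idx] = w[idx]
    return s

def uniform_quantize(s, n_bits=4):
    """Uniform quantization; the error acts roughly like additive noise with
    per-element variance delta**2 / 12 (diagonal covariance, as assumed above)."""
    lo, hi = s.min(), s.max()
    levels = 2 ** n_bits
    delta = (hi - lo) / (levels - 1) if hi > lo else 1.0
    q = lo + np.round((s - lo) / delta) * delta
    return q, delta ** 2 / 12.0

rng = np.random.default_rng(2)
w_local = rng.normal(size=1000)
q, noise_var = uniform_quantize(sparsify_topk(w_local, k=100), n_bits=4)
```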

After model sparsification and parameter quantization, is suitable for baseband processing and wireless transmissions. In this chapter, the flat fading model is employed to characterize the wireless channels between and . Therefore, the channel fading can be assumed to be unchanged during the transmission of . Then the observation of at can be expressed as

(1.6)

where captures the flat channel fading of the wireless link between and , denotes the additive white Gaussian noise at , i.e., , denotes an identity matrix, and is the power of the noise.

1.1.3 Global Model Averaging

To recover the update results of local models, should be first decompressed by . In this chapter, the minimum mean‐square error (MMSE) criterion is employed, and the decompression result can be written as

(1.7)

where is a decompression matrix of , denotes a set that consists of all the possible quantized parameter vectors, i.e., . Since each element of the quantized model parameter vector can be detected individually, recalling the computational complexity of MMSE, the complexity of this detection is a linear function of the vector dimension.1
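A minimal sketch of the element-wise MMSE detection described above, assuming a flat-fading channel, AWGN, and a uniform prior over a finite quantization alphabet. The alphabet, fading gain, and noise variance below are illustrative values rather than parameters taken from the chapter; the complexity is linear in the vector dimension, as noted in the text.

```python
import numpy as np

def mmse_detect_elementwise(y, h, levels, noise_var):
    """Element-wise MMSE detection of quantized parameters over a finite alphabet.

    For each received sample y_i = h * q_i + n_i, the MMSE estimate is the
    posterior mean over the candidate quantization levels under a uniform prior.
    """
    y = np.asarray(y)[:, None]                      # shape (dim, 1)
    lik = np.exp(-np.abs(y - h * levels[None, :]) ** 2 / (2 * noise_var))
    post = lik / lik.sum(axis=1, keepdims=True)     # posterior over the levels
    return post @ levels                            # posterior mean per element

rng = np.random.default_rng(3)
levels = np.linspace(-1.0, 1.0, 16)                 # 4-bit uniform alphabet
q_true = rng.choice(levels, size=200)
h, noise_var = 0.8, 0.05
y = h * q_true + rng.normal(scale=np.sqrt(noise_var), size=200)
q_hat = mmse_detect_elementwise(y, h, levels, noise_var)
```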

Then the global model can be updated by averaging the decompressed results of local models. As introduced in Konecný et al. [2016], the update result of global model can be expressed as

(1.8)

where and denote the global model for the th and ‐th iterations, respectively, is defined by (1.3), .

After global model averaging, is sent back to the users. Since is transmitted via the downlink, which can be allocated more radio resources and higher transmit power than the local model update phase, it can be assumed that is received successfully by all the users. Then the local model of each user can be updated as , .
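The global averaging step in (1.8), restricted to the scheduled users and weighted by the local dataset sizes, can be sketched as follows; weighting by dataset size is a common choice assumed here for illustration.

```python
import numpy as np

def global_average(recovered_models, schedule, sizes):
    """Weighted averaging over the scheduled users only, in the spirit of (1.8).

    recovered_models: decompressed local parameter vectors
    schedule:         Boolean scheduling indicators (the variables in (1.3))
    sizes:            local training-set sizes used as aggregation weights
    """
    weights = np.asarray(schedule, dtype=float) * np.asarray(sizes, dtype=float)
    stacked = np.stack(recovered_models)
    return weights @ stacked / weights.sum()

models = [np.ones(5) * v for v in (1.0, 2.0, 3.0)]
w_global = global_average(models, schedule=[1, 0, 1], sizes=[600, 600, 300])
```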

1.2 Problem Formulation

The performance of existing learning techniques is mainly determined by the accuracy of the generated learning models, which is difficult to model in a tractable form and thus cannot be optimized with the existing resource management schemes in MEC systems. In this section, we first derive a closed‐form upper bound of the model accuracy loss, which is an efficient metric for evaluating the quality of FL models. Then an optimization problem is formulated to improve the model accuracy and training efficiency of FL under a limited budget of computation and communication resources.

1.2.1 Model Accuracy Loss

As introduced in Konecný et al. [2016], the objective of model training is to minimize the expected risk, which can be estimated by employing the empirical risk given by (1.2). Therefore, the model accuracy loss, defined as the expected mean square error (MSE) between the empirical risk and the expected risk, is an efficient metric for evaluating the accuracy of the generated models. In this chapter, the model accuracy loss after iterations of global averaging can be expressed as

(1.9)

where denotes the empirical risk with respect to for the th iteration, denotes the minimum expected risk of our studied FL model, i.e., , and denotes the theoretically optimal parameter vector.

As introduced in Shai et al. [2010], to guarantee the convergence of FL, the following properties of the empirical risk should be satisfied:

is ‐Lipschitz, i.e., for any and , we have

is ‐strongly convex, i.e., for any and , and , the following inequality can be established:

By substituting (1.8) into (1.9), an upper bound of can be expressed as follows due to the convexity of :

(1.10)

In the paradigms of FL, each iteration can be optimized independently. Without loss of generality, we focus on the optimization of the th iteration in this chapter, and thus in (1.10) can be treated as a constant. Therefore, is mainly decided by , which can be derived as follows:

(1.11)

where is given by (1.2), and denotes the expected risk with respect to . As shown in (1.11), is determined by the communication loss , sample selection loss , and model training loss , which can be derived as follows.

1.2.2 Communication Loss

The communication loss is jointly decided by the model compression loss and the communication error caused by wireless transmissions. Then an upper bound of can be provided by the following lemma.

Lemma 1.1 When is ‐Lipschitz, an upper bound of the communication loss in (1.11) can be expressed as

(1.12)

where is the power of the quantization noise, and is the power of the Gaussian noise.

1.2.3 Sample Selection Loss

Although the training dataset is generated based on the distribution of , the sample selection bias still exists due to the limited size of . Since is trained based on , the sample selection bias causes error propagation during the model training procedure, which is characterized as sample selection loss in this chapter. As shown in (1.11), is defined as the difference between the empirical risk with respect to and the expected risk with respect to the distribution of data elements. As introduced in Devroye et al. [1996], an upper bound of can be provided as follows.

Remark 1.1 (Devroye et al. [1996], Theorem 12.6) When is convex, a tractable upper bound of can be expressed as follows:

(1.13)

1.2.4 Model Training Loss

As introduced in Section 1.1, is generated by employing the stochastic gradient descent (SGD) method, which optimizes the model in an iterative way. Theoretically, it can approach the optimal training result (Rakhlin et al. [2017]). However, there always exists a gap between and , since only a few iterations can be executed to guarantee computing efficiency. In this chapter, this gap is captured as the model training loss in (1.11). To evaluate its impact on the model accuracy, the following lemma is provided.

Lemma 1.2 When is ‐Lipschitz and ‐strongly convex, and has a finite scale, i.e., , an upper bound of the model training loss can be written as

(1.14)

By substituting (1.11), (1.12), (1.13), and (1.14) into (1.10), an upper bound of can be derived, which is given by the following corollary.

Corollary 1.1 An upper bound of model accuracy loss given by (1.9), which is denoted as , can be expressed as

(1.15)

It is worth mentioning that, thanks to the following observations, it is unnecessary to obtain and at high computation cost. First, we can obtain without additional computation overhead, since the global loss can be computed as a linear combination of the local losses of the participants, and each local loss is obtained by via local model training. Second, the true value of is known for some specific learning tasks, such as supervised multiclassification tasks. As introduced in McMahan et al. [2017], since all image data can be labeled correctly when the globally optimal parameter setting is obtained, the corresponding minimum expected risk can be set as for supervised multiclassification tasks. Moreover, when the true value of is unknown, zero can be treated as a lower bound of the expected risk function, and we also have since is the globally optimal solution. Thus, an upper bound of can be derived as , which still captures the relationship between the model accuracy loss and the selection strategies for participants and data samples, and which can be substituted into (1.15) instead of .

1.2.5 Problem Formulation

1.2.5.1 Objective Function

Since the resources of MEC systems are restricted, a sophisticated trade‐off between model accuracy and cost should be struck to guarantee that a high‐quality model can be generated in a cost‐efficient way. In this chapter, two categories of costs are considered, namely the computation and communication costs.

Each data element is processed by the same learning model, which means the computation cost is proportional to the size of the employed training dataset. In accordance with Zhao et al. [2020] and Chen et al. [2021], we model the computation cost as a linear function of . The compression and communication cost is modeled as the mutual information between and , since the mutual information characterizes the capacity of the wireless link required for a compressed signal to be successfully recovered (Zhang et al. [2007]). This standard rate‐distortion‐theoretic metric (El Gamal and Kim [2011]) in the analysis of the compression design can provide useful insight for deploying FL with successful compression schemes. Then the model cost of FL can be expressed as

(1.16)

where and denote the weights of computation and communication costs, respectively.

In this chapter, our target is to jointly minimize the model accuracy loss and the costs of FL, and the objective function can be modeled as

(1.17)

where and denote the weights of and , respectively.

1.2.5.2 Energy Consumption Constraint

Due to the limitation of battery volume, the energy consumption of each user should be restricted independently. During the th iteration, the user energy consumption is mainly caused by local model training and update result feedback. In particular, the energy consumption of local model training grows linearly with respect to the size of training dataset , and the transmit energy consumption can be derived based on (1.5). Then the energy consumption constraint can be expressed as follows:

(1.18)

where denotes the coefficient of energy consumption caused by processing a single data element in the local model training phase, and is the maximum energy consumption for the th iteration of FL.

1.2.5.3 User Selection Constraint

As introduced in Section 1.1, only a subset of users is selected to feed back their update results. To guarantee the convergence performance and computing efficiency, the maximum number of selected users in the th iteration is set as , . Recalling (1.3), the following user selection constraint is established:

(1.19)

1.2.5.4 Data Volume Constraint of Local Training Datasets

By employing sophisticated sample selection strategies, the impact of the sample selection loss can be mitigated significantly. In particular, the local training dataset is selected randomly from , and thus its data volume should follow the constraint

(1.20)

where denotes the data volume of , and denotes the set of nonnegative integers. In this chapter, we focus on the minimization of model accuracy loss and cost of FL in the MEC systems, which can be captured by the upper bound given by (1.15). Therefore, the optimization problem can be formulated as follows:

(1.21)

1.3 A Joint Optimization Algorithm

The established optimization problem given by (1.21) is nonlinear and nonconvex. In this section, to obtain a tractable solution, it is first decoupled into three independent subproblems, and an iterative optimization algorithm is designed. We then prove that the proposed algorithm can approach a stationary optimal solution.

1.3.1 Compression Optimization

As shown in (1.15) and (1.16), and alone determine the accuracy loss and cost of communications, and thus an equivalent form of the objective function can be obtained by removing the terms that are not related to and :

(1.22)

where

(1.23)
(1.24)

Equation (1.22) indicates that the compression of each user can be optimized independently by minimizing , which is restricted by an individual energy consumption constraint. Therefore, the optimization of compression and decompression matrices can be transformed into ‐dependent subproblems, which can be expressed as

(1.25)

We first consider solving and independently, and then propose Algorithm 1.1 to optimize them jointly.

1.3.1.1 Optimization of

The optimization problem of is identical to (1.25) when is fixed. First, we verify the convexity of , , and , which is established by the following lemma. Since , , and can be treated as functions of , they are denoted as , , , and in this part, respectively.

Lemma 1.3 and given by (1.23) are both convex, while given by (1.24) is concave.

Lemma 1.3 shows that (1.25) is a convex–concave procedure problem, in which both the objective function and the constraints can be treated as a difference of two convex functions (Stephen and Lieven [2004]). It can be solved by using the majorization–minimization algorithm. The key idea is to replace the concave terms, i.e., in (1.25), by using successive convex approximation, which transforms the original optimization problem into a convex form. As shown in Algorithm 1.1, a stationary solution of (1.25) can be obtained in an iterative manner. Without loss of generality, we focus on the th iteration. By using a first‐order Taylor expansion, can be approximated in a convex form, which can be expressed as follows:

(1.26)

where denotes the compression matrix of the th iteration, and is the update result of the ‐th iteration. Based on (1.26), an approximated problem of (1.25) can be established as follows:

(1.27)

The approximated problem given by (1.27) is a convex problem satisfying the KKT conditions, which can be solved efficiently by using an optimization package (Stephen and Lieven [2004]).

In Algorithm 1.1, can be optimized iteratively, which can be updated by solving (1.27). To verify the convergence of Step 1 in Algorithm 1.1, the following theorem with respect to the descent of is provided.

Theorem 1.1 Denoting as the optimal solution of (1.27) for the th iteration, the following inequality with respect to and can be established:

(1.28)

Theorem 1.1 indicates that keeps decreasing as increases. Moreover, a tractable lower bound of is 0. Therefore, Step 1 of Algorithm 1.1 can converge to a stationary point.
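To illustrate the convex–concave procedure used in Algorithm 1.1 (linearize the concave part, minimize the convex surrogate, repeat), here is a toy one-dimensional sketch on a difference-of-convex objective. The specific function x**4 - (x - 1)**2 is an assumption chosen only so that the surrogate has a closed-form minimizer; it is not an objective from the chapter.

```python
import numpy as np

def ccp_minimize(x0, iters=30):
    """Convex-concave procedure on the toy DC objective f(x) = x**4 - (x - 1)**2.

    At each iteration the subtracted convex term (x - 1)**2 is replaced by its
    first-order Taylor expansion at the current point, and the resulting convex
    surrogate x**4 - const - slope * x is minimized in closed form.  The objective
    value is non-increasing across iterations, mirroring the descent in Theorem 1.1.
    """
    f = lambda x: x ** 4 - (x - 1) ** 2
    x = x0
    for _ in range(iters):
        slope = 2 * (x - 1)            # derivative of the subtracted convex part
        x = np.cbrt(slope / 4.0)       # minimizer of the convex surrogate
    return x, f(x)

x_star, f_star = ccp_minimize(x0=0.0)  # converges to a stationary point near x = -1
```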

1.3.1.2 Optimization of

When the compression matrix is fixed, the optimization is equivalent to solving the following unconstrained problem:

(1.29)

To provide a tractable solution of (1.29), the following lemma with respect to the convexity of is provided.

Lemma 1.4 given by (1.23) is a convex function with respect to.

Based on Lemma 1.4, the optimal solution of (1.29) can be derived straightforwardly by solving the equation and can be expressed as

(1.30)

1.3.2 Joint Optimization of and

As shown in Algorithm 1.1, and can be updated iteratively. During each iteration, can be optimized by solving (1.27), and is updated based on (1.30). As introduced previously, given by (1.22) keeps decreasing as increases, since and are updated based on the optimal solution. Therefore, the convergence of Algorithm 1.1 can be guaranteed.

1.3.3 Optimization of Sample Selection

When we focus on the optimization of sample selection, given by (1.17) is equivalent to the following function with respect to :

(1.31)

where , . Moreover, recalling (1.21), the constraints with respect to , which are given by (1.18) and (1.20), can be rewritten as follows:

(1.32)

where denotes the floor function with respect to its argument. Similar to (1.25), the sample selection of each user can be optimized independently, and then the optimization problem of sample selection can be established as follows:

(1.33)

To solve (1.33) efficiently, we first consider a relaxed problem in which can be treated as a continuous variable. This is a fractional programming problem, and its objective function is nonlinear and nonconvex with respect to . By Theorem 1 in Zhao et al. [2020], given by (1.31) can be transformed into a linear form, and its minimum value is achieved if and only if the following constraint is satisfied:

(1.34)

where

(1.35)

denotes the minimum value of , and is the corresponding optimal solution. Based on (1.34), the following optimization problem should be studied to obtain the optimal solution of (1.33):

(1.36)

Since is continuous and differentiable with respect to , the optimal solution of (1.36) lies at either a stationary point or a boundary point, which can be expressed as

(1.37)

where denotes the solution of equation .

Based on (1.34) and (1.37), Algorithm 1.2 can be designed to approach the optimal solution of (1.32) iteratively. In particular, during the th iteration, is first updated by solving the following problem:

(1.38)

where denotes the update results of for the th iteration, and is the value of for the ‐th iteration. The optimal solution of (1.38) can be obtained straightforwardly based on (1.37). Then can be updated as follows:

(1.39)

To evaluate the equivalence of (1.32) and (1.38), should be calculated as follows:

(1.40)

Based on Theorem 3 in Zhao et al. [2020], it can be proved that approaches 0 monotonically as increases, which means that Algorithm 1.2 can approach the optimal solution of the original fractional optimization problem.
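The iteration described above for the relaxed fractional problem follows the classic Dinkelbach pattern: minimize the parametric objective for the current multiplier, then update the multiplier with the achieved ratio until the parametric value reaches zero. The sketch below illustrates that pattern on a toy one-dimensional ratio, using a dense grid search in place of the closed-form solution in (1.37); the toy numerator and denominator are assumptions made for this example.

```python
import numpy as np

def dinkelbach_min_ratio(num, den, lo, hi, tol=1e-8, iters=100):
    """Dinkelbach-style iteration for minimizing num(x)/den(x) on [lo, hi], den > 0.

    Each pass minimizes the parametric function num(x) - lam * den(x) and updates
    lam with the resulting ratio; the parametric value F approaches zero
    monotonically, mirroring the convergence argument around (1.40).
    """
    grid = np.linspace(lo, hi, 10001)
    lam = num(grid[0]) / den(grid[0])
    for _ in range(iters):
        x = grid[np.argmin(num(grid) - lam * den(grid))]
        F = num(x) - lam * den(x)
        lam = num(x) / den(x)
        if abs(F) < tol:
            break
    return x, lam

# Toy instance: minimize (x**2 + 1) / (x + 2) on [0, 10]
x_opt, ratio = dinkelbach_min_ratio(lambda x: x ** 2 + 1, lambda x: x + 2, 0.0, 10.0)
```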

Denoting as the final update result that satisfies the termination constraints, the optimal solution of (1.32) can be expressed as follows by rounding:

(1.41)

where is the ceiling function with respect to its argument.

1.3.4 Optimization of User Selection

Based on (1.21), by fixing , , and , the subproblem of user selection can be written as follows:

(1.42)

where

(1.43)

given by (1.43) denotes the utility of for the th iteration of FL, which in practice is positive for suitably chosen and . Therefore, (1.42) is a 0–1 knapsack problem and can be solved efficiently by using dynamic programming. When is negative, in order to minimize , will never be selected, which means . After removing the users with from the set of candidate users, the remaining problem becomes a standard 0–1 knapsack problem, which can be solved by dynamic programming.
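A compact sketch of the 0–1 knapsack dynamic program mentioned above, with users of nonpositive utility removed beforehand; with unit weights (one scheduling slot per user, as in the constraint (1.19)) the capacity is simply the maximum number of users allowed per round. The utilities and weights below are illustrative values only.

```python
def knapsack_01(utilities, weights, capacity):
    """Standard 0-1 knapsack DP: maximize total utility under a weight budget.

    dp[c] stores the best (value, selected-index set) achievable with budget c;
    candidates with nonpositive utility are dropped before the DP runs.
    """
    candidates = [(u, w, i) for i, (u, w) in enumerate(zip(utilities, weights)) if u > 0]
    dp = [(0.0, frozenset())] * (capacity + 1)
    for u, w, i in candidates:
        for c in range(capacity, w - 1, -1):        # iterate budgets downwards
            val, chosen = dp[c - w]
            if val + u > dp[c][0]:
                dp[c] = (val + u, chosen | {i})
    return dp[capacity]

best_value, selected = knapsack_01(
    utilities=[3.0, -1.0, 2.5, 0.7, 1.2], weights=[1, 1, 1, 1, 1], capacity=3)
```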

1.3.5 A Joint Optimization Algorithm

Since all the subproblems can be solved efficiently, a joint optimization algorithm can be proposed by optimizing , , , and iteratively. As shown in Algorithm 1.3, during each iteration, the update results of and can be obtained by employing Algorithm 1.1, can be updated based on Algorithm 1.2, and can be obtained by solving (1.42).

As introduced previously, the optimal solution of each subproblem can be obtained, which guarantees that given by (1.17) keeps decreasing as the iteration index increases. Moreover, 0 can be treated as a fixed lower bound of , and thus Algorithm 1.3 converges to the optimal solution of (1.21) within a limited number of iterations. The final update results are obtained when the maximum number of iterations is reached or the accuracy requirement is satisfied. Due to the nonconvexity of the formulated problem, the proposed iterative solution can only guarantee convergence to a stationary point in a single run. To overcome this drawback, in our work we use random initialization and run multiple simulations. Moreover, our proposed algorithm can be executed in parallel with multiple initialization values to search for multiple stationary points individually. The best result selected among these stationary points can approach the optimal solution. Therefore, we have the following corollary.

1.4 Simulation and Experiment Results

In this section, both numerical simulation and experiment results are provided to evaluate the performance of our proposed optimization algorithm. Unlike the conventional works that focus on data center networks, we study the implementation of FL at the edge of wireless networks, where each base station can associate with at most dozens of users. Therefore, we set the number of users as and 50, and the ratio of the maximum number of selected users as . The maximum iteration indexes of Algorithm 1.3 and federated global averaging are set as and , respectively, and the learning rate is . We consider a multiclassification learning task on the MNIST dataset (Xiao et al. [2017]), a commonly used image dataset. The entire training dataset consists of 60,000 training images and 10,000 testing images, which can be classified into 10 different digits. The following two cases of local training datasets are considered, named the balanced and unbalanced dataset cases, respectively: (i) In the balanced dataset case, each client samples randomly and independently from the entire training dataset, and is identically set to 600; (ii) In the unbalanced dataset case, each client independently and randomly samples two different digits from the training dataset, and is set as a random integer variable that follows a uniform distribution with a mean value of 600 for fair comparison, namely and , which means each client has at most two kinds of labels and the clients have local datasets of different sizes.
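A hedged sketch of the unbalanced partition just described (two digit classes per client, local sizes uniformly distributed around a mean of 600). The exact uniform range is not stated in the text, so the [300, 900] interval below is an assumption, and the random labels stand in for the real MNIST training labels.

```python
import numpy as np

def unbalanced_partition(labels, n_clients, mean_size=600, classes_per_client=2, seed=0):
    """Assign each client sample indices drawn from only two digit classes,
    with a local dataset size uniformly distributed around mean_size."""
    rng = np.random.default_rng(seed)
    partitions = []
    for _ in range(n_clients):
        classes = rng.choice(10, size=classes_per_client, replace=False)
        pool = np.where(np.isin(labels, classes))[0]
        size = rng.integers(mean_size // 2, 3 * mean_size // 2 + 1)  # assumed range
        partitions.append(rng.choice(pool, size=min(size, len(pool)), replace=False))
    return partitions

# Stand-in for the 60,000 MNIST training labels (replace with the real labels)
labels = np.random.default_rng(0).integers(0, 10, size=60000)
client_indices = unbalanced_partition(labels, n_clients=50)
```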

In this part, the performance of Algorithm 1.3