Description

Comprehensive guide to Deep Reinforcement Learning (DRL) as applied to wireless communication systems.

Deep Reinforcement Learning for Wireless Communications and Networking presents an overview of the development of DRL while providing fundamental knowledge about theories, formulation, design, learning models, algorithms, and implementation of DRL, together with a particular case study to practice. The book also covers diverse applications of DRL to address various problems in wireless networks, such as caching, offloading, resource sharing, and security. The authors discuss open issues by introducing some advanced DRL approaches to address emerging issues in wireless communications and networking. Covering new advanced models of DRL, e.g. deep dueling architecture and generative adversarial networks, as well as emerging problems considered in wireless networks, e.g. ambient backscatter communication, intelligent reflecting surfaces, and edge intelligence, this is the first comprehensive book studying applications of DRL for wireless networks that presents the state-of-the-art research in architecture, protocol, and application design.

Deep Reinforcement Learning for Wireless Communications and Networking covers specific topics such as:

Deep reinforcement learning models, covering deep learning, deep reinforcement learning, and models of deep reinforcement learning

Physical layer applications, covering signal detection, decoding, and beamforming, power and rate control, and physical-layer security

Medium access control (MAC) layer applications, covering resource allocation, channel access, and user/cell association

Network layer applications, covering traffic routing, network classification, and network slicing

With comprehensive coverage of an exciting and noteworthy new technology, Deep Reinforcement Learning for Wireless Communications and Networking is an essential learning resource for researchers and communications engineers, along with developers and entrepreneurs in autonomous systems, who wish to harness this technology in practical applications.




Table of Contents

Cover

Title Page

Copyright

Dedication

Notes on Contributors

Foreword

Preface

Acknowledgments

Acronyms

Introduction

Part I: Fundamentals of Deep Reinforcement Learning

1 Deep Reinforcement Learning and Its Applications

1.1 Wireless Networks and Emerging Challenges

1.2 Machine Learning Techniques and Development of DRL

1.3 Potentials and Applications of DRL

1.4 Structure of this Book and Target Readership

1.5 Chapter Summary

References

2 Markov Decision Process and Reinforcement Learning

2.1 Markov Decision Process

2.2 Partially Observable Markov Decision Process

2.3 Policy and Value Functions

2.4 Bellman Equations

2.5 Solutions of MDP Problems

2.6 Reinforcement Learning

2.7 Chapter Summary

References

3 Deep Reinforcement Learning Models and Techniques

3.1 Value-Based DRL Methods

3.2 Policy-Gradient Methods

3.3 Deterministic Policy Gradient (DPG)

3.4 Natural Gradients

3.5 Model-Based RL

3.6 Chapter Summary

References

4 A Case Study and Detailed Implementation

4.1 System Model and Problem Formulation

4.2 Implementation and Environment Settings

4.3 Simulation Results and Performance Analysis

4.4 Chapter Summary

References

Note

Part II: Applications of DRL in Wireless Communications and Networking

5 DRL at the Physical Layer

5.1 Beamforming, Signal Detection, and Decoding

5.2 Power and Rate Control

5.3 Physical-Layer Security

5.4 Chapter Summary

References

6 DRL at the MAC Layer

6.1 Resource Management and Optimization

6.2 Channel Access Control

6.3 Heterogeneous MAC Protocols

6.4 Chapter Summary

References

7 DRL at the Network Layer

7.1 Traffic Routing

7.2 Network Slicing

7.3 Network Intrusion Detection

7.4 Chapter Summary

References

8 DRL at the Application and Service Layer

8.1 Content Caching

8.2 Data and Computation Offloading

8.3 Data Processing and Analytics

8.4 Chapter Summary

References

Part III: Challenges, Approaches, Open Issues, and Emerging Research Topics

9 DRL Challenges in Wireless Networks

9.1 Adversarial Attacks on DRL

9.2 Multiagent DRL in Dynamic Environments

9.3 Other Challenges

9.4 Chapter Summary

References

10 DRL and Emerging Topics in Wireless Networks

10.1 DRL for Emerging Problems in Future Wireless Networks

10.2 Advanced DRL Models

10.3 Chapter Summary

References

Note

Index

End User License Agreement

List of Tables

Chapter 9

Table 9.1 Deep learning frameworks and their supported hardware platforms.

Chapter 10

Table 10.1 Transfer learning techniques.

Table 10.2 Deep reinforcement transfer learning strategies.

List of Illustrations

Chapter 1

Figure 1.1 A data-driven ML architecture.

Figure 1.2 Artificial neural network architecture.

Figure 1.3 Convolutional neural network architecture.

Figure 1.4 Recurrent neural network architecture.

Figure 1.5 LSTM network architecture.

Figure 1.6 An illustration of a reinforcement learning process.

Figure 1.7 An illustration of a DRL process.

Figure 1.8 Google DeepMind's DRL applications in playing games.

Figure 1.9 Applications of DRL in robotics.

Figure 1.10 Real-world applications of DRL. EHR: electronic health record, EM...

Figure 1.11 DRL applications in self-driving cars.

Figure 1.12 Structure of the book.

Chapter 2

Figure 2.1 An illustration of (a) MDP and (b) POMDP.

Chapter 3

Figure 3.1 Classical DQN architecture predicts Q-values.

Figure 3.2 Huber loss and square loss.

Figure 3.3 Dueling network predicting separately the advantages and state va...

Figure 3.4 Reward-to-go: summation of rewards collected starting from transi...

Figure 3.5 Actor-critic policy gradient architecture.

Figure 3.6 Gaussian distributions and .

Figure 3.7 Gaussian distribution and .

Figure 3.8 Clipping effect. (a): A > 0. (b): A < 0.

Figure 3.9 Model-free RL vs. model-based RL.

Figure 3.10 Sample-efficiency of different DRL approaches.

Figure 3.11 Model-based reinforcement learning process.

Chapter 4

Figure 4.1 System model.

Figure 4.2 Flowchart to express the actions of the transmitter.

Figure 4.3 Convergence rates of Q-learning and deep Q-learning algorithms.

Figure 4.4 Convergence rates of deep Q-learning with different learning rate...

Figure 4.5 Convergence rates of deep Q-learning with different decay factors...

Figure 4.6 Convergence rates of deep Q-learning with different activation fu...

Figure 4.7 Convergence rates of deep Q-learning with different optimizers.

Chapter 5

Figure 5.1 RIS-assisted MISO systems.

Figure 5.2 Beamforming for high mobility mmWave systems.

Figure 5.3 An MIMO system with one-bit ADCs.

Figure 5.4 A general HetNet, in which multiple APs share the same spectrum b...

Figure 5.5 An illustration of deception strategy to deal with reactive jammi...

Chapter 6

Figure 6.1 DRL-based model for MAC resource optimization.

Figure 6.2 DRL-based channel access control.

Figure 6.3 DRL-based model for IoT MAC with massive access.

Figure 6.4 DRL-based MAC for 5G.

Figure 6.5 Heterogeneous MAC protocols.

Chapter 7

Figure 7.1 An illustration of the SINET architecture.

Figure 7.2 An illustration of the SFC orchestration scheme based on DDPG. NF...

Figure 7.3 An illustration of the proposed radio access network-only slicing...

Figure 7.4 An illustration of the deep dueling network architecture used to ...

Figure 7.5 An illustration of the proposed resource slicing and customizatio...

Figure 7.6 An illustration of the proposed hybrid federated deep reinforceme...

Figure 7.7 An illustration of the distributed resource orchestration system ...

Figure 7.8 An illustration of the two-level control strategy for intelligent...

Figure 7.9 An illustration of the deep federated RL framework for network sl...

Figure 7.10 An illustration of uplink and downlink decoupled RAN framework f...

Chapter 8

Figure 8.1 An illustration of a VR network assisted by SBSs and UAVs.

Figure 8.2 An illustration of a video service assisted by computing and cach...

Figure 8.3 Illustrations of data/computation offloading models in mobile net...

Figure 8.4 An illustration of DRL approach for horizontal data partitioning ...

Figure 8.5 Illustration of a two-level framework for data compression.

Figure 8.6 Illustration of a data cloud tuning system using DRL.

Figure 8.7 The process of proposed RL based R-Tree (RLR) to answer queries....

Figure 8.8 An illustration of the proposed DRL framework for join order enum...

Chapter 9

Figure 9.1 A taxonomy of attacks on DRL models.

Figure 9.2 An illustration of attacks on observations of a DRL agent.

Figure 9.3 An illustration of attacks on the DRL-based adaptive traffic sign...

Figure 9.4 An illustration of attacks on the reward function of the DRL agen...

Figure 9.5 An illustration of attacks on the reward function of the DRL agen...

Figure 9.6 Comparisons among different Markov models. (a) MDP, (b) POMDP, (c...

Figure 9.7 Classification of applications of multiagent DRL in wireless netw...

Figure 9.8 An illustration of using two separated deep neural networks, i.e....

Figure 9.9 An illustration of DRL agent using replay memory to mitigate high-...

Chapter 10

Figure 10.1 The system model of iRDC.

Figure 10.2 Principle of operation of ambient backscatter communication.

Figure 10.3 RF-powered ambient backscattering system model. (a) Incumbent ch...

Figure 10.4 RIS architecture.

Figure 10.5 RIS system model.

Figure 10.6 RSMA network model.

Figure 10.7 Transfer learning model.

Figure 10.8 Enhanced GAN structure for DRL.



IEEE Press, 445 Hoes Lane, Piscataway, NJ 08854

IEEE Press Editorial Board
Sarah Spurgeon, Editor in Chief

Jón Atli Benediktsson, Anjan Bose, James Duncan, Amin Moeness, Desineni Subbaram Naidu, Behzad Razavi, Jim Lyke, Hai Li, Brian Johnson, Jeffrey Reed, Diomidis Spinellis, Adam Drobot, Tom Robertazzi, Ahmet Murat Tekalp

Deep Reinforcement Learning for Wireless Communications and Networking

Theory, Applications, and Implementation

 

Dinh Thai Hoang

University of Technology Sydney, Australia

Nguyen Van Huynh

Edinburgh Napier University, United Kingdom

Diep N. Nguyen

University of Technology Sydney, Australia

Ekram Hossain

University of Manitoba, Canada

Dusit Niyato

Nanyang Technological University, Singapore

 

Copyright © 2023 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Trademarks Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty While the publisher and authors have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data applied for:

Hardback ISBN: 9781119873679

Cover Design: Wiley
Cover Image: © Liu zishan/Shutterstock

 

To my family – Dinh Thai Hoang

To my family – Nguyen Van Huynh

To Veronica Hai Binh, Paul Son Nam, and Thuy – Diep N. Nguyen

To my parents – Ekram Hossain

To my family – Dusit Niyato

Notes on Contributors

Dinh Thai Hoang, School of Electrical and Data Engineering, University of Technology Sydney, Australia

Nguyen Van Huynh, School of Computing, Engineering and the Built Environment, Edinburgh Napier University, UK

Diep N. Nguyen, School of Electrical and Data Engineering, University of Technology Sydney, Australia

Ekram Hossain, Department of Electrical and Computer Engineering, University of Manitoba, Canada

Dusit Niyato, School of Computer Science and Engineering, Nanyang Technological University, Singapore

Foreword

Prof. Merouane Debbah

Integrating deep reinforcement learning (DRL) techniques in wireless communications and networking has paved the way for achieving efficient and optimized wireless systems. This ground-breaking book provides excellent material for researchers who want to study applications of deep reinforcement learning in wireless networks, with many practical examples and implementation details for the readers to practice. It also covers various topics at different network layers, such as channel access, network slicing, and content caching. This book is essential for anyone looking to stay ahead of the curve in this exciting field.

Prof. Vincent Poor

Many aspects of wireless communications and networking are being transformed through the application of deep reinforcement learning (DRL) techniques. This book represents an important contribution to this field, providing a comprehensive treatment of the theory, applications, and implementation of DRL in wireless communications and networking. An important aspect of this book is its focus on practical implementation issues, such as system design, algorithm implementation, and real-world deployment challenges. By bridging the gap between theory and practice, the authors provide readers with the tools to build and deploy DRL-based wireless communication and networking systems. This book is a useful resource for those interested in learning about the potential of DRL to improve wireless communications and networking systems. Its breadth and depth of coverage, practical focus, and expert insights make it a singular contribution to the field.

Preface

Reinforcement learning is one of the most important research directions of machine learning (ML) and has had significant impacts on the development of artificial intelligence (AI) over the last 20 years. Reinforcement learning is a learning process in which an agent can periodically make decisions, observe the results, and then automatically adjust its strategy to achieve an optimal policy. However, this learning process, even with proven convergence, often takes a significant amount of time to reach the best policy, as it has to explore and gain knowledge of the entire system, making it inapplicable to large-scale systems and networks. Consequently, applications of reinforcement learning are very limited in practice. Recently, deep learning has been introduced as a new breakthrough ML technique. It can overcome the limitations of reinforcement learning and thus open a new era for the development of reinforcement learning, namely deep reinforcement learning (DRL). DRL leverages the power of deep neural networks (DNNs) in the learning process, thereby improving the learning rate and the performance of reinforcement learning algorithms. As a result, DRL has enabled numerous practical applications of reinforcement learning, such as robotics, computer vision, speech recognition, and natural language processing.

In the areas of communications and networking, DRL has recently been used as an effective tool to address various problems and challenges. In particular, modern networks such as the Internet-of-Things (IoT), heterogeneous networks (HetNets), and unmanned aerial vehicle (UAV) networks are becoming more decentralized, ad hoc, and autonomous in nature. Network entities such as IoT devices, mobile users, and UAVs need to make local and independent decisions, e.g. spectrum access, data rate adaptation, transmit power control, and base station association, to achieve the goals of different networks, e.g. throughput maximization and energy consumption minimization. In uncertain and stochastic environments, most of these decision-making problems can be modeled as a so-called Markov decision process (MDP). Dynamic programming and other algorithms such as value iteration, as well as reinforcement learning techniques, can be adopted to solve the MDP. However, modern networks are large-scale and complicated, and thus the computational complexity of these techniques rapidly becomes unmanageable, i.e. the curse of dimensionality. As a result, DRL has been developed as an alternative solution to overcome this challenge. In general, DRL approaches provide the following advantages:

DRL can effectively obtain solutions to sophisticated network optimization problems, especially in cases with incomplete information. Thus, it enables network entities, e.g. base stations, in modern networks to solve non-convex and complex problems, e.g. joint user association, computation, and transmission scheduling, to achieve optimal solutions without complete and accurate network information.

DRL allows network entities to learn and build knowledge about the communication and networking environment. Thus, by using DRL, the network entities, e.g. a mobile user, can learn optimal policies, e.g. base station selection, channel selection, handover decision, caching, and offloading decisions, without knowing a priori channel model and mobility pattern.

DRL provides autonomous decision-making. With the DRL approach, network entities can make observations and obtain the best policy locally with minimum or without information exchange among each other. This not only reduces communication overheads but also improves the security and robustness of the networks.

DRL significantly improves the learning speed, especially in problems with large state and action spaces. Thus, in large-scale networks, e.g. IoT systems with thousands of devices, DRL allows the network controller or IoT gateways to dynamically control user association, spectrum access, and transmit power for a massive number of IoT devices and mobile users.

Several other problems in communications and networking, such as cyber-physical attacks, interference management, and data offloading, can be modeled as games, e.g. non-cooperative games. DRL has recently been extended and used as an efficient tool to solve such games, e.g. finding the Nash equilibrium, without complete information.

Clearly, DRL will be the key enabler for the next generation of wireless networks. Therefore, DRL is of increasing interest to researchers, communication engineers, computer scientists, and application developers. In this regard, we introduce a new book, titled “Deep Reinforcement Learning for Wireless Communications and Networking: Theory, Applications, and Implementation”, which will provide a fundamental background of DRL and then study recent advances in DRL to address practical challenges in wireless communications and networking. In particular, this book first gives a tutorial on DRL, from basic concepts to advanced modeling techniques to motivate and provide fundamental knowledge for the readers. We then provide case studies together with implementation details to help the readers better understand how to practice and apply DRL to their problems. After that, we review DRL approaches that address emerging issues in communications and networking. The issues include dynamic network access, data rate control, wireless caching, data offloading, network security, and connectivity preservation, which are all important to next-generation networks such as 5G and beyond. Finally, we highlight important challenges, open issues, and future research directions for applying DRL to wireless networks.

Acknowledgments

The authors would like to acknowledge grant-awarding agencies that supported parts of this book. This research was supported in part by the Australian Research Council under the DECRA project DE210100651 and the Natural Sciences and Engineering Research Council of Canada (NSERC).

The authors would like to thank Mr. Cong Thanh Nguyen, Mr. Hieu Chi Nguyen, Mr. Nam Hoai Chu, and Mr. Khoa Viet Tran for their technical assistance and discussions during the writing of this book.

Acronyms

A3C: asynchronous advantage actor-critic
ACK: acknowledgment message
AI: artificial intelligence
ANN: artificial neural network
AP: access point
BER: bit error rate
BS: base station
CNN: convolutional neural network
CSI: channel state information
D2D: device-to-device
DDPG: deep deterministic policy gradient
DDQN: double deep Q-network
DL: deep learning
DNN: deep neural network
DPG: deterministic policy gradient
DQN: deep Q-network
DRL: deep reinforcement learning
eMBB: enhanced mobile broadband
FL: federated learning
FSMC: finite-state Markov chain
GAN: generative adversarial network
GPU: graphics processing unit
IoT: Internet-of-Things
ITS: intelligent transportation system
LTE: long-term evolution
M2M: machine-to-machine
MAC: medium access control
MARL: multi-agent reinforcement learning
MDP: Markov decision process
MEC: mobile edge computing
MIMO: multiple-input multiple-output
MISO: multiple-input single-output
ML: machine learning
mMTC: massive machine-type communications
mmWave: millimeter wave
MU: mobile user
NFV: network function virtualization
OFDMA: orthogonal frequency division multiple access
POMDP: partially observable Markov decision process
PPO: proximal policy optimization
PSR: predictive state representation
QoE: quality of experience
QoS: quality of service
RAN: radio access network
RB: resource block
RF: radio frequency
RIS: reconfigurable intelligent surface
RL: reinforcement learning
RNN: recurrent neural network
SARSA: state-action-reward-state-action
SDN: software-defined networking
SGD: stochastic gradient descent
SINR: signal-to-interference-plus-noise ratio
SMDP: semi-Markov decision process
TD: temporal difference
TDMA: time-division multiple access
TRPO: trust region policy optimization
UAV: unmanned aerial vehicle
UE: user equipment
UL: uplink
URLLC: ultra-reliable and low-latency communications
VANET: vehicular ad hoc network
VNF: virtual network function
WLAN: wireless local area network
WSN: wireless sensor network

Introduction

Deep reinforcement learning (DRL) empowered by deep neural networks (DNNs) has been developing as a promising solution to address high-dimensional and continuous control problems effectively. The integration of DRL into future wireless networks will revolutionize conventional model-based network optimization with model-free approaches and meet various application demands. By interacting with the environment, DRL provides an autonomous decision-making mechanism for the network entities to solve non-convex, complex, model-free problems, e.g. spectrum access, handover, scheduling, caching, data offloading, and resource allocation. This not only reduces communication overhead but also improves network security and reliability. Though DRL has shown great potential to address emerging issues in complex wireless networks, there are still domain-specific challenges that require further investigation. The challenges may include the design of proper DNN architectures to capture the characteristics of 5G network optimization problems, the state explosion in dense networks, multi-agent learning in dynamic networks, limited training data and exploration space in practical networks, the inaccessibility and high cost of network information, as well as the balance between information quality and learning performance.

This book provides a comprehensive overview of DRL and its applications to wireless communication and networking. It covers a wide range of topics from basic to advanced concepts, focusing on important aspects related to algorithms, models, performance optimizations, machine learning, and automation for future wireless networks. As a result, this book will provide essential tools and knowledge for researchers, engineers, developers, and graduate students to understand and be able to apply DRL to their work. We believe that this book will not only be of great interest to those in the fields of wireless communication and networking but also to those interested in DRL and AI more broadly.

Part IFundamentals of Deep Reinforcement Learning

1Deep Reinforcement Learning and Its Applications

1.1 Wireless Networks and Emerging Challenges

Over the past few years, communication technologies have been rapidly developing to support various aspects of our daily lives, from smart cities and healthcare to logistics and transportation, and they will form the backbone of the future data-centric society. Nevertheless, these new applications generate a tremendous amount of workload and require high-reliability and ultrahigh-capacity wireless communications. In its latest report [1], Cisco projected that the number of connected devices will reach around 29.3 billion by 2023, with more than 45% equipped with mobile connections. The fastest-growing mobile connection type is likely machine-to-machine (M2M), as Internet-of-Things (IoT) services play a significant role in consumer and business environments. This poses several challenges in future wireless communication systems:

Emerging services (e.g. augmented reality [AR] and virtual reality [VR]) require high-reliability and ultrahigh-capacity wireless communications. However, existing communication systems, designed and optimized based on conventional communication theories, significantly limit further performance improvements for these services.

Wireless networks are becoming increasingly ad hoc and decentralized, in which mobile devices and sensors are required to make independent actions such as channel selections and base station associations to meet the system's requirements, e.g. energy efficiency and throughput maximization. Nonetheless, the dynamics and uncertainty of the systems prevent them from obtaining optimal decisions.

Another crucial component of future network systems is network traffic control. Network control can dramatically improve resource usage and the efficiency of information transmission through monitoring, checking, and controlling data flows. Unfortunately, the proliferation of smart IoT devices and ultradense radio networks has greatly expanded the network size with extremely dynamic topologies. In addition, the explosive growth of data traffic imposes considerable pressure on Internet management. As a result, existing network control approaches may not effectively handle these complex and dynamic networks.

Mobile edge computing (MEC) has been recently proposed to provide computing and caching capabilities at the edge of cellular networks. In this way, popular contents can be cached at the network edge, such as base stations, end-user devices, and gateways, to avoid duplicate transmissions of the same content, resulting in better energy and spectrum usage [2, 3]. One major challenge in future communication systems is the straggling problem at both edge nodes and wireless links, which can significantly increase the computation delay of the system. Additionally, the huge data demands of mobile users and the limited storage and processing capacities are critical issues that need to be addressed.

Conventional approaches to addressing the new challenges and demands of modern communication systems have several limitations. First, the rapid growth in the number of devices, the expansion of network scale, and the diversity of services in the new era of communications are expected to significantly increase the amount of data generated by applications, users, and networks [1]. However, traditional solutions may be unable to process and utilize this data effectively to improve system performance. Second, existing algorithms are not well-suited to handle the dynamic and uncertain nature of network environments, resulting in poor performance [4]. Finally, traditional optimization solutions often require complete information about the system to be effective, but this information may not be readily available in practice, limiting the applicability of these approaches. Deep reinforcement learning (DRL) has the potential to overcome these limitations and provide promising solutions to these challenges.

DRL leverages the benefits of deep neural networks (DNNs), which have proven effective in tackling complex, large-scale problems in areas such as speech recognition, medical diagnosis, and computer vision. This makes DRL well suited to managing the increasing complexity and scale of future communication networks. Additionally, DRL's online deployment allows it to effectively handle the dynamics and unpredictable nature of wireless communication environments.

1.2 Machine Learning Techniques and Development of DRL

1.2.1 Machine Learning

Machine learning (ML) is a problem-solving paradigm in which a machine learns a particular task (e.g. image classification, document text classification, speech recognition, medical diagnosis, robot control, and resource allocation in communication networks) with respect to a performance metric (e.g. classification accuracy and performance loss) using experience or data [5]. The task generally involves a function that maps well-defined inputs to well-defined outputs. The essence of data-driven ML is that there is a pattern connecting the task inputs and the outcome which cannot be pinned down mathematically. Thus, the solution to the task, which may involve making a decision or predicting an output, cannot be programmed explicitly. If the set of rules connecting the task inputs and output(s) were known, a program could be written based on those rules (e.g. if-then-else code) to solve the problem. Instead, an ML algorithm learns from an input data set that specifies the correct output for a given input; that is, an ML method results in a program that uses the data samples to solve the problem. A data-driven ML architecture for the classification problem is shown in Figure 1.1. The training module is responsible for optimizing the classifier from the training data samples and providing the classification module with a trained classifier. The classification module determines the output based on the input data. The training and classification modules can work independently. The training procedure generally takes a long time; however, the training module is activated only periodically. Also, the training procedure can be performed in the background while the classification module operates as usual.

Figure 1.1 A data-driven ML architecture.
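To make the training/classification split concrete, the following minimal Python sketch is a hypothetical illustration (not taken from the book): a training module fits a simple nearest-centroid classifier offline, and an independent classification module uses the most recently trained model to label new inputs.

```python
import numpy as np

def train(train_x, train_y):
    """Training module: fit a nearest-centroid classifier from labeled samples."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    return classes, centroids  # the "trained classifier" handed to the classification module

def classify(model, x):
    """Classification module: label a new sample with the closest class centroid."""
    classes, centroids = model
    distances = np.linalg.norm(centroids - x, axis=1)
    return classes[np.argmin(distances)]

# Toy data: two Gaussian blobs standing in for two classes.
rng = np.random.default_rng(0)
train_x = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
train_y = np.array([0] * 50 + [1] * 50)

model = train(train_x, train_y)                # run periodically, possibly in the background
print(classify(model, np.array([3.5, 4.2])))   # classification uses the latest trained model
```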

There are three categories of ML techniques, including supervised, unsupervised, and reinforcement learning.

Supervised learning: Given a data set $D = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, a supervised learning algorithm learns a predictor that generalizes the input–output mapping in $D$ to inputs outside $D$. Here, $\mathcal{X} \subseteq \mathbb{R}^d$ is the $d$-dimensional feature space, $\mathbf{x}_i \in \mathcal{X}$ is the input vector of the $i$th sample, $y_i \in \mathcal{Y}$ is the label of the $i$th sample, and $\mathcal{Y}$ is the label space. For binary classification problems (e.g. spam filtering), $\mathcal{Y} = \{0, 1\}$ or $\{-1, +1\}$. For multiclass classification (e.g. face classification), $\mathcal{Y} = \{1, 2, \ldots, K\}$. On the other hand, for regression problems (e.g. predicting temperature), $\mathcal{Y} = \mathbb{R}$. The data points $(\mathbf{x}_i, y_i)$ are drawn from an (unknown) distribution $P(X, Y)$. The learning process involves learning a function $h: \mathcal{X} \rightarrow \mathcal{Y}$ such that for a new pair $(\mathbf{x}, y) \sim P$, we have $h(\mathbf{x}) = y$ with high probability (or $h(\mathbf{x}) \approx y$). A loss function (or risk function), such as the mean squared error, evaluates the error between the predicted probabilities/values returned by the function $h$ and the labels on the training data.

For supervised learning, the data set is usually split into three subsets: $D_{\mathrm{TR}}$ as the training data, $D_{\mathrm{VA}}$ as the validation data, and $D_{\mathrm{TE}}$ as the test data. The function $h$ is trained on $D_{\mathrm{TR}}$ and validated on $D_{\mathrm{VA}}$: if the loss is too large, $h$ will be revised based on $D_{\mathrm{TR}}$ and validated again on $D_{\mathrm{VA}}$. This process goes back and forth until $h$ gives a low loss on $D_{\mathrm{VA}}$. The standard supervised learning techniques include the following: Bayesian classification, logistic regression, $k$-nearest neighbor (KNN), neural network (NN), support vector machine (SVM), decision tree (DT) classification, and recommender systems. Note that supervised learning techniques require the availability of labeled data sets.
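As a rough illustration of the split and loss function described above (the toy data and variable names are assumptions for illustration, not from the book), the following NumPy sketch fits a least-squares line on a training subset and reports the mean squared error on a validation subset.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 0.5 + rng.normal(0, 0.1, size=100)  # noisy linear data

# Split the data set into training, validation, and test subsets.
idx = rng.permutation(len(X))
train_idx, val_idx, test_idx = idx[:60], idx[60:80], idx[80:]  # test set held out for final evaluation

# Fit h(x) = w*x + b on the training data via least squares.
A = np.hstack([X[train_idx], np.ones((len(train_idx), 1))])
w, b = np.linalg.lstsq(A, y[train_idx], rcond=None)[0]

# Mean squared error (the loss/risk function) on the validation data.
val_pred = w * X[val_idx, 0] + b
mse = np.mean((val_pred - y[val_idx]) ** 2)
print(f"validation MSE: {mse:.4f}")
```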

Unsupervised learning techniques are used to create an internal representation of the input, e.g. to form clusters, extract features, reduce dimensionality, and estimate density. Unlike supervised learning, these techniques can deal with unlabeled data sets.
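For instance, clustering can be sketched with a bare-bones k-means loop; the following Python example (illustrative only, not from the book) groups unlabeled points into two clusters.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """A bare-bones k-means clustering loop (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute the centers.
        labels = np.argmin(np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2), axis=1)
        centers = np.stack([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

rng = np.random.default_rng(2)
data = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(5, 1, (40, 2))])
labels, centers = kmeans(data, k=2)
print(centers)  # two cluster centers discovered without any labels
```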

Reinforcement learning (RL) techniques do not require a prior data set. With RL, an agent learns from interactions with an external environment. The idea of learning by interacting with a domain imitates humans' natural learning process. For example, when a newborn child plays, e.g. waves its arms or kicks a ball, its brain has a direct sensorimotor connection with its surroundings. Repeating this process produces essential information about the impact of actions, causes and effects, and what to do to reach a goal.

Deep learning (DL), a subset of ML, has gained popularity thanks to DNN architectures that overcome the limitations of conventional ML. DL models are able to extract the key features of data without relying on the data's structure. The “deep” in deep learning refers to the number of layers in the DNN architecture, with more layers leading to a deeper network. DL has been successfully applied in various fields, including face and voice recognition, text translation, and intelligent driver assistance systems. It has several advantages over traditional algorithms, as follows [6]:

No need for system modeling: In traditional optimization approaches, the system must be well modeled to obtain the optimal solution. Moreover, all information about the system must be available to formulate the optimization problem. In practice, this may not be feasible, especially in future wireless networks where users' behaviors and network states are diverse and may change randomly. Even if the optimization problem is well defined, solving it is usually challenging due to nonconvexity and high dimensionality. DL can efficiently address all these issues by taking a data-driven approach. In particular, it obtains the optimal solution by training the DNN with sufficient data.

Supports parallel and distributed algorithms: In many complex systems, DL may require a large volume of labeled data to train its DNN to achieve good training performance. DL can be implemented in a parallel and distributed manner to accelerate the training process. Specifically, instead of training with a single piece of computing hardware (e.g. a graphics processing unit [GPU]), we can simultaneously leverage the computing power of multiple computers/systems for the training process. There are two types of parallelism in DL: (i) model parallelism and (ii) data parallelism. For the former, different layers in the deep learning model are trained in parallel on different computing devices. The latter uses the same model for every execution unit but trains the model with different training samples.

Reusable: With DL, a trained model can be effectively reused in other systems/problems. Using well-trained models built by experts can significantly reduce the training time and related costs. For example, AlexNet can be reused in new recognition tasks with minimal configuration [6]. Moreover, a trained model can be transferred to a different but related system to improve its training using the transfer learning technique. Transfer learning can obtain good training accuracy for the target system with only a few training samples, as it leverages the knowledge gained in the source system. This is very helpful, as collecting training samples is costly and requires human intervention.

There are several types of DNNs, such as artificial neural networks (ANNs) (i.e. feed-forward neural networks), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). However, they consist of the same components: (i) neurons, (ii) weights, (iii) biases, and (iv) activation functions. Typically, layers in a DNN are interconnected via nodes (i.e. neurons). Each neuron applies an activation function to compute its output given the weighted inputs, i.e. synapses, and a bias [7]. During training, the neural network parameters are updated by calculating the gradient of the loss function.

1.2.2 Artificial Neural Network

An ANN is a typical neural network, also known as a feed-forward neural network. In particular, an ANN consists of nonlinear processing layers, including an input layer, several hidden layers, and an output layer, as illustrated in Figure 1.2. A hidden layer uses the outputs of its previous layer as its input. In other words, an ANN passes information in one direction, from the input layer to the output layer. In general, an ANN can learn any nonlinear function; thus, it is often referred to as a universal function approximator. The essential component behind this universal approximation is the activation function. Specifically, activation functions introduce nonlinear properties to the network and thus help it learn complex relationships between input data and outputs. In practice, three main activation functions are widely adopted in DL applications: (i) sigmoid, (ii) tanh, and (iii) ReLU [6, 8]. Due to its effectiveness and simplicity, the ANN is the most popular neural network used in DL applications.

Figure 1.2 Artificial neural network architecture.
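The following minimal NumPy sketch (an illustration with arbitrary layer sizes, not code from the book) shows a single forward pass through such a feed-forward network, using the activation functions mentioned above.

```python
import numpy as np

# The three activation functions mentioned above.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
tanh = np.tanh
relu = lambda z: np.maximum(0.0, z)

def forward(x, params, activation=relu):
    """One forward pass through a small ANN: input -> hidden layer -> output layer."""
    W1, b1, W2, b2 = params
    hidden = activation(W1 @ x + b1)   # weighted sum plus bias, passed through the activation
    return W2 @ hidden + b2            # linear output layer

rng = np.random.default_rng(0)
params = (rng.normal(0, 0.1, (8, 4)), np.zeros(8),   # hidden layer: 4 inputs -> 8 neurons
          rng.normal(0, 0.1, (2, 8)), np.zeros(2))   # output layer: 8 neurons -> 2 outputs
print(forward(rng.normal(size=4), params))
```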

1.2.3 Convolutional Neural Network

Another type of deep neural network is the CNN, designed mainly to handle image data. To do that, CNN introduces new types of layers, including convolution, rectified linear unit (ReLU), and pooling layers, as shown in Figure 1.3.

The convolution layer deploys a set of convolutional filters, each of which extracts certain features from the images.

The ReLU layer maps negative values to zero and maintains positive values during training, and thus it enables faster and more effective training.

The pooling layer reduces the number of parameters that the network needs to learn by performing down-sampling operations.

It is worth noting that a CNN can contain tens or hundreds of layers depending on the given problem. The filters can learn simple features such as brightness and edges and then move on to complex properties that uniquely belong to the object. In general, CNN performs much better than ANN in handling image data. The main reason is that, unlike an ANN, a CNN does not need to convert images to one-dimensional vectors before training the model, a conversion that increases the number of trainable parameters and cannot capture the spatial features of images. In contrast, CNN uses convolutional layers to learn the features of images directly. As a result, it can effectively learn all the features of input images. In the area of wireless communications, CNN is a promising technique to handle network data in the form of images, e.g. spectrum analysis [9–11], modulation classification [12, 13], and wireless channel feature extraction [14].

Figure 1.3 Convolutional neural network architecture.
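As a rough sketch of how these layers are typically stacked (assuming TensorFlow/Keras is available; the input shape and class count are arbitrary placeholders), a small CNN might look as follows.

```python
import tensorflow as tf

# A small CNN stacking the layer types described above:
# convolution -> ReLU -> pooling, repeated, then a dense classifier head.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                 # e.g. a spectrogram or image patch
    tf.keras.layers.Conv2D(16, 3, activation="relu"),  # convolutional filters + ReLU
    tf.keras.layers.MaxPooling2D(),                    # pooling (down-sampling)
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),   # e.g. 10 modulation classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```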

1.2.4 Recurrent Neural Network

An RNN is a DL network structure that leverages previous information to improve the learning process for the current and future input data. To do that, the RNN is equipped with loops and hidden states. As illustrated in Figure 1.4, by using the loops, the RNN can store previous information in the hidden state and operate on sequences. In particular, the output of the RNN cell at time $t$ is stored in the hidden state and is used to improve the processing of the input at time $t+1$. This unique property makes RNN suitable for dealing with sequential data, such as natural language processing and video analysis.

In practice, RNN may not perform well when learning long-term dependencies, as it can encounter the “vanishing” or “exploding” gradient problem caused by the backpropagation operation. Long short-term memory (LSTM) networks were proposed to deal with this issue. As illustrated in Figure 1.5, LSTM uses additional gates to decide what proportion of the previous information in the hidden state is used for the output and the next hidden state. Recently, RNN, and especially LSTM, has emerged as a prominent structure for signal classification tasks in wireless communications [15–18]. The reason is that signals are naturally sequential data, since they are usually collected over time.

Figure 1.4 Recurrent neural network architecture.

Figure 1.5 LSTM network architecture.
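The recurrence described above can be sketched in a few lines of NumPy; the example below (illustrative only, with arbitrary dimensions) applies a vanilla RNN cell to a short sequence, carrying the hidden state forward at each step.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One step of a vanilla RNN cell: the new hidden state mixes the current
    input with the previous hidden state (the 'loop' in Figure 1.4)."""
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 5
Wx = rng.normal(0, 0.1, (hidden_dim, input_dim))
Wh = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                     # initial hidden state
sequence = rng.normal(size=(10, input_dim))  # e.g. 10 time samples of a received signal
for x_t in sequence:
    h = rnn_step(x_t, h, Wx, Wh, b)          # the hidden state carries past information forward
print(h)
```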

1.2.5 Development of Deep Reinforcement Learning

In the last 20 years, RL [19] has become one of the most important lines of research in ML and has had a large impact on the development of artificial intelligence (AI). An RL process entails an agent making regular decisions, observing the results, and then automatically adjusting its strategy to achieve the optimal policy that maximizes the system performance, as illustrated in Figure 1.6. In particular, given a state, the agent takes an action and observes the immediate reward and the next state of the environment. Then, this experience (i.e. current state, current action, immediate reward, and next state) is used to update the Q-table via the Bellman equation to obtain the optimal policy. By interacting with the environment, RL can effectively deal with the dynamics and uncertainty of the environment. When the environment changes, the algorithm can adapt to the new properties and obtain a new optimal policy. While traditional RL techniques are limited to low-dimensional problems, DRL techniques are very effective in handling high-dimensional problems. In particular, DRL is a combination of RL and a DNN, where the DNN takes the current state of the system as the input and generates an action as the output, as illustrated in Figure 1.7. This unique feature of DRL is beneficial in future wireless communication systems, in which users and network devices are diverse and dynamic.

Figure 1.6 An illustration of a reinforcement learning process.

Figure 1.7 An illustration of a DRL process.
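To make the Q-table update concrete, the following toy Python sketch (a hypothetical example, not from the book) runs tabular Q-learning on a tiny five-state task; in DRL, the table would be replaced by a DNN that approximates the Q-values.

```python
import numpy as np

# Toy tabular Q-learning on a hypothetical 1-D task: states are indices 0..4,
# actions are {0: stay, 1: move right}, and the agent is rewarded only when it
# reaches the last state. Purely illustrative.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(300):
    s = 0
    for _ in range(50):                      # cap the episode length
        # Epsilon-greedy action selection (explore vs. exploit).
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = min(s + a, n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Bellman-based Q-table update from the experience (state, action, reward, next state).
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if s == n_states - 1:
            break

print(Q)  # a DRL agent would replace this table with a deep neural network
```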

1.3 Potentials and Applications of DRL

Applications of DRL have attracted significant attention from the research community and industry. The reasons for this can be explained based on the following observations.

1.3.1 Benefits of DRL in Human Lives

According to a McKinsey report, AI techniques, including DL and RL, can create between US$3.5 trillion and US$5.8 trillion in value annually across nine business functions in 19 industries. Experts believe DRL is at the cutting edge and has finally matured enough to be applied to real-world applications. They also believe that the development of DRL will significantly impact AI advancement and can eventually bring us closer to artificial general intelligence (AGI).

1.3.2 Features and Advantages of DRL Techniques

DRL inherits the outstanding advantages of both RL and DL techniques, thereby offering distinctive features that are expected to address and overcome diverse technical challenges that have existed for many years [7, 20].

Ability to solve sophisticated optimizations: DRL can solve complex optimization problems by simply learning from its interactions with the system. As a result, it allows network controllers (e.g. gateways, base stations, and service providers) to solve nonconvex optimization problems to maximize the system performance in certain aspects without requiring complete and accurate network properties in advance.

Intelligent learning: By interacting with the communication and networking environment, DRL can intelligently adapt its optimal policy to the system's conditions and properties. Thus, by using DRL, network controllers can efficiently perform optimal actions (e.g. beam associations, handover decisions, and caching and offloading decisions) in a real-time manner.

Enables fully autonomous systems: With DRL, network entities can automatically and intelligently make optimal control decisions with minimal or no information exchange by interacting with the environment (e.g. wireless connections, users, and physical infrastructures). As such, DRL can greatly reduce communication overhead as well as enhance the system's security and robustness.

Overcomes limitations of conventional ML techniques: Compared with conventional RL approaches, DRL has a much faster convergence speed, especially in large-scale problems with large action and state spaces (e.g. IoT networks that contain a massive number of devices with different configurations). As a result, with DRL, network controllers can quickly adapt to the new conditions of the network and adjust their policies (e.g. user association, transmit power, and spectrum access) to maximize the long-term performance of the system.

Novel solutions to several conventional optimization problems: Various problems in communications and networking, such as data offloading, interference control, and cyber-security, can be modeled as optimization problems. Unfortunately, conventional approaches cannot efficiently solve these problems due to their nonconvexity and the lack of global information. DRL, on the other hand, can effectively address these issues by simply interacting with the system to learn and obtain the optimal solutions, thanks to the power of DNNs.

1.3.3 Academic Research Activities

A substantial amount of research related to DRL in communications and wireless networks has been initiated. These days, the major flagship IEEE conferences (e.g. IEEE International Conference on Communications [ICC], IEEE Global Communications Conference [GLOBECOM], IEEE Wireless Communications and Networking Conference [WCNC], and IEEE Vehicular Technology Conference [VTC]) have special sessions on DRL techniques. Some recent IEEE magazines and journals have had special issues on this topic, e.g. IEEE Transactions on Cognitive Communications and Networking's special issue on “Deep Reinforcement Learning for Future Wireless Communication Networks” in 2019 and the IEEE Internet of Things Journal's special issue on “Deep Reinforcement Learning for Emerging IoT Systems” in 2019. Recently, many tutorials on this topic have been presented at IEEE flagship conferences, such as IEEE GLOBECOM, IEEE ICC, and IEEE WF-IoT. Clearly, research on DRL in wireless networks is emerging and has already received significant attention from researchers worldwide.

1.3.4 Applications of DRL Techniques

As mentioned, DRL can significantly improve the learning process of reinforcement learning algorithms by using DNNs. As a result, there are many practical applications of DRL in various areas such as robotics, manufacturing, and healthcare as follows [7, 20]:

Video games: In complex interactive video games, DRL is utilized to enable the agent to adapt its behavior based on its learning from the game in order to maximize the score. A wide range of games, such as StarCraft, Chess, Go, and Atari games, utilize DRL to enable enemies to adapt their moves and tactics based on the human player's performance, as illustrated in Figure 1.8.

Figure 1.8 Google DeepMind's DRL applications in playing games.

Adapted from [21, 22].

Chemistry: DRL has been applied to optimize chemical reactions by using an agent to predict the actions that will lead to the most desirable chemical reaction at each stage of the experiment. In many cases, DRL has been found to outperform traditional algorithms used for this purpose.

Manufacturing: DRL is used by many manufacturing companies, such as Fanuc, to assist their robots in picking up objects from one box and placing them in a container with high speed and accuracy [23]. These DRL-assisted machines are able to learn and memorize the objects they handle, allowing them to perform tasks efficiently and effectively [24–26]. In warehouses and e-commerce facilities, these intelligent robots are utilized to sort and deliver products to customers. For example, Tesla's factories used DRL to reduce the risk of human error by performing a significant portion of the work on vehicles.

Robotics: Google developed the Soft Actor-Critic algorithm, which allows robots to learn real-world tasks using DRL efficiently and safely, without the need for numerous attempts, as depicted in Figure 1.9. The algorithm has been successful in quickly training an insect-like robot to walk and a robot hand to perform simple tasks. It also helps protect the robot from taking actions that could potentially cause harm.

Healthcare: DRL can be used on historical medical data to determine the most effective treatments and to predict the best treatment options for current patients. For instance, DRL has been applied to predict drug doses for sepsis patients, identify optimal cycles for chemotherapy, and select dynamic treatment regimens combining hundreds of medications based on medical registry data [28–30].

In addition, many other real-world applications are summarized in Figure 1.10.

Figure 1.9 Applications of DRL in robotics.

Adapted from Pilarski et al. [27].

Figure 1.10 Real-world applications of DRL. EHR: electronic health record; EMR: electronic medical record; NLP: natural language processing; QA: question answering; IE: information extraction; IR: information retrieval.

1.3.5 Applications of DRL Techniques in Wireless Networks

With many applications of DRL in practice, especially for wireless and mobile networks, many patents for DRL techniques have been issued in recent years. The KR102240442B1 patent [31] claims a novel idea of using DRL for proactive caching in millimeter-wave vehicular networks. The CN109474980B invention [32] provides a wireless network resource allocation method based on DRL, which can improve the energy efficiency in a time-varying channel environment to the maximum extent with lower complexity. The CN110488861B invention [33] discloses an unmanned aerial vehicle (UAV) trajectory optimization method and device based on DRL and a UAV. More importantly, many applications of DRL have already been deployed in our real lives, such as self-driving cars and industrial automation. AWS DeepRacer is an example of an autonomous racing car that utilizes DRL to navigate a physical track, using cameras and an RL model to control its throttle and direction. Similarly, Wayve.ai has recently used DRL to train a car to drive within a single day, completing a lane-following task with a deep network of four convolutional layers and three fully connected layers. The example in Figure 1.11 demonstrates the lane-following task from the perspective of the driver. Clearly, the development of DRL will be the next big thing for future wireless networks.

Figure 1.11 DRL applications in self-driving cars.

Mwiti [34].

1.4 Structure of this Book and Target Readership

1.4.1 Motivations and Structure of this Book

DRL has been shown to be a promising approach for addressing high-dimensional and continuous control problems using deep neural networks as powerful function approximators. The integration of DRL into future wireless networks will transform the traditional model-based network optimization approach to a model-free approach and enable the network to meet a variety of application requirements. By interacting with the environment, DRL provides a mechanism for network entities to make autonomous decisions and solve complex, model-free problems such as spectrum access, handover, scheduling, caching, data offloading, and resource allocation [35–37]. This can not only reduce communication overhead but also improve network security and reliability. While DRL has demonstrated significant potential for addressing emerging issues in complex wireless networks, there are still domain-specific challenges that require further investigation, such as designing appropriate DNN architectures for future network optimization problems, addressing state explosion in dense networks, handling multiagent learning in dynamic networks, dealing with limited training data and exploration space in practical networks, and balancing the trade-off between information quality and learning performance.

There are five primary objectives of this book.

Introduce an emerging research topic together with promising applications of DRL in wireless networks.

Provide fundamental knowledge, including comprehending theory, building system models, formulating optimization problems, and designing appropriate algorithms to address practical problems in wireless communications and networks using DRL.

Provide a short tutorial to help the readers learn and practice programming DRL under a specific scenario.

Provide a comprehensive review of the state-of-the-art research and development covering different aspects of DRL in wireless communications and networks.

Introduce emerging applications of DRL in wireless communications and networks and highlight their challenges and open issues.

To achieve the objectives above, the book includes three main parts as follows:

Part I: Fundamentals of Deep Reinforcement Learning

This part presents an overview of the development of DRL and provides fundamental knowledge about theories, formulation, design, learning models, algorithms, and implementation of DRL, together with a particular case study to practice.

Chapter 1: The first chapter provides an overview of DRL, its development, and potential applications. In particular, the chapter starts with the development of wireless networks and the emerging challenges that researchers and practitioners face. We then present the remarkable development of ML and its significant impacts on all aspects of our lives. After that, we introduce recent breakthroughs in ML, mainly focusing on DRL, and discuss more details about DRL's outstanding features and advantages to address future wireless networks' challenges.

Chapter 2: In the second chapter, we provide fundamental background and theory of the Markov decision process (MDP), a critical mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision-maker. Specifically, essential components of an MDP and some typical extension models are presented. After that, specific solutions to address MDP problems, e.g. linear programming, value iteration, policy iteration, and reinforcement learning, are reviewed.

Chapter 3: Next, we discuss DRL, a combination of RL and DL that addresses the current drawbacks of RL. In particular, we discuss in more detail how different DL models can be integrated into RL algorithms to speed up the learning process. Many advanced DRL models are reviewed to provide a comprehensive perspective for the readers.

Chapter 4: In this chapter, we provide a particular scenario with a detailed implementation to help the readers gain a deeper, step-by-step understanding of how to design, analyze, formulate, and solve an MDP optimization problem with DRL code using conventional programming tools, e.g. TensorFlow. In addition, many simulation results are provided to discuss different aspects of implementing DRL and to evaluate the impacts of parameters on the learning process.

Part II: Applications of DRL in Wireless Communications and Networking

This part focuses on studying diverse applications of DRL to address various problems in wireless networks, such as caching, offloading, resource sharing, and security. We show example problems at the physical, media access control (MAC), network, and application layers and potential applications of DRL techniques to address them. Comparisons and detailed discussions are also provided to help the readers to have a comprehensive view of the advantages and limitations of using DRL to solve different problems in wireless networks.

Chapter 5 – DRL at the Physical Layer: The need for high-reliability and ultrahigh-capacity wireless communication has driven significant research into 5G communication systems. However, traditional techniques used for the design and optimization of these systems often struggle with the complexity and high dimensionality of the problems. In recent years, DRL has been recognized as a promising tool for addressing these complicated design and optimization problems. In this chapter, we examine the potential applications of DRL in addressing three key areas at the physical layer of communication systems: beamforming, signal detection, and decoding; power and rate control; and physical-layer security.

Chapter 6 – DRL at the MAC Layer