Hands-On Real Time Communications

Rolando Herrero

Description

Understand and deploy RTC protocols for any kind of communication system

Since the emergence of Voice over IP (VoIP) communications in the late 1990s, a set of real time communication (RTC) protocols has evolved that together now support an immense range of technologies and systems architectures. Whether it’s 5G networks (and beyond) or Internet of Things architectures, RTC protocols are indispensable to modern telecommunications. An understanding of these protocols and their design and deployment is critical for engineers, academics, and industry professionals in virtually every connected field.

Hands-On Real Time Communications offers a thorough yet accessible introduction to this subject, incorporating both the theory and practical applications of RTC protocols. It offers detailed instructions for deploying RTC protocols on various types of network stacks, as well as the use of a huge range of speech, audio, and video codecs. Practice-oriented and designed for students as well as established professionals, it’s a must-own for anyone looking to deploy and maintain essential communications architectures.

Readers will also find:

  • A matching hands-on section for each theoretical aspect, incorporating license-free protocol analyzers and emulation tools
  • Detailed discussion of topics including signaling, media packetization, real hardware-based network interfaces, and many more
  • End-of-chapter questions and lab exercises to facilitate learning

Hands-On Real Time Communications is ideal for advanced undergraduate or graduate students in RTC communication and networking classes, as well as for engineers, technologists, and architects looking to learn and understand the principles of RTC networking.




Table of Contents

Cover

Table of Contents

Title Page

Copyright

Preface

What Is the Focus of This Book?

Why Did I Write This Book?

Target Audience

How Is This Book Organized?

Acronyms

About the Companion Website

Part I: Theoretical Background

1 Media Generation

1.1 Introduction

1.2 Signals

1.3 Sampling and Quantization

1.4 Codecs

1.5 Quality Scores

1.6 Summary

1.7 Homework Problems and Questions

Bibliography

2 Media Session Signaling

2.1 Introduction

2.2 Lower Layers

2.3 SIP

2.4 SDP

2.5 HTTP

2.6 Security Considerations

2.7 Summary

2.8 Homework Problems and Questions

Bibliography

3 Media Packetization

3.1 Introduction

3.2 RTP

3.3 RTCP

3.4 SRTP and SRTCP

3.5 Framing

3.6 Summary

3.7 Homework Problems and Questions

Bibliography

4 Media over Network

4.1 Introduction

4.2 Non-3GPP Networks

4.3 3GPP 4G/5G/6G

4.4 IoT

4.5 Putting It All Together

4.6 Summary

4.7 Homework Problems and Questions

Bibliography

Part II: Building Topologies

5 Non-3GPP Networks

5.1 Introduction

5.2 Lower Layers

5.3 RTC

5.4 Codecs

5.5 Summary

5.6 Homework Problems and Questions

5.7 Lab Exercises

Bibliography

6 3GPP Networks

6.1 Introduction

6.2 Lower Layers

6.3 VoLTE

6.4 Summary

6.5 Homework Problems and Questions

6.6 Lab Exercises

Bibliography

7 IoT Networks

7.1 Introduction

7.2 WPAN

7.3 LPWAN

7.4 Summary

7.5 Homework Problems and Questions

7.6 Lab Exercises

Bibliography

Index

End User License Agreement

List of Tables

Chapter 1

Table 1.1 Sampling Rate Types.

Table 1.2 A-Law.

Table 1.3 μ-Law.

Table 1.4 GSM 6.10 Bit Allocation

Chapter 2

Table 2.1 Subnetting.

Table 2.2 Long vs Short IPv6 Addresses.

Table 2.3 IPv6 vs IPv4 Addresses.

Table 2.4 SIP Response Codes.

Chapter 3

Table 3.1 Codec Name and Payload Type.

Table 3.2 AMR and AMR-WB Rates (Kbps).

Table 3.3 EVRC and EVRC-B Rates (Kbps).

Table 3.4 EVS Frame Type.

Table 3.5 NALU Types and Packet Types.

Chapter 4

Table 4.1 Technology Comparison.

Table 4.2 IEEE 802.11a.

Table 4.3 6LoWPAN Dispatch Values.

Table 4.4 Unicast S/SAM and D/DAM Values.

Table 4.5 Multicast D/DAM Values.

List of Illustrations

Chapter 1

Figure 1.1 Speech Signal.

Figure 1.2 Audio Signal.

Figure 1.3 Progressive vs Interlaced Scanning.

Figure 1.4 RGB to YUV Conversion.

Figure 1.5 Composite Image.

Figure 1.6 Y Component.

Figure 1.7 U Component.

Figure 1.8 V Component.

Figure 1.9 Rate–Distortion Example.

Figure 1.10 Signal $x(t)$.

Figure 1.11 Sampled $x(t)$.

Figure 1.12 Band-limited Signal.

Figure 1.13 Spectrum of Sampled Signal.

Figure 1.14 Aliasing.

Figure 1.15 Spectrum of Instantaneous Sampled Signal.

Figure 1.16 Reconstruction Filter.

Figure 1.17 Time and Frequency Domain for Sampling.

Figure 1.18 Partition Cell.

Figure 1.19 Midrise Noise.

Figure 1.20 Digital Speech Communication System.

Figure 1.21 Encoder / Decoder.

Figure 1.22 Coding Delay.

Figure 1.23 Transmission Delay; Constant vs Burst.

Figure 1.24 The Phrase “Thank You”.

Figure 1.25 Speech Production Model.

Figure 1.26 μ-Law Analog Companding.

Figure 1.27 Input Signal.

Figure 1.28 PCM Quantized Signal and Reconstruction Error.

Figure 1.29 DPCM Quantized Signal and Reconstruction Error.

Figure 1.30 4-bit PCM.

Figure 1.31 Voiced vs Unvoiced Frames.

Figure 1.32 Autocorrelation.

Figure 1.33 PG for Voiced and Unvoiced Frames.

Figure 1.34 LPC-10 Speech Production Model.

Figure 1.35 CELP.

Figure 1.36 8-bit Uniform Quantization.

Figure 1.37 8-bit ITU-T Rec. G.711 Quantization.

Figure 1.38 Performance Comparison; Linear vs ITU-T Rec. G.711.

Figure 1.39 RPE.

Figure 1.40 Ear Perception Threshold with Audible Tone A.

Figure 1.41 Ear Perception Threshold (Tone A is masked by tone B).

Figure 1.42 Temporal Masking.

Figure 1.43 Perceptual Audio Coding.

Figure 1.44 Blocks.

Figure 1.45 DCT of a Block.

Figure 1.46 GOP Example.

Figure 1.47 Video Frames.

Figure 1.48 Motion Estimation.

Figure 1.49 Sequence Encoding.

Figure 1.50 Perceptual Speech Quality Measure.

Figure 1.51 Circuit Switching.

Figure 1.52 Packet Switching.

Figure 1.53 Packet Loss and Latency.

Figure 1.54 IP Protocol Layers.

Figure 1.55 Network Delay and Jitter.

Figure 1.56 PESQ Block Diagram.

Figure 1.57 POLQA Block Diagram.

Chapter 2

Figure 2.1 Transmission Rate and Bandwidth and Throughput.

Figure 2.2 FSK and PSK Modulations.

Figure 2.3 Quadrature Phase Shift Keying.

Figure 2.4 Multicarrier Modulation.

Figure 2.5 Layers of Link Layer.

Figure 2.6 IPv4 Header and Payload.

Figure 2.7 IPv4 Fragmentation.

Figure 2.8 Example IPv4 Network.

Figure 2.9 ICMP Packet Header and Data.

Figure 2.10 IPv6 Unicast Address Structure.

Figure 2.11 IPv6 Header and Payload.

Figure 2.12 UDP Segment Header and Payload.

Figure 2.13 TCP Header and Payload.

Figure 2.14 Setting up a TCP Connection.

Figure 2.15 Terminating a TCP Connection.

Figure 2.16 SIP Call Flow.

Figure 2.17 HTTP Transactions.

Figure 2.18 HTTP Setup.

Figure 2.19 HTTP Request.

Figure 2.20 HTTP Response.

Figure 2.21 Symmetric Encryption Flow.

Figure 2.22 PKI-based Encryption Flow.

Figure 2.23 Message Authentication Flow.

Figure 2.24 Certification Authority Flow.

Figure 2.25 TLS Session Setup Flow.

Figure 2.26 ITU X.509 Format.

Figure 2.27 TLS Record Format.

Figure 2.28 Secure and Unsecure SIP Protocol Stacks.

Figure 2.29 SIPS Message Flow.

Chapter 3

Figure 3.1 RTP/RTCP Stack.

Figure 3.2 RTP Header Format and Payload.

Figure 3.3 RTP Media Transmission Flow.

Figure 3.4 RTCP Header Format and Payload.

Figure 3.5 SRTP Header Format and Payload.

Figure 3.6 SRTCP Header Format and Payload.

Figure 3.7 DTLS-SRTP Flow.

Figure 3.8 Simple Framing Example.

Figure 3.9 IETF RFC 4867 Format.

Figure 3.10 Payload Header.

Figure 3.11 Payload Header.

Figure 3.12 Bandwidth Efficient Mode Encoding.

Figure 3.13 Octet-aligned Mode Encoding.

Figure 3.14 EVRC Header-Free Packet Format.

Figure 3.15 EVRC Interleaved Packet Format.

Figure 3.16 Compact Bundled Format.

Figure 3.17 ToC-only Format.

Figure 3.18 ToC Format.

Figure 3.19 CMR and ToC Format.

Figure 3.20 CMR Format.

Figure 3.21 LATM Format.

Figure 3.22 H.261 Header Format.

Figure 3.23 H.263 Header Format (Mode A).

Figure 3.24 H.263 Header Format (Mode B).

Figure 3.25 H.263 Header Format (Mode C).

Figure 3.26 H.263+ Header Format.

Figure 3.27 Single NAL Unit Packet (H.264).

Figure 3.28 Single NAL Unit Packet (H.265).

Chapter 4

Figure 4.1 Ethernet Header and Payload.

Figure 4.2 IEEE 802.2 SAP and SNAP.

Figure 4.3 Basic Service Set (BSS).

Figure 4.4 Independent Basic Service Set (IBSS).

Figure 4.5 IEEE 802.11 Contention under DCF.

Figure 4.6 IEEE 802.11 Header and Payload.

Figure 4.7 SIP and RTP over non-3GPP Networks.

Figure 4.8 WebRTC over non-3GPP Networks.

Figure 4.9 LTE Topology.

Figure 4.10 SIP and RTP over LTE.

Figure 4.11 5G Topology.

Figure 4.12 SIP and RTP over 5G.

Figure 4.13 IoT Networks.

Figure 4.14 IoT Access Network.

Figure 4.15 RTC WPAN Protocol Stack.

Figure 4.16 IEEE 802.15.4 and IETF Protocols.

Figure 4.17 IEEE 802.15.4 Topologies.

Figure 4.18 IEEE 802.15.4 Header and Payload.

Figure 4.19 IEEE 802.15.4 Security Header.

Figure 4.20 6LoWPAN Protocol Stack.

Figure 4.21 6LoWPAN Protocol Translation.

Figure 4.22 6LoWPAN Example.

Figure 4.23 Uncompressed IPv6 Header through 6LoWPAN.

Figure 4.24 UDP over IPv6.

Figure 4.25 UDP over IPv6 over 6LoWPAN.

Figure 4.26 Network Layer IPHC.

Figure 4.27 Network IPHC with Transport NHC.

Figure 4.28 NHC Port Number Encoding.

Figure 4.29 6LoWPAN Datagram with Link-Local Addresses.

Figure 4.30 6LoWPAN Datagram with Unicast Addresses.

Figure 4.31 Summary of 6Lo Families.

Figure 4.32 LoRa Stack.

Figure 4.33 Media and Signaling over LoRa.

Figure 4.34 SIP and RTP over IoT Networks.

Figure 4.35 SIP and RTP over Telecommunication Networks.

Chapter 5

Figure 5.1 Example Netualizer Deployment.

Figure 5.2 Initial Screen of Netualizer Controller.

Figure 5.3 New Project Creation Dialog.

Figure 5.4 Default Agent Configuration.

Figure 5.5 Default Script.

Figure 5.6 Wireshark Network Interface Selection.

Figure 5.7 Traffic Capture on the NT Interface.

Figure 5.8 Wireshark Filter Option.

Figure 5.9 Selecting the Physical Layer.

Figure 5.10 Creating the Ethernet Layer.

Figure 5.11 Creating the IPv4 Layer.

Figure 5.12 The 3-Layer IPv4 Stack.

Figure 5.13 Stack Copy.

Figure 5.14 Stack Pasting.

Figure 5.15 Stack Label Change.

Figure 5.16 MAC Address Change on the SECOND Stack.

Figure 5.17 Parameters of the IP Layer.

Figure 5.18 Suite Execution.

Figure 5.19 Configuring the Destination Address for Pinging.

Figure 5.20 ICMP Echo Packets.

Figure 5.21 IPv6 Address Configuration.

Figure 5.22 SECOND Stack Ping.

Figure 5.23 ICMPv6 Echo Packets.

Figure 5.24 Third Frame Details.

Figure 5.25 Impairment Layer Creation.

Figure 5.26 Traffic Direction on Stacks.

Figure 5.27 Configuring Transmission Latency.

Figure 5.28 1-second Latency SECOND Stack Ping.

Figure 5.29 1-second Latency ICMP Echo Packets.

Figure 5.30 UDP Layer Selection.

Figure 5.31 IP Destination Address Configuration.

Figure 5.32 Raw Data Injection.

Figure 5.33 Captured UDP Packet.

Figure 5.34 UDP Packets.

Figure 5.35 UDP Packets with 50% Loss.

Figure 5.36 TCP Stacks.

Figure 5.37 Captured TCP Traffic.

Figure 5.38 Captured 10-message TCP Traffic.

Figure 5.39 Follow TCP Stream Wireshark Option.

Figure 5.40 Stream Dump.

Figure 5.41 TCP Traffic Capture with 10% Loss.

Figure 5.42 UDP Protocol Stacks.

Figure 5.43 Creating the SIP Layer.

Figure 5.44 SIP Protocol Stacks.

Figure 5.45 RTP Protocol Stacks.

Figure 5.46 Selecting Audio Codec.

Figure 5.47 Add RTP Interface Option.

Figure 5.48 Creating the TTS Layer.

Figure 5.49 Creating Audio Layer.

Figure 5.50 Laying down the Audio Layer.

Figure 5.51 Configuring URL.

Figure 5.52 Configuring TTS Text Option.

Figure 5.53 Invite Command.

Figure 5.54 SIP and RTP Session Setup.

Figure 5.55 INVITE Request.

Figure 5.56 INVITE 200 OK Response.

Figure 5.57 RTP Media Packet.

Figure 5.58 Session Removal.

Figure 5.59 SIP and RTP Session Termination.

Figure 5.60 BYE Request.

Figure 5.61 BYE 200 OK Response.

Figure 5.62 QoS Layer Creation.

Figure 5.63 QoS Reference Configuration.

Figure 5.64 Configuring tts as Reference.

Figure 5.65 Start PESQ Option.

Figure 5.66 Stop PESQ Option.

Figure 5.67 PESQ Score on Events Window.

Figure 5.68 PESQ Score on Events Window (50% Packet Loss).

Figure 5.69 TLS Protocol Stacks.

Figure 5.70 Configuration of Certificates and Private Key.

Figure 5.71 server.pem Selection.

Figure 5.72 TLS Certificates and Private Key.

Figure 5.73 Upstream Last Packet Option.

Figure 5.74 Content of Upstream Last Packet.

Figure 5.75 Server Authentication.

Figure 5.76 Mutual Authentication Option.

Figure 5.77 Captured TLS Packets with Mutual Authentication.

Figure 5.78 DTLS Protocol Stacks.

Figure 5.79 DTLS Certificates and Private Key.

Figure 5.80 Captured DTLS Packets with Server Authentication.

Figure 5.81 Captured DTLS Packets with Mutual Authentication.

Figure 5.82 Detaching and Removing Protocol Layers.

Figure 5.83 SRTP Layer Creation.

Figure 5.84 SRTP Protocol Stacks.

Figure 5.85 Add RTP Interfaces Option.

Figure 5.86 Add Codecs Option.

Figure 5.87 Captured SIP and SRTP Packets.

Figure 5.88 SDP of SRTP INVITE.

Figure 5.89 SDP of SRTP 200 OK.

Figure 5.90 SRTP Authentication Option.

Figure 5.91 SDP of SRTP INVITE with Authentication.

Figure 5.92 SDP of SRTP 200 OK with Authentication.

Figure 5.93 Captured SRTP Packet.

Figure 5.94 ITU-T Recommendation G.729 Configuration.

Figure 5.95 AMR-WB Configuration.

Figure 5.96 Selection of Video Codec.

Figure 5.97 ITU-T Recommendation H.265 Configuration.

Chapter 6

Figure 6.1 SIM7600G-based Radio.

Figure 6.2 LTE Topology under Consideration.

Figure 6.3 4G Project.

Figure 6.4 The Virtual 4G Interface.

Figure 6.5 4G Link Layer Selection.

Figure 6.6 4G 2-Layer Stack.

Figure 6.7 IP Stack.

Figure 6.8 IP Stack Configuration.

Figure 6.9 IP Stack Copy Selection.

Figure 6.10 Duplicated IP Stack.

Figure 6.11 IP Address Changes.

Figure 6.12 PCAP Capture Configuration.

Figure 6.13 Ping Command.

Figure 6.14 Ping Execution on SECOND Stack.

Figure 6.15 Captured ICMP Packets.

Figure 6.16 ICMP Echo Packets.

Figure 6.17 UDP Protocol Stacks.

Figure 6.18 Raw Data Injection.

Figure 6.19 Captured UDP Packet.

Figure 6.20 RTC 4G Stacks.

Figure 6.21 Netualizer Dialpad Option.

Figure 6.22 Idle Dialpad.

Figure 6.23 Dialpad on Call.

Figure 6.24 Captured RTC Traffic.

Figure 6.25 Captured RTP Packet.

Figure 6.26 SECOND Stack EVS Codec Support.

Figure 6.27 EVS Codec Configuration.

Figure 6.28 FIRST Stack EVS Codec Support.

Figure 6.29 RTP Stacks with Audio and Video Support.

Figure 6.30 Media Type Option.

Figure 6.31 Player Layer Creation.

Figure 6.32 RTP Stack with Player.

Figure 6.33 Player Layer Configuration.

Figure 6.34 Call in Progress.

Figure 6.35 Captured Audio and Video Packets.

Figure 6.36 Captured ITU-T Recommendation H.265 RTP Packet.

Chapter 7

Figure 7.1 CC2531-based Radio.

Figure 7.2 Network Interface Selection.

Figure 7.3 Virtual Hardware Support Option.

Figure 7.4 Project Name Selection.

Figure 7.5 The IEEE 802.15.4 Interface.

Figure 7.6 IEEE 802.15.4 Layer Creation.

Figure 7.7 IEEE 802.15.4 Layer Name Selection.

Figure 7.8 IEEE 802.15.4 2-Layer Stack.

Figure 7.9 6LoWPAN Layer Creation.

Figure 7.10 IEEE 802.15.4 4-Layer Stack.

Figure 7.11 IPv6 Stack Copy Selection.

Figure 7.12 Duplicated IPv6 Stack.

Figure 7.13 Address Field.

Figure 7.14 IPv6 Address Changes.

Figure 7.15 IEEE 802.15.4 Address Field.

Figure 7.16 IEEE 802.15.4 Address Changes.

Figure 7.17 Wireshark Capture Setup.

Figure 7.18 PCAP Filename Configuration.

Figure 7.19 Suite Execution.

Figure 7.20 Ping Command.

Figure 7.21 Ping Execution on SECOND Stack.

Figure 7.22 Trace Location.

Figure 7.23 Captured ICMPv6 Packets.

Figure 7.24 IEEE 802.15.4 Header on Wireshark.

Figure 7.25 6LoWPAN Header on Wireshark.

Figure 7.26 IPv6 and ICMPv6 Headers on Wireshark.

Figure 7.27 UDP Layer Selection.

Figure 7.28 UDP Protocol Stacks.

Figure 7.29 UDP Port Number Selection.

Figure 7.30 UDP Port Number Configuration.

Figure 7.31 Hello World! Message Injection.

Figure 7.32 150-byte Message Injection.

Figure 7.33 Captured IEEE 802.15.4 Packets.

Figure 7.34 Captured 6LoWPAN Packet.

Figure 7.35 Initial 6LoWPAN Fragment.

Figure 7.36 Non-Initial 6LoWPAN Fragment.

Figure 7.37 SIP and RTP over IEEE 802.15.4.

Figure 7.38 RYLR896-based Radio.

Figure 7.39 Network Interface Selection.

Figure 7.40 Project Name Selection.

Figure 7.41 The LoRa Interface.

Figure 7.42 LoRa Layer Creation.

Figure 7.43 LoRa Layer Name Selection.

Figure 7.44 LoRa 2-Layer Stack.

Figure 7.45 6LoBTLE Layer Creation.

Figure 7.46 LoRa Protocol Stack.

Figure 7.47 Stack Copy.

Figure 7.48 Copied Stacks.

Figure 7.49 IP Address Parameter.

Figure 7.50 IPv6 Address Configuration.

Figure 7.51 LoRa Address Parameter.

Figure 7.52 LoRa Address Configuration.

Figure 7.53 LoRa Address Configuration on SECOND Stack.

Figure 7.54 Destination Address Configuration.

Figure 7.55 Suite Execution.

Figure 7.56 Ping Address Configuration.

Figure 7.57 Ping Execution.

Figure 7.58 Configuration Termination.

Figure 7.59 Captured LoRa Packets.

Figure 7.60 UDP Layer Parameters.

Figure 7.61 LoRa-based UDP Stacks.

Figure 7.62 Selecting the udpx Port Number.

Figure 7.63 Port Number Configuration.

Figure 7.64 Destination Address Parameters.

Figure 7.65 Hello World! Message Injection.

Figure 7.66 Upstream Last Packet Option.

Figure 7.67 Received Message.

Figure 7.68 Captured LoRa Packet.

Figure 7.69 SIP and RTP over LoRa.


Hands-On Real Time Communications

A Practical Guide to RTC Protocols in Non-3GPP, 3GPP 4G/5G/6G and IoT Networks

 

Rolando Herrero

Cyber-Physical Systems and Telecom Networks

Program Director at Northeastern University

Boston

 

 

 

 

 

Copyright © 2025 by John Wiley & Sons, Inc. All rights reserved, including rights for text and data mining and training of artificial intelligence technologies or similar technologies.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data Applied for:

Hardback ISBN: 9781394239627

Cover Design: Wiley
Cover Image: © naqiewei/Getty Images

Preface

What Is the Focus of This Book?

Real Time Communication (RTC) has been traditionally associated with telephony and telephony applications. With the emergence of the Internet, however, RTC has been driven to support a wide range of topologies and technologies. Specifically, in a future ruled by the interaction between humans, robots, sensors, actuators, humanoids, and machines in general, RTC plays a crucial role. Although many applications fall under the umbrella of RTC, such as text messaging, sensor readout transmission, and video game control, the focus of this book is on media-based RTC.

Because media involves speech, audio, and video, each with different temporal and spectral characteristics, the technologies involved vary greatly. However, a common theme is the use of Internet suite protocols. These protocols enable support for three main topologies that traditionally used different, and sometimes proprietary, mechanisms. The first topology includes mainstream, non-3GPP networks, which do not follow the standards of the 3rd Generation Partnership Project (3GPP). These networks, such as Wi-Fi and Ethernet-based stacks, support license-free communication, typically without requiring subscriptions. The second topology includes 3GPP networks, which are mainstream cellular networks that follow the standards of the 3GPP. These are wireless, mobile networks that rely on licensed communication channels associated with paid subscriptions. The third and last topology includes Internet of Things (IoT) networks. These networks support communication between devices and applications over both licensed and license-free channels. The key difference between IoT and 3GPP/non-3GPP topologies is that IoT incorporates non-human endpoints (devices) into the communication process. Additionally, devices in IoT networks are often power-constrained, meaning protocols optimized for 3GPP and non-3GPP networks may not function as well in IoT architectures.

This book looks at RTC from the perspective of the Internet Engineering Task Force (IETF) layered architecture. By grouping functionality into layers, the Internet suite of protocols maximizes the reuse of the technologies involved. Whether it is non-3GPP, 3GPP, or IoT, whether it is Wi-Fi, cellular, or an energy-constrained device, media is sampled, compressed, and transmitted using the same basic principles and mechanisms.

Why Did I Write This Book?

RTC is key to the transmission of media between endpoints across multiple topologies and architectures. Many of the technologies involved are legacy mechanisms that were standardized years ago, while others are new, particularly in the context of IoT networks. The material introduced in this book is part of the curriculum of the Telecommunication Networks Program at Northeastern University in Boston. Moreover, the content presented in this book is novel in that it covers multiple areas:

  • Presents RTC from the perspective of the IETF layered architecture with a focus on speech, audio, and video.
  • Introduces media as a type of application that relies on codecs (coders/decoders) to compress speech, audio, and video while preserving Quality of Service (QoS).
  • Integrates RTC technologies to support 3GPP, non-3GPP, and IoT topologies.
  • Provides a hands-on approach by introducing tools to build, deploy, test, and understand 3GPP, non-3GPP, and IoT networking scenarios.

The first part of this book addresses the first two topics listed above, while the second part focuses on the last topic. The second part supports both hardware virtualization, where network interfaces are emulated, and actual hardware-based network interfaces.

Target Audience

Because this book deals with RTC-based media in 3GPP, non-3GPP, and IoT networks in the context of the IETF layered architecture, it is of importance to those interested in state-of-the-art networking and communication technologies. This includes graduate and undergraduate students pursuing degrees in Electrical Engineering, Computer Engineering, Computer Science, and Information Technology, among others, as well as practitioners, technologists, researchers, and engineers in general. Ongoing standardization efforts around RTC protocol stacks make this understanding crucial. Moreover, since this book incorporates a second part that presents tools like Netualizer and Wireshark to virtualize these technologies, there are no requirements for dedicated hardware. This enables everyone to analyze and understand the inner workings of RTC protocols. In addition to the networking and communication aspects of media-based RTC, this book also focuses on the application layer, particularly on the details of speech, audio, and video codecs. This makes it ideal for those in the target audience who are additionally interested in signal processing and information theory.

How Is This Book Organized?

This book covers media-based RTC, focusing on networking and communication protocols as well as the details of the processing and compression of the media itself. To meet these goals, this 7-chapter book is organized in two parts as follows:

Part I, which includes four chapters, provides the theoretical fundamentals of media-based RTC. In this context, Chapter 1 looks at the characteristics of the different types of media, including the corresponding codecs and how they pack data. Chapter 2 presents the networking protocols that enable the setup and teardown of media sessions. Chapter 3 focuses on the networking protocols that support the transmission of codec traffic. Finally, Chapter 4 introduces the details of how these networking protocols integrate with non-3GPP, 3GPP, and IoT topologies.

Part II, which includes three chapters, introduces the mechanisms to build the aforementioned topologies. Specifically, Chapters 5, 6, and 7 respectively focus on non-3GPP, 3GPP, and IoT architectures.

March 2024       

Rolando Herrero

Boston, MA, USA

Acronyms

3G

3rd Generation

3GPP

3rd Generation Partnership Project

4G

4th Generation

5G

5th Generation

6G

6th Generation

6Lo

IPv6 over Networks of Resource-constrained Nodes

6LoBTLE

IPv6 over Low-power Bluetooth Low Energy

6LoWPAN

IPv6 over Low-power Wireless Personal Area Networks

AAC

Advanced Audio Coding

AAL2

ATM Adaptation Layer 2

AC

Alternating Current

ACELP

Algebraic CELP

ACR

Absolute Category Rating

AES

Advanced Encryption Standard

AM

Amplitude Modulation

AMR

Adaptive Multi-Rate

AP

Access Point

APCM

Adaptive PCM

API

Application Programming Interface

AR

Autoregressive

ARP

Address Resolution Protocol

ARQ

Automatic Repeat Request

ASCII

American Standard Code for Information Interchange

ASR

Automatic Speech Recognition

AVC

Advanced Video Coding

AWS

Amazon Web Services

BLE

Bluetooth Low Energy

BOOTP

Bootstrap Protocol

bps

bits per second

BPSK

Binary PSK

BSS

Basic Service Set

BSSID

BSS Identifier

CA

Certification Authority

CBC

Cipher Block Chaining

CBR

Constant Bit Rate

CCR

Comparison Category Rating

CELP

Code-excited LP

CIDR

Classless Interdomain Routing

CIF

Common Intermediate Format

CM

Counter Mode

CSCF

Call Session Control Function

CMR

Codec Mode Request

CNAME

Canonical Name

CNG

Comfort Noise Generation

CoAP

Constrained Application Protocol

codec

Coder/Decoder

CSFB

Circuit-switched Fallback

CSMA/CA

Carrier Sense Multiple Access with Collision Avoidance

CSMA/CD

Carrier Sense Multiple Access with Collision Detection

CSRC

Contributing Source

CSS

Chirp Spread Spectrum

CRC

Cyclic Redundancy Check

DAC

Digital-to-Analog Converter

DAM

Destination Address Mode

DBPSK

Differential BPSK

DC

Direct Current

DCF

Distributed Coordination Function

DCR

Degradation Category Rating

DCT

Discrete Cosine Transform

DES

Data Encryption Standard

DHCP

Dynamic Host Configuration Protocol

DIFS

Distributed Inter Frame Spacing

DMOS

Degradation MOS

DNS

Domain Name System

DoS

Denial of Service

DPCM

Differential PCM

DPSK

Differential PSK

DRS

Dynamic Rate Shifting

DSA

Digital Signature Algorithm

DSCP

Differentiated Services Code Point

DSSS

Direct Sequence SS

DTMF

Dual-Tone Multi-Frequency

DTLS

Datagram Transport Layer Security

DTX

Discontinuous Transmission

EBIT

End Bit Position

ECC

Elliptic Curve Cryptography

ECN

Explicit Congestion Notification

EDCF

Enhanced DCF

EID

Extension Identifier

EIRP

Equivalent Isotropic Radiated Power

eMBB

Enhanced Mobile Broadband

EPC

Enhanced Packet Core

ESS

Extended Service Set

ETSI

European Telecommunications Standards Institute

E-UTRAN

Evolved UMTS Terrestrial Radio Access Network

EUI

Extended Unique Identifier

EVRC

Enhanced Variable Rate Codec

EVS

Enhanced Voice Services

FB

Full Band

FCF

Frame Control Field

FCS

Frame Checksum

FDD

Frequency Division Duplex

FDMA

Frequency Division Multiple Access

FEC

Forward Error Correction

FHSS

Frequency Hopping SS

FIPS

Federal Information Processing Standard

FM

Frequency Modulation

fmtp

format specific parameter

FQ

Frame Quality

FSF

Frame Control Field

FSK

Frequency Shift Keying

FT

Field Type

Gbps

Gigabits per second

gNB

Next Generation node B

GOB

Group of Blocks

GOP

Group of Pictures

GSM

Global System for Mobile Communications

HAN

Home Area Network

HBR

High Bit Rate

HC1

Header Compression 1

HC2

Header Compression 2

HD

High Definition

HD TV

HD Television

HEVC

High Efficiency Video Coding

HSS

Home Subscriber Server

HMAC

Hash-based Message Authentication Code

HTML

Hypertext Markup Language

HTTP

HyperText Transfer Protocol

IBSS

Independent BSS

ICMP

Internet Control Message Protocol

IDE

Integrated Development Environment

IETF

Internet Engineering Task Force

IID

Interface Identifier

IMS

IP Multimedia Subsystem

iLBC

Internet Low Bitrate Codec

IoT

Internet of Things

IP

Internet Protocol

IP TTL

IP Time-to-Live

IPHC

IP Header Compression

iSAC

internet Speech Audio Codec

ISM

Industrial, Scientific, and Medical

ITU

International Telecommunications Union

IPv4

Internet Protocol version 4

IPv6

Internet Protocol version 6

JPEG

Joint Photographic Experts Group

Kbps

Kilobits per second

LAN

Local Area Network

LATM

Low-overhead MPEG-4 Audio Transport Multiplex

LBR

Low Bit Rate

LBR

Low-power PAN Border Router

LCEVC

Low Complexity Enhancement Video Coding

LD-CELP

Low Delay CELP

LLC

Link Layer Control

LLN

Low Power and Lossy Network

LoRa

Long Range

LPC

Linear Predictive Coding

LPWAN

Low Power Wide Area Network

LSF

Line Spectral Frequency

LTE

Long Term Evolution

MAC

Media Access Control

MB

Macro Block

Mbps

Megabits per second

MBR

Medium Bit Rate

MCELP

Mixed CELP

MD

Message Digest

MDCT

Modified Discrete Cosine Transform

MF

More Fragments

MIC

Message Integrity Code

MIMO

Multiple Input Multiple Output

MKI

Master Key Identifier

MME

Mobility Management Entity

mMTC

Massive Machine-Type Communication

MOS

Mean Opinion Score

MPEG

Motion Pictures Experts Group

MRF

Media Resource Function

MRFC

MRF Controller

MRFP

MRF Processor

MSS

Maximum Segment Size

MTU

Maximum Transmission Unit

NAL

Network Abstraction Layer

NALU

NAL Unit

NAT

Network Address Translation

NB

Narrowband

NB-Fi

Narrowband Fidelity

NB-IoT

Narrowband IoT

ND

Neighbor Discovery

NFC

Near Field Communication

NFV

Network Function Virtualization

NHC

Next Header Compression

NR

New Radio

NS

Neighbor Solicitation

NT

Netualizer Network Interface

OFDM

Orthogonal Frequency Division Multiplexing

OQPSK

Offset QPSK

OSI

Open Systems Interconnection

PAM

Pulse Amplitude Modulation

PAN

Personal Area Network

PAN ID

PAN Identifier

PCF

Point Coordination Function

PCM

Pulse Code Modulation

PCRF

Policy and Charging Rules Function

PDCP

Packet Data Convergence Protocol

PDN-GW

Packet Data Network Gateway

PG

Prediction Gain

PKI

Public Key Infrastructure

PSK

Phase Shift Keying

PSQM

Perceptual Speech Quality Measure

QAM

Quadrature Amplitude Modulation

QCIF

Quarter CIF

QoS

Quality of Service

QPSK

Quadrature PSK

QUIC

Quick UDP Internet Connection

RAN

Radio Access Network

REST

Representational State Transfer

RGB

Red Green Blue

RLC

Radio Link Control

ROHC

Robust Header Compression

RPE

Regular Pulse Excitation

RPL

Routing Protocol for Low-Power and Lossy Networks

RR

Receiver Report

RTC

Real Time Communication

RTCP

Real-Time Control Protocol

RTP

Real-Time Transport Protocol

RTT

Round Trip Time

SAA

Stateless Address Autoconfiguration

SAM

Source Address Mode

SBC

Session Border Controller

SBR

Spectral Band Replication

SBIT

Start Bit Position

SC-FDMA

Single Carrier FDMA

SD-DNS

Service Discovery DNS

SD-WAN

Software Defined WAN

SDAP

Service Data Adaption Protocol

SDES

SDP Security Description for Media Streams

SDN

Software Defined Network

SDP

Session Description Protocol

SDR

Software Defined Radio

SFD

Start Frame Delimiter

SHA

Secure Hash Algorithm

SID

Silence Insertion Descriptor

SIF

Source Input Format

S-GW

Serving Gateway

SIFS

Short Inter Frame Spacing

SIP

Session Initiation Protocol

SIPS

Secure SIP

SNR

Signal to Noise Ratio

sps

samples per second

SSNR

Segmental SNR

SSRC

Synchronization Source

SR

Sender Report

SRC

Source Format

SRTCP

Secure RTCP

SRTP

Secure RTP

SWB

Super Wideband

TCP

Transmission Control Protocol

TDD

Time Division Duplex

TDMA

Time Division Multiple Access

TLS

Transport Layer Security

TLV

Type-Length-Value

TTL

Time-to-Live

TTS

Text-to-Speech

UAC

User Agent Client

UART

Universal Asynchronous Receiver Transmitter

UAS

User Agent Server

UDP

User Datagram Protocol

UE

User Equipment

UI

User Interface

UMTS

Universal Mobile Telecommunications System

URI

Uniform Resource Identifier

URL

Uniform Resource Locator

URLLC

Ultra-reliable Low-latency Communication

VAD

Voice Activity Detection

VBR

Variable Bit Rate

VCL

Video Coding Layer

VLBR

Very Low Bit Rate

VoIP

Voice over IP

VoLTE

Voice over LTE

VoNR

Voice over NR

VSELP

Vector Sum Excited LP

VVC

Versatile Video Coding

WAN

Wide Area Network

WB

Wideband

WEP

Wired Equivalent Privacy

Wi-Fi

Wireless Fidelity

WLAN

Wireless Local Area Network

WPA

Wi-Fi Protected Access

WPAN

Wireless Personal Area Network

WSN

Wireless Sensor Network

XML

eXtensible Markup Language

About the Companion Website

This book is accompanied by a companion website:

www.wiley.com/go/herrero/RTCProtocols 

This website includes:

Solutions PDF

Scripts and Traces

Slides

Part I: Theoretical Background

 

1 Media Generation

1.1 Introduction

Real Time Communication (RTC) bridges the gap between devices and people, enabling instant, interactive exchanges. RTC supports a two-way dialog across User Equipment (UE) including smartphones, tablets, laptops, and even IoT devices (Herrero 2021). Video conferencing, voice calls, and instant messaging are just some examples of RTC. Other examples include IoT devices generating real time data that drive applications like predictive maintenance, remote monitoring, smart home automation as well as industrial IoT. This means that RTC involves a range of technologies with versatile applications that enable legacy and emerging architectures.

In this context, this chapter is about media generation, and it specifically focuses on the generation of signals that facilitate RTC. Given that signals originate in the analog domain, their conversion into digital form becomes necessary at the UEs (Herrero 2023). The process of digitization involves two essential mechanisms referred to as sampling and quantization (Haykin 2009). These processes lead to the conversion of analog signals with infinite precision into digital data of finite size, enabling storage and transmission. To achieve even greater compression, these streams can be processed through media codecs (Chu 2003).

The conversion of a signal from the analog realm to the digital domain introduces distortion. This distortion is further exacerbated during transmission through a communication channel affected by network impairments like loss and latency. Quantifying the degree of this distortion involves the utilization of quality scores and other relevant metrics.

1.2 Signals

Real time communication signals can carry any kind of digital data; this chapter focuses on how they carry speech, audio, and video, the most common types of signals in RTC applications. Speech signals are used for voice calling, audio signals for music and other audio content, and video signals for video conferencing and streaming.

1.2.1 Speech

Audio and speech signals are sound signals, characterized by fluctuations in air pressure as they travel. These pressure variations are described as waves, commonly known as sound waves. Furthermore, speech signals play a pivotal role in human communication and constitute a subset of these audio signals. A key fact is that their distinctive nature allows them to be processed and encoded using mechanisms distinct from those employed for standard audio signals (Jurafsky and Martin 2000).

The human auditory system perceives sounds ranging from 20 Hertz (Hz) to 20,000 Hz (20 kHz), with heightened sensitivity to events occurring between 250 Hz and 5 kHz. This range is where speech signals predominantly reside. Vowel sounds derive their energy mainly from 250 Hz to 2 kHz, while voiced consonants like b, d, and m are strongest between 250 Hz and 4 kHz. Unvoiced consonants like f, s, and t have varying intensities and occupy the 2 to 8 kHz range.

For good speech comprehension, it is especially important to have good hearing in the range of 125 Hz to 4 kHz, where unvoiced consonants reside. Figure 1.1 shows a normalized speech sequence recorded over 25 milliseconds. Interestingly, it displays a strong correlation within about 8 milliseconds, matching the pitch period of the sequence. More details of the characteristics of speech, including the formal definition of the pitch period, are presented in Section 1.4.1.

1.2.2 Audio

Audio signals cover a wider range than speech signals, reaching the full human hearing range and even including the ultrasonic range. In terms of practicality, audio signals typically transmit in one direction, while speech signals spread in all directions. Unlike speech signals, which have a clear organization with specific patterns in frequency and amplitude, audio signals lack such structure and are less complex, with less data, allowing for better compression.

Similar to speech signals, audio signals exhibit various attributes beyond their frequency range. The amplitude of a speech signal corresponds to its volume or loudness. This amplitude can vary significantly, influenced by factors like the speaker, surroundings, and the conveyed emotion. Sound pressure amplitude is commonly quantified in units of pascals (Pa). In this context, a micropascal (μPa) is equivalent to one-millionth of a pascal.

Figure 1.1 Speech Signal.

In controlled laboratory settings, the benchmark for the minimum detectable amplitude, known as the threshold of hearing, is approximately 20 μPa for a 1 kHz tone. Some individuals regard 20 Pa as the threshold for experiencing pain, yet this perception is subjective and varies significantly based on individual differences and age. For instance, the sound pressure produced by a jackhammer at a distance of 1 meter reaches a maximum of approximately 2 Pa, while a jet engine at the same distance generates a maximum pressure of 632 Pa. Sound Pressure Level (SPL), another parameter of a sound field, serves as a metric in acoustic analysis and recording. It is commonly employed to gauge the auditory experience at the listener's location or the positioning of a microphone. The relationship between sound pressure, measured in pascals, and dB SPL spans a spectrum from 20 μPa, corresponding to 0 dB SPL, to 200 Pa, equating to 140 dB SPL, and even higher (Smith 2010).
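
These figures follow directly from the definition of sound pressure level, $L = 20\log_{10}(p/p_0)$ dB SPL, with reference pressure $p_0 = 20\ \mu$Pa. As a quick numerical check, here is a minimal Python sketch using only the pressure values quoted above:

```python
import math

P0 = 20e-6  # reference sound pressure (20 uPa): 0 dB SPL by definition

def spl_db(pressure_pa: float) -> float:
    """Convert a sound pressure in pascals to dB SPL."""
    return 20 * math.log10(pressure_pa / P0)

for label, p in [("threshold of hearing", 20e-6),
                 ("jackhammer at 1 m", 2.0),
                 ("upper reference", 200.0),
                 ("jet engine at 1 m", 632.0)]:
    print(f"{label}: {spl_db(p):.1f} dB SPL")
# threshold of hearing: 0.0 dB SPL
# jackhammer at 1 m: 100.0 dB SPL
# upper reference: 140.0 dB SPL
# jet engine at 1 m: 150.0 dB SPL
```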

Both audio and speech signals exhibit temporal characteristics, such as the zero crossing rate, which refers to the frequency at which the signal crosses the zero amplitude line within a second. Additionally, they manifest time-domain attributes like autocorrelation, illustrating how similar the signal is to itself at different time delays. In the frequency domain, these signals possess characteristics including the power spectrum, which shows how much energy the signal has at different frequencies. Furthermore, there are Mel-Frequency Cepstral Coefficients (MFCCs), which compactly characterize the spectral envelope of the signal on a perceptual (mel) scale, and Linear Predictive Coding (LPC) coefficients, which model the signal as a linear combination of past samples. Figure 1.1, originally shown in Section 1.2.1, illustrates a speech sequence. In contrast, Figure 1.2 presents a normalized audio sequence of a 2.5 kHz tone with increasing amplitude, showing the differences in temporal and spectral characteristics compared to speech.

Figure 1.2 Audio Signal.
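
As a hedged illustration of two of these attributes, the following sketch computes the zero crossing rate and the normalized autocorrelation of a synthetic 100 Hz tone; the signal, sampling rate, and lag range are illustrative choices, not values from the text:

```python
import numpy as np

def zero_crossing_rate(x: np.ndarray, fs: float) -> float:
    """Zero crossings per second of the sampled signal x at rate fs."""
    signs = np.signbit(x).astype(np.int8)
    return np.sum(np.abs(np.diff(signs))) * fs / len(x)

def autocorrelation(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Normalized autocorrelation for lags 0..max_lag."""
    x = x - np.mean(x)
    r = np.array([np.sum(x[:len(x) - k] * x[k:]) for k in range(max_lag + 1)])
    return r / r[0]

fs = 8000                          # narrowband sampling rate (Table 1.1)
t = np.arange(800) / fs            # 100 ms of signal
x = np.sin(2 * np.pi * 100 * t)    # 100 Hz tone with pitch-like periodicity

print(zero_crossing_rate(x, fs))   # ~200: a 100 Hz tone crosses zero twice per cycle
r = autocorrelation(x, 200)
print(np.argmax(r[1:]) + 1)        # 80: the lag (in samples) of the 10 ms period
```

For real speech, the autocorrelation peak plays the same role: its lag reveals the pitch period, as in the roughly 8 millisecond correlation visible in Figure 1.1.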

1.2.3 Video

The foundation of digital video predominantly originates from technology (now obsolete) introduced during the early stages of black and white television in the 1930s. The receiver component of black and white television featured the cathode ray tube (CRT). The CRT utilizes a substantial voltage difference to generate an electric field that propels electrons, giving rise to cathode rays. These rays transport electrons from the cathode to the anode, where, due to their momentum, they collide with a photosensitive screen. The velocity and, therefore, the luminosity on the screen correspond to the voltage applied between the anode and cathode. The CRT also incorporates horizontal and vertical deflection plates, allowing the path of the rays to be altered and ensuring that every point on the photosensitive screen receives illumination.

The image refresh process happens line by line, starting from the upper left corner and progressing to the lower right. Voltage signals applied to each set of deflection plates have a sawtooth waveform. The y-plate signal’s period matches the video frame duration, while the x-plate’s period aligns with the line duration. This scanning method is known as progressive scanning. The frame frequency, calculated as the reciprocal of the frame duration, is derived from the Alternating Current (AC) frequency to minimize interference from transformers and other power fluctuations. In the United States with 60 Hz AC, the standard frame rate has been traditionally 30 frames per second (fps). In regions with 50 Hz AC, like most of the world, the standard has been 25 fps. Note that the first and last lines of each frame, as well as the start and end of each line, are transmitted but not displayed. These hidden regions typically carry synchronization data (Benott 2008).

A different approach from progressive scanning is interlaced scanning. Here, each video frame is divided into two fields, scanned in an alternating manner. The first field scans the odd-numbered lines, while the second scans the even-numbered lines. While progressive scanning delivers one complete frame every 1/30 seconds (assuming a 30 fps frame rate), interlaced scanning transmits half a frame every 1/60 seconds, resulting in the same field rate (60 Hz in this example). Keep in mind that interlaced scanning can provide a perceptual advantage by reducing choppiness in certain situations, like scenes with slow or moderate motion, due to its higher apparent frame rate (twice the field rate). However, it can also introduce artifacts under certain conditions.

Figure 1.3 compares progressive and interlaced scanning. Although both methods process the same number of fields per unit time (operating at field rate), they achieve this in different ways. Progressive scanning displays each complete frame independently, while interlaced scanning displays half frames in an alternating pattern, creating the illusion of a full frame. When viewed in motion under certain conditions, particularly with slow or moderate motion and higher vertical resolutions, interlaced scanning can offer a perception of smoother playback. However, it can also introduce artifacts, especially in fast-moving scenes or at lower resolutions.

Figure 1.3 Progressive vs Interlaced Scanning.

Irrespective of the frame rate, all black and white standards utilize a composite analog signal for transmission, incorporating video, blanking, and synchronization (VBS) information. This transmission method commonly employs interlaced scanning, where each line's initiation is guided by precisely timed horizontal synchronization pulses embedded between the video signals in the imperceptible portion of each line. The term blanking refers to the pulse responsible for suppressing the beam during its retrace from the lower right to the upper left corner of the screen.

The black and white television standard in the United States uses a field frequency of 60 Hz, with each frame divided into two fields of 262½ lines each, resulting in a line frequency of 15.75 kHz. Video content uses Amplitude Modulation (AM) with a bandwidth of 4.2 MHz, while audio uses Frequency Modulation (FM) with a 4.5 MHz carrier frequency.

Unlike the United States standard, the European (and international) black and white television standard operates at a field frequency of 50 Hz, with each frame composed of two fields of 312½ lines each, resulting in a line frequency of 15.625 kHz. Video content uses AM modulation with a bandwidth of 5 MHz, while audio utilizes FM modulation with a 5.5 MHz carrier frequency.
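
The line frequencies quoted above are simply the field rate multiplied by the number of lines per field, as a one-line check confirms:

```python
# Line frequency = field rate x lines per field (values from the text)
print(60 * 262.5)  # US standard: 15750.0 Hz = 15.75 kHz
print(50 * 312.5)  # European standard: 15625.0 Hz = 15.625 kHz
```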

Since any color can be made by mixing the three primary colors: red, green, and blue (RGB), a color TV scheme uses three separate black and white cameras, each with a filter that only lets in one of these primary colors. These cameras capture three separate VBS signals, which can be combined to recreate the original full-color image. However, there is a problem with this approach: old-fashioned black and white TVs cannot display video recorded with this method.

The primary objective is to devise a compatibility scheme that functions bidirectionally, enabling color signals to be displayed on traditional black and white televisions, while also ensuring that monochromatic signals can be exhibited on color televisions.

A linear combination can be applied to the RGB components to transform them into three other components that decompose the information into signals providing backward compatibility. One such scheme is given by the set of linear equations

$$Y = 0.299R + 0.587G + 0.114B$$
$$C_b = 0.564(B - Y)$$
$$C_r = 0.713(R - Y)$$

where R, G, and B are the intensity values of the red, green, and blue components that serve as input to obtain the luminance Y, blue chrominance $C_b$, and red chrominance $C_r$ components. An alternative set of equations results from

$$U = 0.492(B - Y)$$
$$V = 0.877(R - Y)$$

where the blue and red chrominances are U and V, respectively. Figure 1.4 illustrates how backward compatible signals are generated from the input signals. Each signal involves filtering a specific color of light through optics and then capturing its electrical representation via a regular black and white camera.
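
In matrix form, the scheme above is a single linear transform. The sketch below expands U and V into direct RGB weights derived from the equations above; the function name and test colors are arbitrary illustration choices:

```python
import numpy as np

# Rows express Y, U, and V as linear combinations of R, G, and B
RGB_TO_YUV = np.array([
    [ 0.299,  0.587,  0.114],   # Y: luminance
    [-0.147, -0.289,  0.436],   # U = 0.492 (B - Y), expanded
    [ 0.615, -0.515, -0.100],   # V = 0.877 (R - Y), expanded
])

def rgb_to_yuv(rgb):
    """Convert a normalized RGB triple (components in 0..1) to YUV."""
    return RGB_TO_YUV @ np.asarray(rgb, dtype=float)

print(rgb_to_yuv([1.0, 1.0, 1.0]))  # white: Y = 1, U = V = 0 (no chrominance)
print(rgb_to_yuv([0.0, 0.0, 1.0]))  # blue: U at its maximum of 0.436
```

A grayscale image is exactly the case U = V = 0, which is why a black and white receiver can simply display Y and ignore the chrominance components.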

The luminance signal conveys the intensity of the color image, allowing black and white TVs to display a grayscale version. For color TVs, the chrominance signals carry the color information needed to render the complete image. Figure 1.5 depicts the composite signal, which combines these elements. Figures 1.6 to 1.8 further break down the Y, U, and V components, respectively, that make up the composite signal. Interestingly, human perception is less sensitive to color compared to luminance. This allows less bandwidth to be allocated for transmitting chrominance details, ensuring compatibility with older standards without impacting the critical black and white components.

The VBS signal containing chrominance information is called Color VBS (CVBS). Different ways of modulating chrominance signals create various color representation schemes, leading to different video transmission standards. For example, the United States relied on the National Television System Committee (NTSC) standard, while Europe utilized both Phase Alternating Line (PAL) and Sequential Color with Memory (SECAM), which were once dominant players in analog television broadcasting. Although analog TV is no longer used, these seemingly outdated technologies laid the foundation for many aspects of modern digital video. Details of digital video coding are introduced later in this chapter in Section 1.4.3.

Figure 1.4 RGB to YUV Conversion.

Figure 1.5 Composite Image.

Figure 1.6 Y Component.

Figure 1.7 U Component.

Figure 1.8 V Component.

1.3 Sampling and Quantization

As outlined in Section 1.1, information produced by natural sources such as speech, audio, and images exists in the analog domain. In this domain, corresponding signals often include a significant amount of redundant data. This redundant information can be eliminated without appreciably affecting human perception, a procedure referred to as data compression. It is worth noting that, in contrast to the analog domain, a digital domain exists where analog signals are translated into sequences derived from a countable set of numbers. These sequences can be stored and processed by signal processors and computers. The process of removing information in this context is termed source encoding.

Figure 1.9 Rate–Distortion Example.

Data compression can be divided into two primary categories. First, there is lossless compression, which involves eliminating redundancy in a reversible manner, ensuring that the reconstructed message remains identical to the original. Second, there is lossy compression, which entails purposefully removing information in a controlled manner. This removal of non-essential data is permanent, rendering the process irreversible. In this scenario, although the reconstructed message differs from the original, the objective is to minimize this divergence to the point of imperceptibility to the recipient.

The compression rate is defined as the ratio of the information present in the compressed message to that in the original message. Generally, lossy compression exhibits a significantly higher compression rate compared to lossless compression.

In the realm of lossy compression, a specific compression rate corresponds to a specific degree of distortion, giving rise to a trade-off known as rate–distortion. An illustration of this concept can be observed in Figure 1.9, where a source image undergoes compression at various rates, leading to distinct levels of distortion. Both rate and distortion are quantified on a unitless scale ranging from 0 to 4. Note that as distortion diminishes, transmission rates tend to increase; conversely, higher distortion levels result in lower transmission rates (Proakis and Manolakis 2006; Haykin 2009).

As indicated in Section 1.1, sampling and quantization are two key mechanisms that must be employed to convert signals from the analog to the digital domain. The subsequent subsections focus on these two key processes.

1.3.1 Sampling

Sampling transforms an analog signal, which is defined continuously across time, into a sequence of discrete samples taken periodically at a fixed interval known as the sampling period, measured in units of time. The reciprocal of the sampling period is the sampling rate, measured in units of samples per second (sps). If the sampling rate is too slow, the resulting set of samples might not accurately represent the original analog signal. Conversely, if the rate is excessively high, the set could become unwieldy for efficient processing and modulation for transmission through a channel. Table 1.1 shows how, in audio and speech sampling scenarios, each sampling rate has a specific type that identifies it.

Generally, when an energy signal $x(t)$ (as shown in Figure 1.10) is defined at every instant of time and sampled at a rate of $f_s = 1/T_s$ (where $T_s$ denotes the sampling period), the resulting sampled signal resembles Figure 1.11. Mathematically, sampling takes $x(t)$ as input and generates an infinite sequence of samples spaced $T_s$ seconds apart, forming the sequence $\{x(nT_s)\}$. This process, known as ideal or instantaneous sampling, captures the signal's value at specific, infinitesimal instants of time. Note that under ideal sampling, the effect is akin to multiplying the analog signal by a train of time pulses. Specifically,

Table 1.1 Sampling Rate Types.

Type                   Sampling Rate (sps)
Narrowband (NB)        8000
Wideband (WB)          16000
Super Wideband (SWB)   32000
Fullband (FB)          48000

Figure 1.10 Signal $x(t)$.

Figure 1.11 Sampled $x(t)$.

$$x_\delta(t) = \sum_{n=-\infty}^{\infty} x(nT_s)\,\delta(t - nT_s) \qquad (1.1)$$

where $x_\delta(t)$ is the ideal sampled signal and $\delta(t - nT_s)$ is a delta function positioned at time $nT_s$.

In the frequency domain, $x_\delta(t)$ becomes $X_\delta(f)$, and it is given by

$$X_\delta(f) = f_s \sum_{m=-\infty}^{\infty} X(f - m f_s) \qquad (1.2)$$

where the spectrum $X_\delta(f)$ is an infinite sequence of shifted versions of the analog signal frequency representation $X(f)$. Whether overlap occurs among these versions hinges upon the sampling rate $f_s$. To illustrate, consider a scenario where $x(t)$ is band-limited (represented in Figure 1.12), containing no frequency components beyond $W$ Hz. If sampled at a rate of $f_s = 2W$, the resulting spectrum (as shown in Figure 1.13) includes components without any overlap.

Mathematically, from Equation (1.2),

$$X_\delta(f) = f_s X(f) + f_s \sum_{m \neq 0} X(f - m f_s)$$

where, if (1) $X(f) = 0$ for $|f| \geq W$ and (2) $f_s = 2W$, then

$$X(f) = \frac{1}{2W} X_\delta(f) \qquad (1.3)$$

Figure 1.12 Band-limited Signal.

Figure 1.13 Spectrum of Sampled Signal.

if $-W < f < W$. In this scenario, the samples $x(n/(2W))$ of the analog signal, taken every $1/(2W)$ seconds with $n$ being an integer, contain all the information in $x(t)$.

The inverse operation, that is, recovering the analog signal out of the samples, is given by

$$x(t) = \sum_{n=-\infty}^{\infty} x(nT_s)\,\mathrm{sinc}\!\left(\frac{t - nT_s}{T_s}\right) \qquad (1.4)$$

where delayed versions of the sinc function are added together to interpolate $x(t)$ for any value of $t$. Note that the sinc function is defined as $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$. This convolution in the time domain is analogous to multiplication by a low-pass (LP) filter in the frequency domain, referred to as a reconstruction filter. As a result, to recover the analog signal from its sampled version $x_\delta(t)$, it is only necessary to process it through this reconstruction filter.
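
Equation (1.4) can be verified numerically. The sketch below reconstructs a band-limited tone between two of its samples by summing shifted sinc functions; note that np.sinc implements $\sin(\pi x)/(\pi x)$, matching the definition above. The tone frequency, sample count, and evaluation point are illustrative assumptions:

```python
import numpy as np

fs = 8000                  # sampling rate (sps)
Ts = 1 / fs                # sampling period (s)
n = np.arange(64)          # 64 samples of a 1 kHz tone (well below fs / 2)
samples = np.sin(2 * np.pi * 1000 * n * Ts)

def reconstruct(t: float) -> float:
    """Equation (1.4): x(t) = sum_n x(nTs) sinc((t - n Ts) / Ts)."""
    return np.sum(samples * np.sinc((t - n * Ts) / Ts))

t = 20.5 * Ts                           # a point midway between two samples
print(reconstruct(t))                   # interpolated value
print(np.sin(2 * np.pi * 1000 * t))     # exact value: nearly identical
```

The small residual difference comes from truncating the infinite sum in Equation (1.4) to 64 terms.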

Generally, an energy signal lacking frequency components above $W$ Hz can be accurately described using samples taken periodically every $1/(2W)$ seconds. The sampling rate $f_s = 2W$ is known as the Nyquist Rate, while the sampling period $T_s = 1/(2W)$ is referred to as the Nyquist Interval. For example, narrowband speech band-limited to $W = 4$ kHz requires a sampling rate of at least 8000 sps, the narrowband rate listed in Table 1.1.

Undersampling occurs when a signal is sampled at a rate below the Nyquist Rate, indicated by $f_s < 2W$. This condition leads to aliasing, a phenomenon where successive shifted replicas of the analog signal's frequency representation in Equation (1.2) overlap. To illustrate, consider the frequency representation of the signal shown in Figure 1.12. When undersampled, as shown in Figure 1.14, the signal exhibits the characteristic distortions of aliasing.

Figure 1.14 Aliasing.
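
Aliasing is easy to reproduce numerically. In the sketch below (frequencies chosen for illustration), a 5 kHz tone sampled at 8000 sps, below its Nyquist Rate of 10000 sps, produces the same samples as a 3 kHz tone of opposite sign; the 5 kHz component folds down to 8 kHz − 5 kHz = 3 kHz:

```python
import numpy as np

fs = 8000                                      # below the 10000 sps Nyquist Rate
n = np.arange(16)
tone_5k = np.sin(2 * np.pi * 5000 * n / fs)    # undersampled 5 kHz tone
tone_3k = np.sin(2 * np.pi * 3000 * n / fs)    # its 3 kHz alias
print(np.allclose(tone_5k, -tone_3k))          # True: indistinguishable after sampling
```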

To avoid aliasing, an LP filter, known as an anti-aliasing filter, can be used on the analog signal $x(t)$. This filter suppresses high-frequency components that could otherwise trigger undersampling issues. However, since this filter is not inherently perfect and possesses a transition band, it is important to make slight adjustments to the sampling rate. By ensuring a rate marginally higher than the Nyquist Rate, the absence of spectral overlap can be guaranteed.

When the signal shown in Figure 1.12 is sampled at a rate surpassing the Nyquist Rate, the resulting spectrum of the instantaneous sampled version, shown in Figure 1.15, is achieved.

Reconstruction and anti-aliasing filters typically share analogous characteristics, both possessing a transition band that extends from $W$ to $f_s - W$. Moreover, as the sampling frequency increases, the spectral gap between repetitions of the analog signal's spectrum within the sampled signal's spectrum widens. This phenomenon alleviates constraints on the transition band's width in these filters, illustrated in Figure 1.16. The ultimate extent of this band is dictated by $f_s - 2W$.

Figure 1.15 Spectrum of Instantaneous Sampled Signal.

Figure 1.16 Reconstruction Filter.

Figure 1.17 Time and Frequency Domain for Sampling.

Figure 1.17 depicts a demonstrative sampling scenario, examined from both time and frequency domain perspectives. Specifically, the signal slated for sampling at the transmitter undergoes multiplication by a series of pulses in the time domain, corresponding to convolution in the frequency realm. The resulting signal traverses the communication channel, and upon reaching the receiver, it is reconstructed via frequency-domain multiplication using an LP filter. In the time domain, this translates to convolution with the time representation of the LP filter, which takes the form of a sinc function (Proakis and Manolakis 2006).

1.3.2 Quantization

While converting analog signals into digital formats, sampling captures a discrete collection of time-domain samples. However, each sample inherently possesses an infinite range of amplitude levels, making their representation using a limited set of digital values impractical. To address this challenge, quantization is employed alongside sampling. This process converts the potential analog sample amplitudes into a finite set of predetermined values, enabling their representation in the digital realm. As expected, quantization inevitably introduces distortion due to its mapping of an infinite range onto a finite set. If the number of available mapped values is insufficient, this distortion can negatively impact human perception.

Accordingly, when the function $x(t)$ is sampled at a frequency of $f_s$, each sample $x(nT_s)$ undergoes transformation into a distinct amplitude, denoted as $v$, selected from a predetermined set of values. Note that quantization typically operates in a memoryless, per-sample manner, guaranteeing that the quantization of one sample remains independent of quantization decisions made for previous samples.

In terms of amplitudes, if an analog sample has an amplitude $m$, then through quantization this amplitude is mapped into a number $k$ if $m$ is within a specific range or partition cell, shown in Figure 1.18 and given mathematically by

$$\mathcal{J}_k : \{\, m_k < m \leq m_{k+1} \,\}, \qquad k = 1, 2, \ldots, L$$

where $L$ is the total number of levels or possible discrete amplitudes in the quantization set and the discrete amplitudes $m_k$ with $k = 1, 2, \ldots, L$ are called decision thresholds. Upon the need to reconstruct the quantized sample as a specific value, the numerical index $k$ undergoes a conversion process to yield a reconstruction level denoted as $v_k$. This reconstruction level represents the entire spectrum of conceivable analog amplitudes present within the designated partition $\mathcal{J}_k$. The interval between two successive levels, represented as $\Delta = v_{k+1} - v_k$, is known as the step size.

Figure 1.18 Partition Cell.

In a broad sense, the conversion process from an analog sample $m$ to the reconstructed value $v$ is expressed as $v = g(m)$, where the function $g(\cdot)$ is referred to as the quantizer characteristic. Depending on whether it possesses an even or an odd count of levels, the quantizer takes the form of either a midrise or a midtread configuration, respectively.
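
As a concrete illustration of a quantizer characteristic $g(\cdot)$, the following is a minimal sketch of a uniform midrise quantizer; the bit depth and input range are illustrative assumptions rather than parameters of any particular codec:

```python
import numpy as np

def midrise_quantize(m: np.ndarray, bits: int = 4, xmax: float = 1.0) -> np.ndarray:
    """Uniform midrise quantizer g(m): map each amplitude to the nearest of
    L = 2**bits reconstruction levels v_k = (k + 0.5) * step - xmax."""
    L = 2 ** bits
    step = 2 * xmax / L                                    # step size (Delta)
    k = np.floor((np.clip(m, -xmax, xmax - 1e-12) + xmax) / step)
    return (k + 0.5) * step - xmax                         # reconstruction levels

m = np.linspace(-1, 1, 5)                  # a few analog amplitudes
v = midrise_quantize(m, bits=3)            # L = 8 levels, step = 0.25
print(v)                                   # [-0.875 -0.375  0.125  0.625  0.875]
print(np.max(np.abs(m - v)))               # 0.125: error bounded by step / 2
```

Because the level count is even, zero is not itself a reconstruction level, which is precisely the midrise behavior; a midtread quantizer, with an odd level count, places a level at zero.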