Understand and deploy RTC protocols for any kind of communication system
Since the emergence of Voice over IP (VoIP) communications in the late 1990s, a set of real time communication (RTC) protocols has evolved that together now support an immense range of technologies and systems architectures. Whether it’s 5G networks (and beyond) or Internet of Things architectures, RTC protocols are indispensable to modern telecommunications. An understanding of these protocols and their design and deployment is critical for engineers, academics, and industry professionals in virtually every connected field.
Hands-On Real Time Communications offers a thorough yet accessible introduction to this subject, incorporating both the theory and practical applications of RTC protocols. It offers detailed instructions for deploying RTC protocols on various types of network stacks, as well as the use of a huge range of speech, audio, and video codecs. Practice-oriented and designed for students as well as established professionals, it’s a must-own for anyone looking to deploy and maintain essential communications architectures.
Hands-On Real Time Communications is ideal for advanced undergraduate or graduate students in RTC communication and networking classes, as well as for engineers, technologists, and architects looking to learn and understand the principles of RTC networking.
Page count: 517
Publication year: 2025
Cover
Table of Contents
Title Page
Copyright
Preface
What Is the Focus of This Book?
Why Did I Write This Book?
Target Audience
How Is This Book Organized?
Acronyms
About the Companion Website
Part I: Theoretical Background
1 Media Generation
1.1 Introduction
1.2 Signals
1.3 Sampling and Quantization
1.4 Codecs
1.5 Quality Scores
1.6 Summary
1.7 Homework Problems and Questions
Bibliography
2 Media Session Signaling
2.1 Introduction
2.2 Lower Layers
2.3 SIP
2.4 SDP
2.5 HTTP
2.6 Security Considerations
2.7 Summary
2.8 Homework Problems and Questions
Bibliography
3 Media Packetization
3.1 Introduction
3.2 RTP
3.3 RTCP
3.4 SRTP and SRTCP
3.5 Framing
3.6 Summary
3.7 Homework Problems and Questions
Bibliography
4 Media over Network
4.1 Introduction
4.2 Non-3GPP Networks
4.3 3GPP 4G/5G/6G
4.4 IoT
4.5 Putting It All Together
4.6 Summary
4.7 Homework Problems and Questions
Bibliography
Part II: Building Topologies
5 Non-3GPP Networks
5.1 Introduction
5.2 Lower Layers
5.3 RTC
5.4 Codecs
5.5 Summary
5.6 Homework Problems and Questions
5.7 Lab Exercises
Bibliography
6 3GPP Networks
6.1 Introduction
6.2 Lower Layers
6.3 VoLTE
6.4 Summary
6.5 Homework Problems and Questions
6.6 Lab Exercises
Bibliography
7 IoT Networks
7.1 Introduction
7.2 WPAN
7.3 LPWAN
7.4 Summary
7.5 Homework Problems and Questions
7.6 Lab Exercises
Bibliography
Index
End User License Agreement
Chapter 1
Table 1.1 Sampling Rate Types.
Table 1.2 A-Law.
Table 1.3 μ-Law.
Table 1.4 GSM 6.10 Bit Allocation
Chapter 2
Table 2.1 Subnetting.
Table 2.2 Long vs Short IPv6 Addresses.
Table 2.3 IPv6 vs IPv4 Addresses.
Table 2.4 SIP Response Codes.
Chapter 3
Table 3.1 Codec Name and Payload Type.
Table 3.2 AMR and AMR-WB Rates (Kbps).
Table 3.3 EVRC and EVRC-B Rates (Kbps).
Table 3.4 EVS Frame Type.
Table 3.5 NALU Types and Packet Types.
Chapter 4
Table 4.1 Technology Comparison.
Table 4.2 IEEE 802.11a.
Table 4.3 6LoWPAN Dispatch Values.
Table 4.4 Unicast S/SAM and D/DAM Values.
Table 4.5 Multicast D/DAM Values.
Chapter 1
Figure 1.1 Speech Signal.
Figure 1.2 Audio Signal.
Figure 1.3 Progressive vs Interlaced Scanning.
Figure 1.4 RGB to YUV Conversion.
Figure 1.5 Composite Image.
Figure 1.6 Y Component.
Figure 1.7 U Component.
Figure 1.8 V Component.
Figure 1.9 Rate–Distortion Example.
Figure 1.10 Signal $x(t)$.
Figure 1.11 Sampled $x(t)$.
Figure 1.12 Band-limited Signal.
Figure 1.13 Spectrum of Sampled Signal.
Figure 1.14 Aliasing.
Figure 1.15 Spectrum of Instantaneous Sampled Signal.
Figure 1.16 Reconstruction Filter.
Figure 1.17 Time and Frequency Domain for Sampling.
Figure 1.18 Partition Cell.
Figure 1.19 Midrise Noise.
Figure 1.20 Digital Speech Communication System.
Figure 1.21 Encoder / Decoder.
Figure 1.22 Coding Delay.
Figure 1.23 Transmission Delay; Constant vs Burst.
Figure 1.24 The Phrase “Thank You”.
Figure 1.25 Speech Production Model.
Figure 1.26 μ-Law Analog Companding.
Figure 1.27 Input Signal.
Figure 1.28 PCM Quantized Signal and Reconstruction Error.
Figure 1.29 DPCM Quantized Signal and Reconstruction Error.
Figure 1.30 4-bit PCM.
Figure 1.31 Voiced vs Unvoiced Frames.
Figure 1.32 Autocorrelation.
Figure 1.33 PG for Voiced and Unvoiced Frames.
Figure 1.34 LPC-10 Speech Production Model.
Figure 1.35 CELP.
Figure 1.36 8-bit Uniform Quantization.
Figure 1.37 8-bit ITU-T Rec. G.711 Quantization.
Figure 1.38 Performance Comparison; Linear vs ITU-T Rec. G.711.
Figure 1.39 RPE.
Figure 1.40 Ear Perception Threshold with Audible Tone A.
Figure 1.41 Ear Perception Threshold (Tone A is masked by tone B).
Figure 1.42 Temporal Masking.
Figure 1.43 Perceptual Audio Coding.
Figure 1.44 Blocks.
Figure 1.45 DCT of a Block.
Figure 1.46 GOP Example.
Figure 1.47 Video Frames.
Figure 1.48 Motion Estimation.
Figure 1.49 Sequence Encoding.
Figure 1.50 Perceptual Speech Quality Measure.
Figure 1.51 Circuit Switching.
Figure 1.52 Packet Switching.
Figure 1.53 Packet Loss and Latency.
Figure 1.54 IP Protocol Layers.
Figure 1.55 Network Delay and Jitter.
Figure 1.56 PESQ Block Diagram.
Figure 1.57 POLQA Block Diagram.
Chapter 2
Figure 2.1 Transmission Rate and Bandwidth and Throughput.
Figure 2.2 FSK and PSK Modulations.
Figure 2.3 Quadrature Phase Shift Keying.
Figure 2.4 Multicarrier Modulation.
Figure 2.5 Layers of Link Layer.
Figure 2.6 IPv4 Header and Payload.
Figure 2.7 IPv4 Fragmentation.
Figure 2.8 Example IPv4 Network.
Figure 2.9 ICMP Packet Header and Data.
Figure 2.10 IPv6 Unicast Address Structure.
Figure 2.11 IPv6 Header and Payload.
Figure 2.12 UDP Segment Header and Payload.
Figure 2.13 TCP Header and Payload.
Figure 2.14 Setting up a TCP Connection.
Figure 2.15 Terminating a TCP Connection.
Figure 2.16 SIP Call Flow.
Figure 2.17 HTTP Transactions.
Figure 2.18 HTTP Setup.
Figure 2.19 HTTP Request.
Figure 2.20 HTTP Response.
Figure 2.21 Symmetric Encryption Flow.
Figure 2.22 PKI-based Encryption Flow.
Figure 2.23 Message Authentication Flow.
Figure 2.24 Certification Authority Flow.
Figure 2.25 TLS Session Setup Flow.
Figure 2.26 ITU X.509 Format.
Figure 2.27 TLS Record Format.
Figure 2.28 Secure and Unsecure SIP Protocol Stacks.
Figure 2.29 SIPS Message Flow.
Chapter 3
Figure 3.1 RTP/RTCP Stack.
Figure 3.2 RTP Header Format and Payload.
Figure 3.3 RTP Media Transmission Flow.
Figure 3.4 RTCP Header Format and Payload.
Figure 3.5 SRTP Header Format and Payload.
Figure 3.6 SRTCP Header Format and Payload.
Figure 3.7 DTLS-SRTP Flow.
Figure 3.8 Simple Framing Example.
Figure 3.9 IETF RFC 4867 Format.
Figure 3.10 Payload Header.
Figure 3.11 Payload Header.
Figure 3.12 Bandwidth Efficient Mode Encoding.
Figure 3.13 Octet-aligned Mode Encoding.
Figure 3.14 EVRC Header-Free Packet Format.
Figure 3.15 EVRC Interleaved Packet Format.
Figure 3.16 Compact Bundled Format.
Figure 3.17 ToC-only Format.
Figure 3.18 ToC Format.
Figure 3.19 CMR and ToC Format.
Figure 3.20 CMR Format.
Figure 3.21 LATM Format.
Figure 3.22 H.261 Header Format.
Figure 3.23 H.263 Header Format (Mode A).
Figure 3.24 H.263 Header Format (Mode B).
Figure 3.25 H.263 Header Format (Mode B).
Figure 3.26 H.263+ Header Format.
Figure 3.27 Single NAL Unit Packet (H.264).
Figure 3.28 Single NAL Unit Packet (H.265).
Chapter 4
Figure 4.1 Ethernet Header and Payload.
Figure 4.2 IEEE 802.2 SAP and SNAP.
Figure 4.3 Basic Service Set (BSS).
Figure 4.4 Independent Basic Service Set (IBSS).
Figure 4.5 IEEE 802.11 Contention under DCF.
Figure 4.6 IEEE 802.11 Header and Payload.
Figure 4.7 SIP and RTP over non-3GPP Networks.
Figure 4.8 WebRTC over non-3GPP Networks.
Figure 4.9 LTE Topology.
Figure 4.10 SIP and RTP over LTE.
Figure 4.11 5G Topology.
Figure 4.12 SIP and RTP over 5G.
Figure 4.13 IoT Networks.
Figure 4.14 IoT Access Network.
Figure 4.15 RTC WPAN Protocol Stack.
Figure 4.16 IEEE 802.15.4 and IETF Protocols.
Figure 4.17 IEEE 802.15.4 Topologies.
Figure 4.18 IEEE 802.15.4 Header and Payload.
Figure 4.19 IEEE 802.15.4 Security Header.
Figure 4.20 6LoWPAN Protocol Stack.
Figure 4.21 6LoWPAN Protocol Translation.
Figure 4.22 6LoWPAN Example.
Figure 4.23 Uncompressed IPv6 Header through 6LoWPAN.
Figure 4.24 UDP over IPv6.
Figure 4.25 UDP over IPv6 over 6LoWPAN.
Figure 4.26 Network Layer IPHC.
Figure 4.27 Network IPHC with Transport NHC.
Figure 4.28 NHC Port Number Encoding.
Figure 4.29 6LoWPAN Datagram with Link-Local Addresses.
Figure 4.30 6LoWPAN Datagram with Unicast Addresses.
Figure 4.31 Summary of 6Lo Families.
Figure 4.32 LoRa Stack.
Figure 4.33 Media and Signaling over LoRa.
Figure 4.34 SIP and RTP over IoT Networks.
Figure 4.35 SIP and RTP over Telecommunication Networks.
Chapter 5
Figure 5.1 Example Netualizer Deployment.
Figure 5.2 Initial Screen of Netualizer Controller.
Figure 5.3 New Project Creation Dialog.
Figure 5.4 Default Agent Configuration.
Figure 5.5 Default Script.
Figure 5.6 Wireshark Network Interface Selection.
Figure 5.7 Traffic Capture on the NT Interface.
Figure 5.8 Wireshark Filter Option.
Figure 5.9 Selecting the Physical Layer.
Figure 5.10 Creating the Ethernet Layer.
Figure 5.11 Creating the IPv4 Layer.
Figure 5.12 The 3-Layer IPv4 Stack.
Figure 5.13 Stack Copy.
Figure 5.14 Stack Pasting.
Figure 5.15 Stack Label Change.
Figure 5.16 MAC Address Change on the SECOND Stack.
Figure 5.17 Parameters of the IP Layer.
Figure 5.18 Suite Execution.
Figure 5.19 Configuring the Destination Address for Pinging.
Figure 5.20 ICMP Echo Packets.
Figure 5.21 IPv6 Address Configuration.
Figure 5.22 SECOND Stack Ping.
Figure 5.23 ICMPv6 Echo Packets.
Figure 5.24 Third Frame Details.
Figure 5.25 Impairment Layer Creation.
Figure 5.26 Traffic Direction on Stacks.
Figure 5.27 Configuration Transmission Latency.
Figure 5.28 1-second Latency SECOND Stack Ping.
Figure 5.29 1-second Latency ICMP Echo Packets.
Figure 5.30 UDP Layer Selection.
Figure 5.31 IP Destination Address Configuration.
Figure 5.32 Raw Data Injection.
Figure 5.33 Captured UDP Packet.
Figure 5.34 UDP Packets.
Figure 5.35 UDP Packets with 50% Loss.
Figure 5.36 TCP Stacks.
Figure 5.37 Captured TCP Traffic.
Figure 5.38 Captured 10-message TCP Traffic.
Figure 5.39 TCP Stream Follows Wireshark Option.
Figure 5.40 Stream Dump.
Figure 5.41 TCP Traffic Capture with 10% Loss.
Figure 5.42 UDP Protocol Stacks.
Figure 5.43 Creating the SIP Layer.
Figure 5.44 SIP Protocol Stacks.
Figure 5.45 RTP Protocol Stacks.
Figure 5.46 Selecting Audio Codec.
Figure 5.47 Add RTP Interface Option.
Figure 5.48 Creating the TTS Layer.
Figure 5.49 Creating Audio Layer.
Figure 5.50 Laying down the Audio Layer.
Figure 5.51 Configuring URL.
Figure 5.52 Configuring TTS Text Option.
Figure 5.53 Invite Command.
Figure 5.54 SIP and RTP Session Setup.
Figure 5.55 INVITE Request.
Figure 5.56 INVITE 200 OK Response.
Figure 5.57 RTP Media Packet.
Figure 5.58 Session Removal.
Figure 5.59 SIP and RTP Session Termination.
Figure 5.60 BYE Request.
Figure 5.61 BYE 200 OK Response.
Figure 5.62 QoS Layer Creation.
Figure 5.63 QoS Reference Configuration.
Figure 5.64 Configuring tts as Reference.
Figure 5.65 Start PESQ Option.
Figure 5.66 Stop PESQ Option.
Figure 5.67 PESQ Score on Events Window.
Figure 5.68 PESQ Score on Events Window (50% Packet Loss).
Figure 5.69 TLS Protocol Stacks.
Figure 5.70 Configuration of Certificates and Private Key.
Figure 5.71 server.pem Selection.
Figure 5.72 TLS Certificates and Private Key.
Figure 5.73 Upstream Last Packet Option.
Figure 5.74 Content of Upstream Last Packet.
Figure 5.75 Server Authentication.
Figure 5.76 Mutual Authentication Option.
Figure 5.77 Captured TLS Packets with Mutual Authentication.
Figure 5.78 DTLS Protocol Stacks.
Figure 5.79 DTLS Certificates and Private Key.
Figure 5.80 Captured DTLS Packets with Server Authentication.
Figure 5.81 Captured DTLS Packets with Mutual Authentication.
Figure 5.82 Detaching and Removing Protocol Layers.
Figure 5.83 SRTP Layer Creation.
Figure 5.84 SRTP Protocol Stacks.
Figure 5.85 Add RTP Interfaces Option.
Figure 5.86 Add Codecs Option.
Figure 5.87 Captured SIP and SRTP Packets.
Figure 5.88 SDP of SRTP INVITE.
Figure 5.89 SDP of SRTP 200 OK.
Figure 5.90 SRTP Authentication Option.
Figure 5.91 SDP of SRTP INVITE with Authentication.
Figure 5.92 SDP of SRTP 200 OK with Authentication.
Figure 5.93 Captured SRTP Packet.
Figure 5.94 ITU-T Recommendation G.729 Configuration.
Figure 5.95 AMR-WB Configuration.
Figure 5.96 Selection of Video Codec.
Figure 5.97 ITU-T Recommendation H.265 Configuration.
Chapter 6
Figure 6.1 SIM7600G-based Radio.
Figure 6.2 LTE Topology under Consideration.
Figure 6.3 4G Project.
Figure 6.4 The Virtual 4G Interface.
Figure 6.5 4G Link Layer Selection.
Figure 6.6 4G 2-Layer Stack.
Figure 6.7 IP Stack.
Figure 6.8 IP Stack Configuration.
Figure 6.9 IP Stack Copy Selection.
Figure 6.10 Duplicated IP Stack.
Figure 6.11 IP Address Changes.
Figure 6.12 PCAP Capture Configuration.
Figure 6.13 Ping Command.
Figure 6.14 Ping Execution on SECOND Stack.
Figure 6.15 Captured ICMP Packets.
Figure 6.16 ICMP Echo Packets.
Figure 6.17 UDP Protocol Stacks.
Figure 6.18 Raw Data Injection.
Figure 6.19 Captured UDP Packet.
Figure 6.20 RTC 4G Stacks.
Figure 6.21 Netualizer Dialpad Option.
Figure 6.22 Idle Dialpad.
Figure 6.23 Dialpad on Call.
Figure 6.24 Captured RTC Traffic.
Figure 6.25 Captured RTP Packet.
Figure 6.26 SECOND Stack EVS Codec Support.
Figure 6.27 EVS Codec Configuration.
Figure 6.28 FIRST Stack EVS Codec Support.
Figure 6.29 RTP Stacks with Audio and Video Support.
Figure 6.30 Media Type Option.
Figure 6.31 Player Layer Creation.
Figure 6.32 RTP Stack with Player.
Figure 6.33 Player Layer Configuration.
Figure 6.34 Call in Progress.
Figure 6.35 Captured Audio and Video Packets.
Figure 6.36 Captured ITU-T Recommendation H.265 RTP Packet.
Chapter 7
Figure 7.1 CC2531-based Radio.
Figure 7.2 Network Interface Selection.
Figure 7.3 Virtual Hardware Support Option.
Figure 7.4 Project Name Selection.
Figure 7.5 The IEEE 802.15.4 Interface.
Figure 7.6 IEEE 802.15.4 Layer Creation.
Figure 7.7 IEEE 802.15.4 Layer Name Selection.
Figure 7.8 IEEE 802.15.4 2-Layer Stack.
Figure 7.9 6LoWPAN Layer Creation.
Figure 7.10 IEEE 802.15.4 4-Layer Stack.
Figure 7.11 IPv6 Stack Copy Selection.
Figure 7.12 Duplicated IPv6 Stack.
Figure 7.13 Address Field.
Figure 7.14 IPv6 Address Changes.
Figure 7.15 IEEE 802.15.4 Address Field.
Figure 7.16 IEEE 802.15.4 Address Changes.
Figure 7.17 Wireshark Capture Setup.
Figure 7.18 PCAP Filename Configuration.
Figure 7.19 Suite Execution.
Figure 7.20 Ping Command.
Figure 7.21 Ping Execution on SECOND Stack.
Figure 7.22 Trace Location.
Figure 7.23 Captured ICMPv6 Packets.
Figure 7.24 IEEE 802.15.4 Header on Wireshark.
Figure 7.25 6LoWPAN Header on Wireshark.
Figure 7.26 IPv6 and ICMPv6 Headers on Wireshark.
Figure 7.27 UDP Layer Selection.
Figure 7.28 UDP Protocol Stacks.
Figure 7.29 UDP Port Number Selection.
Figure 7.30 UDP Port Number Configuration.
Figure 7.31 Hello World! Message Injection.
Figure 7.32 150-byte Message Injection.
Figure 7.33 Captured IEEE 802.15.4 Packets.
Figure 7.34 Captured 6LoWPAN Packet.
Figure 7.35 Initial 6LoWPAN Fragment.
Figure 7.36 Non-Initial 6LoWPAN Fragment.
Figure 7.37 SIP and RTP over IEEE 802.15.4.
Figure 7.38 RYLR896-based Radio.
Figure 7.39 Network Interface Selection.
Figure 7.40 Project Name Selection.
Figure 7.41 The LoRa Interface.
Figure 7.42 LoRa Layer Creation.
Figure 7.43 LoRa Layer Name Selection.
Figure 7.44 LoRa 2-Layer Stack.
Figure 7.45 6LoBTLE Layer Creation.
Figure 7.46 LoRa Protocol Stack.
Figure 7.47 Stack Copy.
Figure 7.48 Copied Stacks.
Figure 7.49 IP Address Parameter.
Figure 7.50 IPv6 Address Configuration.
Figure 7.51 LoRa Address Parameter.
Figure 7.52 LoRa Address Configuration.
Figure 7.53 LoRa Address Configuration on SECOND Stack.
Figure 7.54 Destination Address Configuration.
Figure 7.55 Suite Execution.
Figure 7.56 Ping Address Configuration.
Figure 7.57 Ping Execution.
Figure 7.58 Configuration Termination.
Figure 7.59 Captured LoRa Packets.
Figure 7.60 UDP Layer Parameters.
Figure 7.61 LoRa-based UDP Stacks.
Figure 7.62 Selecting the udpx Port Number.
Figure 7.63 Port Number Configuration.
Figure 7.64 Destination Address Parameters.
Figure 7.65 Hello World! Message Injection.
Figure 7.66 Upstream Last Packet Option.
Figure 7.67 Received Message.
Figure 7.68 Captured LoRa Packet.
Figure 7.69 SIP and RTP over LoRa.
Rolando Herrero
Cyber-Physical Systems and Telecom Networks
Program Director at Northeastern University
Boston
Copyright © 2025 by John Wiley & Sons, Inc. All rights reserved, including rights for text and data mining and training of artificial intelligence technologies or similar technologies.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data Applied for:
Hardback ISBN: 9781394239627
Cover Design: Wiley
Cover Image: © naqiewei/Getty Images
Real Time Communication (RTC) has been traditionally associated with telephony and telephony applications. With the emergence of the Internet, however, RTC has been driven to support a wide range of topologies and technologies. Specifically, in a future ruled by the interaction between humans, robots, sensors, actuators, humanoids, and machines in general, RTC plays a crucial role. Although many applications fall under the umbrella of RTC, namely, text messaging, sensor readout transmission, video game controlling, etc., the focus of this book is on media-based RTC.
Because media involves speech, audio, and video, each with different temporal and spectral characteristics, the technologies involved vary greatly. However, a common theme is the use of Internet suite protocols. These protocols enable support for three main topologies that traditionally used different, and sometimes proprietary, mechanisms. The first topology includes mainstream, non-3GPP networks, which do not follow the standards of the 3rd Generation Partnership Project (3GPP). These networks, such as Wi-Fi and Ethernet-based stacks, support license-free communication, typically without requiring subscriptions. The second topology includes 3GPP networks, which are mainstream cellular networks that follow the standards of the 3GPP. These are wireless, mobile networks that rely on licensed communication channels associated with paid subscriptions. The third and last topology includes Internet of Things (IoT) networks. These networks support communication between devices and applications over both licensed and license-free channels. The key difference between IoT and 3GPP/non-3GPP topologies is that IoT incorporates non-human endpoints (devices) into the communication process. Additionally, devices in IoT networks are often power-constrained, meaning protocols optimized for 3GPP and non-3GPP networks may not function as well in IoT architectures.
This book looks at RTC from the perspective of the Internet Engineering Task Force (IETF) layered architecture. By grouping functionality into layers, the Internet suite of protocols maximizes the reuse of the technologies involved. Whether it is non-3GPP, 3GPP, or IoT, whether it is Wi-Fi, cellular, or an energy-constrained device, media is sampled, compressed, and transmitted using the same basic principles and mechanisms.
RTC is key to the transmission of media between endpoints across multiple topologies and architectures. However, many of the technologies involved in supporting RTC are legacy mechanisms that have been standardized for many years, while others are new, particularly in the context of IoT networks. The material introduced in this book is part of the curriculum of the Telecommunication Networks Program at Northeastern University in Boston. Moreover, the content presented in this book is novel as it aims to cover multiple areas:
Presents RTC from the perspective of the IETF layered architecture with focus on speech, audio, and video.
Introduces media as a type of application that relies on codecs (coders/decoders) to compress speech, audio, and video while preserving Quality of Service (QoS).
Integrates RTC technologies to support 3GPP, non-3GPP, and IoT topologies.
Provides a hands-on approach by introducing tools to build, deploy, test, and understand 3GPP, non-3GPP, and IoT networking scenarios.
The first part of this book addresses the first two topics listed above, while the second part focuses on the last topic. The second part supports both hardware virtualization, where network interfaces are emulated, and actual hardware-based network interfaces.
Because this book deals with RTC-based media in 3GPP, non-3GPP, and IoT networks in the context of the IETF layered architecture, it is of importance to those interested in state-of-the-art networking and communication technologies. This includes graduate and undergraduate students pursuing degrees in Electrical Engineering, Computer Engineering, Computer Science, and Information Technology, among others, as well as practitioners, technologists, researchers, and engineers in general. Ongoing standardization efforts around RTC protocol stacks make this understanding crucial. Moreover, since the second part of this book presents tools like Netualizer and Wireshark to virtualize these technologies, there are no requirements for dedicated hardware. This enables everyone to analyze and understand the inner workings of RTC protocols. In addition to the networking and communication aspects of media-based RTC, this book also focuses on the application layer, particularly on the details of speech, audio, and video codecs. This makes the book ideal for those in the target audience who are additionally interested in signal processing and information theory.
This book covers media-based RTC, focusing on networking and communication protocols as well as the details of the processing and compression of the media itself. To meet these goals, this 7-chapter book is organized in two parts as follows:
Part I, which includes four chapters, provides the theoretical fundamentals of media-based RTC. In this context, Chapter 1 looks at the characteristics of the different types of media, including the corresponding codecs and how they pack data. Chapter 2 presents the networking protocols that enable the setup and teardown of media sessions. Chapter 3 focuses on the networking protocols that support the transmission of codec traffic. Finally, Chapter 4 introduces the details of how these networking protocols integrate with non-3GPP, 3GPP, and IoT topologies.
Part II, which includes three chapters, introduces the mechanisms to build the aforementioned topologies. Specifically, Chapters 5, 6, and 7 respectively focus on non-3GPP, 3GPP, and IoT architectures.
March 2024
Rolando Herrero
Boston, MA, USA
3G
3rd Generation
3GPP
3rd Generation Partnership Project
4G
4th Generation
5G
5th Generation
6G
6th Generation
6Lo
IPv6 over Networks of Resource-constrained Nodes
6LoBTLE
IPv6 over Low-power Bluetooth Low Energy
6LoWPAN
IPv6 over Low-power Wireless Personal Area Networks
AAC
Advanced Audio Coding
AAL2
ATM Adaptation Layer 2
AC
Alternating Current
ACELP
Algebraic CELP
ACR
Absolute Category Rating
AES
Advanced Encryption Standard
AM
Amplitude Modulation
AMR
Adaptive Multi-Rate
AP
Access Point
APCM
Adaptive PCM
API
Application Programming Interface
AR
Autoregressive
ARP
Address Resolution Protocol
ARQ
Automatic Repeat Request
ASCII
American Standard Code for Information Interchange
ASR
Automatic Speech Recognition
AVC
Advanced Video Coding
AWS
Amazon Web Services
BLE
Bluetooth Low Energy
BOOTP
Bootstrap Protocol
bps
bits per second
BPSK
Binary PSK
BSS
Basic Service Set
BSSID
BSS Identifier
CA
Certification Authority
CBC
Cipher Block Chaining
CBR
Constant Bit Rate
CCR
Comparison Category Rating
CELP
Code-excited LP
CIDR
Classless Interdomain Routing
CIF
Common Intermediate Format
CM
Counter Mode
CSCF
Call Session Control Function
CMR
Codec Mode Request
CNAME
Canonical Name
CNG
Comfort Noise Generation
CoAP
Constrained Application Protocol
codec
Coder/Decoder
CSFB
Circuit-switched Fallback
CSMA/CA
Carrier Sense Multiple Access with Collision Avoidance
CSMA/CD
Carrier Sense Multiple Access with Collision Detection
CSRC
Contributing Source
CSS
Chirp Spread Spectrum
CRC
Cyclic Redundancy Check
DAC
Digital-to-Analog Converter
DAM
Destination Address Mode
DBPSK
Differential BPSK
DC
Direct Current
DCF
Distributed Coordination Function
DCR
Degradation Category Rating
DCT
Discrete Cosine Transform
DES
Data Encryption Standard
DHCP
Dynamic Host Configuration Protocol
DIFS
Distributed Inter Frame Spacing
DMOS
Degradation MOS
DNS
Domain Name System
DoS
Denial of Service
DPCM
Differential PCM
DPSK
Differential PSK
DRS
Dynamic Rate Shifting
DSA
Digital Signature Algorithm
DSCP
Differentiated Services Code Point
DSSS
Direct Sequence SS
DTMF
Dual-Tone Multi-Frequency
DTLS
Datagram Transport Layer Security
DTX
Discontinuous Transmission
EBIT
End Bit Position
ECC
Elliptic Curve Cryptography
ECN
Explicit Congestion Notification
EDCF
Enhanced DCF
EID
Extension Identifier
EIRP
Equivalent Isotropic Radiated Power
eMBB
Enhanced Mobile Broadband
EPC
Evolved Packet Core
ESS
Extended Service Set
ETSI
European Telecommunications Standards Institute
E-UTRAN
Evolved UMTS Terrestrial Radio Access Network
EUI
Extended Unique Identifier
EVRC
Enhanced Variable Rate Codec
EVS
Enhanced Voice Services
FB
Full Band
FCF
Frame Control Field
FCS
Frame Checksum
FDD
Frequency Division Duplex
FDMA
Frequency Division Multiple Access
FEC
Forward Error Correction
FHSS
Frequency Hopping SS
FIPS
Federal Information Processing Standard
FM
Frequency Modulation
fmtp
format specific parameter
FQ
Frame Quality
FSF
Frame Control Field
FSK
Frequency Shift Keying
FT
Field Type
Gbps
Gigabits per second
gNB
Next Generation node B
GOB
Group of Blocks
GOP
Group of Pictures
GSM
Global System for Mobile Communications
HAN
Home Area Network
HBR
High Bit Rate
HC1
Header Compression 1
HC2
Header Compression 2
HD
High Definition
HD TV
HD Television
HEVC
High Efficiency Video Coding
HSS
Home Subscriber Server
HMAC
Hash-based Message Authentication Code
HTML
Hypertext Markup Language
HTTP
HyperText Transfer Protocol
IBSS
Independent BSS
ICMP
Internet Control Message Protocol
IDE
Integrated Development Environment
IETF
Internet Engineering Task Force
IID
Interface Identifier
IMS
IP Multimedia Subsystem
iLBC
Internet Low Bitrate Codec
IoT
Internet of Things
IP
Internet Protocol
IP TTL
IP Time-to-Live
IPHC
IP Header Compression
iSAC
internet Speech Audio Codec
ISM
Industrial, Scientific, and Medical
ITU
International Telecommunications Union
IPv4
Internet Protocol version 4
IPv6
Internet Protocol version 6
JPEG
Joint Photographic Experts Group
Kbps
Kilobits per second
LAN
Local Area Network
LATM
Low-overhead MPEG-4 Audio Transport Multiplex
LBR
Low Bit Rate
LBR
Low-power PAN Border Router
LCEVC
Low Complexity Enhancement Video Coding
LD-CELP
Low Delay CELP
LLC
Logical Link Control
LLN
Low Power and Lossy Network
LoRa
Long Range
LPC
Linear Predictive Coding
LPWAN
Low Power Wide Area Network
LSF
Line Spectral Frequency
LTE
Long Term Evolution
MAC
Media Access Control
MB
Macro Block
Mbps
Megabits per second
MBR
Medium Bit Rate
MCELP
Mixed CELP
MD
Message Digest
MDCT
Modified Discrete Cosine Transform
MF
More Fragments
MIC
Message Integrity Code
MIMO
Multiple Input Multiple Output
MKI
Master Key Identifier
MME
Mobility Management Entity
mMTC
Massive Machine-Type Communication
MOS
Mean Opinion Score
MPEG
Moving Picture Experts Group
MRF
Media Resource Function
MRFC
MRF Controller
MRFP
MRF Processor
MSS
Maximum Segment Size
MTU
Maximum Transmission Unit
NAL
Network Abstraction Layer
NALU
NAL Unit
NAT
Network Address Translation
NB
Narrowband
NB-Fi
Narrowband Fidelity
NB-IoT
Narrowband IoT
ND
Neighbor Discovery
NFC
Near Field Communication
NFV
Network Function Virtualization
NHC
Next Header Compression
NR
New Radio
NS
Neighbor Solicitation
NT
Netualizer Network Interface
OFDM
Orthogonal Frequency Division Multiplexing
OQPSK
Offset QPSK
OSI
Open Systems Interconnection
PAM
Pulse Amplitude Modulation
PAN
Personal Area Network
PAN ID
PAN Identifier
PCF
Point Coordination Function
PCM
Pulse Code Modulation
PCRF
Policy and Charging Rules Function
PDCP
Packet Data Convergence Protocol
PDN-GW
Packet Data Network Gateway
PG
Prediction Gain
PKI
Public Key Infrastructure
PSK
Phase Shift Keying
PSQM
Perceptual Speech Quality Measure
QAM
Quadrature Amplitude Modulation
QCIF
Quarter CIF
QoS
Quality of Service
QPSK
Quadrature PSK
QUIC
Quick UDP Internet Connection
RAN
Radio Access Network
REST
Representational State Transfer
RGB
Red Green Blue
RLC
Radio Link Control
ROHC
Robust Header Compression
RPE
Regular Pulse Excitation
RPL
Routing Protocol for Low-Power and Lossy Networks
RR
Receiver Report
RTC
Real Time Communication
RTCP
Real-Time Control Protocol
RTP
Real-Time Transport Protocol
RTT
Round Trip Time
SAA
Stateless Address Autoconfiguration
SAM
Source Address Mode
SBC
Session Border Controller
SBR
Spectral Band Replication
SBIT
Start Bit Position
SC-FDMA
Single Carrier FDMA
SD-DNS
Service Discovery DNS
SD-WAN
Software Defined WAN
SDAP
Service Data Adaptation Protocol
SDES
SDP Security Description for Media Streams
SDN
Software Defined Network
SDP
Session Description Protocol
SDR
Software Defined Radio
SFD
Start Frame Delimiter
SHA
Secure Hash Algorithm
SID
Silence Insertion Descriptor
SIF
Source Input Format
S-GW
Serving Gateway
SIFS
Short Inter Frame Spacing
SIP
Session Initiation Protocol
SIPS
Secure SIP
SNR
Signal to Noise Ratio
sps
samples per second
SSNR
Segmental SNR
SSRC
Synchronization Source
SR
Sender Report
SRC
Source Format
SRTCP
Secure RTCP
SRTP
Secure RTP
SWB
Super Wideband
TCP
Transmission Control Protocol
TDD
Time Division Duplex
TDMA
Time Division Multiple Access
TLS
Transport Layer Security
TLV
Type-Length-Value
TTL
Time-to-Live
TTS
Text-to-Speech
UAC
User Agent Client
UART
Universal Asynchronous Receiver Transmitter
UAS
User Agent Server
UDP
User Datagram Protocol
UE
User Equipment
UI
User Interface
UMTS
Universal Mobile Telecommunications System
URI
Uniform Resource Identifier
URL
Uniform Resource Locator
URLLC
Ultra-reliable Low-latency Communication
VAD
Voice Activity Detection
VBR
Variable Bit Rate
VCL
Video Coding Layer
VLBR
Very Low Bit Rate
VoIP
Voice over IP
VoLTE
Voice over LTE
VoNR
Voice over NR
VSELP
Vector Sum Excited LP
VVC
Versatile Video Coding
WAN
Wide Area Network
WB
Wideband
WEP
Wired Equivalent Privacy
Wi-Fi
Wireless Fidelity
WLAN
Wireless Local Area Network
WPA
Wi-Fi Protected Access
WPAN
Wireless Personal Area Network
WSN
Wireless Sensor Network
XML
eXtensible Markup Language
This book is accompanied by a companion website:
www.wiley.com/go/herrero/RTCProtocols
This website includes:
Solutions PDF
Scripts and Traces
Slides
Real Time Communication (RTC) bridges the gap between devices and people, enabling instant, interactive exchanges. RTC supports a two-way dialog across User Equipment (UE) including smartphones, tablets, laptops, and even IoT devices (Herrero 2021). Video conferencing, voice calls, and instant messaging are just some examples of RTC. Other examples include IoT devices generating real time data that drive applications like predictive maintenance, remote monitoring, smart home automation as well as industrial IoT. This means that RTC involves a range of technologies with versatile applications that enable legacy and emerging architectures.
In this context, this chapter is about media generation, and it specifically focuses on the generation of signals that facilitate RTC. Given that signals originate in the analog domain, their conversion into digital form becomes necessary at the UEs (Herrero 2023). The process of digitization involves two essential mechanisms referred to as sampling and quantization (Haykin 2009). These processes lead to the conversion of analog signals with infinite precision into digital data of finite size, enabling storage and transmission. To achieve even greater compression, these streams can be processed through media codecs (Chu 2003).
The conversion of a signal from the analog realm to the digital domain introduces distortion. This distortion is further exacerbated during transmission through a communication channel affected by network impairments like loss and latency. Quantifying the degree of this distortion involves the utilization of quality scores and other relevant metrics.
This chapter focuses on RTC signals, which are capable of carrying any digital data, and specifically on how they carry speech, audio, and video. These are the most common types of signals in RTC applications: speech signals are used for voice calling, audio signals for music and other audio content, and video signals for video conferencing and streaming.
Audio and speech signals are sound signals, characterized by fluctuations in air pressure as they travel. These pressure variations are described as waves, commonly known as sound waves. Furthermore, speech signals play a pivotal role in human communication and constitute a subset of these audio signals. A key fact is that their distinctive nature allows them to be processed and encoded using mechanisms distinct from those employed for standard audio signals (Jurafsky and Martin 2000).
The human auditory system perceives sounds ranging from 20 Hz to 20,000 Hz (20 kHz), with heightened sensitivity to events occurring between 250 Hz and 5 kHz. This range is where speech signals predominantly reside. Vowel sounds derive their energy mainly from 250 Hz to 2 kHz, while voiced consonants like b, d, and m are strongest between 250 Hz and 4 kHz. Unvoiced consonants like f, s, and t have varying intensities and occupy the 2 to 8 kHz range.
For good speech comprehension, it is especially important to have good hearing in the range of 125 Hz to 4 kHz, where unvoiced consonants reside. Figure 1.1 shows a normalized speech sequence recorded over 25 milliseconds. Interestingly, it displays a strong correlation within about 8 milliseconds, matching the pitch period of the sequence. More details of the characteristics of speech, including the formal definition of the pitch period, are presented in Section 1.4.1.
Audio signals cover a wider range than speech signals, spanning the full human hearing range and even extending into the ultrasonic range. In practice, audio content is typically transmitted in one direction, while speech flows in both directions as part of a conversation. Unlike general audio signals, speech signals have a clear organization with specific patterns in frequency and amplitude; this structure makes them less complex, carrying less data and allowing for better compression.
Similar to speech signals, audio signals exhibit various attributes beyond their frequency range. The amplitude of a speech signal corresponds to its volume or loudness. This amplitude can vary significantly, influenced by factors like the speaker, surroundings, and the conveyed emotion. Sound pressure amplitude is commonly quantified in units of pascals (Pa). In this context, a micropascal (μPa) is equivalent to one-millionth of a pascal.
Figure 1.1 Speech Signal.
In controlled laboratory settings, the benchmark for the minimum detectable amplitude, known as the threshold of hearing, is approximately 20 μPa for a 1 kHz tone. Some individuals regard 20 Pa as the threshold for experiencing pain, yet this perception is subjective and varies significantly based on individual differences and age. For instance, the sound pressure produced by a jackhammer at a distance of 1 meter reaches a maximum of approximately 2 Pa, while a jet engine at the same distance generates a maximum pressure of 632 Pa. Sound Pressure Level (SPL), another parameter of the sound field, serves as a metric in acoustic analysis and recording. It is commonly employed to gauge the auditory experience at the listener's location or the positioning of a microphone. The relationship between sound pressure, measured in pascals, and dB SPL spans a spectrum from 20 μPa, corresponding to 0 dB SPL, to 200 Pa, equating to 140 dB SPL and even higher (Smith 2010).
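These dB SPL values follow from the standard definition $\mathrm{SPL} = 20\log_{10}(p/p_0)$ with reference pressure $p_0 = 20\ \mu\mathrm{Pa}$; a minimal sketch of that mapping (illustrative only):

```python
import math

P0 = 20e-6  # reference sound pressure in pascals (20 uPa, i.e. 0 dB SPL)

def spl_db(pressure_pa: float) -> float:
    """Convert sound pressure in pascals to dB SPL."""
    return 20 * math.log10(pressure_pa / P0)

print(spl_db(20e-6))  # threshold of hearing -> 0.0 dB SPL
print(spl_db(2.0))    # jackhammer at 1 m    -> 100.0 dB SPL
print(spl_db(200.0))  # upper end of range   -> 140.0 dB SPL
```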
Both audio and speech signals exhibit temporal characteristics, such as the zero crossing rate, which refers to the number of times the signal crosses the zero amplitude line within a second. Additionally, they manifest time-domain attributes like autocorrelation, illustrating how similar the signal is to itself at different time delays. In the frequency domain, these signals possess characteristics including the power spectrum, which shows how much energy the signal has at different frequencies. Furthermore, there are Mel-Frequency Cepstral Coefficients (MFCCs), which provide a compact description of the signal's spectral envelope, and Linear Predictive Coding (LPC) coefficients, which model the signal as a linear combination of past samples. Figure 1.1, originally shown in Section 1.2.1, illustrates a speech sequence. In contrast, Figure 1.2 presents a normalized audio sequence of a 2.5 kHz tone with increasing amplitude, showing the differences in temporal and spectral characteristics compared to speech.
Figure 1.2 Audio Signal.
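As a small illustration of the temporal attributes above, the following sketch (a hedged example assuming NumPy; the 200 Hz tone merely stands in for a voiced speech frame) computes the zero crossing rate and the normalized autocorrelation:

```python
import numpy as np

def zero_crossing_rate(x: np.ndarray, fs: float) -> float:
    """Zero crossings per second of a mono signal sampled at fs sps."""
    signs = np.signbit(x).astype(np.int8)
    return float(np.sum(np.abs(np.diff(signs)))) * fs / len(x)

def autocorrelation(x: np.ndarray) -> np.ndarray:
    """Normalized autocorrelation; a peak at lag k hints at a period of k samples."""
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r / r[0]

fs = 8000                              # narrowband sampling rate
t = np.arange(2000) / fs               # a quarter second of samples
x = np.sin(2 * np.pi * 200 * t)        # 200 Hz tone as a stand-in for voiced speech
print(zero_crossing_rate(x, fs))       # ~400 crossings/s for a 200 Hz tone
print(np.argmax(autocorrelation(x)[1:]) + 1)  # ~40 samples, i.e. a 5 ms period
```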
The foundation of digital video predominantly originates from technology (now obsolete) introduced during the early stages of black and white television in the 1930s. The receiver component of black and white television featured the cathode ray tube (CRT). The CRT utilizes a substantial voltage difference to generate an electric field that propels electrons, giving rise to cathode rays. These rays transport electrons from the cathode to the anode, where, due to their momentum, they collide with a photosensitive screen. The velocity and, therefore, the luminosity on the screen correspond to the voltage applied between the anode and cathode. The CRT also incorporates horizontal and vertical deflection plates, allowing the path of the rays to be altered and ensuring that every point on the photosensitive screen receives illumination.
The image refresh process happens line by line, starting from the upper left corner and progressing to the lower right. Voltage signals applied to each set of deflection plates have a sawtooth waveform. The y-plate signal’s period matches the video frame duration, while the x-plate’s period aligns with the line duration. This scanning method is known as progressive scanning. The frame frequency, calculated as the reciprocal of the frame duration, is derived from the Alternating Current (AC) frequency to minimize interference from transformers and other power fluctuations. In the United States with 60 Hz AC, the standard frame rate has been traditionally 30 frames per second (fps). In regions with 50 Hz AC, like most of the world, the standard has been 25 fps. Note that the first and last lines of each frame, as well as the start and end of each line, are transmitted but not displayed. These hidden regions typically carry synchronization data (Benott 2008).
A different approach from progressive scanning is interlaced scanning. Here, each video frame is divided into two fields, scanned in an alternating manner. The first field scans the odd-numbered lines, while the second scans the even-numbered lines. While progressive scanning delivers one complete frame every 1/30 of a second (assuming a 30 fps frame rate), interlaced scanning transmits half a frame every 1/60 of a second, resulting in the same field rate (60 Hz in this example). Keep in mind that interlaced scanning can provide a perceptual advantage by reducing choppiness in certain situations, like scenes with slow or moderate motion, due to its higher apparent frame rate (twice the field rate). However, it can also introduce artifacts under certain conditions.
Figure 1.3 compares progressive and interlaced scanning. Although both methods process the same number of fields per unit time (operating at field rate), they achieve this in different ways. Progressive scanning displays each complete frame independently, while interlaced scanning displays half frames in an alternating pattern, creating the illusion of a full frame. When viewed in motion under certain conditions, particularly with slow or moderate motion and higher vertical resolutions, interlaced scanning can offer a perception of smoother playback. However, it can also introduce artifacts, especially in fast-moving scenes or at lower resolutions.
Figure 1.3 Progressive vs Interlaced Scanning.
Irrespective of the frame rate, all black and white standards utilize a composite analog signal for transmission, incorporating video, blanking, and synchronization (VBS) information. This transmission method commonly employs interlaced scanning, where each line's initiation is guided by precisely timed horizontal synchronization pulses embedded between the video signals in the imperceptible portion of each line. The term blanking refers to the pulse responsible for guiding the beam's retrace from the lower right to the upper left corner of the screen.
The black and white television standard in the United States uses a field frequency of 60 Hz, divided into two fields, each containing 262 ½ lines, and a line frequency of 15.75 kHz. Video content uses Amplitude Modulation (AM) with a bandwidth of 4.2 MHz, while audio uses Frequency Modulation (FM) with a 4.5 MHz carrier frequency.
Unlike the United States standard, the European (and international) black and white television standard operates at a field frequency of 50 Hz. It contains two fields, each composed of 312 ½ lines, resulting in a line frequency of 15.625 kHz. Video content uses AM modulation with a bandwidth of 5 MHz, while audio utilizes FM modulation with a 5.5 MHz carrier frequency.
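The line frequencies quoted above follow directly from the field rate and the number of lines per field; a quick illustrative check:

```python
# Line frequency = fields per second * lines per field.
us_line_freq = 60 * 262.5   # -> 15750.0 Hz (United States standard)
eu_line_freq = 50 * 312.5   # -> 15625.0 Hz (European standard)
print(us_line_freq, eu_line_freq)
```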
Since any color can be made by mixing the three primary colors: red, green, and blue (RGB), a color TV scheme uses three separate black and white cameras, each with a filter that only lets in one of these primary colors. These cameras capture three separate VBS signals, which can be combined to recreate the original full-color image. However, there is a problem with this approach: old-fashioned black and white TVs cannot display video recorded with this method.
The primary objective is to devise a compatibility scheme that functions bidirectionally: color signals can be displayed on traditional black and white televisions, while monochromatic signals can be exhibited on color televisions.
A linear combination can be applied to the RGB components to transform them into three other components that decompose the information into backward-compatible signals. One such scheme is given by the set of linear equations
$$Y = 0.299R + 0.587G + 0.114B$$
$$C_b = 0.564(B - Y)$$
$$C_r = 0.713(R - Y)$$
where R, G, and B are the intensity values of the red, green, and blue components that serve as input to obtain the luminance Y, blue chrominance $C_b$, and red chrominance $C_r$ components. An alternative set of equations results from
$$U = 0.492(B - Y)$$
$$V = 0.877(R - Y)$$
where the blue and red chrominances are U and V, respectively. Figure 1.4 illustrates how backward compatible signals are generated from the input signals. Each signal involves filtering a specific color of light through optics and then capturing its electrical representation via a regular black and white camera.
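A minimal sketch of the second conversion above (a hedged example assuming RGB intensities normalized to [0, 1]):

```python
def rgb_to_yuv(r: float, g: float, b: float) -> tuple:
    """Convert normalized RGB intensities to luminance Y and chrominances U, V."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # grayscale signal for B&W receivers
    u = 0.492 * (b - y)                    # blue chrominance
    v = 0.877 * (r - y)                    # red chrominance
    return y, u, v

print(rgb_to_yuv(1.0, 1.0, 1.0))  # white: Y = 1, U = V = 0 (no color content)
print(rgb_to_yuv(1.0, 0.0, 0.0))  # pure red: nonzero chrominance
```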
The luminance signal conveys the intensity of the color image, allowing black and white TVs to display a grayscale version. For color TVs, the chrominance signals carry the color information needed to render the complete image. Figure 1.5 depicts the composite signal, which combines these elements. Figures 1.6 to 1.8 further break down the Y, U, and V components, respectively, that make up the composite signal. Interestingly, human perception is less sensitive to color than to luminance. This allows less bandwidth to be allocated for transmitting chrominance details, ensuring compatibility with older standards without impacting the critical black and white components.
The VBS signal containing chrominance information is called Color VBS (CVBS). Different ways of modulating chrominance signals create various color representation schemes, leading to different video transmission standards. For example, the United States relied on the National Television Standard Committee (NTSC) standard, while Europe utilized both Phase Alternating Line (PAL) and Sequential Color with Memory (SECAM), which were once dominant players in analog television broadcasting. Although analog TV is no longer used, these seemingly outdated technologies laid the foundation for many aspects of modern digital video. Details of digital video coding are introduced later in this chapter in Section 1.4.3.
Figure 1.4 RGB to YUV Conversion.
Figure 1.5 Composite Image.
Figure 1.6 Y Component.
Figure 1.7 U Component.
Figure 1.8 V Component.
As outlined in Section 1.1, information produced by natural sources such as speech, audio, and images exists in the analog domain. In this domain, corresponding signals often include a significant amount of redundant data. This redundant information can be eliminated without appreciably affecting human perception, a procedure referred to as data compression. It is worth noting that, in contrast to the analog domain, a digital domain exists where analog signals are translated into sequences derived from a countable set of numbers. These sequences can be stored and processed by signal processors and computers. The process of removing information in this context is termed source encoding.
Figure 1.9 Rate–Distortion Example.
Data compression can be divided into two primary categories. First, there is lossless compression, which involves eliminating redundancy in a reversible manner, ensuring that the reconstructed message remains identical to the original. Second, there is lossy compression, which entails purposefully removing information in a controlled manner. This removal of non-essential data is permanent, rendering the process irreversible. In this scenario, although the reconstructed message differs from the original, the objective is to minimize this divergence to the point of imperceptibility to the recipient.
The compression rate is defined as the ratio of the information present in the compressed message to that in the original message. Generally, lossy compression exhibits a significantly higher compression rate compared to lossless compression.
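As a hedged, illustrative calculation of this ratio: narrowband speech sampled at 8000 sps with 16-bit samples requires 128 kbps uncompressed, so common coded bit rates map to compression rates as follows:

```python
def compression_rate(compressed_bps: float, original_bps: float) -> float:
    """Compression rate: information in the compressed message over the original."""
    return compressed_bps / original_bps

original = 8000 * 16                        # 16-bit linear PCM at 8000 sps -> 128000 bps
print(compression_rate(64000, original))    # ITU-T G.711 at 64 kbps  -> 0.5
print(compression_rate(12200, original))    # AMR 12.2 kbps mode      -> ~0.095
```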
In the realm of lossy compression, a specific compression rate corresponds to a specific degree of distortion, giving rise to a trade-off known as rate–distortion. An illustration of this concept can be observed in Figure 1.9, where a source image undergoes compression at various rates, leading to distinct levels of distortion. Both rate and distortion are quantified on a unitless scale ranging from 0 to 4. Note that as distortion diminishes, transmission rates tend to increase; conversely, higher distortion levels result in lower transmission rates (Proakis and Manolakis 2006; Haykin 2009).
As indicated in Section 1.1, sampling and quantization are two key mechanisms that must be employed to convert signals from the analog to the digital domain. The subsequent subsections focus on these two key processes.
Sampling transforms an analog signal, which is defined continuously across time, into a sequence of discrete samples taken periodically at a fixed interval known as the sampling period, measured in units of time. The reciprocal of the sampling period is the sampling rate, measured in units of samples per second (sps). If the sampling rate is too slow, the resulting set of samples might not accurately represent the original analog signal. Conversely, if the rate is excessively high, the set could become unwieldy for efficient processing and modulation for transmission through a channel. Table 1.1 shows how, in audio and speech sampling scenarios, each sampling rate has a specific type that identifies it.
Generally, when an energy signal $x(t)$ (as shown in Figure 1.10) is defined at every instant of time and sampled at a rate of $f_s = 1/T_s$ (where $T_s$ denotes the sampling period), the resulting sampled signal resembles Figure 1.11. Mathematically, sampling takes $x(t)$ as input and generates an infinite sequence of samples spaced $T_s$ seconds apart, forming the sequence $x(nT_s)$. This process, known as ideal or instantaneous sampling, captures the signal's value at specific, infinitesimal instants of time. Note that under ideal sampling, the effect is akin to multiplying the analog signal by a train of time pulses. Specifically,
$$x_\delta(t) = \sum_{n=-\infty}^{\infty} x(nT_s)\,\delta(t - nT_s) \tag{1.1}$$
Table 1.1 Sampling Rate Types.
Type                   Sampling Rate (sps)
Narrowband (NB)        8000
Wideband (WB)          16000
Super Wideband (SWB)   32000
Fullband (FB)          48000
Figure 1.10 Signal $x(t)$.
Figure 1.11 Sampled $x(t)$.
where $x_\delta(t)$ is the ideal sampled signal and $\delta(t - nT_s)$ is a delta function positioned at time $nT_s$.
In the frequency domain, $x_\delta(t)$ becomes $X_\delta(f)$, which is given by
$$X_\delta(f) = f_s \sum_{k=-\infty}^{\infty} X(f - k f_s) \tag{1.2}$$
where the spectrum is an infinite sequence of shifted versions of the analog signal's frequency representation $X(f)$. Whether overlap occurs among these versions hinges upon the sampling rate $f_s$. To illustrate, consider a scenario where $x(t)$ is band-limited (represented in Figure 1.12), containing no frequency components beyond $W$ Hz. If sampled at a rate of $f_s = 2W$, the resulting spectrum (as shown in Figure 1.13) includes components without any overlap.
Mathematically, from Equation (1.2),
$$X_\delta(f) = 2W\,X(f) + f_s \sum_{k \neq 0} X(f - k f_s)$$
where if (1) $X(f) = 0$ for $|f| \geq W$ and (2) $f_s = 2W$, then
Figure 1.12 Band-limited Signal.
Figure 1.13 Spectrum of Sampled Signal.
$$X(f) = \frac{1}{2W}\,X_\delta(f)$$
if $-W < f < W$. In this scenario, the samples $x(n/2W)$ of the analog signal taken every $1/(2W)$ seconds, with $n$ being an integer, contain all the information in $x(t)$.
The inverse operation, that is, recovering the analog signal out of the samples, is given by
$$x(t) = \sum_{n=-\infty}^{\infty} x\!\left(\frac{n}{2W}\right) \mathrm{sinc}(2Wt - n)$$
where delayed versions of the sinc function are added together to interpolate $x(t)$ for any value of $t$. Note that the sinc function is defined as $\mathrm{sinc}(u) = \sin(\pi u)/(\pi u)$. This convolution in the time domain is analogous to multiplication by a low-pass (LP) filter in the frequency domain, referred to as a reconstruction filter. As a result, to recover the analog signal from its sampled version, $x_\delta(t)$, it is only necessary to process it through this reconstruction filter.
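A hedged sketch of this interpolation formula (assuming NumPy, whose np.sinc implements $\sin(\pi u)/(\pi u)$, matching the definition above; the general form $\mathrm{sinc}((t - nT_s)/T_s)$ reduces to $\mathrm{sinc}(2Wt - n)$ when $f_s = 2W$):

```python
import numpy as np

def reconstruct(samples: np.ndarray, Ts: float, t: np.ndarray) -> np.ndarray:
    """Ideal interpolation: x(t) = sum_n x(n*Ts) * sinc((t - n*Ts)/Ts)."""
    n = np.arange(len(samples))
    # Each row is one shifted sinc weighted by the corresponding sample.
    return np.sum(samples[:, None] * np.sinc((t[None, :] - n[:, None] * Ts) / Ts), axis=0)

fs, W = 8000.0, 1000.0                       # sampling rate well above the Nyquist rate 2W
Ts = 1.0 / fs
ns = np.arange(64)
samples = np.sin(2 * np.pi * W * ns * Ts)    # band-limited input
t = np.linspace(16 * Ts, 48 * Ts, 200)       # interior points, away from edge effects
err = np.max(np.abs(reconstruct(samples, Ts, t) - np.sin(2 * np.pi * W * t)))
print(err)  # small residual error from truncating the infinite sinc sum
```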
Generally, an energy signal lacking frequency components above $W$ Hz can be accurately described using samples taken periodically every $1/(2W)$ seconds. The sampling rate $f_s = 2W$ is known as the Nyquist Rate, while the sampling period $T_s = 1/(2W)$ is referred to as the Nyquist Interval.
Undersampling occurs when a signal is sampled at a rate below the Nyquist Rate, indicated by $f_s < 2W$. This condition leads to aliasing, a phenomenon where successive shifted replicas of the analog signal's frequency representation in Equation (1.2) overlap. To illustrate, consider the frequency representation of the signal shown in Figure 1.12. When undersampled, as shown in Figure 1.14, the signal exhibits the characteristic distortions of aliasing.
Figure 1.14 Aliasing.
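A small numerical check of this folding effect (assuming NumPy): a tone above $f_s/2$ becomes indistinguishable, once sampled, from its lower-frequency alias.

```python
import numpy as np

fs = 8000                       # sampling rate in sps
n = np.arange(32)
f_high = 7000                   # tone above fs/2 = 4000 Hz -> undersampled
f_alias = abs(f_high - fs)      # folds down to 1000 Hz

x_high = np.cos(2 * np.pi * f_high * n / fs)
x_alias = np.cos(2 * np.pi * f_alias * n / fs)
print(np.allclose(x_high, x_alias))  # True: the two sample sequences coincide
```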
To avoid aliasing, an LP filter, known as an anti-aliasing filter, can be applied to the analog signal $x(t)$. This filter suppresses high-frequency components that could otherwise cause aliasing. However, since this filter is not inherently perfect and possesses a transition band, it is important to make slight adjustments to the sampling rate. By ensuring a rate marginally higher than the Nyquist Rate, the absence of spectral overlap can be guaranteed.
When the signal shown in Figure 1.12 is sampled at a rate surpassing the Nyquist Rate, the spectrum of the resulting instantaneous sampled version is as shown in Figure 1.15.
Reconstruction and anti-aliasing filters typically share analogous characteristics, both possessing a transition band that extends from $W$ to $f_s - W$. Moreover, as the sampling frequency increases, the spectral gap between repetitions of the analog signal's spectrum within the sampled signal's spectrum widens. This phenomenon alleviates constraints on the transition band's width in these filters, illustrated in Figure 1.16. The ultimate extent of this band is dictated by $f_s$.
Figure 1.15 Spectrum of Instantaneous Sampled Signal.
Figure 1.16 Reconstruction Filter.
Figure 1.17 Time and Frequency Domain for Sampling.
Figure 1.17 depicts a demonstrative sampling scenario, examined from both time and frequency domain perspectives. Specifically, the signal slated for sampling at the transmitter undergoes multiplication by a series of pulses in the time domain, corresponding to convolution in the frequency realm. The resulting signal traverses the communication channel, and upon reaching the receiver, it is reconstructed via frequency-domain multiplication using an LP filter. In the time domain, this translates to convolution with the time representation of the LP filter, which takes the form of a sinc function (Proakis and Manolakis 2006).
While converting analog signals into digital formats, sampling captures a discrete collection of time-domain samples. However, each sample inherently possesses an infinite range of amplitude levels, making their representation using a limited set of digital values impractical. To address this challenge, quantization is employed alongside sampling. This process converts the potential analog sample amplitudes into a finite set of predetermined values, enabling their representation in the digital realm. As expected, quantization inevitably introduces distortion due to its mapping of an infinite range onto a finite set. If the number of available mapped values is insufficient, this distortion can negatively impact human perception.
Accordingly, when the function $x(t)$ is sampled at a frequency of $f_s$, each sample $x(nT_s)$ undergoes transformation into a distinct amplitude, denoted as $y_k$, selected from a predetermined set of values. Note that quantization typically operates in a memoryless, per-sample manner, guaranteeing that the quantization of one sample remains independent of quantization decisions made for previous samples.
In terms of amplitudes, if an analog sample has an amplitude $x$, then through quantization this amplitude is mapped into a number $k$ if $x$ is within a specific range or partition cell $\mathcal{P}_k$, shown in Figure 1.18 and given mathematically by
$$\mathcal{P}_k = \{x : x_{k-1} < x \leq x_k\}, \qquad k = 1, 2, \ldots, L$$
where $L$ is the total number of levels or possible discrete amplitudes in the quantization set, and the discrete amplitudes $x_k$ with $k = 0, 1, \ldots, L$ are called decision thresholds. Upon the need to reconstruct the quantized sample as a specific value, the numerical index $k$ undergoes a conversion process to yield a reconstruction level denoted as $y_k$. This reconstruction level represents the entire spectrum of conceivable analog amplitudes present within the designated partition $\mathcal{P}_k$. The interval between two successive decision thresholds, represented as $\Delta_k = x_k - x_{k-1}$, is known as the step size.
Figure 1.18 Partition Cell.
In a broad sense, the conversion process from an analog sample to the reconstructed value is expressed as $y = Q(x)$, where the function $Q(\cdot)$ is referred to as the quantizer characteristic. Depending on whether it possesses an even or an odd count of levels, the quantizer can take the form of either a midtread or a midrise configuration.
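As a hedged illustration of these definitions, the sketch below implements a uniform midrise quantizer characteristic $Q(\cdot)$ (assumptions: NumPy, inputs confined to $[-x_{\max}, x_{\max}]$, and $L = 2^{n}$ levels with a constant step size $\Delta$):

```python
import numpy as np

def midrise_quantizer(x: np.ndarray, n_bits: int, x_max: float = 1.0) -> np.ndarray:
    """Uniform midrise quantizer with L = 2**n_bits levels over [-x_max, x_max]."""
    L = 2 ** n_bits
    delta = 2 * x_max / L                   # step size between decision thresholds
    k = np.floor(x / delta)                 # index of the partition cell
    k = np.clip(k, -L // 2, L // 2 - 1)     # clip samples that overload the quantizer
    return (k + 0.5) * delta                # reconstruction level: cell midpoint

x = np.linspace(-1, 1, 9)
y = midrise_quantizer(x, n_bits=3)              # 8 levels, step size 0.25
print(np.max(np.abs(x - y)) <= 0.125 + 1e-12)   # True: error bounded by delta/2
```

Because the midpoint rule keeps every reconstruction level at the center of its partition cell, the quantization error of any in-range sample is bounded by half the step size, which is the distortion referred to in the rate-distortion discussion of Section 1.2.4.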