Translates technical jargon into practical business communications solutions
This book takes readers from traditional voice, fax, video, and data services delivered via separate platforms to a single, unified platform delivering all of these services seamlessly via the Internet. With its clear, jargon-free explanations, the author enables all readers to better understand and assess the growing number of voice over Internet protocol (VoIP) and unified communications (UC) products and services that are available for businesses.
VoIP and Unified Communications is based on the author's careful review and synthesis of more than 7,000 pages of published standards as well as a broad range of datasheets, websites, white papers, and webinars. It begins with an introduction to IP technology and then covers such topics as:
Packet transmission and switching
VoIP signaling and call processing
How VoIP and UC are defining the future
Interconnections with global services
Network management for VoIP and UC
This book features a complete chapter dedicated to cost analyses and payback calculations, enabling readers to accurately determine the short- and long-term financial impact of migrating to various VoIP and UC products and services. There's also a chapter detailing major IP systems hardware and software. Throughout the book, diagrams illustrate how various VoIP and UC components and systems work. In addition, the author highlights potential problems and threats to UC services, steering readers away from common pitfalls.
Concise and to the point, this text enables readers—from novices to experienced engineers and technical managers—to understand how VoIP and UC really work so that everyone can confidently deal with network engineers, data center gurus, and top management.
Page count: 502
Year of publication: 2012
Contents
Preface
Acknowledgments
Chapter 1: IP Technology Disrupts Voice Telephony
1.1 Introduction to the Public Switched Telephone Network
1.2 The Digital PSTN
1.3 The Packet Revolution in Telephony
Chapter 2: Traditional Telephones Still Set Expectations
2.1 Availability: How the Bell System Ensured Service
2.2 Call Completion
2.3 Sound Quality: Encoding for Recognizable Voices
2.4 Low Latency
2.5 Call Setup Delays
2.6 Impairments Controlled: Echo, Singing, Distortion, Noise
Chapter 3: From Circuits to Packets
3.1 Data and Signaling Preceded Voice
3.2 Putting Voice into Packets
Chapter 4: Packet Transmission and Switching
4.1 The Physical Layer: Transmission
4.2 Data Link Protocols
4.3 IP, the Network Protocol
4.4 Layer 4 Transport Protocols
4.5 Higher Layer Processes
4.6 Saving Bandwidth
4.7 Differences: Circuit Versus Packet Switched
Chapter 5: VoIP Signaling and Call Processing
5.1 What Packet Voice and UC Systems Share
5.2 Session Initiation Protocol (SIP)
5.3 Session Description Protocol
5.4 Media Gateway Control Protocol
5.5 H.323
5.6 Directory Services
Chapter 6: VoIP and Unified Communications Define the Future
6.1 Voice as Before, with Additions
6.2 Legacy Services to Keep and Improve with VoIP
6.3 Facsimile Transmission
6.4 Phone Features Added with VoIP/UC
Chapter 7: How VoIP and UC Impact the Network
7.1 Space, Power, and Cooling
7.2 Priority for Voice, Video, Fax Packets
7.3 Packets per Second
7.4 Bandwidth
7.5 Security Issues
7.6 First Migration Steps While Keeping Legacy Equipment
Chapter 8: Interconnections to Global Services
8.1 Media Gateways
8.2 SIP Trunking
8.3 Operating VoIP Across Network Address Translation
8.4 Session Border Controller
8.5 Supporting Multiple-Carrier Connections
8.6 Mobility and Wireless Access
Chapter 9: Network Management for VoIP and UC
9.1 Starting Right
9.2 Continuous Monitoring and Management
9.3 Troubleshooting and Repair
Chapter 10: Cost Analysis and Payback Calculation
Chapter 11: Examples of Hardware and Software
11.1 IP Phones
11.2 Gateways
11.3 Session Border Controllers
11.4 Call-Switching Servers
11.5 Hosted VoIP/UC Service
11.6 Management Systems/Workstations
Chapter 12: Appendixes
12.1 Acronyms and Definitions
12.2 Reference Documents
12.3 Message and Error Codes
Index
Copyright © 2012 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Flanagan, William A.
VoIP and unified communications: Internet telephony and the future voice network / William A. Flanagan
p.; cm.
Includes bibliographical references and index.
ISBN 978-1-118-01921-4 (cloth)
Dedicated to My Wife and Children
PREFACE
This book intends to prepare you to define Unified Communications (UC) for yourself and then get it to work for you.
Each vendor pulls together from its available products a package of features related to voice, data, messaging, and image communications. That’s UC for one vendor, but it’s unlikely to match exactly the UC from another vendor. You need a detailed specification to know what you’ll see installed.
Second, UC isn’t a magic button that solves every problem. On the contrary, careless attempts at UC can create expensive disruptions to your business. Be certain when you deploy UC that it actually enhances your business model or improves processes. Don’t do it just because everybody else is doing it. Planning for UC is an ideal opportunity to examine how you work, with a goal of reducing complexity.
Third, VoIP and UC may reduce overall costs in the long term, but they are not free. Nemertes Research interviewed hundreds of companies that deployed VoIP to find the average first-year expense; it was over $1400 per employee. UC features would be additional.
So here’s the catch: you can’t plan UC very well unless you know what components and functions are available, how they work, how they work together, and how you can use them profitably in your own situation. In addition, some features, for example voice telephony and high-definition video conferencing, will impact your IP network in ways you might not expect. Other features, like Presence, may not operate well across services and vendors. Any number of new features could offer ways to change your procedures that will require retraining of staff.
Defining what you want involves some preparation on your part to learn the basics of the technology, including the vocabulary, so you can speak with some authority. Hence the need for some explanation of what is available, some background for context, and how to use it—my purpose here. To that end, I reviewed more than 7000 pages of published standards, plus data sheets, websites, white papers, and webinars—to save you much of that effort.
My hope is that the practice of VoIP and UC will avoid the complexity that ISDN had to deal with in the United States. In Europe the carriers almost eliminated customer options for the basic rate ISDN service (BRI) used in homes and small offices (two voice channels on a copper pair). An ISDN phone was plugged in and worked. In the United States, the same BRI had (has?) about 50 configurable parameters, almost all of which are incomprehensible to consumers (and most telco employees).
With some planning and a good deal of luck, a few standard UC profiles will emerge. In place of a long list of parameters to set up a session, the invitation message will carry one description header, “PROFILE=” with a choice of very few values, say something like “StdPhone,” “G3fax,” or “VideoConf.” The savings in bandwidth, processing latency, and equipment design could be huge. The SIP Connect agreement (version 1.1 issued in 2011 by the SIP Forum) is a good start at one area, SIP trunking. Ask your prospective vendors what they are doing to establish profiles and simplify configuration.
As markets mature and users grow more familiar with what’s available and what they really want for daily use, the package known as UC will become better defined. Until then, you must specify what you need and want, then ask vendors to bid on your specific UC.
Unfortunately for planning purposes, the market for UC products and services changes daily. You’ll have to pick what’s best for you from what’s available at the time. A book can’t offer the very latest product information—that’s what the web does. What this book intends to do is give you an overview of typical products and services, with the basis for judging what you find on the web. From that you can hold up your side of the conversation when speaking with sales and technical people. With a clear understanding, you also should be able to respond effectively to the questions and concerns of top management.
Nevertheless, there is hope that the information here will be of great value to those sales people, support engineers, and even newcomers to the industry who want to learn about or clarify their understanding of UC and VoIP. The technical level of the text is designed to include all readers. For those who have been in telephony, there are many references and comparisons to legacy phone services and how UC functions replace them.
To avoid jumping around in the book to understand one concept, some context information is repeated where necessary. Some repetition is not a bug, it’s a feature.
At several points in this book you will see warnings and cautions about potential problems and threats to UC services. These statements shouldn’t raise undue alarm or create doubt about the migration to UC, but make you aware of issues that telephony managers haven’t faced before. For example, there are:
New and changing legal requirements related to E911 location reporting.
Hacking threats from Internet connectivity.
Increased demands on IP networks for high availability and “no-downtime” servers.
Taxes on Internet services that used to be exempt.
Large demands for bandwidth from video- and file-sharing applications.
We live in interesting times. I hope this book prevents at least some of your headaches.
—William A. Flanagan
Acknowledgments
The work of the Internet Engineering Task Force in publishing the Requests for Comment (RFCs) and Internet Standards helped make this book possible. Only the RFCs, IEEE and EIA standards, ITU Recommendations, and various implementation agreements fully describe the procedures, conditions, exceptions, and options for VoIP and UC protocols. After 10 years as a member of a Technical Committee for the Frame Relay Forum, I have a deep appreciation for the work involved.
Portions of standards appear here where their examples or statements say it best.
The archive of messages from the SIP Forum discussion list provided valuable insights into the practical matters faced by implementers. Thanks to all who shared their knowledge there.
Special thanks and appreciation to the many firms which provided briefings and answered my often detailed questions about VoIP and UC. In random order:
Smoothstone
Mitel Networks
Encore Networks
Avaya
Siemens Enterprise Networks
Cisco Systems
Juniper Networks
Broadvox
OpenText
Sprint
Alcatel-Lucent
Dialogic
NEC
Tone Software
Secure Logix
Ingate Systems
Apparent Networks
ERF Wireless
AudioCodes
and all the companies mentioned in the chapter examples.
W. A. F.
Chapter 1
IP TECHNOLOGY DISRUPTS VOICE TELEPHONY
Packet voice, Voice over IP, and Unified Communications (UC) technologies are remaking telephony in a fundamental way that hasn’t been seen since the 1960s. Then the Bell System introduced digital transmission and switching inside the carrier infrastructure to replace analog methods. As digital technology spilled over to businesses through the 1980s, a wave of digital PBXs replaced older analog PBXs, key systems, and other forms of analog technology. Today the only remnant of analog in the public switched telephone network (PSTN) is the plain old telephone service (POTS) line, the once-universal service. POTS is being discontinued only gradually, but will probably disappear some day as cell phones, fiber to the home, and voice over cable TV networks continue to replace POTS with Voice over IP (VoIP).
1.1 INTRODUCTION TO THE PUBLIC SWITCHED TELEPHONE NETWORK
Telephones are so simple to use that they hide the complexity inside the network that provides the many features we enjoy. In designing a UC deployment, it’s good to understand what UC will replace and extend; that is, what we have used to date.
Figure 1.1 describes the original telephone technology, the analog phone or POTS line—Bell’s great invention. The phone at the house or office connects to the telco’s central office over a 2-wire copper line. The copper wires are twisted to reduce interference from external sources, such as AM radio stations and large electrical motors, but are not shielded by an external metal wrap—hence the term unshielded twisted pair (UTP). Electrical current to operate the phone comes from the battery in the central office; the phone needs no other power supply. Power from the CO was necessary when the first phones were installed because at that time lighting was by gas. Not many homes (and not all offices) had electricity.
FIGURE 1.1 Current loop from CO battery to phone.
Electrical current flows in a loop from end to end, through both phones. The portion of the connection between the customer and the CO came to be called the “local loop.” The transmitter in the mouthpiece varies the rate of current flow in response to the sound waves from a talker’s mouth. Since the current flows in a loop, the same changes occur at the receiver, where the miniature audio speaker in the earpiece reproduces the talker’s voice.
The system grew more complex as automatic switches took over from live operators, but the legacy signaling system is outside the scope of this work. For more information, see The Guide to T-1 Networking.
1.2 THE DIGITAL PSTN
The digital revolution hit the network in the 1960s with the deployment of channel banks. These multiplexers combine 24 analog circuits (2-wire POTS, 4-wire E&M, and other types) onto two twisted pairs, one for each direction, in the digital format that came to be known as T-1.
The reduction in wire count applied first on the trunk lines between central offices. The COs had room to house the new equipment, but more important, the cable ducts buried in the streets of major cities were filling up. The phone company couldn’t easily add more copper cables to fill the need for additional trunks between switches.
There was an added benefit to digital transmission: better sound quality. In most situations the 1’s and 0’s on the T-1 line survived intact, even if some analog noise were added by arcing motors, radio stations, or other sources. The receivers in the channel banks correctly recognized even a slightly distorted “1” as different from a “0,” so the output sound wasn’t impaired.
Digital transmission between analog switches looks like Figure 1.2. The transition from analog to digital for inter-office trunks was relatively easy and left other network devices in place. In this early form of digital telephony, the capacity of the T-1 line divides into 24 fixed channels based on time division multiplexing (TDM). That is, the 24 analog inputs take turns in strict rotation to send one byte of digitally encoded voice that represents a sample of the analog input loudness (the instantaneous volume level). The receiver converts that byte into a matching output level.
FIGURE 1.2 Channel banks between analog switches.
The Nyquist theorem regarding information transmission proved that if the samples were sent at a rate that was at least twice the highest audio frequency of the analog input, then the reproduction in the output at the receiver would be consistent with the input (reproducible results). Design compromises and precedents from analog telephones settled on a voice frequency range of 300 to 3300 Hz. Cutting off everything under 300 Hz eliminated AC hum and matched the limited capability of handset hardware to reproduce low frequencies. The top of 3300 Hz fit within what was then the standard for analog multiplexing: 4000 Hz for each analog channel.
To ensure that the sampling rate exceeded twice the highest voice frequency, the chosen sampling rate was 8000 per second. Each channel, then, generates 8 bits × 8000 samples per second = 64 kbit/s. This rate, the lowest in the digital multiplexing hierarchy, is numbered the way engineers start to count, with zero. Digital signal 0 (DS-0) is the fundamental building block of the TDM hierarchy in circuit-switched voice networks.
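As a quick check of the numbers in the last two paragraphs, the short Python sketch below compares the chosen sampling rate with the Nyquist minimum and derives the DS-0 rate. The constant and function names are illustrative only; the figures (3300 Hz, 8000 samples per second, one octet per sample) are the ones quoted in the text.

```python
# Figures quoted in the text; the names are illustrative only.
HIGHEST_VOICE_HZ = 3300      # top of the telephone voice band
SAMPLE_RATE_HZ = 8000        # samples per second chosen for the channel bank
BITS_PER_SAMPLE = 8          # one octet per sample


def nyquist_minimum_rate(highest_hz: int) -> int:
    """Lowest sampling rate that satisfies the Nyquist criterion."""
    return 2 * highest_hz


def channel_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bits per second produced by one digitized voice channel."""
    return sample_rate_hz * bits_per_sample


if __name__ == "__main__":
    minimum = nyquist_minimum_rate(HIGHEST_VOICE_HZ)
    print("Nyquist minimum:", minimum, "samples/s")
    print("Chosen rate exceeds minimum:", SAMPLE_RATE_HZ > minimum)
    print("DS-0 rate:", channel_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE), "bit/s")
```

The margin between 6600 and 8000 samples per second reflects the 4000 Hz channel spacing inherited from analog multiplexing, as described above.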
The T-1 bit rate is the sum of 24 channels plus an extra framing bit per cycle of 24 channels, a T-1 frame (Figure 1.3). This format continues in use as the way bits are organized on a primary rate interface (PRI) ISDN line. One of the DS-0s on a PRI, the D channel, carries only signaling messages, or what was called data because it wasn’t voice.
FIGURE 1.3 TDM frames showing the basic concept, a T-1 frame, and a superframe.
In any time division multiplexer, the basic frame consists of a string of bits marked in some way by a unique signature element which defines the frame (A). Some link protocols reserve a “start of frame character” that has no other use and never appears inside a frame.
In T-1 and PRI, the marker is a single F bit (B). One bit alone doesn’t allow a receiver to identify the start of the frame. The structure of a superframe (C) built up from 12 frames makes room for a fixed pattern across the superframe: 100011011100. The framing bit pattern allows the receiver to identify the locations of the F bits and from them the groups of bits associated with each channel. An extended superframe (ESF) of 24 frames uses a more complex pattern of F bits that includes a data channel.
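The way a receiver can locate the F bits is easier to see in a small simulation. The Python sketch below is a simplified illustration, not the algorithm of any real framer: it places the F bit at the start of each 193-bit frame, fills the payload with a constant bit, and searches for the bit position whose every-193rd-bit column reproduces the superframe pattern. The function names and the two-superframe observation window are assumptions made for the example.

```python
from typing import Optional

FRAMING_PATTERN = "100011011100"   # superframe F-bit pattern quoted in the text
CHANNELS = 24
PAYLOAD_BITS = CHANNELS * 8        # 192 bits of channel data per frame
BITS_PER_FRAME = PAYLOAD_BITS + 1  # plus the single F bit


def build_superframe(payload_bit: str = "0") -> str:
    """Return 12 frames as a '0'/'1' string; each frame begins with its F bit."""
    return "".join(f_bit + payload_bit * PAYLOAD_BITS for f_bit in FRAMING_PATTERN)


def find_frame_offset(bits: str, superframes: int = 2) -> Optional[int]:
    """Return the position (0..192) where F bits occur, or None if not found.

    For each candidate position, collect every 193rd bit and test whether
    the collected column is a run of the repeating framing pattern. The
    search may start mid-superframe, so any rotation of the pattern counts.
    """
    needed = len(FRAMING_PATTERN) * superframes
    reference = FRAMING_PATTERN * (superframes + 1)
    for offset in range(BITS_PER_FRAME):
        column = bits[offset::BITS_PER_FRAME][:needed]
        if len(column) == needed and column in reference:
            return offset
    return None


if __name__ == "__main__":
    # 50 stray bits ahead of three superframes: the receiver must find bit 50.
    stream = "0" * 50 + build_superframe() * 3
    print(find_frame_offset(stream))
```

Real equipment also has to cope with payload data that happens to imitate the pattern and with bit errors on the line, which this sketch ignores.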
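The result is the now familiar T-1 bit rate:
(24 channels × 8 bits + 1 framing bit) × 8000 frames per second = 193 bits × 8000 frames per second = 1.544 Mbit/s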
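For readers who like to verify arithmetic in code, the following Python lines work through the T-1 rate from the figures given above (24 channels, one octet per channel, one framing bit, 8000 frames per second). The variable names are illustrative only.

```python
# Figures from the text; variable names are illustrative only.
CHANNELS = 24
BITS_PER_SAMPLE = 8
FRAMING_BITS = 1
FRAMES_PER_SECOND = 8000

bits_per_frame = CHANNELS * BITS_PER_SAMPLE + FRAMING_BITS            # 193
t1_bit_rate = bits_per_frame * FRAMES_PER_SECOND                      # whole line
payload_bit_rate = CHANNELS * BITS_PER_SAMPLE * FRAMES_PER_SECOND     # 24 DS-0s
framing_bit_rate = FRAMING_BITS * FRAMES_PER_SECOND                   # F bits only
frame_time_us = 1_000_000 / FRAMES_PER_SECOND                         # one frame

if __name__ == "__main__":
    print("Bits per frame:", bits_per_frame)
    print("T-1 line rate:", t1_bit_rate, "bit/s")
    print("Voice payload:", payload_bit_rate, "bit/s")
    print("Framing overhead:", framing_bit_rate, "bit/s")
    print("Frame duration:", frame_time_us, "microseconds")
```

The framing overhead is 8000 bit/s out of 1,544,000 bit/s, roughly half of one percent of the line.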
Keep in mind that channel banks operate continuously. For each analog input (even if it is silent) the time slot on the DS-1 formatted line carries a byte of “sound” in every one of the 8000 frames per second. The capacity of the line is dedicated to the port on the channel bank, whether or not it is in use. In effect the digital transmission system of channel banks and T-1 lines (the original digital transmission technology) emulates the current flow in the analog local loop. T-1 transmission could also be compared to a moving sidewalk seen at most major airports. It runs at a constant rate whether or not there are passengers on it.
More precisely, the multiplexing format is DS-1; T-1 is a transmission technology on two twisted pairs that requires a repeater every mile but can be extended up to 50 miles. Digital subscriber line (DSL) equipment has largely displaced T-1 in the local loop, with a longer reach at 1.5 Mbit/s without a repeater, but is more difficult to extend. Optical fiber now dominates between COs.
Some references to TDM-defined voice channels call it wasteful of bandwidth, but such a judgment should also take into account two other factors:
1. Low overhead: only 1/48 of a bit per octet sample is enough to identify the channels. Only half the F bits are used for ESF framing; the other 12 F bits are a data channel.
2. Low latency: each channel has a reserved spot in every frame. The latest byte from the speaker’s voice digitizer need wait no more than 1/8000 second (125 microseconds, μs) for the next frame to carry it away on the T-1 line.
Dedicated capacity per call prevents interference between users. One caller shouting can’t affect another who is whispering. With digital transmission, quality is consistently high. All callers who get connections receive the same high quality of service. Hold these thoughts for comparison to VoIP later.
Years after the first T-1 lines were installed between central offices, subscriber lines remained individual copper pairs from the switch in the CO to the telephone. Huge cables with thousands of pairs, laid from the CO to a large office building or to a residential neighborhood, had to be spliced by hand each time another reel of cable was added to the run. The biggest reel could hold as little as 1000 ft of a 4000-pair cable. Pieces of cable rarely exceed 1 mile, and the largest cables were installed mostly within large buildings.
The standard service area for a CO is measured by the length of its local loops: 12,000 feet is a common goal for the longest loops out of an office, which typically required splicing those cables once or twice.
When CO switches became digital, the channel bank was adapted to become an extension of the CO switch, with digital T-1 connections for most of the distance to the building or neighborhood. Splicing in the distribution network was reduced by a factor of 12 (or as much as 48, as described below).
In a sense, the original POTS is almost gone because the copper pair from the analog phone no longer reaches to the central office battery that powers the switch. In many areas, particularly those built up in the 1980s or later, the analog line ends within the neighborhood at a remote terminal (or channel bank). You can see the pedestal cabinets that hold them by the side of the road (Figure 1.4). From there the connection to the central office is a digital transmission line on copper or an optical fiber.
FIGURE 1.4 Pedestal cabinet that holds a remote terminal (SLC-96) for POTS service to a neighborhood.
The channel bank grew into the subscriber loop carrier (SLC) with up to 96 analog ports. It could be placed in a closet of a building, or into a free-standing cabinet near a cluster of homes. As Figure 1.4 shows, the analog ports on the SLC still power the phones over separate UTP lines. The payoff for the telephone company was a huge reduction in the distribution cabling where T-1 links (and, later, optical fibers) replaced the individual copper pairs. One negative was the need to power the SLC. Often an electrical utility meter is visible on the cabinet.
Recognizing that not every phone wants to call at the same time, the SLC “oversubscribed” its lines to the CO. In residential areas the 96 analog ports on the SLC often share a single T-1 from the SLC to the CO. The SLC, integrated into the switch’s logic, assigns a channel on the T-1 only during an active call. In business environments where more simultaneous calling is common, the phone company will install up to four T-1s if necessary, which allows all phones to call at once. Today a pair of optical fibers can carry all the calls from any number of SLCs at a site. Later sections will compare this circuit-based local loop technology with packet-based links such as SIP trunks.
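One common way to reason about oversubscription is the Erlang B formula from classical traffic engineering. The text does not name it, so treat the sketch below as an outside illustration rather than the method the phone companies necessarily used. The traffic figure of 0.1 erlang per line is a hypothetical assumption chosen only to make the example concrete.

```python
def erlang_b(offered_load_erlangs: float, channels: int) -> float:
    """Blocking probability for a trunk group, by the standard recursion.

    B(0) = 1, and B(n) = A * B(n-1) / (n + A * B(n-1)) for offered load A.
    """
    blocking = 1.0  # with zero channels every call attempt is blocked
    for n in range(1, channels + 1):
        blocking = (offered_load_erlangs * blocking) / (
            n + offered_load_erlangs * blocking
        )
    return blocking


PORTS = 96                       # analog ports on the SLC (from the text)
CHANNELS_PER_T1 = 24             # voice channels on one T-1 (from the text)
ASSUMED_ERLANGS_PER_LINE = 0.1   # hypothetical busy-hour load per line

if __name__ == "__main__":
    offered = PORTS * ASSUMED_ERLANGS_PER_LINE
    for t1_count in (1, 2, 4):
        channels = t1_count * CHANNELS_PER_T1
        probability = erlang_b(offered, channels)
        print(f"{t1_count} T-1(s), {channels} channels: blocking {probability:.2e}")
```

Erlang B assumes a very large population of callers, so it reports a tiny nonzero blocking probability even with four T-1s, where 96 channels serve 96 ports and no call can actually be blocked.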
Oversubscribing at the SLC didn’t change much for subscribers. Customers wouldn’t notice unless some event triggered mass calling. However, CO switches are also limited in the number of calls that they can set up per minute because the number of modules that receive dialed digits from a phone is much smaller than the number of phones served by the switch. A caller needs one of these modules to place a call, then the module is freed to handle another request while the first call remains active. In the unlikely event you have ever had to wait for dial tone after picking up the handset on a POTS line, you have waited for one of these modules to become free. Call setups per hour is a valid metric for VoIP servers as well.
To summarize the result of the digital revolution, Table 1.1 lists the attributes of phone calls made on circuit-based analog and digital systems. Digital PBXs preserved the ability to power phones over the drop cable. Depending on the vendor, the power may have been on a phantom pair (Figure 1.5) or a separate copper pair in the same cable. A phantom pair derives from transformers at each end that couple the audio but keep the dc power on the drop wire.
TABLE 1.1 Characteristics of phone calls on analog and digital networks
Phone Call Property | Analog Network | Digital Network
Sound quality | Often quite good for local calls; weaker and noisier for long distance calls | Almost always uniformly high
Susceptibility to noise | High originally when transmission and switching were all analog; limited lately as T&S are now digital | Very limited
Distribution cable | Copper unshielded twisted pair | Optical fiber
Drop cable | UTP | UTP
Phone power source | Battery in central office (or SLC) | Local power, fed either from PBX or from wall transformer
Echo | Always a concern; requires fine tuning amplifier gains and line losses (to minimize amplitude) plus echo cancelers | Digital echo cancelers (in media gateways and phones) make echo undetectable on most calls
FIGURE 1.5 Phantom power over two twisted pairs.
This phantom pair for power distribution is seen again in IP phones with Power over Ethernet (PoE) as defined in the IEEE standard 802.3af. The digital revolution fifty years ago retained some concepts and features from the analog technology. In particular, digital switches reserved capacity in defined circuits or channels for each call across the switch and over connected transmission lines (Figure 1.6).
FIGURE 1.6 A circuit-switched connection occupies dedicated capacity in switches and transmission lines for the duration of the call.
To set up a connection between digital trunks, a circuit switch starts a repetitive process that accepts the octet in a time slot on the inbound port, buffers it for a very short interval, and places it in the appropriate time slot in the next frame leaving the outbound port. The process works symmetrically, 8000 times per second, transferring octets in both directions between the connected time slots. Such a switch is also known as a time slot interchanger (TSI). The transfer delay averages about two frame times or 250 μs. SLCs behave similarly, dedicating a TDM channel from the SLC to the CO for each call on an analog port.
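The time slot interchange described above can be modeled in a few lines of Python. This is a minimal sketch of one direction only, with a one-frame buffer standing in for the short buffering interval. The class name, method names, and the idle code 0xFF are assumptions made for the illustration, not details from any particular switch.

```python
IDLE = 0xFF  # assumed filler octet for time slots that carry no call


class TimeSlotInterchanger:
    """One direction of a circuit switch between two 24-slot trunks."""

    def __init__(self, slots: int = 24) -> None:
        self.slots = slots
        self.mapping = {}              # inbound slot -> outbound slot
        self.buffer = [IDLE] * slots   # octets held for the next outbound frame

    def connect(self, inbound_slot: int, outbound_slot: int) -> None:
        """Set up a call: reserve an outbound slot for an inbound slot."""
        if outbound_slot in self.mapping.values():
            raise ValueError("outbound slot already in use")
        self.mapping[inbound_slot] = outbound_slot

    def clear(self, inbound_slot: int) -> None:
        """Clear a call: the mapping disappears and the slot becomes free."""
        self.mapping.pop(inbound_slot, None)

    def switch_frame(self, inbound_frame: list) -> list:
        """Accept one inbound frame; return the frame leaving the outbound port.

        Octets that arrived in the previous frame leave now, and octets
        arriving now are buffered for the next outbound frame.
        """
        outbound = self.buffer
        self.buffer = [IDLE] * self.slots
        for in_slot, out_slot in self.mapping.items():
            self.buffer[out_slot] = inbound_frame[in_slot]
        return outbound


if __name__ == "__main__":
    tsi = TimeSlotInterchanger()
    tsi.connect(3, 17)                       # call enters on slot 3, leaves on 17
    tsi.switch_frame(list(range(24)))        # first frame is buffered
    print(tsi.switch_frame([0] * 24)[17])    # the octet from slot 3 leaves on 17
```

A real switch runs this cycle 8000 times per second in both directions; calling `clear` corresponds to the moment the mapping disappears and the time slots become available for new calls.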
The channel exists end to end only for the duration of the call. A call clears when the TSI mapping from input to output disappears and the trunk time slots become available for new assignments.
1.3 THE PACKET REVOLUTION IN TELEPHONY
The packet revolution changes the network fundamentally, yet some elements are very similar.
Since human speech is analog, voice on a digital IP or packet network must be converted to a digital format, an encoding process that may be identical to that in a channel bank or a digital circuit-switched network. That is, packet voice often is encoded as pulse code modulation (PCM) as defined in G.711 for the original channel bank. But the bytes of data no longer stream immediately and at a constant rate over a dedicated 64 kbit/s channel.
In voice over IP (VoIP), the digital information is saved up for a short interval (typically 10 or 20 ms), then put into a packet and sent in a burst over the digital line at the line’s bit rate, which is usually much higher than 64 kbit/s (Ethernet at 100 Mbit/s, for example).
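To put rough numbers on packetization, the Python sketch below computes the payload size, packet rate, IP-level bit rate, and burst time for 64 kbit/s PCM at the two intervals mentioned. The header sizes are the minimum IPv4, UDP, and RTP headers (20, 8, and 12 bytes); these protocols are described later in the chapter, and counting them here is an assumption made for the example. Link-layer overhead such as Ethernet framing is deliberately left out.

```python
CODEC_BIT_RATE = 64_000       # G.711 PCM, bits per second
IPV4_HEADER_BYTES = 20        # minimum IPv4 header, no options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12         # fixed RTP header, no CSRC list or extension
LINE_RATE = 100_000_000       # Ethernet at 100 Mbit/s, as in the text


def packet_profile(interval_ms: int) -> dict:
    """Return size and rate figures for one voice stream (link layer excluded)."""
    payload_bytes = CODEC_BIT_RATE // 8 * interval_ms // 1000
    packets_per_second = 1000 // interval_ms
    packet_bytes = (
        payload_bytes + IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES
    )
    return {
        "payload_bytes": payload_bytes,
        "packets_per_second": packets_per_second,
        "packet_bytes": packet_bytes,
        "ip_bit_rate": packet_bytes * 8 * packets_per_second,
        # time the packet occupies the line, in microseconds
        "burst_time_us": packet_bytes * 8 * 1_000_000 / LINE_RATE,
    }


if __name__ == "__main__":
    for interval in (10, 20):
        print(interval, "ms:", packet_profile(interval))
```

Shorter intervals reduce the delay added by saving up samples but spend a larger share of the bandwidth on headers, which is why the packet interval is a design trade-off.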
Where a T-1 transmission is a moving sidewalk, packet transmission is more like a high-speed shuttle train between terminals. Each car takes on a number of pedestrians (digital bytes) over the time in a station and moves them together and at higher speed. Both the trains and moving sidewalks could have the same capacity, able to carry the same number of passengers per hour (octets per second). For either transport method, the operations at the ends (buying tickets and going through security, or encoding and playback) deal with one individual/byte at a time.
Don’t rely too much on the metaphor. Keep in mind that voice channels contain flows of information bytes, not individuals. A moving sidewalk accepts any mix of people, whereas a T-1 frame dedicates each byte position to a specific channel. A shuttle train accepts random groups of individuals, whereas a VoIP packet represents the information of only one conversation. The concept of a stream is the flow of packets or bytes related to a single function or conversation.
1.3.1 Summary of Packet Switching
Because many packet connections can share one line, each packet must carry its destination address so that the network knows where to deliver it. To mix a metaphor, each train must be routed to the proper terminal, or the destination is put on the front of the bus. The addresses take several forms, depending on how they are used by the network. Addresses plus additional control information constitute the headers on a packet.
To ensure a common understanding of terms for this book, this section will describe how packet networks operate. Figure 1.7 shows the headers that make up a typical voice payload packet. A more detailed discussion appears later.
FIGURE 1.7 Internet protocol headers on a VoIP packet roughly corresponding to layers of the ISO model of a protocol stack.
For this and other descriptions of packets, the convention here is that bits are transmitted as if the diagram reads like English text; that is, from left to right starting in the top row and then the next row below until the end of the packet. Within an octet, the least significant bit (LSB) is sent first. Header diagrams are upside-down compared to the standard representation of a protocol stack.
The International Organization for Standardization (ISO) diagram shows seven layers. The bottom is the physical layer 1: copper, optical fiber, radio, or the string between two tin cans. Protocols occupy layers 2 through 7. The ISO data link, layer 2, is very close to the Internet data link and may use the same protocols such as Ethernet, frame relay, and generic encapsulation protocol. In the legacy data environment there are many more layer 2 protocols not of concern to this discussion of VoIP and UC.
While L2 is at the bottom of the ISO model, the header for the L2 protocol appears at the top of the packet header diagram. It is sent first because it goes the shortest distance—only to the other end of a transmission link.
The L3 ISO protocol for the network connection comes next. This is the position of the Internet Protocol, IP, whose function is to send packets to another host or hosts which can be anywhere on the Internet. An IP packet can take a large number of hops from device to device as it finds its way across the network. IP has two main characteristics:
IP works on a best-effort basis, with no guarantees of delivery.
IP is connectionless. The network accepts IP packets at any time; it does not require any preparation to receive a packet for a new address.
This kind of service is also known as a datagram service.
IP doesn’t guarantee delivery of information; this is a function of the next protocol at the ISO transport layer (L4), which can guarantee delivery of packets and in the proper order. On the Internet, Transmission Control Protocol (TCP) most often performs this function. TCP uses sequence numbers to spot missing packets and ensure delivery order. Error checks recognize transmission or bit errors. The sending TCP process saves packets until the receiver acknowledges receipt, in case a packet must be resent to correct an error. For voice packets, the User Datagram Protocol (UDP) occupies L4 and L5, so there is no real ISO transport layer error correction in the case of VoIP.
A host that receives a packet needs to know what to do with it—which process or application should deal with it. The ISO layer 5 protocol establishes a session between applications; that is, it identifies a sequence of packets associated with one process or transaction. The port numbers in TCP and UDP headers identify the associated process at each end.
ISO protocol layers are very specific to their functions, with defined interfaces between them. The idea is to allow changes at one layer without affecting any other layers, above or below, because the application program interfaces (APIs) are constant. The Internet protocol stack doesn’t line up exactly with ISO, but the goal of interchangeability of elements is the same. Users can deploy a hardware improvement or an updated portion of software without disruption to items on other layers. The interfaces between layers remain constant or change very slowly. The adoption of IPv6 would be much more difficult if IP were not confined to L3.
The presentation layer 6 is not often seen separately from an application. That is, the author of an application usually decides how it will appear to users. There are libraries of software functions that present information graphically, or enhance text displays. For VoIP, the Real-time Transport Protocol (RTP) operates above ISO layers 2, 3, and 4 to provide functions tailored to voice and video applications. RTP is not strictly presentation, and not the full application, but provides what’s needed to support voice and video transmission, or any streaming medium.
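To make the role of RTP a little more concrete, here is a minimal Python sketch of the 12-byte fixed RTP header. The field layout follows RFC 3550 rather than anything shown in this chapter's figures, and the function names and the SSRC value are invented for the illustration. It shows the sequence number and timestamp that a receiver uses to put voice packets back in order and play them at the right time.

```python
import struct

RTP_VERSION = 2
PCMU_PAYLOAD_TYPE = 0   # static payload type for G.711 mu-law (RFC 3551)

def build_rtp_header(seq, timestamp, ssrc,
                     payload_type=PCMU_PAYLOAD_TYPE, marker=False):
    """Pack the 12-byte fixed RTP header (no padding, extension, or CSRC list)."""
    byte0 = RTP_VERSION << 6                       # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)

def parse_rtp_header(header):
    """Unpack the fixed header fields into a dictionary."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", header[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Two consecutive 20 ms PCM packets: the sequence number rises by 1 and the
# timestamp by 160 samples (8000 samples/s x 0.020 s). The SSRC is made up.
first = build_rtp_header(seq=1000, timestamp=0, ssrc=0x1234ABCD)
second = build_rtp_header(seq=1001, timestamp=160, ssrc=0x1234ABCD)
print(len(first), parse_rtp_header(second))
```

Because most fields either stay fixed or step by a known amount from one packet to the next, a receiver can spot a gap in the sequence numbers without any help from the transport layer.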
“Applications” are what most users think of as software, rather than layer 7. References to layer 7 are often meant to include any application.
1.3.2 Link Capacity: TDM versus Packets
There are two schools that put entirely different emPHAsis on the sylABles defining bandwidth efficiency. The outcome of the discussion impacts what call capacity a network designer will attribute to a link.
The advocates for “everything over IP” point out that channels defined on a transmission line get in the way of allocating bandwidth when and as needed. An open pipe T-1, for example, carries every packet at 1.536 Mbit/s, the data capacity after deducting the 8000 framing bits per second from the line bit rate of 1.544 Mbit/s. A channelized T-1, such as those used as voice trunks between a central office and a PBX, carries each channel at only 64 kbit/s. A packet transmitted on a DS-0 channel takes 24 times as long to finish as a packet sent on an unchannelized T-1.
Traditionalists point out another way to measure efficiency: the ratio of information bits to total bits on a link. For channelized voice traffic a full 24 channels represents 1.536 Mbit/s of voice and signaling information out of 1.544 Mbit/s, or about 99.5%.
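Both views can be checked with a few lines of arithmetic. The Python sketch below uses only the T-1 figures quoted above; the 200-byte packet size is an arbitrary value chosen for the illustration.

```python
T1_LINE_RATE = 1_544_000   # bit/s on the wire
FRAMING_RATE = 8_000       # framing bits per second
DS0_RATE = 64_000          # bit/s per channel
CHANNELS = 24

def t1_payload_rate():
    """Data capacity of an unchannelized (open pipe) T-1."""
    return T1_LINE_RATE - FRAMING_RATE

def tdm_efficiency():
    """Share of the line rate carrying voice and signaling on a full T-1."""
    return CHANNELS * DS0_RATE / T1_LINE_RATE

def serialization_ms(packet_bytes, rate_bps):
    """Time needed to clock one packet onto a link at the given bit rate."""
    return packet_bytes * 8 / rate_bps * 1000

PACKET_BYTES = 200  # arbitrary example size

print(t1_payload_rate())                                   # 1536000
print(round(tdm_efficiency() * 100, 1))                    # 99.5
print(serialization_ms(PACKET_BYTES, DS0_RATE))            # 25.0 ms on one DS-0
print(round(serialization_ms(PACKET_BYTES, t1_payload_rate()), 2))  # 1.04 ms
```

The ratio of the two serialization times is 24 whatever packet size is chosen, since it depends only on the two bit rates.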
What really matters is how many conversations that T-1 access link will support at one time. In a legacy TDM format the answer is 24. When the mode is VoIP, the answer varies over a wide range.
Packets require headers in addition to the information bits. Compared to a TDM channel, the number of bits per second for a conversation is higher if PCM voice encoding is packetized. That is, chopping a 64 kbit/s voice signal into packets requires adding 44 or more bytes (it can exceed 64 bytes) to each 20 ms block of voice information. At 8000 bytes per second, a 20 ms block holds 160 bytes, so that’s 44 to 64 bytes added per 160 bytes.
For the simple use case of PCM and IPv4 on an Ethernet link, 64 bytes of header on 160 bytes of voice information raises the bandwidth needed in each direction to about 90 kbit/s. Additional bandwidth is needed for the idle intervals required between packets on some Ethernet interfaces, optional headers on IP packets that belong to a virtual private network (VPN), and additional traffic to support authentication and other functions.
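The per-call arithmetic can be worked from first principles. This Python sketch derives the payload size from the codec rate and the packet interval, then adds a header allowance. The function names are invented here, and the two header sizes are simply the 44-byte and 64-byte ends of the range mentioned above.

```python
def payload_bytes(codec_bps, packet_ms):
    """Bytes of encoded voice carried in one packet."""
    return codec_bps * packet_ms // 8000      # (bit/s x ms) / 1000 / 8

def call_bandwidth_bps(codec_bps, packet_ms, header_bytes):
    """One-way bit rate for a call, headers included."""
    packets_per_second = 1000 / packet_ms
    packet_size = payload_bytes(codec_bps, packet_ms) + header_bytes
    return packet_size * 8 * packets_per_second

PCM_BPS = 64_000
PACKET_MS = 20

print(payload_bytes(PCM_BPS, PACKET_MS))               # 160 bytes per packet
for header_bytes in (44, 64):
    print(header_bytes, call_bandwidth_bps(PCM_BPS, PACKET_MS, header_bytes))
```

With these inputs the one-way rate comes to 81.6 kbit/s for 44 bytes of header and 89.6 kbit/s for 64 bytes, before any allowance for inter-packet gaps or VPN headers.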
Common practice allocates at least 80 kbit/s of bandwidth for each voice channel encoded with standard PCM. To include all packet headers, it is more realistic to assume 100 or 110 kbit/s per conversation for link capacity planning. The effective number can vary when the system applies various methods to save bandwidth, described below. For one, compressing the voice information to 8 kbit/s (e.g., with the G.729 algorithm) doesn’t reduce the headers, so the bandwidth per channel for link sizing drops to around 50 kbit/s.
A major consulting firm reported that a T-1 line could support 50 conversations using VoIP, more than double the TDM capacity. To reach that density requires additional processing.
Header compression reduces the bandwidth per voice conversation. Since the headers are pretty much the same in packet after packet (addresses are constant, sequence numbers and time stamps increment predictably), it is possible to substitute a “token” value to represent the full set of headers. Several RFCs define the process, in which the sender substitutes 1 to 4 bytes for the complete 44+ bytes in the original headers, not including the data link protocol. In this form of compression there are other headers that aren’t compressed, for example, an Ethernet, Frame Relay, or Multi-Protocol Label Switching (MPLS) tag to multiplex connections on a link.
Keeping with the simple use case, adding the minimum Ethernet overhead (24 bytes) and a compressed header of up to 4 bytes to a compressed voice payload (16 bytes) produces a total packet length of 44 bytes. The packets repeat 50 times per second, requiring 17.6 kbit/s. Replacing Ethernet with a data link protocol that uses a much shorter header, like Frame Relay or HDLC, can reduce the full-duplex bandwidth per conversation to about 12 kbit/s. More than 50 of them will fit on a T-1.
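A short Python sketch reproduces this arithmetic. It assumes the 44-byte packet is made up of 24 bytes of Ethernet overhead, a 4-byte compressed header token, and 16 bytes of compressed voice; that breakdown is inferred from the totals in the text, and the function names are illustrative.

```python
def compressed_call_bps(link_overhead_bytes, token_bytes, payload_bytes,
                        packets_per_second=50):
    """One-way bit rate for a call after header compression."""
    packet_size = link_overhead_bytes + token_bytes + payload_bytes
    return packet_size * 8 * packets_per_second

def calls_on_link(link_bps, per_call_bps):
    """Whole calls that fit if the entire link carried voice."""
    return link_bps // per_call_bps

T1_PAYLOAD_BPS = 1_536_000

ethernet_call = compressed_call_bps(24, 4, 16)
print(ethernet_call)                                   # 17600 bit/s
print(calls_on_link(T1_PAYLOAD_BPS, ethernet_call))    # 87
print(calls_on_link(T1_PAYLOAD_BPS, 12_000))           # 128
```

These counts are upper bounds that fill the whole link with voice and leave nothing for other traffic.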
Carriers often use double MPLS headers (Figure 1.8) to simplify their internal configurations, but those headers don’t require bandwidth on access lines, only on the carrier’s backbone. MPLS enables a network to set up static routes in tables (like those shown later in Figure 4.1), to ensure voice packets follow a physical path that introduces minimum latency.
FIGURE 1.8 MPLS labels add to header size but simplify packet forwarding and support traffic engineering for voice quality.
On the wide area network (WAN) and the fastest Ethernet links (full duplex connections with separate paths for each direction), the transmission equipment can queue packets and launch them head to tail with only a short separator between. On a local area network (LAN) based on a slower Ethernet, each packet starts with a preamble of a bit pattern that lets other hosts know a packet is coming and at what bit rate. That interval takes bandwidth too.
The most significant block to high transmission efficiency in packet networks is the problem of congestion handling. Switches and routers store packets they can’t send immediately in a local memory buffer. When that buffer fills, the only available relief is to discard packets.
Discards work well with data connections based on Transmission Control Protocol (TCP) because TCP client and server software in hosts recognizes lost packets as congestion and slows the transmission rate. Reduced throughput isn’t acceptable for voice, which depends on a constant-rate stream of information. Traditionally voice has been a constant bit rate service (64 kbit/s) with no speed variations.
VoIP operates on User Datagram Protocol (UDP), which has no mechanism to slow transmission. Variable bit rate compression algorithms exist, but typically they are based on the complexity of the talker’s voice rather than network congestion. So to avoid dropped VoIP packets, the best practice is to allocate no more than 40 to 60% of a link’s bit rate to voice service. The rest can be used by TCP connections, if the routers and switches prioritize voice and discard only data packets.
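The 40 to 60% guideline translates directly into a call limit for a link. The Python sketch below is one way to work it out; the function name is invented, and the 100 kbit/s per-call figure is an assumed planning value for uncompressed PCM.

```python
def max_voice_calls(link_bps, per_call_bps, voice_percent):
    """Whole calls that fit in the share of the link reserved for voice."""
    if not 0 < voice_percent <= 100:
        raise ValueError("voice_percent must be between 1 and 100")
    voice_budget_bps = link_bps * voice_percent // 100   # integer arithmetic
    return voice_budget_bps // per_call_bps

T1_PAYLOAD_BPS = 1_536_000
PER_CALL_BPS = 100_000     # assumed planning figure for uncompressed PCM

for percent in (40, 60):
    print(percent, max_voice_calls(T1_PAYLOAD_BPS, PER_CALL_BPS, percent))
```

Under these assumptions a T-1 carries 6 to 9 uncompressed calls while leaving the rest of the link for TCP traffic, well short of the 24 channels of the TDM format.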
For comparison, the DS-0 channel of 64 kbit/s operates with minimal latency, at full capacity, in dedicated bandwidth for each call. No channel suffers from congestion after it is connected—degradation in service consists of the busy signal and blocked call attempts.
Without call admission controls on VoIP systems, new voice connections can overload a link and degrade the perceived quality for all users. Table 1.2 summarizes this and other differences.
TABLE 1.2 Differences between TDM and packet telephony
| Feature | TDM Digital Telephony | Packet Voice over IP |
| --- | --- | --- |
| Information flow | Constant bit rate in channel | Bursts at line rate separated by pauses |
| Switching technology | Bytes moved from input channel to output channel at constant rate by TSI | Packets forwarded from input port/line to output port/line as possible |
| Connection resources in switches and transmission | Dedicated to each call for its full duration | Calls share switch queues and transmission paths; call occupies resources only while packet moves |
| Telephone number (address) | Assigned by carrier, permanent, fixed location unless ported (not including cellular) | User’s network address, from ISP or self-assigned; can be a URL similar to email address, which is usable anywhere |
| Latency (delay) | Fixed, minimized, not affected by other calls | Can vary, depending on path and congestion from other calls |
| Echo | Canceled by carrier | System may not create echo, but will cancel if there is a potential to create it |
| Security | Strong; proprietary software on purpose-built hardware running over dedicated cables | Similar to data network; may be exposed to Internet, voice usually shares cables with data |
| Equipment costs | Significant; often very high for add-on functions like voice mail, auto attendant, IVR | IP phones more expensive than basic analog, comparable to digital phones; call control software migrating to commercial servers, reducing hardware costs; voice mail, auto attendant, conferencing often included |
| Operating costs | Move/add/change requires service call; expansion may be expensive for new modules | End user may move phone, which can register its new location automatically or with user authentication |
| Maintenance | Typically very little; replace back up batteries, clean fans, possibly update software in 2 to 5 years | Similar to data network; software patches for servers, applications; scan for viruses; replace hard drives; may need new test equipment for voice |
| Product life expectancy | 10 to 20 years; cooling fans may be only moving parts (no hard drives, except for voice mail) | Similar to data network gear, as little as three years before software updates force some telephone replacements (servers should last >5 years) |

1.3.3 VoIP and “The Cloud”
This book intends to describe in considerable detail how VoIP and UC work. The components of a VoIP/UC system work the same regardless of where they are located physically. The phones will be on desktops, and the network links, routers, and switches will be where needed to connect everything. But the servers, databases, and security appliances can be anywhere on the Internet or a private network. Enterprises will put some servers on premises and others in outsourced data centers or carrier central offices. Hosted VoIP puts the call-processing power at the vendor’s site.
IN SHORT: Reading Network Drawings
Figure 1.9 shows a conventional way to represent a LAN. For readers coming from a voice background, the single horizontal line derives from early Ethernet, which consisted of one coaxial cable connecting all the local terminals. There was no switch, only the passive cable and attachment units.
FIGURE 1.9 Depiction of LAN connections derived from early Ethernet.
Coax contains a center copper conductor surrounded by insulation and then a braided wire tube-like covering that shields the inner conductor from electrical interference and acts as the second conductor. Ethernet attachments for all the local terminals penetrated the braided shield to connect to the center wire of the coax. Sharing a wire, all the terminals on the LAN received all the bits transmitted by every host. The drawing replicates this topology of a core with branches.
A more realistic topology today is for each terminal to connect to its own port on an Ethernet switch. This star-shaped layout is harder to draw accurately. Logically, it requires that the switch be shown and not assumed. Whenever a drawing like Figure 1.9 appears, you can safely assume it really looks like Figure 1.10.
FIGURE 1.10 Actual topology of Ethernet LAN; each terminal has its own switch port.
Network management or automapping software that probes the network for devices typically produces this second format of drawing to represent the network. The additional detail is needed to include every device and to allow each port on each device to carry a label for its IP address, VPN assignment, user name, and so forth.
Is that a cloud? Perhaps, but it may be better to separate two concepts whose overlap can confuse:
Hosted service places the servers in a vendor’s site.
Cloud service implies more about the vendor’s infrastructure. It is virtualized and clustered to maximize availability (uptime), flexibility (standing up applications quickly), and expandability (adding more processing power, memory, or storage as needed).
Private cloud infrastructure may be ideal for your own data center. If you outsource VoIP, cloud infrastructure is a good feature to look for, but not a guarantee of 100% uptime. But for the purposes here, “cloud” is not integral to VoIP; it is a feature of the hosting provider, which could be you. Be sure the VoIP/UC feature set you want is available from a “cloud vendor” and that a cloud is what you need.
Chapter 2
TRADITIONAL TELEPHONES STILL SET EXPECTATIONS
The analog telephone was a mature product more than 100 years ago. The POTS residential line of today is what every line was in those days. Amazingly those old phones will work on today’s analog service. Many central office switches still accept rotary (pulse) dialing as well as dual-tone multifrequency (DTMF or TouchTone) signals. For longevity and preserved backward compatibility, it’s hard to find anything as durable as POTS.
Telephones are so ingrained in our lives that many users are not aware of how the traditional service sets our expectations in many ways. There’s a story, allegedly true, about a call to a computer support desk from a person whose computer wouldn’t turn on. The support agent asked the caller to look at the back of the computer for loose cables. The caller said that was impossible because it was too dark—the power was out.
The expectation for phone service (powered from the CO) was transferred to a computer (on local power). It’s hard to see how old telephone expectations won’t continue to apply in some significant ways to telephones and telephone service after the packet revolution.
2.1 AVAILABILITY: HOW THE BELL SYSTEM ENSURED SERVICE
As described in Chapter 1, analog phones draw operating power over the local loop from the telco. To ensure continuity of operation, COs contain large banks of batteries, typically enough for over 8 hours of operation, powering the switch as well as the phones. Many COs have standby generators to sustain service indefinitely by taking over from the batteries during extended outages. The tradition was that the phones worked unless the CO burned down (which happens, if rarely).
The Bell System standard for the reliability of any piece of equipment, or the service provided by a set of similar equipment, was 99.999% uptime. That’s just over 5 minutes of outage per year (Table 2.1).
TABLE 2.1 Uptime percentage versus downtime
When there are multiple devices with that level of reliability, the overall service can have a slightly lower uptime target. The overall availability of a string of elements in series is approximately the product of the individual availability figures. When redundant devices back up each other, the availability of the combination is higher than that of either device alone.
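These relationships are easy to check numerically. The following Python sketch converts an availability figure to minutes of downtime per year and combines elements in series and in parallel. It assumes failures are independent and uses a 365-day year; the function names are illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600

def downtime_minutes_per_year(availability):
    """Expected outage time per year for a given availability (0 to 1)."""
    return (1 - availability) * MINUTES_PER_YEAR

def series(*availabilities):
    """All elements must work: multiply the individual availabilities."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def parallel(*availabilities):
    """Service survives unless every redundant element is down at once."""
    all_down = 1.0
    for a in availabilities:
        all_down *= (1 - a)
    return 1 - all_down

five_nines = 0.99999
print(round(downtime_minutes_per_year(five_nines), 2))               # 5.26
chain = series(five_nines, five_nines, five_nines)
print(round(downtime_minutes_per_year(chain), 2))                    # 15.77
pair = parallel(0.999, 0.999)
print(round(downtime_minutes_per_year(pair), 2))                     # 0.53
```

Three five-nines devices in a chain roughly triple the expected outage time, while a redundant pair of three-nines devices does better than a single five-nines device, provided the two really do fail independently.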
Networks to support business services used to be designed to approach “five-nines” overall, not just for each device. Residential service wasn’t quite as robust in practice—one carrier’s Fiber to the Home (FTTH) service advertises only “99.9% reliability.” Residential tariffs also had looser time frames for repairs compared to business services (good reasons that business lines cost more than residential). Overall, however, everyone expected the phones to work.
Cellular phone customers, while trading off reliability for mobility, still harbor some of the old expectations. Carriers recognize that attitude by advertising “most reliable network” and “fewer dropped calls.” Still, in the back of the big-box store under a metal roof, cellular service may not be available at all. Loss of signal in that limited situation might be acceptable, but loss of service on a desktop business phone won’t be tolerated.
Bottom line: a VoIP implementation must be designed for high availability.
Keeping phone service highly available after converting to an IP network requires some changes in attitude and expectations among traditional data operations people who run the IP network. Increasingly business and financial transaction services on data networks also demand high availability, so the network infrastructure and the people who maintain it have moved away from practices such as:
Taking down a service for hours at a time to upgrade hardware or to patch software.
Rebooting a server or router “to see if that clears the problem.”
Accepting delays of hours or days to restore a service after a hardware failure.
Allowing a local power outage to interrupt services.
Technology advances linked to the concept of “cloud computing” are a big help to availability. Server clusters that provide redundancy and load sharing prevent a single hardware or software failure from halting a service. Virtualization goes further, allowing a new server to come on line automatically when demand increases or a working server drops off after a hardware failure. The ability to move an application from one server to another without service interruption also means that managers need not schedule downtime to replace, upgrade, or expand servers; install patches; or upgrade software. The optimum process is to add a new server to the cluster, then shift traffic away from the server that is to go down until it is idle. That should be doable during normal business hours, not overnight or on a holiday.
2.2 CALL COMPLETION
The “fast-busy” signal has been a rarity for decades. The percentage of calls that fail to complete is so small that many users are not aware of a problem in this regard on the PSTN—because there isn’t one. Trunk capacity among central offices is almost always ample to accept all calls placed. The occasion of an extreme emergency will be an exception.
The circuit-switched PSTN has hard limits on the call capacity of each line, from POTS (1) to a fiber (thousands). When the DS-0s between COs are all in use, it is not possible to add another call, and the response is the fast-busy signal, which indicates a lack of resources in the network.
Deploying VoIP for long-distance trunks has expanded capacity and increased flexibility in routing calls, so there is enough capacity almost all the time. On a private network, legacy or VoIP, there is a greater possibility of resource limitations.
Pieces to consider when evaluating or designing an IP network for VoIP are as follows:
Call processor capacity: some server software has a limit on the number of simultaneous calls. Hardware limits need planning; for example, memory to hold the states of connections.
Media gateways: similar to planning for legacy PBXs, the number of TDM trunks beyond the gateway is a strict limit.
IP trunks: unless limited by an H.323 gatekeeper or similar agent, an IP line may accept more calls than it can carry reasonably. The quality of all connections on that link may suffer.
SIP trunking: the carrier supplying the service may impose a limit on the number of calls at a time that is unrelated to the bandwidth of the access link or capacity of servers; you get what you pay for.

2.3 SOUND QUALITY: ENCODING FOR RECOGNIZABLE VOICES
The design of the PCM encoding scheme in channel banks was very clever. It had to be, given the state of electronics in 1960. A key requirement was that voices be recognizable, even at low sound levels, when people are whispering, yet the phone had to reproduce loud shouting. The solution was companding: compressing/expanding. The algorithm is defined in ITU Recommendation G.711, which often lends its number to pulse code modulation (PCM).
Selection of the design parameters (easy in hindsight) could have gone something like this:
Carry over into the digital design the basic 4 kHz voice channel audio bandwidth, previously proved in analog multiplexing practice and the design of analog handsets.
Apply the Nyquist theorem, which says that to ensure an accurate digital reproduction, the sampling rate must be at least twice the highest analog audio frequency (sampling has to catch every zero crossing, when the sound pressure changes from positive to negative); 4000 × 2 = 8000 per second.
Measure the voice signal at each sample accurately enough to ensure voice recognition at low sound levels. Capturing the nuances required a 16-bit analog-to-digital converter (ADC) with 64 K (65,536) possible outcomes for each measurement.
Compact the 64 K possible values from the 16-bit ADC to 8 bits, 256 possible byte values, by using a logarithmic scale and allowing fewer values at louder volumes.
Establish the DS-0 channel: (8000/s) × 8 bits = 64,000 bits/s.
Assemble 24 DS-0s into a DS-1 with the addition of 8000 framing bits per second.
The cleverness was to selectively map the detailed measurements to only 255 values (all zeros isn’t allowed, to preserve electrical pulses, the logical 1’s, on the transmission line). The full range of sound pressure levels, both positive and negative, then fits into one byte. Some “rounding” is necessary to assign a range of 16-bit measurement possibilities to only one 8-bit result, the byte value. The difference is quantizing noise (QN).
Figure 2.1 shows how this mapping places more of the byte values at low loudness, with fewer values for very loud sounds.
FIGURE 2.1 Companding concentrates voice encoding information at lower sound levels, and reduces the number of bits needed.
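The effect of a logarithmic scale can be illustrated with the continuous mu-law curve. The Python sketch below is a teaching approximation only: it applies the textbook formula with mu = 255 and then quantizes uniformly to 256 codes, which is not the segment-based bit layout that G.711 actually specifies. The function names and the normalized input range of -1 to 1 are assumptions made for the example.

```python
import math

MU = 255  # compression parameter of the continuous mu-law curve

def compress(x):
    """Map a sample in [-1, 1] onto the logarithmic scale, still in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress()."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def encode(x, levels=256):
    """Compress, then quantize uniformly to an integer code 0..levels-1."""
    y = compress(max(-1.0, min(1.0, x)))
    return round((y + 1) / 2 * (levels - 1))

def decode(code, levels=256):
    """Turn an integer code back into an approximate sample value."""
    y = code / (levels - 1) * 2 - 1
    return expand(y)

# Rounding error (quantizing noise) for a quiet and a loud sample.
for sample in (0.01, 0.9):
    restored = decode(encode(sample))
    print(sample, encode(sample), abs(restored - sample))

# Share of the code range spent on the quietest 10% of amplitudes.
print(round(compress(0.1), 2))   # 0.59
```

In this approximation roughly 59% of the codes describe the quietest tenth of the amplitude range, so the absolute rounding error for a whisper is far smaller than for a shout.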
PCM is very mature and inexpensive, making it an easy choice for hardware designers. PCM represents the wave something like a picket fence where the height of each picket represents the volume of the sound as it is sampled at that instant. The tops of the pickets describe the sound wave from the speaker and are reproduced by the receiver. Pickets are 1/8000 of a second apart.
There are two variants of PCM. In the United States and Canada where transmission links are based on the T-1 standard of 1.544 Mbit/s, the mapping is called mu-Law, also written with the lowercase Greek letter mu as μ-Law. In areas that base the transmission hierarchy on E-1 at 2.048 Mbit/s, companding is A-Law. The difference arises from how the 16-bit measurements are assigned to the byte values.
The two laws are significantly different (as described in G.711). When configuring equipment you should set both ends to the same law version. The result of a mismatch (the source configured for one, the receiver for the other) is a noticeably distorted but still understandable conversation. While not common, checking for a “law” mismatch should be part of troubleshooting a problem in sound quality.
To express both positive and negative sound pressure, the sound pressure average with no audio input is set to the middle of the range (128). The most negative pressure during the loudest input registers, in decimal format, as a 1 on the scale; 255 is the highest positive value. The most significant binary bit is also called the sign bit, since it is 1 for positive pressure and 0 for negative.
Many packet voice systems use straight PCM encoding. It fulfills expectations for voice quality, is cheap to produce with commercial chips or DSPs, and can pass both DTMF and fax signals (both of which were designed for PCM). Compressing PCM to lower bit rates doesn’t reduce the quality by much, as perceived by a human listener. Fax and modems are, in general, another issue, as described elsewhere in this book.
Compression can have an upside. The 64 kbit/s TDM channel, or a packet stream with that payload capacity, can carry a voice signal compressed from a wider audio bandwidth. For example, an ADPCM algorithm requires less than 64 kbit/s to transmit “high-definition” voice with a 7000 Hz audio frequency range, more than double the 3300 Hz of PCM. The higher frequencies reproduce hard consonant sounds better and improve intelligibility.
Rather than compress PCM to save bandwidth, VoIP can apply higher quality encoding to produce better sound. Naturally it’s a bit complicated, but well within the capabilities of DSP chips. The method, as described in ITU Recommendation G.722, samples the audio input 16,000 times per second, double the rate for PCM. The ADC outputs 14 bits, leaving two bits per pair of octets to synchronize the operational mode between sender and receiver. Intended for a dial-up DS-0 channel, G.722 can carry an 8 or 16 kbit/s data sub-channel in addition to the voice, but this feature is unlikely to find application in a VoIP system where a separate packet stream for data offers more flexibility and increased capacity.
One way to think of the G.722 process is to split the audio input, with two band-pass filters, into two sub-channels of 0 to 4 kHz and 4 to 8 kHz. Each sub-band is then encoded separately in ADPCM. The two results are packaged into a single DS-0 which can be sent in a circuit-switched channel or packetized like PCM. The overall process is called Sub-Band Adaptive Differential Pulse Code Modulation (SB-ADPCM), or HD voice for short.
Of course, the phones at both ends of a conversation need to apply the same codec or algorithm. An important function of Session Initiation Protocol (SIP) and Session Description Protocol (SDP) is to let the phones decide how they will communicate. More on them below.
In the future, if the needed audio bandwidth increases further, there may be reasonable solutions. The sub-band approach to encoding can draw on codecs that require lower bit rates so the number of sub-bands can increase as required. Faster LANs are migrating from 100 million bits per second (Mbit/s) Ethernet to 1 gigabit per second (1000 Mbit/s), so a call may find it has much more than a DS-0 available. Those LAN speeds are becoming easier to find in the WAN as well, with Carrier Ethernet and direct optical fiber access to the premises boosting access speeds to 10 Gbit/s or more.
2.4 LOW LATENCY
One of the great attractions of the circuit-switched telephone was the experience: it is almost like being there. For most of the calls made for both business and personal reasons, the latency across the network is so low that it is not noticed—unless the connection includes a satellite hop.
Even long-distance calls have minor delay when entirely on land lines. The PSTN has so many routes and switches that it is not necessary to divert calls far from the most direct path. (US carriers have been known to play games with the rules for collecting local termination charges by routing calls through Canada, but that’s supposed to be a rare exception.)
Terrestrial copper cables and optical fibers propagate signals (including the bits that make up DS-0s and packets) at roughly two-thirds of c, the speed of light in a vacuum, or about 0.67 c. Microwave is faster, about 0.9 c (air is not a vacuum), but lacks the deployed capacity of fiber. Microwave has fallen out of favor for long distances but is still popular to connect cell towers to switching centers in the backbone network, saving a few milliseconds over T-1 or optical backhaul.
The PSTN’s circuit switches add only a few hundred microseconds of delay each, so most latency on LD is propagation delay. Historically a transcontinental connection of 3000 miles exhibited a delay of about 30 ms. With PBXs and phones adding almost no additional delay, the total was well below the threshold (150 ms each way) where the callers notice. To avoid a problem, the goal for round trip delay should be under 250 ms.
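Propagation delay follows directly from distance and signal velocity. This Python sketch computes the one-way delay for a 3000-mile path at the velocity factors quoted above; the function name is illustrative, and the straight-line distance is an idealization, since real cable routes are longer.

```python
C_KM_PER_S = 299_792.458    # speed of light in a vacuum
KM_PER_MILE = 1.609344

def propagation_ms(distance_miles, velocity_factor):
    """One-way propagation delay in milliseconds."""
    distance_km = distance_miles * KM_PER_MILE
    return distance_km / (C_KM_PER_S * velocity_factor) * 1000

for label, factor in (("cable or fiber", 2 / 3), ("microwave", 0.9)):
    print(label, round(propagation_ms(3000, factor), 1))
```

At two-thirds c the result is about 24 ms one way, and about 18 ms at 0.9 c. The gap between 24 ms and the roughly 30 ms quoted for a transcontinental connection presumably reflects indirect routes and equipment delay, though the text does not break the figure down.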
Once latency is minimized and stable, it is effectively constant. PSTN circuits therefore show practically no variation in latency, and so no significant jitter. All the T&S equipment synchronizes to a master clock that ensures all DS-0 bit streams run at the same rate. In the worst case, an adjustment in TDM timing, called a frame slip, may drop or duplicate 125 microseconds (μs) of audio. A phone user might not notice at all; modems should not drop a connection.
The PSTN has been shifting to VoIP for long distance. Some carriers use the Internet, but major carriers separate the VoIP transmission facilities from the Internet and engineer them to minimize delays from congestion. Because the capacity of the backbone is so high—reaching 100 Gbit/s at this writing—the queuing delay in a router is even less than the few hundred microseconds in a 5ESS circuit switch. LD calls on the US PSTN continue to satisfy in terms of latency and jitter.
