Switch/Router Architectures - James Aweya - E-Book

James Aweya



A practicing engineer's inclusive review of communication systems based on shared-bus and shared-memory switch/router architectures. This book delves into the inner workings of router and switch design in a comprehensive manner that is accessible to a broad audience. It begins by describing the role of switch/routers in a network, then moves on to the functional composition of a switch/router. A comparison of centralized versus distributed design of the architecture is also presented. The author discusses the use of bus versus shared-memory for communication within a design, and also covers Quality of Service (QoS) mechanisms and configuration tools. Written in a simple style and language to allow readers to easily understand and appreciate the material presented, Switch/Router Architectures: Shared-Bus and Shared-Memory Based Systems discusses the design of multilayer switches, starting with the basic concepts and moving on to the basic architectures. It describes the evolution of multilayer switch designs and highlights the major performance issues affecting each design. It addresses the need to build faster multilayer switches and examines the architectural constraints imposed by the various multilayer switch designs. The book also discusses design issues including performance, implementation complexity, and scalability to higher speeds. This resource also:

* Summarizes principles of operation and explores the most commonly installed routers
* Covers the design of example architectures (shared-bus and shared-memory based architectures), starting from early software-based designs
* Provides case studies to enhance reader comprehension

Switch/Router Architectures: Shared-Bus and Shared-Memory Based Systems is an excellent guide for advanced undergraduate and graduate level students, as well as for engineers and researchers working in the field.


Number of pages: 580




Series Page

Title Page


About the Author


Chapter 1: Introduction to Switch/Router Architectures

1.1 Introducing the Multilayer Switch

1.2 Evolution of Multilayer Switch Architectures

Chapter 2: Understanding Shared-Bus and Shared-Memory Switch Fabrics

2.1 Introduction

2.2 Switch Fabric Design Fundamentals

2.3 Types of Blocking in Switch Fabrics

2.4 Emerging Requirements for High-Performance Switch Fabrics

2.5 Shared Bus Fabric

2.6 Hierarchical Bus-Based Architecture

2.7 Distributed Output Buffered Fabric

2.8 Shared Memory Switch Fabric

2.9 Shared Ring Fabric

2.10 Electronic Design Problems

Chapter 3: Shared-Bus and Shared-Memory-Based Switch/Router Architectures

3.1 Architectures with Bus-Based Switch Fabrics and Centralized Forwarding Engines

3.2 Architectures with Bus-Based Switch Fabrics and Distributed Forwarding Engines

3.3 Architectures with Shared-Memory-Based Switch Fabrics and Distributed Forwarding Engines

3.4 Relating Architectures to Multilayer Switch Types

Chapter 4: Software Requirements for Switch/Routers

4.1 Introduction

4.2 Switch/Router Software Development Methods

4.3 Stability of the Routing Protocols

4.4 Network Management

4.5 Switch/Router Performance

4.6 Interaction Between Layer 3 (Routing) and Layer 2 (Bridging) Functions in Switch/Routers

4.7 Control and Management of Line Cards

4.8 Distributed Forwarding

Chapter 5: Architectures with Bus-Based Switch Fabrics: Case Study—Decnis 500/600 Multiprotocol Bridge/Router

5.1 Introduction

5.2 In-Place Packet Forwarding in Line Cards

5.3 Main Architectural Features of the Decnis 500/600

5.4 Decnis 500/600 Forwarding Philosophy

5.5 Detail System Architecture

5.6 Unicast Packet Reception in a Line Card

5.7 Unicast Packet Transmission in a Line Card

5.8 Multicast Packet Transmission in a Line Card

Chapter 6: Architectures with Bus-Based Switch Fabrics: Case Study—Fore Systems PowerHub Multilayer Switches

6.1 Introduction

6.2 PowerHub 7000 and 6000 Architectures

6.3 PowerHub Software Architecture

6.4 Packet Processing in the PowerHub

6.5 Looking Beyond the First-Generation Architectures

Chapter 7: Architectures with Bus-Based Switch Fabrics: Case Study—Cisco Catalyst 6000 Series Switches

7.1 Introduction

7.2 Main Architectural Features of the Catalyst 6000 Series

7.3 High-Level Architecture of the Catalyst 6000

7.4 Catalyst 6000 Control Plane Implementation and Forwarding Engines: Supervisor Engines

7.5 Catalyst 6000 Line Card Architectures

7.6 Packet Flow in the Catalyst 6000 with Centralized Flow Cache-Based Forwarding

Chapter 8: Architectures with Shared-Memory-Based Switch Fabrics: Case Study—Cisco Catalyst 3550 Series Switches

8.1 Introduction

8.2 Main Architectural Features of the Catalyst 3550 Series

8.3 System Architecture

8.4 Packet Forwarding

8.5 Catalyst 3550 Software Features

8.6 Catalyst 3550 Extended Features

Chapter 9: Architectures with Bus-Based Switch Fabrics: Case Study—Cisco Catalyst 6500 Series Switches with Supervisor Engine 32

9.1 Introduction

9.2 Cisco Catalyst 6500 32 Gb/s Shared Switching Bus

9.3 Supervisor Engine 32

9.4 Catalyst 6500 Line Cards Supported by Supervisor Engine 32

9.5 Cisco Catalyst 6500 32 Gb/s Shared Switching Bus Modes

9.6 Supervisor Engine 32 QoS Features

9.7 Packet Flow Through Supervisor Engine 32

Chapter 10: Architectures with Shared-Memory-Based Switch Fabrics: Case Study—Cisco Catalyst 8500 CSR Series

10.1 Introduction

10.2 Main Architectural Features of the Catalyst 8500 Series

10.3 The Switch-Route and Route Processors

10.4 Switch Fabric

10.5 Line Cards

10.6 Catalyst 8500 Forwarding Technology and Operations

10.7 Catalyst 8500 Quality-of-Service Mechanisms

Chapter 11: Quality of Service Mechanisms in the Switch/Routers

11.1 Introduction

11.2 QoS Forwarding Operations within a Typical Layer 2 Switch

11.3 QoS Forwarding Operations within a Typical Multilayer Switch

11.4 QoS Features in the Catalyst 6500

Chapter 12: Quality of Service Configuration Tools in Switch/Routers

12.1 Introduction

12.2 Ingress QoS and Port Trust Settings

12.3 Ingress and Egress Port Queues

12.4 Ingress and Egress Queue Thresholds

12.5 Ingress and Egress QoS Maps

12.6 Ingress and Egress Traffic Policing

12.7 Weighted Tail-Drop: Congestion Avoidance with Tail-Drop and Multiple Thresholds

12.8 Congestion Avoidance with WRED

12.9 Scheduling with WRR

12.10 Scheduling with Deficit Weighted Round-Robin (DWRR)

12.11 Scheduling with Shaped Round-Robin (SRR)

12.12 Scheduling with Strict Priority Queuing

12.13 NetFlow and Flow Entries

Chapter 13: Case Study: Quality of Service Processing in the Cisco Catalyst 6000 and 6500 Series Switches

13.1 Introduction

13.2 Policy Feature Card (PFC)

13.3 Distributed Forwarding Card (DFC)

13.4 Port-Based ASICs

13.5 QoS Mappings

13.6 QoS Flow in the Catalyst 6000 and 6500 Family

13.7 Configuring Port ASIC-Based QoS on the Catalyst 6000 and 6500 Family

13.8 IP Precedence and IEEE 802.1p CoS Processing Steps

Appendix A: Ethernet Frame

A.1 Introduction

A.2 Ethernet Frame Format

Appendix B: IPv4 Packet

B.1 Introduction

B.2 IPv4 Packet Format


B.4 Address Resolution

B.5 IPv4 Address Exhaustion

B.6 IPv4 Options

B.7 IPv4 Packet Fragmentation and Reassembly

B.8 IP Packets Encapsulated into Ethernet Frames

B.9 Forwarding IPv4 Packets



End User License Agreement

List of Tables

Table B.1

Table 3.1

Table 10.1

Table 11.1

Table 11.2

Table 12.1

Table 12.2

List of Illustrations

Figure A.1

Figure A.2

Figure A.3

Figure A.4

Figure A.5

Figure A.6

Figure A.7

Figure A.8

Figure A.9

Figure B.1

Figure B.2

Figure 1.1

Figure 1.2

Figure 1.3

Figure 1.4

Figure 1.5

Figure 1.6

Figure 1.7

Figure 1.8

Figure 1.9

Figure 1.10

Figure 1.11

Figure 2.1

Figure 2.2

Figure 2.3

Figure 2.4

Figure 2.5

Figure 2.6

Figure 2.7

Figure 2.8

Figure 2.9

Figure 2.10

Figure 2.11

Figure 2.12

Figure 2.13

Figure 2.14

Figure 2.15

Figure 3.1

Figure 3.2

Figure 3.3

Figure 3.4

Figure 3.5

Figure 3.6

Figure 3.7

Figure 3.8

Figure 3.9

Figure 3.10

Figure 4.1

Figure 4.2

Figure 4.3

Figure 5.1

Figure 5.2

Figure 5.3

Figure 5.4

Figure 5.5

Figure 5.6

Figure 5.7

Figure 5.8

Figure 5.9

Figure 5.10

Figure 5.11

Figure 5.12

Figure 6.1

Figure 6.2

Figure 6.3

Figure 6.4

Figure 6.5

Figure 6.6

Figure 6.7

Figure 6.8

Figure 6.9

Figure 7.1

Figure 7.2

Figure 7.3

Figure 7.4

Figure 7.5

Figure 7.6

Figure 7.7

Figure 7.8

Figure 7.9

Figure 7.10

Figure 8.1

Figure 8.2

Figure 8.3

Figure 8.4

Figure 8.5

Figure 8.6

Figure 9.1

Figure 9.2

Figure 9.3

Figure 9.4

Figure 9.5

Figure 9.6

Figure 9.7

Figure 10.1

Figure 10.2

Figure 10.3

Figure 10.4

Figure 10.5

Figure 10.6

Figure 10.7

Figure 11.1

Figure 11.2

Figure 11.3

Figure 12.1

Figure 12.2

Figure 12.3

Figure 12.4

Figure 12.5

Figure 12.6

Figure 12.7

Figure 13.1

Figure 13.2

Figure 13.3



IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

IEEE Press Editorial Board

Ekram Hossain, Editor in Chief

Giancarlo Fortino

Andreas Molisch

Linda Shafer

David Alan Grier

Saeid Nahavandi

Mohammad Shahidehpour

Donald Heirman

Ray Perez

Sarah Spurgeon

Xiaoou Li

Jeffrey Reed

Ahmet Murat Tekalp

Switch/Router Architectures

Shared-Bus and Shared-Memory Based Systems


Copyright © 2018 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data is available.

ISBN: 978-1-119-48615-2

About the Author

James Aweya was a Senior Systems Architect with the global Telecom company Nortel, Ottawa, Canada, from 1996 to 2009. His work with Nortel involved many areas, including the design of communication networks, protocols and algorithms, switches and routers, and other Telecom and IT equipment. He received his B.Sc. (Hons.) degree in Electrical & Electronics Engineering from the University of Science & Technology, Kumasi, Ghana; M.Sc. in Electrical Engineering from the University of Saskatchewan, Saskatoon, Canada; and Ph.D. in Electrical Engineering from the University of Ottawa, Canada. He has authored more than 54 international journal papers, 39 conference papers, 43 technical reports, and received 63 U.S. patents, with more patents pending. He was awarded the 2007 Nortel Technology Award of Excellence (TAE) for his pioneering and innovative research on Timing and Synchronization across Packet and TDM Networks. He was also recognized in 2007 as one of Nortel's top 15 innovators. Dr. Aweya is a Senior Member of the IEEE. He is presently a Chief Research Scientist at EBTIC (Etisalat British Telecom Innovation Center) in Abu Dhabi, UAE, responsible for research in next-generation mobile network architectures, timing and synchronization over packet networks, indoor localization over WiFi networks, cloud RANs, software-defined networks, network function virtualization, and other areas of networking of interest to EBTIC stakeholders and partners.


This book discusses the design of multilayer switches, sometimes called switch/routers, starting with the basic concepts and then moving on to the basic architectures. It describes the evolution of multilayer switch designs and highlights the major performance issues affecting each design. The need to build faster multilayer switches has been addressed over the years in a variety of ways, and the book discusses these in various chapters. In particular, we examine the architectural constraints imposed by the various multilayer switch designs. The design issues discussed include performance, implementation complexity, and scalability to higher speeds.

The goal of the book is not to present an exhaustive list or taxonomy of design alternatives but to use strategically selected designs (some of which are a bit old but still represent contemporary designs) to highlight the design philosophy behind each one. The selection of the example designs does not in any way suggest a preference for one vendor's design or product over another. The selection is based purely on how well a design covers the topics of interest and on how much information (available in the public domain) could be gathered on the particular design to enable proper coverage of the topics. The designs selected tend to be representative of most of the other designs not discussed in the book.

Even today, most designs still adopt the older approaches highlighted in the book. A particular design may have existed for some time, but its design concepts have remained common practice in the telecommunications (Telecoms) industry. The functions and features of the multilayer switch have stayed very much the same over the years. What has mainly changed is the use of advances in higher density manufacturing technologies, high-speed electronics, faster processors, lower power consumption architectures, and optical technologies to achieve higher forwarding speeds, improved device scalability and reliability, and reduced implementation complexity.

As emphasized above, these design examples are representative enough to cover the relevant concepts and ideas needed to understand how multilayer switches are designed and deployed today. The book is purposely written in a simple style and language to allow readers to easily understand and appreciate the material presented.

James Aweya
Etisalat British Telecom Innovation Center (EBTIC)
Khalifa University
Abu Dhabi, UAE

1 Introduction to Switch/Router Architectures

1.1 Introducing the Multilayer Switch

The term multilayer switch (or, equivalently, switch/router) in this book refers to a networking device that performs both Layer 2 and Layer 3 forwarding of packets, as defined in the Open Systems Interconnection (OSI) reference model (Figure 1.1). The Layer 3 forwarding functions are typically based on the Internet Protocol (IP), while the Layer 2 functions are based on Ethernet. The Layer 2 forwarding function is responsible for forwarding packets (Ethernet frames) within a Layer 2 broadcast domain or Virtual Local Area Network (VLAN). The Layer 3 forwarding function is responsible for forwarding an IP packet from one subnetwork, network, or VLAN to another subnetwork, network, or VLAN.

Figure 1.1 Layer 2 forwarding versus Layer 3 forwarding.

The IP subnetwork could be created based on well-known IP subnetworking rules and guidelines or as a VLAN. A VLAN is a logical group of devices that can span one or more physically separate network segments that are configured to intercommunicate as if they were connected to one physical Layer 2 broadcast domain. Even though the devices may be located on a number of different physical or geographically separate network segments, the devices can intercommunicate as if they are all connected to one physical broadcast domain.
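The distinction between Layer 2 and Layer 3 forwarding can be illustrated with a minimal sketch that assumes, for illustration only, that each VLAN maps to exactly one IPv4 subnet with a known prefix length. Traffic between addresses in the same subnet stays within the broadcast domain, while traffic to any other subnet must be routed. The function name `forwarding_layer` is hypothetical.

```python
import ipaddress


def forwarding_layer(src_ip: str, dst_ip: str, prefix_len: int) -> str:
    """Return "L2" if dst_ip lies in src_ip's subnet, otherwise "L3".

    Simplifying assumption: one IPv4 subnet per VLAN, so "same subnet"
    means "same Layer 2 broadcast domain".
    """
    src_subnet = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    if ipaddress.ip_address(dst_ip) in src_subnet:
        return "L2"  # bridged within the VLAN using the MAC address table
    return "L3"      # routed to another subnet/VLAN using the forwarding table


print(forwarding_layer("10.1.1.10", "10.1.1.20", 24))  # L2
print(forwarding_layer("10.1.1.10", "10.1.2.20", 24))  # L3
```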

For the Layer 3 forwarding functions to work, the routing functions in the multilayer switch learn about other networks, paths to destination networks, and destinations, either through dynamic IP routing protocols or via static (manual) configuration information provided by a network administrator. Dynamic IP routing protocols such as RIP (Routing Information Protocol), OSPF (Open Shortest Path First), IS-IS (Intermediate System-to-Intermediate System), and BGP (Border Gateway Protocol) allow routers and switch/routers to communicate and distribute network topology information among themselves and to provide updates when network topology changes occur. Through the routing protocols, routers and switch/routers learn the network topology so that they can select the best loop-free path on which to forward a packet from its source to its destination IP address.

1.1.1 Control and Data Planes in the Multilayer Switch

The Layer 3 and Layer 2 forwarding functions can each be split into two sets of subfunctions: the control plane functions and the data (or forwarding) plane functions (Figure 1.2). A comprehensive discussion of the basic architectures of routers is given in [AWEYA2000] and [AWEYA2001]. The Layer 2 functions in an Ethernet switch and switch/router involve relatively simple control and data plane operations.

Figure 1.2 Control and data planes in a multilayer switch.

The data plane operations in Layer 2 switches involve MAC address learning (to discover the ports on which new addresses are located), frame flooding (for frames with unknown addresses), frame filtering, and frame forwarding (using a MAC address table showing MAC address to port mappings). The corresponding control plane operations in the Layer 2 devices involve running network loop prevention protocols such as the various variants of the Spanning Tree Protocol (STP), link aggregation-related protocols, device management and configuration tools, and so on.
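The four Layer 2 data plane operations (learning, flooding, filtering, and forwarding) can be illustrated with a toy learning switch. This is a sketch under simplifying assumptions: a single VLAN, no MAC address aging, and no STP port states. The class `LearningSwitch` and its method names are hypothetical.

```python
from dataclasses import dataclass, field

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningSwitch:
    ports: list
    mac_table: dict = field(default_factory=dict)  # MAC address -> port

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list:
        """Return the list of ports the frame is sent out of."""
        # Learning: the source MAC is reachable via the arrival port.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        # Flooding: broadcast or unknown unicast goes to every other port.
        if dst_mac == BROADCAST_MAC or out_port is None:
            return [p for p in self.ports if p != in_port]
        # Filtering: destination is on the arrival segment, so drop the frame.
        if out_port == in_port:
            return []
        # Forwarding: known unicast destination on a different port.
        return [out_port]


sw = LearningSwitch(ports=[1, 2, 3])
A, B, C = "02:00:00:00:00:0a", "02:00:00:00:00:0b", "02:00:00:00:00:0c"
print(sw.receive(1, A, B))  # [2, 3]  B unknown: flood
print(sw.receive(2, B, A))  # [1]     A learned on port 1: forward
print(sw.receive(1, C, A))  # []      A is on the arrival port: filter
```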

Even though the Layer 2 functions can also be split into control plane and data plane operations, this separation is more commonly applied to the Layer 3 functions performed by routers and switch/routers. In a router or switch/router, the entity that performs the control plane operations is referred to as the routing engine, route processor, or control engine (Figure 1.3).

Figure 1.3 Control and forwarding engines in multilayer switches.

The entity that performs the data (or forwarding) plane operations is referred to as the forwarding engine or forwarding processor. By separating the control plane operations from the packet forwarding operations, a designer can effectively identify processing bottlenecks in the device. This knowledge allows the designer to develop and/or use specialized software or hardware components and processors to eliminate these bottlenecks.

1.1.2 Control Engine

Control plane operations in the router or switch/router are performed by the routing engine or route processor, which runs the operating system software. This software includes modules for the routing protocols, system monitoring functions, system configuration and management tools and interfaces, network traffic engineering functions, traffic management policy tools, and so on.

The control engine runs the routing protocols that maintain the routing tables from which the Layer 3 forwarding table used by the Layer 3 forwarding engine in the router or switch/router is generated (Figure 1.4). The control engine also runs other protocols such as PIM (Protocol Independent Multicast), IGMP (Internet Group Management Protocol), ICMP (Internet Control Message Protocol), ARP (Address Resolution Protocol), BFD (Bidirectional Forwarding Detection), and LACP (Link Aggregation Control Protocol), and is responsible for maintaining sessions and exchanging protocol information with other routers or network devices.

Figure 1.4 Routing protocols and routing table in the control engine.

The control engine is typically the module that provides the control and monitoring functions for the entire router or switch/router, including controlling system power supplies, monitoring and controlling system temperature (via cooling fans), and monitoring system status (power supplies, cooling fans, line cards, ports and interfaces, primary/secondary route processors, primary/secondary forwarding engines, etc.). The routing engine also controls the network management interfaces of the router or switch/router, controls some chassis components (e.g., the hot-swap or OIR (online insertion and removal) status of components on the backplane), and provides the interfaces for system management and user access to the device.

In high-performance platforms, more than one routing engine can be supported in the router or switch/router (Figure 1.5). If two routing engines are installed, one typically functions as the primary (or master) and the other as the secondary (or backup). In this redundant configuration, if the primary routing engine fails or is removed (for maintenance or repairs) and the secondary routing engine is configured appropriately, the secondary takes over as the master routing engine.

Figure 1.5 Multilayer switch with primary and secondary routing engines.
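The primary/secondary takeover rule can be expressed as a small state check. This is a hypothetical sketch, not any vendor's redundancy protocol: the `healthy` flag stands for "present and not failed", `failover_enabled` stands in for the secondary being "configured appropriately", and the slot names `RE0`/`RE1` are made up.

```python
from dataclasses import dataclass


@dataclass
class RoutingEngine:
    slot: str
    healthy: bool = True           # False if the engine failed or was removed
    failover_enabled: bool = True  # stand-in for "configured appropriately"


class RedundantControlPlane:
    def __init__(self, primary: RoutingEngine, secondary: RoutingEngine):
        self.master = primary
        self.backup = secondary

    def check(self) -> str:
        """Promote the backup if the master is down and the backup may take over."""
        if (not self.master.healthy
                and self.backup.healthy
                and self.backup.failover_enabled):
            self.master, self.backup = self.backup, self.master
        return self.master.slot


cp = RedundantControlPlane(RoutingEngine("RE0"), RoutingEngine("RE1"))
print(cp.check())          # RE0
cp.master.healthy = False  # primary fails or is pulled for maintenance
print(cp.check())          # RE1
```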

Typically, a router or switch/router supports a set of management ports (e.g., serial port, 10/100 Mb/s Ethernet ports). These ports, generally located on the routing engine module, connect the routing engine to one or more external devices (e.g., terminal, computer) on which a network administrator can issue commands from a command-line interface (CLI) to configure and manage the device. The routing engine could support one or more USB ports that can accept a USB memory device that allows for the loading of the operating system and other system software.

In our discussion in this book, we consider the management plane as part of the control plane – not a separate plane in its own right (Figure 1.6). The management plane is considered a subplane that supports the functions used to manage the router or switch/router via some connections to external management devices (a terminal or computer). Examples of protocols supported in the management plane include Simple Network Management Protocol (SNMP), Telnet, File Transfer Protocol (FTP), Secure FTP, and Secure Shell (SSH). These management protocols allow configuring, managing, and monitoring the device as well as CLI access to the device.

Figure 1.6 Control plane versus management plane.

A console port (which is an EIA/TIA-232 asynchronous serial port) could allow the connection of the routing engine to a device with a serial interface (terminal, modem, computer, etc.) through a serial cable with an RJ-45 connector (Figure 1.7). An AUX (or auxiliary) port could allow the connection of the routing engine (through a serial cable with an RJ-45 connector) to a computer, modem, or other auxiliary device. Furthermore, a 10/100 Mb/s Ethernet interface could connect the routing engine to a management LAN (or a device that has an Ethernet connection) for out-of-band management of the router or switch/router.

Figure 1.7 Management ports.

The routing table (also called the Routing Information Base (RIB)) maintains information about the network topology around the router or switch/router and is constructed and maintained from information obtained from the dynamic routing protocols and from static routes configured by the network administrator. The routing table contains a list of routes to particular IP network destinations (or IP address prefixes). Each route is associated with a metric that is a “distance” measure used by a routing protocol in performing the best path computation to a destination.

The best path to a destination is determined by a routing protocol based on the metric (a quantitative value) it uses to “measure” the distance to a destination. Different routing protocols use different metrics to measure the distance to a given destination. The best path to a destination selected by a routing protocol is then the path with the lowest metric. Usually, the routing protocol evaluates all the paths available to the same destination and selects the shortest or optimum path to reach that network. Whenever multiple paths from the router to the same destination exist, each path uses a different output (egress) interface on the router to reach that destination.

Typically, routing protocols have their own metrics and rules that they use to construct and update routing tables. The routing protocol generates a metric for each path through the network where the metrics may be based on a single characteristic of a path (e.g., RIP uses a hop count) or several characteristics of a path (e.g., EIGRP uses bandwidth, traffic load, delay, reliability). Some routing protocols may base route selection on multiple metrics, where they combine them into a single representative metric.
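As a concrete instance of the lowest-metric rule, the sketch below selects among candidate paths using a RIP-style hop count, where RIP treats a hop count of 16 as unreachable. Ties are kept rather than broken arbitrarily, because equal-cost paths can be used for load sharing. `Path` and `best_paths` are hypothetical names, and the next-hop addresses and interface names are made up.

```python
from dataclasses import dataclass

RIP_INFINITY = 16  # a RIP hop count of 16 means "unreachable"


@dataclass(frozen=True)
class Path:
    next_hop: str
    egress_interface: str
    hop_count: int  # RIP-style metric: number of routers to traverse


def best_paths(paths):
    """Return every reachable path that has the lowest metric (ties kept)."""
    reachable = [p for p in paths if p.hop_count < RIP_INFINITY]
    if not reachable:
        return []
    lowest = min(p.hop_count for p in reachable)
    return [p for p in reachable if p.hop_count == lowest]


candidates = [
    Path("10.0.1.2", "eth1", 3),
    Path("10.0.2.2", "eth2", 2),
    Path("10.0.3.2", "eth3", 16),  # unreachable
]
print(best_paths(candidates))
# [Path(next_hop='10.0.2.2', egress_interface='eth2', hop_count=2)]
```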

If multiple routing protocols (e.g., RIP, EIGRP, OSPF) provide a router or switch/router with different routes to the same destination, the administrative distance (AD) is used to select the preferred (or more trusted) route to that destination (Figure 1.8). The preference is given to the route that has the lowest administrative distance. The administrative distance assigned to a route generated by a particular routing protocol is a numerical value used to rank the multiple paths leading to the same destination. It is a mechanism for a router to rate the trustworthiness of a routing information source (including static routes). The administrative distance represents the trustworthiness the router places on the route. The lower the administrative distance, the more trustworthy the routing information source.

Figure 1.8 Use of administrative distance in route selection for the routing table.

For example, considering OSPF and RIP, routes supplied by OSPF have a lower administrative distance than routes supplied by RIP. It is not unusual for a router or switch/router to be configured with multiple routing protocols in addition to static routes. In this case, the routing table will have more than one routing information source for the same destination. For example, if the router runs both RIP and EIGRP, the two routing protocols may compute different best paths to the same destination: RIP determines its path based on hop count, while EIGRP's best path is based on its composite metric. The administrative distance is used by the router to determine which route to install into its routing table. A static route takes precedence over an EIGRP-discovered route, which in turn takes precedence over a RIP-discovered route.

As another example, if OSPF computes a best path to a specific destination, the router first checks if an entry for that destination exists in the routing table. If no entry exists, the OSPF discovered route is installed into the routing table. If a route already exists, the router decides whether to install the OSPF discovered route based on the administrative distance of the OSPF generated route and the administrative distance of the existing route in the routing table. If the OSPF discovered route has the lowest administrative distance to the destination (compared to the route in the routing table), it is installed in the routing table. If the OSPF discovered route is not the route with the best administrative distance, it is rejected.
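The install-or-reject logic for competing routing information sources can be sketched as follows. The administrative distance values are the commonly cited Cisco IOS defaults (connected 0, static 1, internal EIGRP 90, OSPF 110, RIP 120); other vendors use different defaults, so treat them as an assumption. The sketch compares sources only by administrative distance and keeps the existing route on a tie; it ignores how a single protocol compares its own metrics. `offer_route` is a hypothetical name.

```python
# Commonly cited Cisco IOS default administrative distances (vendor-specific).
DEFAULT_AD = {"connected": 0, "static": 1, "eigrp": 90, "ospf": 110, "rip": 120}


def offer_route(table: dict, prefix: str, source: str, next_hop: str) -> bool:
    """Install a route unless an existing entry has an equal or lower AD."""
    current = table.get(prefix)
    if current is None or DEFAULT_AD[source] < DEFAULT_AD[current[0]]:
        table[prefix] = (source, next_hop)
        return True
    return False  # rejected: the installed route comes from a more trusted source


rib = {}
offer_route(rib, "192.0.2.0/24", "rip", "10.0.0.1")     # True: no entry yet
offer_route(rib, "192.0.2.0/24", "eigrp", "10.0.0.2")   # True: 90 < 120
offer_route(rib, "192.0.2.0/24", "rip", "10.0.0.1")     # False: 120 > 90
offer_route(rib, "192.0.2.0/24", "static", "10.0.0.3")  # True: 1 < 90
print(rib["192.0.2.0/24"])  # ('static', '10.0.0.3')
```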

A routing protocol may also identify/discover multiple paths (a bundle of routes not just one) to a particular destination as the best path (Figure 1.9). This happens when the routing table has two or more paths with identical metrics to the same destination address. When the router has discovered two or more paths to a particular destination with equal cost metrics, the router can take advantage of this to forward packets to that destination over the multiple paths equally.

Figure 1.9 Equal cost multipath routing.

In the above situation, the routing table may support multiple entries per destination, and the router installs up to the maximum number of paths allowed per destination address. The routing table will contain the single destination address but will associate it with multiple exit router interfaces, one interface entry for each equal-cost path. The router will then forward packets to that destination address across the multiple exit interfaces listed in the routing table. This feature is known as equal-cost multipath (ECMP) and can be employed in a router to provide load balancing or load sharing across the multiple paths.
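One common way to spread traffic over equal-cost next hops is to hash each packet's flow identifier (here the IPv4 5-tuple) and use the result to pick a next hop. This keeps all packets of one flow on one path and so avoids reordering, whereas per-packet round-robin spreads load more evenly but can reorder packets. The sketch uses CRC-32 as the hash purely for illustration; real forwarding hardware uses its own vendor-specific hash functions. The function name and the next-hop labels are hypothetical.

```python
import zlib


def ecmp_next_hop(flow: tuple, next_hops: list) -> str:
    """Map a flow (src IP, dst IP, protocol, src port, dst port) to one next hop.

    The same flow always hashes to the same next hop, so its packets stay in order.
    """
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


paths = ["10.0.1.1 via eth1", "10.0.2.1 via eth2"]   # two equal-cost exits
flow = ("192.0.2.10", "198.51.100.7", 6, 49152, 443)  # a TCP flow
print(ecmp_next_hop(flow, paths))
```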

A router or switch/router may also support an important feature called virtual routing and forwarding (VRF), a technology that allows the router or switch/router to concurrently support multiple independent virtual routing and forwarding table instances (Figure 1.10). VRF can be used to create logical segmentation between different networks on the same routing platform. The routing instances are independent, which allows overlapping IP addresses to be used, even on a single interface (i.e., using subinterfaces), without conflict.

Figure 1.10 Virtual routing and forwarding (VRF).

VRF allows, for example, a network path between two devices to be segmented into several independent logical paths without having to use multiple devices for each path. With VRF, the traffic paths through the routing platform are isolated, leading to increased network security, which can even eliminate the need for encryption and authentication for network traffic.

A service provider may use VRF to create separate virtual private networks (VPNs) on a single platform for its customers. For this reason, VRF is sometimes referred to as VPN routing and forwarding. Similar to VLAN-based networking where IEEE 802.1Q trunks can be used to extend a VLAN between switching domains, VRF-based networking can use IEEE 802.1Q trunks, Multiprotocol Label Switching (MPLS) tags, or Generic Routing Encapsulation (GRE) tunnels to extend and connect a path of VRFs together.

While VRF has some similarities to a logical router, which may support many routing tables, a (single) VRF instance supports only one routing table. Furthermore, VRF uses a forwarding table that specifies the next hop node for each packet forwarded, a list of nodes along a path that may forward the packet, and routing protocols and a set of forwarding rules that specify how to forward the packet. These requirements isolate traffic and prevent packets in a specific VRF from being forwarded outside its VRF path, and also prevent traffic from outside (the VRF path) from entering the specific VRF path.
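The isolation property of VRF can be illustrated by keeping one routing table per VRF instance and never consulting another instance's table during a lookup, so the same prefix can appear in several VRFs with different next hops. This is a sketch with hypothetical VRF names and next hops; the lookup is a simple longest-prefix match over a Python dictionary, not the way a hardware forwarding engine stores VRF tables.

```python
import ipaddress


class VrfRouter:
    """One independent routing table per VRF instance (hypothetical model)."""

    def __init__(self):
        self.tables = {}  # VRF name -> {network: next hop}

    def add_route(self, vrf: str, prefix: str, next_hop: str) -> None:
        self.tables.setdefault(vrf, {})[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, vrf: str, dst: str):
        """Longest-prefix match restricted to the given VRF's own table."""
        addr = ipaddress.ip_address(dst)
        table = self.tables.get(vrf, {})
        matches = [net for net in table if addr in net]
        if not matches:
            return None  # no fallback to other VRFs: traffic stays isolated
        return table[max(matches, key=lambda net: net.prefixlen)]


router = VrfRouter()
router.add_route("cust-a", "10.0.0.0/8", "192.0.2.1")     # customer A's 10/8
router.add_route("cust-b", "10.0.0.0/8", "198.51.100.1")  # overlapping 10/8 for B
print(router.lookup("cust-a", "10.1.2.3"))  # 192.0.2.1
print(router.lookup("cust-b", "10.1.2.3"))  # 198.51.100.1
```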

The routing engine also maintains an adjacency table, which in its simplest form may be an ARP cache. The adjacency table (also known as the Adjacency Information Base (AIB)) contains the MAC addresses and egress interfaces of all directly connected next hops and directly connected destinations (Figure 1.11). This table is populated with MAC addresses discovered through ARP, statically configured MAC addresses, and other Layer 2 protocol tables (e.g., Frame Relay and ATM map tables). The network administrator can explicitly configure MAC address information in the adjacency table, for example, for directly attached data servers.

Figure 1.11 Layer 3 forwarding table and adjacency table.

The adjacency table is built largely from information obtained through ARP, which IP hosts use to dynamically learn the MAC address of other IP hosts on the same Layer 2 broadcast domain (VLAN or subnet) when the target host's IP address is known. For example, an IP host that needs to know the MAC address of another IP host connected to the same VLAN can broadcast an ARP request and then wait for an ARP reply from the target IP host. The ARP reply carries the required MAC address together with the associated IP address. The sending host can then use this MAC address as the destination MAC address of the Ethernet frames it sends to the target IP host on the same VLAN.
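
A minimal Python sketch of an adjacency table populated from ARP replies and from static configuration follows. The class and field names are hypothetical, and the rule that a statically configured entry is never overwritten by a dynamically learned one is an assumption made for illustration (it is a common policy, but platforms differ).

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Adjacency:
    mac: str      # MAC address of the next hop or directly connected host
    egress: str   # egress Layer 3 interface
    source: str   # "arp" or "static"

class AdjacencyTable:
    def __init__(self) -> None:
        self.entries: Dict[str, Adjacency] = {}

    def add_static(self, ip: str, mac: str, egress: str) -> None:
        # Administrator-configured entry, e.g., for a directly attached server.
        self.entries[ip] = Adjacency(mac.lower(), egress, "static")

    def learn_from_arp_reply(self, sender_ip: str, sender_mac: str, ingress: str) -> None:
        # An ARP reply carries the sender's IP/MAC pair; the interface it arrived on
        # becomes the egress interface for traffic to that neighbor.
        current = self.entries.get(sender_ip)
        if current is not None and current.source == "static":
            return  # assumption: static entries take precedence over ARP learning
        self.entries[sender_ip] = Adjacency(sender_mac.lower(), ingress, "arp")

    def resolve(self, ip: str) -> Optional[Adjacency]:
        return self.entries.get(ip)

table = AdjacencyTable()
table.add_static("10.0.0.10", "00:11:22:33:44:55", "vlan10")
table.learn_from_arp_reply("10.0.0.1", "AA:BB:CC:00:00:01", "vlan10")
table.learn_from_arp_reply("10.0.0.10", "de:ad:be:ef:00:01", "vlan10")  # ignored: static entry exists
```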

1.1.3 Forwarding Engine

The data or forwarding plane operations (i.e., the actual forwarding of data) in the router or switch/router are performed by the forwarding engine, which can consist of software and/or hardware (ASICs) processing elements. The Layer 3 forwarding engine performs route lookup for each arriving IP packet using a Layer 3 forwarding table. In some implementations, the adjacency table is not a separate module but integrated in the Layer 3 forwarding table to allow for one lookup for all next hop forwarding information. The forwarding engine performs filtering and forwarding of incoming packets, directing outbound packets to the appropriate egress interface or interfaces (for multicast traffic) for transmission to the external network.

As already discussed, routers and switch/routers determine the best paths to network destinations by sharing information about network topology and conditions with neighboring routers. A router or switch/router communicates with its neighboring Layer 3 peers to build a comprehensive routing database (the routing table) that enables the forwarding engine to forward packets across optimal paths through the network. The information in the (very extensive) routing table is distilled into a much smaller Layer 3 forwarding table that is optimized for IP data plane operations.

Not all the information in the routing table is directly used or is relevant to data plane operations. The Layer 3 forwarding table (also called Forwarding Information Base (FIB)) maintains a mirror image of all the most relevant forwarding information contained in the routing table (next hop IP address, egress port(s), next hop MAC address (adjacency information)). When routing or topology changes occur in the network, the IP routing table is updated, and those changes are reflected in the forwarding table.
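
The distillation of the routing table into the forwarding table can be sketched as follows, assuming that each routing table entry carries an administrative distance and a metric, and that the best route for a prefix is the one with the lowest (distance, metric) pair. The distances 110 for OSPF and 120 for RIP are Cisco-style defaults used here only as example data; the data structures and names are hypothetical. Only the fields the data plane needs (next hop IP address, egress port, and next hop MAC address from the adjacency table) are copied into the FIB.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class RibRoute:
    prefix: str
    protocol: str
    admin_distance: int   # assumed preference value: lower is preferred
    metric: int
    next_hop: str
    egress: str

def build_fib(rib: List[RibRoute], adjacency: Dict[str, str]) -> Dict[str, Tuple[str, str, str]]:
    """Distill the RIB into a FIB: prefix -> (next hop IP, egress port, next hop MAC)."""
    best: Dict[str, RibRoute] = {}
    for route in rib:
        current = best.get(route.prefix)
        if current is None or (route.admin_distance, route.metric) < (current.admin_distance, current.metric):
            best[route.prefix] = route
    fib: Dict[str, Tuple[str, str, str]] = {}
    for prefix, route in best.items():
        mac = adjacency.get(route.next_hop)
        if mac is None:
            continue  # in this sketch, routes with an unresolved next hop are left out
        fib[prefix] = (route.next_hop, route.egress, mac)
    return fib

rib = [
    RibRoute("10.2.0.0/16", "ospf", 110, 20, "10.0.0.2", "eth1"),
    RibRoute("10.2.0.0/16", "rip", 120, 2, "10.0.0.3", "eth2"),
    RibRoute("10.3.0.0/16", "static", 1, 0, "10.0.0.4", "eth3"),
]
adjacency = {"10.0.0.2": "00:00:5e:00:53:02", "10.0.0.3": "00:00:5e:00:53:03"}
fib = build_fib(rib, adjacency)
```

Real implementations handle unresolved next hops in platform-specific ways (for example, by triggering ARP resolution), so dropping them here is purely a simplification.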

The forwarding engine processes packets to obtain next hop information, applies quality of service (QoS) and security filters, and implements traffic management policies (policing, packet discard, scheduling, etc.), routing policies (e.g., ECMP, VRFs), and other functions required to forward packets to the next hop along the route to their final destinations. In the switch/router, separate forwarding engines could provide the Layer 2 and Layer 3 packet forwarding functions. The forwarding engine may need to implement some special data plane operations that affect packet forwarding such as QoS and access control lists (ACLs).

Each forwarding engine can consist of the following components:

Software and/or hardware processing module or engine, which provides the route (best path) lookup function in a forwarding table.

Switch fabric interface modules, which use the results of the forwarding table lookup to guide and manage the transfer of packet data units across the switch fabric to the outbound interface(s). The switch interface module will be responsible for prepending internal routing tags to processed packets. The internal routing tag would typically carry information about the destination port, priority queuing, packet address rewrite, packet priority rewrite, and so on.

Layer 2/Layer 3 processing modules, which perform Layer 2 and Layer 3 packet decapsulation and encapsulation and manage the segmentation and reassembly of packets within the router or switch/router.

Queuing and buffer memory processing modules, which manage the buffering of (possibly, segmented) data units in the memory as well as any priority queuing requirements.
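
The text does not specify the format of the internal routing tag, so the following Python sketch uses an entirely hypothetical 4-byte layout (destination port, priority queue, and rewrite flags) to show how a switch fabric interface module might prepend such a tag at ingress and strip it at egress. The constants and function names are illustrative only.

```python
import struct
from typing import Tuple

# Hypothetical 4-byte internal tag in network byte order:
# destination port (16 bits), priority queue (8 bits), flags (8 bits).
TAG_FORMAT = "!HBB"
TAG_LEN = struct.calcsize(TAG_FORMAT)
FLAG_REWRITE_ADDR = 0x01   # egress should rewrite L2 addresses
FLAG_REWRITE_PRIO = 0x02   # egress should rewrite packet priority

def prepend_tag(packet: bytes, dest_port: int, queue: int, flags: int) -> bytes:
    """Prepend the internal tag before the packet enters the switch fabric."""
    return struct.pack(TAG_FORMAT, dest_port, queue, flags) + packet

def strip_tag(tagged: bytes) -> Tuple[Tuple[int, int, int], bytes]:
    """At the egress side, remove the tag and return (tag fields, original packet)."""
    fields = struct.unpack(TAG_FORMAT, tagged[:TAG_LEN])
    return fields, tagged[TAG_LEN:]

tagged = prepend_tag(b"\x45\x00payload", dest_port=17, queue=3, flags=FLAG_REWRITE_ADDR)
fields, original = strip_tag(tagged)
```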

As discussed above, the forwarding table is constructed from the routing table and the ARP cache maintained by the routing engine. When an IP packet is received by the forwarding engine, a lookup is performed in the forwarding table (and adjacency table) for the next hop destination address and the appropriate outbound port, and the packet is sent to the outbound port. The Layer 3 forwarding information and ARP information can be implemented logically as one table or maintained as separate tables that can be jointly consulted when forwarding packets.

The router also decrements the IP TTL field and recomputes the IP header checksum. It rewrites the destination MAC address of the packet with the next hop router's MAC address and the source MAC address with the MAC address of the outgoing Layer 3 interface. Finally, the router recomputes the Ethernet frame check sequence (FCS) and sends the packet out the outbound interface on its way to the next hop.
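
The per-packet rewrite steps above can be sketched in Python as follows. This is a minimal illustration, assuming an untagged Ethernet II frame carrying IPv4 with no FCS attached on input; `route_frame` and `ipv4_checksum` are hypothetical names. The checksum is fully recomputed using the one's-complement sum described in RFC 1071 (routers may instead update it incrementally), and the FCS is the IEEE 802.3 CRC-32, which matches Python's `zlib.crc32` and is transmitted least-significant byte first.

```python
import struct
import zlib

def ipv4_checksum(header: bytes) -> int:
    """One's-complement of the one's-complement sum of 16-bit words (RFC 1071)."""
    if len(header) % 2:
        header += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def route_frame(frame: bytes, next_hop_mac: bytes, egress_mac: bytes) -> bytes:
    """Rewrite an untagged Ethernet II/IPv4 frame (without FCS) for the next hop."""
    buf = bytearray(frame)
    ip = 14                                   # IPv4 header follows the 14-byte Ethernet header
    ihl = (buf[ip] & 0x0F) * 4                # IPv4 header length in bytes
    if buf[ip + 8] <= 1:
        # TTL would expire: an exception packet for the control plane (ICMP Time Exceeded).
        raise ValueError("TTL expired")
    buf[ip + 8] -= 1                          # decrement TTL
    buf[ip + 10:ip + 12] = b"\x00\x00"        # zero the checksum field before recomputing
    buf[ip + 10:ip + 12] = struct.pack("!H", ipv4_checksum(bytes(buf[ip:ip + ihl])))
    buf[0:6] = next_hop_mac                   # destination MAC = next hop router
    buf[6:12] = egress_mac                    # source MAC = outgoing Layer 3 interface
    fcs = struct.pack("<I", zlib.crc32(bytes(buf)))  # Ethernet CRC-32, LSB first
    return bytes(buf) + fcs
```

A header with a correct checksum sums to 0xFFFF, so `ipv4_checksum` returns 0 when run over a valid header, which is a convenient way to verify the rewrite.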

1.2 Evolution of Multilayer Switch Architectures

The network devices that drive service provider networks, enterprise networks, residential networks, and the Internet have evolved considerably in architecture over the years and are still evolving to keep up with new service requirements and user traffic. The continuous demand for more network bandwidth, together with the introduction of a new generation of services and applications, is placing tremendous performance demands on networks.

Streaming and real-time audio and video, videoconferencing, online gaming, real-time business transactions, telecommuting, increasingly sophisticated home devices and appliances, and the ubiquity of bandwidth-hungry mobile devices are some of the many applications and services driving the need for scalability, reliability, high bandwidth, and improved quality of service in networks. As a result, network operators, including residential network owners, are demanding the following features and capabilities from their networks:

The ability to cost-effectively scale their networks with minimal downtime and impact on network operations as traffic grows.

The ability to harness the higher link bandwidths provided by the latest wireless and fiber-optic technologies to allow for transporting large volumes of traffic.

The ability to implement mechanisms for minimizing data loss, data transfer latency, and latency variations (sometimes referred to as network jitter), thus enabling the improved support of delay-sensitive applications. This includes the ability to create differentiated services by prioritizing traffic based on application and user requirements.

The pressures of these demands have created the need for sophisticated new equipment designs for the network from the access to the core. New switch, router, and switch/router designs have emerged and continue to do so to meet the corresponding technical and performance challenges being faced in today's networks. These designs also give operators of enterprise networks and service provider networks the ability to quickly improve and scale their networks to bring new services to market.

The first generation of routers and switch/routers was designed to have the control plane and the packet forwarding function share centralized processing resources, resulting in poor device performance and, consequently, poor network performance as network traffic grew. In these designs, all processing functions (regardless of the offered load) must contend for a single, centralized, and finite pool of processing resources.

To handle growing network traffic loads while harnessing high-speed wireless and ultrafast fiber-optic interfaces (10, 40, and 100 Gb/s speeds), typical high-end routers and switch/routers now support distributed forwarding architectures. These designs provide high forwarding capacities, largely by distributing the packet forwarding functions across modular line cards in the system.

These distributed architectures enable operators to scale network capacity even at the platform level, that is, within a single router or switch/router chassis, as traffic loads increase, without a corresponding drain on central processing resources. They avoid the packet forwarding throughput degradation and bottlenecks normally associated with centralized processor forwarding architectures.

In the next chapter and throughout the rest of the book, we describe the various bus- and shared-memory-based forwarding architectures, starting with architectures that use centralized forwarding and moving on to those with distributed forwarding. We discuss the inner workings of these architectures as well as their performance limitations.

1.2.1 Centralized Forwarding versus Distributed Forwarding Architectures

From a packet forwarding perspective, we can broadly categorize switch, router, and switch/router architectures as centralized or distributed. Some architectures fall in between these two, but focusing on these two here helps to shed light on how the designs have evolved into what they are today.

In a centralized forwarding architecture, a processor is equipped with a forwarding function (engine) that allows it to make all the packet forwarding decisions in the system. In this architecture, the routing engine and forwarding engine could both run on one processor, or the routing engine could be implemented on a separate centralized processor. In the simplest form of the centralized architecture, a single general-purpose CPU typically manages all control and data plane operations.

In such a centralized forwarding architecture, when a packet is received on an ingress interface or port, it is forwarded to the centralized forwarding engine. The forwarding engine examines the packet's destination address to determine the egress port on which the packet should be sent out. The forwarding engine completes all its processing and forwards the packet to the egress port to be sent to the next hop.

In the centralized architecture, there is little distinction between a port and a line card, as both have very limited packet processing capabilities from a packet forwarding perspective. A line card can have more than one port, in which case the line card simply provides breakouts for the multiple ports on the card. Any processing and memory at a port or line card serves only to receive data from the network and transfer it to the centralized processor, and to take data from the centralized processor and transmit it to the network.

In a distributed forwarding architecture, the line cards are equipped with forwarding engines that allow them to make packet forwarding decisions locally without consulting the route processor or another forwarding engine in the system. The route processor is responsible only for generating the master forwarding table, which is then distributed to the line cards. It is also responsible for synchronizing the distributed forwarding tables in the line cards with the master forwarding table in the route processor whenever changes occur in the master table. These updates are triggered by route and network topology changes captured by the routing protocols.

In a distributed forwarding architecture, when a packet is received on an ingress line card, it is sent to the local forwarding engine on the card. The local forwarding engine performs a forwarding table lookup to determine if the outbound interface is local or is on another line card in the system. If the interface is local, it forwards the packet out that local interface. If the outbound interface is located on a different line card, the packet is sent across the switch fabric directly to the egress line card, bypassing the route processor.

As will be discussed in the next chapter, some packets needing special handling (special or exception packets) will still have to be forwarded to the route processor by the line card. By offloading the packet forwarding operations to the line cards, packet forwarding performance is greatly improved in the distributed forwarding architectures.
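
The per-line-card decision described above can be sketched in Python as follows. The `LineCard` and `Packet` classes are hypothetical, the FIB is simplified to an exact-match dictionary mapping a destination to a (line card, port) pair, and the exception rules (packets addressed to the router itself, or whose TTL would expire) are illustrative examples rather than a complete list.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

@dataclass
class Packet:
    dst: str
    ttl: int

class LineCard:
    def __init__(self, card_id: int, fib: Dict[str, Tuple[int, int]], router_addrs: Set[str]) -> None:
        self.card_id = card_id
        self.fib = fib                    # local copy of the FIB distributed by the route processor
        self.router_addrs = router_addrs  # addresses owned by the router itself

    def forward(self, pkt: Packet) -> str:
        # Exception packets are punted to the route processor.
        if pkt.dst in self.router_addrs or pkt.ttl <= 1:
            return "punt to route processor"
        entry = self.fib.get(pkt.dst)     # exact match for brevity; real FIBs use longest-prefix match
        if entry is None:
            return "drop (no route)"
        card, port = entry
        if card == self.card_id:
            return f"transmit on local port {port}"
        # Remote egress: cross the switch fabric directly, bypassing the route processor.
        return f"send across fabric to line card {card}, port {port}"

fib = {"10.1.1.1": (1, 3), "10.2.2.2": (4, 0)}
lc1 = LineCard(1, fib, {"192.0.2.254"})
```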

It is important to note that in the distributed architecture, routing protocols and most other control and management protocols always run on a routing engine, which is typically located in one centralized (control) processor. Some distributed architectures offload the processing of other control plane protocols such as ARP, BFD, and ICMP to a line card CPU. This allows each line card to handle ARP, BFD, and ICMP messages locally without having to rely on the centralized route processor.

Current practice in distributed router and switch/router design today is to make route processing a centralized function (which also has the added benefit of supporting route processor redundancy, if required (Figure 1.5)). The control plane requires a lot of complex operations and algorithms (routing protocols, control and management protocols, system configuration and management, etc.) and so having a single place where all route processing and routing table information maintenance are done for each platform significantly reduces system complexity. Furthermore, the control plane operations tend to have a system-wide impact and also change very slowly compared to the data plane operations.

Even though the control plane is centralized, there is no need to scale route processing resources in direct proportion to the speed of the line cards being supported or added to the system in order to maintain system throughput. This is because, unlike the forwarding engine (whether centralized or distributed), the route processor performs no functions on a packet-by-packet basis. Rather, it communicates with other Layer 3 network entities and updates routing tables, and its operations can be decoupled from the packet-by-packet forwarding process.

Packet forwarding relies only on the best-path information precomputed by the route processor. A forwarding engine performs the forwarding function by consulting the forwarding table, which summarizes the main forwarding information in the routing table created by all the routing protocols, as described in Figure 1.8. Based on the destination address in the IP packet header, the forwarding engine consults the forwarding table to select the appropriate output interface and forwards the packet to that interface or interfaces (for multicast traffic).
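
The forwarding table lookup is a longest-prefix match on the destination address. A minimal sketch using a unibit binary trie follows; this textbook structure is chosen for clarity, and high-performance forwarding engines generally use faster structures such as multibit tries or TCAMs. The class names and routes are hypothetical.

```python
import ipaddress
from typing import Dict, Optional

class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.next_hop: Optional[str] = None

class Fib:
    """Unibit binary trie keyed on IPv4 destination address bits."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):            # one trie level per prefix bit
            bit = (bits >> (31 - i)) & 1
            node = node.children.setdefault(bit, TrieNode())
        node.next_hop = next_hop

    def lookup(self, dst: str) -> Optional[str]:
        """Walk the trie, remembering the last (i.e., longest) prefix that matched."""
        addr = int(ipaddress.IPv4Address(dst))
        node, best = self.root, self.root.next_hop
        for i in range(32):
            node = node.children.get((addr >> (31 - i)) & 1)
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best

fib = Fib()
fib.insert("0.0.0.0/0", "10.0.0.1")       # default route
fib.insert("172.16.0.0/12", "10.0.0.2")
fib.insert("172.16.5.0/24", "10.0.0.3")
```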

2 Understanding Shared-Bus and Shared-Memory Switch Fabrics

2.1 Introduction

One of the major components that define the performance and capabilities of a switch, switch/router, router, and almost any network device is the switch fabric. The switch fabric (whether shared or distributed) in a network device strongly influences the following:

The scalability of the device and the network

The nonblocking characteristics of the network

The throughput offered to end users

The quality of service (QoS) offered to users

The type of buffering employed in the switch fabric and its location also play a major role in the aforementioned issues. A switch fabric, in the context of a network device, refers to a structure used to interconnect multiple components or modules in a system so that they can exchange or transfer information, in some cases simultaneously.

Packets are transferred across the switch fabric from input ports to output ports, and sometimes, held in small temporary “queues” within the fabric when contention with other traffic prevents a packet from being delivered immediately to its destination. The switch fabric in a switch/router or router is responsible for transferring packets between the various functional modules (network interface cards, memory blocks, route/control processors, forwarding engines, etc.). In particular, it transports user packets transiting the device from the input modules to the appropriate output modules. Figure 2.1 illustrates the generic architecture of a switch fabric.

Figure 2.1 Generic switch fabric with input/output ports.

There exist many types of standard and user-defined switch fabric architectures, and deciding on what type of architecture to use for a particular network device usually depends on where the device will be deployed in the network and the amount and type of traffic it will be required to carry. In practice, switch fabric implementations are often a combination of basic or standard well-known architectures. Switch fabrics can generally be implemented as

time-division switch fabrics

- shared media

- shared memory

space-division switch fabrics

- crossbar

- multistage constructions

Time-division switch fabrics in turn can be implemented as

shared media

- bus architectures

- ring architectures

shared memory

The switch fabric is one of the most critical components in a high-performance network device and plays a major role in defining the switching and forwarding characteristics of the system. Under heavy network traffic load, and depending on the design, the internal switch fabric paths/channels can easily become the bottleneck, thereby limiting the overall throughput of a switch/router or router operating at the access layer or in the core (backbone) of a network.

The design of the switch fabric is often complicated by other requirements such as multicasting and broadcasting, scalability, fault tolerance, and preservation of service guarantees for end-user applications (e.g., data loss, latency, and latency variation requirements). To preserve end-user latency requirements, for instance, a switch fabric may use a combination of fabric speed-up and intelligent scheduling mechanisms to guarantee predictable delays to packets sent over the fabric.

Switch/router and router implementations generally employ variations or various combinations of the basic fabric architectures: shared bus, shared memory, distributed output buffered, and crossbar switch. Most of the multistage switch fabric architectures are combinations of these basic architectures.

Switch fabric design is a very well-studied area, especially in the context of asynchronous transfer mode (ATM) switches [AHMA89,TOBA90]. In this chapter, we discuss the most common switch fabrics used in switch/router and router architectures. There are many different methods and trade-offs involved in implementing a switch fabric and its associated queuing mechanisms, and each approach has very different implications for the overall design. This chapter is not intended to be a review of all possible approaches, but presents only examples of the most common methods that are used.

2.2 Switch Fabric Design Fundamentals

The primary function of the shared switch fabric is to transfer data between the various modules in the device. To perform this primary function, the other functions described in Figure 2.2 are required. Switch fabric functions can be broadly separated into control path and data path functionality as shown in Figure 2.2. The control path functions include data path scheduling (e.g., node interconnectivity, memory allocation), control parameter setting for the data path (e.g., class of service, time of service), and flow and congestion control (e.g., flow control signals, backpressure mechanisms, packet discard). The data path functions include input to output data transfer and buffering. Buffering is an essential element for the proper operation of any switch fabric and is needed to absorb traffic when there are any mismatches between the input line rates and the output line service rates.

Figure 2.2 Functions and partitioning of functions in a switch fabric.

In an output buffered switch, packets traversing the switch are stored in output buffers at their destination output ports. The use of multiple separate queues at each output port isolates the packet flows in different port queues from each other and reduces packet loss due to contention when an output port is oversubscribed. As a result, when port oversubscription occurs, the separate queues confine packet loss to the oversubscribed output queues only.

By using separate queues and thereby reducing delays due to contention at the output ports, output buffered switches make it possible to control packet latency through the system, which is an important requirement for supporting QoS in a network device. The shared memory switch is one particular example of output buffered switches.

In an input buffered switch, packets are buffered at the input ports as they arrive at the switch. Each input port buffer has a path into the switch fabric that runs at least at line speed. The switch fabric may or may not implement a fabric speed-up. Access to the switch fabric may be controlled by a fabric arbiter that resolves contention for access to the fabric itself and also to the output ports. This arbiter may also be required to schedule packet transfers across the fabric.

When the switch fabric runs at line speed, the memories used for the input buffering only need to run at the maximum port speed. The memory bandwidth in this case is not proportional to the number of input ports, so it is possible to implement scalable switch fabrics that can support a large number of ports with low-cost, lower speed memories.

An important issue that can severely limit the throughput of input buffered switches is head-of-line (HOL) blocking. If a simple FIFO (first-in first-out) queue is used at each input buffer of an input buffered switch, and all input ports are loaded at 100% utilization with uniformly distributed traffic, HOL blocking can reduce the overall switch throughput to about 58% of the maximum aggregate input rate [KAROLM87]. Studies have shown that HOL blocking can be eliminated by using per-destination-port buffering at each input port, known as virtual output queues (VoQs); with specially designed input scheduling algorithms, input buffered switches with VoQs can achieve 100% throughput [MCKEOW96].
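
The 58% figure comes from a saturated-input model, which can be reproduced with a short Monte Carlo sketch. The assumptions (also listed in the docstring) are fixed-size cells, every input FIFO always backlogged, destinations independent and uniform, and each output accepting one randomly chosen contending head-of-line cell per slot; the function name is hypothetical. For large N the result should approach the analytical limit of 2 − √2 ≈ 0.586 [KAROLM87], while small switches do better (about 0.75 for N = 2).

```python
import random

def hol_saturation_throughput(n_ports: int, n_slots: int, seed: int = 1) -> float:
    """Estimate the saturation throughput of an N x N input-queued switch with FIFO inputs.

    Model (assumed): fixed-size cells, one cell per input per slot, every input FIFO
    always backlogged, destinations independent and uniform over the outputs.
    """
    rng = random.Random(seed)
    # Under saturation, only the destination of each head-of-line (HOL) cell matters.
    hol_dest = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group inputs by the output their HOL cell is contending for.
        contenders = {}
        for inp, out in enumerate(hol_dest):
            contenders.setdefault(out, []).append(inp)
        # Each output accepts one HOL cell chosen at random; the losers stay blocked,
        # together with every cell queued behind them.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol_dest[winner] = rng.randrange(n_ports)  # the next cell moves to the head
            delivered += 1
    return delivered / (n_ports * n_slots)

for n in (2, 8, 32):
    print(n, round(hol_saturation_throughput(n, 5000), 3))
```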

It is common practice in switch/router and router design to segment variable-length packets into small, fixed-size chunks or units (cells) for transport across the switch fabric and before writing them into memory. This simplifies buffering and scheduling and makes packet transfers across the device more predictable. The main disadvantage of a buffer memory that uses fixed-size units, however, is that memory usage can be inefficient when a packet's length is not a multiple of the unit size, particularly when it is slightly larger than a multiple.

The last cell of a packet may not be completely filled with data when the packet is segmented into equal-size cells. For example, if a 64-byte cell size is used, a 65-byte packet requires two cells (the first carrying 64 bytes of actual data and the second carrying 1 byte). This means that 128 bytes of memory are used to store 65 bytes of actual data, a memory efficiency of only about 50%.
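
Generalizing the example, a packet of L bytes carried in cells with C bytes of payload each occupies ⌈L/C⌉ cells, so the memory efficiency η (symbols introduced here for illustration) is:

```latex
\eta = \frac{L}{\lceil L/C \rceil \, C},
\qquad
\eta\big|_{L=65,\;C=64} = \frac{65}{2 \times 64} = \frac{65}{128} \approx 50.8\%
```

For a fixed C, a packet that is one byte longer than a multiple of C wastes almost a full cell.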

Another disadvantage of using fixed size units is that all cells of a packet in the memory must be appropriately linked so that the cells can be reassembled to form the entire packet before further processing and transmission. The additional storage required for the information linking the cells, and the bandwidth needed to access these data can be a challenge to implement at higher speeds.
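
The sketch below segments a packet into 64-byte cells and reassembles it, assuming each cell carries a small descriptor (packet ID, sequence number, last-cell flag, and valid byte count) that links the cells of a packet. This descriptor layout and all names are hypothetical; real designs may instead keep the linking information as pointers in buffer memory. Either way, the descriptor fields are the additional state that must be stored and accessed at line rate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

CELL_SIZE = 64  # payload bytes per cell, as in the example above

@dataclass
class Cell:
    packet_id: int
    seq: int        # position of this cell within the packet
    last: bool      # marks the final cell of the packet
    valid: int      # meaningful bytes in payload; the rest is padding
    payload: bytes

def segment(packet_id: int, packet: bytes) -> List[Cell]:
    cells = []
    for seq, start in enumerate(range(0, len(packet), CELL_SIZE)):
        chunk = packet[start:start + CELL_SIZE]
        cells.append(Cell(packet_id, seq, start + CELL_SIZE >= len(packet),
                          len(chunk), chunk.ljust(CELL_SIZE, b"\x00")))
    return cells

class Reassembler:
    def __init__(self) -> None:
        self.pending: Dict[int, List[Cell]] = {}

    def receive(self, cell: Cell) -> Optional[bytes]:
        """Buffer cells per packet; return the packet once all of its cells have arrived."""
        cells = self.pending.setdefault(cell.packet_id, [])
        cells.append(cell)
        last = next((c for c in cells if c.last), None)
        if last is None or len(cells) != last.seq + 1:
            return None  # still waiting for cells
        del self.pending[cell.packet_id]
        cells.sort(key=lambda c: c.seq)
        return b"".join(c.payload[:c.valid] for c in cells)

pkt = bytes(range(65))
cells = segment(7, pkt)
reassembler = Reassembler()
results = [reassembler.receive(c) for c in reversed(cells)]  # deliver cells out of order
efficiency = len(pkt) / (len(cells) * CELL_SIZE)
```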

We describe below some of the typical design approaches for switch/router and router switch fabrics. Depending on the technology used, a large-capacity switch fabric can be realized either as a single large switch fabric that handles the rated capacity or with smaller switch fabrics used as building blocks. Using building blocks, a large-capacity switch can be realized by connecting a number of such blocks into a network of switch fabrics. Endless variations of these designs can be imagined, but the examples presented here are the most common fabrics found in switch/routers and routers.

2.3 Types of Blocking in Switch Fabrics

The following are the main types of data blocking in switch fabrics:

Internal Blocking:

Internal blocking occurs in the internal paths, channels, or links of a switch fabric.

Output Blocking:

A switch that is internally nonblocking can still block at an output of the switch fabric due to conflicting requests for the same port.

Head-of-Line Blocking:

HOL blocking can occur at input ports that have strictly FIFO queuing, where buffered packets are served in FIFO order. Packets that cannot be forwarded because of an output conflict remain buffered, which adds data transfer delay. A packet at the front of a queue that faces blocking prevents the next packet in the queue from being delivered to a noncontending output, reducing the throughput of the switch.

Resolving Internal Blocking in Shared Bus and Shared Memory Architectures:

Internal nonblocking in these architectures can be achieved by using a high-capacity switch fabric with bandwidth equal to or greater than the aggregate capacity of the connected network interfaces.

Resolving Output Blocking in Shared Bus and Shared Memory Architectures:

Switch fabrics that do not support a scheduler for allocating/dedicating timeslots to packets (at the input interfaces) can have output port conflicts, which means output conflict resolution is needed on a slot-by-slot basis.

Output conflicts can be resolved by polling each input one at a time (e.g., round-robin scheduling, token circulation). However, this approach does not scale well when the system has a large number of inputs. Also, outputs without conflicts (just served) have an unfair advantage in receiving more data (getting a new transmission timeslot).
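
Round-robin resolution for a single output port can be sketched as below. The arbiter class is hypothetical; it moves its pointer to one position past the last granted input so that every requesting input eventually gets a turn. The loop also shows the scalability concern raised above: in the worst case the arbiter examines every input in every slot.

```python
from typing import List, Optional

class RoundRobinArbiter:
    """Grants one requesting input per timeslot for a single output port."""

    def __init__(self, n_inputs: int) -> None:
        self.n = n_inputs
        self.pointer = 0  # input with the highest priority in the next slot

    def grant(self, requests: List[bool]) -> Optional[int]:
        # Poll inputs starting at the pointer; worst case visits all n inputs.
        for offset in range(self.n):
            i = (self.pointer + offset) % self.n
            if requests[i]:
                self.pointer = (i + 1) % self.n  # winner gets lowest priority next slot
                return i
        return None

arb = RoundRobinArbiter(4)
grants = [arb.grant([True, False, True, True]) for _ in range(4)]  # [0, 2, 3, 0]
```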

Resolving HOL Blocking in Shared Bus and Shared Memory Architectures:

The system can allow packets behind a HOL blocked packet to contend for outputs.

A practical solution is to implement at each input port multiple buffers called VoQs, one for each output. In this case, if the next packet cannot be transmitted due to HOL blocking, another packet from another VoQ is transmitted.
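
A minimal Python sketch of VoQs follows, assuming fixed-size cells and a simple greedy matching in which each input sends at most one cell and each output accepts at most one cell per slot. The class name is hypothetical, and the fixed scan order (input 0 first) is unfair and used only to keep the example short; practical VoQ schedulers use iterative matching algorithms with fairness mechanisms.

```python
from collections import deque
from typing import Deque, List, Tuple

class VoqSwitch:
    def __init__(self, n_ports: int) -> None:
        self.n = n_ports
        # voq[i][j] holds the cells at input i destined for output j.
        self.voq: List[List[Deque[str]]] = [[deque() for _ in range(n_ports)] for _ in range(n_ports)]

    def enqueue(self, inp: int, out: int, cell: str) -> None:
        self.voq[inp][out].append(cell)

    def schedule_slot(self) -> List[Tuple[int, int, str]]:
        """Greedy maximal matching: each input and each output used at most once per slot."""
        busy_outputs = set()
        transfers = []
        for i in range(self.n):
            for j in range(self.n):
                if self.voq[i][j] and j not in busy_outputs:
                    transfers.append((i, j, self.voq[i][j].popleft()))
                    busy_outputs.add(j)
                    break  # this input has sent its one cell for this slot
        return transfers

sw = VoqSwitch(2)
sw.enqueue(0, 0, "a")
sw.enqueue(1, 0, "c")   # contends with "a" for output 0
sw.enqueue(1, 1, "d")   # with a single FIFO, "d" would wait behind "c"
slot1 = sw.schedule_slot()
```

In the first slot, input 1's cell for output 0 loses to input 0, yet its cell for output 1 is still transmitted; with a single FIFO at input 1, that cell would have been stuck behind the blocked one.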

2.4 Emerging Requirements for High-Performance Switch Fabrics

In the early days of networking, network devices were based on shared bus switch fabric architectures. The shared bus switch fabric served its purpose well for the requirements of switches, switch/routers, routers, and other devices at that time. However, based on the demands placed on the performance of networks today, a new set of requirements has emerged for switches, switch/routers, and routers.

High Throughput:

Switch fabrics are required to sustain very high link utilization under bursty and heavy traffic load conditions. Also, with the advent of 1, 10, 40, and 100 Gb/s Ethernet, network devices now demand correspondingly higher switch fabric bandwidth.

Wire-Speed Performance:

The switch fabrics are required to deliver true wire-speed performance on any one of their attached interfaces. For high-performance switch fabrics, the design constraints are, typically, chosen to ensure the fabric sustains wire-speed performance even under worst case network and traffic conditions. The switch fabric has to deliver full wire-speed performance even when subjected to the minimum expected packet size (without any typical packet size assumption). Also, the performance of the switch fabric has to be independent of input and output port configuration and assignments (no assumptions about the traffic locality on the switch fabric).
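
The minimum-packet-size requirement translates directly into a packet rate. On Ethernet, each minimum 64-byte frame also occupies 8 bytes of preamble and start-of-frame delimiter plus a 12-byte minimum interframe gap, so back-to-back minimum-size frames use 84 byte times each. The following Python sketch (function names hypothetical) computes the resulting per-port packet rate and a simple aggregate for a set of ports.

```python
from typing import Iterable

MIN_FRAME = 64      # bytes, minimum Ethernet frame including FCS
PREAMBLE_SFD = 8    # bytes of preamble plus start-of-frame delimiter
MIN_IFG = 12        # bytes of minimum interframe gap

def wire_speed_pps(link_bps: float, frame_bytes: int = MIN_FRAME) -> float:
    """Packets per second a port must handle for back-to-back frames at line rate."""
    return link_bps / ((frame_bytes + PREAMBLE_SFD + MIN_IFG) * 8)

def fabric_pps(port_rates_bps: Iterable[float]) -> float:
    """Aggregate worst-case packet rate for a set of ports at minimum frame size."""
    return sum(wire_speed_pps(r) for r in port_rates_bps)

for rate in (1e9, 10e9, 40e9, 100e9):
    print(f"{rate / 1e9:>5.0f} Gb/s: {wire_speed_pps(rate) / 1e6:8.2f} Mpps")
```

The loop reports about 1.49, 14.88, 59.52, and 148.81 Mpps for 1, 10, 40, and 100 Gb/s, respectively, so a fabric serving 48 such 10 Gb/s ports at wire speed must handle on the order of 714 million minimum-size packets per second.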

Scalability:

Switch fabrics are required to support an architecture that scales up in capacity and number of ports. As the amount of traffic in the network increases, the switch fabric must be able to scale up accordingly. The ability to accommodate more slots in a single chassis contributes to overall network scalability.

Modularity:

Switch/routers and routers are now required to have a modular architecture with flexibility to allow users to add or mix and match the number/type of line modules, as needed.

Quality of Service:

Users now depend on networks to handle different traffic types with different QoS requirements. Thus, switch fabrics are required to provide multiple priority queuing levels to support different traffic types.
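
Multiple priority queuing levels can be illustrated with a strict-priority scheduler, sketched below with four levels (level 0 highest). The number of levels, the class name, and the packet labels are assumptions for illustration. Strict priority can starve lower levels under sustained high-priority load, which is why designs often combine it with weighted scheduling.

```python
from collections import deque
from typing import Deque, List, Optional

class PriorityQueues:
    """Strict-priority service over several queue levels (level 0 = highest priority)."""

    def __init__(self, levels: int = 4) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(levels)]

    def enqueue(self, level: int, packet: str) -> None:
        self.queues[level].append(packet)

    def dequeue(self) -> Optional[str]:
        for q in self.queues:   # always serve the highest-priority non-empty level first
            if q:
                return q.popleft()
        return None

pq = PriorityQueues()
pq.enqueue(3, "bulk-1")
pq.enqueue(0, "voice-1")
pq.enqueue(1, "video-1")
pq.enqueue(0, "voice-2")
order = [pq.dequeue() for _ in range(5)]  # voice first, then video, then bulk, then None
```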

Multicasting:

More applications are emerging that utilize multicast transport. These applications include distribution of news, financial data, software, video, audio, and multiperson conferencing. Therefore, the percentage of multicast traffic traversing the switch fabric is increasing over time. The switch fabric is required to support efficient multicasting capabilities which, in some designs, might include hardware replication of packets.

High Availability:

Multigigabit and terabit switch/routers and routers are being deployed in the core of enterprise networks and the Internet. Traffic from thousands of individual users passes through the switch fabric at any given time. Thus, the robustness and overall availability of the switch fabric become critically important design factors. The switch fabric must enable reliable and fault-tolerant solutions suitable for enterprise and carrier-class applications.

Product Diversity:

Vendors now support a family of products at various price/performance points. Vendors continuously seek to deliver switch/routers and routers with differing levels of functionality, performance, and price. To control expenses while migrating networks to the service-enabled Internet, it is important that service providers have an assortment of products supporting distributed architectures and high-speed interfaces. This breadth of choice gives service providers the flexibility to install the equipment with the mix of network connection types, port capacity and density, footprint, and corresponding cost that best matches the needs of each service provider site. The switch fabric plays an important role here.

Low Power Consumption and Smaller Rack Space: