Resilient MPLS Rings

Juniper Networks, Inc.
1133 Innovation Way
Sunnyvale, CA 94089
USA
kireeti.kompella@gmail.com

Telefonica
Ronda de la Comunicacion
Sur-3 building, 3rd floor
Madrid 28050
Spain
luismiguel.contrerasmurillo@telefonica.com
http://people.tid.es/LuisM.Contreras/
Routing Area
MPLS WG
Keywords: MPLS, ring, transport
This document describes the use of the MPLS control and data
planes on ring topologies. It describes the special nature of
rings, and proceeds to show how MPLS can be effectively used in
such topologies. It describes how MPLS rings are configured,
auto-discovered and signaled, as well as how the data plane
works. Companion documents describe the details of discovery
and signaling for specific protocols.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in RFC 2119.
Rings are a very common topology in transport networks. A ring
is the simplest topology offering link and node resilience.
Rings are nearly ubiquitous in access and aggregation networks.
As MPLS increases its presence in such networks, and takes on a
greater role in transport, it is imperative that MPLS handles
rings well; this is not the case today.
This document describes the special nature of rings, and the
special needs of MPLS on rings. It then shows how these needs
can be met in several ways, some of which involve extensions to
protocols such as IS-IS, OSPF, RSVP-TE, and LDP.
The intent of this document is to handle rings that "occur
naturally". Many access and aggregation networks in metros have
their start as a simple ring. They may then grow into more
complex topologies, for example, by adding parallel links to the
ring, or by adding "express" links. The goal here is to discover
these rings (with some guidance), and run MPLS over them
efficiently. The intent is not to construct rings in a mesh
network, and use those for protection.
A (directed) graph G = (V, E) consists of a set of vertices
(or nodes) V and a set of edges (or links) E. An edge is an
ordered pair of nodes (a, b), where a and b are in V. (In
this document, the terms node and link will be used instead of
vertex and edge.)
A ring is a subgraph of G. A ring consists of a subset of n
nodes {R_i, 0 ≤ i < n} of V. The directed edges {(R_i,
R_i+1) and (R_i+1, R_i), 0 ≤ i < n} must be a subset
of E (note that index arithmetic is done modulo n). We define
the direction from node R_i to R_i+1 as "clockwise" (CW) and
the reverse direction as "anticlockwise" (AC). As there may
be several rings in a graph, we number each ring with a
distinct ring ID RID.
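The ring definition can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and requiring at least three distinct nodes is an assumption (with two nodes, CW and AC would be indistinguishable). It tests that both directed edges exist between every pair of consecutive nodes, including the wrap-around pair, and classifies a hop as CW or AC.

```python
from typing import Hashable, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def is_ring(nodes: Sequence[Hashable], edges: Set[Edge]) -> bool:
    """True if nodes R_0..R_(n-1) form a ring in the directed graph (V, E).

    Both (R_i, R_i+1) and (R_i+1, R_i) must be in E for every i, with
    indices taken modulo n, so the wrap-around pair is checked too.
    """
    n = len(nodes)
    if n < 3 or len(set(nodes)) != n:  # assumption: at least 3 distinct nodes
        return False
    for i in range(n):
        a, b = nodes[i], nodes[(i + 1) % n]
        if (a, b) not in edges or (b, a) not in edges:
            return False
    return True


def direction(nodes: Sequence[Hashable], a: Hashable, b: Hashable) -> str:
    """Classify the hop a -> b between ring neighbors as "CW" or "AC"."""
    n = len(nodes)
    i, j = nodes.index(a), nodes.index(b)
    if j == (i + 1) % n:
        return "CW"  # R_i -> R_i+1
    if j == (i - 1) % n:
        return "AC"  # R_i -> R_i-1
    raise ValueError("a and b are not ring neighbors")
```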
The following terminology is used for ring LSPs:
RID (Ring ID): A non-zero number that identifies a ring; this is
unique in some scope of a Service Provider's network. A node may
belong to multiple rings.
Ring node: A member of a ring. Note that a device may belong to
several rings.
Ring node index: A logical numbering of nodes in a ring, from zero
up to one less than the ring size. Used purely for exposition in
this document.
Ring master: The ring master initiates the ring identification
process. Mastership is indicated in the IGP by a two-bit field.
Ring neighbors: Nodes whose indices differ by one (modulo ring
size).
Ring links: Links that connect ring neighbors.
Express links: Links that connect non-neighboring ring nodes.
Ring direction: A two-bit field in the IGP indicating the direction
of a link. The choices are:
   undefined link
   clockwise ring link
   anticlockwise ring link
   express link
Ring identification: The process of discovering ring nodes, ring
links, link directions, and express links.
The following notation is used for ring LSPs:
R_k: A ring node with index k. R_k has AC neighbor R_(k-1) and
CW neighbor R_(k+1).
RL_k: A (unicast) Ring LSP anchored on node R_k.
CL_jk: A label allocated by R_j for RL_k in the CW direction.
AL_jk: A label allocated by R_j for RL_k in the AC direction.
A Path (Resv) message sent by R_j for RL_k.
A ring is the simplest topology that offers resilience. This is
perhaps the main reason to lay out fiber in a ring. Thus,
effective mechanisms for fast failover on rings are needed.
Furthermore, there are large numbers of rings. Thus,
configuration of rings needs to be as simple as possible.
Finally, bandwidth management on access rings is very important,
as bandwidth is generally quite constrained here.
The goals of this document are to present mechanisms for
improved MPLS-based resilience in ring networks (using ideas
that are reminiscent of Bidirectional Line Switched Rings),
automatic bring-up of LSPs, better bandwidth management, and
auto-hierarchy. These goals can be achieved using extensions to
existing IGP and MPLS signaling protocols, using central
provisioning, or in other ways.
Say a ring has ring ID RID. The ring is provisioned by choosing
one or more ring masters for the ring and assigning them the
RID. Other nodes in the ring may also be assigned this RID, or
may be configured as "promiscuous". Ring discovery then kicks
in. When each ring node knows its CW and AC ring neighbors and
its ring links, and all express links have been identified, ring
identification is complete.
Once ring identification is complete, each node signals one or
more ring LSPs RL_i. RL_i, anchored on node R_i, consists of
two counter-rotating unicast LSPs that start and end at R_i. A
ring LSP is "multipoint": any node R_j can use RL_i to send
traffic to R_i; this can be in either the CW or AC directions,
or both (i.e., load balanced). Both of these counter-rotating
LSPs are "active"; the choice of direction to send traffic to
R_i is determined by policy at the node where traffic is
injected into the ring. The default is to send traffic along
the shortest path. Bidirectional connectivity between nodes R_i
and R_j is achieved by using two different ring LSPs: R_i uses
RL_j to reach R_j, and R_j uses RL_i to reach R_i.
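As an illustration of the default policy, the following sketch (hypothetical function names; breaking ties towards CW is an assumption, since the text only says "shortest path") computes the hop count in each direction and picks the shorter one for traffic that R_src injects into RL_dst.

```python
def hops(n: int, src: int, dst: int, direction: str) -> int:
    """Hops from R_src to R_dst on an n-node ring in direction "CW" or "AC"."""
    if direction == "CW":
        return (dst - src) % n
    if direction == "AC":
        return (src - dst) % n
    raise ValueError(f"unknown direction {direction!r}")


def default_direction(n: int, src: int, dst: int) -> str:
    """Default policy at R_src for traffic on RL_dst: the shorter direction.

    Breaking ties towards CW is an assumption of this sketch.
    """
    if src == dst:
        raise ValueError("a node is not an ingress for its own ring LSP")
    cw, ac = hops(n, src, dst, "CW"), hops(n, src, dst, "AC")
    return "CW" if cw <= ac else "AC"


# Bidirectional connectivity between R_1 and R_5 on a 6-node ring:
# R_1 sends on RL_5 and R_5 sends on RL_1, each in its shorter direction.
print(default_direction(6, 1, 5), default_direction(6, 5, 1))  # AC CW
```

Note that the two directions of a bidirectional "connection" ride on two different ring LSPs and need not be chosen by the same rule; a per-node policy could, for example, load-balance across both.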
The goal here is to provision rings with the absolute minimum
configuration. The exposition below aims to achieve that
using auto-discovery via a link-state IGP. Of course,
auto-discovery can be overridden by configuration. For example,
a link that would
otherwise be classified by auto-discovery as a ring link might
be configured not to be used for ring LSPs.
Ring nodes have a loopback address, and run a link-state IGP
and an MPLS signaling protocol. To provision a node as a ring
node for ring RID, the node is simply assigned that RID. A
node may be part of several rings, and thus may be assigned
several ring IDs.
To simplify ring provisioning even further, a node N may be
made "promiscuous" by being assigned an RID of 0. A
promiscuous node listens to RIDs in its IGP neighbors'
link-state updates. For every non-zero RID N hears from a
neighbor, N joins the corresponding ring by taking on that
RID. In many situations, the use of promiscuous mode means
that only one or two nodes in a ring need to be provisioned;
everything else is auto-discovered.
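The promiscuous-mode rule can be simulated as a fixed-point computation. In this sketch the class and function names are hypothetical, and the IGP flooding is abstracted as each node repeatedly reading the RIDs its neighbors advertise; a promiscuous node joins every non-zero RID it hears and then advertises it in turn.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set

PROMISCUOUS_RID = 0  # an RID of 0 marks a node as promiscuous


@dataclass
class Node:
    name: str
    configured_rids: Set[int]
    joined_rids: Set[int] = field(default_factory=set)

    def is_promiscuous(self) -> bool:
        return PROMISCUOUS_RID in self.configured_rids

    def advertised_rids(self) -> Set[int]:
        """Non-zero RIDs this node announces in its ring node TLVs."""
        return (self.configured_rids | self.joined_rids) - {PROMISCUOUS_RID}

    def hear(self, neighbor_rids: Iterable[int]) -> bool:
        """Join every non-zero RID heard from a neighbor; True if a ring was joined."""
        if not self.is_promiscuous():
            return False
        new = set(neighbor_rids) - {PROMISCUOUS_RID} - self.joined_rids
        self.joined_rids |= new
        return bool(new)


def converge(nodes: Dict[str, Node], adjacency: Dict[str, Set[str]]) -> None:
    """Repeat the exchange until no promiscuous node joins a new ring."""
    changed = True
    while changed:
        changed = False
        for name, node in nodes.items():
            for nbr in adjacency[name]:
                if node.hear(nodes[nbr].advertised_rids()):
                    changed = True


# Ring A-B-C-D-A where only A is provisioned, with RID 17.
nodes = {"A": Node("A", {17})}
nodes.update({n: Node(n, {PROMISCUOUS_RID}) for n in "BCD"})
adjacency = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
converge(nodes, adjacency)
print({name: n.advertised_rids() for name, n in nodes.items()})
# {'A': {17}, 'B': {17}, 'C': {17}, 'D': {17}}
```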
A ring node indicates in its IGP updates the ring LSP
signaling protocols it supports. This can be LDP and/or
RSVP-TE. Ideally, each node should support both.
Ring links must be MPLS-capable. They are by default
unnumbered, point-to-point (from the IGP point of view) and
"auto-bundled". The last attribute means that parallel links
between ring neighbors are considered as a single link,
without the need for explicit configuration for bundling (such
as a Link Aggregation Group). Note that each component may be
advertised separately in the IGP; however, signaling messages
and labels across one component link apply to all components.
Parallel links between a pair of ring nodes are often the
result of having multiple lambdas or fibers between those
nodes. RMR (Resilient MPLS Rings) is primarily intended for operation at the packet
layer; however, parallel links at the lambda or fiber layer
result in parallel links at the packet layer.
A ring link is not provisioned as belonging to the ring; it is
discovered to belong to ring RID if both its adjacent nodes
belong to RID. A ring link's direction (CW or AC) is also
discovered; this process is initiated by the ring master. Note
that the above two attributes can be overridden
by provisioning if needed; it is then up to the provisioning
system to maintain consistency across the ring.
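A minimal sketch of the membership rule and of auto-bundling, assuming a hypothetical (node_a, node_b, link_id) representation of IGP links. It only yields candidate ring links: links between non-neighboring ring nodes also pass this test and are separated out as express links later, during ring identification.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical link representation: (node_a, node_b, link_id).
Link = Tuple[str, str, str]


def candidate_ring_links(links: List[Link], rids: Dict[str, Set[int]],
                         rid: int) -> Dict[FrozenSet[str], List[str]]:
    """Return the links of ring `rid`, auto-bundled per node pair.

    A link is a candidate for ring `rid` when both adjacent nodes belong
    to `rid`. Parallel links between the same two nodes are grouped into
    one bundle without any explicit bundling configuration.
    """
    bundles: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for a, b, link_id in links:
        if rid in rids.get(a, set()) and rid in rids.get(b, set()):
            bundles[frozenset((a, b))].append(link_id)
    return dict(bundles)
```

Keying bundles by an unordered node pair reflects the fact that signaling and labels on one component apply to all components of the bundle.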
Express links are discovered once ring nodes, ring links and
directions have been established. As defined earlier,
express links are links joining non-neighboring ring nodes;
often, this may be the result of optically bypassing ring
nodes. The use of express links will be described in a
future version of this document.
Ring LSPs are not provisioned. Once a ring node R_i knows its
RID, its ring links and directions, it kicks off ring LSP
signaling automatically. R_i allocates CW and AC labels for
each ring LSP RL_k. R_i also initiates the creation of RL_i.
As the signaling propagates around the ring, CW and AC labels
are exchanged. When R_i receives CW and AC labels for RL_k
from its ring neighbors, primary and fast reroute (FRR) paths
for RL_k are installed at R_i. More details are given below.
For RSVP-TE LSPs, bandwidths may be signaled in both
directions. However, these are not provisioned either;
rather, one does "reverse call admission control". When a
service needs to use an LSP, the ring node where the traffic
enters the ring attempts to increase the bandwidth on the LSP
to the egress. If successful, the service is admitted to the
ring.
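Reverse call admission control can be sketched as an all-or-nothing reservation along the hops from the ingress to the egress. This is a simplified model under stated assumptions: a single uniform per-hop capacity, bandwidth tracked per (node, direction) hop, and hypothetical names throughout; it does not model how RSVP-TE would actually re-signal the increased bandwidth.

```python
from typing import Dict, List, Tuple

Hop = Tuple[int, str]  # (index of the node the hop leaves from, "CW" or "AC")


def try_admit(reserved: Dict[Hop, float], capacity: float, n: int,
              ingress: int, egress: int, direction: str, demand: float) -> bool:
    """Reverse call admission control for one service on RL_egress.

    The ingress tries to grow the bandwidth of the ring LSP on every hop
    towards the egress; the service is admitted only if all hops fit.
    """
    if ingress == egress:
        raise ValueError("ingress and egress must differ")
    step = 1 if direction == "CW" else -1
    path: List[Hop] = []
    node = ingress
    while node != egress:
        path.append((node, direction))
        node = (node + step) % n
    if any(reserved.get(hop, 0.0) + demand > capacity for hop in path):
        return False  # rejected: no reservation is changed
    for hop in path:
        reserved[hop] = reserved.get(hop, 0.0) + demand
    return True
```

Making the check before any update keeps the ring state unchanged when a service is refused.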
In setting up RL_k, a node R_j sends out two labels: CL_jk to
R_j-1 and AL_jk to R_j+1. R_j also receives two labels:
CL_j+1,k from R_j+1, and AL_j-1,k from R_j-1. R_j can now set
up the forwarding entries for RL_k. In the CW direction, R_j
swaps incoming label CL_jk with CL_j+1,k with next hop R_j+1;
this allows R_j to act as an LSR for RL_k. R_j also installs an
LFIB entry to push CL_j+1,k with next hop R_j+1 to act as
ingress for RL_k. Similarly, in the AC direction, R_j swaps
incoming label AL_jk with AL_j-1,k with next hop R_j-1 (as
LSR), and installs an entry to push AL_j-1,k with next hop R_j-1
(as ingress).
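The four primary entries that R_j installs for RL_k can be written down directly from the label notation. In this sketch the labels are symbolic tuples, ("CL", j, k) for CL_jk and ("AL", j, k) for AL_jk, which is a hypothetical stand-in for real allocated label values, and the dictionary layout of an LFIB entry is likewise illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical symbolic labels: ("CL", j, k) stands for CL_jk and
# ("AL", j, k) for AL_jk; a real LSR would use allocated 20-bit values.
Label = Tuple[str, int, int]


def primary_entries(n: int, j: int, k: int) -> List[Dict[str, object]]:
    """Primary LFIB entries installed by R_j for ring LSP RL_k."""
    if j == k:
        raise ValueError("R_k is not a transit or ingress node for RL_k")
    cw_nh, ac_nh = (j + 1) % n, (j - 1) % n
    return [
        # CW transit: swap CL_jk -> CL_j+1,k towards R_j+1
        {"in": ("CL", j, k), "op": "swap", "out": ("CL", cw_nh, k), "nh": cw_nh},
        # CW ingress: push CL_j+1,k towards R_j+1
        {"in": None, "op": "push", "out": ("CL", cw_nh, k), "nh": cw_nh},
        # AC transit: swap AL_jk -> AL_j-1,k towards R_j-1
        {"in": ("AL", j, k), "op": "swap", "out": ("AL", ac_nh, k), "nh": ac_nh},
        # AC ingress: push AL_j-1,k towards R_j-1
        {"in": None, "op": "push", "out": ("AL", ac_nh, k), "nh": ac_nh},
    ]
```

For R_k-1, the CW transit entry swaps to CL_k,k towards R_k, which is the label R_k pops when UHP is used.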
Clearly, R_k does not act as ingress for its own LSPs.
However, R_k can send OAM messages, for example, an MPLS ping
or traceroute, using labels CL_k+1,k and AL_k-1,k, to test the
entire ring
LSP anchored at R_k in both directions. Furthermore, if these
LSPs use UHP, then R_k installs LFIB entries to pop CL_k,k for
packets received from R_k-1 and to pop AL_k,k for packets
received from R_k+1.
At the same time that R_j sets up its primary CW and AC LFIB
entries, it can also set up the protection forwarding entries
for RL_k. In the CW direction, R_j sets up an FRR LFIB entry
to swap incoming label CL_jk with AL_j-1,k with next hop
R_j-1. In the AC direction, R_j sets up an FRR LFIB entry to
swap incoming label AL_jk with CL_j+1,k with next hop R_j+1.
Again, R_k does not install FRR LFIB entries in this manner.
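The protection entries differ from the primary ones only in that the outgoing label and next hop belong to the counter-rotating direction. The sketch below uses the same hypothetical symbolic labels, ("CL", j, k) for CL_jk and ("AL", j, k) for AL_jk, and an illustrative entry layout.

```python
from typing import Dict, List, Tuple

Label = Tuple[str, int, int]  # hypothetical: ("CL", j, k) ~ CL_jk, ("AL", j, k) ~ AL_jk


def frr_entries(n: int, j: int, k: int) -> List[Dict[str, object]]:
    """Pre-programmed FRR LFIB entries at R_j for RL_k.

    No bypass or detour LSP is involved: traffic is simply turned around
    onto the counter-rotating direction of the same ring LSP.
    """
    if j == k:
        raise ValueError("R_k does not install FRR entries for RL_k")
    cw_nh, ac_nh = (j + 1) % n, (j - 1) % n
    return [
        # CW traffic that cannot reach R_j+1: swap CL_jk -> AL_j-1,k, send to R_j-1
        {"in": ("CL", j, k), "op": "swap", "out": ("AL", ac_nh, k), "nh": ac_nh},
        # AC traffic that cannot reach R_j-1: swap AL_jk -> CL_j+1,k, send to R_j+1
        {"in": ("AL", j, k), "op": "swap", "out": ("CL", cw_nh, k), "nh": cw_nh},
    ]
```

Because these entries depend only on labels already exchanged for the primary paths, they can be installed at the same time and activated locally on failure.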
In this scheme, there are no protection LSPs as such -- no
node or link bypass LSPs, no standby LSPs, no detours, and no
LFA-type protection. Protection is via the "other" direction
around the ring, which is why ring LSPs are in
counter-rotating pairs. Protection works in the same way for
link, node and ring LSP failures.
If a node R_j detects a failure from R_j+1 (either all links
to R_j+1 fail, or R_j+1 itself fails), R_j switches traffic on
all CW ring LSPs to the AC direction using the FRR LFIB
entries. If the failure is specific to a single ring LSP, R_j
switches traffic just for that LSP. In either case, this
switchover can be very fast, as the FRR LFIB entries can be
preprogrammed. Fast detection and fast switchover lead to
minimal traffic loss.
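The effect of the local switchover can be traced hop by hop. This simplified simulation (hypothetical names) models only the pre-programmed turn-around at the node adjacent to the failure, not the subsequent indication that moves upstream sources to the other direction. With link R_2-R_3 down, a CW packet from R_1 on RL_4 travels to R_2, turns around, and reaches R_4 the other way.

```python
from typing import FrozenSet, List, Optional, Set


def trace(n: int, ingress: int, egress: int, direction: str,
          failed_links: Set[FrozenSet[int]], max_hops: int = 64) -> Optional[List[int]]:
    """Nodes visited by a packet on RL_egress with local FRR turn-around.

    Only the local switch is modelled, not the upstream notification.
    Returns None if the packet is not delivered within max_hops.
    """
    if ingress == egress:
        raise ValueError("a node is not an ingress for its own ring LSP")
    path, node = [ingress], ingress
    for _ in range(max_hops):
        step = 1 if direction == "CW" else -1
        nxt = (node + step) % n
        if frozenset((node, nxt)) in failed_links:
            # Failure towards the next hop: use the FRR entry for the other direction.
            direction = "AC" if direction == "CW" else "CW"
            nxt = (node - step) % n
            if frozenset((node, nxt)) in failed_links:
                return None  # isolated in both directions
        node = nxt
        path.append(node)
        if node == egress:
            return path
    return None


print(trace(6, 1, 4, "CW", {frozenset((2, 3))}))  # [1, 2, 1, 0, 5, 4]
```

The extra hops R_1 -> R_2 -> R_1 are exactly the suboptimality that the upstream indication removes. If both links of the egress fail (i.e., the egress node itself fails), this model bounces the packet until max_hops, which is the loop discussed below.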
R_j then sends an indication to R_j-1 that the CW direction is
not working, so that R_j-1 can similarly switch traffic to the
AC direction. For RSVP-TE, this indication can be a PathErr
or a Notify; other signaling protocols have similar
indications. These indications propagate AC until each
traffic source on the ring AC of the failure uses the AC
direction. Thus, within a short period, traffic will be
flowing along the optimal path, given that there is a failure on
the ring. This contrasts with (say) bypass protection, where
until the ingress recomputes a new path, traffic will be
suboptimal.
Note that the failure of a node or a link will not necessarily
affect all ring LSPs. Thus, it is important to identify the
affected LSPs (and switch them), but to leave the rest alone.
One point to note is that when a ring node, say R_j, fails,
RL_j is clearly unusable. However, the above protection
scheme will cause a traffic loop: R_j-1 detects a failure CW,
and protects by sending CW traffic on RL_j back all the way to
R_j+1, which in turn sends traffic to R_j-1, etc. There are
three proposals to avoid this:
1. Each ring node acting as ingress sends traffic with a TTL
   of at most 2*n, where n is the number of nodes in the ring.
2. A ring node sends protected traffic (i.e., traffic
   switched from CW to AC or vice versa) with TTL just
   large enough to reach the egress.
3. A ring node sends protected traffic with a special purpose
   label below the ring LSP label. A protecting node first
   checks for the presence of this label; if present, it
   means that the traffic is looping and MUST be dropped.
It is recommended that (2) be implemented. The other methods
are optional.
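Method (2) can be illustrated with a small simulation. The assumptions are labeled: names are hypothetical, the ingress starts with a TTL of 255, each transit LSR decrements the TTL before forwarding, and a protecting node writes the minimum of the current TTL and the hop count to the egress in the new direction (never raising the TTL is this sketch's reading of "just large enough", not text from the method). With egress R_4 failed, the packet is dropped instead of circling.

```python
from typing import Optional, Tuple


def distance(n: int, src: int, dst: int, direction: str) -> int:
    """Hops from R_src to R_dst on an n-node ring, going CW or AC."""
    return (dst - src) % n if direction == "CW" else (src - dst) % n


def protected_ttl(n: int, j: int, k: int, new_direction: str, ttl: int) -> int:
    """TTL that protecting node R_j writes when it turns RL_k traffic around.

    Method (2): just large enough to reach R_k in the new direction.
    Never raising the packet's current TTL is an assumption of this sketch.
    """
    return min(ttl, distance(n, j, k, new_direction))


def send(n: int, ingress: int, k: int, direction: str,
         failed_node: Optional[int] = None, initial_ttl: int = 255) -> Tuple[str, int]:
    """Simulate one packet on RL_k; return (outcome, hops taken)."""
    if ingress == k:
        raise ValueError("R_k is not an ingress for RL_k")
    node, ttl, taken = ingress, initial_ttl, 0
    while True:
        step = 1 if direction == "CW" else -1
        nxt = (node + step) % n
        if nxt == failed_node:
            # Local protection: reverse direction and cap the TTL (method 2).
            direction = "AC" if direction == "CW" else "CW"
            ttl = protected_ttl(n, node, k, direction, ttl)
            nxt = (node - step) % n
        node, taken = nxt, taken + 1
        if node == k:
            return ("delivered", taken)
        ttl -= 1  # each transit LSR decrements the TTL before forwarding
        if ttl <= 0:
            return ("dropped", taken)


print(send(6, 2, 4, "CW", failed_node=3))  # ('delivered', 4)
print(send(6, 2, 4, "CW", failed_node=4))  # ('dropped', 6)
```

When only a transit node fails, the capped TTL is still exactly enough to reach the egress the long way round.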
Auto-discovery proceeds in three phases. The first phase is
the announcement phase. The second phase is the mastership
phase. The third phase is the ring identification phase.
The format of an RMR Node Type-Length-Value (TLV) is given
below. It consists of information pertaining to the node and
optionally, sub-TLVs. A Neighbor sub-TLV contains information
pertaining to the node's neighbors. Other sub-TLVs may be
defined in the future. Details of the format specific to IS-IS
and OSPF will be given in the corresponding IGP documents.
Each node participating in an MPLS ring is assigned an RID; in
the example, RID = 17. A node is also provisioned with a
mastership value. Each node advertises a ring node TLV for
each ring it is participating in, along with the associated
flags. It then starts timer T1.
A node in promiscuous mode doesn't advertise any ring node TLVs.
However, when it hears a ring node TLV from an IGP neighbor, it
joins that ring, and sends its own ring node TLV with that RID.
The announcement phase allows a ring node to discover other
ring nodes in the same ring so that a ring master can be
elected.
When timer T1 fires, a node enters the mastership phase. In
this phase, each ring node N starts timer T2 and checks if it
is master. If it is the node with the lowest loopback address
of all nodes with the highest mastership values, N declares
itself master by readvertising its ring node TLV with the M
bit set.
When timer T2 fires, each node examines the ring node TLVs
from all other nodes in the ring to identify the ring master.
There should be exactly one; if not, each node restarts timer
T2 and tries again. The nodes that set their M bit should be
extra careful in advertising their M bit in subsequent tries.
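The mastership rule and the "exactly one master" check can be sketched as follows. The RingNodeTLV class is a hypothetical in-memory view, not the TLV wire format (which is not reproduced here), and all loopbacks are assumed to be of the same address family.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RingNodeTLV:
    """Hypothetical in-memory view of a ring node TLV (not the wire format)."""
    loopback: str
    mastership: int
    m_bit: bool = False


def should_claim_mastership(me: RingNodeTLV, ring: List[RingNodeTLV]) -> bool:
    """Mastership phase: N claims if it has the lowest loopback address
    among the nodes with the highest mastership value (`ring` includes N)."""
    top = max(t.mastership for t in ring)
    candidates = [t for t in ring if t.mastership == top]
    winner = min(candidates, key=lambda t: ipaddress.ip_address(t.loopback))
    return winner.loopback == me.loopback


def unique_master(ring: List[RingNodeTLV]) -> Optional[RingNodeTLV]:
    """When T2 fires: the master if exactly one M bit is set, else None
    (each node then restarts T2)."""
    masters = [t for t in ring if t.m_bit]
    return masters[0] if len(masters) == 1 else None
```

Comparing parsed addresses rather than strings avoids lexical surprises such as "192.0.2.10" sorting before "192.0.2.9".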
When there is exactly one ring master M, M enters the Ring
Identification Phase. M indicates that it has successfully
completed this phase by advertising ring link TLVs. This is
the trigger for M's CW neighbor to enter the Ring
Identification Phase. This phase passes CW until all ring
nodes have completed ring identification.
In the Ring Identification Phase, a node X that has two or
more IGP neighbors that belong to the ring picks one of them
to be its CW ring neighbor. If X is the ring master, it also
picks a node as its AC ring neighbor. If there are exactly
two such nodes, this step is trivial. If not, X computes a
ring that includes all nodes that have completed the Ring
Identification Phase (as seen by their ring link TLVs) and
further contains the maximal number of nodes that belong to
the ring. Based on that, X picks a CW neighbor and inserts
ring link TLVs with ring direction CW for each link to its CW
neighbor; X also inserts a ring link TLV with direction AC for
each link to its AC neighbor. Then, X determines its express
links. These are links connected to ring nodes that are not
ring neighbors. X advertises ring link TLVs for express links
by setting the link direction to "express link".
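One way to realize the ring computation is an exhaustive search for the longest cycle that starts with the already identified nodes and closes back at the master. This is a sketch under assumptions, not a prescribed algorithm: names are hypothetical, the adjacency is restricted to nodes of this ring, the identified nodes are given in CW order starting at the master, ties are broken by the first cycle found in sorted order, and the search is exponential, so it suits only small rings.

```python
from typing import Dict, List, Optional, Set, Tuple


def best_ring(adj: Dict[str, Set[str]], chain: List[str]) -> Optional[List[str]]:
    """Longest ring (CW node list) that begins with `chain` and closes at chain[0].

    `chain`: the ring master followed, in CW order, by the nodes that have
    completed ring identification; the last entry is the computing node X.
    """
    master = chain[0]
    best: Optional[List[str]] = None

    def extend(path: List[str], seen: Set[str]) -> None:
        nonlocal best
        if len(path) >= 3 and master in adj[path[-1]]:
            if best is None or len(path) > len(best):
                best = list(path)
        for nbr in sorted(adj[path[-1]] - seen):
            seen.add(nbr)
            path.append(nbr)
            extend(path, seen)
            path.pop()
            seen.remove(nbr)

    extend(list(chain), set(chain))
    return best


def identify(adj: Dict[str, Set[str]], chain: List[str]) -> Tuple[str, str, Set[str]]:
    """Return (CW neighbor, AC neighbor, express-link peers) for X = chain[-1]."""
    ring = best_ring(adj, chain)
    if ring is None:
        raise ValueError("no ring can be formed (for example, a half-ring)")
    i = len(chain) - 1  # position of X in the ring
    cw, ac = ring[(i + 1) % len(ring)], ring[i - 1]
    return cw, ac, adj[chain[-1]] - {cw, ac}


# Ring A-B-C-D-E-A with an express link A-C; A is the master.
adj = {"A": {"B", "C", "E"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
       "D": {"C", "E"}, "E": {"A", "D"}}
print(identify(adj, ["A"]))  # ('B', 'E', {'C'})
```

For the master the AC neighbor falls out as the last node of the computed ring; for any other node it is simply the predecessor that chose it as CW neighbor.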
The main changes to a ring are ring link addition, ring link
deletion, ring node addition, and ring node deletion.
The main goal of handling ring changes is (as much as
possible) not to perturb existing ring operation. Thus, if
the ring master hasn't changed, all of the above changes
should be local to the point of change. Link adds just
update the IGP; signaling should take advantage of the new
capacity as soon as it learns of it. Link deletions in the case
of parallel links also show up as a change in capacity (until
the last link in the bundle is removed).
The removal of the last ring link between two nodes, or the
removal of a ring node is an event that triggers protection
switching. In a simple ring, the result is a broken ring.
However, if a ring has express links, then it may be able to
converge to a smaller ring with protection. Details of this
process will be given in a future version.
The addition of a new ring node can also be handled
incrementally. Again, the details of this process will be
given in a future version.
A future version of this document will specify
protocol-independent details about ring LSP signaling.
Each ring node should advertise in its ring node TLV the OAM
protocols it supports. Each ring node is expected to run a
link-level OAM over each ring link. This should be an OAM
protocol that both neighbors agree on. The default hello
time is 3.3 milliseconds.
Each ring node also sends OAM messages over each direction of
its ring LSP. This is a multi-hop OAM to check LSP liveness;
typically, BFD would be used for this. The node chooses the
hello interval; the default is once a second.
In some cases, a ring H may be incomplete, either because H is
permanently missing a link (not just because of a failure), or
because the link required to complete H is in a different IGP
area. Either way, the ring discovery algorithm will fail. We
call such a ring a "half-ring". Half-rings are sufficiently
common that finding a way to deal with them effectively is a
useful problem to solve.
Let's call the node(s) that connect a ring to the rest of the
network "hub node(s)" (usually, there is a pair of hub
nodes). Suppose a ring has two hub nodes H1 and H2. Suppose
further that a non-hub ring node X wants to send traffic to
some node Z outside the ring. This could be done, say, by
having targeted LDP (T-LDP) sessions from H1 and H2 to X
advertising LDP reachability to Z via H1 (H2); there would be
a two-label stack from X to reach Z. Say that to reach Z, X
prefers H1; thus, traffic from X to Z will first go to H1 via
a ring LSP, then to Z via LDP.
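The two-label stack at X can be sketched as follows. All names and label values are hypothetical: ldp_labels holds the labels each hub advertised to X over T-LDP for a destination, and ring_label holds the label X pushes to enter the ring LSP towards that hub (its CW or AC label, whichever direction X chose). The outer label carries the packet around the ring to the hub; the inner label tells the hub how to reach Z.

```python
from typing import Dict, List, Tuple


def label_stack_to(z: str, ldp_labels: Dict[Tuple[str, str], int],
                   ring_label: Dict[str, int],
                   preferred_hubs: List[str]) -> Tuple[str, List[int]]:
    """Build X's label stack (outermost first) to reach Z via the first usable hub."""
    for hub in preferred_hubs:
        if (hub, z) in ldp_labels and hub in ring_label:
            return hub, [ring_label[hub], ldp_labels[(hub, z)]]
    raise LookupError(f"no hub offers a path to {z}")


ldp = {("H1", "Z"): 3001, ("H2", "Z"): 4001}  # hypothetical T-LDP labels
ring = {"H1": 100, "H2": 200}                 # hypothetical ring LSP labels
print(label_stack_to("Z", ldp, ring, ["H1", "H2"]))  # ('H1', [100, 3001])
```

In this model X only moves to H2 once H1's entry disappears from ldp_labels, which corresponds to the slow convergence described next.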
If H1 fails, traffic from X to Z will drop until the T-LDP
session from H1 to Z fails, the IGP reconverges, and H2's
label to Z is chosen. Thereafter, traffic will go from X to
H2 via a ring LSP, then to Z via LDP. However, this
convergence could take a long time. Since this is a very
common and important situation, it is again a useful problem
to solve.
It is not anticipated that either the notion of MPLS rings or
the extensions to various protocols to support them will cause
new security loopholes. As this document is updated, this
section will also be updated.
Many thanks to Pierre Bichon whose exemplar of self-organizing
networks and whose urging for ever simpler provisioning led to
the notion of promiscuous nodes.
There are no requests as yet to IANA for this document.