`http://ocw.mit.edu
`
`6.033 Computer System Engineering
`Spring 2009
`
`For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
`
`1
`
`
`
`Massachusetts Institute of Technology
`
`Department of Electrical Engineering and Computer Science
`
`
`6.033, Spring 2009
`
`Wide-Area Internet Routing
`
`January 2009
`
`Abstract
`
`Our goal is to explain how routing between different administrative domains works in the In
`ternet. We discuss how Internet Service Providers (ISPs) exchange routing information, packets,
`and (above all) money between each other, and how the way in which they buy service from and
`sell service to each other and their customers influences routing. We discuss the salient features
`of the Border Gateway Protocol, Version 4 (BGP4, which we will refer to simply as BGP), the
`current interdomain routing protocol in the Internet. Finally, we discuss a few interesting fail
`ures and shortcomings of the routing system. These notes focus only on the essential elements
`of interdomain routing, often sacrificing detail for clarity and sweeping generality.
`
`1 Autonomous Systems
`
`Network attachment points for Internet hosts are identified using IP addresses, which are 32-bit (in
`IPv4) numbers. The fundamental reason for the scalability of the Internet’s network layer is its use of
`topological addressing. Unlike an Ethernet address that is location-independent and always identifies
`a host network interface independent of where it is connected in the network, an IP address of a
`connected network interface depends on its location in the network topology. Topological addressing
`allows routes to Internet destinations to be aggregated in the forwarding tables of the routers (as a
`simple example of aggregation, any IP address of the form 18.* in the Internet is in MIT’s network,
`so external routers need only maintain a routing entry for 18.* to correctly forward packets to
`MIT’s network), and allows routes to be summarized and exchanged by the routers participating in
`the Internet’s routing protocols. In the absence of aggressive aggregation, there would be no hope
`for scaling the routing system to hundreds of millions of hosts connected over tens of millions of
`networks located in tens of thousands of ISPs and organizations.
`
`Routers use address prefixes to identify a contiguous range of IP addresses in their routing messages.
`For example, an address prefix of the form 18.31.* identifies the IP address range 18.31.0.0 to
`18.31.255.255, a range whose size is 216. Each routing message involves a router telling a neighboring
`router information of the form “To get to address prefix P , you can use the link on which you
`heard me tell you this,”1 together with information about the route (the information depends on
`the routing protocol and could include the number of hops, cost of the route, other ISPs on the
`path, etc.).
`
`An abstract, highly idealized view of the Internet is shown in Figure 1, where end-hosts hook up
`to routers, which hook up with other routers to form a nice connected graph of essentially “peer”
`routers that cooperate nicely using routing protocols that exchange “shortest-path” or similar
`information and provide global connectivity. The same view posits that the graph induced by the
`routers and their links has a large amount of redundancy and the Internet’s routing algorithms are
`
`1It turns out that this simple description is not always true, notably in the iBGP scheme discussed later, but it’s
`true most of the time.
`
`c� 2001-2009 Hari Balakrishnan
`
`2
`
`
`
`designed to rapidly detect faults and problems in the routing substrate and route around them.
`In addition to routing around failures, clever routing protocols performing load-sensitive routing
`dynamically shed load away from congested paths on to less-loaded paths.
`
`Unfortunately, while simple, this abstraction is actually quite misleading as far as the wide-area
`Internet is concerned. The real story of the Internet routing infrastructure is that the Internet
`service is provided by a large number of commercial enterprises, generally in competition with
`each other. Cooperation, required for global connectivity, is generally at odds with the need to be
`a profitable commercial enterprise, which often occurs at the expense of one’s competitors—the
`same people with whom one needs to cooperate. How this “competitive cooperation” is achieved
`in practice (although there’s lots of room for improvement), and how we might improve things,
`provides an interesting study of how good technical research can be shaped and challenged by
`commercial realities.
`
`Figure 2 depicts a more realistic view of what the Internet infrastructure looks like. Here, Internet
`Service Providers (ISPs) cooperate to provide global connectivity to their respective customer
`networks. An important point about ISPs is that they aren’t all equal; they come in a variety of
`sizes and internal structure. Some are bigger and more “connected” than others, and others have
`global reachability in their routing tables. A simplified, but useful, model is that they come in three
`varieties, “small,” “large,” and “really huge”, and these colloquial descriptions have names given to
`them. Tier-3 ISPs have a small number of usually localized (in geography) end-customers, Tier-2
`ISPs generally have regional scope (e.g., state-wide, region-wide, or non-US country-wide), while
`Tier-1 ISPs, of which there are a handful (nine in early 2008), have global scope in the sense that
`their routing tables actually have explicit routes to all currently reachable Internet prefixes (i.e.,
`they have no default routes). Figure shows this organization.
`
`The previous paragraph used the term “route”, which we define as a mapping from an IP prefix
`(a range of addresses) to a link, with the property that packets for any destination within that
`prefix may be sent along the corresponding link. A router adds such entries to its routing table
`after selecting the best way to reach each prefix from among route advertisements sent by its
`neighboring routers.
`
`Routers exchange route advertisements with each other using a routing protocol. The current wide
`area routing protocol in the Internet, which operates between routers at the boundary between
`ISPs, is BGP (Border Gateway Protocol, Version 4) [11, 12]. More precisely, the wide-area routing
`architecture is divided into autonomous systems (ASes) that exchange reachability information. An
`AS is owned and administered by a single commercial entity, and implements some set of policies
`in deciding how to route its packets to the rest of the Internet, and how to export its routes (its
`own, those of its customers, and other routes it may have learned from other ASes) to other ASes.
`Each AS is identified by a unique 16-bit number.
`
`A different routing protocol operates within each AS. These routing protocols are called Interior
`Gateway Protocols (IGPs), and include protocols like the Routing Information Protocol (RIP) [6],
`Open Shortest Paths First (OSPF) [8], Intermediate System-Intermediate System (IS-IS) [9], and
`E-IGRP. In contrast, BGP is an interdomain routing protocol. Operationally, a key difference
`between BGP and IGPs is that the former is concerned with exchanging reachability information
`between ASes in a scalable manner while allowing each AS to implement autonomous routing
`policies, whereas the latter are typically concerned with optimizing a path metric. In general, IGPs
`don’t scale as well as BGP does with respect to the number of participants involved.
`
`The rest of this paper is in two parts: first, we will look at inter-AS relationships (transit and
`peering); then, we will study some salient features of BGP. We won’t talk about IGPs like RIP and
`
`c
`� 2001-2009 Hari Balakrishnan
`
`3
`
`
`
`E
`
`A
`
`End-hosts
`
`“Internet”
`
`C
`
`Routers connected
`in a fault-tolerant
`structure
`
`B
`
`D
`
`Courtesy of Hari Balakrishnan. Used with permission.
`
`Figure 1: A rather misleading view of the Internet routing system.
`
`OSPF; to learn more about them, read a standard networking textbook (e.g., Peterson & Davie or
`Kurose & Ross).
`
`2
`
`Inter-AS Relationships: Transit and Peering
`
`The Internet is composed of many different types of ASes, from universities to corporations to
`regional ISPs to nation-wide ISPs. Smaller ASes (e.g., , universities, corporations, etc.) typically
`purchase Internet connectivity from ISPs. Smaller regional ISPs, in turn, purchase connectivity
`from larger ISPs with large “backbone” networks.
`
`Consider the picture shown in Figure 3. It shows an ISP, X, directly connected to a provider (from
`whom it buys Internet service) and a few customers (to whom it sells Internet service). In addition,
`the figure shows two other ISPs to whom it is directly connected, with whom X exchanges routing
`information via BGP.
`
`The different types of ASes lead to different types of business relationships between them, which
`in turn translate to different policies for exchanging and selecting routes. There are two prevalent
`forms of AS-AS interconnection. The first form is provider-customer transit (aka “transit”), wherein
`one ISP (the “provider” P in Figure 3) provides access to all (or most) destinations in its routing
`tables. Transit almost always is meaningful in an inter-AS relationship where financial settlement is
`involved; the provider charges its customers for Internet access, in return for forwarding packets on
`behalf of customers to destinations (and in the opposite direction in many cases). Another example
`of a transit relationship in Figure 3 is between X and its customers (the Cis).
`
`The second prevalent form of inter-AS interconnection is called peering. Here, two ASes (typically
`ISPs) provide mutual access to a subset of each other’s routing tables. The subset of interest here
`is their own transit customers (and the ISPs own internal addresses). Like transit, peering is a
`business deal, but it may not involve financial settlement. While paid peering is common in some
`parts of the world, in many cases they are reciprocal agreements. As long as the traffic ratio between
`the concerned ASs is not highly asymmetric (e.g., 4:1 is a commonly believed and quoted ratio),
`
`c� 2001-2009 Hari Balakrishnan
`
`4
`
`
`
`End-hosts (ISP customers)
`
`Tier-3 ISP
`(“Local”)
`Customer
`Provider
`
`Tier-2 ISP
`
`Provider
`
`Tier-2 ISP
`(“Regional or
`country-wide)
`
`Customer
`
`Tier-1 ISP
`(“Default-free”;
`Has global reachability info)
`
`(Another) Tier-1 ISP
`
`Tier-2 ISP
`
`Figure 2: A more accurate picture of the wide-area Internet routing system, with various types of
`ISPs defined by their respective reach. Tier-1 ISPs have “default-free” routing tables (i.e., they
`don’t have any default routes), and have global reachability information. There are a handful of
`these today (nine in early 2008).
`
`there’s usually no financial settlement. Peering deals are almost always under non-disclosure and
`are confidential.
`
`2.1 Peering v. Transit
`
`A key point to note about peering relationships is that they are often between business competitors.
`There are two main reasons for peering relationships:
`
`1. Tier-1 peering: An Internet that had only provider-customer transit relationships would
`require a single (large) Tier-1 at the root, because cycles in the directed graph of ASes don’t
`make sense (a cycle would require that the money paid by each AS to its provider would flow
`from an AS back to itself). That single Tier-1 would in effect be a monopoly because it would
`control all Internet traffic (before the commercialization of the Internet, the NSFNET was in
`fact such a backbone). Commercial competition has led to the creation of a handful of large
`Tier-1 ISPs (nine today, up from five a few years ago). In effect, these ISPs act as a sort of
`cartel because an ISP has to be of a certain size and control a reasonable chunk of traffic to
`be able to enter into a peering relationship with any other Tier-1 provider. Peering between
`Tier-1 ISPs ensures that they have explicit default-free routes to all Internet destinations
`(prefixes).
`
`2. Saving money: Peering isn’t restricted to large Tier-1’s, though. If a non-trivial fraction of
`the packets emanating from a an AS (regardless of its size) or its customers is destined for
`another AS or its customers, then each AS has an incentive to avoid paying transit costs to
`its respective providers. Of course, the best thing for each AS to do would be to wean away
`
`c� 2001-2009 Hari Balakrishnan
`
`5
`
`
`
`C’P
`
`Transit ($$)
`
`ISP
`P
`
`Transit ($$$)
`
`Peering
`
`Transit ($$$)
`
`Peering
`
`Z
`
`Transit ($)
`
`ISP
`X
`
`Peering
`
`Y
`
`Transit ($$)
`
`Transit ($$)
`
`Z’s customers
`
`C1
`
`C3
`
`C2
`
`Y’s customers
`
`Figure 3: Common inter-AS relationships: transit and peering.
`
`the other’s customers, but that may be hard to achieve. The next best thing, which would be
`in their mutual interest, would be to avoid paying transit costs to their respective providers,
`but instead set up a transit-free link between each other to forward packets for their direct
`customers. In addition, this approach has the advantage that this more direct path would
`lead to better end-to-end performance (in terms of latency, packet loss rate, and throughput)
`for their customers.
`
`Balancing these potential benefits are some forces against peering. Transit relationships generate
`revenue; peering relationships usually don’t. Peering relationships typically need to be renegotiated
`often, and asymmetric traffic ratios require care to handle in a way that’s mutually satisfactory.
`Above all, these relationships are often between competitors vying for the same customer base.
`
`In the discussion so far, we have implicitly used an important property of current interdomain
`routing: A route advertisement from B to A for a destination prefix is an agreement by B that it
`will forward packets sent via A destined for any destination in the prefix. This (implicit) agreement
`implies that one way to think about Internet economics is to view ISPs as charging customers for
`entries in their routing tables. Of course, the data rate of the interconnection is also crucial, and is
`the major determinant of an ISP’s pricing policy.
`
`A word on pricing is in order. ISPs charge for Internet access in one of two ways. The first is a
`fixed price for a certain speed of access (fixed pricing is a common way to charge for home or small
`business access in the US). The second measures traffic and charges according to the amount of
`bandwidth used. A common approach is to measure usage in 5-minute windows over the duration
`of the pricing period (typically one month). These 5-minute averages are samples that form a
`distrubution. The ISP then calculates the 95th (or 90th) percentile of this distribution and charges
`an amount that depends on this value, using the terms set in the contract. ISPs also offer discounts
`for outages experienced; some ISPs now have delay guarantees (through their network) set up in
`the contract for certain kinds of traffic. It’s unclear to what extent customer networks can and do
`go about verifying that providers comply with the all terms set in their contract.
`
`c
`� 2001-2009 Hari Balakrishnan
`
`6
`
`
`
`The take-away lesson here is simple: providers make more money from a customer if the amount
`of traffic sent on behalf of the customer increases. This simple observation is the basis for a large
`number of complex routing policies that one sees in practice. In the rest of this section, we describe
`some common examples of routing policy.
`
`2.2 Exporting Routes to Make or Save Money
`
`Each AS (ISP) needs to make decisions on which routes to export to its neighboring ISPs using
`BGP. The reason why export policies are important is that no ISP wants to act as transit for
`packets that it isn’t somehow making money on. Because packets flow in the opposite direction to
`the (best) route advertisement for any destination, an AS should advertise routes to neighbors with
`care.
`
`Transit customer routes. To an ISP, its customer routes are likely the most important, because
`the view it provides to its customers is the sense that all potential senders in the Internet can
`reach them. It is in the ISP’s best interest to advertise routes to its transit customers to as many
`other connected ASes as possible. The more traffic that an ISP carries on behalf of a customer, the
`“fatter” the pipe that the customer would need, implying higher revenue for the ISP. Hence, if a
`destination were advertised from multiple neighbors, an ISP should prefer the advertisement made
`from a customer over all other choices (in particular, over peers and transit providers).
`
`Transit provider routes. Does an ISP want to provide transit to the routes exported by its
`provider to it? Most likely not, because the ISP isn’t making any money on providing such transit
`is a customer of P , and P
`facilities. An example of this situation is shown in Figure 3, where CP
`� to X. It isn’t in X’s interest to advertise this route to everyone, e.g.,
`has exported a route to CP
`to other ISPs with whom X has a peering relationship. An important exception to this, of course,
`is X’s transit customers who are paying X for service—the service X provides its customers Ci’s is
`that they can reach any location on the Internet via X, so it makes sense for X to export as many
`routes to X as possible.
`
`�
`
`Peer routes. It usually makes sense for an ISP to export only selected routes from its routing tables
`to other peering ISPs. It obviously makes sense to export routes to all of ones transit customers.
`It also makes sense to export routes to addresses within an ISP. However, it does not make sense
`to export an ISP’s transit provider routes to other peering ISPs, because that may cause a peering
`ISP to use the advertising ISP to reach a destination advertised by a transit provider. Doing so
`would expend ISP resources but not lead to revenue.
`
`The same situation applies to routes learned from other peering relationships. Consider ISP Z in
`Figure 3, with its own transit customers. It doesn’t make sense for X to advertise routes to Z’s
`customers to another peering ISP (Y ), because X doesn’t make any money on Y using X to get
`packets to Z’s customers!
`
`These arguments show that most ISPs end up providing selective transit: typically, full transit capa
`bilities for their own transit customers in both directions, some transit (between mutual customers)
`in a peering relationship, and transit only for one’s transit customers (and ISP-internal addresses)
`to one’s providers.
`
`The discussion so far may make it sound like BGP is the only way in which to exchange reachability
`information between an ISP and its customers or between two ASes. That is not true in practice,
`though; a large fraction of end-customers (typically customers who don’t provide large amounts of
`further transit and/or aren’t ISPs) don’t run BGP sessions with their providers. The reason is that
`
`c� 2001-2009 Hari Balakrishnan
`
`7
`
`
`
`BGP is complicated to configure, administer, and manage, and isn’t particularly useful if the set
`of addresses in the customer’s network is relatively invariant. These customers interact with their
`providers via static routes. These routes are usually manually configured. Of course, information
`about customer address blocks will in general be exchanged by a provider using BGP to other ASes
`(ISPs) to achieve global reachability to the customer premises.
`
`For the several thousands of networks that run BGP, deciding which routes to export is an important
`policy decision. Such decisions are expressed using route filters, which are rules that decide which
`routes an AS’s routers should filter to routers of neighboring ASes.
`
`2.3
`
`Importing Routes to Make or Save Money
`
`In addition to deciding how to filter routes while exporting them, when a router hears many possible
`routes to a destination network, it needs to decide which route to import into its forwarding tables.
`The problem boils down to ranking routes to each destination prefix.
`
`Deciding which routes to import is a fairly involved process in BGP and requires a consideration
`of several attributes of the advertised routes. For the time being, we consider only one of the many
`things that an AS needs to consider, but it’s the most important concern, viz., who advertised the
`route?
`
`Typically, when a router (e.g., X in Figure 3) hears advertisements about its transit customers
`from other ASes (e.g., because the customer is multi-homed), it needs to ensure that packets to
`the customer do not traverse additional ASes unnecessarily. The main reason is that an AS doesn’t
`want to spend money paying its providers for traffic destined to its direct customer, and wants
`to increase its value as perceived by the customer. This requirement usually means that customer
`routes are prioritized over routes to the same network advertised by providers or peers. Second,
`peer routes are likely more preferable to provider routes, because the purpose of peering was to
`exchange reachability information about mutual transit customers. These two observations imply
`that typically routes are imported in the following priority order:
`
`customer > peer > provider
`
`This rule (and many others like it) can be implemented in BGP using a special attribute that’s
`locally maintained by routers in an AS, called the LOCAL PREF attribute. The first rule in route
`selection with BGP is to rank routes according to the LOCAL PREF attribute and pick the one
`with the highest value. It is only when this attribute is not set for a route that other attributes of
`a route even considered in the ranking procedure.
`
`That said, in practice most routes are not selected using the LOCAL PREF attribute; other attributes
`like the length of the AS path tend to be quite common. We discuss these other route attributes
`and the details of the BGP route selection process, also called the decision process, when we talk
`about the main idea in BGP.
`
`2.4 Routing Policy = Ranking + Filtering
`
`Network operators express a wide range of routing policies, but to first approximation, most of
`them can be boiled down to ranking decisions and export filters.
`
`c� 2001-2009 Hari Balakrishnan
`
`8
`
`
`
`3 BGP
`
`We now turn to how reachability information is exchanged using BGP, and see how routing policies
`like the ones explained in the previous section can be expressed and realized. We start with a
`discussion of the main design goals in BGP and summarize the protocol. Most of the complexity
`in wide-area routing is not in the protocol, but in how BGP routers are configured to implement
`policy, and in how routes learned from other ASes are disseminated within an AS. The rest of the
`section discusses these issues.
`
`3.1 Design Goals
`
`In the old NSFNET, the backbone routers exchanged routing information over a tree topology,
`using a routing protocol called EGP (Exterior Gateway Protocol). Because the backbone routing
`information was exchanged over a tree, the routing protocol was relatively simple. The evolution
`of the Internet from a singly administered backbone to its current commercial structure made the
`NSFNET EGP obsolete and required a more sophisticated protocol.
`
`The design of BGP was motivated by three important needs:
`
`1. Scalability. The division of the Internet into ASes under independent administration was
`done while the backbone of the then Internet was under the administration of the NSFNet.
`When the NSFNet was “turned off” in the early 1990s and the Internet routing infrastructure
`opened up to competition in the US, a number of ISPs providing different sizes sprang up.
`The growth in the number of networks (and hosts) has continued to date. To support this
`growth, routers must be able to handle an increasing number of prefixes, BGP must ensure
`that the amount of advertisement traffic scales well with “churn” in the network (parts of
`the network going down and coming up), and BGP must converge to correct loop-free paths
`within a reasonable amount of time after any change. Achieving these goals is not easy.
`
`2. Policy. The ability for each AS to implement and enforce various forms of routing policy was
`an important design goal. One of the consequences of this was the development of the BGP
`attribute structure for route announcements, allowing route filtering, and allowing each AS
`to rank its available routes arbitrarily.
`
`3. Cooperation under competitive circumstances. BGP was designed in large part to
`handle the transition from the NSFNet to a situation where the “backbone” Internet infras
`tructure would no longer be run by a single administrative entity. This structure implies that
`the routing protocol should allow ASes to make purely local decisions on how to route packets,
`from among any set of choices. Moreover, BGP was designed to allow each AS to keep its
`ranking and filtering policies confidential from other ASes.
`
`But what about security? Ensuring the authenticity and integrity of messages was understood to
`be a good goal, but there was a pressing need to get a reasonable routing system in place before the
`security story was fully worked out. Efforts to secure BGP, notably S-BGP [7], have been worked
`out and involve external registries and infrastructure to maintain mappings between prefixes and
`the ASes that own them, as well as the public keys for ASes. These approaches have not been
`deployed in the Internet for a variety of reasons, including the fact that existing routing registries
`tend to have a number of errors and omissions (so people can’t trust them a great deal).
`
`c� 2001-2009 Hari Balakrishnan
`
`9
`
`
`
`Misconfigurations and malice cause connectivity outages from time to time because routing to
`various destinations gets fouled up. The next section gives some examples of past routing problems
`that have made the news; those examples illustrate the adage that complex systems fail for complex
`reasons.
`
`3.2 Protocol Details
`
`As protocols go, BGP is not an overly complicated protocol (as we’ll see later, what makes its
`operation complicated is the variety and complexity of BGP router configurations). The basic
`operation of BGP—the protocol state machine, the format of routing messages, and the propagation
`of routing updates—are all defined in the protocol standard (RFC 4271, which obsoletes RFC
`1771) [13]. BGP runs over TCP on a well-known port (179). To start participating in a BGP
`session with another router, a router sends an OPEN message after establishing a TCP connection
`to it on the BGP port. After the OPEN is completed, both routers exchange their tables of all
`active routes (of course, applying all applicable route filtering rules). Each router then integrates
`the information obtained from its neighbor into its routing table. The entire process may take many
`seconds to a few minutes to complete, especially on sessions that have a large number of active
`routes.
`
`After this initialization, there are two main types of messages on the BGP session. First, BGP
`routers send route UPDATE messages sent on the session. These updates only send any routing
`entries that have changed since the last update (or transmission of all active routes). There are
`two kinds of updates: announcements, which are changes to existing routes or new routes, and
`withdrawals, which are messages that inform the receiver that the named routes no longer exist. A
`withdrawal usually happens when some previously announced route can no longer be used (e.g.,
`because of a failure or a change in policy). Because BGP uses TCP, which provides reliable and
`in-order delivery, routes do not need to be periodically announced, unless they change.
`
`But, in the absence of periodic routing updates, how does a router know whether the neighbor at
`the other end of a session is still functioning properly? One possible solution might be for BGP to
`run over a transport protocol that implements its own “is the peer alive” message protocol. Such
`messages are also called “keepalive” messages. TCP, however, does not implement a transport-layer
`“keepalive” (with good reason), so BGP uses its own. Each BGP session has a configurable keepalive
`timer, and the router guarantees that it will attempt to send at least one BGP message during that
`time. If there are no UPDATE messages, then the router sends the second type of message on the
`session: KEEPALIVE messages. The absence of a certain number BGP KEEPALIVE messages on a
`session causes the router to terminate that session. The number of missing messages depends on
`a configurable times called the hold timer; the specification recommends that the hold timer be at
`least as long as the keepalive timer duration negotiated on the session.
`
`More details about the BGP state machine may be found in [2, 13].
`
`Unlike many IGP’s, BGP does not simply optimize any metrics like shortest-paths or delays. Be
`cause its goals are to provide reachability information and enable routing policies, its announcements
`do not simply announce some metric like hop-count. Rather, they have the following format:
`
`IP pref ix : Attributes
`
`where for each announced IP prefix (in the “A/m” format), one or more attributes are also an
`nounced. There are a number of standardized attributes in BGP, and we’ll look at some of them
`in more detail below.
`
`c
`� 2001-2009 Hari Balakrishnan
`
`10
`
`
`
`iBGP
`
`R2
`
`R1
`
`Info
`about D
`
`eBGP
`
`D
`
`Figure 4: eBGP and iBGP.
`
`We already talked about one BGP attribute, LOCAL PREF. This attribute isn’t disseminated with
`route announcements, but is an important attribute used locally while selecting a route for a
`destination. When a route is advertised from a neighboring AS, the receiving BGP router consults
`its configuration and may set a LOCAL PREF for this route.
`
`3.3
`
`eBGP and iBGP
`
`There are two types of BGP sessions: eBGP sessions are between BGP-speaking routers in different
`ASes, while iBGP sessions are between BGP routers in the same AS. They serve different purposes,
`but use the same protocol.
`
`eBGP is the “standard” mode in which BGP is used; after all, BGP was designed to exchange
`network routing information between different ASes in the Internet. eBGP sessions are shown in
`Figure 4, where the BGP routers implement route filtering rules and exchange a subset of their
`routes with routers in other ASes. These sessions generally operate over a one-hop IP path (i.e.,
`over directly connected IP links).
`
`In general, each AS will have more than one router that participates in eBGP sessions with neigh
`boring ASes. During this process, each router will obtain information about some subset of all the
`prefixes that the entire AS knows about. Each such eBGP router must disseminate routes to the
`external prefix to all the other routers in the AS. This dissemination must be done with care to
`meet two important goals:
`
`1. Loop-free forwarding. After the dissemination of eBGP learned routes, the resulting routes
`(and the subsequent forwarding paths of packets sent along those routes) picked by all routers
`should be free of deflections and forwarding loops [3, 5].
`
`2. Complete visibility. One of the goals of BGP is to allow each AS to be treated as a single
`monolithic entity. This means that the several eBGP-speaking routes in the AS must exchange
`external route information so that they have a complete view of all external routes. For
`instance, consider Figure 4, and prefix D. Router R2 needs to know how to forward packets
`
`c� 2001-2009 Hari Balakrishnan
`
`11
`
`
`
`Figure 5: Small ASes establish a “full mesh” of iBGP sessions. Each circle represents a router within
`an AS. Only eBGP-learned routes are re-advertised over iBGP sessions.
`
`destined for D, but R2 hasn’t heard a direct announcement on any of its eBGP sessions
`for D. 2 By “complete visibility”, we mean the following: for every external destination, each
`router picks the same route that it would have picked had it seen the best routes from each
`eBGP router in the AS.
`
`The dissemination of externally learned routes to routers inside an AS is done over internal
`BGP (iBGP) sessions running in each AS.
`
`An important question concerns the topology over which iBGP sessions should be run. One pos
`sibility is to use an arbitrary connected graph and “flood” updates of external routes to all BGP
`routers in an AS. Of cours, an approach based on flooding would require additional techniques
`to avoid routing loops. The original BGP specification solved this problem by simply setting up
`a full mesh of iBGP sessions (see Figure 5, where every eBGP router