The transport layer is not just another layer. It is the heart of the whole protocol
hierarchy. Its task is to provide reliable, cost-effective data transport from the
source machine to the destination machine, independent of the physical network
or networks currently in use. Without the transport layer, the whole concept of
layered protocols would make little sense. In this chapter we will study the transport
layer in detail, including its services, design, protocols, and performance.
In the following sections we will provide an introduction to the transport service.
We look at what kind of service is provided to the application layer (or session
layer, if one exists), and especially how one can characterize the quality of
service. Then we will look at how applications access the transport service, that
is, what the interface is like.
`6.1.1. Services Provided to the Upper Layers
`The ultimate goal of the transport layer is to provide efficient, reliable, and
`cost-effective service to its users, normally processes in the application layer. To
`achieve this goal, the transport layer makes use of the services provided
`SEC. 6.1
`layer on top of the network layer that improves the quality of the service. If a
`transport entity is informed halfway through a long transmission that its network
`connection has been abruptly terminated, with no indication of what has happened
`to the data currently in transit, it can set up a new network connection to the
`remote transport entity. Using this new network connection, it can send a query to
`its peer asking which data arrived and which did not, and then pick up from where
`it left off.
In essence, the existence of the transport layer makes it possible for the transport
service to be more reliable than the underlying network service. Lost packets
and mangled data can be detected and compensated for by the transport layer.
Furthermore, the transport service primitives can be designed to be independent of
the network service primitives, which may vary considerably from network to network
(e.g., connectionless LAN service may be quite different from connection-oriented
WAN service).
`Thanks to the transport layer, it is possible for application programs to be
`written using a standard set of primitives, and to have these programs work on a
`wide variety of networks, without having to worry about dealing with different
`subnet interfaces and unreliable transmission. If all real networks were flawless
`and all had the same service primitives, the transport layer would probably not be
`needed. However, in the real world it fulfills the key function of isolating the
`upper layers from the technology, design, and imperfections of the subnet.
`For this reason, many people have made a distinction between layers 1
`through 4 on the one hand, and layer(s) above 4 on the other. The bottom four
`layers can be seen as the transport service provider, whereas the upper layer(s)
`are the transport service user. This distinction of provider versus user has a
`considerable impact on the design of the layers and puts the transport layer in a
`key position, since it forms the major boundary between the provider and user of
`the reliable data transmission service.
`6.1.2. Quality of Service
`Another way of looking at the transport layer is to regard its primary function
`as enhancing the QoS (Quality of Service) provided by the network layer. If the
`network service is impeccable, the transport layer has an easy job. If, however,
`the network service is poor, the transport layer has to bridge the gap between what
`the transport users want and what the network layer provides.
While at first glance, quality of service might seem like a vague concept (getting
everyone to agree what constitutes "good" service is a nontrivial exercise),
`QoS can be characterized by a number of specific parameters, as we saw in Chap.
`5. The transport service may allow the user to specify preferred, acceptable, and
`minimum values for various service parameters at the time a connection is set up.
`Some of the parameters also apply to connectionless transport. It is up to the
`transport layer to examine these parameters, and depending on the kind of
`CHAP. 6
`network service or services available to it, determine whether it can provide the
`required service. In the remainder of this section we will discuss some possible
`QoS parameters. They are summarized in Fig. 6-2. Note that few networks or
`protocols provide all of these parameters. Many just try their best to reduce the
`residual error rate and leave it at that. Others have elaborate QoS architectures
`(Campbell et al., 1994).
Connection establishment delay
Connection establishment failure probability
Throughput
Transit delay
Residual error ratio
Protection
Priority
Resilience
Fig. 6-2. Typical transport layer quality of service parameters.
`The Connection establishment delay is the amount of time elapsing between a
`transport connection being requested and the confirmation being received by the
user of the transport service. It includes the processing delay in the remote transport
entity. As with all parameters measuring a delay, the shorter the delay, the
better the service.
The Connection establishment failure probability is the chance of a connection
not being established within the maximum establishment delay time, for
`example, due to network congestion, lack of table space somewhere, or other
`internal problems.
`The Throughput parameter measures the number of bytes of user data
`transferred per second, measured over some time interval. The throughput is
`measured separately for each direction.
The Transit delay measures the time between a message being sent by the
transport user on the source machine and its being received by the transport user
on the destination machine. As with throughput, each direction is handled
separately.
The Residual error ratio measures the number of lost or garbled messages as
a fraction of the total sent. In theory, the residual error ratio should be zero, since
it is the job of the transport layer to hide all network layer errors. In practice it
may have some (small) finite value.
`The Protection parameter provides a way for the transport user to specify
`interest in having the transport layer provide protection against unauthorized third
`parties (wiretappers) reading or modifying the transmitted data.
`The Priority parameter provides a way for a transport user to indicate that
`some of its connections are more important than other ones, and in the event of
`congestion, to make sure that the high-priority connections get serviced before the
`low-priority ones.
Finally, the Resilience parameter gives the probability of the transport layer
itself spontaneously terminating a connection due to internal problems or congestion.
`The QoS parameters are specified by the transport user when a connection is
`requested. Both the desired and minimum acceptable values can be given. In
`some cases, upon seeing the QoS parameters, the transport layer may immediately
`realize that some of them are unachievable, in which case it tells the caller that the
`connection attempt failed, without even bothering to contact the destination. The
`failure report specifies the reason for the failure.
`In other cases, the transport layer knows it cannot achieve the desired goal
`(e.g., 600 Mbps throughput), but it can achieve a lower, but still acceptable rate
`(e.g., 150 Mbps). It then sends the lower rate and the minimum acceptable rate to
the remote machine, asking to establish a connection. If the remote machine cannot
handle the proposed value, but it can handle a value above the minimum, it
may make a counteroffer. If it cannot handle any value above the minimum, it
rejects the connection attempt. Finally, the originating transport user is informed
of whether the connection was established or rejected, and if it was established,
the values of the parameters agreed upon.
This process is called option negotiation. Once the options have been negotiated,
they remain that way throughout the life of the connection. To keep customers
from being too greedy, most carriers have the tendency to charge more
money for better quality service.
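The negotiation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real protocol: the names Offer, negotiate, local_max, and remote_max are hypothetical, a single throughput parameter stands in for the whole parameter set, and a counteroffer exactly equal to the minimum is treated as acceptable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    desired: float  # rate the transport user would like (e.g., in Mbps)
    minimum: float  # lowest rate the transport user will accept


def negotiate(offer: Offer, local_max: float, remote_max: float) -> Optional[float]:
    """Return the agreed rate, or None if the connection attempt fails.

    local_max and remote_max are hypothetical limits on what the local
    transport layer and the remote machine can sustain.
    """
    # The local transport layer can fail the attempt at once, without
    # contacting the destination, if even the minimum is unachievable.
    if local_max < offer.minimum:
        return None
    # Otherwise it proposes the best rate it can itself sustain.
    proposed = min(offer.desired, local_max)
    # The remote machine accepts the proposal, counteroffers, or rejects.
    if remote_max >= proposed:
        return proposed
    if remote_max >= offer.minimum:
        return remote_max  # counteroffer that still meets the minimum
    return None  # no acceptable value: reject
```

With the figures used in the text (600 Mbps desired, 150 Mbps achievable) and an assumed minimum of 100 Mbps, negotiate(Offer(600, 100), 150, 1000) settles on 150, while a remote limit of 120 yields a counteroffer of 120 and a remote limit of 50 yields a rejection.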
`6.1.3. Transport Service Primitives
The transport service primitives allow transport users (e.g., application programs)
to access the transport service. Each transport service has its own access
`primitives. In this section, we will first examine a simple (hypothetical) transport
`service and then look at a real example.
`The transport service is similar to the network service, but there are also some
`important differences. The main difference is that the network service is intended
`to model the service offered by real networks, warts and all. Real networks can
`lose packets, so the network service is generally unreliable.
`The (connection-oriented) transport service, in contrast, is reliable. Of course,
`real networks are not error-free, but that is precisely the purpose of the transport
layer: to provide a reliable service on top of an unreliable network.
`As an example, consider two processes connected by pipes in UNIX. They
`assume the connection between them is perfect. They do not want to know about
`acknowledgements, lost packets, congestion, or anything like that. What they
`want is a 100 percent reliable connection. Process A puts data into one end of the
`pipe, and process B takes it out of the other. This is what the connection-oriented
transport service is all about: hiding the imperfections of the network service so
that user processes can just assume the existence of an error-free bit stream.
As an aside, the transport layer can also provide unreliable (datagram) service,
but there is relatively little to say about that, so we will concentrate on the
connection-oriented transport service in this chapter.
`A second difference between the network service and transport service is
`whom the services are intended for. The network service is used only by the
`transport entities. Few users write their own transport entities, and thus few users
`or programs ever see the bare network service. In contrast, many programs (and
thus programmers) see the transport primitives. Consequently, the transport service
must be convenient and easy to use.
`To get an idea of what a transport service might be like, consider the five
`primitives listed in Fig. 6-3. This transport interface is truly bare bones but it
`gives the essential flavor of what a connection-oriented transport interface has to
`do. It allows application programs to establish, use, and release connections,
`which is sufficient for many applications.
Primitive     TPDU sent            Meaning
LISTEN        (none)               Block until some process tries to connect
CONNECT       CONNECTION REQ.      Actively attempt to establish a connection
SEND          DATA                 Send information
RECEIVE       (none)               Block until a DATA TPDU arrives
DISCONNECT    DISCONNECTION REQ.   This side wants to release the connection
Fig. 6-3. The primitives for a simple transport service.
`To see how these primitives might be used, consider an application with a
`server and a number of remote clients. To start with, the server executes a LISTEN
`primitive, typically by calling a library procedure that makes a system call to
`block the server until a client turns up. When a client wants to talk to the server,
`it executes a CONNECT primitive. The transport entity carries out this primitive by
blocking the caller and sending a packet to the server. Encapsulated in the payload
of this packet is a transport layer message for the server's transport entity.
`A quick note on terminology is now in order. For lack of a better term, we
`will reluctantly use the somewhat ungainly acronym TPDU (Transport Protocol
`Data Unit) for messages sent from transport entity to transport entity. Thus
`TPDUs (exchanged by the transport layer) are contained in packets (exchanged by
`the network layer). In turn, packets are contained in frames (exchanged by the
`data link layer). When a frame arrives, the data link layer processes the frame
`header and passes the contents of the frame payload field up to the network entity.
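The nesting of TPDUs in packets and packets in frames can be pictured with a small Python sketch. The class names, the header contents, and the helpers encapsulate, data_link_receive, and network_receive are all hypothetical; the network-layer step is assumed to mirror the data link step described above.

```python
from dataclasses import dataclass


@dataclass
class TPDU:
    payload: bytes  # data exchanged between transport entities


@dataclass
class Packet:
    header: bytes  # network layer header (contents are illustrative)
    payload: TPDU


@dataclass
class Frame:
    header: bytes  # data link layer header (contents are illustrative)
    payload: Packet


def encapsulate(data: bytes) -> Frame:
    """Wrap user data in a TPDU, the TPDU in a packet, the packet in a frame."""
    return Frame(header=b"DL", payload=Packet(header=b"NET", payload=TPDU(data)))


def data_link_receive(frame: Frame) -> Packet:
    """Process the frame header, then hand the payload to the network entity."""
    return frame.payload


def network_receive(packet: Packet) -> TPDU:
    """Assumed analogous step: hand the packet payload to the transport entity."""
    return packet.payload
```

Calling network_receive(data_link_receive(encapsulate(b"hello"))) recovers the TPDU whose payload is the original user data.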
to send, but it is still willing to accept data from its partner. In this model, a connection
is released when both sides have done a DISCONNECT.
A state diagram for connection establishment and release for these simple
primitives is given in Fig. 6-5. Each transition is triggered by some event, either a
primitive executed by the local transport user or an incoming packet. For simplicity,
we assume here that each TPDU is separately acknowledged. We also
assume that a symmetric disconnection model is used, with the client going first.
Please note that this model is quite unsophisticated. We will look at more realistic
models later on.
Fig. 6-5. A state diagram for a simple connection management scheme. Transitions
labeled in italics are caused by packet arrivals. The solid lines show the
client's state sequence. The dashed lines show the server's state sequence.
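A state diagram of this kind maps naturally onto a transition table. The Python sketch below is one possible encoding, offered as an assumption rather than a transcription of the figure: the state names other than IDLE and the exact set of transitions are illustrative, and the acknowledgement of individual TPDUs is left out.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ACTIVE_ESTABLISHMENT_PENDING = auto()
    PASSIVE_ESTABLISHMENT_PENDING = auto()
    ESTABLISHED = auto()
    ACTIVE_DISCONNECT_PENDING = auto()
    PASSIVE_DISCONNECT_PENDING = auto()


class Event(Enum):
    CONNECT_PRIMITIVE = auto()            # executed by the local user
    DISCONNECT_PRIMITIVE = auto()         # executed by the local user
    CONNECTION_REQUEST_TPDU = auto()      # packet arrival
    CONNECTION_ACCEPTED_TPDU = auto()     # packet arrival
    DISCONNECTION_REQUEST_TPDU = auto()   # packet arrival


S, E = State, Event
TRANSITIONS = {
    (S.IDLE, E.CONNECT_PRIMITIVE): S.ACTIVE_ESTABLISHMENT_PENDING,
    (S.ACTIVE_ESTABLISHMENT_PENDING, E.CONNECTION_ACCEPTED_TPDU): S.ESTABLISHED,
    (S.IDLE, E.CONNECTION_REQUEST_TPDU): S.PASSIVE_ESTABLISHMENT_PENDING,
    (S.PASSIVE_ESTABLISHMENT_PENDING, E.CONNECT_PRIMITIVE): S.ESTABLISHED,
    (S.ESTABLISHED, E.DISCONNECT_PRIMITIVE): S.ACTIVE_DISCONNECT_PENDING,
    (S.ACTIVE_DISCONNECT_PENDING, E.DISCONNECTION_REQUEST_TPDU): S.IDLE,
    (S.ESTABLISHED, E.DISCONNECTION_REQUEST_TPDU): S.PASSIVE_DISCONNECT_PENDING,
    (S.PASSIVE_DISCONNECT_PENDING, E.DISCONNECT_PRIMITIVE): S.IDLE,
}


def run(events, state=State.IDLE):
    """Apply events in order; an event that is illegal in a state raises KeyError."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state


# Client going first, as assumed in the text.
CLIENT_EVENTS = [E.CONNECT_PRIMITIVE, E.CONNECTION_ACCEPTED_TPDU,
                 E.DISCONNECT_PRIMITIVE, E.DISCONNECTION_REQUEST_TPDU]
SERVER_EVENTS = [E.CONNECTION_REQUEST_TPDU, E.CONNECT_PRIMITIVE,
                 E.DISCONNECTION_REQUEST_TPDU, E.DISCONNECT_PRIMITIVE]
```

Both run(CLIENT_EVENTS) and run(SERVER_EVENTS) end in State.IDLE, reflecting that the connection is released only after both sides have disconnected.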
`Berkeley Sockets
Let us now briefly inspect another set of transport primitives, the socket primitives
used in Berkeley UNIX for TCP. They are listed in Fig. 6-6. Roughly
speaking, they follow the model of our first example but offer more features and
flexibility. We will not look at the corresponding TPDUs here. That discussion
will have to wait until we study TCP later in this chapter.
Primitive   Meaning
SOCKET      Create a new communication end point
BIND        Attach a local address to a socket
LISTEN      Announce willingness to accept connections; give queue size
ACCEPT      Block the caller until a connection attempt arrives
CONNECT     Actively attempt to establish a connection
SEND        Send some data over the connection
RECEIVE     Receive some data from the connection
CLOSE       Release the connection
Fig. 6-6. The socket primitives for TCP.
The first four primitives in the list are executed in that order by servers. The
SOCKET primitive creates a new end point and allocates table space for it within
the transport entity. The parameters of the call specify the addressing format to
be used, the type of service desired (e.g., reliable byte stream), and the protocol.
A successful SOCKET call returns an ordinary file descriptor for use in succeeding
calls, the same way an OPEN call does.
`Newly created sockets do not have addresses. These are assigned using the
`BIND primitive. Once a server has bound an address to a socket, remote clients
`can connect to it. The reason for not having the SOCKET call create an address
`directly is that some processes care about their address (e.g., they have been using
`the same address for years and everyone knows this address), whereas others do
`not care.
`Next comes the LISTEN call, which allocates space to queue incoming calls for
`the case that several clients try to connect at the same time. In contrast to LISTEN
`in our first example, in the socket model LISTEN is not a blocking call.
`To block waiting for an incoming connection, the server executes an ACCEPT
`primitive. When a TPDU asking for a connection arrives, the transport entity
`creates a new socket with the same properties as the original one and returns a file
`descriptor for it. The server can then fork off a process or thread to handle the
`connection on the new socket and go back to waiting for the next connection on
`the original socket.
`Now let us look at the client side. Here, too, a socket must first be created
`using the SOCKET primitive, but BIND is not required since the address used does
`not matter to the server. The CONNECT primitive blocks the caller and actively
`starts the connection process. When it completes (i.e., when the appropriate
TPDU is received from the server), the client process is unblocked and the connection
is established. Both sides can now use SEND and RECEIVE to transmit and
receive data over the full-duplex connection.
Connection release with sockets is symmetric. When both sides have executed
a CLOSE primitive, the connection is released.
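The order of the socket calls on each side can be seen in a short Python sketch that uses the standard socket module, whose functions correspond closely to the primitives above. It is a minimal illustration, assuming a loopback connection, a single client, and a message small enough to arrive in one receive; the helper names serve_once and echo_upper are hypothetical.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server side: handle exactly one connection, then return."""
    conn, _addr = listener.accept()      # ACCEPT: block until a client connects
    with conn:                           # leaving the block performs CLOSE
        data = conn.recv(1024)           # RECEIVE
        conn.sendall(data.upper())       # SEND


def echo_upper(message: bytes) -> bytes:
    """Run a one-shot server and a client in the same process."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
    listener.bind(("127.0.0.1", 0))      # BIND: port 0 lets the OS pick a free port
    listener.listen(5)                   # LISTEN: queue size 5, does not block
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)    # SOCKET
    client.connect(("127.0.0.1", port))  # CONNECT: no BIND needed on the client
    client.sendall(message)              # SEND
    reply = client.recv(1024)            # RECEIVE (assumes a short reply)
    client.close()                       # CLOSE

    server.join()
    listener.close()
    return reply
```

A production server would loop on accept and hand each new socket to a separate process or thread, as described above; the single-shot form is used here only to keep the example short.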
`The transport service is implemented by a transport protocol used between
`the two transport entities. In some ways, transport protocols resemble the data
link protocols we studied in detail in Chap. 3. Both have to deal with error control,
sequencing, and flow control, among other issues.
However, significant differences between the two also exist. These differences
are due to major dissimilarities between the environments in which the two
protocols operate, as shown in Fig. 6-7. At the data link layer, two routers communicate
directly via a physical channel, whereas at the transport layer, this physical
channel is replaced by the entire subnet. This difference has many important
implications for the protocols.
Fig. 6-7. (a) Environment of the data link layer. (b) Environment of the transport
layer.
`For one thing, in the data link layer, it is not necessary for a router to specify
which router it wants to talk to; each outgoing line uniquely specifies a particular
`router. In the transport layer, explicit addressing of destinations is required.
`For another thing, the process of establishing a connection over the wire of
`Fig. 6-7(a) is simple: the other end is always there (unless it has crashed, in which
`case it is not there). Either way, there is not much to do. In the transport layer,
`initial connection establishment is more complicated, as we will see.
`Another, exceedingly annoying, difference between the data link layer and the
`transport layer is the potential existence of storage capacity in the subnet. When a
`router sends a frame, it may arrive or be lost, but it cannot bounce around for a
`while, go into hiding in a far corner of the world, and then suddenly emerge at an
inopportune moment 30 sec later. If the subnet uses datagrams and adaptive routing
inside, there is a nonnegligible probability that a packet may be stored for a
number of seconds and then delivered later. The consequences of this ability of
the subnet to store packets can sometimes be disastrous and require the use of special
protocols.
`A final difference between the data link and transport layers is one of amount
`rather than of kind. Buffering and flow control are needed in both layers, but the
`presence of a large and dynamically varying number of connections in the
`SEC. 6.2
`transport layer may require a different approach than we used in the data link
`layer. In Chap. 3, some of the protocols allocate a fixed number of buffers to each
line, so that when a frame arrives there is always a buffer available. In the transport
layer, the larger number of connections that must be managed makes the idea
of dedicating many buffers to each one less attractive. In the following sections,
`we will examine all of these important issues and others.
`6.2.1. Addressing
When an application process wishes to set up a connection to a remote application
process, it must specify which one to connect to. (Connectionless transport
has the same problem: To whom should each message be sent?) The method normally
used is to define transport addresses to which processes can listen for connection
requests. In the Internet, these end points are (IP address, local port)
pairs. In ATM networks, they are AAL-SAPs. We will use the neutral term
TSAP (Transport Service Access Point). The analogous end points in the network
layer (i.e., network layer addresses) are then called NSAPs. IP addresses
are examples of NSAPs.
Figure 6-8 illustrates the relationship between the NSAP, TSAP, network connection,
and transport connection for a connection-oriented subnet (e.g., ATM).
Note that a transport entity normally supports multiple TSAPs. On some networks,
multiple NSAPs also exist, but on others each machine has only one NSAP
`(e.g., one IP address). A possible connection scenario for a transport connection
`over a connection-oriented network layer is as follows.
1. A time-of-day server process on host 2 attaches itself to TSAP 122 to
wait for an incoming call. How a process attaches itself to a TSAP is
outside the networking model and depends entirely on the local
operating system. A call such as our LISTEN might be used, for
example.
2. An application process on host 1 wants to find out the time-of-day,
so it issues a CONNECT request specifying TSAP 6 as the source and
TSAP 122 as the destination.
3. The transport entity on host 1 selects a network address on its
machine (if it has more than one) and sets up a network connection
between them. (With a connectionless subnet, establishing this network
layer connection would not be done.) Using this network connection,
host 1's transport entity can talk to the transport entity on
host 2.
4. The first thing the transport entity on 1 says to its peer on 2 is:
"Good morning. I would like to establish a transport connection
between my TSAP 6 and your TSAP 122. What do you say?"
5. The transport entity on 2 then asks the time-of-day server at TSAP
122 if it is willing to accept a new connection. If it agrees, the transport
connection is established.
Note that the transport connection goes from TSAP to TSAP, whereas the network
connection only goes part way, from NSAP to NSAP.
Fig. 6-8. TSAPs, NSAPs, and connections.
`The picture painted above is fine, except we have swept one little problem
`under the rug: How does the user process on host 1 know that the time-of-day
`server is attached to TSAP 122? One possibility is that the time-of-day server has
`been attaching itself to TSAP 122 for years, and gradually all the network users
`have learned this. In this model, services have stable TSAP addresses which can
`be printed on paper and given to new users when they join the network.
`While stable TSAP addresses might work for a small number of key services
`that never change, in general, user processes often want to talk to other user
`processes that only exist for a short time and do not have a TSAP address that is
known in advance. Furthermore, if there are potentially many server processes,
most of which are rarely used, it is wasteful to have each of them active and
listening to a stable TSAP address all day long. In short, a better scheme is
needed.
`One such scheme, used by UNIX hosts on the Internet, is shown in Fig. 6-9 in a
`simplified form. It is known as the initial connection protocol. Instead of every
`conceivable server listening at a well-known TSAP, each machine that wishes to
and the name server sends back the TSAP address. Then the user releases the
connection with the name server and establishes a new one with the desired service.
In this model, when a new service is created, it must register itself with the
name server, giving both its service name (typically an ASCII string) and the
address of its TSAP. The name server records this information in its internal database,
so that when queries come in later, it will know the answers.
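The register-then-query behavior just described amounts to a table lookup. The Python sketch below is a minimal illustration under stated assumptions: the class NameServer, its methods, the service name "time-of-day", and the use of plain integers as TSAP addresses are all hypothetical, and the network exchange with the name server is omitted.

```python
from typing import Dict, Optional


class NameServer:
    """In-memory stand-in for the name server's internal database."""

    def __init__(self) -> None:
        self._table: Dict[str, int] = {}

    def register(self, service_name: str, tsap: int) -> None:
        """Called by a new service to record its name and TSAP address."""
        self._table[service_name] = tsap

    def lookup(self, service_name: str) -> Optional[int]:
        """Answer a query; None means the service is not registered."""
        return self._table.get(service_name)


name_server = NameServer()
name_server.register("time-of-day", 122)
```

A user who knows only the name would call name_server.lookup("time-of-day") to learn the TSAP address, and would then set up a new connection to that TSAP.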
`The function of the name server is analogous to the directory assistance
operator in the telephone system: it provides a mapping of names onto numbers.
Just as in the telephone system, it is essential that the address of the well-known
TSAP used by the name server (or the process server in the initial connection protocol)
is indeed well known. If you do not know the number of the information
`operator, you cannot call the information operator to find it out. If you think the
`number you dial for information is obvious, try it in a foreign country some time.
`Now let us suppose that the user has successfully located the address of the
`TSAP to be connected to. Another interesting question is how does the local
`transport entity know on which machine that TSAP is located? More specifically,
`how does the transport entity know which network layer address to use to set up a
network connection to the remote transport entity that manages the TSAP requested?
The answer depends on the structure of TSAP addresses. One possible structure
is that TSAP addresses are hierarchical addresses. With hierarchical
`addresses, the address consists of a sequence of fields used to disjointly partition
`the address space. For example, a truly universal TSAP address might have the
`following structure:
`address = <galaxy> <star> <planet> <country> <network> <host> <port>
`With this scheme, it is straightforward to locate a TSAP anywhere in the known
`universe. Equivalently, if a TSAP address is a concatenation of an NSAP address
`and a port (a local identifier specifying one of the local TSAPs), then when a
`transport entity is given a TSAP address to connect to, it uses the NSAP address
`contained in the TSAP address to reach the proper remote transport entity.
As a simple example of a hierarchical address, consider the telephone number
19076543210. This number can be parsed as 1-907-654-3210, where 1 is a country
code (United States + Canada), 907 is an area code (Alaska), 654 is an end
office in Alaska, and 3210 is one of the "ports" (subscriber lines) in that end
office.
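Parsing a hierarchical address is just a matter of slicing out its fields. The Python sketch below does this for the telephone number in the example; the function name parse_phone_number and the fixed 1-3-3-4 field widths are assumptions that fit this particular number format only.

```python
from typing import Dict


def parse_phone_number(number: str) -> Dict[str, str]:
    """Split an 11-digit number into its hierarchical fields.

    The field widths (1, 3, 3, 4) are an assumption matching the example.
    """
    if len(number) != 11 or not number.isdigit():
        raise ValueError("expected exactly 11 digits")
    return {
        "country": number[0],        # country code
        "area": number[1:4],         # area code
        "end_office": number[4:7],   # end office within the area
        "line": number[7:],          # subscriber line ("port")
    }
```

For 19076543210 the fields come out as country "1", area "907", end office "654", and line "3210". Each field narrows the search to a smaller part of the address space, which is what makes routing on such an address straightforward.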
`The alternative to a hierarchical address space is a flat address space. If the
`TSAP addresses are not hierarchical, a second level of mapping is needed to
locate the proper machine. There would have to be a name server that took transport
addresses as input and returned network addresses as output. Alternatively,
`in some situations (e.g., on a LAN), it is possible to broadcast a query asking the
`destination machine to please identify itself by sending a packet.
`6.2.2. Establishing a Connection
`Establishing a connection sounds easy, but it is actually surprisingly tricky.
At first glance, it would seem sufficient for one transport entity to just send a CONNECTION
REQUEST TPDU to the destination and wait for a CONNECTION
`ACCEPTED reply. The problem occurs when the network can lose, store, and
`duplicate packets.
`Imagine a subnet that is so congested that acknowledgements hardly ever get
`back in time, and each packet times out and is retransmitted two or three times.
Suppose that the subnet uses datagrams inside, and every packet follows a different
route. Some of the packets might get stuck in a traffic jam inside the subnet
and take a long time to arrive; that is, they are stored in the subnet and pop out
`much later.
`The worst possible nightmare is as follows. A user establishes a connection
`with a bank, sends messages telling the bank to transfer a large amount of money
to the account of a not-entirely-trustworthy person, and then releases the connection.
Unfortunately, each packet in the scenario is duplicated and stored in the
subnet. After the connection has been released, all the packets pop out of the subnet
and arrive at the destination in order, asking the bank to establish a new connection,
transfer money (again), and release the connection. The bank has no way
of telling that these are duplicates. It must assume that this is a second, independent
transaction, and transfers the money again. For the remainder of this section
we will study the problem of delayed duplicates, with special emphasis on algorithms
for establishing connections in a reliable way, so that nightmares like the
`one above cannot happen.
`The crux of the problem is the existence of delayed duplicates. It can be
attacked in various ways, none of them very satisfactory. One way is to use
throwaway transport addresses. In this approach, each time a transport address is
needed, a new one is generated. When a connection is released, the address is discarded.
This strategy makes the process server model of Fig. 6-9 impossible.
`Another possibility is to give each connection a connection identifier (i.e., a
sequence number incremented for each connection established), chosen by the initiating
party, and put in each TPDU, including the one requesting the connection.
After each connection is released, each transport entity could update a table listing
obsolete connections as (peer transport entity, connection identifier) pairs. Whenever
a connection request came in, it could be checked against the table, to see if
it belonged to a previously released connection.
Unfortunately, this scheme has a basic flaw: it requires each transport entity
`to maintain a certain amount of history information indefinitely. If a machine
`crashes and loses its memory, it will no longer know which connection identifiers
`have already been used.
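Both the connection identifier scheme and its flaw can be shown in a few lines of Python. This is a sketch under stated assumptions: the class TransportEntity, its method names, and the peer name "host1" are hypothetical, and a crash is modeled simply as clearing the in-memory table.

```python
from typing import Set, Tuple


class TransportEntity:
    """Remembers released connections so that delayed duplicates can be refused."""

    def __init__(self) -> None:
        # Table of obsolete (peer transport entity, connection identifier) pairs.
        self.obsolete: Set[Tuple[str, int]] = set()

    def release(self, peer: str, conn_id: int) -> None:
        """Record a connection as obsolete when it is released."""
        self.obsolete.add((peer, conn_id))

    def accept_request(self, peer: str, conn_id: int) -> bool:
        """Accept a connection request unless it names a released connection."""
        return (peer, conn_id) not in self.obsolete

    def crash(self) -> None:
        """Model a crash that wipes memory: all history is lost."""
        self.obsolete.clear()


entity = TransportEntity()
entity.release("host1", 7)
duplicate_refused = not entity.accept_request("host1", 7)   # True
entity.crash()
duplicate_accepted_after_crash = entity.accept_request("host1", 7)  # True
```

Before the crash, the delayed duplicate request for connection 7 is refused; after it, the same request is accepted, which is exactly the weakness noted above. Keeping the table on stable storage would avoid this, but the history would still have to be retained indefinitely.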
`Instead, we need to take a different tack. Rather than allowing packets to live
`forever within the subnet, we must devise a mechanism to kill off aged packets
`that are still wandering about. If we can ensure that no packet lives longer than
`some known time, the problem becomes somewhat more manageable.
Packet lifetime can be restricted to a known maximum using one of the following
techniques:
1. Restricted subnet design.
2. Putting a hop counter in each packet.
3. Timestamping each packet.
The first method includes any method that prevents packets from looping, combined
with some way of bounding congestion delay over the (now known) longest
possible path. The second method consists of having the hop count incremented
each time the packet is forwarded. The data link protocol simply discards any
packet whose hop counter has exceeded a certain value. The third method
requires each packet to bear the time it was created, with the routers agreeing to
discard any packet older than some agreed upon time. This latter method requires
the router clocks to be synchronized, which itself is a nontrivial task unless synchronization
is achieved external to the network, for example by listening to
WWV or some other radio station that broadcasts the precise time periodically.
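The hop counter method is simple enough to sketch directly in Python. The limit MAX_HOPS, the dictionary representation of a packet, and the helper names forward and route are assumptions chosen for illustration.

```python
from typing import Dict, Optional

MAX_HOPS = 3  # hypothetical limit; a real value depends on the subnet's diameter


def forward(packet: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Increment the hop count; discard the packet once the limit is exceeded."""
    hops = packet["hops"] + 1
    if hops > MAX_HOPS:
        return None  # packet is killed here and travels no further
    return {**packet, "hops": hops}


def route(packet: Dict[str, int], forwards: int) -> Optional[Dict[str, int]]:
    """Forward a packet the given number of times, stopping if it is discarded."""
    current: Optional[Dict[str, int]] = packet
    for _ in range(forwards):
        current = forward(current)
        if current is None:
            return None
    return current
```

A packet that starts with a hop count of 0 survives three forwards but is discarded on the fourth, so even a packet caught in a routing loop cannot circulate indefinitely.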
In practice, we will need to guarantee not only that a packet is dead, but also
`that all acknowledgements to it are also dead, so we will now introduce T, which
`is some small multiple of the true maximum packet lifetime. The multiple is
`protocol-dependent and