`Kemp et al.
`
`(10) Patent N0.:
`(45) Date of Patent:
`
`US 6,621,799 B1
`Sep. 16, 2003
`
`US006621799B1
`
`(54) SEMI-RELIABLE DATA TRANSPORT
`
`(75)
`
`Inventors: Bradford H. Kemp, Salem, NH (US);
`Benjamin E. McCann, Acton, MA
`(US)
`
`(73) Assignee: Enterasys Networks, Inc., Andover,
`MA (US)
`
`( * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 09/167,097
`
`(22) Filed:
`
`Oct. 5, 1998
`
`Int. Cl.7 ................................................ .. H04B 1/44
`(51)
`(52) U.S. Cl.
`...................... .. 370/282; 370/236; 709/237
`(58) Field of Search ............................... .. 709/203, 224,
`709/230, 235, 237; 714/748; 380/212; 370/469,
`236, 235, 216, 230, 229, 394, 389, 231,
`248, 282
`
`(56)
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`...................... .. 714/748
`9/1996 Miller
`5,553,083 A
`4/1998 Kirchner et al.
`..
`709/237
`5,745,685 A *
`6/2000 Wesley ........... ..
`709/235
`6,076,114 A *
`8/2000 Davis et al.
`709/224
`6,105,064 A *
`8/2001 Ben—DaVid ................ .. 709/230
`6,273,622 B1 *
`OTHER PUBLICATIONS
`
`
`
`Braden, R., “TCP Extensions for High Performance: An
`Update”, Jun. 21, 1993.
`Brakmo et al., “TCP Vegas: New Techniques for Congestion
`Detection and Avoidance”, Dept. of Computer Science,
`University of Arizona, Tucson, AZ, 1996.
`Brakmo et al., “TCP Vegas: End to End Congestion Avoid-
`ance on a Global Internet”, Dept. of Computer Science,
`University of Arizona, Tucson, AZ.
`Brakmo et al., “Performance Problems in BSD4.4 TCP”,
`Dept. of Computer Science, University of Arizona, Tucson,
`AZ.
`
`Fall et al., “Comparisons of Tahoe, Reno, and Sack TCP”,
`Lawrence Berkeley National Laboratory, Berkeley, CA,
`Dec. 2, 1995.
`Floyd, S., “TCP and Successive Fast Retransmits” Lawrence
`Berkeley Laboratory, Berkeley, CA, May 1995.
`Floyd et al., “Increasing TCP’s Initial Window”, Jul. 1997.
`Floyd S., “Issues of TCP with SACK”, Mar. 9, 1996.
`Hanks, S., “Generic Routing Encapsulation (GRE)”, Oct.
`1994.
`
`Jacobson, V., “Re: interpacket arrival variance and mean”,
`e—mail message, Jun. 15, 1987.
`Jacobson, V., “Re: your congestion scheme”, e—mail mes-
`sage, Nov. 1987.
`
`(List continued on next page.)
`
`Primary Exarniner—Hassan Kizou
`Assistant Examiner—John Pezzlo
`
`(74) Attorney, Agent, or Firm—Fish & Richardson P.C.
`
`(57)
`
`ABSTRACT
`
`A new type of communication protocol provides semi-
`reliable transport of data over a data channel, such as over
`the Internet. The new type of protocol limits the number of
`retransmissions of unsuccessfully delivered data and may
`eventually “give up” on successfully delivering particular
`data and go on sending subsequent data to the destination.
`When a reliable communication protocol, such as TCP/IP is
`tunneled between two computers over a virtual connection
`which uses the new type of semi-reliable protocol, overall
`error control of data passing between the two computers
`involves elements of error control implemented by both the
`semi-reliable protocol and the reliable protocol. This overall
`error control can provide higher throughput than provided
`by using either a completely reliable protocol (e.g., TCP) for
`the virtual connection, or a completely unreliable protocol
`(e.g., UDP) for the virtual connection. This advantage can be
`even more pronounced if the data stream is compressed or
`encrypted before being passed over the virtual connection
`using a technique which maintains state from one data
`packet to another.
`
`25 Claims, 14 Drawing Sheets
`
`GRE
`£9
`
`PP? ‘
`TO __.I wan: QUEUE
`MODULE
`310
`
`410
`
`420
`
`cwnn
`l:”m432
`KETRANSMIT QUEUE
`I
`
`3%
`MUX
`330-
`
`9
`
`412
`
`412
`412
`450
`no
`\T|MER
`
`450‘ ACK
`-“Mm
`
`:
`42;,
`r——— _
`1
`I
`I "5 I
`I
`DONE,
`476
`I J
`!
`QUEUED
`'-
`I
`474
`1
`i
`I
`I
`I
`472 I
`470
`|
`_;
`SACK—
`i
`f\CK-
`i
`QUEUED]]
`QUEUED
`L j : ¢ _I_ 7 —‘ J
`READ ousue
`I
`l
`44,2 442
`
`442
`
`T0OREMUX
`330
`
`TO
`PPPMODULE
`310
`
`BROADCOM l 0 0 5
`
`
`
`US 6,621,799 B1
`Page 2
`
`OTHER PUBLICATIONS
`
`Jacobson, V., “Dynamic Congestion Avoidance/Control”,
`e—mail message, Feb. 1988.
`Jacobson, V., “TCP Extensions for Long—Delay Paths”, Oct.
`1988.
`
`Jacobson, V., “Modified TCP Congestion Avoidance Algo-
`rithm”, e—mail message, Apr. 1990.
`Jacobson, V., “Design Changes to the Kernel Network
`Architecture for 4.4BSD”, Lawrence Berkeley Laboratory,
`Berkeley, CA May 1992.
`Jacobson, V., “Some Design Issues for High—Speed Net-
`works”, Lawrence Berkeley Laboratory, Berkeley, CA Nov.
`1993.
`
`Jacobson, V., “End2End”, e—mail message, Mar. 1994.
`Jacobson, V., “TCP Extensions for High Performance”,
`e—mail message, Feb. 1997.
`Mathis et al., “Forward Acknowledgement: Refining TCP
`Congestion Control”, Pittsburgh Supercomputing Center.
`Mathis et al., “TCP Rate-Halving with Bounding Param-
`eters”, Pittsburgh Supercomputing Center, Oct. 1996.
`Mathis et al., “TCP Rate-Halving with Bounding Param-
`eters”, Pittsburgh Supercomputing Center, Oct. 1996.
`Mathis et al., “TCP Selective Acknowledgement Options”,
`e—mail message, Oct. 1996.
`Meyer, G., “The PPP Encryption Control Protocol (ECP)”,
`Spider Systems, e—mail message, Jun. 1996.
`Partridge et al., “A Faster UDP”, IEEE/ACM Trans. on
`Networking, Aug. 1993.
`
`Rand, D., “The PPP Compression Control Protocol (CCP)”,
`Novell, e—mail message, Jun. 1996.
`Rizzo, L., “Issues in the implementation of selective
`acknowledgements for TCP”, e—mail message, Jan. 1996.
`Sharma et al., “Scalable Timers for Soft State Protocols”,
`Information Sciences Institute, University of Southern Cali-
`fornia.
`
`Simpson, W., “The Point-to-Point Protocol (PPP)”, Day-
`dreamer, Jul. 1994.
`Stevens, W., “TCP Slow Start, Congestion Avoidance, Fast
`Retransmit, and Fast Recovery Algorithms”, NOAO, Jan.
`1997.
`
`TCP Selective Acknowledgement option (and related
`changes) for FreeBSD, Sep. 1997.
`Network Working Group Request for Comments: 1072,
`“TCP Extensions for Long—Delay Paths”, Sep. 1997.
`Ideal Congestion Control, Sep. 1997.
`Marasli et al., “Partially Reliable Transport Service”, Pro-
`ceedings, Second IEEE Symposium on Computers and Com-
`munications (Cat. No. 97TBZ 00137), Proceedings Second
`IEEE Symposium on Computer and Communications, Alex-
`andria, Egypt, Jul. 1-3, 1997, pp. 648-656.
`Marasli et al., “Retransmission-Based Partially Reliable
`Transport Service: An Analytic Model”, Proceedings of
`Infocom, US, Los Alamitos, IEEE Comp. Soc. Press, vol.
`Conf. 15, 1996, pp. 621-629.
`
`* cited by examiner
`
`
`
`U.S. Patent
`
`Sep. 16, 2003
`
`Sheet 1 of 14
`
`US 6,621,799 B1
`
`
`
`COM-PUTER
`
`19.9
`
`
`
`WORKING
`
`I PROCESSOR
`MEMORY
`
`
`
`
`
`
`
`
`
`
`
`
`COMPUTER
`
`10$}
`
`PROGRAM
`SHORTAGE
`
`NETWORK
`INTERFACE
`
`
`
`
`104
`
`PROGRAM
`SHOR TAGE
`
`106 1
`
`102
`
`NETWORK
`INTERFACE
`
`COMPUTER
`
`
`
`
`100
`--
`
`106
`
`INTERFACE
`
`PRIOR ART
`
`FIG.
`
`1
`
`
`
`
`
`WORKING
`
`MEMORY
`
`PROGRAM
`SHORTAGE
`
`T
`
`R PROLESSOR
`
`102
`
`
`
`
`
`U.S. Patent
`
`Sep. 16, 2003
`
`Sheet 2 of 14
`
`US 6,621,799 B1
`
`COMPUTER
`
`192
`
`OTHER
`TRANSPORT
`MODULES
`
`222:
`
`‘
`
`I
`
`
`
`TUNNEL
`
`MODULE
`
`— -—-J
`
`PRIOR ART
`
`FIG. 2
`
`TO INTERNET 120
`
`
`
`U.S. Patent
`
`Sep. 16, 2003
`
`Sheet 3 of 14
`
`US 6,621,799 B1
`
`[:
`
`IP MODULE
`
`F230
`
`PPi
`
`
`
`
`
`I MODULE
`
`
` [____.__. I I0[:2 1% I I L__
`
`,
`
`GRE
`MODULE
`
`U0 Ix.) Q
`
`MODULE
`
`ORE
`
`FIG. 3
`
`
`
`
`
`MODULE
`
`MODULE ppp
`
`"'1
`
`U0 --I C
`
`
`
`U.S. Patent
`
`Sep. 16, 2003
`
`Sheet 4 of 14
`
`US 6,621,799 B1
`
`cm:
`
`3_29
`
`410
`
`420
`
`wane QUEUE
`I _ Z
`,
`
`412
`
`412
`412
`ATO
`_
`450 TIMER
`
`43°
`432
`r———’=——-H
`RETRANSMIT QUEUE
`I:lCl
`
`(:55
`MUX
`330
`
`422-.‘
`I-———-— ——
`I
`I 478
`l
`I
`DON!-2-
`[-
`I
`QUEUED
`:
`K
`'
`5
`i I
`
`
`
`1
`
`I
`I
`'
`J
`
`TO
`GRE
`MUX
`
`330
`
`476
`
`I
`
`474
`
`450 ACK
`
`AC“
`
`470
`
`472 I
`
`l_ _ _ _
`
`__ _ __1
`
`440
`
`READ Queue
`I I
`
`44
`
`2
`
`442 442
`
`FIG. 4
`
`To
`PPP
`MODULE
`
`310
`
`To
`PPP
`MODULE
`310
`
`
`
`U.S. Patent
`
`Sep. 16, 2003
`
`Sheet 5 of 14
`
`US 6,621,799 B1
`
`501
`
`502
`
`503
`
`504
`
`505
`
`505
`
`507
`
`508
`
`Receive Packet from PPP 310:
`
`IF Retransmit Queue 420 is not full THEN
`
`Transmit Packet
`
`ELSE.
`
`Notify PPP module 310 that the communication path is congested
`
`IF Write Queue 410 is not full THEN
`
`Save Packet in Write Queue 410
`
`END IF
`
`END IF
`
`F I G . 5
`
`Transmit Packet:
`
`em
`
`602
`
`603
`
`Append Packet to Retransmit Queue 420
`
`Build Header
`
`Send Packet with Header to GRE Mux 330
`
`FIG. 6
`
`
`
`U.S. Patent
`
`Sep. 16, 2003
`
`Sheet 6 of 14
`
`Us 6,621,799 B1
`
`701
`
`702
`
`703
`
`704
`
`705
`
`706
`
`707
`
`708
`
`709
`
`710
`
`711
`
`712
`
`713
`
`714
`
`715
`
`715
`
`Build Header:
`
`IF Packet has a data payIoad THEN
`
`Record sequence number in Header sequence number field
`
`END IF
`
`IF dane_queued 476 THEN
`
`Record done 478 in Header done field
`
`done_queued476 = False
`
`END IF
`
`IF sack_queued 472 AND read queue 440 has entries THEN
`
`Record sequence numbers of packets in read queue 440 in
`
`Header sack field
`
`END IF
`
`sac/Lqueued472 = False
`
`IF ack_queued 470 THEN
`
`Record ack 474 In Header ack field
`
`acIr_queued 470 = false
`
`END IF
`
`FIG. 7
`
`
`
`U.S. Patent
`
`Sep. 16,2003
`
`Sheet 7 of 14
`
`US 6,621,799 B1
`
`1301
`
`802
`
`803
`
`804
`
`805
`
`806
`
`807
`
`808
`
`809
`
`310
`
`811
`
`Receive Packet from GRE Mux 330:
`
`Process Received Header
`
`IF Received Packet has a data payload THEN
`
`Process Received Payload
`
`END IF
`
`IF the Received Header has a sack THEN
`
`Process retransmit queue 420
`
`ENDIF
`
`IF the Received Header has an ack THEN
`
`Process write queue 410
`
`ENDIF
`
`Process Pending acks and sacks
`
`FIG. 8
`
`
`
`U.S. Patent
`
`Sep. 16, 2003
`
`Sheet 8 of 14
`
`US 6,621,799 B1
`
`901
`
`902
`
`903
`
`904
`
`905
`
`906
`
`907
`
`908
`
`909
`
`910
`
`911
`
`912
`
`913
`
`914
`
`915
`
`915
`
`917
`
`913
`
`919
`
`920
`
`921
`
`922
`
`Process Received Header:
`
`lF Header has a done field THEN
`
`Send packets in read queue 440 with sequence numbers less than or
`
`equal to done field to PPP module 310
`
`aclr_queued 470 = True
`
`ad: 474 = later of dune field and current value of ack 474
`
`END IF
`
`IF Header has a sack field THEN
`
`Reduce cwnd 430
`
`LOOP over sacked packets in retransmit queue 420 D0
`
`Mark packet for no further retransmissions
`
`END LOOP
`
`done_queued 476 = True
`
`END IF
`
`lF Header has an ack field THEN
`
`increase cwnd 430
`
`Remove packets with sequence numbers less than or equal
`
`to ack field from retransmit queue 420
`
`done 478 = later of ack field and cwrent value of done 478
`
`IF retransmit queue 422 is empty THEN
`
`done_queued476 = False
`
`END IF
`
`END IF
`
`FIG. 9
`
`
`
`U.S. Patent
`
`Sep. 16, 2003
`
`Sheet 9 of 14
`
`US 6,621,799 B1
`
`1001
`
`1002
`
`1003
`
`1004
`
`1005
`
`1006
`
`1007
`
`1008
`
`1009
`
`1010
`
`1011
`
`1012
`
`1013
`
`1014
`
`1015
`
`1016
`
`1017
`
`1018
`
`Process Received Payload:
`
`IF packet is a dupiicate packet THEN
`
`Discard the payload
`
`ELSE
`
`Insert packet into read queue 440
`
`IF inserted packet is not the latest sequence number in queue THEN
`
`ar:k_queued 470 = True
`
`END IF
`
`END IF
`
`LOOP over earliest irrsequence packets in read queue 440 D0
`
`Send packet to PPP module 310
`
`Increment ad: 474
`
`acIr_queued 470 = True
`
`END LOOP
`
`IF there are any remaining packets in read queue 440 THEN
`
`IF a sack has not been sent in the last RTT interval THEN
`
`sack_queued 472 = True
`
`END IF
`
`END IF
`
`FIG. 10
`
`
`
`U.S. Patent
`
`Sep. 16, 2003
`
`Sheet 10 of 14
`
`US 6,621,799 B1
`
`1101
`
`1102
`
`1103
`
`1104
`
`1105
`
`1106
`
`1107
`
`1108
`
`1109
`
`1110
`
`1111
`
`1112
`
`Process Retransmil Queue 420:
`
`LOOP over fll'S'£ cwnd 430 packets in retransmit queue 420 D0
`
`IF packet is a candidate for retransmission THEN
`
`increment number of retries for packet
`
`Build Header
`
`Send packet with header to GRE Mux 330
`
`END JF
`
`END LOOP
`
`IF no packets were retransmitted THEN
`
`dane_queued 476 = True
`
`Build Header
`
`Send packet with header but no payload to GRE Mux 330
`
`END IF
`
`FIG. 11
`
`
`
`U.S. Patent
`
`Sep. 16, 2003
`
`Sheet 11 of 14
`
`US 6,621,799 B1
`
`Process write Queue 410:
`
`IF there are any packets in write queue 410 THEN
`
`Transmit next packet in write queue 410
`
`IF retransmit queue 422 still has Iess than cwnd 430 packets THEN
`
`LOOP UNTIL transmit queue 422 is fun DO
`
`Transmit next packet in write queue 410
`
`END LOOP
`
`END IF
`
`END IF
`
`IF retransmit queue has less than 1/2 cwnd 430 THEN
`
`Notify PPP module 310 than channel is decongested
`
`END IF
`
`1201
`
`1202
`
`1203
`
`1204
`
`1205
`
`1206
`
`1207
`
`1208
`
`1209
`
`1210
`
`1211
`
` FIG. 12
`
`
`
`U.S. Patent
`
`Sep. 16, 2003
`
`Sheet 12 of 14
`
`US 6,621,799 B1
`
`1301
`
`1302
`
`1303
`
`1304
`
`1305
`
`1305
`
`1307
`
`1308
`
`1309
`
`1310
`
`1311
`
`Process Pending acks and sacks:
`
`!F sack_queued 472 THEN
`
`Buiid Header
`
`Send packet with header but no payioad to GRE Mux 330
`
`END IF
`
`IF ack_queued 474 AND ack_queued was True before the latest packet
`
`was received THEN
`
`Build Header
`
`Send packet with header but no payload to GRE Mm: 330
`
`ELSE
`
`Update ack timer 460
`
`END IF
`
`FIG. 13
`
`
`
`U.S. Patent
`
`Sep. 16, 2003
`
`Sheet 13 of 14
`
`US 6,621,799 B1
`
`1401
`
`1402
`
`1403
`
`1501
`
`1502
`
`1503
`
`3504
`
`Ack timer 450 Expires:
`
`Build Header
`
`Send packet with header but no payload to GRE Mux 330
`
`Reset ack timer 460
`
`FIG .
`
`1 4
`
`ATO timer 450 Expires:
`
`Process retransmit queue 420
`
`F no packets were sent from retransmit queue 420 THEN
`
`Clear retransmit queue 420
`
`END IF
`
`FIG. 15
`
`
`
`
`
`US 6,621,799 B1
`
`1
`SEMI-RELIABLE DATA TRANSPORT
`
`BACKGROUND
`
`This invention relates data transport over a data channel.
`Data is typically transported over a data channel, such as
`over a data network, using a combination of communication
`protocols. For instance, on the Internet, data is sent between
`computers coupled to the Internet according to the Internet
`Protocol (IP), a “network layer” protocol. A communication
`session between software, such as two applications, execut-
`ing on different computers typically uses a transport layer
`protocol to pass data between the computers. Two transport
`layer protocols used on the Internet are the Transport Con-
`trol Protocol (TCP) and the User Datagram Protocol (UDP).
`Both these protocols are layered on IP to pass data between
`computers.
`The TCP protocol provides reliable and in-sequence
`delivery of data from one computer to another. Based on
`acknowledgements sent back from a receiving computer, the
`sending computer retransmits data if needed. The UDP
`protocol, on the other hand, does not provide reliable or
`in-sequence delivery of data.
`Communication passing between two computers accord-
`ing to a network layer communication protocol, such as IP
`or IPX, can pass between the computers over a virtual
`connection rather than over a physical connection in a
`technique known as “tunneling.” The virtual connection
`itself uses a transport layer protocol and IP to communicate
`over the Internet. The original network layer data streams
`that are passed between them can be compressed and
`encrypted before being passed over the virtual connection.
`
`SUMMARY
`
`According to a general aspect of the invention, a new type
`of communication protocol provides semi—reliable transport
`of data over a data channel, such as over the Internet. Unlike
`transport layer protocols, such as TCP,
`in which data is
`retransmitted from a source computer to a destination com-
`puter until it is successfully delivered to and acknowledged
`by the destination computer, the new type of protocol limits
`the number of retransmissions and may eventually “give up”
`on successfully delivering particular data and go on sending
`subsequent data to the destination. On the other hand, unlike
`transport layer protocols, such as UDP, in which data is not
`retransmitted if it is not successfully delivered, the new type
`of communication protocol provides error control using
`limited numbers of retransmissions.
`
`When a reliable communication protocol, such as TCP/IP
`is tunneled between two computers over a virtual connection
`which uses the new type of semi—reliable protocol, overall
`error control of data passing between the two computers
`involves elements of error control implemented by both the
`semi—reliable protocol and the reliable protocol. Advantages
`of this overall error control can include higher throughput
`than is provided by using either a completely reliable
`protocol (e.g., TCP) for the virtual connection, or a com-
`pletely unreliable protocol (e.g., UDP) for the virtual con-
`nection. This advantage can be even more pronounced if the
`data stream is compressed or encrypted before being passed
`over the virtual connection using a technique which main-
`tains state from one data packet to another.
`In general, in one aspect, the invention is a method for
`communicating between a first software module, such as an
`application or a communication module or driver, on a first
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`computer and a second software module on a second com-
`puter over a data channel. The data channel can pass over a
`data network such as the Internet. The method involves
`establishing a communication session, for instance at a
`transport layer, coupling the first software module and the
`second software module over the data channel. The method
`then includes sending outbound data from the first software
`module to the second software module over the communi-
`cation session. Sending this outbound data includes trans-
`mitting a first packet that includes the outbound data from
`the first computer to the second computer. Prior to receiving
`an indication from the second computer that the first packet
`was successfully received, such as an acknowledgement of
`the first packet or of a set of packets including the first
`packet, the method includes transmitting a second packet
`that includes the outbound data (that is, retransmitting the
`outbound data) from the first computer to the second com-
`puter. Prior to receiving an indication from the second
`computer that the second packet was successfully received,
`the method includes transmitting a third packet that includes
`an indication, for example, in the header of the third packet,
`that indicates that
`the outbound data will not be further
`transmitted from the first computer to the second computer.
`The method can further involve receiving inbound data at
`the first software module over the communication session
`from the second software module. Receiving the inbound
`data then includes receiving a first packet from the second
`computer that
`includes the inbound data, buffering the
`inbound data, and waiting for receipt of a packet from the
`second computer that includes prior inbound data that was
`sent by the second computer prior to sending the inbound
`data. The method then includes receiving a second packet
`from the second computer that includes an indication that
`prior inbound data will not be retransmitted by the second
`computer, and providing the inbound data to the first soft-
`ware module.
`Preferred embodiments of the invention include one or
`more of the following features.
`The first software module and the second software mod-
`
`ule implement a network layer protocol over a data network.
`The first and second software modules implement a
`network layer protocol and the outbound data includes
`network layer communication. For instance,
`the first and
`second software modules can tunnel network layer commu-
`nication over the communication session between the com-
`puters.
`least one additional
`The method includes sending at
`packet that includes the outbound data (that is, retransmit-
`ting the outbound data) prior to transmitting the third packet.
`The first software module implements a state-dependent
`data processing algorithm, such as a compression or an
`encryption algorithm, in which data processing of the out-
`bound data depends on outbound data that was previously
`sent from the first software module to the second software
`module.
`
`In general, in another aspect, the invention is a method for
`passing data over a data channel from a source to a desti-
`nation. The method includes transmitting a first data packet
`from the source to the destination, retransmitting the first
`data packet from the source to the destination, and sending
`from the source to the destination an indication that the first
`data packet will not be further retransmitted. Sending the
`indication that
`the first data packet will not be further
`retransmitted can include transmitting a second data packet
`from the source to the destination which includes the indi-
`cation that
`the first data packet will not be further
`retransmitted, for instance, in the header of the second data
`packet.
`
`
`
`US 6,621,799 B1
`
`3
`Preferred embodiments of the invention can further
`include, subsequent to transmitting the first data packet,
`transmitting a second data packet from the source to the
`destination, and prior to retransmitting the first data packet,
`accepting an indication that the second data packet was
`received at the destination prior to the first data packet being
`received at the destination.
`
`In general, in another aspect, the invention is a method for
`passing data over a data channel from a source to a desti-
`nation. The method includes receiving a first packet from the
`source that includes the data, and buffering the data, while
`waiting for receipt of a packet from the source that includes
`prior data that was sent by the source prior to sending the
`data. The method then includes receiving a second packet
`from the source that includes an indication that prior data
`will not be retransmitted by the source and then providing
`the buffered inbound data to the destination.
`
`the invention is a data
`in another aspect,
`In general,
`communication module for passing data between a first
`computer and a second computer over a data channel. The
`communication module includes a retransmission storage,
`such as a queue, and a retransmitter coupled to the retrans-
`mission storage. The retransmission storage holds informa-
`tion related to a set of packets previously transmitted from
`the first computer to the second computer. The storage
`related to each of the packets includes a retransmission
`counter used to determine whether the packet is a candidate
`for retransmission to the second computer. The retransmis-
`sion counter is updated when the packet is retransmitted.
`The retransmitter processes packets in the retransmission
`storage,
`including retransmitting a packet to the second
`computer if its associated retransmission counter indicates
`that the packet is a candidate for retransmission, and sending
`an indication that a packet whose counter indicates that it is
`not a candidate for retransmission.
`
`Other features and advantages of the invention will be
`apparent from the following description, and from the
`claims.
`
`DESCRIPTION OF DRAWINGS
`
`FIG. 1 illustrates several computers interconnected
`through the Internet;
`FIG. 2 illustrates software modules, including applica-
`tions and a protocol stack, which execute on a computer;
`FIG. 3 illustrates elements of a tunnel module which is
`
`part of the protocol stack executing on a computer;
`FIG. 4 illustrates elements of a transport layer module that
`is part of the tunnel module;
`FIG. 5 is a pseudocode listing of a procedure used to
`process an outbound packet;
`FIG. 6 is a.pseudocode listing of a procedure used to
`transmit a packet;
`FIG. 7 is a pseudocode listing of a procedure used to build
`a header for an outbound packet;
`FIG. 8 is a pseudocode listing of a procedure used to
`process an inbound packet;
`FIG. 9 is a pseudocode listing of a procedure used to
`process the header of an inbound packet;
`FIG. 10 is a pseudocode listing of a procedure used to
`process the payload of an inbound packet;
`FIG. 11 is a pseudocode listing of a procedure used to
`process packets in the retransmit queue;
`FIG. 12 is a pseudocode listing of a procedure used to
`process packets in the write queue;
`
`4
`FIG. 13 is a pseudocode listing of a procedure used to
`process pending acknowledgments and selective acknowl-
`edgments;
`FIG. 14 is a pseudocode listing of a procedure executed
`when the acknowledgment timer expires;
`FIG. 15 is a pseudocode listing of a procedure executed
`when the adaptive timeout timer expires; and
`FIG. 16 illustrates an exemplary sequence of transmis-
`sions between two GRE modules.
`
`DESCRIPTION
`
`1 System Overview (FIG. 1)
`Referring to FIG. 1, multiple computers 100 communicate
`with one another over the Internet 120, a packet switched
`data network. Each computer 100 includes a network inter-
`face 108 through which the computer makes a physical
`communication path to the Internet. A variety of types of
`network interfaces 108 can be used depending on the type of
`physical connection used, including, for example, a modem
`to make a communication path over a dialed telephone
`connection. Each computer 100 also includes a processor
`102 and program storage 104, which provides a static
`storage for the software that implements the applications and
`software modules described below. Each computer also
`includes working memory 106, which is used while execut-
`ing the applications and software modules.
`Computers 100 can send data to one another over Internet
`120 using the Internet Protocol (IP). IP is a network layer
`protocol, which provides an addressing capability that is
`used to route individual packets from one computer 100 to
`another. The packets generally travel through multiple com-
`munication links 122 that make up Internet 120, being
`routed from link to link according to the destination address
`included in each packet.
`Pairs of computers 100 can also communicate by first
`setting up a connection (e.g., a transport layer communica-
`tion session) over Internet 120 and then using this connec-
`tion as if they were a physical connection (i.e., a direct link)
`between the computers. Communication links 130, 132, and
`134 illustrate such connections. Such connections are often
`
`known as communication “tunnels.” In the system described
`below, communication tunnels are used to pass IP packets
`from one computer to another encapsulated in other packets
`that are used to send data over the tunnel connection.
`Network protocols other
`than IP, such as IPX, can
`alternatively, or concurrently, be sent through such a com-
`munication tunnel.
`
`In this embodiment each computer encapsulates IP pack-
`ets for transmission through a communication tunnel accord-
`ing to the standard Point-to-Point Protocol (PPP). A descrip-
`tion of PPP can be found in Internet Request for Comments
`(RFC) 1661. Other encapsulation protocols can alternatively
`be used.
`After encapsulating the IP packets in a PPP data stream,
`the computer sends the resulting PPP data stream using an
`extension of the standard GRE transport layer protocol (RFC
`1701). The computer passes the packets of the GRE data
`stream over the Internet using the IP network layer protocol.
`PPP includes the capability to compress and encrypt each
`packet
`it processes. For instance, RFC 1962 and 1968
`describe such capabilities. PPP’s compression and encryp-
`tion can operate in what is known as a “stateless” mode, or
`alternatively in a “stateful” mode. In stateless compression
`or encryption, each packet is treated separately without
`requiring that the receiver of the compressed or encrypted
`packets rely on the sequence of prior packets to process a
`received packet. The stateless mode is tolerant of data loss
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`
`
`US 6,621,799 B1
`
`5
`on the PPP data stream; if the data for a packet is lost of
`damaged, subsequent packets can be processed despite the
`missing packet.
`In “stateful” compression and encryption, the compres-
`sion or encryption of one packet generally depends on prior
`packets and thereby may achieve a higher compression
`factor or faster encryption than would stateless compression
`and encryption. In order to process a received packet, the
`receiver of the packet must first process each packet in the
`sequence of packets from an initial reset state (e.g., at the
`initiation of the communication session) up to an including
`the received packet. If a packet is lost or damaged, the PPP
`protocol supports a resynchronization procedure. In PPP’s
`resynchronization procedure the receiver of PPP communi-
`cation requests that the transmitter reset its state. Compres-
`sion and encryption of packets sent after the reset do not
`depend on packets sent before the reset. Therefore, the lost
`packets are then not required for processing packets that will
`be sent after the transmitter resets its state.
`
`Many transport layer protocols used for communication
`on the Internet, including TCP and GRE, send multiple
`packets without requiring an acknowledgment in a “sliding
`window” technique. Using this technique, many packets can
`be “in flight,” thereby providing a higher communication
`rate than if an acknowledgment of each packet must be
`received by the transmitting computer before it sends the
`next packet. PPP’s resynchronization procedure can incur a
`significant performance penalty since the packets sent after
`the lost packet but before the reset cannot be processed by
`the receiver. Due at least in part to this performance penalty,
`PPP is typically used in prior systems with stateless com-
`pression and encryption when communicating through a
`tunnel over the Internet. In this system, however, PPP is used
`with stateful compression and encryption.
`Although the system described below uses IP and com-
`munication over the Internet, alternative versions of the
`system could use other data networks and other network
`layer protocols. Similarly, alternatives to PPP can be used to
`encapsulate network layer protocols for transmission over
`the data network.
`
`2 Software Architecture (FIGS. 2-4)
`Referring to FIG. 2, multiple interacting software mod-
`ules execute on each computer 100. One or more applica-
`tions 210 on one computer 100 communicate with applica-
`tions on other computers across Internet 120. A layered set
`of communication modules on computer 100 forms a pro-
`tocol stack 205, which implements the overall communica-
`tion protocol used to communicate between the computers.
`Applications 210 on two different computers 100 commu-
`nicate over a path that includes protocol stack 205 at one
`computer 100, Internet 120, and protocol stack 205 on the
`other computer 100.
`At the “top” layer of protocol stack 205, applications 210,
`in general, communicate with a transport layer module, such
`as TCP module 220, or one of an variety of other transport
`modules 222,
`in order to communicate over the Internet.
`TCP module 220 and other transport modules 222 in turn
`communicate with IP module 230 which provides network
`layer services to the transport layer modules. IP module 230
`can pass data directly to a data link module 260, which
`provides low-level services for communication with other
`computers 100 over Internet 120. Addressing information
`provided by IP module 230 is used to direct each data packet
`from link to link on Internet 120 to reach an appropriate
`destination computer 100.
`When an application 210 sends data using TCP/IP over
`Internet 120 to a destination application 210 on another
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
`computer, it first passes the data to TCP module 220. TCP
`module 220 passes the data to IP module 230 as a series of
`data packets. When a tunnel does not couple the computers,
`IP module 230 then passes each data packet, which includes
`its destination addressing information, to data link module
`260. Data link module 260 passes the data packet with its
`destination address onto Internet 120. The addressing infor-
`mation in the packet is used to direct the packet over Internet
`120 to the destination computer, where it passes from a data
`link module 260, to an IP module 230, then to a TCP module
`220 and finally to a destination application 210. At the
`sending computer, TCP module 220 determines whether
`data it sent was correctly received by the receiving TCP
`module 220 based on acknowledgments returned from the
`receiving TCP module 220. If necessary, the sending TCP
`module 220 retransmits lost or corrupted packets.
`Rather than sending data directly from IP module 230 to
`data link module 260 and then to Internet 120, a tunnel
`connection can be established between two computers 100.
`Two IP modules 230, one on each computer then commu-
`nicate with one another as if the tunnel connection were a
`
`physical connection. In particular, at the sending computer
`100, IP module 230 communicates with a combination of
`modules 235, which together provide data link layer services
`to IP module 230. In this combination of modules 235, a
`tunnel module 240 provides data link layer services to IP
`module 230. Tunnel module 240 establishes transport layer
`connections to one or more tunnel modules on other com-
`
`puters using the services of IP module 250. IP module 250
`in turn uses the data link layer services of data link module
`260. On a particular computer 100, IP module 250 can be a
`separate from IP module 230 (i.e., a separate instance), or
`can be part of a single software module, which implements
`the functionality of both IP module 230 and IP module 250.
`Referring to FIG. 3, tunnel module 240 includes a number
`of PPP modules 310, or equivalently, logical instances of a
`single software module. Each PPP module 310 provides
`physical layer services to IP module 230 for communication
`with a single corresponding PPP module 310 on a remote
`computer. That is, in general, one PPP module 310 is used