`Connery et al.
`
US005937169A
[11] Patent Number: 5,937,169
[45] Date of Patent: Aug. 10, 1999
`
`[54] OFFLOAD OF TCP SEGMENTATION TO A
`SMART ADAPTER
`
`[75] Inventors: Glenn William Connery; W. Paul
`Sherer, both of Sunnyvale; Gary
`Jaszewski, Los Gatos; James S.
`Binder, San Jose, all of Calif.
`[73] Assignee: 3Com Corporation, Santa Clara, Calif.
`
[21] Appl. No.: 08/960,238
[22] Filed: Oct. 29, 1997
[51] Int. Cl. … G06F 13/38
[52] U.S. Cl. … 395/200.8
[58] Field of Search … 364/DIG. 1, DIG. 2; 395/200.3, 200.36, 200.37, 200.48, 200.5, 200.53, 200.55, 200.6, 200.66, 200.8

[56] References Cited

U.S. PATENT DOCUMENTS
5,321,819 6/1994 Szczepanek … 395/200.8
5,727,149 3/1998 Hirata et al. … 395/200.8

OTHER PUBLICATIONS
Postel, J., “The TCP Maximum Segment Size and Related Topics”, rfc879, dated Nov. 1983, printed from web site “http://www.cis.ohio-state.edu/htbin/rfc”, 10 pages.
Clark, D., “Window and Acknowledgement Strategy in TCP”, rfc813, dated Jul. 1982, printed from web site “http://www.cis.ohio-state.edu/htbin/rfc”, 18 pages.
“Transmission Control Protocol: DARPA Internet Program Protocol Specification”, rfc793, prepared by Univ. of Southern Calif., dated Sep. 1981, printed from web site “http://www.cis.ohio-state.edu/htbin/rfc”, 77 pages.
“Internet Protocol: DARPA Internet Program Protocol Specification”, rfc791, prepared by Univ. of Southern Calif., dated Sep. 1981, printed from web site “http://www.cis.ohio-state.edu/htbin/rfc”, 42 pages.
Gilbert, H., “Introduction to TCP/IP”, dated Feb. 2, 1995, printed from web site “http://pclt.cis.yale.edu/pclt/comm/tcpip.htm”, 5 pages.

Primary Examiner: Robert B. Harrell
Attorney, Agent, or Firm: Mark A. Haynes; Wilson, Sonsini, Goodrich & Rosati

[57] ABSTRACT
A method is provided for sending data from a data source executing a network protocol, such as the TCP/IP protocol stack, which includes a process for generating headers for packets according to the network protocol. The method includes sending such data on a network through a smart network interface. The network protocol defines a datagram in the data source, including generating a header template and supplying a data payload. The datagram is supplied to the network interface. At the network interface, a plurality of packets of data are generated from the datagram. The plurality of packets include respective headers, such as TCP/IP headers, based on the header template, and include respective segments of the data payload. The network interface supports packets having a pre-specified length, and the data payload is greater than the pre-specified length, such as two to forty times larger or more. Thus, the higher layer processing specifies a very large datagram, which is automatically segmented at the network interface layer, instead of at the TCP layer.

83 Claims, 6 Drawing Sheets
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
Front-page drawing: DMA ENGINE; TEMPLATE/PAYLOAD RAM; DATA PATH AND BUFFERING; TCP/IP CHECKSUM LOGIC; PROGRAM MEMORY: XMIT, RCV, TCP/IP SEGM., ETC. (31); TO NETWORK MEDIUM.
`
`Ex.1043.001
`
`DELL
`
`
`
`U.S. Patent
`
`Aug. 10, 1999
`
`Sheet 1 of 6
`
`5,937,169
`
`
`
`
`
`
`
FIG. 1: HOST CPU 11; HOST MEMORY 12; HOST I/O ETC. 13; HOST BUS; NIC 15 (TCP/IP SEG. MODE SUPPORT); PROGRAM MEMORY: TCP/IP W/ SEG. MODE, MAC DRIVER W/ SEG. MODE; connection 17 to the network medium.

FIG. 2: DMA ENGINE; TCP/IP CHECKSUM LOGIC; TEMPLATE/PAYLOAD RAM; DATA PATH AND BUFFERING; PROGRAM MEMORY: XMIT, RCV, TCP/IP SEGM., ETC. (31); TO NETWORK MEDIUM.
`
`
`60
`
`SEND BIG
`FILE
`
`62
`TEMPLATE HDR
`(FIG. 4)
`OOB:
`SEG. CMD,
`MSS
`GDS FOR
`DATAGRAM
`
`64
`XMIT W/SEG
`TEMPLATE HDR
`GDS
`MSS
`
`INDICATIONS
`
`61
`
`REGISTER
`CAPABILITY,
`INDICATIONS
`
`63
`
`INDICATIONS
`SHARED STATE
`INFO
`
`65
`
`
`
`DATA SOURCE
`(HIGHER LAYERS)
`50.
`
`TCP/IP STACK
`
`MAC DRIVER
`
`SMART NIC
`(FIGS. 5-7)
`
`FIG. 3
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
FIG. 5 (reference numerals 200, 204): RECEIVE SEND REQ; READ MSS; PULL PACKET USING GDS / PULL MSS SIZED SEGMENT USING GDS; PACKET SEND PROCESS: PRODUCE HEADER FROM TEMPLATE HEADER AND COMPUTE CHECKSUMS; OPTIONAL INTER-PACKET PROCESS; DATAGRAM FINISHED?
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
FIG. 6: INTER-PACKET PROCESS 300 (reference numerals 301-304, 306): MORE RECENT DATAGRAM?; MORE RECENT WIN/ACK?; UPDATE TEMPLATE; SEQ OUT OF ORDER?; STOP CURRENT DATAGRAM, BEGIN PROCESS FOR RECENT DATAGRAM; RETURN (306).
`
`
FIG. 7: INTER-PACKET PROCESS 400 (reference numerals 401-405): TCP PACKET RCV'D FOR THIS SESSION? (YES/NO); STOP SENDING THIS DATAGRAM; REPORT SEND ERROR.
`
`OFFLOAD OF TCP SEGMENTATION TO A
`SMART ADAPTER
`
`BACKGROUND OF THE INVENTION
`
`10
`
`15
`
`20
`
`1. Field of the Invention
The present invention relates to network protocols for data networks, and more particularly to a process for offloading higher protocol layer processing, such as TCP/IP processing for sending data files, onto a smart network interface adapter.
`2. Description of Related Art
Data networks are controlled by network protocols which, according to the commonly used ISO model, are classified into layers. The ISO layers include a physical layer, a data link layer (of which the medium access control (MAC) layer is a subset), a network layer, and so on.
`The physical and MAC layers are typically implemented
`on network adapter cards with efficient integrated circuits.
`Higher layers are handled by software drivers for the adapter
`cards and by a protocol stack executed in the host processor.
`The drivers and protocol stack require relatively intense
`processing by the host, particularly in serving applications
`that require substantial network traffic.
`25
`According to typical protocols, the host processor com
`poses the packets, generates headers and checksums, and
`transfers the composed packets down the stack to the driver.
`The driver sends the packet to the network adapter card. As
`the data is transferred down the stack to the card, significant
`host processing at each layer is required.
One common protocol stack includes the transmission control protocol (TCP) running over the Internet Protocol (IP), commonly referred to as TCP/IP. TCP is a connection-oriented, end-to-end reliable protocol designed to fit into a layered hierarchy of protocols which support multi-network applications. Processes running in the host system transmit data by calling on TCP and passing buffers of data as arguments. TCP packages the data from these buffers into appropriately sized segments, and calls on the IP layer to transmit each segment to the destination. On the receive side, the TCP stack/layer places the data from one or more segments into the receiving user’s buffer, and notifies the receiving user.
The IP module is associated with TCP and provides an interface to the local network. This IP module packages the TCP segments inside Internet packets and routes these packets to a destination at the IP layer, or to an intermediate gateway. The IP module may also break the TCP segments into smaller IP fragments to address lower layer packet size issues. To transmit the packet through the local network, it is embedded in a local network packet at lower layers of the process. The drivers at the lower layers may perform further packaging, fragmentation or other operations to achieve the delivery of the local packet to the destination.
Transmission according to the TCP/IP model is made reliable via the use of sequence numbers and acknowledgments. Conceptually, each octet of data is assigned a sequence number. The sequence number of the first octet of data in a segment is transmitted with that segment, and is called the segment sequence number. Segments also carry an acknowledgment number, which is the sequence number of the next expected data octet of transmissions in the reverse direction. When the TCP module transmits a segment containing data, it puts a copy on the retransmission queue and starts a timer. When acknowledgment for that data is received, the segment is deleted from the queue. If the acknowledgment is not received before the timer runs out, the segment is retransmitted.
To govern the flow of data between TCP modules, a flow control mechanism is employed. The receiving TCP module reports a window to the sending TCP. This window specifies the number of octets, starting with the acknowledgment number, that the receiving TCP is currently prepared to receive. The number of bytes specified as the window is the maximum number of bytes which a sender is permitted to transmit until the receiver opens some additional window. Thus, the sender limits the amount of data sent onto the network so that it does not exceed the size of the advertised window of the destination.
According to the typical prior art system, the size of the segment sent by the TCP protocol down to the IP layer must match one-to-one with the packets transmitted by the IP layer to the network (ignoring IP fragmentation). The driver passes packets from the IP layer to the MAC in the network interface card. For example, in the network driver interface specification (NDIS) driver model for Windows-based platforms, packets are passed to the MAC driver as NDIS_PACKET structures. These structures are basically a list of buffers that together make up the packet. Also, some out-of-band (OOB) data is allowed per packet according to the NDIS model (for example, an indication of priority). These packet structures are constrained to the maximum packet size for the media, for example 1514 bytes for Ethernet. This packet size constraint propagates up the TCP/IP protocol stack. This requirement results in significant processing at and above the TCP layer in order to package large buffers for transmission across the network.
Accordingly, it is desirable to improve the performance of data processing systems by simplifying the higher layer processing which must be performed by the host system in order to transmit large quantities of data across data networks.
`
`SUMMARY OF THE INVENTION
According to the present invention, a significant portion of the higher layer transmit processing is offloaded onto a smart adapter. The present invention accomplishes this offloading without interfering with other processing in the system, without breaking other protocols, and without harming the performance of the overall system.
The present invention provides a method for sending data from a data source executing a network protocol, such as the TCP/IP protocol stack, which includes a process for generating packet control data, such as TCP/IP headers for packets according to the network protocol. The method includes sending such data on a network through a smart network interface. According to the process, the network protocol defines a large datagram from the data source (buffer), including generating a packet control data template and supplying a data payload. The datagram is supplied to the network interface. At the network interface, a plurality of packets of data are generated from the datagram. The plurality of packets include respective packet control fields, such as TCP/IP headers, based on the packet control data template, and include respective segments of the data payload. According to the present invention, the network interface supports packets having a pre-specified length, and the data payload is greater than the pre-specified length, such as two to forty times larger or more.
Thus, the higher layer processing specifies a very large datagram, which is automatically segmented at the network interface layer, instead of at the TCP layer. Significant host processing is thus offloaded to a smart network interface. For the Ethernet example, in which the maximum packet size on the medium is 1514 bytes, the protocol according to the present invention is allowed to pass much bigger datagrams, up to 64k bytes or more, knowing that the smart adapter will take care of segmenting them into properly sized Ethernet packets and transmitting them. Except for the fact that the datagrams are very large, substantially the same interface and data structure can be used.
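To put the numbers in perspective, the sketch below works out how many Ethernet packets a single large datagram turns into. It assumes a 14-byte Ethernet header and 20-byte IP and TCP headers without options inside the 1514-byte frame; those header sizes are illustrative assumptions, not values stated in the text.

```python
# Header sizes below are illustrative assumptions (no options, no VLAN tag).
ETH_HDR = 14
IP_HDR = 20
TCP_HDR = 20
MAX_FRAME = 1514  # maximum Ethernet packet size cited in the text


def ethernet_mss() -> int:
    """TCP payload bytes that fit in one maximum-size Ethernet packet."""
    return MAX_FRAME - ETH_HDR - IP_HDR - TCP_HDR


def packets_for_payload(payload_len: int, mss: int) -> int:
    """Packets the adapter must generate: ceil(payload_len / mss)."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return -(-payload_len // mss)


if __name__ == "__main__":
    mss = ethernet_mss()
    # Largest payload whose IP + TCP + payload still fits a 16-bit IP Total Length.
    payload = 0xFFFF - IP_HDR - TCP_HDR
    print(mss, packets_for_payload(payload, mss))  # 1460 45
```

Under these assumptions, one datagram near the 64k limit stands in for about 45 separate sends from the protocol stack.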
`
According to other aspects of the invention, the network protocol comprises TCP/IP. The TCP/IP header template has an IP total length field set to indicate the length of the data payload. The step of generating a plurality of packets in the network interface includes setting the IP total length fields in the plurality of packets based on the size or sizes of the respective segments of the data payload included in the plurality of packets. Also, in the TCP/IP header template, an IP identification field is set to an initial value for the datagram. In generating the packets at the network interface layer, IP identification values are set based on the initial value, such as by using the initial value for the first packet and incrementing it thereafter until all packets for the datagram have been transmitted. In addition, the TCP/IP header template includes an initial TCP sequence number for the datagram. In generating the plurality of packets, TCP sequence numbers are provided in each packet based on the initial TCP sequence number and the size or sizes of the respective segments of the data payload. According to another aspect of the invention, the TCP/IP header includes an IP header checksum field and a TCP checksum field. In generating the plurality of packets based on the datagram, the network interface computes the IP header checksums and TCP checksums for each of the plurality of packets.
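The header derivation described above can be sketched as follows, modeling only the fields that change from packet to packet. The `HeaderTemplate` class and its field names are hypothetical, checksums are left out, and the IP Total Length assumes 20-byte IP and TCP headers without options.

```python
from dataclasses import dataclass, replace
from typing import Iterator


@dataclass(frozen=True)
class HeaderTemplate:
    """Hypothetical subset of template fields that change from packet to packet."""
    ip_total_length: int   # template value describes the whole datagram
    ip_id: int             # initial IP identification value for the datagram
    tcp_seq: int           # initial TCP sequence number for the datagram
    ip_hdr_len: int = 20   # assumed: no IP options
    tcp_hdr_len: int = 20  # assumed: no TCP options


def packet_headers(tmpl: HeaderTemplate, payload_len: int,
                   mss: int) -> Iterator[HeaderTemplate]:
    """Yield the per-packet header fields derived from one datagram template."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    offset = 0
    index = 0
    while offset < payload_len:
        seg_len = min(mss, payload_len - offset)
        yield replace(
            tmpl,
            # Total Length covers this packet's IP header, TCP header and segment.
            ip_total_length=tmpl.ip_hdr_len + tmpl.tcp_hdr_len + seg_len,
            # First packet uses the initial ID; later packets increment it (16 bits).
            ip_id=(tmpl.ip_id + index) & 0xFFFF,
            # Sequence number advances by the payload already sent (32 bits).
            tcp_seq=(tmpl.tcp_seq + offset) & 0xFFFFFFFF,
        )
        offset += seg_len
        index += 1
```

For example, a template with `ip_id=0xFFFF` and `tcp_seq=1000` and a 3000-byte payload cut at an MSS of 1460 yields Total Lengths of 1500, 1500 and 120, IDs of 0xFFFF, 0 and 1, and sequence numbers 1000, 2460 and 3920.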
`Other TCP/IP functions are either disabled or supported
`for the large datagrams, without requiring modification of
`the underlying protocol functions.
The present invention can also be characterized as a method for sending data from a data source executing a TCP/IP network protocol. The method according to this aspect includes establishing a connection with a destination for a session according to the TCP/IP network protocol. Next, a TCP window size is determined from the destination, which indicates an amount of data the destination is ready to receive. The datagram is defined in the data source by generating a TCP/IP header template and supplying a data payload having a size less than or equal to the window size. The datagram is supplied to the network interface, along with a segment size parameter and a request to segment the datagram. In the MAC driver in the network interface, a plurality of packets are generated in response to the segment size parameter and to the request to segment. The plurality of packets is composed from the datagram by executing processes in the network interface to provide respective TCP/IP headers based on the TCP/IP header template, to provide respective segments of the data payload having lengths equal to or less than the segment size parameter, and to compute IP header checksums and TCP checksums for the plurality of packets. The packets are then sent to the destination. Finally, an acknowledgment is received from the destination that the plurality of packets was successfully sent according to the network protocol.
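Because the checksum fields of the template are not filled in by the protocol, the adapter computes them for every packet it generates. The sketch below uses the standard Internet checksum defined in RFC 1071 (a 16-bit ones' complement sum) for both the IPv4 header and the TCP segment with its pseudo-header; the function names are hypothetical, and an adapter would do this in checksum logic rather than in Python.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit ones' complement sum; an odd trailing byte is padded with zero."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: ones' complement of the ones' complement sum."""
    return ~ones_complement_sum16(data) & 0xFFFF


def ip_header_checksum(ip_header: bytes) -> int:
    """IPv4 header checksum, computed with the checksum field (bytes 10-11) as zero."""
    zeroed = ip_header[:10] + b"\x00\x00" + ip_header[12:]
    return internet_checksum(zeroed)


def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """TCP checksum over the IPv4 pseudo-header, TCP header and payload,
    with the TCP checksum field (bytes 16-17 of the segment) taken as zero."""
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(zeroed))
    return internet_checksum(pseudo + zeroed)
```

As a check, `ip_header_checksum` applied to the header bytes `45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7` returns 0xB861; with that value stored, the ones' complement sum of the whole header is 0xFFFF.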
If a TCP packet with a non-zero data payload is received before the last packet in the plurality of packets is sent, then there is a possibility that the windowing and sequencing of the TCP protocol will fall out of synchronism. This can be ignored, or a variety of optional techniques can be applied to handle it or reduce its impact. Thus, according to one alternative, all unsent packets in the plurality of packets for the datagram are abandoned by the network interface if a TCP packet with a non-zero payload is received before the last packet is sent. It is the responsibility of the higher layer to resend the datagram in this case. According to another alternative, the step of generating the plurality of packets includes, before sending each packet, determining whether a more recent datagram for the same session has been supplied to the network interface. If no more recent datagram has been supplied, then the TCP header acknowledgment number and window field are set to the values in the TCP/IP header template. If a more recent datagram has been supplied, then the TCP header acknowledgment number and window field are set to the values in the TCP/IP header template of the more recent datagram. According to another alternative, the MAC in the network interface may hold onto the received packets for this session until the last packet has been sent.
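The first two alternatives can be expressed as small decision helpers that run between packets. This is a sketch under assumed names (`AckWindow`, `abandon_remaining` and `ack_window_for_next_packet` are hypothetical), not the adapter's actual firmware.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AckWindow:
    """Hypothetical holder for the TCP acknowledgment number and window field."""
    ack: int
    window: int


def abandon_remaining(rx_payload_len: int, packets_left: int) -> bool:
    """First alternative: drop the unsent packets of the datagram when a TCP packet
    with a non-zero payload arrives for the session before the last packet is sent.
    The higher layer is then responsible for resending the datagram."""
    return rx_payload_len > 0 and packets_left > 0


def ack_window_for_next_packet(current: AckWindow,
                               more_recent: Optional[AckWindow]) -> Tuple[int, int]:
    """Second alternative: before each packet, prefer the acknowledgment number and
    window from the template of a more recent datagram for the same session."""
    source = more_recent if more_recent is not None else current
    return source.ack, source.window
```

Keeping the two checks separate mirrors the text, which presents them as independent options rather than steps of one procedure.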
According to yet another aspect of the invention, the process includes sending a plurality of datagrams to the network interface for the same session. Sequence numbers are assigned to the plurality of datagrams. The step of generating the plurality of packets for a current datagram includes determining whether a more recent datagram has been supplied having a sequence number which precedes a sequence number in the current datagram. If it has, then the generating of the plurality of packets for the current datagram is stopped, and the generating of the plurality of packets for the more recent datagram is begun. This tends to better maintain the order of transmission in the event that an earlier transmitted datagram in the sequence is being retransmitted.
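The text does not say how "precedes" is evaluated. Because TCP sequence numbers are 32 bits and wrap around, a common technique in TCP implementations is a modular comparison; the sketch below assumes that technique and compares the start of the more recent datagram with the next byte the current datagram would send. The function names are hypothetical.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_before(a: int, b: int) -> bool:
    """True if sequence number a precedes b, treating the space as circular."""
    return (a - b) % SEQ_MOD >= SEQ_MOD // 2


def should_switch(current_next_seq: int, recent_start_seq: int) -> bool:
    """Stop the current datagram and start the more recent one if the more recent
    datagram begins before the next byte the current datagram would send
    (for example, because it retransmits earlier data)."""
    return seq_before(recent_start_seq, current_next_seq)
```

With this comparison, 0xFFFFFFF0 is treated as preceding 0x10, so a retransmission that straddles the wrap point is still recognized.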
Also, in the process of sending a plurality of datagrams to the network interface for the same session, the step of generating the plurality of packets for a current datagram includes, for the last packet in the plurality of packets, determining whether data from a following datagram falls in sequence with it. If it does, then data from the current datagram may be concatenated with data from the following datagram to compose the last packet.
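The concatenation check for the last packet can be sketched as follows. It assumes the adapter knows the sequence number just past the current datagram's payload and the starting sequence number of the following datagram; the function name and parameters are hypothetical.

```python
from typing import Tuple

SEQ_MOD = 1 << 32


def compose_last_packet(cur_tail: bytes, cur_end_seq: int, next_start_seq: int,
                        next_payload: bytes, mss: int) -> Tuple[bytes, int]:
    """Return (payload, bytes_borrowed) for the last packet of the current datagram.

    If the following datagram starts exactly where the current one ends, top the
    last packet up to at most mss bytes with its leading data; otherwise send the
    current tail on its own."""
    if next_start_seq % SEQ_MOD != cur_end_seq % SEQ_MOD:
        return cur_tail, 0
    room = max(0, mss - len(cur_tail))
    borrowed = next_payload[:room]
    return cur_tail + borrowed, len(borrowed)
```

The returned count tells the adapter how many bytes of the following datagram have already been sent, so its segmentation can start that far in.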
A variety of other optimizations and techniques for offloading TCP/IP segmentation to a smart adapter are provided according to the present invention. Furthermore, the present invention is also extendable to offloading further TCP layer send processing to the smart adapter.
These processes provide significant benefits. For example, servers send more data than they receive, and most network benchmarks are even more heavily weighted to sending. Handling a large quantity of data at the TCP layer allows the protocol, for example, to avoid allocating blocks of data for copies of the packet header, copying it, and freeing it. A variety of other processes involved in the transitions from protocol to driver are also avoided, including a variety of interrupts for transmit completions for the packets, for acknowledgments of the packets, and for other processing steps.
According to other variations, a shared state structure can be created in which ACK and window parameters for the TCP/IP protocol stack are maintained. The MAC driver periodically updates the ACK and window parameters in outgoing packets from the shared structure. Also, in other embodiments, the windowing and retransmission algorithms are handled at higher layers without requiring the network interface to pick these up. The protocol would simply have to limit itself to payload data up to the end of the advertised window. Other subsets and variations are also possible. For example, the MAC driver could manage the transmit window, watching incoming receive frames for an opening in the window and only transmitting datagrams within the current window.
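For the variation in which the MAC driver manages the transmit window, the bookkeeping might look like the sketch below. The class and method names are hypothetical, and discarding acknowledgments older than the stored one is an assumed policy; the window's right edge is the acknowledgment number plus the advertised window, as described in the background section.

```python
SEQ_MOD = 1 << 32


class TransmitWindow:
    """Hypothetical MAC-driver view of the peer's advertised window for one session."""

    def __init__(self, ack: int, window: int) -> None:
        self.ack = ack        # left edge: first octet not yet acknowledged by the peer
        self.window = window  # octets the peer will accept starting at self.ack

    def on_incoming_frame(self, ack: int, window: int) -> None:
        """Record ACK/window seen in a received frame, ignoring stale (older) ACKs."""
        if (ack - self.ack) % SEQ_MOD < SEQ_MOD // 2:
            self.ack, self.window = ack, window

    def sendable(self, next_seq: int) -> int:
        """Octets that may be sent from next_seq without passing ack + window."""
        in_flight = (next_seq - self.ack) % SEQ_MOD
        return max(0, self.window - in_flight)
```

A driver following this approach would transmit packets of a datagram only while `sendable` is at least the packet's payload size, and resume when an incoming frame opens the window.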
`
`Other aspects and advantages of the present invention can
`be seen upon review of the figures, the detailed description,
`and the claims which follow.
`
`BRIEF DESCRIPTION OF THE FIGURES
`
`FIG. 1 is a simplified block diagram of a data processing
`system with offloading according to the present invention.
`FIG. 2 is a simplified block diagram of a smart network
`interface card implementing the TCP segmentation of the
`present invention.
FIG. 3 is a network protocol layer diagram which provides a simplified illustration of the present invention.
`FIG. 4 is a simplified diagram of a TCP/IP header
`template generated according to the present invention.
`FIG. 5 provides a flow chart of the datagram handling
`processes executed by the network interface according to the
`present invention.
`FIG. 6 illustrates optional inter-packet processes to be
`executed by the network interface.
`FIG. 7 illustrates another optional inter-packet process to
`be executed by the network interface driver.
`
`DETAILED DESCRIPTION
`
A detailed description of the present invention is provided with respect to FIGS. 1-7, in which FIGS. 1 and 2 illustrate the hardware system environment.
`FIG. 1 shows a data processing system 10 which includes
`a host central processing unit 11, host memory 12, host
`input/output devices 13, such as keyboards, displays,
`printers, a pointing device and the like. The system also
`includes a program memory 14 (usually part of the host
`memory block) and a network interface card 15. All these
`elements are interconnected by a host system bus 16. The
`network interface card 15 provides for connection to a
`network medium as indicated at line 17.
`
FIG. 1 is a simplified diagram of a computer, such as a personal computer or workstation. The actual architecture of such systems is quite varied. In one example, this system corresponds to a personal computer based on an Intel microprocessor running a Microsoft Windows operating system. Other combinations of processor and operating system are also suitable.
According to the present invention, the program memory 14 includes a TCP/IP protocol stack with a segmentation mode according to the present invention. A MAC driver which supports the segmentation mode is also included in the program memory. Other programs are also stored in program memory to suit the needs of the particular system. The network interface card 15 includes resources to manage TCP/IP segmentation according to the present invention.
FIG. 2 provides a simplified block diagram of the network interface card 15 of the present invention. The network interface card 15 includes a bus interface 20 coupled to the host bus 16. A memory composed of random access memory (RAM) 21 is included on the card 15. Also, a medium access control (MAC) unit 22 on the card 15 is coupled to the network medium 17. The path from the host bus interface 20 to the RAM 21 includes appropriate buffering 25 and a DMA engine 26 in order to offload processing from the host system for transferring data packets into the RAM 21. Also, the data path from the RAM 21 to the MAC unit 22 includes appropriate data path and buffering logic 27 to support efficient transmission and reception of packets. A DMA engine 28 is also included on this path to provide for efficient transfer of data between the network medium 17 and the RAM 21. Also included on the card 15 is a central processing unit (CPU) 30 having a program memory 31. The CPU 30 is coupled to the host bus 16 and to the RAM 21 on line 32. Also, the CPU 30 generates control signals, represented by the arrow 33, for controlling other elements of the network interface card 15. Also according to this embodiment, TCP/IP checksum logic 34 is coupled to the data path and buffering logic 27 in the path from the RAM 21 to the network medium 17. The program memory 31 for the CPU 30 includes the transmit, receive, TCP/IP segmentation control, and other processes which manage the operation of the smart adapter card.
The block diagram illustrated in FIG. 2 provides a simplified overview of the functional units in a network interface according to the present invention. A variety of other architectures could be implemented to achieve similar functions. For example, the DMA engines 26, 28 are not strictly required here. State machines handshaking with each other, or other data processing resources, could move data from one block to the next.

In one embodiment, all of these elements are implemented on a single integrated circuit. In another embodiment, all elements except for the RAM 21 are implemented on a single integrated circuit. Other embodiments include discrete components for all of the major functional blocks of the network interface card.
`
`FIG. 3 is a simplified diagram of the network protocol
`layers implemented according to the present invention. The
`protocol layers include a data source 50 which corresponds
`to the higher layers of the network protocol. The data source
`is coupled by path 51 to the TCP/IP stack 52. The TCP/IP
`stack 52 is coupled by a path 53 to the MAC driver 54. The
`MAC driver is coupled by path 55 to the smart network
interface card 56. The smart network interface card 56 is coupled by path 57 to the network medium.
`According to the present invention, the data source 50
`sends a big buffer across path 51 along with a command on
`line 60 to the TCP/IP stack 52. The TCP/IP stack 52
`generates appropriate indications on line 61 to the data
`source 50 to manage delivery of the big buffer to its
`destination.
`
The TCP/IP stack 52 determines whether segmentation offload is suitable for the buffer based on a variety of network state parameters. If it is suitable, then a template header, such as shown in FIG. 4, and out-of-band data are sent on line 62 to the MAC driver 54. The out-of-band data includes a segmentation command and a maximum segment size (MSS). Gather descriptors (GDs) for the datagram to be sent are also supplied for the buffer. The MAC driver 54 registers the capability to do the segmentation offload processing with the TCP/IP stack 52, and supplies appropriate indications to the TCP/IP stack 52 on line 63 to manage delivery of the data.
The MAC driver 54 sends the transmit command with segmentation, along with the template header, the gather descriptors for the buffer, and the MSS parameter, as indicated at line 64, to the smart network interface card hardware 56. The smart network interface card 56 also sends appropriate indications on line 65 back to the MAC driver 54. The smart network interface hardware 56 processes the datagram by segmentation according to the parameters supplied by the higher layers, and sends the datagram, segmented into packets, on path 57 to the network medium.
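The hand-off between the layers of FIG. 3 can be pictured as a single request object. The sketch below is hypothetical: the class, its fields, and the rule that offload is chosen only when the driver has registered the capability and the payload exceeds one MSS are illustrative assumptions, not the NDIS structures or the "variety of network state parameters" the text refers to.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LargeSendRequest:
    """Hypothetical container for what the stack hands the MAC driver on line 62
    and the MAC driver forwards to the adapter on line 64."""
    template_header: bytes                     # media + IP + TCP headers (the template)
    gather_descriptors: List[Tuple[int, int]]  # (address, length) of each payload buffer
    mss: int                                   # out-of-band maximum segment size
    segment: bool = True                       # out-of-band segmentation command

    def payload_length(self) -> int:
        """Total payload bytes described by the gather descriptors."""
        return sum(length for _, length in self.gather_descriptors)


def use_offload(driver_registered: bool, payload_len: int, mss: int) -> bool:
    """Assumed stack-side rule: offload only if the MAC driver registered the
    capability and the payload is larger than one segment."""
    return driver_registered and payload_len > mss
```

If `use_offload` returns false, the stack would fall back to ordinary per-packet sends.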
One preferred embodiment of the present invention supports the Windows 32 (Windows NT, Windows 95) operating systems provided by Microsoft Corporation. In the Windows 32 environment, the so-called NDIS specification provides the functionality of the MAC driver layer of the protocol stack. In this example, the MAC driver is modified to implement the new TCP segmentation functionality according to the present invention. When this functionality is supported by the driver, an aware TCP/IP stack can offload certain processing to the adapter.
`The TCP/IP protocol passes “large datagrams” to the
`NDIS driver’s MiniPortSend routine. These datagrams are
`properly formatted, except for the IP and TCP checksums.
`Normal media, IP, and TCP headers are present, along with
`a larger than usual payload. Along with the datagram, a
`session MSS will be passed, telling the driver what size
`pieces to cut the payload into. The driver will then cut the
`datagram into packets, using the “template” headers from
`the datagram along with certain simple rules to produce the
`actual packets to be sent on the media.
For this example, the protocol can pass down only payload which can be transmitted immediately, i.e., payload that is within the advertised window of the other side.
`A “large datagram” send may not be completely sent by
`the MAC driver. On completion, the TCP/IP protocol might
`look at some NextSEQ field to determine how much payload
`the driver actually sent, and perhaps where to pick up again
`with the next send attempt.
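Picking up after a partial send can be sketched with a small helper. It treats NextSEQ as the sequence number of the first payload byte the driver did not send; that interpretation and the function name are assumptions.

```python
from typing import Tuple

SEQ_MOD = 1 << 32


def resume_point(start_seq: int, payload_len: int, next_seq: int) -> Tuple[int, int]:
    """Return (bytes_sent, bytes_remaining) for a partially sent large datagram,
    given its starting sequence number and the NextSEQ value reported back."""
    sent = (next_seq - start_seq) % SEQ_MOD  # modular, so wraparound is handled
    if sent > payload_len:
        raise ValueError("NextSEQ lies beyond the end of the datagram")
    return sent, payload_len - sent
```

The protocol would then build its next send attempt from the remaining bytes, starting at NextSEQ.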
The term “datagram” is used herein to refer to the buffer, such as a large NDIS_PACKET, passed down from the protocol to the MAC driver for segmentation.
The term “packet” is used generally to refer to the media-sized packets that result from segmenting the datagram.
`The headers of the datagram are the “template”, since they
`are copied to the media packets during segmentation with a
`few simple changes.
`The data beyond the TCP/IP header in the datagram is
`referred to as the “payload”.
This description is generally written from the point of view of the MAC driver. As such, “incoming” refers to datagrams being passed down from the protocol, and “outgoing” refers to packets being transmitted onto the wire after segmentation.
Basically, we have substituted a single “large packet” send for a number of smaller sends. The goal is to reduce host CPU utilization and improve performance and scalability. Each MiniPortSend call to send a packet must handle transitions from TCP through IP, various intermediate drivers and the NDIS miniport wrapper to the MAC driver, as well as handle similar transitions on send completion. This overhead can be quite significant at high wire speeds. Also, though computing the header for each new packet is likely quite simple (seq += bytes; id++; etc.), the header must be copied to a new buffer for each send. An NDIS_PACKET structure must be obtained from a free queue, filled in, queued elsewhere, etc. And of course there is a significant physicalization penalty for each packet. Also, with offload, the number of interrupts on the host CPU is reduced to one per “large packet” rather than one per packet or one per some number of packets (algorithmic). Offloading TCP segmentation to the adapter will help reduce all of this quite significantly.
This description is based on Internet Protocol version 4 (IPv4) processing, though other versions are also suitable. For IPv4, the processing is managed as follows:
The IP Total Length field is 16 bits and as such limits the size of the incoming datagram (IP + TCP + payload) to 64k bytes. This specification allows larger payloads, for example by setting the IP Total Length field to zero (0) on incoming datagrams (in this case, the length of the incoming datagram must be determined by examining the GDs).
The IP “don’t fragment” bit is legal. The IP “more fragments” bit is not. If IP fragmentation is required, it must be done by the protocol. As such, the IP fragment offset field must also be zero.
`The IP Header Checksum field can be left undefined.
`Various TCP flags are illegal on incoming datagrams to be
`segmented. The RST, SYN and FIN flags are all disallowed.
`The TCP Checksum field can be left undefined.
TCP and IP options are allowed but will be sent on each packet, which may not be appropriate. In alternatives, more management of the options field could be executed in the NIC, for example, to handle each packet to which the template applies individually.
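The IPv4 and TCP rules above amount to a validation step the MAC driver could run before segmenting. This sketch checks raw header bytes at the standard IPv4 and TCP field offsets; the function name and the `length_from_gds` parameter (assumed to be the IP datagram length, IP header onward, derived from the gather descriptors) are hypothetical. It does not inspect the checksum fields, since those may be left undefined.

```python
import struct

FIN, SYN, RST = 0x01, 0x02, 0x04  # TCP flag bits (byte 13 of the TCP header)
MF = 0x2000                        # IP "more fragments" bit in the flags/offset word
FRAG_OFFSET_MASK = 0x1FFF          # IP fragment offset bits


def check_template(ip_hdr: bytes, tcp_hdr: bytes, length_from_gds: int) -> int:
    """Apply the IPv4 rules to a template and return the datagram length to use.
    Raises ValueError for templates that must not be segmented."""
    (total_length,) = struct.unpack_from("!H", ip_hdr, 2)
    (flags_frag,) = struct.unpack_from("!H", ip_hdr, 6)
    if flags_frag & MF:
        raise ValueError("'more fragments' bit is not allowed")
    if flags_frag & FRAG_OFFSET_MASK:
        raise ValueError("fragment offset must be zero")
    # "Don't fragment" (0x4000) is legal, so it is deliberately not checked.
    if tcp_hdr[13] & (FIN | SYN | RST):
        raise ValueError("RST, SYN and FIN are not allowed")
    # A Total Length of zero means the real length comes from the gather descriptors.
    return length_from_gds if total_length == 0 else total_length
```

Rejecting a template here lets the driver report a send error early instead of emitting malformed packets.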
`The MAC driver takes the incoming datagram, consisting
`of a template header and a payload, and carves it up into
`packets. Packets are generated by cutting the payload into
`chunks based on the MSS passed down. All packets but the
`last will have a payload exactly MSS bytes in size. The last
`may be smaller.
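Because the payload arrives as a list of gather buffers, MSS-sized chunks may straddle buffer boundaries (compare the segment-pulling steps of FIG. 5). A minimal sketch, modeling each gather descriptor simply as a Python `bytes` object, which is an assumption:

```python
from typing import Iterable, Iterator


def pull_segments(buffers: Iterable[bytes], mss: int) -> Iterator[bytes]:
    """Cut a payload scattered over several buffers into MSS-sized chunks.
    Every chunk but the last is exactly mss bytes; the last may be smaller."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    pending = bytearray()
    for buf in buffers:
        view = memoryview(buf)
        while len(view):
            take = min(mss - len(pending), len(view))
            pending += view[:take]   # copy only what fits in the current chunk
            view = view[take:]
            if len(pending) == mss:
                yield bytes(pending)
                pending.clear()
    if pending:
        yield bytes(pending)         # final, possibly shorter, chunk
```

For instance, buffers `b"abcd"`, `b"ef"` and `b"ghijk"` cut at an MSS of 3 give `b"abc"`, `b"def"`, `b"ghi"` and `b"jk"`.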
`The header of outgoing packets is derived from the
`template header on the incoming datagram as follows:
`The sizes of the media, IP, and TCP headers are all
`identical to those in the template.
`Unless otherwise specified, fields are simply copied from
`the template to the outgoing packet