`
`WORLD INTELLECTUAL PROPERTY ORGANIZATION
`International Bureau
`
`
`
`INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(51) International Patent Classification 7: H04L 12/00
(11) International Publication Number: WO 00/52882
(43) International Publication Date: 8 September 2000 (08.09.00)
(21) International Application Number: PCT/US00/05343
(22) International Filing Date: 29 February 2000 (29.02.00)
(30) Priority Data: 09/258,952, 1 March 1999 (01.03.99), US

(71) Applicant: SUN MICROSYSTEMS, INC. [US/US]; 901 San Antonio Road, Palo Alto, CA 94303 (US).

(72) Inventors: MULLER, Shimon; Apartment D, 983 La Mesa Terrace, Sunnyvale, CA 94086 (US). CHENG, Linda; 1318 Burkette Drive, San Jose, CA 95129 (US). GENTRY, Denton; 34892 Sea Cliff Terrace, Fremont, CA 94555 (US).

(74) Agents: VAUGHAN, Daniel, E. et al.; Park & Vaughan LLP, Suite 310, 702 Marshall Street, Redwood City, CA 94063 (US).

(81) Designated States: AE, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, CA, CH, CN, CR, CU, CZ, DE, DK, DM, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG, UZ, VN, YU, ZA, ZW, ARIPO patent (GH, GM, KE, LS, MW, SD, SL, SZ, TZ, UG, ZW), Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).

Published
Without international search report and to be republished upon receipt of that report.

(54) Title: METHOD AND APPARATUS FOR EARLY RANDOM DISCARD OF PACKETS

(57) Abstract

A system and method are provided for randomly discarding a packet received at a high performance network interface if the rate of packet transfers cannot keep pace with the rate of packet arrivals. A selected packet may be dropped as it arrives at a packet queue, or a packet already in the queue may be discarded. The queue has multiple defined regions, any of which may overlap or share a common boundary. A probability indicator is associated with a region to specify the probability of a packet being discarded when the level of traffic in the queue is within the region. Probability indicators may differ from region to region so that the probability of discarding a packet fluctuates as the level of traffic stored in the queue changes. Information gleaned from a packet may be applied to prevent certain types of packets from being dropped.

[Front-page figure: NETWORK INTERFACE RECEIVE CIRCUIT 100, showing input port processing module 104, header parser 106, flow database manager 108, flow database 110, load distributor 112, checksum generator 114, packet queue 116, control queue 118, DMA engine 120 and dynamic batching module 122.]

Juniper Exhibit 1005
`
`
`
`
`
`
WO 00/52882
PCT/US00/05343

METHOD AND APPARATUS FOR EARLY RANDOM DISCARD OF PACKETS
`
`BACKGROUND
`
This invention relates to the fields of computer systems and computer networks. In particular, the present invention relates to a Network Interface Circuit (NIC) for processing communication packets exchanged between a computer network and a host computer system.
`
The interface between a computer and a network is often a bottleneck for communications passing between the computer and the network. While computer performance (e.g., processor speed) has increased exponentially over the years and computer network transmission speeds have undergone similar increases, inefficiencies in the way network interface circuits handle communications have become more and more evident. With each incremental increase in computer or network speed, it becomes ever more apparent that the interface between the computer and the network cannot keep pace. These inefficiencies involve several basic problems in the way communications between a network and a computer are handled.
`
Today’s most popular forms of networks tend to be packet-based. These types of networks, including the Internet and many local area networks, transmit information in the form of packets. Each packet is separately created and transmitted by an originating endstation and is separately received and processed by a destination endstation. In addition, each packet may, in a bus topology network for example, be received and processed by numerous stations located between the originating and destination endstations.
`
One basic problem with packet networks is that each packet must be processed through multiple protocols or protocol levels (known collectively as a “protocol stack”) on both the origination and destination endstations. When data transmitted between stations is longer than a certain minimal length, the data is divided into multiple portions, and each portion is carried by a separate packet. The amount of data that a packet can carry is generally limited by the network that conveys the packet and is often expressed as a maximum transfer unit (MTU). The original aggregation of data is sometimes known as a “datagram,” and each packet carrying part of a single datagram is processed very similarly to the other packets of the datagram.
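The following Python sketch shows how a datagram is divided into portions. It splits a datagram into consecutive pieces no larger than a per-packet payload limit. The function name `segment_datagram` is hypothetical. The 1,460-byte limit in the example is also an assumption: a 1,500-byte Ethernet MTU less a 20-byte IPv4 header and a 20-byte TCP header, both without options.

```python
def segment_datagram(datagram: bytes, max_payload: int) -> list[bytes]:
    """Split a datagram into consecutive portions of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [datagram[i:i + max_payload] for i in range(0, len(datagram), max_payload)]


if __name__ == "__main__":
    # Hypothetical limit: 1,500-byte MTU minus 20-byte IPv4 and 20-byte TCP headers.
    portions = segment_datagram(bytes(4000), 1460)
    print([len(p) for p in portions])  # [1460, 1460, 1080]
```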
`
`
`
`
Communication packets are generally processed as follows. In the origination endstation, each separate data portion of a datagram is processed through a protocol stack. During this processing multiple protocol headers (e.g., TCP, IP, Ethernet) are added to the data portion to form a packet that can be transmitted across the network. The packet is received by a network interface circuit, which transfers the packet to the destination endstation or a host computer that serves the destination endstation. In the destination endstation, the packet is processed through the protocol stack in the opposite direction as in the origination endstation. During this processing the protocol headers are removed in the opposite order in which they were applied. The data portion is thus recovered and can be made available to a user, an application program, etc.
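The sketch below illustrates the ordering described above: headers are added innermost first and removed outermost first. The bracketed tags are hypothetical placeholders standing in for real binary headers. `encapsulate` and `decapsulate` are illustrative names only.

```python
# Order in which the origination endstation adds headers (innermost first).
# The bracketed tags are placeholders, not real header formats.
LAYERS = ["TCP", "IP", "Ethernet"]


def encapsulate(data: bytes) -> bytes:
    """Prepend one placeholder header per layer, so Ethernet ends up outermost."""
    packet = data
    for layer in LAYERS:
        packet = f"[{layer}]".encode() + packet
    return packet


def decapsulate(packet: bytes) -> tuple[bytes, list[str]]:
    """Strip headers outermost-first, i.e. in the opposite order to encapsulate()."""
    removed = []
    for layer in reversed(LAYERS):
        tag = f"[{layer}]".encode()
        if not packet.startswith(tag):
            raise ValueError(f"expected {layer} header")
        packet = packet[len(tag):]
        removed.append(layer)
    return packet, removed


if __name__ == "__main__":
    pkt = encapsulate(b"hello")
    print(pkt)               # b'[Ethernet][IP][TCP]hello'
    print(decapsulate(pkt))  # (b'hello', ['Ethernet', 'IP', 'TCP'])
```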
`
Several related packets (e.g., packets carrying data from one datagram) thus undergo substantially the same process in a serial manner (i.e., one packet at a time). The more data that must be transmitted, the more packets must be sent, with each one being separately handled and processed through the protocol stack in each direction. Naturally, the more packets that must be processed, the greater the demand placed upon an endstation’s processor. The number of packets that must be processed is affected by factors other than just the amount of data being sent in a datagram. For example, as the amount of data that can be encapsulated in a packet increases, fewer packets need to be sent. As stated above, however, a packet may have a maximum allowable size, depending on the type of network in use (e.g., the maximum transfer unit for standard Ethernet traffic is approximately 1,500 bytes). The speed of the network also affects the number of packets that a NIC may handle in a given period of time. For example, a gigabit Ethernet network operating at peak capacity may require a NIC to receive approximately 1.48 million packets per second.
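The 1.48 million figure can be checked with a short calculation. Assume minimum-size 64-byte frames plus the standard per-frame Ethernet overhead: an 8-byte preamble and start-of-frame delimiter, and a 12-byte inter-frame gap. Each frame then occupies (64 + 8 + 12) × 8 = 672 bit times. 10^9 bits per second divided by 672 gives roughly 1,488,095 frames per second. The helper name below is hypothetical.

```python
def max_frames_per_second(bit_rate: float, frame_bytes: int,
                          preamble_bytes: int = 8, ifg_bytes: int = 12) -> float:
    """Upper bound on frames/s: each frame also occupies preamble+SFD and
    inter-frame gap time on the wire."""
    bits_per_frame = (frame_bytes + preamble_bytes + ifg_bytes) * 8
    return bit_rate / bits_per_frame


if __name__ == "__main__":
    print(round(max_frames_per_second(1e9, 64)))    # 1488095 (minimum-size frames)
    print(round(max_frames_per_second(1e9, 1518)))  # 81274 (maximum-size frames)
```

With maximum-size 1,518-byte frames, the same link carries only about 81,000 frames per second. The per-packet burden on the host is therefore greatest when packets are small.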
`
Thus, the number of packets to be processed through a protocol stack may place a significant burden upon a computer’s processor. The situation is exacerbated by the need to process each packet separately even though each one will be processed in a substantially similar manner.
`
A related problem to the disjoint processing of packets is the manner in which data is moved between “user space” (e.g., an application program’s data storage) and “system space” (e.g., system memory) during data transmission and receipt. Presently, data is simply copied from one area of memory assigned to a user or application program into another area of memory dedicated to the processor’s use. Because each portion of a datagram that is transmitted in a packet may be copied separately (e.g., one byte at a time), there is a nontrivial amount of processor time required and frequent transfers can consume a large amount of the memory bus’ bandwidth. Illustratively, each byte of data in a packet received from the network may be read from the system space and written to the user space in a separate copy operation, and vice versa for data transmitted over the network. Although system space generally provides a protected memory area (e.g., protected from manipulation by user programs), the copy operation does nothing of value when seen from the point of view of a network interface circuit. Instead, it risks over-burdening the host processor and retarding its ability to rapidly accept additional network traffic from the NIC. Copying each packet’s data separately can therefore be very inefficient, particularly in a high-speed network environment.
`
In addition to the inefficient transfer of data (e.g., one packet’s data at a time), the processing of headers from packets received from a network is also inefficient. Each packet carrying part of a single datagram generally has the same protocol headers (e.g., Ethernet, IP and TCP), although there may be some variation in the values within the packets’ headers for a particular protocol. Each packet, however, is individually processed through the same protocol stack, thus requiring multiple repetitions of identical operations for related packets. Successively processing unrelated packets through different protocol stacks will likely be much less efficient than progressively processing a number of related packets through one protocol stack at a time.
`
Another basic problem concerning the interaction between present network interface circuits and host computer systems is that the combination often fails to capitalize on the increased processor resources that are available in multi-processor computer systems. In other words, present attempts to distribute the processing of network packets (e.g., through a protocol stack) among a number of processors in an efficient manner are generally ineffective. In particular, the performance of present NICs does not come close to the linear performance gains one may expect to realize from the availability of multiple processors. In some multi-processor systems, for example, little improvement in the processing of network traffic is realized from the use of more than 4-6 processors.
`
In addition, the rate at which packets are transferred from a network interface circuit to a host computer or other communication device may fail to keep pace with the rate of packet arrival at the network interface. One element or another of the host computer (e.g., a memory bus, a processor) may be over-burdened or otherwise unable to accept packets with sufficient alacrity. In this event one or more packets may be dropped or discarded. Dropping packets may cause a network entity to re-transmit some traffic and, if too many packets are dropped, a network connection may require re-initialization. Further, dropping one packet or type of packet instead of another may make a significant difference in overall network traffic. If, for example, a control packet is dropped, the corresponding network connection may be severely affected. Yet because control packets are typically small, dropping one may do little to alleviate the packet saturation of the network interface circuit. Therefore, unless the dropping of packets is performed in a manner that distributes the effect among many network connections or that makes allowance for certain types of packets, network traffic may be degraded more than necessary.
`
Thus, present NICs fail to provide adequate performance to interconnect today’s high-end computer systems and high-speed networks. In addition, a network interface circuit that cannot make allowance for an over-burdened host computer may degrade the computer’s performance.
`
`SUMMARY
`
In one embodiment of the invention, packets are received from a network and stored in a packet queue prior to being transferred to a host computer. If the rate of packet transfers to the host computer cannot keep pace with the rate of packet arrivals at the queue, one or more packets may be dropped. Therefore, a system and method of discarding packets in a random manner is provided, such that the effect of lost packets is fairly distributed among network communicants.
`
In one embodiment of the invention, a packet queue that is used to store packets received from a network is divided into multiple regions. Each region is distinct yet shares a boundary with an adjacent region. In an alternative embodiment regions may overlap. A fullness gauge or indicator is employed to indicate how full the packet queue is. In particular, read and write pointers that are used to update the packet queue can also be used to determine how full the queue is. This fullness indicator thus fluctuates as the level of network traffic stored in the packet queue ebbs and flows.
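The sketch below shows one way to derive the fullness indicator from read and write pointers. It then maps the result onto adjacent regions that share boundaries. Several details are assumptions made for illustration, not details from the passage above:

- the pointers count modulo twice the queue size, so a full queue can be told apart from an empty one;
- the queue size;
- the region boundaries.

```python
QUEUE_SIZE = 1024  # hypothetical capacity of the packet queue, in entries


def fullness(read_ptr: int, write_ptr: int, size: int = QUEUE_SIZE) -> int:
    """Return how many entries are occupied in a circular packet queue.

    Pointers are assumed to advance modulo 2 * size; the extra wrap bit lets a
    full queue (difference == size) be distinguished from an empty one (0).
    """
    return (write_ptr - read_ptr) % (2 * size)


def region_of(level: int, upper_bounds: list[int]) -> int:
    """Return the index of the region containing the current fullness level.

    upper_bounds holds ascending, exclusive upper limits of adjacent regions
    that share common boundaries, e.g. [256, 512, 768, 1024].
    """
    for index, upper in enumerate(upper_bounds):
        if level < upper:
            return index
    return len(upper_bounds) - 1  # a completely full queue counts as the top region


if __name__ == "__main__":
    level = fullness(read_ptr=2000, write_ptr=100)          # write pointer has wrapped
    print(level, region_of(level, [256, 512, 768, 1024]))   # 148 0
```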
`
For one or more of the multiple packet queue regions, a programmable probability indicator is assigned. Each probability indicator indicates the probability of dropping a packet when the fullness indicator indicates that the level of traffic stored in the queue is within the probability indicator’s associated region. Probability indicators may be programmed and re-programmed as the level of traffic in the packet queue changes. The probability indicator may take the form of a percentage or ratio that is configured to randomly select packets to be discarded.
`
In one particular embodiment of the invention, a probability indicator takes the form of a bit or flag mask. Each bit or flag may take one of two possible values (e.g., zero and one). In this embodiment, a counter tracks the number of packets received at the packet queue by repeatedly counting through a limited range of numbers, such as zero through N. The bit or flag mask correspondingly contains N+1 bits or flags. Thus, for each counter value the corresponding bit or flag in the mask indicates whether the packet received during that counter value is dropped.
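The counter-and-mask scheme can be sketched in Python as follows. It uses one mask per queue region, so the drop probability can differ from region to region. The class name, bit 0 being the least significant bit, and a single counter shared across regions are assumptions for illustration, not details taken from the text.

```python
class MaskDiscarder:
    """Early random discard using a counter that cycles 0..N and (N+1)-bit masks.

    Bit i of the active region's mask set to 1 means: drop the packet that
    arrives while the counter equals i. Bit 0 is the least significant bit.
    """

    def __init__(self, region_masks: list[int], n: int):
        limit = 1 << (n + 1)
        for mask in region_masks:
            if not 0 <= mask < limit:
                raise ValueError("each mask must fit in N+1 bits")
        self.region_masks = list(region_masks)
        self.n = n
        self.counter = 0

    def should_drop(self, region: int) -> bool:
        """Decide the fate of one arriving packet, then advance the counter."""
        mask = self.region_masks[region]
        drop = ((mask >> self.counter) & 1) == 1
        self.counter = 0 if self.counter == self.n else self.counter + 1
        return drop


if __name__ == "__main__":
    # Region 0: never drop; region 1: drop 2 of every 8; region 2: drop all.
    discarder = MaskDiscarder([0b00000000, 0b00010001, 0b11111111], n=7)
    print([discarder.should_drop(1) for _ in range(8)])
```

With N = 7, a region mask of 0b00010001 drops two of every eight packets (25 percent). An all-ones mask drops every packet received while the queue is in that region.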
`
In an alternative embodiment, a random number is generated when a packet is received. The random number may be compared to a threshold to determine whether the received packet is dropped. Each region may have a separate threshold for determining whether a packet is dropped.
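The random-number alternative reduces to a single comparison per packet. In this sketch, the random number is a float in [0, 1), and a packet is dropped when it falls below the current region's threshold. Both the comparison direction and the use of Python's `random.Random` are assumptions; a hardware version might instead use an integer generator such as a linear-feedback shift register.

```python
import random


def random_discard(region: int, thresholds: list[float], rng: random.Random) -> bool:
    """Drop the arriving packet if a uniform random number in [0, 1) is below
    the threshold programmed for the queue's current region.

    A threshold of 0.0 never drops, 1.0 always drops, and 0.25 drops about
    one packet in four over the long run.
    """
    return rng.random() < thresholds[region]


if __name__ == "__main__":
    rng = random.Random(42)         # seeded only so runs are repeatable
    thresholds = [0.0, 0.25, 1.0]   # hypothetical per-region thresholds
    drops = sum(random_discard(1, thresholds, rng) for _ in range(10_000))
    print(drops / 10_000)           # close to 0.25
```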
`
In yet another embodiment of the invention, a packet may be immunized or exempted from being discarded because it exhibits a particular characteristic or status. For example, a control packet may be one type of packet that is not dropped. In this embodiment, the counter is not incremented when a non-discardable packet is received. Other packets that may be exempt from discarding may be packets within a particular network connection or flow, packets associated with a particular application, packets formatted according to a particular protocol, etc. A relevant characteristic or detail of a packet may be extracted during a process in which one or more of the packet’s headers are parsed.
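Exemption can be layered on top of the counter-and-mask scheme. A packet that the header parser flags as non-discardable is never dropped and does not advance the counter. The `PacketInfo` fields, the `ExemptingDiscarder` class and the set of exempt flow identifiers are hypothetical stand-ins for whatever characteristics a parser might extract.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class PacketInfo:
    """Hypothetical fields a header parser might extract from one packet."""
    is_control: bool = False
    flow_id: Optional[int] = None


class ExemptingDiscarder:
    """Counter-and-mask discard that skips non-discardable packets entirely."""

    def __init__(self, mask: int, n: int, exempt_flows: FrozenSet[int] = frozenset()):
        if not 0 <= mask < (1 << (n + 1)):
            raise ValueError("mask must fit in N+1 bits")
        self.mask = mask
        self.n = n
        self.exempt_flows = exempt_flows
        self.counter = 0

    def is_exempt(self, pkt: PacketInfo) -> bool:
        return pkt.is_control or pkt.flow_id in self.exempt_flows

    def should_drop(self, pkt: PacketInfo) -> bool:
        if self.is_exempt(pkt):
            return False  # exempt: never dropped, and the counter is not advanced
        drop = ((self.mask >> self.counter) & 1) == 1
        self.counter = 0 if self.counter == self.n else self.counter + 1
        return drop
```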
`
In one embodiment of the invention, when a probability indicator indicates that a packet should be dropped, the packet that is dropped may be the one just received at the packet queue. In another embodiment, however, a packet already stored in the packet queue may be dropped.
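The two choices of victim can be contrasted with a small queue model. The text does not specify which stored packet is discarded, so this sketch discards the oldest packet at the head of the queue. That choice, the function name and the policy strings are assumptions.

```python
from collections import deque
from typing import Any, Optional


def enqueue_or_discard(queue: deque, packet: Any, drop: bool,
                       policy: str = "arriving") -> Optional[Any]:
    """Apply a drop decision for a newly arrived packet.

    policy "arriving": discard the new packet itself.
    policy "queued":   discard the oldest stored packet (an assumed choice)
                       and store the new one in its place.
    Returns the discarded packet, or None if nothing was discarded.
    """
    if policy not in ("arriving", "queued"):
        raise ValueError(f"unknown policy: {policy}")
    if not drop:
        queue.append(packet)
        return None
    if policy == "arriving" or not queue:
        return packet
    victim = queue.popleft()
    queue.append(packet)
    return victim
```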
`
`BRIEF DESCRIPTION OF THE FIGURES
`
FIG. 1A is a block diagram depicting a network interface circuit (NIC) for receiving a packet from a network in accordance with an embodiment of the present invention.

FIG. 1B is a flow chart demonstrating one method of operating the NIC of FIG. 1A to transfer a packet received from a network to a host computer in accordance with an embodiment of the invention.

FIG. 2 is a diagram of a packet transmitted over a network and received at a network interface circuit in one embodiment of the invention.

FIG. 3 is a block diagram depicting a header parser of a network interface circuit for parsing a packet in accordance with an embodiment of the invention.

FIGs. 4A-4B comprise a flow chart demonstrating one method of parsing a packet received from a network at a network interface circuit in accordance with an embodiment of the present invention.

FIG. 5 is a block diagram depicting a network interface circuit flow database in accordance with an embodiment of the invention.

FIGs. 6A-6E comprise a flowchart illustrating one method of managing a network interface circuit flow database in accordance with an embodiment of the invention.

FIG. 7 is a flow chart demonstrating one method of distributing the processing of network packets among multiple processors on a host computer in accordance with an embodiment of the invention.

FIG. 8 is a diagram of a packet queue for a network interface circuit in accordance with an embodiment of the invention.

FIG. 9 is a diagram of a control queue for a network interface circuit in accordance with an embodiment of the invention.

FIG. 10 depicts a system for randomly discarding a packet from a network interface in accordance with an embodiment of the invention.

FIGs. 11A-11B comprise a flow chart demonstrating one method of discarding a packet from a network interface in accordance with an embodiment of the invention.

FIG. 12 depicts one set of dynamic instructions for parsing a packet in accordance with an embodiment of the invention.
`
DETAILED DESCRIPTION

The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of particular applications of the invention and their requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
`
In particular, embodiments of the invention are described below in the form of a network interface circuit (NIC) receiving communication packets formatted in accordance with certain communication protocols compatible with the Internet. One skilled in the art will recognize, however, that the present invention is not limited to communication protocols compatible with the Internet and may be readily adapted for use with other protocols and in communication devices other than a NIC.
`
The program environment in which a present embodiment of the invention is executed illustratively incorporates a general-purpose computer or a special purpose device such as a hand-held computer. Details of such devices (e.g., processor, memory, data storage, input/output ports and display) are well known and are omitted for the sake of clarity.
`
It should also be understood that the techniques of the present invention might be implemented using a variety of technologies. For example, the methods described herein may be implemented in software running on a programmable microprocessor, or implemented in hardware utilizing either a combination of microprocessors or other specially designed application specific integrated circuits, programmable logic devices, or various combinations thereof. In particular, the methods described herein may be implemented by a series of computer-executable instructions residing on a storage medium such as a carrier wave, disk drive, or other computer-readable medium.
`
`Introduction
`
In one embodiment of the present invention, a network interface circuit (NIC) is configured to receive and process communication packets exchanged between a host computer system and a network such as the Internet. In particular, the NIC is configured to receive and manipulate packets formatted in accordance with a protocol stack (e.g., a combination of communication protocols) supported by a network coupled to the NIC.
`
A protocol stack may be described with reference to the seven-layer ISO-OSI (International Organization for Standardization - Open Systems Interconnection) model framework. Thus, one illustrative protocol stack includes the Transmission Control Protocol (TCP) at layer four, Internet Protocol (IP) at layer three and Ethernet at layer two. For purposes of discussion, the term “Ethernet” may be used herein to refer collectively to the standardized IEEE (Institute of Electrical and Electronics Engineers) 802.3 specification as well as version two of the non-standardized form of the protocol. Where different forms of the protocol need to be distinguished, the standard form may be identified by including the “802.3” designation.
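In practice, the two forms of Ethernet can be told apart by the two-byte field that follows the destination and source MAC addresses. A value of 1500 or less is an IEEE 802.3 length field. A value of 0x0600 (1536) or more is an EtherType used by Ethernet version 2, often called Ethernet II. The sketch below applies that rule. The function name is hypothetical, and the frame is assumed to start at the destination address, with no preamble.

```python
def ethernet_form(frame: bytes) -> str:
    """Classify a frame by its type/length field at byte offsets 12-13."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    type_or_length = int.from_bytes(frame[12:14], "big")
    if type_or_length <= 1500:
        return "802.3"        # field carries the payload length
    if type_or_length >= 0x0600:
        return "Ethernet II"  # field carries an EtherType, e.g. 0x0800 for IPv4
    return "undefined"        # 1501-1535 is not assigned to either form


if __name__ == "__main__":
    print(ethernet_form(bytes(12) + (0x0800).to_bytes(2, "big")))  # Ethernet II
```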
`
Other embodiments of the invention are configured to work with communications adhering to other protocols, both known (e.g., AppleTalk, IPX (Internetwork Packet Exchange), etc.) and unknown at the present time. The methods provided by this invention are easily adaptable for new communication protocols.
`
In addition, the processing of packets described below may be performed on communication devices other than a NIC. For example, a modem, switch, router or other communication port or device (e.g., serial, parallel, USB, SCSI) may be similarly configured and operated.
`
In embodiments of the invention described below, a NIC receives a packet from a network on behalf of a host computer system or other communication device. The NIC analyzes the packet (e.g., by retrieving certain fields from one or more of its protocol headers) and takes action to increase the efficiency with which the packet is transferred or provided to its destination entity. Equipment and methods discussed below for increasing the efficiency of processing or transferring packets received from a network may also be used for packets moving in the reverse direction (i.e., from the NIC to the network).
`
One technique that may be applied to incoming network traffic involves examining or parsing one or more headers of an incoming packet (e.g., headers for the layer two, three and four protocols) in order to identify the packet’s source and destination entities and possibly retrieve certain other information. Using identifiers of the communicating entities as a key, data from multiple packets may be aggregated or re-assembled. Typically, a datagram sent to one destination entity from one source entity is transmitted via multiple packets. Aggregating data from multiple related packets (e.g., packets carrying data from the same datagram) thus allows a datagram to be re-assembled and collectively transferred to a host computer. The datagram may then be provided to the destination entity in a highly efficient manner. For example, rather than providing data from one packet at a time (and one byte at a time) in separate “copy” operations, a “page-flip” operation may be performed. In a page-flip, an entire memory page of data may be provided to the destination entity, possibly in exchange for an empty or unused page.
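Aggregation keyed on the communicating entities can be sketched as a dictionary of per-flow fragments. Each packet here is a hypothetical (source, destination, sequence, payload) tuple. A real NIC would take these values from parsed headers, and would place the re-assembled data in host memory pages rather than Python byte strings.

```python
from collections import defaultdict


def reassemble(packets):
    """Group packet payloads by (source, destination) and join each group in
    sequence order, yielding one re-assembled datagram per communicating pair.

    Each packet is a hypothetical (source, destination, sequence, payload) tuple.
    """
    flows = defaultdict(list)
    for src, dst, seq, payload in packets:
        flows[(src, dst)].append((seq, payload))
    return {
        key: b"".join(payload for _, payload in sorted(parts, key=lambda part: part[0]))
        for key, parts in flows.items()
    }


if __name__ == "__main__":
    arrivals = [("A", "B", 2, b"lo"), ("C", "D", 1, b"x"), ("A", "B", 1, b"hel")]
    print(reassemble(arrivals))  # {('A', 'B'): b'hello', ('C', 'D'): b'x'}
```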
`
`
In another technique, packets received from a network are placed in a queue to await transfer to a host computer. While awaiting transfer, multiple related packets may be identified to the host computer. After being transferred, they may be processed as a group by a host processor rather than being processed serially (e.g., one at a time).
`
Yet another technique involves submitting a number of related packets to a single processor of a multi-processor host computer system. By distributing packets conveyed between different pairs of source and destination entities among different processors, the processing of packets through their respective protocol stacks can be distributed while still maintaining packets in their correct order.
`
The techniques discussed above for increasing the efficiency with which packets are processed may involve a combination of hardware and software modules located on a network interface and/or a host computer system. In one particular embodiment, a parsing module on a host computer’s NIC parses header portions of packets. Illustratively, the parsing module comprises a microsequencer operating according to a set of replaceable instructions stored as micro-code. Using information extracted from the packets, multiple packets from one source entity to one destination entity may be identified. A hardware re-assembly module on the NIC may then gather the data from the multiple packets. Another hardware module on the NIC is configured to recognize related packets awaiting transfer to the host computer so that they may be processed through an appropriate protocol stack collectively, rather than serially. The re-assembled data and the packets’ headers may then be provided to the host computer so that appropriate software (e.g., a device driver for the NIC) may process the headers and deliver the data to the destination entity.
`
Where the host computer includes multiple processors, a load distributor (which may also be implemented in hardware on the NIC) may select a processor to process the headers of the multiple packets through a protocol stack.
`
In another embodiment of the invention, a system is provided for randomly discarding a packet from a NIC when the NIC is saturated or nearly saturated with packets awaiting transfer to a host computer.
`
One Embodiment of a High Performance Network Interface Circuit

FIG. 1A depicts NIC 100 configured in accordance with an illustrative embodiment of the invention. A brief description of the operation and interaction of the various modules of NIC 100 in this embodiment follows.
`
`
A communication packet may be received at NIC 100 from network 102 by a medium access control (MAC) module (not shown in FIG. 1A). The MAC module performs low-level processing of the packet such as reading the packet from the network, performing some error checking, detecting packet fragments, detecting over-sized packets, removing the layer one preamble, etc.
`
Input Port Processing (IPP) module 104 then receives the packet. The IPP module stores the entire packet in packet queue 116, as received from the MAC module or network, and a portion of the packet is copied into header parser 106. In one embodiment of the invention, IPP module 104 may act as a coordinator of sorts to prepare the packet for transfer to a host computer system. In such a role, IPP module 104 may receive information concerning a packet from various modules of NIC 100 and dispatch such information to other modules.
`
Header parser 106 parses a header portion of the packet to retrieve various pieces of information that will be used to identify related packets (e.g., multiple packets from the same source entity to one destination entity) and that will affect subsequent processing of the packets. In the illustrated embodiment, header parser 106 communicates with flow database manager (FDBM) 108, which manages flow database (FDB) 110. In particular, header parser 106 submits a query to FDBM 108 to determine whether a valid communication flow (described below) exists between the source entity that sent a packet and the destination entity. The destination entity may comprise an application program, a communication module, or some other element of a host computer system that is to receive the packet.
`
In the illustrated embodiment of the invention, a communication flow comprises one or more datagram packets from one source entity to one destination entity. A flow may be identified by a flow key assembled from source and destination identifiers retrieved from the packet by header parser 106. In one embodiment of the invention, a flow key comprises address and/or port information for the source and destination entities from the packet’s layer three (e.g., IP) and/or layer four (e.g., TCP) protocol headers.
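The sketch below assembles a flow key of the kind described from an IPv4 header followed by a TCP header. It assumes the buffer starts at the IPv4 header, with any layer-two header already stripped. The `FlowKey` tuple of addresses and ports is hypothetical. IPv6 and non-TCP protocols are not handled.

```python
import ipaddress
import struct
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int


def flow_key_from_ipv4_tcp(packet: bytes) -> FlowKey:
    """Extract layer-three addresses and layer-four ports from IPv4 + TCP headers."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4  # IHL field counts 32-bit words
    if header_len < 20:
        raise ValueError("invalid IPv4 header length")
    if packet[9] != 6:  # IPv4 protocol number 6 = TCP
        raise ValueError("not a TCP segment")
    src = ipaddress.IPv4Address(packet[12:16])
    dst = ipaddress.IPv4Address(packet[16:20])
    # The TCP source and destination ports are the first two 16-bit fields.
    src_port, dst_port = struct.unpack_from("!HH", packet, header_len)
    return FlowKey(str(src), str(dst), src_port, dst_port)
```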
`
For purposes of the illustrated embodiment of the invention, a communication flow is similar to a TCP end-to-end connection but is generally shorter in duration. In particular, in this embodiment the duration of a flow may be limited to the time needed to receive all of the packets associated with a single datagram passed from the source entity to the destination entity.
`
`
Thus, for purposes of flow management, header parser 106 passes the packet’s flow key to flow database manager 108. The header parser may also provide the flow database manager with other information concerning the packet that was retrieved from the packet (e.g., length of the packet).
`
Flow database manager 108 searches FDB 110 in response to a query received from header parser 106. Illustratively, flow database 110 stores information concerning each valid communication flow involving a destination entity served by NIC 100. Thus, FDBM 108 updates FDB 110 as necessary, depending upon the information received from header parser 106. In addition, in this embodiment of the invention FDBM 108 associates an operation or action code with the received packet. An operation code may be used to identify whether a packet is part of a new or existing flow, whether the packet includes data or just control information, the amount of data within the packet, whether the packet data can be re-assembled with related data (e.g., other data in a datagram sent from the source entity to the destination entity), etc. FDBM 108 may use information retrieved from the packet and provided by header parser 106 to select an appropriate operation code. The packet’s operation code is then passed back to the header parser, along with an index of the packet’s flow within FDB 110.
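A toy model of the lookup, returning an operation code together with the flow's index, might look like the sketch below. Several details are hypothetical:

- the `OpCode` values;
- the rule that a zero-length payload means a control-only packet;
- the simple capacity limit.

The text above lists what an operation code may indicate, but not how it is encoded.

```python
from enum import Enum
from typing import Optional


class OpCode(Enum):
    # Hypothetical codes; the actual encoding is not specified in this passage.
    NEW_FLOW = "new flow, data may be re-assembled"
    EXISTING_FLOW = "existing flow, data may be re-assembled"
    CONTROL_ONLY = "no data, do not re-assemble"


class FlowDatabase:
    """Minimal flow table mapping flow keys to indices (no eviction modeled)."""

    def __init__(self, capacity: int = 64):
        self.capacity = capacity
        self.index_of: dict = {}

    def lookup(self, flow_key: tuple, payload_len: int) -> tuple[OpCode, Optional[int]]:
        """Return (operation code, flow index or None), adding new flows as needed."""
        if payload_len == 0:
            return OpCode.CONTROL_ONLY, self.index_of.get(flow_key)
        if flow_key in self.index_of:
            return OpCode.EXISTING_FLOW, self.index_of[flow_key]
        if len(self.index_of) >= self.capacity:
            raise RuntimeError("flow database full")  # replacement policy not modeled
        index = len(self.index_of)
        self.index_of[flow_key] = index
        return OpCode.NEW_FLOW, index
```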
`
In one embodiment of the invention, the combination of header parser 106, FDBM 108 and FDB 110, or a subset of these modules, may be known as a traffic classifier due to their role in classifying or identifying network traffic received at NIC 100.
`
In the illustrated embodiment, header parser 106 also passes the packet’s flow key to load distributor 112. In a host computer system having multiple processors, load distributor 112 may determine which processor an incoming packet is to be routed to for processing through the appropriate protocol stack. Load distributor 112 may, for example, ensure that related packets are routed to a single processor. By sending all packets in one communication flow or end-to-end connection to a single processor, the correct ordering of packets can be enforced. Load distributor 112 may be omitted in an alternative embodiment of the invention. In an alternative embodiment, header parser 106 may also communicate directly with other modules of NIC 100 besides the load distributor and flow database manager.
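Keeping every packet of a flow on one processor only requires a deterministic function of the flow key. The sketch below hashes the key with CRC-32 from Python's `zlib` module and takes the result modulo the processor count. The hash choice and the function name are assumptions, not the method used by load distributor 112.

```python
import zlib


def select_processor(flow_key: tuple, num_processors: int) -> int:
    """Map every packet of a flow to the same processor index.

    CRC-32 over a text rendering of the key is a stand-in for whatever hash
    a hardware load distributor might implement.
    """
    if num_processors < 1:
        raise ValueError("need at least one processor")
    digest = zlib.crc32("|".join(map(str, flow_key)).encode())
    return digest % num_processors


if __name__ == "__main__":
    key = ("192.168.1.10", "10.0.0.5", 49152, 80)
    print(select_processor(key, 4))  # same value on every call for this key
```

Because the mapping depends only on the flow key, packets of one flow stay in order on one processor. Packets of different flows tend to spread across processors.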
`
Thus, after header parser 106 parses a packet, FDBM 108 alters or updates FDB 110 and load distributor 112 identifies a processor in the