`
`WORLD INTELLECTUAL PROPERTY ORGANIZATION
`International Bureau
`
`
`
`INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`
(51) International Patent Classification 7:
`H04L 12/00
`
`(11) International Publication Number:
`
WO 00/52882
`
`(43) International Publication Date:
`
`8 September 2000 (08.09.00)
`
`(21) International Application Number:
`
PCT/US00/05343
`
`(22) International Filing Date:
`
`29 February 2000 (29.02.00)
`
`(30) Priority Data:
`09/258,952
`
1 March 1999 (01.03.99)
`
`US
`
`(71) Applicant: SUN MICROSYSTEMS, INC. [US/US]; 901 San
`Antonio Road, Palo Alto, CA 94303 (US).
`
`(81) Designated States: AE, AL, AM, AT, AU, AZ, BA, BB, BG,
BR, BY, CA, CH, CN, CR, CU, CZ, DE, DK, DM, EE,
`ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP,
`KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA,
`MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU,
`SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG,
`UZ, VN, YU, ZA, ZW, ARIPO patent (GH, GM, KE, LS,
`MW, SD, SL, SZ, TZ, UG, ZW), Eurasian patent (AM, AZ,
`BY, KG, KZ, MD, RU, TJ, TM), European patent (AT, BE,
`CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC,
`NL, PT, SE), OAPI patent (BF, BJ, CF, CG, CI, CM, GA,
`GN, GW, ML, MR, NE, SN, TD, TG).
`
(72) Inventors: MULLER, Shimon; Apartment D, 983 La Mesa
Terrace, Sunnyvale, CA 94086 (US). CHENG, Linda; 1318
Burkette Drive, San Jose, CA 95129 (US). GENTRY,
Denton; 34892 Sea Cliff Terrace, Fremont, CA 94555 (US).

Published
Without international search report and to be republished
upon receipt of that report.
`
(74) Agents: VAUGHAN, Daniel, E. et al.; Park & Vaughan LLP,
`Suite 310, 702 Marshall Street, Redwood City, CA 94063
`(US).
`
`(54) Title: METHOD AND APPARATUS FOR EARLY RANDOM DISCARD OF PACKETS
`
[Front-page drawing: Network Interface Receive Circuit 100, showing an Input Port Processing Module, Header Parser 106, Flow Database Manager 108, Flow Database 110, Load Distributor 112, Checksum Generator 114, a Control Queue, DMA Engine 120, a dynamic module 122 and a further queue; several labels are illegible in the scan.]
`
`(57) Abstract
`
`A system and method are provided for randomly discarding a packet received at a high performance network interface if the rate of
`packet transfers cannot keep pace with the rate of packet arrivals. A selected packet may be dropped as it arrives at a packet queue, or
`a packet already in the queue may be discarded. The queue has multiple defined regions, any of which may overlap or share a common
`boundary. A probability indicator is associated with a region to specify the probability of a packet being discarded when the level of traffic
`in the queue is within the region. Probability indicators may differ from region to region so that the probability of discarding a packet
`fluctuates as the level of traffic stored in the queue changes. Information gleaned from a packet may be applied to prevent certain types of
`packets from being dropped.
`
`Juniper Exhibit 1005
`
`
`
`
`
WO 00/52882

PCT/US00/05343
`
METHOD AND APPARATUS FOR EARLY RANDOM DISCARD OF PACKETS
`
`BACKGROUND
`
This invention relates to the fields of computer systems and computer networks. In particular, the present invention relates to a Network Interface Circuit (NIC) for processing communication packets exchanged between a computer network and a host computer system.
`
The interface between a computer and a network is often a bottleneck for communications passing between the computer and the network. While computer performance (e.g., processor speed) has increased exponentially over the years and computer network transmission speeds have undergone similar increases, inefficiencies in the way network interface circuits handle communications have become more and more evident. With each incremental increase in computer or network speed, it becomes ever more apparent that the interface between the computer and the network cannot keep pace. These inefficiencies involve several basic problems in the way communications between a network and a computer are handled.
`
Today’s most popular forms of networks tend to be packet-based. These types of networks, including the Internet and many local area networks, transmit information in the form of packets. Each packet is separately created and transmitted by an originating endstation and is separately received and processed by a destination endstation. In addition, each packet may, in a bus topology network for example, be received and processed by numerous stations located between the originating and destination endstations.
`
One basic problem with packet networks is that each packet must be processed through multiple protocols or protocol levels (known collectively as a “protocol stack”) on both the origination and destination endstations. When data transmitted between stations exceeds a certain length, the data is divided into multiple portions, and each portion is carried by a separate packet. The amount of data that a packet can carry is generally limited by the network that conveys the packet and is often expressed as a maximum transmission unit (MTU). The original aggregation of data is sometimes known as a “datagram,” and each packet carrying part of a single datagram is processed very similarly to the other packets of the datagram.
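
By way of illustration only, the division of a datagram into packets can be sketched in Python as below. The function name segment is hypothetical, and the 1,460-byte per-packet payload limit is an assumption (a 1,500-byte Ethernet MTU less a 20-byte IPv4 header and a 20-byte TCP header, with no options).

```python
def segment(datagram: bytes, max_payload: int) -> list[bytes]:
    """Split a datagram into consecutive payloads of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [datagram[i:i + max_payload] for i in range(0, len(datagram), max_payload)]


# Assumed limit: 1,500-byte Ethernet MTU minus 20-byte IPv4 and 20-byte TCP headers.
MAX_PAYLOAD = 1500 - 20 - 20

packets = segment(bytes(4000), MAX_PAYLOAD)
print([len(p) for p in packets])  # [1460, 1460, 1080]
```

Each resulting payload would then travel in its own packet and be processed through the protocol stack separately.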
`
Communication packets are generally processed as follows. In the origination endstation, each separate data portion of a datagram is processed through a protocol stack. During this processing multiple protocol headers (e.g., TCP, IP, Ethernet) are added to the data portion to form a packet that can be transmitted across the network. The packet is received by a network interface circuit, which transfers the packet to the destination endstation or a host computer that serves the destination endstation. In the destination endstation, the packet is processed through the protocol stack in the direction opposite to that used in the origination endstation. During this processing the protocol headers are removed in the reverse of the order in which they were applied. The data portion is thus recovered and can be made available to a user, an application program, etc.
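
The last-applied, first-removed ordering of protocol headers can be sketched as follows. The header names are plain string labels standing in for real header formats, and the functions encapsulate and decapsulate are hypothetical.

```python
# Header labels stand in for real header formats (illustrative only).
ENCAPSULATION_ORDER = ["TCP", "IP", "Ethernet"]  # order in which headers are added


def encapsulate(payload: str) -> list[str]:
    """Return the packet as a list of parts, outermost header first."""
    packet = [payload]
    for header in ENCAPSULATION_ORDER:
        packet.insert(0, header)  # each new header wraps everything added so far
    return packet


def decapsulate(packet: list[str]) -> tuple[str, list[str]]:
    """Strip headers outermost first; return the payload and the removal order."""
    parts = list(packet)
    removed = []
    while len(parts) > 1:
        removed.append(parts.pop(0))
    return parts[0], removed


pkt = encapsulate("data")
print(pkt)  # ['Ethernet', 'IP', 'TCP', 'data']
payload, removed = decapsulate(pkt)
print(payload, removed)  # data ['Ethernet', 'IP', 'TCP']
```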
`
Several related packets (e.g., packets carrying data from one datagram) thus undergo substantially the same process in a serial manner (i.e., one packet at a time). The more data that must be transmitted, the more packets must be sent, with each one being separately handled and processed through the protocol stack in each direction. Naturally, the more packets that must be processed, the greater the demand placed upon an endstation’s processor. The number of packets that must be processed is affected by factors other than just the amount of data being sent in a datagram. For example, as the amount of data that can be encapsulated in a packet increases, fewer packets need to be sent. As stated above, however, a packet may have a maximum allowable size, depending on the type of network in use (e.g., the maximum transmission unit for standard Ethernet traffic is 1,500 bytes). The speed of the network also affects the number of packets that a NIC may handle in a given period of time. For example, a gigabit Ethernet network operating at peak capacity with minimum-size frames may require a NIC to receive approximately 1.48 million packets per second. Thus, the number of packets to be processed through a protocol stack may place a significant burden upon a computer’s processor. The situation is exacerbated by the need to process each packet separately even though each one will be processed in a substantially similar manner.
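
The 1.48 million figure can be checked with a short calculation, assuming minimum-size 64-byte Ethernet frames plus the 8-byte preamble and start-of-frame delimiter and the 12-byte inter-frame gap that each frame occupies on the wire. The constant and function names are illustrative.

```python
LINE_RATE_BPS = 1_000_000_000  # gigabit Ethernet
PREAMBLE_SFD_BYTES = 8         # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12      # minimum idle time between frames


def max_frames_per_second(frame_bytes: int, line_rate_bps: int = LINE_RATE_BPS) -> float:
    """Frames per second at full line rate for frames of the given size (incl. FCS)."""
    bits_on_wire = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return line_rate_bps / bits_on_wire


print(round(max_frames_per_second(64)))    # 1488095  (minimum-size frames)
print(round(max_frames_per_second(1518)))  # 81274    (maximum-size untagged frames)
```

At maximum frame size the rate falls to roughly 81,000 frames per second, so the 1.48 million figure represents the worst case for per-packet processing load.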
`
A problem related to the disjoint processing of packets is the manner in which data is moved between “user space” (e.g., an application program’s data storage) and “system space” (e.g., system memory) during data transmission and receipt. Presently, data is simply copied from one area of memory assigned to a user or application program into another area of memory dedicated to the processor’s use. Because each portion of a datagram that is transmitted in a packet may be copied separately (e.g., one byte at a time), a nontrivial amount of processor time is required, and frequent transfers can consume a large amount of memory bus bandwidth. Illustratively, each byte of data in a packet received from the network may be read from the system space and written to the user space in a separate copy operation, and vice versa for data transmitted over the network. Although system space generally provides a protected memory area (e.g., protected from manipulation by user programs), the copy operation does nothing of value when seen from the point of view of a network interface circuit. Instead, it risks over-burdening the host processor and retarding its ability to rapidly accept additional network traffic from the NIC. Copying each packet’s data separately can therefore be very inefficient, particularly in a high-speed network environment.
`
In addition to the inefficient transfer of data (e.g., one packet’s data at a time), the processing of headers from packets received from a network is also inefficient. Each packet carrying part of a single datagram generally has the same protocol headers (e.g., Ethernet, IP and TCP), although there may be some variation in the values within the packets’ headers for a particular protocol. Each packet, however, is individually processed through the same protocol stack, thus requiring multiple repetitions of identical operations for related packets. Successively processing unrelated packets through different protocol stacks will likely be much less efficient than progressively processing a number of related packets through one protocol stack at a time.
`
Another basic problem concerning the interaction between present network interface circuits and host computer systems is that the combination often fails to capitalize on the increased processor resources that are available in multi-processor computer systems. In other words, present attempts to distribute the processing of network packets (e.g., through a protocol stack) among a number of processors in an efficient manner are generally ineffective. In particular, the performance of present NICs does not come close to the linear performance gains one may expect to realize from the availability of multiple processors. In some multi-processor systems, for example, little improvement in the processing of network traffic is realized from the use of more than 4-6 processors.
`
In addition, the rate at which packets are transferred from a network interface circuit to a host computer or other communication device may fail to keep pace with the rate of packet arrival at the network interface. One element or another of the host computer (e.g., a memory bus, a processor) may be over-burdened or otherwise unable to accept packets with sufficient alacrity. In this event one or more packets may be dropped or discarded. Dropping packets may cause a network entity to re-transmit some traffic and, if too many packets are dropped, a network connection may require re-initialization. Further, dropping one packet or type of packet instead of another may make a significant difference in overall network traffic. If, for example, a control packet is dropped, the corresponding network connection may be severely affected, yet the drop does little to alleviate the packet saturation of the network interface circuit because control packets are typically small. Therefore, unless the dropping of packets is performed in a manner that distributes the effect among many network connections or that makes allowance for certain types of packets, network traffic may be degraded more than necessary.
`
Thus, present NICs fail to provide adequate performance to interconnect today’s high-end computer systems and high-speed networks. In addition, a network interface circuit that cannot make allowance for an over-burdened host computer may degrade the computer’s performance.
`
`SUMMARY
`
In one embodiment of the invention, packets are received from a network and stored in a packet queue prior to being transferred to a host computer. If the rate of packet transfers to the host computer cannot keep pace with the rate of packet arrivals at the queue, one or more packets may be dropped. Therefore, a system and method of discarding packets in a random manner is provided, such that the effect of lost packets is fairly distributed among network communicants.
`
In one embodiment of the invention, a packet queue that is used to store packets received from a network is divided into multiple regions. Each region is distinct yet shares a boundary with an adjacent region. In an alternative embodiment, regions may overlap. A fullness gauge or indicator is employed to indicate how full the packet queue is. In particular, read and write pointers that are used to update the packet queue can also be used to determine how full the queue is. This fullness indicator thus fluctuates as the level of network traffic stored in the packet queue ebbs and flows.
`
For one or more of the multiple packet queue regions, a programmable probability indicator is assigned. Each probability indicator indicates the probability of dropping a packet when the fullness indicator indicates that the level of traffic stored in the queue is within the probability indicator’s associated region. Probability indicators may be programmed and re-programmed as the level of traffic in the packet queue changes. The probability indicator may take the form of a percentage or ratio that is configured to randomly select packets to be discarded.
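
The relationship between the queue pointers, the regions and the probability indicators might be modeled as in the following Python sketch. It assumes a circular queue in which equal read and write pointers mean the queue is empty, and a hypothetical four-region table expressed as fractions of capacity; the capacity, boundaries and probabilities are illustrative values, not values taken from the embodiment.

```python
QUEUE_SIZE = 1024  # hypothetical queue capacity, in entries

# Hypothetical region table: (upper boundary as a fraction of capacity, drop
# probability). Adjacent regions share a boundary; the list can be updated in
# place, mirroring re-programmable probability indicators.
REGIONS = [(0.50, 0.00), (0.75, 0.10), (0.90, 0.25), (1.00, 0.50)]


def fullness(read_ptr: int, write_ptr: int, size: int = QUEUE_SIZE) -> int:
    """Occupied entries in a circular queue where read_ptr == write_ptr means empty."""
    return (write_ptr - read_ptr) % size


def drop_probability(level: int, size: int = QUEUE_SIZE, regions=REGIONS) -> float:
    """Return the drop probability of the region containing the current fill level."""
    fraction = level / size
    for upper, probability in regions:
        if fraction < upper:
            return probability
    return regions[-1][1]  # at or beyond the last boundary


print(fullness(read_ptr=1000, write_ptr=200))  # 224 (write pointer has wrapped)
print(drop_probability(fullness(0, 800)))      # 0.25 (800/1024 is in the third region)
```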
`
In one particular embodiment of the invention, a probability indicator takes the form of a bit or flag mask. Each bit or flag may take one of two possible values (e.g., zero and one). In this embodiment, a counter tracks the number of packets received at the packet queue by repeatedly counting through a limited range of numbers, such as zero through N. The bit or flag mask correspondingly contains N+1 bits or flags. Thus, for each counter value the corresponding bit or flag in the mask indicates whether the packet received during that counter value is dropped.
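
A minimal Python sketch of the counter-and-mask scheme follows, assuming that bit i of an integer mask (least significant bit first) governs the packet that arrives while the counter equals i. The class name MaskDiscarder is hypothetical, and a circuit would presumably select a different mask whenever the fullness indicator moves into another region.

```python
class MaskDiscarder:
    """Counter-and-mask discard decision for one queue region (sketch).

    Bit i of `mask` (least significant bit = bit 0) decides the fate of the
    packet that arrives while the counter equals i.
    """

    def __init__(self, mask: int, n: int):
        self.mask = mask
        self.n = n        # counter runs 0..n, so the mask has n + 1 bits
        self.counter = 0

    def should_drop(self) -> bool:
        drop = bool((self.mask >> self.counter) & 1)
        # Advance the counter, wrapping from n back to 0.
        self.counter = 0 if self.counter == self.n else self.counter + 1
        return drop


d = MaskDiscarder(mask=0b00010001, n=7)
print([d.should_drop() for _ in range(8)])
# [True, False, False, False, True, False, False, False]
```

With N = 7 and this mask, two of every eight packets are dropped, a 25 percent discard ratio, and the dropped positions are spread out rather than consecutive.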
`
In an alternative embodiment, a random number is generated when a packet is received. The random number may be compared to a threshold to determine whether the received packet is dropped. Each region may have a separate threshold for determining whether a packet is dropped.
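
The random-number alternative can be sketched similarly. Here each region is assumed to have a threshold on a 16-bit random value, and a packet is dropped when the random value falls below its region’s threshold; the width of the random number and the threshold values are assumptions, and Python’s random module merely stands in for whatever generator a NIC would use.

```python
import random

RANDOM_BITS = 16
# Hypothetical thresholds for four regions: roughly 0%, 10%, 25% and 50%.
THRESHOLDS = [0, 6554, 16384, 32768]


def should_drop(region: int, rng: random.Random) -> bool:
    """Drop when a fresh random number falls below the region's threshold."""
    return rng.randrange(1 << RANDOM_BITS) < THRESHOLDS[region]


rng = random.Random(1)  # seeded only to make the example repeatable
drops = sum(should_drop(3, rng) for _ in range(100_000))
print(drops / 100_000)  # close to 0.5
```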
`
In yet another embodiment of the invention, a packet may be immunized or exempted from being discarded because it exhibits a particular characteristic or status. For example, a control packet may be one type of packet that is not dropped. In this embodiment, the counter is not incremented when a non-discardable packet is received. Other packets that may be exempt from discarding may be packets within a particular network connection or flow, packets associated with a particular application, packets formatted according to a particular protocol, etc. A relevant characteristic or detail of a packet may be extracted during a process in which one or more of the packet’s headers are parsed.
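
Extending the counter-and-mask idea, an exempt packet can bypass the decision without advancing the counter, so the discard pattern applies only to discardable traffic. The is_control field and the class names are hypothetical stand-ins for whatever characteristic header parsing reports.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    is_control: bool  # hypothetical flag set by header parsing
    length: int


class ExemptingDiscarder:
    """Counter-and-mask discard that skips exempt packets (sketch)."""

    def __init__(self, mask: int, n: int):
        self.mask, self.n, self.counter = mask, n, 0

    def should_drop(self, pkt: Packet) -> bool:
        if pkt.is_control:
            return False  # exempt: never dropped, and the counter is not advanced
        drop = bool((self.mask >> self.counter) & 1)
        self.counter = 0 if self.counter == self.n else self.counter + 1
        return drop


d = ExemptingDiscarder(mask=0b01, n=1)  # drop every other discardable packet
arrivals = [Packet(False, 1500), Packet(True, 60), Packet(False, 1500), Packet(False, 1500)]
print([d.should_drop(p) for p in arrivals])  # [True, False, False, True]
```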
`
In one embodiment of the invention, when a probability indicator indicates that a packet should be dropped, the packet that is dropped may be the one just received at the packet queue. In another embodiment, however, a packet already stored in the packet queue may be dropped.
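
The two choices can be contrasted with a short Python sketch over a double-ended queue. Which stored packet to discard is not specified above; the sketch assumes the oldest one at the head of the queue, and the function name is hypothetical.

```python
from collections import deque


def enqueue_with_discard(queue: deque, pkt, drop: bool, drop_queued: bool = False) -> None:
    """Apply a drop decision to an arriving packet.

    drop_queued=False discards the arriving packet itself; drop_queued=True
    discards the oldest stored packet instead and admits the new one.
    """
    if not drop:
        queue.append(pkt)
    elif drop_queued and queue:
        queue.popleft()  # discard a packet already stored in the queue
        queue.append(pkt)
    # Otherwise (or if the queue is empty) the arriving packet is discarded.


q = deque([1, 2, 3])
enqueue_with_discard(q, 4, drop=True)                    # new packet dropped
enqueue_with_discard(q, 5, drop=True, drop_queued=True)  # packet 1 dropped
print(list(q))  # [2, 3, 5]
```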
`
`BRIEF DESCRIPTION OF THE FIGURES
`
FIG. 1A is a block diagram depicting a network interface circuit (NIC) for receiving a packet from a network in accordance with an embodiment of the present invention.

FIG. 1B is a flow chart demonstrating one method of operating the NIC of FIG. 1A to transfer a packet received from a network to a host computer in accordance with an embodiment of the invention.

FIG. 2 is a diagram of a packet transmitted over a network and received at a network interface circuit in one embodiment of the invention.

FIG. 3 is a block diagram depicting a header parser of a network interface circuit for parsing a packet in accordance with an embodiment of the invention.

FIGS. 4A-4B comprise a flow chart demonstrating one method of parsing a packet received from a network at a network interface circuit in accordance with an embodiment of the present invention.

FIG. 5 is a block diagram depicting a network interface circuit flow database in accordance with an embodiment of the invention.

FIGS. 6A-6E comprise a flow chart illustrating one method of managing a network interface circuit flow database in accordance with an embodiment of the invention.

FIG. 7 is a flow chart demonstrating one method of distributing the processing of network packets among multiple processors on a host computer in accordance with an embodiment of the invention.

FIG. 8 is a diagram of a packet queue for a network interface circuit in accordance with an embodiment of the invention.

FIG. 9 is a diagram of a control queue for a network interface circuit in accordance with an embodiment of the invention.

FIG. 10 depicts a system for randomly discarding a packet from a network interface in accordance with an embodiment of the invention.

FIGS. 11A-11B comprise a flow chart demonstrating one method of discarding a packet from a network interface in accordance with an embodiment of the invention.

FIG. 12 depicts one set of dynamic instructions for parsing a packet in accordance with an embodiment of the invention.
`
`DETAILED DESCRIPTION
`
The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of particular applications of the invention and their requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
`
In particular, embodiments of the invention are described below in the form of a network interface circuit (NIC) receiving communication packets formatted in accordance with certain communication protocols compatible with the Internet. One skilled in the art will recognize, however, that the present invention is not limited to communication protocols compatible with the Internet and may be readily adapted for use with other protocols and in communication devices other than a NIC.
`
The program environment in which a present embodiment of the invention is executed illustratively incorporates a general-purpose computer or a special-purpose device such as a hand-held computer. Details of such devices (e.g., processor, memory, data storage, input/output ports and display) are well known and are omitted for the sake of clarity.
`
It should also be understood that the techniques of the present invention might be implemented using a variety of technologies. For example, the methods described herein may be implemented in software running on a programmable microprocessor, or implemented in hardware utilizing either a combination of microprocessors or other specially designed application specific integrated circuits, programmable logic devices, or various combinations thereof. In particular, the methods described herein may be implemented by a series of computer-executable instructions residing on a medium such as a carrier wave, disk drive, or other computer-readable medium.
`
`Introduction
`
In one embodiment of the present invention, a network interface circuit (NIC) is configured to receive and process communication packets exchanged between a host computer system and a network such as the Internet. In particular, the NIC is configured to receive and manipulate packets formatted in accordance with a protocol stack (e.g., a combination of communication protocols) supported by a network coupled to the NIC.
`
A protocol stack may be described with reference to the seven-layer ISO-OSI (International Organization for Standardization, Open Systems Interconnection) model framework. Thus, one illustrative protocol stack includes the Transmission Control Protocol (TCP) at layer four, Internet Protocol (IP) at layer three and Ethernet at layer two. For purposes of discussion, the term “Ethernet” may be used herein to refer collectively to the standardized IEEE (Institute of Electrical and Electronics Engineers) 802.3 specification as well as version two of the non-standardized form of the protocol. Where different forms of the protocol need to be distinguished, the standard form may be identified by including the “802.3” designation.
`
Other embodiments of the invention are configured to work with communications adhering to other protocols, both known (e.g., AppleTalk, IPX (Internetwork Packet Exchange), etc.) and unknown at the present time. The methods provided by this invention are easily adaptable for new communication protocols.
`
In addition, the processing of packets described below may be performed on communication devices other than a NIC. For example, a modem, switch, router or other communication port or device (e.g., serial, parallel, USB, SCSI) may be similarly configured and operated.
`
In embodiments of the invention described below, a NIC receives a packet from a network on behalf of a host computer system or other communication device. The NIC analyzes the packet (e.g., by retrieving certain fields from one or more of its protocol headers) and takes action to increase the efficiency with which the packet is transferred or provided to its destination entity. Equipment and methods discussed below for increasing the efficiency of processing or transferring packets received from a network may also be used for packets moving in the reverse direction (i.e., from the NIC to the network).
`
One technique that may be applied to incoming network traffic involves examining or parsing one or more headers of an incoming packet (e.g., headers for the layer two, three and four protocols) in order to identify the packet’s source and destination entities and possibly retrieve certain other information. Using identifiers of the communicating entities as a key, data from multiple packets may be aggregated or re-assembled. Typically, a datagram sent to one destination entity from one source entity is transmitted via multiple packets. Aggregating data from multiple related packets (e.g., packets carrying data from the same datagram) thus allows a datagram to be re-assembled and collectively transferred to a host computer. The datagram may then be provided to the destination entity in a highly efficient manner. For example, rather than providing data from one packet at a time (and one byte at a time) in separate “copy” operations, a “page-flip” operation may be performed. In a page-flip, an entire memory page of data may be provided to the destination entity, possibly in exchange for an empty or unused page.
`
In another technique, packets received from a network are placed in a queue to await transfer to a host computer. While awaiting transfer, multiple related packets may be identified to the host computer. After being transferred, they may be processed as a group by a host processor rather than being processed serially (e.g., one at a time).
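
One way to picture the identification of related packets in a queue is to group queued entries by flow while preserving arrival order within each group, as in this Python sketch. The (flow_key, packet) pair representation and the function name group_by_flow are assumptions for illustration.

```python
def group_by_flow(queued):
    """Group (flow_key, packet) pairs from a queue, keeping arrival order.

    Returns a dict mapping each flow key (in order of first appearance) to the
    list of its packets in the order they were queued.
    """
    groups = {}
    for key, pkt in queued:
        groups.setdefault(key, []).append(pkt)
    return groups


queue = [("A", "a1"), ("B", "b1"), ("A", "a2"), ("A", "a3")]
print(group_by_flow(queue))  # {'A': ['a1', 'a2', 'a3'], 'B': ['b1']}
```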
`
Yet another technique involves submitting a number of related packets to a single processor of a multi-processor host computer system. By distributing packets conveyed between different pairs of source and destination entities among different processors, the processing of packets through their respective protocol stacks can be distributed while still maintaining packets in their correct order.
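
A common way to obtain this property is to hash a flow’s identifiers and reduce the result modulo the number of processors, so that every packet of one flow maps to the same processor. The sketch below uses CRC-32 from Python’s zlib module as an assumed hash; it is not necessarily the function a load distributor would use, and select_processor is a hypothetical name.

```python
import zlib


def select_processor(flow_key: tuple, num_processors: int) -> int:
    """Map a flow key to a processor index in [0, num_processors).

    The mapping depends only on the key, so all packets of one flow reach the
    same processor and can be handled in order there.
    """
    digest = zlib.crc32(repr(flow_key).encode("utf-8"))
    return digest % num_processors


key = ("192.0.2.1", "198.51.100.7", 49152, 80)
print(select_processor(key, 4))  # same value for every packet of this flow
```

Python’s built-in hash() is avoided here because string hashing is randomized per process, which would make the mapping unstable across runs.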
`
The techniques discussed above for increasing the efficiency with which packets are processed may involve a combination of hardware and software modules located on a network interface and/or a host computer system. In one particular embodiment, a parsing module on a host computer’s NIC parses header portions of packets. Illustratively, the parsing module comprises a microsequencer operating according to a set of replaceable instructions stored as micro-code. Using information extracted from the packets, multiple packets from one source entity to one destination entity may be identified. A hardware re-assembly module on the NIC may then gather the data from the multiple packets. Another hardware module on the NIC is configured to recognize related packets awaiting transfer to the host computer so that they may be processed through an appropriate protocol stack collectively, rather than serially. The re-assembled data and the packets’ headers may then be provided to the host computer so that appropriate software (e.g., a device driver for the NIC) may process the headers and deliver the data to the destination entity.
`
Where the host computer includes multiple processors, a load distributor (which may also be implemented in hardware on the NIC) may select a processor to process the headers of the multiple packets through a protocol stack.
`
In another embodiment of the invention, a system is provided for randomly discarding a packet from a NIC when the NIC is saturated or nearly saturated with packets awaiting transfer to a host computer.
`
`One Embodiment of a High Performance Network Interface Circuit
`
FIG. 1A depicts NIC 100 configured in accordance with an illustrative embodiment of the invention. A brief description of the operation and interaction of the various modules of NIC 100 in this embodiment follows.
`
A communication packet may be received at NIC 100 from network 102 by a medium access control (MAC) module (not shown in FIG. 1A). The MAC module performs low-level processing of the packet such as reading the packet from the network, performing some error checking, detecting packet fragments, detecting over-sized packets, removing the layer one preamble, etc.
`
Input Port Processing (IPP) module 104 then receives the packet. The IPP module stores the entire packet in packet queue 116, as received from the MAC module or network, and a portion of the packet is copied into header parser 106. In one embodiment of the invention, IPP module 104 may act as a coordinator of sorts to prepare the packet for transfer to a host computer system. In such a role, IPP module 104 may receive information concerning a packet from various modules of NIC 100 and dispatch such information to other modules.
`
Header parser 106 parses a header portion of the packet to retrieve various pieces of information that will be used to identify related packets (e.g., multiple packets from one source entity for one destination entity) and that will affect subsequent processing of the packets. In the illustrated embodiment, header parser 106 communicates with flow database manager (FDBM) 108, which manages flow database (FDB) 110. In particular, header parser 106 submits a query to FDBM 108 to determine whether a valid communication flow (described below) exists between the source entity that sent a packet and the destination entity. The destination entity may comprise an application program, a communication module, or some other element of a host computer system that is to receive the packet.
`
In the illustrated embodiment of the invention, a communication flow comprises one or more datagram packets from one source entity to one destination entity. A flow may be identified by a flow key assembled from source and destination identifiers retrieved from the packet by header parser 106. In one embodiment of the invention, a flow key comprises address and/or port information for the source and destination entities from the packet’s layer three (e.g., IP) and/or layer four (e.g., TCP) protocol headers.
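
For an IPv4 packet carrying TCP, such a key can be assembled from the source and destination addresses in the IP header and the source and destination ports at the start of the TCP header. The following Python sketch assumes the layer two header has already been removed and that the packet is not a non-initial IP fragment (which would lack a TCP header); the function name flow_key and the tuple layout are hypothetical.

```python
import socket
import struct


def flow_key(ip_packet: bytes) -> tuple:
    """Return (src_ip, dst_ip, src_port, dst_port) for an IPv4 packet carrying TCP."""
    if ip_packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (ip_packet[0] & 0x0F) * 4  # IPv4 header length in bytes, including options
    if ip_packet[9] != 6:            # protocol number 6 is TCP
        raise ValueError("not a TCP segment")
    src_ip = socket.inet_ntoa(ip_packet[12:16])
    dst_ip = socket.inet_ntoa(ip_packet[16:20])
    src_port, dst_port = struct.unpack("!HH", ip_packet[ihl:ihl + 4])
    return (src_ip, dst_ip, src_port, dst_port)


# Build a minimal 20-byte IPv4 header followed by a 20-byte TCP header (SYN set).
ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                     socket.inet_aton("192.0.2.1"), socket.inet_aton("198.51.100.7"))
tcp_hdr = struct.pack("!HHIIBBHHH", 49152, 80, 0, 0, 0x50, 0x02, 65535, 0, 0)
print(flow_key(ip_hdr + tcp_hdr))  # ('192.0.2.1', '198.51.100.7', 49152, 80)
```

Reading the header length from the IHL field, rather than assuming 20 bytes, keeps the port offsets correct when IP options are present.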
`
For purposes of the illustrated embodiment of the invention, a communication flow is similar to a TCP end-to-end connection but is generally shorter in duration. In particular, in this embodiment the duration of a flow may be limited to the time needed to receive all of the packets associated with a single datagram passed from the source entity to the destination entity.
`
Thus, for purposes of flow management, header parser 106 passes the packet’s flow key to flow database manager 108. The header parser may also provide the flow database manager with other information concerning the packet that was retrieved from the packet (e.g., length of the packet).
`
Flow database manager 108 searches FDB 110 in response to a query received from header parser 106. Illustratively, flow database 110 stores information concerning each valid communication flow involving a destination entity served by NIC 100. Thus, FDBM 108 updates FDB 110 as necessary, depending upon the information received from header parser 106. In addition, in this embodiment of the invention FDBM 108 associates an operation or action code with the received packet. An operation code may be used to identify whether a packet is part of a new or existing flow, whether the packet includes data or just control information, the amount of data within the packet, whether the packet data can be re-assembled with related data (e.g., other data in a datagram sent from the source entity to the destination entity), etc. FDBM 108 may use information retrieved from the packet and provided by header parser 106 to select an appropriate operation code. The packet’s operation code is then passed back to the header parser, along with an index of the packet’s flow within FDB 110.
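
A toy model of this exchange is sketched below in Python: the database maps flow keys to flow indices and returns an operation code together with the index. The three operation codes, the dictionary storage and the rule that a packet with no payload is treated as control-only are all simplifying assumptions; entries are never aged out or freed in this sketch.

```python
from enum import Enum, auto


class OpCode(Enum):  # hypothetical codes, not the embodiment's encoding
    NEW_FLOW = auto()
    EXISTING_FLOW = auto()
    CONTROL_ONLY = auto()


class FlowDatabase:
    """Toy flow database mapping flow keys to flow indices (no aging or eviction)."""

    def __init__(self):
        self.index_of = {}

    def lookup(self, key, payload_len: int):
        """Return (operation code, flow index or None) for one packet."""
        if payload_len == 0:
            # Control-only packet: report an existing flow but never create one.
            return OpCode.CONTROL_ONLY, self.index_of.get(key)
        if key in self.index_of:
            return OpCode.EXISTING_FLOW, self.index_of[key]
        index = len(self.index_of)  # next unused entry; entries are never freed
        self.index_of[key] = index
        return OpCode.NEW_FLOW, index


fdb = FlowDatabase()
k = ("192.0.2.1", "198.51.100.7", 49152, 80)
print(fdb.lookup(k, 1460))  # (<OpCode.NEW_FLOW: 1>, 0)
print(fdb.lookup(k, 1460))  # (<OpCode.EXISTING_FLOW: 2>, 0)
print(fdb.lookup(k, 0))     # (<OpCode.CONTROL_ONLY: 3>, 0)
```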
`
In one embodiment of the invention, the combination of header parser 106, FDBM 108 and FDB 110, or a subset of these modules, may be known as a traffic classifier due to their role in classifying or identifying network traffic received at NIC 100.
`
In the illustrated embodiment, header parser 106 also passes the packet’s flow key to load distributor 112. In a host computer system having multiple processors, load distributor 112 may determine which processor an incoming packet is to be routed to for processing through the appropriate protocol stack. Load distributor 112 may, for example, ensure that related packets are routed to a single processor. By sending all packets in one communication flow or end-to-end connection to a single processor, the correct ordering of packets can be enforced. Load distributor 112 may be omitted in an alternative embodiment of the invention. In an alternative embodiment, header parser 106 may also communicate directly with