FOR THE PURPOSES OF INFORMATION ONLY

Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.
`
AL Albania
AM Armenia
AT Austria
AU Australia
AZ Azerbaijan
BA Bosnia and Herzegovina
BB Barbados
BE Belgium
BF Burkina Faso
BG Bulgaria
BJ Benin
BR Brazil
BY Belarus
CA Canada
CF Central African Republic
CG Congo
CH Switzerland
CI Côte d'Ivoire
CM Cameroon
CN China
CU Cuba
CZ Czech Republic
DE Germany
DK Denmark
EE Estonia
ES Spain
FI Finland
FR France
GA Gabon
GB United Kingdom
GE Georgia
GH Ghana
GN Guinea
GR Greece
HU Hungary
IE Ireland
IL Israel
IS Iceland
IT Italy
JP Japan
KE Kenya
KG Kyrgyzstan
KP Democratic People's Republic of Korea
KR Republic of Korea
KZ Kazakstan
LC Saint Lucia
LI Liechtenstein
LK Sri Lanka
LR Liberia
LS Lesotho
LT Lithuania
LU Luxembourg
LV Latvia
MC Monaco
MD Republic of Moldova
MG Madagascar
MK The former Yugoslav Republic of Macedonia
ML Mali
MN Mongolia
MR Mauritania
MW Malawi
MX Mexico
NE Niger
NL Netherlands
NO Norway
NZ New Zealand
PL Poland
PT Portugal
RO Romania
RU Russian Federation
SD Sudan
SE Sweden
SG Singapore
SI Slovenia
SK Slovakia
SN Senegal
SZ Swaziland
TD Chad
TG Togo
TJ Tajikistan
TM Turkmenistan
TR Turkey
TT Trinidad and Tobago
UA Ukraine
UG Uganda
US United States of America
UZ Uzbekistan
VN Viet Nam
YU Yugoslavia
ZW Zimbabwe
`
Ex.1049.002
DELL
`
`
`
WO 00/13091
PCT/US98/24943

INTELLIGENT NETWORK INTERFACE DEVICE AND SYSTEM FOR ACCELERATING COMMUNICATION
`
Technical field

The present invention relates generally to computer or other networks, and more particularly to processing of information communicated between hosts such as computers connected to a network.
`
Background

The advantages of network computing are increasingly evident. The convenience and efficiency of providing information, communication or computational power to individuals at their personal computer or other end user devices has led to rapid growth of such network computing, including internet as well as intranet devices and applications.
`
As is well known, most network computer communication is accomplished with the aid of a layered software architecture for moving information between host computers connected to the network. The layers help to segregate information into manageable segments, the general functions of each layer often based on an international standard called Open Systems Interconnection (OSI). OSI sets forth seven processing layers through which information may pass when received by a host in order to be presentable to an end user. Similarly, transmission of information from a host to the network may pass through those seven processing layers in reverse order. Each step of processing and service by a layer may include copying the processed information. Another reference model that is widely implemented, called TCP/IP (TCP stands for transport control protocol, while IP denotes internet protocol), essentially employs five of the seven layers of OSI.
`
Networks may include, for instance, a high-speed bus such as an Ethernet connection or an internet connection between disparate local area networks (LANs), each of which includes multiple hosts, or any of a variety of other known means for data transfer between hosts. According to the OSI standard, physical layers are connected to the network at respective hosts, the physical layers providing transmission and receipt of raw data bits via the network. A data link layer is serviced by the physical layer of each host, the data link layers providing frame division and error correction to the data received from the physical layers, as well as processing acknowledgment frames sent by the receiving host. A network layer of each host is serviced by respective data link layers, the network layers primarily controlling size and coordination of subnets of packets of data.
`
`
A transport layer is serviced by each network layer and a session layer is serviced by each transport layer within each host. Transport layers accept data from their respective session layers and split the data into smaller units for transmission to the other host's transport layer, which concatenates the data for presentation to respective presentation layers. Session layers allow for enhanced communication control between the hosts. Presentation layers are serviced by their respective session layers, the presentation layers translating between data semantics and syntax which may be peculiar to each host and standardized structures of data representation. Compression and/or encryption of data may also be accomplished at the presentation level. Application layers are serviced by respective presentation layers, the application layers translating between programs particular to individual hosts and standardized programs for presentation to either an application or an end user. The TCP/IP standard includes the lower four layers and application layers, but integrates the functions of session layers and presentation layers into adjacent layers. Generally speaking, application, presentation and session layers are defined as upper layers, while transport, network and data link layers are defined as lower layers.
`
The rules and conventions for each layer are called the protocol of that layer, and since the protocols and general functions of each layer are roughly equivalent in various hosts, it is useful to think of communication occurring directly between identical layers of different hosts, even though these peer layers do not directly communicate without information transferring sequentially through each layer below. Each lower layer performs a service for the layer immediately above it to help with processing the communicated information. Each layer saves the information for processing and service to the next layer. Due to the multiplicity of hardware and software architectures, devices and programs commonly employed, each layer is necessary to ensure that the data can make it to the intended destination in the appropriate form, regardless of variations in hardware and software that may intervene.
`
In preparing data for transmission from a first to a second host, some control data is added at each layer of the first host regarding the protocol of that layer, the control data being indistinguishable from the original (payload) data for all lower layers of that host. Thus an application layer attaches an application header to the payload data and sends the combined data to the presentation layer of the sending host, which receives the combined data, operates on it and adds a presentation header to the data, resulting in another combined data packet. The data resulting from combination of payload data, application header and presentation header is then passed to the session layer, which performs required operations including attaching a session header to the data and presenting the resulting combination of data to the transport layer. This process continues as the information moves to lower layers, with a transport header, network header and data link header and trailer attached to the data at each of those layers, with each step typically including data moving and copying, before sending the data as bit packets over the network to the second host.
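The per-layer encapsulation described above can be sketched in a few lines. This is a toy model only: the layer names follow the OSI discussion here, but the text-tag headers are invented for illustration, where a real stack prepends binary headers.

```python
# Sketch of per-layer encapsulation: each layer treats everything handed
# down from the layer above as opaque payload and prepends its own header.
# Header formats here are invented text tags, purely for illustration.
LAYERS = ["application", "presentation", "session", "transport", "network", "data link"]

def transmit(payload: bytes) -> bytes:
    """Walk down the stack, prepending one header per layer."""
    data = payload
    for layer in LAYERS:
        header = f"[{layer} hdr]".encode()
        data = header + data          # lower layers cannot tell header from payload
    return data

def receive(frame: bytes) -> bytes:
    """Walk up the stack, removing headers in the reverse order."""
    data = frame
    for layer in reversed(LAYERS):
        header = f"[{layer} hdr]".encode()
        assert data.startswith(header)
        data = data[len(header):]     # each layer strips only its own header
    return data

frame = transmit(b"hello")
assert receive(frame) == b"hello"
```

Note how the receiving side must undo each step in order, which is exactly the repeated per-layer work, and copying, that the invention seeks to avoid.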
`
The receiving host generally performs the converse of the above-described process, beginning with receiving the bits from the network, as headers are removed and data processed in order from the lowest (physical) layer to the highest (application) layer before transmission to a destination of the receiving host. Each layer of the receiving host recognizes and manipulates only the headers associated with that layer, since to that layer the higher layer control data is included with and indistinguishable from the payload data. Multiple interrupts, valuable central processing unit (CPU) processing time and repeated data copies may also be necessary for the receiving host to place the data in an appropriate form at its intended destination.
`
The above description of layered protocol processing is simplified, as college-level textbooks devoted primarily to this subject are available, such as Computer Networks, Third Edition (1996) by Andrew S. Tanenbaum, which is incorporated herein by reference. As defined in that book, a computer network is an interconnected collection of autonomous computers, such as internet and intranet devices, including local area networks (LANs), wide area networks (WANs), asynchronous transfer mode (ATM), ring or token ring, wired, wireless, satellite or other means for providing communication capability between separate processors. A computer is defined herein to include a device having both logic and memory functions for processing data, while computers or hosts connected to a network are said to be heterogeneous if they function according to different operating systems or communicate via different architectures.
`
As networks grow increasingly popular and the information communicated thereby becomes increasingly complex and copious, the need for such protocol processing has increased. It is estimated that a large fraction of the processing power of a host CPU may be devoted to controlling protocol processes, diminishing the ability of that CPU to perform other tasks. Network interface cards have been developed to help with the lowest layers, such as the physical and data link layers. It is also possible to increase protocol processing speed by simply adding more processing power or CPUs according to conventional arrangements. This solution, however, is both awkward and expensive. But the complexities presented by various networks, protocols, architectures, operating systems and applications generally require extensive processing to afford communication capability between various network hosts.
`
Disclosure of the Invention

The current invention provides a device for processing network communication that greatly increases the speed of that processing and the efficiency of transferring data being communicated. The invention has been achieved by questioning the long-standing practice of performing multilayered protocol processing on a general-purpose processor. The protocol processing method and architecture that results effectively collapses the layers of a connection-based, layered architecture such as TCP/IP into a single wider layer which is able to send network data more directly to and from a desired location or buffer on a host. This accelerated processing is provided to a host for both transmitting and receiving data, and so improves performance whether one or both hosts involved in an exchange of information have such a feature.
`
The accelerated processing includes employing representative control instructions for a given message that allow data from the message to be processed via a fast-path which accesses message data directly at its source or delivers it directly to its intended destination. This fast-path bypasses conventional protocol processing of headers that accompany the data. The fast-path employs a specialized microprocessor designed for processing network communication, avoiding the delays and pitfalls of conventional software layer processing, such as repeated copying and interrupts to the CPU. In effect, the fast-path replaces the states that are traditionally found in several layers of a conventional network stack with a single state machine encompassing all those layers, in contrast to conventional rules that require rigorous differentiation and separation of protocol layers. The host retains a sequential protocol processing stack which can be employed for setting up a fast-path connection or processing message exceptions. The specialized microprocessor and the host intelligently choose whether a given message or portion of a message is processed by the microprocessor or the host stack.
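The collapse of several per-layer state machines into one can be caricatured as follows. This is a toy model with invented names and conditions, not the patented implementation: a single routine consults one combined connection state and either completes the frame on the fast-path or defers to the sequential host stack.

```python
# Toy model of the collapsed state machine: one state record and one test
# replace separate link/network/transport processing. Names are invented.
class Connection:
    def __init__(self):
        self.next_expected_seq = 0    # single combined state for the connection

def process_frame(conn, seq, payload, host_stack_frames, destination):
    """Fast-path a frame when it matches the expected state; otherwise
    hand it to the slower, fully general host stack."""
    if conn is not None and seq == conn.next_expected_seq:
        destination.extend(payload)           # delivered directly, no per-layer copies
        conn.next_expected_seq += len(payload)
        return "fast-path"
    host_stack_frames.append((seq, payload))  # exceptions go to the sequential stack
    return "slow-path"

conn = Connection()
dest, slow = bytearray(), []
assert process_frame(conn, 0, b"abcd", slow, dest) == "fast-path"
assert process_frame(conn, 99, b"oops", slow, dest) == "slow-path"
assert bytes(dest) == b"abcd" and slow == [(99, b"oops")]
```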
`
Brief Description of the Drawings

FIG. 1 is a plan view diagram of a device of the present invention, including a host computer having a communication-processing device for accelerating network communication.

FIG. 2 is a diagram of information flow for the host of FIG. 1 in processing network communication, including a fast-path, a slow-path and a transfer of connection context between the fast and slow-paths.
`
`
FIG. 3 is a flow chart of message receiving according to the present invention.

FIG. 4A is a diagram of information flow for the host of FIG. 1 receiving a message packet processed by the slow-path.

FIG. 4B is a diagram of information flow for the host of FIG. 1 receiving an initial message packet processed by the fast-path.

FIG. 4C is a diagram of information flow for the host of FIG. 4B receiving a subsequent message packet processed by the fast-path.

FIG. 4D is a diagram of information flow for the host of FIG. 4C receiving a message packet having an error that causes processing to revert to the slow-path.

FIG. 5 is a diagram of information flow for the host of FIG. 1 transmitting a message by either the fast or slow-paths.

FIG. 6 is a diagram of information flow for a first embodiment of an intelligent network interface card (INIC) associated with a client having a TCP/IP processing stack.

FIG. 7 is a diagram of hardware logic for the INIC embodiment shown in FIG. 6, including a packet control sequencer and a fly-by sequencer.

FIG. 8 is a diagram of the fly-by sequencer of FIG. 7 for analyzing header bytes as they are received by the INIC.

FIG. 9 is a diagram of information flow for a second embodiment of an INIC associated with a server having a TCP/IP processing stack.

FIG. 10 is a diagram of a command driver installed in the host of FIG. 9 for creating and controlling a communication control block for the fast-path.

FIG. 11 is a diagram of the TCP/IP stack and command driver of FIG. 10 configured for NetBios communications.

FIG. 12 is a diagram of a communication exchange between the client of FIG. 6 and the server of FIG. 9.

FIG. 13 is a diagram of hardware functions included in the INIC of FIG. 9.

FIG. 14 is a diagram of a trio of pipelined microprocessors included in the INIC of FIG. 13, including three phases with a processor in each phase.

FIG. 15A is a diagram of a first phase of the pipelined microprocessor of FIG. 14.

FIG. 15B is a diagram of a second phase of the pipelined microprocessor of FIG. 14.

FIG. 15C is a diagram of a third phase of the pipelined microprocessor of FIG. 14.

FIG. 16 is a diagram of a plurality of queue storage units that interact with the microprocessor of FIG. 14 and include SRAM and DRAM.

FIG. 17 is a diagram of a set of status registers for the queue storage units of FIG. 16.
`
`
FIG. 18 is a diagram of a queue manager, which interacts with the queue storage units and status registers of FIG. 16 and FIG. 17.

FIGs. 19A-D are diagrams of various stages of a least-recently-used register that is employed for allocating cache memory.

FIG. 20 is a diagram of the devices used to operate the least-recently-used register of FIGs. 19A-D.
`
Best Mode for Carrying Out the Invention

FIG. 1 shows a host 20 of the present invention connected by a network 25 to a remote host 22. The increase in processing speed achieved by the present invention can be provided with an intelligent network interface card (INIC) that is easily and affordably added to an existing host, or with a communication processing device (CPD) that is integrated into a host, in either case freeing the host CPU from most protocol processing and allowing improvements in other tasks performed by that CPU. The host 20 in a first embodiment contains a CPU 28 and a CPD 30 connected by a PCI bus 33. The CPD 30 includes a microprocessor designed for processing communication data and memory buffers controlled by a direct memory access (DMA) unit. Also connected to the PCI bus 33 is a storage device 35, such as a semiconductor memory or disk drive, along with any related controls.
`
Referring additionally to FIG. 2, the host CPU 28 controls a protocol processing stack 44 housed in storage 35, the stack including a data link layer 36, network layer 38, transport layer 40, upper layer 46 and an upper layer interface 42. The upper layer 46 may represent a session, presentation and/or application layer, depending upon the particular protocol being employed and message communicated. The upper layer interface 42, along with the CPU 28 and any related controls, can send or retrieve a file to or from the upper layer 46 or storage 35, as shown by arrow 48. A connection context 50 has been created, as will be explained below, the context summarizing various features of the connection, such as protocol type and source and destination addresses for each protocol layer. The context may be passed between an interface for the session layer 42 and the CPD 30, as shown by arrows 52 and 54, and stored as a communication control block (CCB) at either CPD 30 or storage 35.
`
When the CPD 30 holds a CCB defining a particular connection, data received by the CPD from the network and pertaining to the connection is referenced to that CCB and can then be sent directly to storage 35 according to a fast-path 58, bypassing sequential protocol processing by the data link 36, network 38 and transport 40 layers. Transmitting a message, such as sending a file from storage 35 to remote host 22, can also occur via the fast-path 58, in which case the context for the file data is added by the CPD 30 referencing a CCB, rather than by sequentially adding headers during processing by the transport 40, network 38 and data link 36 layers. The DMA controllers of the CPD 30 perform these transfers between CPD and storage 35.
`
The CPD 30 collapses multiple protocol stacks, each having possible separate states, into a single state machine for fast-path processing. As a result, exception conditions may occur that are not provided for in the single state machine, primarily because such conditions occur infrequently and to deal with them on the CPD would provide little or no performance benefit to the host. Such exceptions can be CPD 30 or CPU 28 initiated. An advantage of the invention includes the manner in which unexpected situations that occur on a fast-path CCB are handled. The CPD 30 deals with these rare situations by passing back or flushing to the host protocol stack 44 the CCB and any associated message frames involved, via a control negotiation. The exception condition is then processed in a conventional manner by the host protocol stack 44. At some later time, usually directly after the handling of the exception condition has completed and fast-path processing can resume, the host stack 44 hands the CCB back to the CPD.
`
This fallback capability enables the performance-impacting functions of the host protocols to be handled by the CPD network microprocessor, while the exceptions are dealt with by the host stacks, the exceptions being so rare as to negligibly affect overall performance. The custom designed network microprocessor can have independent processors for transmitting and receiving network information, and further processors for assisting and queuing. A preferred microprocessor embodiment includes a pipelined trio of receive, transmit and utility processors. DMA controllers are integrated into the implementation and work in close concert with the network microprocessor to quickly move data between buffers adjacent to the controllers and other locations such as long term storage. Providing buffers logically adjacent to the DMA controllers avoids unnecessary loads on the PCI bus.
`
FIG. 3 diagrams the general flow of messages received according to the current invention. A large TCP/IP message such as a file transfer may be received by the host from the network in a number of separate, approximately 64 KB transfers, each of which may be split into many, approximately 1.5 KB frames or packets for transmission over a network. Novell NetWare protocol suites running Sequenced Packet Exchange Protocol (SPX) or NetWare Core Protocol (NCP) over Internetwork Packet Exchange (IPX) work in a similar fashion. Another form of data communication which can be handled by the fast-path is Transaction TCP (hereinafter T/TCP or TTCP), a version of TCP which initiates a connection with an initial transaction request after which a reply containing data may be sent according to the connection, rather than initiating a connection via a several-message initialization dialogue and then transferring data with later messages. In any of the transfers typified by these protocols, each packet conventionally includes a portion of the data being transferred, as well as headers for each of the protocol layers and markers for positioning the packet relative to the rest of the packets of this message.
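For a rough sense of scale, the figures above imply dozens of frames per transfer. The arithmetic below assumes, for illustration, a 64 KB transfer and standard Ethernet TCP/IP framing carrying about 1460 bytes of payload per 1.5 KB frame; the exact payload size depends on the protocols in use.

```python
# Back-of-envelope count of frames per ~64 KB transfer. The 1460-byte
# payload figure assumes ordinary Ethernet TCP/IP framing (illustrative).
import math

transfer_bytes = 64 * 1024        # one ~64 KB transfer from the network
payload_per_frame = 1460          # TCP payload of a 1500-byte Ethernet frame

frames = math.ceil(transfer_bytes / payload_per_frame)
print(frames)   # about 45 frames, each conventionally walked through every layer
```

Each of those frames would conventionally incur per-layer processing and copying on the host, which is the per-frame cost the fast-path removes.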
`
When a message packet or frame is received 47 from a network by the CPD, it is first validated by a hardware assist. This includes determining the protocol types of the various layers, verifying relevant checksums, and summarizing 57 these findings into a status word or words. Included in these words is an indication whether or not the frame is a candidate for fast-path data flow. Selection 59 of fast-path candidates is based on whether the host may benefit from this message connection being handled by the CPD, which includes determining whether the packet has header bytes indicating particular protocols, such as TCP/IP or SPX/IPX for example. The small percent of frames that are not fast-path candidates are sent 61 to the host protocol stacks for slow-path protocol processing. Subsequent network microprocessor work with each fast-path candidate determines whether a fast-path connection such as a TCP or SPX CCB is already extant for that candidate, or whether that candidate may be used to set up a new fast-path connection, such as for a TTCP/IP transaction. The validation provided by the CPD provides acceleration whether a frame is processed by the fast-path or a slow-path, as only error free, validated frames are processed by the host CPU even for the slow-path processing.
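The hardware assist's summary step can be sketched as below. The one's-complement checksum is the standard Internet checksum; the status-word bit layout and the string-based protocol tag are invented here for illustration, since the document does not specify them.

```python
# Sketch of summarizing validation results into a status word: verify the
# Internet checksum and note whether the protocol is one the fast-path
# handles. Bit assignments and the protocol tag are illustrative only.
import struct

def internet_checksum(data: bytes) -> int:
    """Standard one's-complement sum over 16-bit words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data)//2}H", data))
    while total >> 16:                    # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

VALID = 1 << 0           # checksum verified
FAST_CANDIDATE = 1 << 1  # protocol recognized as fast-path eligible

def summarize(header: bytes, protocol: str) -> int:
    """Build the status word that later stages consult instead of re-parsing."""
    status = 0
    if internet_checksum(header) == 0:    # a correct header sums to zero
        status |= VALID
    if protocol in ("TCP/IP", "TTCP/IP", "SPX/IPX"):
        status |= FAST_CANDIDATE
    return status

# A header whose stored checksum field makes the whole header sum to zero:
hdr = struct.pack("!HH", 0x1234, 0xFFFF - 0x1234)
assert summarize(hdr, "TCP/IP") == VALID | FAST_CANDIDATE
assert summarize(hdr, "UDP/IP") == VALID
```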
`
All received message frames which have been determined by the CPD hardware assist to be fast-path candidates are examined 53 by the network microprocessor or INIC comparator circuits to determine whether they match a CCB held by the CPD. Upon confirming such a match, the CPD removes lower layer headers and sends 69 the remaining application data from the frame directly into its final destination in the host using direct memory access (DMA) units of the CPD. This operation may occur immediately upon receipt of a message packet, for example when a TCP connection already exists and destination buffers have been negotiated, or it may first be necessary to process an initial header to acquire a new set of final destination addresses for this transfer. In this latter case, the CPD will queue subsequent message packets while waiting for the destination address, and then DMA the queued application data to that destination.
`
A fast-path candidate that does not match a CCB may be used to set up a new fast-path connection, by sending 65 the frame to the host for sequential protocol processing. In this case, the host uses this frame to create 51 a CCB, which is then passed to the CPD to control subsequent frames on that connection. The CCB, which is cached 67 in the CPD, includes control and state information pertinent to all protocols that would have been processed had conventional software layer processing been employed. The CCB also contains storage space for per-transfer information used to facilitate moving application-level data contained within subsequent related message packets directly to a host application in a form available for immediate usage. The CPD takes command of connection processing upon receiving a CCB for that connection from the host.
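The kinds of fields a CCB might carry can be sketched as a record. The field names below are invented for illustration; the document specifies only that the block holds control and state information for every protocol layer plus per-transfer storage such as the destination for incoming data.

```python
# Illustrative shape of a communication control block (CCB). Field names
# are invented; the document requires only per-layer state plus space for
# per-transfer information such as the data's final destination.
from dataclasses import dataclass

@dataclass
class CCB:
    protocol: str                 # e.g. "TCP/IP" or "SPX/IPX"
    src_mac: str                  # data link layer addresses
    dst_mac: str
    src_ip: str                   # network layer addresses
    dst_ip: str
    src_port: int                 # transport layer identifiers
    dst_port: int
    next_expected_seq: int = 0    # combined transport-level state
    destination_buffer: int = 0   # per-transfer: where DMA should place data
    bytes_remaining: int = 0      # per-transfer: how much of the message is left

ccb = CCB("TCP/IP", "aa:bb", "cc:dd", "10.0.0.1", "10.0.0.2", 1025, 139)
assert ccb.next_expected_seq == 0
```

Because all of this lives in one record, a single lookup gives the fast-path everything the separate layers would otherwise each have had to rediscover.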
`
As shown more specifically in FIG. 4A, when a message packet is received from the remote host 22 via network 25, the packet enters hardware receive logic 32 of the CPD 30, which checksums headers and data, and parses the headers, creating a word or words which identify the message packet and status, storing the headers, data and word temporarily in memory 60. As well as validating the packet, the receive logic 32 indicates with the word whether this packet is a candidate for fast-path processing. FIG. 4A depicts the case in which the packet is not a fast-path candidate, in which case the CPD 30 sends the validated headers and data from memory 60 to data link layer 36 along an internal bus for processing by the host CPU, as shown by arrow 56. The packet is processed by the host protocol stack 44 of data link 36, network 38, transport 40 and session 42 layers, and data (D) 63 from the packet may then be sent to storage 35, as shown by arrow 65.
`
FIG. 4B depicts the case in which the receive logic 32 of the CPD determines that a message packet is a candidate for fast-path processing, for example by deriving from the packet's headers that the packet belongs to a TCP/IP, TTCP/IP or SPX/IPX message. A processor 55 in the CPD 30 then checks to see whether the word that summarizes the fast-path candidate matches a CCB held in a cache 62. Upon finding no match for this packet, the CPD sends the validated packet from memory 60 to the host protocol stack 44 for processing. Host stack 44 may use this packet to create a connection context for the message, including finding and reserving a destination for data from the message associated with the packet, the context taking the form of a CCB. The present embodiment employs a single specialized host stack 44 for processing both fast-path and non-fast-path candidates, while in an embodiment described below fast-path candidates are processed by a different host stack than non-fast-path candidates. Some data (D1) 66 from that initial packet may optionally be sent to the destination in storage 35, as shown by arrow 68. The CCB is then sent to the CPD 30 to be saved in cache 62, as shown by arrow 64. For a traditional connection-based message such as typified by TCP/IP, the initial packet may be part of a connection initialization dialogue that transpires between hosts before the CCB is created and passed to the CPD 30.
`
Referring now to FIG. 4C, when a subsequent packet from the same connection as the initial packet is received from the network 25 by CPD 30, the packet headers and data are validated by the receive logic 32, and the headers are parsed to create a summary of the message packet and a hash for finding a corresponding CCB, the summary and hash contained in a word or words. The word or words are temporarily stored in memory 60 along with the packet. The processor 55 checks for a match between the hash and each CCB that is stored in the cache 62 and, finding a match, sends the data (D2) 70 via a fast-path directly to the destination in storage 35, as shown by arrow 72, bypassing the session layer 42, transport layer 40, network layer 38 and data link layer 36. The remaining data packets from the message can also be sent by DMA directly to storage, avoiding the relatively slow protocol layer processing and repeated copying by the CPU stack 44.
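The hash-and-match step can be sketched like this. The hash function, bucket layout and four-tuple key below are invented for illustration; the document says only that a hash derived from the parsed headers is compared against the cached CCBs, with the full match confirmed before the fast-path is taken.

```python
# Sketch of matching a received frame to a cached CCB by hashing the
# connection four-tuple parsed from its headers. The hash choice and
# cache layout are invented for illustration.
def connection_hash(src_ip, src_port, dst_ip, dst_port, buckets=256):
    return hash((src_ip, src_port, dst_ip, dst_port)) % buckets

class CCBCache:
    def __init__(self):
        self.buckets = {}   # hash value -> list of (four-tuple, ccb)

    def insert(self, four_tuple, ccb):
        h = connection_hash(*four_tuple)
        self.buckets.setdefault(h, []).append((four_tuple, ccb))

    def lookup(self, four_tuple):
        h = connection_hash(*four_tuple)
        for candidate, ccb in self.buckets.get(h, []):
            if candidate == four_tuple:   # the hash is only a hint; confirm fully
                return ccb
        return None                        # no CCB: the frame takes the slow path

cache = CCBCache()
tup = ("10.0.0.1", 1025, "10.0.0.2", 139)
cache.insert(tup, {"destination": "storage 35"})
assert cache.lookup(tup) == {"destination": "storage 35"}
assert cache.lookup(("10.0.0.1", 9999, "10.0.0.2", 139)) is None
```

A hit yields the destination negotiated earlier, so the payload can be moved there directly by DMA without consulting any layer code.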
`
FIG. 4D shows the procedure for handling the rare instance when a message for which a fast-path connection has been established, such as shown in FIG. 4C, has a packet that is not easily handled by the CPD. In this case the packet is sent to be processed by the protocol stack 44, which is handed the CCB for that message from cache 62 via a control dialogue with the CPD, as shown by arrow 76, signaling to the CPU to take over processing of that message. Slow-path processing by the protocol stack then results in data (D3) 80 from the packet being sent, as shown by arrow 82, to storage 35. Once the packet has been processed and the error situation corrected, the CCB can be handed back via a control dialogue to the cache 62, so that payload data from subsequent packets of that message can again be sent via the fast-path of the CPD 30. Thus the CPU and CPD together decide whether a given message is to be processed according to fast-path hardware processing or more conventional software processing by the CPU.
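The flush-and-handback sequence just described can be caricatured as follows. This is a toy model with invented names; the real control dialogue takes place between the CPD hardware and the host driver rather than between two Python functions.

```python
# Toy model of the FIG. 4D fallback: on an exception the CPD flushes the
# CCB (and the problem frame) to the host stack, and the host hands the
# CCB back once the exception is resolved. All names are invented.
class FastPathOwner:
    CPD, HOST = "CPD", "host stack"

def flush_to_host(cache, conn_id, frame, host_queue):
    """CPD side: give up ownership of an exceptional connection."""
    ccb = cache.pop(conn_id)            # the CCB leaves the CPD cache
    host_queue.append((ccb, frame))     # the frame rides along for slow-path work
    ccb["owner"] = FastPathOwner.HOST
    return ccb

def hand_back(cache, conn_id, ccb):
    """Host side: resume fast-path processing after the exception."""
    ccb["owner"] = FastPathOwner.CPD
    cache[conn_id] = ccb

cache = {42: {"owner": FastPathOwner.CPD}}
host_queue = []
ccb = flush_to_host(cache, 42, b"bad frame", host_queue)
assert 42 not in cache and ccb["owner"] == FastPathOwner.HOST
hand_back(cache, 42, ccb)
assert cache[42]["owner"] == FastPathOwner.CPD
```

The point of the design is that exactly one side owns the connection state at any moment, so the single state machine never has to model the exceptional cases itself.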
`
Transmission of a message from the host 20 to the network 25 for delivery to remote host 22 also can be processed by either sequential protocol software processing via the CPU or accelerated hardware processing via the CPD 30, as shown in FIG. 5. A message (M) 90 that is selected by CPU 28 from storage 35 can be sent to session layer 42 for processing by stack 44, as shown by arrows 92 and 96. For the situation in which a connection exists and the CPD 30 already has an appropriate CCB for the message, however, data packets can bypass host stack 44 and be sent by DMA directly to memory 60, with the processor 55 adding to each data packet a single header containing all the appropriate protocol layers, and sending the resulting packets to the network 25 for transmission to remote host 22. This fast-path transmission can greatly accelerate processing for even a single packet, with the acceleration multiplied for a larger message.
`
A message for which a fast-path connection is not extant thus may benefit from creation of a CCB with appropriate control and state information for guiding fast-path transmission. For a traditional connection-based message, such as typified by TCP/IP or SPX/IPX, the CCB is created during connection initialization dialogue. For a quick-connection message, such as typified by TTCP/IP, the CCB can be created with the same transaction that transmits payload data. In this case, the transmission of payload data may be a reply to a request that was used to set up the fast-path connection. In any case, the CCB provides protocol and status information regarding each of the protocol layers, including which user is involved and storage space for per-transfer information. The CCB is created by protocol stack 44, which then passes the CCB to the CPD 30 by writing to a command register of the CPD, as shown by arrow 98. Guided by the CCB, the processor 55 moves network frame-sized portions of the data from the source in host memory 35 into its own memory 60 using DMA, as depicted by arrow 99. The processor 55 then prepends appropriate headers and checksums to the data portions, and transmits the resulting frames to the network 25, consistent with the restrictions of the associated protocols. After the CPD 30 has received an acknowledgement that all the data has reached its destination, the CPD will then notify the host by writing to a response buffer.
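The CCB-guided transmit loop can be sketched as below. The frame size and the text-tag header are invented for illustration; the point is that each frame-sized portion receives one prepended header built from the CCB in a single step, rather than being handed down through each layer separately.

```python
# Sketch of fast-path transmission: split the message into frame-sized
# portions and prepend one combined header to each, filled in from the
# CCB rather than by per-layer processing. Header format is invented.
def fast_path_transmit(data: bytes, ccb: dict, frame_payload: int = 1460):
    frames = []
    for offset in range(0, len(data), frame_payload):
        portion = data[offset:offset + frame_payload]
        # One header carrying all layers' fields, taken from the CCB:
        header = (f"{ccb['dst_mac']}|{ccb['dst_ip']}|{ccb['dst_port']}|"
                  f"seq={ccb['seq']}").encode()
        frames.append(header + b"|" + portion)
        ccb["seq"] += len(portion)       # update the collapsed connection state
    return frames

ccb = {"dst_mac": "cc:dd", "dst_ip": "10.0.0.2", "dst_port": 139, "seq": 0}
frames = fast_path_transmit(b"x" * 3000, ccb)
assert len(frames) == 3                  # 1460 + 1460 + 80 bytes of payload
assert ccb["seq"] == 3000
```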
`
Thus, fast-path transmission of data communications also relieves the host CPU of per-frame processing. A vast majority of data transmissions can be sent to the network by the fast-path. Both the input and output fast-paths attain a huge reduction in interrupts by functioning at an upper layer level, i.e., session level or higher,