UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office

November 03, 2004

THIS IS TO CERTIFY THAT ANNEXED HERETO IS A TRUE COPY FROM
THE RECORDS OF THE UNITED STATES PATENT AND TRADEMARK
OFFICE OF THOSE PAPERS OF THE BELOW IDENTIFIED PATENT
APPLICATION THAT MET THE REQUIREMENTS TO BE GRANTED A
FILING DATE UNDER 35 USC 111.

APPLICATION NUMBER: 60/061,809
FILING DATE: October 14, 1997

By Authority of the
COMMISSIONER OF PATENTS AND TRADEMARKS

Certifying Officer

ALA00138383

PTO/SB/16 (11/95) (Modified)
Approved for use through 01/31/98. OMB 0651-0037
Patent and Trademark Office; U.S. DEPARTMENT OF COMMERCE

PROVISIONAL APPLICATION FOR PATENT COVER SHEET
(Large Entity)

This is a request for filing a PROVISIONAL APPLICATION FOR PATENT under 37 CFR 1.53(b)(2).

Docket Number: ALA-001
`
INVENTOR(s)/APPLICANT(s)

LAST NAME    FIRST NAME   MIDDLE INITIAL   RESIDENCE (CITY AND EITHER STATE OR FOREIGN COUNTRY)
Boucher      Laurence     B.               Saratoga, California
Blightman    Stephen      E.J.             San Jose, California
Craft        Peter        K.               San Francisco, California
Higgin       David        A.               Saratoga, California

TITLE OF THE INVENTION (280 characters max):
INTELLIGENT NETWORK INTERFACE CARD AND SYSTEM FOR PROTOCOL PROCESSING
`
CORRESPONDENCE ADDRESS
Mark Lauer
6850 Regional Street, Suite 250
Dublin    STATE: CA    ZIP CODE: 94568    COUNTRY: USA
Tel: (510) 556-3500
Fax: (510) 803-8189
`
ENCLOSED APPLICATION PARTS (check all that apply)
[X] Specification — Number of Pages: 130
[X] Drawing(s)
[X] Other (specify): Drawings are included within specification
`
METHOD OF PAYMENT OF FILING FEES FOR THIS PROVISIONAL APPLICATION FOR PATENT (check one)
[X] A check or money order is enclosed to cover the filing fees
[ ] The Commissioner is hereby authorized to charge filing fees and credit Deposit Account Number

FILING FEE AMOUNT: $150.00

The invention was made by an agency of the United States Government or under a contract with an agency of the United States Government.
[X] No.
[ ] Yes, the name of the U.S. Government agency and the Government contract number:
`
Respectfully submitted,

SIGNATURE                                   Date: October 14, 1997

TYPED or PRINTED NAME: Mark Lauer           REGISTRATION NO. (if appropriate): 36,578

[X] Additional inventors are being named on separately numbered sheets attached hereto

USE ONLY FOR FILING A PROVISIONAL APPLICATION FOR PATENT
SEND TO: Box Provisional Application, Assistant Commissioner for Patents, Washington, DC 20231

PROVISIONAL APPLICATION FOR PATENT COVER SHEET
(Large Entity)

INVENTOR(s)/APPLICANT(s)

LAST NAME    FIRST NAME   MIDDLE INITIAL   RESIDENCE (CITY AND EITHER STATE OR FOREIGN COUNTRY)
Philbrick    Clive        M.               San Jose, California
Starr        Daryl        D.               Milpitas, California

USE ONLY FOR FILING A PROVISIONAL APPLICATION FOR PATENT
SEND TO: Box Provisional Application, Assistant Commissioner for Patents, Washington, DC 20231

(Page 2 of 2)

CERTIFICATE OF MAILING BY "EXPRESS MAIL" (37 CFR 1.10)

Applicant(s): Laurence B. Boucher et al.
Docket No.: ALA-001
Serial No.:
Filing Date:
Examiner:
Group Art Unit:

Invention: INTELLIGENT NETWORK INTERFACE CARD AND SYSTEM FOR PROTOCOL PROCESSING

I hereby certify that this PROVISIONAL PATENT APPLICATION, COVER SHEET & CHECK FOR $150.00
(Identify type of correspondence)

is being deposited with the United States Postal Service "Express Mail Post Office to Addressee" service under
37 CFR 1.10 in an envelope addressed to: The Assistant Commissioner for Patents, Washington, D.C. 20231 on
October 14, 1997
(Date)

Mark Lauer
(Typed or Printed Name of Person Mailing Correspondence)

(Signature of Person Mailing Correspondence)

EH756230105US

Note: Each paper must have its own certificate of mailing.

INTELLIGENT NETWORK INTERFACE CARD
AND SYSTEM FOR PROTOCOL PROCESSING

Provisional Patent Application Under 35 U.S.C. § 111(b)

Inventors:

Laurence B. Boucher
Stephen E. J. Blightman
Peter K. Craft
David A. Higgin
Clive M. Philbrick
Daryl D. Starr

Assignee:

Alacritech Corporation
`
1 Background of the Invention

Network processing as it exists today is a costly and inefficient use of system resources.
A 200 MHz Pentium-Pro is typically consumed simply processing network data from a
100Mb/second network connection. The reasons that this processing is so costly are
described here.
`
1.1 Too Many Data Moves

When a network packet arrives at a typical network interface card (NIC), the NIC moves
the data into pre-allocated network buffers in system main memory. From there the data
is read into the CPU cache so that it can be checksummed (assuming of course that the
protocol in use requires checksums. Some, like IPX, do not.). Once the data has been
fully processed by the protocol stack, it can then be moved into its final destination in
memory. Since the CPU is moving the data, and must read the destination cache line in
before it can fill it and write it back out, this involves at a minimum 2 more trips across
the system memory bus. In short, the best one can hope for is that the data will get
moved across the system memory bus 4 times before it arrives in its final destination. It
can, and does, get worse. If the data happens to get invalidated from system cache after it
has been checksummed, then it must get pulled back across the memory bus before it can
be moved to its final destination. Finally, on some systems, including Windows NT 4.0,
the data gets copied yet another time while being moved up the protocol stack. In NT
4.0, this occurs between the miniport driver interface and the protocol driver interface.
This can add up to a whopping 8 trips across the system memory bus (the 4 trips
described above, plus the move to replenish the cache, plus 3 more to copy from the
miniport to the protocol driver). That's enough to bring even today's advanced memory
busses to their knees.
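The bus-crossing tally above can be sketched as a small piece of illustrative arithmetic (not part of any driver; the constants simply mirror the best and worst cases described in the text):

```c
/* Illustrative tally of system-memory-bus crossings per received frame.
 * Best case is 4 trips; cache invalidation and the NT 4.0
 * miniport-to-protocol-driver copy raise it to 8. */
int bus_trips(int cache_invalidated, int nt_miniport_copy)
{
    int trips = 0;
    trips += 1;                 /* NIC DMAs the frame into network buffers */
    trips += 1;                 /* CPU reads the data into cache to checksum it */
    trips += 2;                 /* read destination cache line in, write it back out */
    if (cache_invalidated)
        trips += 1;             /* data pulled back across the bus after eviction */
    if (nt_miniport_copy)
        trips += 3;             /* extra copy while moving up the NT protocol stack */
    return trips;
}
```

With neither penalty, the count is the 4 trips of the best case; with both, the whopping 8.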
`
Provisional Pat. App. of Alacritech, Inc.
Inventors Laurence B. Boucher et al.
Express Mail Label # EH756230105US

1.2 Too Much Processing by the CPU

In all but the original move from the NIC to system memory, the system CPU is
responsible for moving the data. This is particularly expensive because while the CPU is
moving this data it can do nothing else. While moving the data the CPU is typically
stalled waiting for the relatively slow memory to satisfy its read and write requests. A
CPU, which can execute an instruction every 5 nanoseconds, must now wait as long as
several hundred nanoseconds for the memory controller to respond before it can begin its
next instruction. Even today's advanced pipelining technology doesn't help in these
situations because that relies on the CPU being able to do useful work while it waits for
the memory controller to respond. If the only thing the CPU has to look forward to for
the next several hundred instructions is more data moves, then the CPU ultimately gets
reduced to the speed of the memory controller.
`
Moving all this data with the CPU slows the system down even after the data has been
moved. Since both the source and destination cache lines must be pulled into the CPU
cache when the data is moved, more than 3k of instructions and/or data resident in the
CPU cache must be flushed or invalidated for every 1500 byte frame. This is of course
assuming a combined instruction and data second level cache, as is the case with the
Pentium processors. After the data has been moved, the former resident of the cache will
likely need to be pulled back in, stalling the CPU even when we are not performing
network processing. Ideally a system would never have to bring network frames into the
CPU cache, instead reserving that precious commodity for instructions and data that are
referenced repeatedly and frequently.
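The "more than 3k per frame" figure follows from the source and destination lines both landing in the cache. A minimal sketch, assuming 32-byte cache lines for illustration (Pentium-class line sizes varied):

```c
/* Rough cache footprint of a CPU copy of one Ethernet frame: both the
 * source lines and the destination lines are pulled into the cache, so
 * the displaced resident data is about twice the frame size, rounded up
 * to whole cache lines. Line size here is an assumption for illustration. */
int cache_bytes_displaced(int frame_bytes, int line_size)
{
    int lines = (frame_bytes + line_size - 1) / line_size; /* lines per direction */
    return 2 * lines * line_size;                          /* source + destination */
}
```

For a 1500-byte frame and 32-byte lines this comes to just over 3000 bytes, matching the figure above.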
`
But the data movement is not the only drain on the CPU. There is also a fair amount of
processing that must be done by the protocol stack software. The most obvious expense
is calculating the checksum for each TCP segment (or UDP datagram). Beyond this,
however, there is other processing to be done as well. The TCP connection object must
be located when a given TCP segment arrives, IP header checksums must be calculated,
there are buffer and memory management issues, and finally there is also the significant
expense of interrupt processing, which we will discuss in the following section.
`
1.3 Too Many Interrupts

A 64k SMB request (write or read-reply) is typically made up of 44 TCP segments when
running over Ethernet (1500 byte MTU). Each of these segments may result in an
interrupt to the CPU. Furthermore, since TCP must acknowledge all of this incoming
data, it's possible to get another 44 transmit-complete interrupts as a result of sending out
the TCP acknowledgements. While this is possible, it is not terribly likely. Delayed
ACK timers allow us to acknowledge more than one segment at a time. And delays in
interrupt processing may mean that we are able to process more than one incoming
network frame per interrupt. Nevertheless, even if we assume 4 incoming frames per
interrupt, and an acknowledgement for every 2 segments (as is typical per the
ACK-every-other-segment property of TCP), we are still left with 33 interrupts per 64k SMB request.
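The 33-interrupt estimate above can be reproduced with the stated assumptions (44 segments, 4 coalesced frames per receive interrupt, one ACK, and hence one transmit-complete interrupt, per 2 segments):

```c
/* Interrupt estimate for one 64k SMB request, using the assumptions in
 * the text: receive interrupts are coalesced across several frames, and
 * each ACK sent costs one transmit-complete interrupt. */
int smb_interrupts(int segments, int frames_per_intr, int segs_per_ack)
{
    int rx = (segments + frames_per_intr - 1) / frames_per_intr; /* receive side */
    int tx = segments / segs_per_ack;                            /* ACK completions */
    return rx + tx;
}
```

With 44 segments this gives 11 receive interrupts plus 22 ACK-completion interrupts, the 33 interrupts quoted above.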
`
Interrupts tend to be very costly to the system. Often when a system is interrupted,
important information must be flushed or invalidated from the system cache so that the
interrupt routine instructions, and needed data can be pulled into the cache. Since the
CPU will return to its prior location after the interrupt, it is likely that the information
flushed from the cache will immediately need to be pulled back into the cache.
`
What's more, interrupts force a pipeline flush in today's advanced processors. While the
processor pipeline is an extremely efficient way of improving CPU performance, it can
be expensive to get going after it has been flushed.
`
Finally, each of these interrupts results in expensive register accesses across the
peripheral bus (PCI). This is discussed more in the following section.
`
1.4 Inefficient Use of the Peripheral Bus (PCI)
`
We noted earlier that when the CPU has to access system memory, it may be stalled for
several hundred nanoseconds. When it has to read from PCI, it may be stalled for many
microseconds. This happens every time the CPU takes an interrupt from a standard NIC.
The first thing the CPU must do when it receives one of these interrupts is to read the
NIC Interrupt Status Register (ISR) from PCI to determine the cause of the interrupt. The
most troubling thing about this is that since interrupt lines are shared on PC-based
systems, we may have to perform this expensive PCI read even when the interrupt is not
meant for us!
`
There are other peripheral bus inefficiencies as well. Typical NICs operate using
descriptor rings. When a frame arrives, the NIC reads a receive descriptor from system
memory to determine where to place the data. Once the data has been moved to main
memory, the descriptor is then written back out to system memory with status about the
received frame. Transmit operates in a similar fashion. The CPU must notify the NIC
that it has a new transmit. The NIC will read the descriptor to locate the data, read the
data itself, and then write the descriptor back with status about the send. Typically on
transmits the NIC will then read the next expected descriptor to see if any more data
needs to be sent. In short, each receive or transmit frame results in 3 or 4 separate PCI
reads or writes (not counting the status register read).
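The descriptor-ring scheme above can be sketched as follows; the structure layout and field names are illustrative, not taken from any particular device:

```c
/* Sketch of a conventional NIC descriptor (illustrative layout). Each
 * frame costs the NIC several PCI operations against descriptors and
 * data in host memory, as tallied below. */
struct nic_descriptor {
    unsigned int   buffer_addr; /* host address of the frame buffer */
    unsigned short length;      /* byte count, filled in on completion */
    unsigned short status;      /* ownership bit plus completion status */
};

/* PCI operations per frame as described in the text, excluding the
 * interrupt status register read. */
int pci_ops_per_frame(int is_transmit)
{
    if (is_transmit)
        return 4;  /* read descriptor, read data, write status, read next descriptor */
    return 3;      /* read descriptor, write (DMA) data, write status */
}
```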
`
2 Summary of the Invention

Alacritech was formed with the idea that the network processing described above could
be offloaded onto a cost-effective Intelligent Network Interface Card (INIC). With the
Alacritech INIC, we address each of the above problems, resulting in the following
advancements:
1. The vast majority of the data is moved directly from the INIC into its final
destination. A single trip across the system memory bus.
2. There is no header processing, little data copying, and no checksumming required by
the CPU. Because of this, the data is never moved into the CPU cache, allowing the
system to keep important instructions and data resident in the CPU cache.
3. Interrupts are reduced to as little as 4 interrupts per 64k SMB read and 2 per 64k
SMB write.
4. There are no CPU reads over PCI and there are fewer PCI operations per receive or
transmit transaction.

In the remainder of this document we will describe how we accomplish the above.
`
2.1 Perform Transport Level Processing on the INIC

In order to keep the system CPU from having to process the packet headers or checksum
the packet, we must perform this task on the INIC. This is a daunting task. There are
more than 20,000 lines of C code that make up the FreeBSD TCP/IP protocol stack.
Clearly this is more code than could be efficiently handled by a competitively priced
network card. Furthermore, as we've noted above, the TCP/IP protocol stack is
complicated enough to consume a 200 MHz Pentium-Pro. Clearly in order to perform
this function on an inexpensive card, we need special network processing hardware as
opposed to simply using a general purpose CPU.
`
2.1.1 Only Support TCP/IP

In this section we introduce the notion of a "context". A context is required to keep track
of information that spans many, possibly discontiguous, pieces of information. When
processing TCP/IP data, there are actually two contexts that must be maintained. The
first context is required to reassemble IP fragments. It holds information about the status
of the IP reassembly as well as any checksum information being calculated across the IP
datagram (UDP or TCP). This context is identified by the IP_ID of the datagram as well
as the source and destination IP addresses. The second context is required to handle the
sliding window protocol of TCP. It holds information about which segments have been
sent or received, and which segments have been acknowledged, and is identified by the
IP source and destination addresses and TCP source and destination ports.
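The two contexts above might be sketched as the following structures; the field names are illustrative, not the actual INIC state layout:

```c
#include <stdint.h>

/* First context: IP reassembly, keyed by source/destination address and
 * the datagram's IP_ID (fields illustrative). */
struct ip_reassembly_ctx {
    uint32_t src_ip, dst_ip;
    uint16_t ip_id;            /* identifies the datagram being reassembled */
    uint32_t checksum_state;   /* running checksum across the fragments */
};

/* Second context: TCP sliding-window state, keyed by the full four-tuple. */
struct tcp_ctx {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint32_t snd_nxt, rcv_nxt; /* sent/received/acknowledged sequence state */
};

/* An incoming segment belongs to this context only if the whole
 * four-tuple matches. */
int tcp_ctx_matches(const struct tcp_ctx *c, uint32_t sip, uint32_t dip,
                    uint16_t sp, uint16_t dp)
{
    return c->src_ip == sip && c->dst_ip == dip &&
           c->src_port == sp && c->dst_port == dp;
}
```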
`
If we were to choose to handle both contexts in hardware, we would have to potentially
keep track of many pieces of information. One such example is a case in which a single
64k SMB write is broken down into 44 1500-byte TCP segments, which are in turn
broken down into 131 576-byte IP fragments, all of which can come in any order (though
the maximum window size is likely to restrict the number of outstanding segments
considerably).
`
Fortunately, TCP performs a Maximum Segment Size negotiation at connection
establishment time, which should prevent IP fragmentation in nearly all TCP
connections. The only time that we should end up with fragmented TCP connections is
when there is a router in the middle of a connection which must fragment the segments to
support a smaller MTU. The only networks that use a smaller MTU than Ethernet are
serial line interfaces such as SLIP and PPP. At the moment, the fastest of these
connections only run at 128k (ISDN) so even if we had 256 of these connections, we
would still only need to support 34Mb/sec, or a little over three 10bT connections' worth
of data. This is not enough to justify any performance enhancements that the INIC
offers. If this becomes an issue at some point, we may decide to implement the MTU
discovery algorithm, which should prevent TCP fragmentation on all connections (unless
an ICMP redirect changes the connection route while the connection is established).
`
With this in mind, it seems a worthy sacrifice to not attempt to handle fragmented TCP
segments on the INIC.
`
UDP is another matter. Since UDP does not support the notion of a Maximum Segment
Size, it is the responsibility of IP to break down a UDP datagram into MTU sized
packets. Thus, fragmented UDP datagrams are very common. The most common UDP
application running today is NFSV2 over UDP. While this is also the most common
version of NFS running today, the current version of Solaris being sold by Sun
Microsystems runs NFSV3 over TCP by default. We can expect to see the NFSV2/UDP
traffic start to decrease over the coming years.
`
In summary, we will only offer assistance to non-fragmented TCP connections on the
INIC.
`
2.1.2 Don't handle TCP "exceptions"

As noted above, we won't provide support for fragmented TCP segments on the INIC.
We have also opted to not handle TCP connection setup and breakdown. Here is a list of other
TCP "exceptions" which we have elected to not handle on the INIC:
`
Fragmented Segments - Discussed above.

Retransmission Timeout - Occurs when we do not get an acknowledgement for
previously sent data within the expected time period.

Out of order segments - Occurs when we receive a segment with a sequence number
other than the next expected sequence number.

FIN segment - Signals the close of the connection.
`
Since we have now eliminated support for so many different code paths, it might seem
hardly worth the trouble to provide any assistance by the card at all. This is not the case.
According to W. Richard Stevens and Gary Wright in their book "TCP/IP Illustrated
Volume 2", TCP operates without experiencing any exceptions between 97 and 100
percent of the time in local area networks. As network, router, and switch reliability
improve, this number is likely to only improve with time.
`
2.1.3 Two modes of operation

So the next question is what to do about the network packets that do not fit our criteria.
The answer is to use two modes of operation: One in which the network frames are
processed on the INIC through TCP and one in which the card operates like a typical
dumb NIC. We call these two modes fast-path and slow-path. In the slow-path case,
network frames are handed to the system at the MAC layer and passed up through the
host protocol stack like any other network frame. In the fast-path case, network data is
given to the host after the headers have been processed and stripped.
`
[Figure: protocol stack diagram. On the INIC (fast-path), frames traverse the PHYSICAL, MAC, IP, TCP, and NetBIOS layers on the card; on the client (slow-path), frames traverse MAC, IP, TCP, and TDI in the host stack. The INIC connects to the host over PCI and to the network over Ethernet.]
`
The transmit case works in much the same fashion. In slow-path mode the packets are
given to the INIC with all of the headers attached. The INIC simply sends these packets
out as if it were a dumb NIC. In fast-path mode, the host gives raw data to the INIC
which it must carve into MSS sized segments, add headers to the data, perform
checksums on the segment, and then send it out on the wire.
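The segment-carving arithmetic for fast-path transmit is a simple round-up; a minimal sketch (illustrative only, not the card's firmware):

```c
/* Number of wire segments the INIC must carve a raw fast-path send
 * into, given the connection's segment size. The last segment may be
 * short, hence the round-up. */
int tx_segments(int bytes, int segment_size)
{
    return (bytes + segment_size - 1) / segment_size;
}
```

Using the accounting from section 1.3 (a 64k request carried in 1500-byte Ethernet frames), this yields the 44 segments quoted there.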
`
2.1.4 The TCB cache

Consider a situation in which a TCP connection is being handled by the card and a
fragmented TCP segment for that connection arrives. In this situation, it will be
necessary for the card to turn control of this connection over to the host.

This introduces the notion of a Transmit Control Block (TCB) cache. A TCB is a
structure that contains the entire context associated with a connection. This includes the
source and destination IP addresses and source and destination TCP ports that define the
connection. It also contains information about the connection itself such as the current
send and receive sequence numbers, and the first-hop MAC address, etc. The complete
set of TCBs exists in host memory, but a subset of these may be "owned" by the card at
any given time. This subset is the TCB cache. The INIC can own up to 256 TCBs at any
given time.
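The ownership rule above can be sketched as follows; the names and the handout policy are illustrative assumptions, with only the 256-entry limit taken from the text:

```c
/* Sketch of the TCB cache ownership rule: the complete TCB set lives in
 * host memory, and at most INIC_TCB_CACHE of them may be owned by the
 * card at once. */
#define INIC_TCB_CACHE 256

struct tcb_table {
    int total;       /* TCBs resident in host memory */
    int card_owned;  /* subset currently cached on the INIC */
};

/* Try to hand a steady-state connection to the card. Returns 1 on
 * success (fast-path; the host must no longer touch that TCB), or 0 if
 * the cache is full and the connection stays in slow-path mode. */
int tcb_handout(struct tcb_table *t)
{
    if (t->card_owned >= INIC_TCB_CACHE)
        return 0;
    t->card_owned++;
    return 1;
}
```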
`
TCBs are initialized by the host during TCP connection setup. Once the connection has
achieved a "steady-state" of operation, its associated TCB can then be turned over to the
INIC, putting us into fast-path mode. From this point on, the INIC owns the connection
until either a FIN arrives signaling that the connection is being closed, or until an
exception occurs which the INIC is not designed to handle (such as an out of order
segment). When any of these conditions occur, the INIC will then flush the TCB back to
host memory, and issue a message to the host telling it that it has relinquished control of
the connection, thus putting the connection back into slow-path mode. From this point
on, the INIC simply hands incoming segments that are destined for this TCB off to the
host with all of the headers intact.

Note that when a connection is owned by the INIC, the host is not allowed to reference
the corresponding TCB in host memory as it will contain invalid information about the
state of the connection.
`
2.1.5 TCP hardware assistance

When a frame is received by the INIC, it must verify it completely before it even
determines whether it belongs to one of its TCBs or not. This includes all header
validation (is it IP, IPv4 or v6, is the IP header checksum correct, is the TCP checksum
correct, etc). Once this is done it must compare the source and destination IP address and
the source and destination TCP port with those in each of its TCBs to determine if it is
associated with one of its TCBs. This is an expensive process. To expedite this, we have
added several features in hardware to assist us. The header is fully parsed by hardware
and its type is summarized in a single status word. The checksum is also verified
automatically in hardware, and a hash key is created out of the IP addresses and TCP
ports to expedite TCB lookup. For full details on these and other hardware optimizations,
refer to the INIC Hardware Specification sections (Heading 8).
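A hash over the four-tuple of the kind described above might look like the following; the particular mixing function is an illustrative assumption, since the actual hardware hash is defined in the INIC Hardware Specification:

```c
#include <stdint.h>

/* Illustrative four-tuple hash used to expedite TCB lookup: instead of
 * comparing an incoming segment's addresses and ports against every
 * cached TCB, the hash selects a slot in the 256-entry TCB cache, and
 * only that candidate's four-tuple needs a full compare. */
uint8_t tcb_hash(uint32_t src_ip, uint32_t dst_ip,
                 uint16_t src_port, uint16_t dst_port)
{
    uint32_t h = src_ip ^ dst_ip ^ (((uint32_t)src_port << 16) | dst_port);
    h ^= h >> 16;               /* fold the high bits down */
    return (uint8_t)(h & 0xFF); /* index into 256 cache slots */
}
```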
`
`
With the aid of these and other hardware features, much of the work associated with TCP
is done essentially for free. Since the card will automatically calculate the checksum for
TCP segments, we can pass this on to the host, even when the segment is for a TCB that
the INIC does not own.
`
2.1.6 TCP Summary

By moving TCP processing down to the INIC we have offloaded the host of a large
amount of work. The host no longer has to pull the data into its cache to calculate the
TCP checksum. It does not have to process the packet headers, and it does not have to
generate TCP ACKs. We have achieved most of the goals outlined above, but we are not
done yet.
`
2.2 Transport Layer Interface

This section defines the INIC's relation to the host's transport layer interface (called TDI
or Transport Driver Interface in Windows NT). For full details on this interface, refer to
the Alacritech TCP (ATCP) driver specification (Heading 4).
`
2.2.1 Receive

Simply implementing TCP on the INIC does not allow us to achieve our goal of landing
the data in its final destination. Somehow the host has to tell the INIC where to put the
data. This is a problem in that the host can not do this without knowing what the data
`
actually is. Fortunately, NT has provided a mechanism by which a transport driver can
"indicate" a small amount of data to a client above it while telling it that it has more data
to come. The client, having then received enough of the data to know what it is, is then
responsible for allocating a block of memory and passing the memory address or
addresses back down to the transport driver, which is in turn responsible for moving the
data into the provided location.
`
We will make use of this feature by providing a small amount of any received data to the
host, with a notification that we have more data pending. When this small amount of data
is passed up to the client, and it returns with the address in which to put the remainder of
the data, our host transport driver will pass that address to the INIC which will DMA the
remainder of the data into its final destination.
`
Clearly there are circumstances in which this does not make sense. When a small amount
of data arrives (500 bytes for example) with a push flag set indicating that the data must be
delivered to the client immediately, it does not make sense to deliver some of the data
directly while waiting for the list of addresses to DMA the rest. Under these
circumstances, it makes more sense to deliver the 500 bytes directly to the host, and
allow the host to copy it into its final destination. While various ranges are feasible, it is
currently preferred that anything less than a segment's (1500 bytes) worth of data will be
delivered directly to the host, while anything more will be delivered as a small piece
which may be 128 bytes, while waiting until receiving the destination memory address
before moving the rest.
`
The trick then is knowing when the data should be delivered to the client or not. As
we've noted, a push flag indicates that the data should be delivered to the client
immediately, but this alone is not sufficient. Fortunately, in the case of NetBIOS
transactions (such as SMB), we are explicitly told the length of the session message in the
NetBIOS header itself. With this we can simply indicate a small amount of data to the
host immediately upon receiving the first segment. The client will then allocate enough
memory for the entire NetBIOS transaction, which we can then use to DMA the
remainder of the data into as it arrives. In the case of a large (56k for example) NetBIOS
session message, all but the first couple hundred bytes will be DMA'd to their final
destination in memory.
`
But what about applications that do not reside above NetBIOS? In this case we can not
rely on a session level protocol to tell us the length of the transaction. Under these
circumstances we will buffer the data as it arrives until A) we have received some
predetermined number of bytes such as 8k, or B) some predetermined period of time
passes between segments, or C) we get a push flag. After any of these conditions occurs,
we will then indicate some or all of the data to the host depending on the amount of data
buffered. If the data buffered is greater than about 1500 bytes we must then also wait for
the memory address to be returned from the host so that we may then DMA the
remainder of the data.
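The indication policy above can be condensed into one decision function. The 1500-byte, 8k, and 128-byte values are the ones the text suggests; the function itself is an illustrative sketch, not the INIC's actual logic:

```c
/* Receive-indication policy sketch. Returns the number of bytes to
 * indicate to the host now: 0 to keep buffering, the whole amount for a
 * small receive the host will copy itself, or a 128-byte piece when the
 * remainder will be DMA'd once the host returns a destination address. */
int indicate_bytes(int buffered, int push_flag, int timer_expired)
{
    /* conditions A (byte threshold), B (inter-segment timer), C (push flag) */
    int trigger = (buffered >= 8192) || timer_expired || push_flag;
    if (!trigger)
        return 0;              /* none of A, B, C has occurred yet */
    if (buffered < 1500)
        return buffered;       /* less than a segment: deliver it all directly */
    return 128;                /* indicate a small piece, DMA the rest later */
}
```

For example, 500 pushed bytes are delivered whole, while a 56k NetBIOS-less burst yields only a 128-byte indication.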
`
2.2.2 Transmit

The transmit case is much simpler. In this case the client (NetBIOS for example) issues a
TDI Send with a list of memory addresses which contain data that it wishes to send along
with the length. The host can then pass this list of addresses and length off to the INIC.
The INIC will then pull the data from its source location in host memory, as it needs it,
until the complete TDI request is satisfied.
`
2.2.3 Effect on interrupts

Note that when we receive a large SMB transaction, for example, there are two
interactions between the INIC and the host. The first in which the INIC indicates a small
amount of the transaction to the host, and the second in which the host provides the
memory location(s) in which the INIC places the remainder of the data. This results in
only two interrupts from the INIC. The first when it indicates the small amount of data
and the second after it has finished filling in the host memory given to it. A drastic
reduction from the 33 interrupts per 64k SMB request that we estimated at the beginning of
this section.
`
On transmit, we actually only receive a single interrupt when the send command that has
been given to the INIC completes.

2.2.4 Transport Layer Interface Summary

Having now established our interaction with Microsoft's TDI interface, we have achieved
our goal of landing most of our data directly into its final destination in host memory.
We have also managed to transmit all data from its original location in host memory.
And finally, we have reduced our interrupts to 2 per 64k SMB read and 1 per 64k SMB
write. The only thing that remains in our list of objectives is to design an efficient host
(PCI) interface.
`
2.3 Host (PCI) Interface

In this section we define the host interface. For a more detailed description, refer to the
"Host Interface Strategy for the Alacritech INIC" section (Heading 3).

2.3.1 Avoid PCI reads

One of our primary objectives in designing the host interface of the INIC was to
eliminate PCI reads in either direction. PCI reads are particularly inefficient in that they
completely stall the reader until the transaction completes. As we noted above, this could
hold a CPU up for several microseconds, a thousand times the time typically required to
execute a single instruction. PCI writes on the other hand, are usually buffered by the
memory-bus-to-PCI bridge, allowing the writer to continue on with other instructions.
This technique is known as "posting".
`
2.3.1.1 Memory-based status register

The only PCI read that is required by most NICs is the read of the interrupt status
register. This register gives the host CPU information about what event has caused an
interrupt (if any). In the design of our INIC we have elected to place this necessary status
register into host memory. Thus, when an event occurs on the INIC, it writes the status
register to an agreed upon location in host memory. The corresponding driver on the host
reads this local register to determine the cause of the interrupt. The interrupt lines are
`
held high until the host clears the interrupt by writing to the INIC's Interrupt Clear
Register. Shadow registers are maintained on the INIC to ensure that events are not lost.
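From the driver's side, the scheme above turns the ISR's expensive PCI read into a cheap host-memory read. A minimal sketch, with illustrative names (the real register layout is defined elsewhere in this document):

```c
/* Sketch of the memory-based status register: the INIC DMAs its event
 * bits into an agreed-upon host location, so the interrupt handler reads
 * host memory instead of stalling for microseconds on a PCI read. The
 * only PCI traffic is a posted write to the Interrupt Clear Register. */
unsigned int read_interrupt_cause(volatile const unsigned int *status_shadow)
{
    unsigned int cause = *status_shadow; /* host-memory read, no PCI stall */
    /* ...driver would now post a write to the INIC Interrupt Clear
     * Register to drop the interrupt line... */
    return cause;
}
```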
`
2.3.1.2 Buffer Addresses are pushed to the INIC

Since it is imperative that our INIC operate as efficiently as possible, we must also avoid
PCI reads from the INIC. We do this by pushing our receive buffer addresses to the
INIC. As mentioned at the beginning of this section, most NICs work on a descriptor
queue algorithm in which the NIC reads a descriptor from main memory in order to
determine where to place the next frame. We will instead write receive buffer addresses
to the INIC as receive buffers are filled. In order to avoid having to write to the INIC for
every receive frame, we instead allow the host to pass off a page's worth (4k) of buffers in
a single write.
`
2.3.2 Support small and large buffers on receive

In order to reduce further the number of writes to the INIC, and to reduce the amount of
memory being used by the host, we support two different buffer sizes. A small buffer
contains roughly 200 bytes of data payload, as well as extra fields containing status about
the received data, bringing the total size to 256 bytes. We can therefore pass 16 of these
small buffers at a time to the INIC. Large buffers are 2k in size. They are used to
contain any fast or slow-path data that does not fit in a small buffer. Note that when we
have a large fast-path receive, a small buffer will be used to indicate a small piece of the
data, while the remainder of the data will be DMA'd directly into memory. Large
buffers are never passed to the host by themselves, instead they are always accompanied
by a small buffer which contains status about the receive along with the large buffer
address. By operating in this manner, the driver must only maintain and process the small
buffer queue. Large buffers are returned to the host by virtue of being attached to small
buffers. Since large buffers are 2k in size they are passed to the INIC 2 buffers at a time.
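The batching arithmetic above falls straight out of the page size: a 4k page holds 16 of the 256-byte small buffers, or 2 of the 2k large buffers. An illustrative check:

```c
/* How many receive buffers of a given size the host can hand to the
 * INIC in a single write of one page's worth of addresses. */
int buffers_per_page(int page_bytes, int buffer_bytes)
{
    return page_bytes / buffer_bytes;
}
```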
