IEEE Global Telecommunications Conference
Phoenix, Arizona • December 2-5, 1991

"COUNTDOWN TO THE NEW MILLENNIUM"
Featuring a Mini-Theme on:
Personal Communications Services (PCS)
Conference Record

Vol. 3 of 3

VOLUME   DAY         SESSIONS   PAGES
1        Tuesday     1-20       1-694
2        Wednesday   21-42      695-1531
3        Thursday    43-60      1533-2150
Sponsored by the
IEEE Communications Society
and the Phoenix IEEE Section
Additional sets of Volumes 1, 2, and 3 may be ordered from:

IEEE Service Center
Publications Sales Department
445 Hoes Lane
P.O. Box 1331
Piscataway, New Jersey 08855-1331

IEEE GLOBECOM '91

IEEE Catalog No.: 91CH2980-1
ISBN Numbers: 0-87942-697-7 (Softbound)
              0-87942-698-5 (Casebound)
              0-87942-699-3 (Microfiche)
Library of Congress No.: 87-640337 (Serial)
COPYRIGHT AND REPRINT PERMISSIONS

Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 20 Congress St., Salem, Mass. 01970. Instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. For other copying, reprint or republication permission, write to Director, Publishing Services, IEEE, 345 East 47th St., New York, NY 10017. All rights reserved. Copyright © 1991 by the Institute of Electrical and Electronics Engineers, Inc.
AN OUTBOARD PROCESSOR FOR HIGH PERFORMANCE IMPLEMENTATION OF TRANSPORT LAYER PROTOCOLS

R. ANDREW MACLEAN and SCOTT E. BARVICK (1)

Bellcore, 444 Hoes Lane, Piscataway, NJ 08854
ABSTRACT: The high throughputs promised by emerging network technologies are often difficult to achieve application-to-application because of host transport protocol bottlenecks. This paper describes an experimental prototype implementation of an outboard protocol processor which eliminates these bottlenecks by performing transport layer functions in dedicated hardware. The architecture consists of separate transmit and receive CPUs, each with checksum and DMA circuits. Measurements made using an implementation of the TCP protocol indicate that this architecture can support end-to-end throughputs in excess of 11,000 packets/sec between UNIX (2) hosts.
1. INTRODUCTION

MEASUREMENTS of the end-to-end performance of LANs indicate that transport and network layer protocol implementations often limit throughput between communicating applications [1]. Several solutions have been proposed, the most common of which can be categorized as follows:
1) increase the size of the transport protocol data unit (TPDU),
2) optimize the protocol software,
3) change to a more efficient protocol, and
4) use a more powerful host processor.

The idea behind the first method is to increase the ratio of data bits to header bits so that the protocol processing required per data bit is reduced. For example, in the 2016-bit packets with 1728-bit payloads used in the measurements reported later in this paper, header and checksum fields account for roughly 14% of each packet; doubling the payload while keeping the same header would cut that overhead to about 7.7% and nearly halve the protocol processing per data bit. Depending upon the underlying network, however, this approach may lead to increased latency in the network, or the need for fragmentation and reassembly in the lower layers.
Optimization of the communications software for throughput has been discussed for the case of TCP/IP [2]. The methodology adopted is to change the existing implementation software to reduce the protocol processing overhead. Performance comparisons of optimized and non-optimized implementations of TCP/IP found in [1] indicate that significant performance increases can be realized using such techniques.

The TCP and OSI TP4 protocols have been designed primarily for robustness and utility rather than throughput. Several 'lightweight' transport layer protocols have been proposed which offer performance improvements: NETBLT [3], XTP [4], NACK [5], SNR [7], and VMTP [6] are examples of such protocols. With respect to the hardware, typically, the data link layer (to use OSI terminology) has been handled by some kind of host network adapter and the network layer and above has been the responsibility of the host processor. Thus there are potentially two areas where processing power can be added, the host or the communications adapter.

While the use of more powerful host processors is a common solution, there is a growing trend towards increasing the front end intelligence of the I/O. There are two primary reasons for this; firstly, the I/O subsystem often demands response times which cannot be guaranteed by the host processor or would cause the host to behave inefficiently or erratically. Secondly, it is often the case that certain compute intensive functions can be performed more quickly or more cost effectively by specialized hardware than by the host processor. Several outboard processor designs have been reported [7]-[12]. Our objective for this project has been to explore the outboard approach by designing an experimental prototype processor as a platform for analyzing transport layer protocols. We call this processor the Protocol Accelerator (PA), and it is described in the sections which follow. One of our ultimate aims is to explore different high speed protocols using this processor as a platform, in order to determine the most appropriate techniques for transport of data on high speed Metropolitan Area Networks.

1. Scott Barvick is now with Wellfleet Communications, 15 Crosby Drive, Bedford, MA 01730.
2. UNIX is a registered trademark of UNIX System Laboratories, Inc.
2. PROTOCOL ACCELERATOR

2.1 System Configuration
The Protocol Accelerator is a board on the VME system bus. Figure 1 shows how the PA integrates into the host system. On the network side, the PA is equipped with both input and output 32 bit parallel ports, each supporting data transfer rates in excess of 320 Mbits/sec. Intentionally, no media access circuitry has been included on the PA; this provides us with the capability to connect the Protocol Accelerator to a variety of network types via appropriate adapters, or, as in the case of loop-back testing, to leave out the network circuitry completely. In future communications subsystems, the transport protocol acceleration circuitry probably would physically reside on the network adapter card.

FIGURE 1. System Configuration
2.2 Functionality and Data Flow
The previously reported outboard processors can be categorized into single and multiple CPU architectures. The single CPU implementations [4, 10, 11] include peripheral support for high data throughputs. The multiprocessor implementations [7, 8, 9, 12] use up to eight CPUs, typically with no special peripheral devices. In our design, we exploit features from both of these approaches with a dual CPU design and special purpose peripheral circuitry in an architecture optimized for transport layer processing.

The internal functions and data flows of the protocol accelerator are shown in Figure 2. We use a dual CPU approach to protocol processing, with one CPU subsystem dedicated to the transmission, and the other to the reception. The transmit and receive CPUs are both 68020 (25 MHz) based, each with its own private resources: ROM, parallel I/O, interrupt circuitry and 128 kilobytes of random access memory (RAM). In addition there is 128 kilobytes of RAM shared by both CPUs which is also accessible to the two host busses, VME and VSB. Using both host busses simultaneously, it is possible for the PA to move data blocks both to and from host memory. The transmit and receive CPUs have VME bus master capability. All the data paths shown are 32 bits wide.
2.3 Operation
On transmit, data can be piped from one of three locations: host memory, shared memory, or transmit CPU memory into the network port, while the transmit CPU supervises transfers and compiles headers. No intermediate buffering of the application data takes place. We believe this is key to high speed operation.

On receive, parallel data from the network is pipelined to the host memory, local receive CPU memory, or shared memory. In normal operation, the receive CPU will DMA the header to local memory, perform initial processing to establish header integrity and payload destination, and then start a DMA process to transfer the data segment of the packet either to host memory or to the shared memory region. While storage of the payload is proceeding, the receive CPU completes its processing of the header information. Messages are passed between the transmit/receive CPUs and the host either by using the shared memory region or by using interrupt mechanisms which exist among all CPUs (host, transmit or receive).

The Protocol Accelerator enables a rapid data flow to and from the network by introducing a high degree of concurrency into the communication mechanism. Several activities execute simultaneously:
1) host processing (of higher layers and application),
2) transmit protocol processing,
3) receive protocol processing,
4) data transfer from host memory to the network adapter,
5) data transfer from the network adapter to host memory,
6) receive data checksumming,
7) transmit data checksumming, and
8) MAC frame processing.
2.4 Direct Memory Access
An important feature of the hardware architecture is the dual direct memory access controllers (DMACs). The DMACs are capable of moving 32 bit data words at rates of up to 33 Mbytes/sec over VME or 30 Mbytes/sec over the VSB bus directly to and from the network ports. Scatter-gather type operations are fully supported both to and from the application memory. Data paths also exist for DMA of data between the network ports and any other RAM area on the PA (i.e. CPU RAM space or shared RAM space) so that intermediate data buffering is possible whenever necessary.
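
To make the scatter-gather idea concrete, here is a minimal C sketch of a descriptor chain walked the way a gathering DMAC might walk it; the descriptor layout and field names are our assumptions for illustration and do not describe the PA's actual DMAC register interface.

    /* Hypothetical scatter-gather descriptor chain, in C for illustration
     * only; the real PA DMACs are hardware devices whose register layout
     * is not described in this paper. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    struct sg_descriptor {
        uint32_t             *buffer;   /* start of one contiguous fragment   */
        size_t                nwords;   /* fragment length in 32-bit words    */
        struct sg_descriptor *next;     /* next fragment, NULL ends the chain */
    };

    /* Walk the chain as a gathering DMAC would, one 32-bit word per cycle. */
    static size_t dma_gather(const struct sg_descriptor *d, uint32_t *dst)
    {
        size_t moved = 0;
        for (; d != NULL; d = d->next)
            for (size_t i = 0; i < d->nwords; i++)
                dst[moved++] = d->buffer[i];
        return moved;                   /* total words transferred */
    }

    int main(void)
    {
        uint32_t header[9] = {0}, payload[54] = {0}, packet[63];

        /* Two fragments gathered into one outgoing packet: a header, then a
         * 1728-bit (54-word) payload taken directly from application memory. */
        struct sg_descriptor pay = { payload, 54, NULL };
        struct sg_descriptor hdr = { header,   9, &pay };

        printf("gathered %zu words\n", dma_gather(&hdr, packet));
        return 0;
    }
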
2.5 Checksumming
The on-board CPUs are capable of checksumming data at a rate in the region of 75 Mbits/sec [13] but operation at this rate would leave no time for protocol processing.
FIGURE 2. Protocol Accelerator Functional Block Diagram
To maximize our data throughput, it was decided to include hardware checksumming in preference to using faster, more sophisticated processors for this function.

We use an 'on-the-fly' technique for the checksum and, in order to take advantage of this hardware, the checksum field is required to trail the fields being checked. In the case of TCP this requires that the checksum field be moved from its usual place in the header. Although hardware schemes can be devised which leave the checksum field in place, these are somewhat more difficult to implement, and because the intent with this design was to produce a testbed suitable for many protocols, the checksumming circuitry was not designed to be protocol specific.

The circuits utilize 32 bit ones complement adders and operate in tandem with the DMA controllers; every word moved by the DMA controller is simultaneously applied to the checksum circuits. On the transmit processor, the circuit automatically inserts a 32 bit checksum at the end of both the header and the data segments. On the receiver these fields are checked, again automatically.
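
A minimal software model of this on-the-fly behaviour is sketched below, assuming a straightforward 32 bit ones complement sum with end-around carry and a complemented trailing checksum word; the real adder circuits run in hardware alongside the DMA, and the exact form in which the checksum is stored is our assumption.

    /* Software model of the on-the-fly 32-bit ones-complement checksum:
     * every word "moved by DMA" is also folded into a running sum, and the
     * (complemented) sum is appended so that it trails the fields it covers. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Ones-complement add of one 32-bit word, with end-around carry. */
    static uint32_t ones_add32(uint32_t acc, uint32_t word)
    {
        uint64_t sum = (uint64_t)acc + word;
        return (uint32_t)((sum & 0xFFFFFFFFu) + (sum >> 32));
    }

    /* Transmit side: copy words out while checksumming them, then append
     * the checksum word after the region it protects. */
    static size_t emit_with_trailing_checksum(const uint32_t *src, size_t nwords,
                                              uint32_t *wire)
    {
        uint32_t acc = 0;
        for (size_t i = 0; i < nwords; i++) {
            wire[i] = src[i];
            acc = ones_add32(acc, src[i]);
        }
        wire[nwords] = ~acc;            /* trailing checksum (assumed form) */
        return nwords + 1;
    }

    /* Receive side: a correct region, including its trailing word, folds
     * to all ones. */
    static int trailing_checksum_ok(const uint32_t *wire, size_t nwords_plus_1)
    {
        uint32_t acc = 0;
        for (size_t i = 0; i < nwords_plus_1; i++)
            acc = ones_add32(acc, wire[i]);
        return acc == 0xFFFFFFFFu;
    }

    int main(void)
    {
        uint32_t header[9] = { 0x12345678u }, wire[10];
        size_t n = emit_with_trailing_checksum(header, 9, wire);
        printf("header region verifies: %d\n", trailing_checksum_ok(wire, n));
        return 0;
    }
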
3. TCP IMPLEMENTATION

3.1 Introduction
The transport protocol used to demonstrate the capabilities of the Protocol Accelerator is the Transmission Control Protocol (TCP). It was chosen because it is currently one of the most wide-spread transport protocols in use on networks today. It is also the subject of a great deal of continuing research, and the expanding use of UNIX-based desktop workstations is increasing its penetration. Along with the increased use of TCP is the fear that as network rates rise, network throughput may not keep pace or may even decline as connection-oriented, window flow-controlled protocols such as TCP become a bottleneck. Some researchers do not believe that the protocol is to blame for the low throughput observed when current TCP implementations are run over experimental high speed networks [2]. Instead, they cite inefficient implementations of the protocol and interactions with the host operating system (UNIX) as the causes of the poor performance. The intensity of this debate will grow as more high performance machines running TCP observe less than ideal performance over high speed networks. Therefore, TCP was implemented to show that with an efficient implementation on the proper hardware platform, even a protocol designed for moderate-speed, error-prone networks can achieve high throughput.
3.2 Implementation Details
The high performance aspects of the Protocol Accelerator such as the dual processors, DMA, and on-the-fly checksum require the design of the transport protocol implementation to be specific to the PA. Therefore, a custom implementation of TCP was developed. The modules were written in C for the main protocol processing functions with embedded 68020 assembly language code to perform many of the hardware specific tasks, such as setting up the DMA controller, controlling timers, and polling status flags. No operating system is used on the PA. Many implementation decisions were made to optimize speed and efficiency for the 'main path' of protocol processing at the expense of processing for infrequent error conditions. These decisions are justified given the low error rates characteristic of high speed fiber networks.

The dual processor hardware architecture of the Protocol Accelerator leads naturally to the software architecture. The TCP implementation consists of separate transmit and receive processes running their respective tasks on separate microprocessors. The transmitter and receiver do most of their processing on data stored in private memory, but they do communicate through the shared memory. The bulk of this communication occurs through the Transmission Control Block (TCB), the main TCP state information structure which resides in shared memory. At appropriate times, state changes in either the transmitter or receiver are updated in the TCB, which may then be read by the other processor in the course of its work.
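
The paper does not list the contents of the shared TCB, but its role can be pictured roughly as in the C sketch below; the field selection, names, and the shared-RAM address are illustrative assumptions meant only to show how the two processors might publish state to each other.

    /* Illustrative layout of a Transmission Control Block held in the RAM
     * shared by the transmit and receive 68020s.  All fields and the base
     * address are assumptions; 'volatile' reflects that the other processor
     * may update the structure at any time. */
    #include <stdio.h>
    #include <stdint.h>

    struct tcb {
        /* Connection identification */
        uint32_t local_addr, remote_addr;
        uint16_t local_port, remote_port;
        uint16_t state;            /* e.g. ESTABLISHED                        */

        /* Written mainly by the transmit CPU */
        uint32_t snd_una;          /* oldest unacknowledged sequence number   */
        uint32_t snd_nxt;          /* next sequence number to send            */
        uint32_t snd_wnd;          /* send window advertised by the peer      */

        /* Written mainly by the receive CPU */
        uint32_t rcv_nxt;          /* next sequence number expected           */
        uint32_t rcv_wnd;          /* window we advertise                     */
        uint32_t last_ack_rcvd;    /* latest ACK seen, read by the transmitter
                                      when trimming its retransmission queue  */
    };

    /* Both processors would reference the same shared-memory instance: */
    #define SHARED_TCB_ADDR 0x00F00000u   /* hypothetical shared-RAM address */
    #define shared_tcb (*(volatile struct tcb *)SHARED_TCB_ADDR)

    int main(void)
    {
        printf("TCB occupies %zu bytes of shared RAM\n", sizeof(struct tcb));
        return 0;
    }
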
As noted in the hardware description, the generic nature of the PA requires that checksums be placed after the header and after the data. Other than this difference, the implementation provides all of the TCP functions required to transmit and receive data in the TCP ESTABLISHED state. Among others, these functions include maintaining the retransmission queue, providing resequencing for out of order packets, supporting retransmission timers, and packetizing host data into TCP segments, or TPDUs. It must be noted that to minimize data movement, host data is moved directly between host memory and the network interface. This precludes further segmentation/reassembly of the data at what would be the Internet Protocol (IP) layer. Therefore, although certain functions of the IP layer are rolled into the TCP header (IP address, length, protocol), IP functionality is not claimed.
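
One way to picture the resulting segment format is the sketch below: a TCP-like header that also carries the folded-in IP fields, its trailing checksum, then the payload followed by its own trailing checksum. The field widths and ordering are our assumptions; the paper states only which IP fields are carried and that each checksum trails the region it protects.

    /* Illustrative on-the-wire layout of one modified TCP segment on the PA.
     * Field widths and ordering are assumptions for illustration. */
    #include <stdint.h>

    struct pa_tpdu_header {
        uint32_t src_addr, dst_addr;   /* IP-style addresses folded in     */
        uint16_t length;               /* IP-style total length            */
        uint8_t  protocol;             /* IP-style protocol identifier     */
        uint8_t  flags;                /* TCP control flags                */
        uint16_t src_port, dst_port;
        uint32_t seq, ack;
        uint16_t window;
        uint16_t reserved;
        uint32_t header_checksum;      /* trails the header fields above   */
    };

    /* The payload follows the header, and its own 32-bit checksum trails it:
     *
     *   | pa_tpdu_header | payload (e.g. 1728 bits) | payload checksum |
     */
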
Another objective of this work is to quantify the effects of the end host system on outboard protocol processing. To this end a UNIX device driver, applications programming interface (API), and application were developed for the host UNIX system. The relationships among the components are shown in Figure 3.

FIGURE 3. System Software Architecture
Because many variations are possible with software in the UNIX environment, attempts were made to keep the device driver, API, and application code as simple as possible while maintaining functional similarity to current methodologies for protocol/system interfaces. The results may then be used to extrapolate meaningful performance expectations of other systems with different basic parameters such as processor capability, network packet size, or bus speeds.
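
A host application driving the PA through this API might look roughly like the fragment below; the function names, device behavior, and message size are assumptions (the paper says only that the API mimics the BSD socket, connect, and write calls and sits on a UNIX device driver for the PA), and the stubs exist only to keep the sketch self-contained.

    /* Hypothetical host-side use of the PA's BSD-like API. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Stand-in stubs so the sketch compiles on its own; a real implementation
     * would hand these calls to the PA's UNIX device driver. */
    static int  pa_socket(void)                             { return 3; }
    static int  pa_connect(int sd, const char *peer)        { (void)sd; (void)peer; return 0; }
    static long pa_write(int sd, const void *buf, long len) { (void)sd; (void)buf; return len; }

    int main(void)
    {
        enum { MSG_BYTES = 20000 };          /* one bulk host message */
        char *msg = malloc(MSG_BYTES);
        if (msg == NULL)
            return 1;
        memset(msg, 'x', MSG_BYTES);

        /* Connection establishment (socket/connect) happens once and is not
         * part of the per-message data transfer path. */
        int sd = pa_socket();
        pa_connect(sd, "peer-host");

        /* The per-message transfer goes through the BSD-like write; this is
         * the path whose host-side overhead is examined in Section 4.3. */
        long sent = pa_write(sd, msg, MSG_BYTES);
        printf("wrote %ld bytes through the PA\n", sent);

        free(msg);
        return 0;
    }
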
4. PERFORMANCE MEASUREMENTS / ANALYSIS

4.1 Test Configuration
For these initial results, the circuit was operated in a hardware loopback configuration (see Figure 1) with the network output port connected to the network input port via FIFO buffers. The loopback causes no loss, errors, or reordering of packets and is thus a best case. For the host we used a VME based single board computer based on the Motorola 68030 processor operating at 25 MHz. This processor was equipped with an area of shared dynamic RAM (4 Mbytes) accessible from both the host processor subsystem and the VME bus.

The communications model used to test the capabilities of the Protocol Accelerator is that of a file server/client which can either provide or receive large messages at the throughput rates of the PA. Performance measurements were taken based on the UNIX host transmitting bulk data over the network which, in this system, is a loopback to the receiver side of the PA. In the initial measurements we found the transmitter to be the rate-determining element in the end-to-end process, being roughly fifty percent slower than the receiver. We therefore focus our discussion on the transmitter. In order to fully test the capabilities of the transmitter, the received data is not run back to the UNIX host because that would cause excessive contention for the VME bus. For these measurements only one host bus (VME) was used.
4.2 Protocol Accelerator Performance
Figure 4 shows the timing diagram for the flow of TPDUs through the PA transmitter. The header generation time is the time taken for the processor to access the current state information and populate the header fields. Once the payload DMA transfer is underway, further transport protocol processing proceeds in parallel and does not impede data transfer unless the payload transfer time is shorter than the overall transport processing time.

FIGURE 4. Transmitter Per Packet Processing Time (13 µsec of header generation, followed by 53 µsec of further transport protocol processing overlapped with the packet-size-dependent payload DMA transfer)
The payload transfer time is simply the product of the payload size (in 32 bit words) and the data transfer cycle time. The data transfer cycle time for our system can be calculated by adding the minimum DMA cycle time of 80 nsec (asynchronous) to the source (or destination) memory access time. The shared PA static RAM has an access time of 40 nsec, leading to a cycle time of 120 nsec per 32 bit word (i.e. a peak data transfer rate in excess of 250 Mbits/sec). The VME memory on the available host computer, on the other hand, has a measured response time averaging 580 nsec, which reduces the peak throughput to below 50 Mbits/sec when this memory is used. These figures assume no other activity on the system bus. For these preliminary findings we use a constant packet size of 2016 bits of which 1728 bits was payload. The total transport processing time of our transmitter TCP implementation is 66 µsec and this becomes the limiting throughput factor for small payload packets. The header generation part of this time is 13 µsec.

Using these results, it is straightforward to extrapolate the performance of the PA transmitter for any memory access time and packet size. An example is shown in Figure 5. Here, the payload throughput and the packet throughput are plotted against packet size for different memory access times. The flat region of the packet throughput curves is the region where the protocol processor limits throughput. To the right of this region, throughput is limited by memory bandwidth. Two cases are shown, 120 nsec and 660 nsec memory cycle times; these represent the memory cycle times for the two cases described previously, PA shared SRAM and host VME DRAM.
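
To make the extrapolation concrete, the C sketch below evaluates this timing model for the two memory types and two packet sizes. The model is our reading of Figure 4 (13 µsec of header generation, then the longer of the remaining 53 µsec of protocol processing and the payload DMA time, at one 32-bit word per 80 nsec plus memory access time); the overlap formula is an assumption drawn from the figure, not an equation stated explicitly in the paper.

    /* Extrapolating PA transmitter throughput from the measured timing
     * parameters, per the assumed overlap model described above. */
    #include <stdio.h>

    #define HEADER_GEN_USEC   13.0   /* header generation time (from the text)   */
    #define TOTAL_PROTO_USEC  66.0   /* total transmit TCP processing per packet */
    #define DMA_BASE_NSEC     80.0   /* minimum asynchronous DMA cycle time      */

    /* Per-packet transmit time: header generation, then the longer of the
     * remaining protocol processing and the payload DMA transfer. */
    static double per_packet_usec(double payload_bits, double mem_access_nsec)
    {
        double cycle_nsec = DMA_BASE_NSEC + mem_access_nsec;     /* per word */
        double dma_usec   = (payload_bits / 32.0) * cycle_nsec / 1000.0;
        double overlapped = TOTAL_PROTO_USEC - HEADER_GEN_USEC;  /* 53 usec  */
        return HEADER_GEN_USEC + (dma_usec > overlapped ? dma_usec : overlapped);
    }

    int main(void)
    {
        const double access_nsec[]  = { 40.0, 580.0 };           /* PA SRAM, host VME DRAM */
        const double payload_bits[] = { 1728.0, 8.0 * 4096.0 };  /* measured case, 4-kbyte payload */

        for (int m = 0; m < 2; m++)
            for (int p = 0; p < 2; p++) {
                double t = per_packet_usec(payload_bits[p], access_nsec[m]);
                printf("%3.0f nsec access, %5.0f payload bits: %6.1f usec/packet, "
                       "%5.0f packets/sec, %5.1f Mbits/sec payload\n",
                       access_nsec[m], payload_bits[p], t, 1e6 / t,
                       payload_bits[p] / t);
            }
        return 0;
    }

Run against the measured case (1728-bit payload, PA SRAM) the model gives 66 µsec per packet, i.e. just over 15,000 packets/sec, and for a 4-kbyte payload from the 580 nsec host DRAM it drops below 50 Mbits/sec, consistent with the figures quoted above.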
FIGURE 5. PA Throughput Based on Memory Access Time and Packet Size (packet and payload throughput vs. packet size in kbytes, for 120 nsec and 660 nsec memory cycle times)
4.3 PA Performance in UNIX Environment
We now examine the effect of host system software on the observed throughput to the application. Host system interactions with outboard processors are critical because regardless of how fast the data may depart from or arrive to the outboard processor, it is not until the data is actually in user space on the host system that the communication is complete.

The UNIX overhead time was measured for a number of host message sizes transmitted from the PA. The UNIX overhead consists of the time between the user call to the API's BSD-like write socket call and the receipt of the command by the PA on the inbound side, plus the time between the generation of the UNIX interrupt (signaling the end of PA processing of the host message) and return of control to the user at the end of message transmission. The connection establishment functions of socket and connect are not included in these measurements because they are not directly associated with the per message data transfer. The system was kept unloaded except for standard system daemons in order to minimize contention for processor and bus resources. The results of the UNIX overhead measurements for message transmission are shown in Figure 6.
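
As a concrete restatement of how that overhead is composed, the sketch below simply adds the two bracketed intervals; the timestamp names and the sample numbers are illustrative assumptions, not measured values from the paper.

    /* Sketch of how the reported UNIX overhead is composed: the inbound piece
     * (user write call up to the PA seeing the command) plus the outbound
     * piece (PA completion interrupt up to control returning to the user).
     * Timestamp sources and values here are illustrative assumptions. */
    #include <stdio.h>

    struct msg_timestamps {               /* all times in microseconds */
        double user_write_call;           /* application enters write()      */
        double pa_command_received;       /* transmit CPU picks up command   */
        double pa_done_interrupt;         /* PA raises end-of-message intr.  */
        double control_back_to_user;      /* write() returns to application  */
    };

    /* UNIX overhead = inbound handoff + outbound return; DMA and protocol
     * time on the PA itself (between the two middle timestamps) is excluded. */
    static double unix_overhead_usec(const struct msg_timestamps *t)
    {
        double inbound  = t->pa_command_received  - t->user_write_call;
        double outbound = t->control_back_to_user - t->pa_done_interrupt;
        return inbound + outbound;
    }

    int main(void)
    {
        /* Made-up numbers, purely to exercise the arithmetic. */
        struct msg_timestamps t = { 0.0, 150.0, 1400.0, 1900.0 };
        printf("UNIX overhead: %.0f usec of %.0f usec total (%.0f%%)\n",
               unix_overhead_usec(&t), t.control_back_to_user,
               100.0 * unix_overhead_usec(&t) / t.control_back_to_user);
        return 0;
    }
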
FIGURE 6. Absolute and Relative UNIX Overhead vs. Host Message Size
As the message size increases, the absolute UNIX overhead increases while its percentage of the total operation time falls. If the TCP interface process had been the only process running on the host system, the absolute overhead would remain constant as host message size increases. This is because the only action required of the host is to change the command message length delivered to the transmitter. The observed results are due to the accumulation of higher priority host processes while the PA DMA is in control of the UNIX memory. As the message size increases, the DMA controls the host bus for more time per message. The linear increase in absolute overhead also indicates that the higher priority functions are generally periodic. It should be noted that even as the absolute overhead increases with message size, the on-board TCP processing time remains constant on a per segment basis. This shows that the increase in absolute overhead is not due to increases in time waiting for the host bus or other per segment processing; it occurs at the end of the packet processing before control is returned to the user process.

The declining percentage of overhead as host message size is increased confirms our expectations. As the host message size increases, the amount of time spent on UNIX overhead processing relative to the overall transmission time decreases. This again is due to the constant amount of code which must be executed. The observed asymptotic approach to 25% represents a balance between the decreasing percentage of host processing time per message and the increased amount of system backlog which is serviced before the user process regains control. Figure 7 shows the overall performance of the PA in and out of the UNIX environment.

FIGURE 7. Throughput vs. Host Message Size (Protocol Accelerator TCP alone and PA under UNIX)

5. CONCLUSIONS

An outboard protocol accelerator optimized for transport layer processing has been implemented and tested. The design consists of separate transmit and receive units, each with its own CPU, checksum, and DMA hardware. Using an implementation of TCP, we demonstrated that this approach can support end to end throughput rates in excess of 15,000 packets/sec between network and application. When integrated into the UNIX environment, the protocol accelerator freed the host of the traditional protocol processing tasks but led to a 25% decrease in average throughput due to the UNIX overhead.

ACKNOWLEDGEMENTS

We would like to thank Mike Koblentz for his major contributions and Michael Stanton for his part in the initial design of the hardware.

REFERENCES

[1] L. Svobodova, "Measured Performance of Transport Service in LANs", Computer Networks and ISDN Systems, vol. 18, pp. 31-45, 1989.
[2] D. D. Clark, V. Jacobson, J. Romkey and H. Salwen, "An Analysis of TCP Processing Overhead", IEEE Comm. Mag., pp. 23-29, June 1989.
[3] D. D. Clark, M. L. Lambert and L. Zhang, "NETBLT: Bulk Data Transfer Protocol", Network Information Center RFC-998, SRI International, Menlo Park, CA, 1987.
[4] G. Chesson, B. Eich, V. Schryver, A. Cherenson and A. Whaley, "XTP Protocol Definition", Protocol Engines Inc., 1900 State St., Santa Barbara, CA 93101.
[5] R. P. Singh and A. Erramilli, "Protocol Design and Modeling Issues in Broadband Networks", Proc. ICC '90, Atlanta, GA, pp. 1458, 1990.
[6] D. R. Cheriton, "VMTP: Versatile Message Transaction Protocol, Protocol Specification", Network Information Center RFC-1045, SRI International, Menlo Park, CA, 1987.
[7] A. N. Netravali, W. D. Roome and K. Sabnani, "Design and Implementation of a High-Speed Transport Protocol", IEEE Trans. on Communications, vol. 38, no. 11, pp. 2010, Nov. 1990.
[8] D. Giarrizzo, M. Kaiserswerth, T. Wicki and R. C. Williamson, "High Speed Parallel Protocol Implementation", in H. Rudin and R. C. Williamson (eds.), Protocols for High Speed Networks, North-Holland, 1989.
[9] M. Zitterbart, "High Speed Transport Components", IEEE Network Magazine, January 1991, pp. 54-63.
[10] H. Kanakia and D. R. Cheriton, "The VMP Network Adapter Board (NAB): High Performance Network Communication for Multiprocessors", Proc. ACM SIGCOMM '88 Symposium on Communications Architectures and Protocols, Stanford University, CA, 1988, pp. 175-187.
[11] E. C. Cooper, P. A. Steenkiste, R. D. Sansom and B. D. Zill, "Protocol Implementation on the Nectar Communication Processor", Proc. SIGCOMM '90, Philadelphia, PA, 1990, pp. 135.
[12] N. Jain, M. Schwartz and T. R. Bashkow, "Transport Protocol Processing at GBPS Rates", Proc. SIGCOMM '90, Philadelphia, PA, 1990, pp. 188.
[13] R. Braden, D. Borman and C. Partridge, "Computing the Internet Checksum", Network Information Center RFC-1071, SRI International, Menlo Park, CA, 1988.