`
RE: Y.H. Thia and C.M. Woodside, “A Reduced Operation Protocol Engine (ROPE) for a multiple-layer bypass architecture,” Protocol for High Speed Networks IV, 1st Edition (TJ Press Ltd. 1995), pages 224-239.

I, Lisa Rowlison de Ortiz, declare:

1. I am the Head of Catalog & Metadata Services at University of California, Berkeley (“UC Berkeley”) library. I am familiar with the UC Berkeley library system, including the library catalog and policies and procedures regarding the receipt, indexing, and availability of books and periodicals.

2. According to UC Berkeley Library policies and procedures, Library items are indexed in the library catalog and are made freely available to the faculty and student body of UC Berkeley as well as to the general public.

3. The UC Berkeley library holds a copy of a chapter by Y.H. Thia and C.M. Woodside, “A Reduced Operation Protocol Engine (ROPE) for a multiple-layer bypass architecture,” published in the book Protocol for High Speed Networks IV, 1st Edition (TJ Press Ltd. 1995), pages 224-239. (“Thia”).

4. When a monograph is received and cataloged by the UC Berkeley Library, the date of cataloging is set and retained in the catalog record. The catalog date (“Cat Date”) for Thia is February 26, 1996 (see Exhibit A). Furthermore, after the volume is labeled and sent to its shelving location the date of receipt by this shelving location is stored in an internal note. This information shows that the volume was received by the Engineering Library on March 20, 1996. Id. The volume would have been available to the public within a few days of that date.
`
`Ex.1064.001
`
`DELL
`
`
`
I declare under the penalty of perjury that I understand that willful false statements and the like are punishable by fine or imprisonment, or both under Section 1001 of Title 18 of the United States Code and may jeopardize the validity of the application or any patent issuing thereon. I declare that all statements made of my knowledge are true, and that all statements made on information and belief are believed to be true.

Executed on January 27, 2017, in Berkeley, California.

Lisa Rowlison de Ortiz
`
`
Protocols for High Speed Networks IV

Edited by
Gerald Neufeld and Mabo Ito
Department of Computer Science
University of British Columbia
Vancouver
Canada

Sponsored by IFIP WG6.1/WG6.4 in cooperation with IEEE ComSoc
Published by Chapman & Hall on behalf of the International Federation for Information Processing (IFIP)

CHAPMAN & HALL
London • Glasgow • Weinheim • New York • Tokyo • Melbourne • Madras
`
`
PREFACE

Welcome to the fourth IFIP workshop on protocols for high speed networks in Vancouver. This workshop follows three very successful workshops held in Zurich (1989), Palo Alto (1990) and Stockholm (1993) respectively. We received a large number of papers in response to our call for contributions. This year, forty papers were received of which sixteen were presented as full papers and four were presented as poster papers. Although we received many excellent papers the program committee decided to keep the number of full presentations low in order to accommodate more discussion in keeping with the format of a workshop.

Many people have contributed to the success of this workshop including the members of the program committee who, with the additional reviewers, helped make the selection of the papers. We are thankful to all the authors of the papers that were submitted. We also thank several organizations which have contributed financially to this workshop, specially NSERC, ASI, CICSR, UBC, MPR Teltech and Newbridge Networks.

Mabo Ito
Gerald Neufeld

Departments of Electrical Engineering and Computer Science
University of British Columbia
`
vi    Contents

PART FIVE  Protocols

11  The design of BTOP - an ATM bulk transfer protocol                          171
    L. Casey
12  High performance presentation and transport mechanisms for integrated
    communication subsystems                                                    189
    W.S. Dabbous
13  PATROCLOS: a flexible and high-performance transport subsystem             205
    T. Braun
14  A reduced operation protocol engine (ROPE) for a multiple-layer
    bypass architecture                                                         224
    Y.H. Thia and C.M. Woodside

PART SIX  Implementation and Performance

15  Deadlock situations in TCP over ATM                                         243
    K. Moldeklev and P. Gunningberg
16  A guaranteed-rate channel allocation scheme and its application to
    delivery-on-demand of continuous media data                                 260
    T. Kameda, J. Ting and D. Fracchia
17  A hybrid deposit model for low overhead communication in high speed LANs   276
    R.B. Osborne

PART SEVEN  Posters

18  A multimedia document distribution system over DQDB MANs                   295
    L. Orozco-Barbosa and M. Soto
19  From SDL specifications to optimized parallel protocol implementations     308
    S. Leue and P. Oechslin
20  Partial-frame retransmission scheme for data communication error
    recovery in B-ISDN                                                          328
    I. Inoue and N. Morita
21  Protocol parallelization                                                    349
    Joseph D. Touch

Index of contributors                                                           361

Keyword index                                                                   362
`
`
`
`
14

A Reduced Operation Protocol Engine (ROPE) for a multiple-layer bypass architecture

Y.H. Thia (*)¹ and C.M. Woodside (**)
(*) Newbridge Networks Inc., Ottawa, Canada and
(**) Dept. of Systems and Computer Engineering, Carleton University, Ottawa, Canada

Abstract: The Reduced Operation Protocol Engine (ROPE) presented here offloads critical functions of a multiple-layer protocol stack, based on the “bypass concept” of a fast path for data transfer. The motivation for identifying this separate processing path is that it involves only a small subset of the complete protocol, which can then be implemented in hardware. Multiple-layer bypass also eliminates some inter-layer operations such as queue and buffer management, context switching and movement of data across layers, all of which are a significant overhead. ROPE is intended to support high-speed bulk data transfer. The paper describes the design of a ROPE chip for the OSI Session and Transport layer protocols, using VHDL. The design is practical in terms of chip complexity and area, using current gate array technology, and simulation shows that it can support a data rate approaching 1 gigabit per second, in a connection attached to an end-system.

Keyword codes: C.2.2, B.4.1
Keywords: Network Protocols, Data Communications Devices

1 Introduction

The advent of Fibre Optic technology, which offers high bandwidth and low bit error rates, has shifted the performance bottleneck from the communications channel to the communications processing in the end-points of the system [26]. Other trends such as improved quality-of-service guarantees will reinforce this effect. The heavy processing load is due to a combination of operating system overhead, protocol complexity, and per-octet processing on the data stream. To alleviate the end-system bottleneck one may consider new protocols [10], improved software implementation of existing protocols [5,35], parallel processing techniques [14,21,38], special protocol structures [15,30] and hardware assist [22] by offloading all or part of the protocol functions to an adaptor. This paper takes the latter approach.

The key problems associated with offboard processing include:

• Partitioning the functionality between the host and the adaptor is difficult and may easily lead to a complex additional protocol between the two parts, which may cancel out or offset the potential gain from offloading. For example, the buffer management task [36] may be offloaded, but this leaves the problem of control for accessing it within the full protocol logic.

• Non-protocol-specific processing is a large part of the total load, as shown in [35]. Examples include interrupt handling, context switching and data copying at layer boundaries in deeply layered protocol stacks.

• The choice of hardware for the adaptor depends on the complexity of the functions it supports. In [2,22] where the transport protocol layer is offloaded or in [7] where the full protocol stack can be offloaded, general purpose microprocessors are used. Probably because of the complexity of existing protocols, VLSI [24] implementation above the data link layer has been disappointing so far. In [8], dedicated VLSI chips are used to support TCP checksums. Also, some newer lightweight transport protocols are specially designed for VLSI implementation [1,3].

• There is a tradeoff between performance, flexibility and cost. If the key functions of the frequently executed portion of the protocol remain relatively stable, there can be significant advantage in providing hardware support for these functions leaving the other tasks in the host software for flexibility.

• As host processing speed continues to outpace memory bandwidth and as the network bandwidth approaches the processor memory bandwidth, it is important to keep data movement on the workstations down to the minimum [4,9,28].

This paper presents a feasibility study for a new approach to hardware assistance. It combines the relatively simple operations needed for data transfer across multiple layers and provides a hardware “fast path” for them, which will be efficient for bulk data transfer. It is based on the “protocol bypass concept” [37] which is a generalization of Jacobson’s “Header Prediction” algorithm [20] for TCP/IP. Bypass solves the problems identified above, which may limit the use of offboard processing, by implementing an entire service through all layers for certain cases. This simplifies the interface between the host and the adaptor chip and minimizes their interaction, which is supported by an access test, some DMA processing and a simple command protocol. The chip design based on bypassing is called ROPE, for Reduced Operation Protocol Engine. The contribution of this paper is to define the host/chip interface and the chip operation, and to report on a VHDL-based feasibility study of the chip design. It appears to be feasible to support an end-system single-connection data rate approaching 1 Gbps.

The next section introduces the bypass concept, its architecture and implementation. Section 3 analyzes the key protocol processing overheads and discusses the requirements for a bypass VLSI implementation. Sections 4, 5 and 6 describe a design study of a ROPE chip using the industry standard hardware description language, VHDL, with conclusions in Section 7.

¹ This research was done while Dr. Thia was at Carleton University.

2 The Bypass Concept

A bypass adds an additional path for certain operations, with minimal changes to the original software. Conformance to the protocol is maintained by doing all the other operations through the normal “heavyweight” path. A bypass path can be provided for send, for receive, or for both together, and is compatible with other end-systems implemented without a bypass.
`
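The idea of routing only certain operations through an additional path can be illustrated with a minimal Python sketch. It is not taken from the paper: the `PDU` class, the `"DATA"` and `"DATA_TRANSFER"` labels and the two handler callables are hypothetical names chosen for illustration, and the eligibility rule (data PDUs in the data transfer phase) is an assumption about what a bypass test would check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PDU:
    kind: str            # e.g. "DATA" or "RELEASE" (hypothetical labels)
    phase: str           # connection phase, e.g. "DATA_TRANSFER" (hypothetical)
    payload: bytes = b""


def is_bypassable(pdu: PDU) -> bool:
    # Assumed criterion: only data PDUs handled during the data
    # transfer phase are eligible for the fast path.
    return pdu.kind == "DATA" and pdu.phase == "DATA_TRANSFER"


def dispatch(pdu: PDU,
             bypass_path: Callable[[PDU], str],
             standard_path: Callable[[PDU], str]) -> str:
    # Anything that fails the test goes through the normal
    # "heavyweight" path, so protocol conformance is unaffected.
    handler = bypass_path if is_bypassable(pdu) else standard_path
    return handler(pdu)
```

Because the decision is made per PDU at a single entry point, the original software only needs a small hook in front of its existing processing path.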
`
Figure 1 Bypass Architecture

2.1 Bypass Architecture

Figure 1 illustrates the architecture of a bypass implementation for any standard protocol. The standard protocol stack (SPS) is the processing path taken by all PDUs during a connection without the bypass. The SPS may refer to a single layer or to multiple adjoining layers of a layered protocol stack. The bypass has 4 key components:

• Send Bypass Test;
• Receive Bypass Test;
• Bypass Stack;
• Shared data for access by the two tests, the Bypass stack and the SPS.

The send bypass test identifies outgoing packets that are data packets in the data transfer phase. The receive bypass test matches the incoming PDU headers with a template that identifies the predicted bypassable headers. The bypass stack performs all the relevant protocol processing in the data transfer phase. The shared data are used to maintain state consistency between the SPS and the bypass stack, including window flow control parameters and connection identifiers. Whenever there is a change in the processing path between the SPS and the bypass stack, checks are performed to ensure that there are no outstanding packets in the current path, i.e. “no in-transit PDUs”, before the change is made. A more detailed discussion on this is presented in another paper [33].

2.2 Efficient logic for the bypass test

The “no-in-transit PDU” test can often be avoided. At the beginning of data transfer on a new connection, it is automatically satisfied. It holds as long as no packet fails a bypass test, and it is sufficient to maintain a flag to indicate this. Once a packet fails, and goes to the SPS, then a full “no-in-transit PDU” test must be performed for each packet until the test succeeds, after which control can go back to the flag. Token management and synchronization points of the session layer [19,18] which are mapped by equivalent application and presentation service primitives can be used as synchronization points within the bypass architecture, as it switches between the SPS and the bypass stack. Since they are only inserted periodically in bulk data transfer and can be controlled by the application process, these overheads are not excessive.

2.3 Multiple-layer bypass

A bypass for multiple layers instead of just one gives additional gains by avoiding:

• Overhead of encoding and decoding the interface control information passed between layers;
• Executing the full general protocol logic for the layers to decide how to manipulate the data;
• Queueing of data at layer boundaries.

The advantage is increased further in cases where some layers, like the network and application layers, have been further subdivided into sublayers.

A multiple-layer bypass path is a concatenation of processing procedures performed by the adjacent layers when they are simultaneously in the data transfer phase. Meanwhile, the separate layers in the SPS path handle the other phases.

In summary, the separation of the bypass path offers the following advantages:

• The processing path of data PDUs can be optimized;
• The number of possible PDU formats in the bypass path is reduced to data transfer PDUs;
• The finite state machine of the protocol is now reduced to only the “OPEN” state, for as long as processing remains in the bypass path. The state of the system does not change during the entire data transfer phase and the protocol processing is reduced to ensuring reliable transfer of data across the communications network.

3 Design Considerations for a Hardware Bypass

3.1 Factors affecting system performance

This section summarizes the major factors affecting throughput performance in a deeply layered stack on an end system. It follows the description by Heatley and Stokesberry [16]. Protocol procedures can be characterized as per-octet, per-packet or per-group-of-packets operations. The per-octet operations take an average time A per octet (e.g., checksum), and the per-packet procedures take time B per packet (e.g. address decoding and multiplexing). Per-group-of-packets operations include for example transmission of acknowledgments, whose frequency is implementation-dependent, and their timing will be aggregated and included in parameter B.
`
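The cost model described above (time A per octet, time B per packet) can be turned into a small calculator. The following Python sketch is illustrative only: the function names, the maximum PDU size parameter `m` and the numeric values used in the usage note are assumptions, not figures from the paper.

```python
import math


def processing_time(x: int, a: float, b: float, m: int) -> float:
    """Seconds of protocol processing for an x-octet message.

    a: average per-octet time, b: per-packet time (with any
    per-group-of-packets cost folded in), m: assumed maximum PDU size.
    """
    packets = math.ceil(x / m)       # number of PDUs needed for the message
    return a * x + b * packets       # per-octet cost plus per-packet cost


def throughput(x: int, a: float, b: float, m: int) -> float:
    """Octets per second if protocol processing were the only limit."""
    return x / processing_time(x, a, b, m)
```

With hypothetical values such as `a = 10e-9`, `b = 20e-6` and `m = 1024`, a long message made of full-size PDUs is limited by `a + b / m` seconds per octet, while a short message pays the whole per-packet cost for only a few octets, which is why small PDUs reduce the achievable rate.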
`
`
`
The throughput bound imposed by protocol processing alone, $\Lambda_m$, in octets per second, is then given by the equation:

$$\frac{1}{\Lambda_m} = A + \frac{B}{x}\left\lceil \frac{x}{M} \right\rceil \qquad (3.1.1)$$

where x is the size of the user message in octets and M is the maximum PDU size. In bulk data transfer, as x becomes large,

$$\frac{1}{\Lambda_m} = A + \frac{B}{M} \qquad (3.1.2)$$

The protocol processing load on an end system is typically shared between the host and the network adaptor. As the raw data bit rate supported by optical networks approaches the main memory bandwidth of the end system, the cost of moving data and of per-octet processing limits the effective throughput presented to the application process, especially for bulk data transfer. The data portion of a PDU may be physically moved for the following reasons:

• Copying between the adaptor buffer and the host system memory;
• Crossing protection domains (address spaces), e.g. at the user/kernel boundary. This problem becomes more pronounced in microkernels which treat a protocol task as a server process outside the kernel domain [11];
• Per-octet processing like presentation conversion and checksum routines.

Hardware implementation is particularly efficient for per-octet operations.

3.2 Requirements of a bypass VLSI implementation

The problems associated with separate or offboard processing, which were discussed in the introduction, are addressed by the VLSI design as follows:

• A clean separation of functionality requiring only a simple protocol to communicate between the host and adaptor is desired, and is provided by a bypass. Its particular set of functions are complete in themselves and have a focussed interface with the host software at the packet entry point. There is relatively infrequent switching between the SPS and the bypass stack;

• Reduced non-protocol-specific processing overhead. For example the processing of acknowledgment packets is dominated by interrupt handling, typically a few hundred instructions, rather than by the protocol processing itself. Our approach removes acknowledgment handling altogether from the host. Also, the bypass system can be extended to incorporate multiple-layer stacks and remove overhead that way;

• VLSI implementation complexity: only the data transfer functions are implemented.

• Tradeoff between performance, flexibility and cost. The host software processes non-data-transfer packets which are typically small but require more flexible and more complex processing. They are also only per-packet, so they have less impact overall. They can be efficiently handled by the host given the projected increase in host processing speed [17]. The operations implemented in VLSI are understood to have less need of flexibility.

• Remove operations that are inefficient on the host. It is important to look at both the processing requirements and data movement of the Application PDUs. The critical resource is often the host bus/memory path, and caching is used to increase its efficiency. However long traverses through the data for per-octet operations like checksum and encoding interfere with efficient caching, so it is particularly appropriate to offload them.

Table 1 identifies procedures which are strong candidates for implementation in the bypass chip, and those which are better handled by the host, during the data transfer phase. Besides these, the send bypass test is done on the host and the receive bypass test is done on the Network Interface Adaptor.

Table 1 Bypassable versus Non-bypassable functions
(Table body illegible in the scan. Legible procedure labels: encoding, encryption, compression, token management, checksum (optional), timer management, generation of ACK, flow control, resequencing, header construction, header decoding, buffer management.)

4 VLSI implementation of a Reduced Operation Protocol Engine (ROPE) chip using VHDL

The VHSIC Hardware Description Language (VHDL) [6,27] was used to model and synthesize the chip. VHDL is an industry standard language which can be used to represent all levels of abstraction, from logic gates to the system level. By utilizing VHDL as a specification tool, it is possible to begin simulation and debugging of complex systems before details regarding the implementation are fully specified. It also offers the potential of an automatic path from the protocol specification to VLSI implementation, in which any modifications to the specification can be easily propagated to the gate level design.

Figure 2 Block Diagram of VLSI bypass system

4.2 Architectural Description

Figure 2 shows the block diagram of the system. The host processor and NIA components provide logical interfaces for simulation of behaviour, but they insert no timing delays. For modeling purposes they were described as being infinitely fast, either as a source or as a sink. This places the maximum stress on the ROPE chip. The architectural considerations involved in the chip design can be summarized as follows:

• Movement of data across the host bus interface is minimized by using an on-chip DMA for fast block data transfer to/from the host system memory.

• On-chip dual-ported memory is used, rather than the host memory, to avoid critical constraints on bus access latency and throughput.

• The control registers of the bypass chip are I/O mapped to the host processor. This enables the host processor to configure the bypass chip directly.

The presentation module shown in the Figure was allowed for in the data structures but was not fully designed.

4.3 First Design: Design Steps

Figure 3 shows the steps followed in this study. There were three stages, a behavioural model, a structural or RTL model, and a gate level design. These gave us two kinds of feasibility check, that the logic we specified will execute the protocol within the environment we envisage, and that the design is technically feasible, for instance in a reasonable chip area.

A VHDL behavioral model for the system was initially written and tested. It includes the bypass chip, the host processor with a simplified bus/memory architecture, and a vestigial high-speed network interface adapter which acts simply as an infinite source/sink for data packets. The first design implemented the Basic Combined Subset (BCS) of the session layer and the Transport protocol class 2 (TP2) protocols. Only the logic for the data transfer phase is included in this model, as it is during this phase that the bypass chip is active. A host processor submodel generates a simple test sequence that initiates bypass connections and a series of bypassable data packets. Non-bypassable packets are also inserted to simulate for example the arrival of a connection release PDU. The test sequence allowed us to verify the following:

• Window flow control logic.
• Generation of the header field including the sequence number.
• Generation of acknowledgment packets on receive.
• Synchronization between the SPS and the bypass stack.

Next, the behavioural model of the bypass chip was manually converted to a structural (RTL) model for synthesis, leaving the other components as behavioral constructs. A behavioral description has no implied architecture in its representation, while an RTL level description has a definite architecture and clocking scheme, and characterizes the system in terms of registers, switches (multiplexors), and operations. An initial assignment of operations into clock cycles is also made. These descriptions, like behavioral descriptions, are technology-independent (silicon library independent). In a VHDL RTL level description, not all of the functionality of VHDL can be used because some of the language features do not map into the RTL model. For example a WAIT FOR statement has no meaning in hardware, but is useful in behavioral simulation. Features that can be easily mapped to hardware include IF THEN ELSE statements and signal assignments. The same test sequence was again generated by the host processor, and its results were compared with the original model to ensure behavioral consistency.

The structural model was then passed through the SYNOPSYS synthesis tool [31] for gate level generation with the 0.8 µm BiCMOS macro library from Texas Instruments [32]. The timing information generated by this process was back-annotated to the structural model to obtain the simulation throughput results presented in section 5. This was sufficient for our present purposes, to estimate the space complexity and timing of the chip, and the final step of generating a chip layout for fabrication and fault analysis was not performed. As the model was too large for the SYNOPSYS package we were using, it was divided into 3 submodels. The dual-ported SRAM was not synthesized as its gate counts and performance characteristics can be easily extracted from data books.

The second design, with additional functionality, is described in the next section.
`
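The host test sequence described in Section 4.3 can be mimicked in software to show what such a check looks like. The Python sketch below is a toy stand-in, not the VHDL model used in the study: the packet representation, the function names, the position of the inserted connection release PDU and the acknowledgment interval are all hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

Packet = Tuple[Optional[int], str]    # (sequence number, kind)


def host_test_sequence(n_packets: int, release_at: int) -> Iterator[Packet]:
    """Yield bypassable DATA packets with consecutive sequence numbers,
    inserting one non-bypassable RELEASE PDU at position release_at."""
    seq = 0
    for i in range(n_packets):
        if i == release_at:
            yield (None, "RELEASE")
        else:
            yield (seq, "DATA")
            seq += 1


def run_model(sequence: Iterator[Packet], ack_every: int):
    """Check sequence numbering, record when acknowledgments would be
    generated, and divert non-DATA packets to the standard stack."""
    expected = 0
    acks: List[int] = []
    to_sps: List[str] = []
    for seq, kind in sequence:
        if kind != "DATA":
            to_sps.append(kind)           # non-bypassable: handled by the SPS
            continue
        if seq != expected:
            raise ValueError(f"out-of-sequence packet {seq}, expected {expected}")
        expected += 1
        if expected % ack_every == 0:     # one acknowledgment per group of packets
            acks.append(expected)
    return expected, acks, to_sps
```

Running the same generated sequence against two models and comparing the returned tuples is one simple way to express the behavioural consistency check between the behavioural and RTL descriptions.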
`
`
`
Figure 3 Design flow diagram
(Figure note: timings back-annotated to obtain throughput results.)

4.4 Behavioural description

The sequence of operation in the bypass system is summarized as follows:

1) Four high level procedures are made available to the host processor to control the bypass chip, namely: BYPASS_START, BYPASS_DMA, BYPASS_SYNC and BYPASS_RESTART. On receiving the first bypassable PDU, the BYPASS_START procedure is called. This procedure sets up a bypassable connection by sending information like its initial window flow control parameters and the DST_REF field to the bypass chip. These are stored in a process control block for the particular connection in fixed on-chip memory locations and are also accessible by the host (I/O mapped). The maximum number of connections allowed for simultaneous bypassing will be equal to the number of these control blocks allocated in the bypass chip (5 in this study; see Figure 4).

2) For subsequent bypassable packets, the host processor initiates the BYPASS_DMA procedure which checks for free buffer space in the bypass chip and programs the DMA by sending the starting address pointer where the PDU is located, and its total length. The destination address is supplied by the bypass chip. Arbitration for the host processor bus between the host and DMA is provided by the DMAreq and DMAack lines. DMA transfers the PDU into the internal dual-ported SRAM (Static RAM). Buffers are pre-allocated in fixed sizes and are accessed by a simple round robin scheme using a set of buffer pointers.

3) The protocol engine polls the status field of a packet to check if there are any data to be processed. If the status is FILLED, protocol processing of the bypass stack can proceed. The dual-ported structure allows protocol processing to proceed concurrently with any DMA transfer to/from the host processor. A precomputed header template is used to construct the header field very quickly.

4) Processed in-sequence packets within the transmit window are passed to the network interface adapter, which in this study acted as an instantaneous sink.

5) Whenever the host processor encounters a switch in the processing path, i.e. from the bypass stack to the SPS, it will issue a BYPASS_SYNC procedure. This procedure will flush the bypass chip of any “in-transit PDU” for that particular connection and return any updated information, for example window control parameters, from the bypass chip to the host in order to maintain state consistency between the two paths. This may occur when it receives, for example, a connection release primitive or a session control primitive of the session layer (see section 6) during the data transfer phase.

6) Whenever the host processor wishes to re-enter the bypass path after a switch in the processing path, the host will issue a BYPASS_RESTART procedure which will pass only those data that were updated in the standard protocol stack, like window flow control parameters, back to the bypass chip. Parameters like the DST-REF field which is not changed for the duration of the connection need not be updated.

4.5 Second Design, including major procedures for Transport Class 4 (Implemented)

This section describes extensions to the first design, which only supports Session BCS and TP2 functionality, to include some common TP4 functionality. Procedures for checksum, retransmission on timeout and resequencing were implemented. Extensions to the Session layer functionality and procedures for presentation layer conversion were not implemented, but are also discussed in section 6.

4.5.1 OSI Checksum

The transport protocol class 4 checksum algorithm [12] was implemented. It is often difficult to perform it on the fly at the sender end as the two-byte checksum field is placed in the variable part of the header. However, with the bypass system, the header structure and the position of the checksum field are known in advance. Also, calculation of the checksum can now be simplified further by precomputing the partial checksum of the header fields.

4.5.2 Timers

During the data transfer phase TP4 uses a Retransmission timer (T1), a Window timer (W) and an Inactivity timer (I). Only the retransmission timer with one interval per connection and the window timer were implemented here.

Timer management in software is an expensive process due to software interrupt handling overhead and update processing of the timer queue [34], but can be easily handled by VLSI implementation. On-chip timers are very efficient and can be executed concurrently with other protocol processes. The only overhead is in starting and stopping the timers. Once it is started, a timer is an autonomous process until an interrupt signal is activated. A separate area of on-chip memory is reserved to store the state information of the timer list.

4.5.3 Retransmission and Resequencing

At the receiver end, out-of-sequence PDUs outside the flow-control window will be discarded. Otherwise, a PDU is buffered for resequencing. Duplicate TPDUs can be detected easily because the sequence number matches that of a previously received TPDU. At the sender end, if timer T1 expires, the transport entity can retransmit either the first TPDU, or all TPDUs (Go-back-N) waiting for acknowledgment. In this design the Go-back-N retransmission strategy was used. For a large window, the on-chip buffer may not be sufficient to hold the unacknowledged data packets for retransmission or to buffer data packets for resequencing, and slower external memory would be needed.

Figure 4 Organization of internal bypass chip memory
(32-bit wide memory. Chip registers: DMA Start Address, DMA Length, Bypass chip FULL, Host Tag, Block Address. Control Blocks 1 to N, each holding: Sequence Number, Lower Window, Upper Window, DST-REF field, Class, Format Type, Options, Presentation Context_ID. Packet buffers, each holding: STATUS, Block Address Pointer, Header Pointer, Data Pointer, Reserved, Protocol Header, Reserved Buffer Space for Data.)
Host Tag: This tag is set on receipt of a host command, e.g. BYPASS_START, BYPASS_DMA, BYPASS_SYNC or BYPASS_RESTART.
STATUS: Indicates the status of the buffer, e.g. EMPTY, FILLING, FILLED or CLOSED.

5 Results

Table 2 Throughput performance and gate count of bypass VLSI chip
(Areas are in equivalent NAND2 gates; throughput is in Mbps with 1 Kbyte packet length.)

Procedures                                  Combinational   Non-combinational   Total area              Throughput
                                            area            area                                        (Mbps)
Session (BCS) / Transport Class 2           6318            2929                9247                    2,362.8
Dual Ported SRAM (4 Kbyte)                  N/A             N/A                 Approximately 41,984    N/A
Session (BCS) / Transport Class 4 with      8334            4227                12561                   313.3
the addition of the Checksum Procedure
with timer circuitries

The timing information obtained from the netlist was back-annotated to the structural model to obtain throughput performance results. The operating parameters and assumptions made in this study are:

• A chip clock rate of 66 MHz.
• 1 Kbyte data packets.
• Window size of 64. An acknowledgment packet is sent for every 20 packets received.
• The host bus/memory subsystem and network interface adapter were assumed to be infinite sinks/sources of data packets.
• One thousand packets were processed for each iteration.
• 4 Kbyte of internal dual ported SRAM.

Table 2 shows the throughput performance of the ROPE chip. The throughput value includes the time taken to move the data packet out from the internal memory of the bypass chip to the network interface adapter, but not the data copy operation from the host memory. Hence the throughput result includes just one copy operation of the data packet. The total gate count for the bypass chip with Session (BCS), TP2 and 4 Kbyte internal dual ported SRAM is 51,231 equivalent NAND2 gates. With the additional TP4 procedures like checksum and timer circuitries, the total gate count increased to 54,545 gates. Texas Instruments offers a
`
`It
`
`
Presentation processing can definitely be bypassed if it consists only of the Presentation data encoding/decoding functions. Substantial performance gains could result if the presentation conversions are simple and are used consistently, although the inflexibility of a hardware version is an evident weakness. One possible application of ROPE with hardwired presentation conversion is in video servers with the proposed encoding standards such as MPEG [13].

7 Summary

It can be concluded from this study that it is feasible to implement the bypass stack (at least for the transport and session layers) in VLSI and that the performance would be at least an order of magnitude higher than software protocol processing. The bypass system offloads the critical protocol functions and the associated non-protocol-specific functions onto a “Reduced Operation Protocol Engine” (ROPE). The gate count for the bypass chip can easily fit into a commercially available gate array Integrated Circuit. Per-octet operations are particularly efficient when performed on the chip. The host processor is relieved of a significant proportion of protocol processing and can concentrate on the application processing. The speed of communication processing in the host system can now match the transmission bandwidth of high-speed networks, e.g. ATM technology, thereby increasing the application-to-application throughput performance. (In an ATM system we assume that the segmentation and reassembly or SAR operation would also be in hardware, since it is done frequently.) The host processor is also relieved of acknowledgment processing. An existing implementation of the OSI stack can be adapted for bypassing with only a small modification of the original software, thus providing an easy migration path for current systems.

The scope of functions included in a bypass may be narrowly defined, or more extended. A bypass does not include fast connection setup but also does not interfere with it. There is no segmentation/reassembly within the bypass path, but we do not see this as a major restriction, as research suggests that fragmentation of PDUs should be restricted only to the lower layers and should occur only once in the protocol stack [23]. The Segmentation and Reassembly sublayer of the ATM adaptation layer is a good place for such functions [25].

In the first design, the bypass chip with a 66 MHz clock can support a throughput rate of 2.3 Gbps (Session BCS and TP2) for 1 Kbyte TPDU packets using current 0.8 µm BiCMOS technology. In the second design, extended to include the TP4 checksum, retransmission on timeout and resequencing procedures, the throughput decreased to 313.3 Mbps. Further advances in speed can be obtained in proportion to technology improvements, making the approach viable for some considerable time to come.

Acknowledgments

Bell-Northern Research provided the VHDL tools used in this study, and Dr. Simon Curry, Hemi Thakar, Dr. Parviz Yousefpour, Bernard Doray, Mike Majid and Mustapha Bourahla helped with their use. This research was supported by the Ontario government program of Centers of Excellence, through the Telecom Software Methods Project of TRIO, the Telecommunications Research Institute of Ontario.

References

[1] Balraj T.S. and Yemini Y., “Putting the Transport Layer on VLSI - the PROMPT protocol chip,” IFIP, Stockholm, May 13-15, 1992.
[2] Beach B., “UltraNet: An Architecture for Gigabit Networking,” in Proc. 15th Conference on Local Computer Networks, Minnesota, Oct, 1990.
[3] Chesson G., “XTP/PE Design Considerations,” in Proc. IFIP Workshop Protocols for High-Speed Networks, Zurich, May 9-11, pp. 27-33, 1989.
[4] Clark D. and Tennenhouse D., “Architectural Considerations for a New Generation of Protocols,” in ACM SIGCOMM 1990.
[5] Clark D., Jacobson V., Romkey J., and Salwen H., “An analysis of TCP processing overhead,” in IEEE Commun. Mag., vol. 27, pp. 23-29, June 1989.
[6] Coelho D.R., “The VHDL Handbook,” Kluwer Academic Publishers, 1989.
[7] Cooper E.C., Steenkiste P.A., Sansom R.D. and Zill B.D., “Protocol Implementation on the Nectar Communication Processor,” in ACM SIGCOMM ’90, 1990.