throbber
Chapter 
`AdvancedSystolicDesign
`DominiqueLavenier
`PatriceQuinton
`SanjayRajopadhye
`IRISA,CampusdeBeaulieu
` RennesCedex
`France
`email:flavenier,quinton,rajopadhg@irisa.fr
`AbstractSystolicarraysarelocallyconnectedparallelarchitectures,whosestructure
`iswell-suitedtotheimplementationofmanyalgorithms,inscienti(cid:12)ccomputation,
`signalandimageprocessing,biologicaldataanalysis,etc.Thenatureofsystolic
`algorithmsmakesitpossibletosynthesizearchitecturessupportingthem,using
`correctnesspreservingtransformations,inatheoreticalframeworkthathasadeep
`relationshipwithloopparallelizationtechniques.Thisopensthewaytonewvery
`high-levelarchitecturesynthesistechniques,whichwillbeamajorstepinmastering
`theuseofICtechnologiesinthefuture.Thischapterhastwoparts.First,we
`presentthecurrentstateoftheartinsystolicsynthesistechniques,andthesecond
`partsurveysvariouswaysofimplementingsystolicalgorithmsandarchitectures
`usingprogrammablearchitectures,fpgas,ordedicatedarchitectures.
`. Introduction
`ThetermsystolicarrayswascoinedbyKungandLeisersonin tode-
`scribeapplicationspeci(cid:12)cvlsiarchitecturesthatwereregular,locallyconnected
`andmassivelyparallelwithsimpleprocessingelements(PEs).Theideaofusing
`suchregularcircuitswasevenpresentinvonNeuman'scellularautomatainthe
`(cid:12)fties,Hennie'siterativelogicarraysinthesixties,andalsoinspecializedarithmetic
`circuits(Lyon'sbit-serialmultiplier[ ]isclearlyalinearsystolicarray).However,
`theemergenceofvlsitechnologyinthelateseventiesandearlyeightiesmadethe
`timeripeforintroducingsucharchitecturesinordertohighlightthecharacteristics
`appropriatetothetechnology.
`Systolicarraysimmediatelycaughton,sincetheyinvolvedafascinatingin-
`terplaybetweenalgorithmandarchitecturedesign.Whenresearchersstartedin-
`
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 1
`
`

`

`Chapter
`
`vestigatingautomaticsynthesis,athirdstreamjoinedthiscon(cid:13)uence,namelythe
`analysis,manipulationandtransformationofprograms.Someoftheearlydesigns
`areclassics.TheGuibasetal.arrayforoptimalstringparenthesization[]isone
`ofouralltimefavorites.
`Theearlyeightieswasaperiodofintenseactivityinthisarea.Alargenumber
`of\paperdesigns"wereproposedforawidevarietyofalgorithmsfromlinearalge-
`bra,graphtheory,searching,sorting,etc.Therewasalsomuchworkonautomatic
`synthesismethods,usingdependencyanalysis,space-timetransformationsofinner
`loops,andalsootherformalismssuchasrecurrenceequations.
`Then,thetechnologicalevolutionsinthelateeightiesseemedtoinvalidatethe
`assumptionsofthesystolicmodel,namelythat(i)locality,regularityandsimplicity
`wereprimordialforvlsiand(ii)anelementarycomputationcouldbeperformed
`inthesametimethatittooktoperformanelementarycommunication.The(cid:12)rst
`onemeantthatwiththeimprovedcadtoolsandincreasinglevelsofintegration,
`circuitdesignerswerenotobligedtoalwaysfollowthatdictatesofthesystolicmodel
`(whichwasitselfapplicabletoonlyapart|albeit,acomputationallysigni(cid:12)cant
`one|ofthecompleteapplication).Thesecondonemeantthatthesystolicspace-
`timemappingmethodswerenotdirectlyapplicableforgeneralpurposeparallel
`machineswherethecommunicationlatencywasanorderofmagnitudehigherthan
`computationspeed.
`Anumberofrecentdevelopmentsnowleadustobelievethatitistimeto
`revivethe(cid:12)eld.Firstofall,technologyhasevolved.Withthegrowinglevelsof
`integration,itisfeasibletoactuallyimplementanumberoftheold\paperde-
`signs",particularlyiftheelementaryoperationsarenot(cid:13)oatingpoint,butcome
`fromasimpleralgebraicstructure(suchassemi-ringswithadditionsandcompar-
`isons,etc.)Second,thecircuitsoftodayaremuchmorecomplex,andweneedto
`againquestiontheargumentsagainsttheregularity,localityandsimplicityofsys-
`tolicarrays.Cananirregularcircuitsachievebetterperformance?Willtheydoso
`reliably?Whatwillbethedesigntime?Athirdandveryimportantreasonisthe
`emergenceoffpgas,programmablelogicandrecon(cid:12)gurablecomputing.Thebasic
`technologyunderlyingfpgasisregularity,localityandsimplicity.Itisthereforenot
`surprisingthatsomeoftheimpressivesuccessesinthisdomainaresystolicdesigns.
`Fourth,thereisthegrowingunderstandingthatthetechniquesofsystolicsynthesis
`haveaclosebearingonautomaticloopparallelizationforgeneralpurposeparallel
`machines.Thesetechniquescanthusbeseenasveryhighlevelsynthesismethods,
`applicableforsoftwareaswellashardware,andthusformafoundationforcode-
`sign,foracertainclassofproblems.Coupledwiththefactthatrecentadvancesin
`integrationpermitoneormoreprogrammableprocessorcorestobeimplemented
`\on-chip",thismakessuchtechniqueswellsuitedforprovablycorrectcodesign.
`Finally,webelievethattheincreasingcomplexityofvlsisystemdesignnaturally
`leadstowardsformaldesignmethodsinvolvingcorrectness-preservingtransforma-
`tionofhigh-levelspeci(cid:12)cations.
`Thischapterpresentstwoaspectsofadvancedsystolicdesign,namelyad-
`vancedmethodsforsystolicarraydesign(Secs.{..),butalsodesigntechniques
`andcasestudiesofadvanced,stateoftheartsystolicarrays(Sec. ).
`Inthe(cid:12)rstpartweuseaformalismcalledsystemsofrecurrenceequations
`(sres)todescribeboththeinitialspeci(cid:12)cationaswellasthe(cid:12)nalarray.Theentire
`processofsystolicdesignisviewedastheapplicationofaseriesoftransformations
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 2
`
`

`

`
`EmergingTechnologies
`totheinitialspeci(cid:12)cation,untiloneobtainsadescriptionofthe(cid:12)nalarray.We
`desirethatallthedetails,includingcontrol,I/O,interfaceetc.bespeci(cid:12)edinthe
`sameformalism,andthatasimpletranslationstepshouldyieldadescriptionthat
`issuitableforaconventionalcadtool(sayvhdl).Foreachtransformationweare
`concernedwithtwoaspects:(i)themanipulationofthesretoobtainaprovably
`equivalentsre((cid:18)ala`correctness-preserving'programtransformations),and(ii)the
`choiceofthetransformation,usuallywithaviewtooptimizingcertaincostcriteria.
`InSection.wepresentthebasicnotationsofrecurrenceequationsandtheir
`domains.InSection..wedescribethefoundationsofthe(cid:12)rstaspect,namelythe
`formalmanipulationofrecurrenceequations.Insections.. to..,weexplain
`howthesevarioustransformationscanbeusedtotransformaspeci(cid:12)cationinto
`anabstractarchitecture:section.. detailsschedulingtechniques,section..
`dealswithallocationofcomputationsonprocessors,and(cid:12)nally,section..is
`concernedwithlocalizationandserializationtransformations.
`Inthesecondpart,weillustratefourdi(cid:11)erentwaystoimplementsystolic
`arrays:general-purposeprogrammablearchitectures(Section. . ),application
`orientedprogrammablearchitectures(Section. .),recon(cid:12)gurablearchitectures
`(Section. . ),andspecial-purposearchitectures(Section. .),andweillustrate
`eachonewithonecasestudy.Althoughsomeofthedesignspresentedarenot
`new,aclosestudyoftheirdesignisinstructive.Asthetechnologyofintegrated
`circuitsisstillinfullmotion,webelievethatlessonshavetobelearnedfromthese
`examples,inordertotakethebestadvantageoffuturearchitecturalopportunities.
`.SystolicDesignbyRecurrenceTransformations
`SystemsofRecurrenceEquations(sres)playanimportantpartinthedesign
`process.Theyareusefulas(i)behavioraldescriptionsofthe(cid:12)nalarrays,andalso
`(ii)highlevelspeci(cid:12)cationsoftheinitialalgorithm.Forexample,considerthe
`systemofequationsgivenbelow.
`X(t;p)=<:fp=;(cid:0)n+ (cid:20)t<g:
`fp=;t(cid:21)g
`:xt
`( )
`f<p<n;p(cid:20)tg
`:X(t(cid:0);p(cid:0) )
`W(t;p)=(cid:26)ft=g:wp
`()
`f<tg:W(t(cid:0) ;p)
`Y(t;p)=(cid:26)p=;t(cid:21)
`:W(t;p)(cid:3)X(t;p)
`( )
`n>p>;t(cid:21)p:Y(t(cid:0) ;p(cid:0) )+W(t;p)(cid:3)X(t;p)
`Ifweinterprettheindextasthetimeandtheindexpastheprocessor,
`thissystemcanbeviewedasaspeci(cid:12)cationofthevaluesthatwillappearinthe
``registers'X,WandYofanlineararraywithn-processorsasshowninFig.. .
`FromEqn.( )weseethat(otherthanintheboundaryprocessorp=),the
`valueintheXregisterofprocessorpattimeinstanttisthesameasthatinthe
`Xregisterofprocessorp(cid:0) attimet(cid:0)(thiscorrespondstoaninterconnection
`betweenadjacentprocessorswithadelayofcycles.)Similarly,theWvaluesstay
`intheprocessorsafterinitialization,andtheYvaluespropagatebetweenadjacent
`processorswitha -cycledelay(afterbeingincrementedbytheW*Xproductineach
`processor).NotethattheequationsdonotclearlyspecifyhowtheWregistersare
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 3
`
`

`

`Chapter
`
`Figure. ThesystolicarrayforEqns.( { ).
``loaded'att=,nordotheydescribetheinitialvaluesintheXregistersofthe
`processorsotherthanthe(cid:12)rstone.Thesedetailsandthecontrolsignalstoensure
`correctloadingandpropagationofinitialvaluescanbeincluded,buthavebeen
`omittedintheinterestsofsimplicity.Itiswellknownthatthisarraycomputesthe
`convolutionofthexstreamwiththecoe(cid:14)cientsw.
`Thesamesreformalism,augmentedwithreductionoperations,canserveasa
`veryhighlevel,mathematicaldescriptionofthealgorithmforwhichasystolicarray
`istobedesigned.Forexample,then-pointconvolutionofasequenceofsamples
`x;x ;:::withtheweightsw:::wn(cid:0) producesthesequencey;y ;:::givenby
`thefollowingequation(assumingthatxk=,for(cid:0)n<k<).
`yi=Xjwjxi(cid:0)j
`()
`.. RecurrenceEquations:De(cid:12)nitionsandNotation
`Wepresentthefundamentalde(cid:12)nitionsofrecurrenceequations,andthedo-
`mainsoverwhichtheyarede(cid:12)ned.Inwhatfollows,Zdenotesthesetofintegers,
`andNthesetofnaturalnumbers.
`De(cid:12)nition.. ARecurrenceEquationde(cid:12)ningafunction(variable)Xat
`allpoints,z,inadomain,D,isanequationoftheform
`X(z)=DX:
`g(:::X(f(z)):::)
`()
`wherezisann-dimensionalindexvariable.
`Xisadatavariable,denotingafunctionofnintegerarguments;itissaid
`tobeann-dimensionalvariable.
`f(z)isadependencyfunction(alsocalledanindexoraccessfunction),
`f:Zn!Zn;
`the\..."indicatethatgmayhaveotherarguments,eachwiththesame
`syntax;
`gisastrict,single-valuedfunction;itisoftenwrittenimplicitlyasanexpres-
`sioninvolvingoperandsoftheformX(f(z))combinedwithbasicoperators
`andparentheses.
`
`+*
`
`+*
`
`+*
`
`+*
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 4
`
`

`

`
`EmergingTechnologies
`DXisasetofpointsinZnandiscalledthedomainoftheequation.Often,
`thedomainsareparameterizedwithoneormore(say,l)sizeparameters.
`Inthiscase,werepresenttheparameterasavector,pZl,andusepasan
`additionalsuperscriptonD.
`Avariablemaybede(cid:12)nedbymorethanoneequation.Inthiscase,weuse
`thesyntaxshownbelow:X(z)=>><>>:
`...
`Di
`:
`gi(:::X(f(z)):::)
`()
`...
`Eachlineiscalledacase,andthedomainofXistheunionofthedomainsof
`allthecases,DX=SiDi(actually,theconvexhulloftheunion,foranalysis
`purposes).TheDi'smustbedisjoint.Indeed,thedomainappearinginsuchan
`equationmustbesubscripted(annotated)with(i)thenameofthevariablebeing
`de(cid:12)ned,(ii)thebranchofthecaseandalso(iii)theparametersofthedomain.
`Whenthereisnoambiguity,wemaydroponeormoreofthesesubscripts.
`Finally,theexpressionde(cid:12)ningg,maycontainreductionoperators,i.e.,
`associativeandcommutativebinaryoperatorsappliedtoacollectionofvaluessuch
`asaddition(P),multiplication((cid:5)),minimum(min),maximum(max),boolean
`or(W),booleanand(V),etc.Theseoperatorsaresubscriptedwithoneormore
`auxiliaryindexvariables,z,whosescopeislocaltothereduction.Adomainfor
`theauxiliaryindicesmayalsobegiven.
`De(cid:12)nition..Arecurrenceequation()asde(cid:12)nedabove,iscalledanA(cid:14)ne
`RecurrenceEquation(are)ifeverydependencefunctionisoftheform,f(z)=
`Az+Bp+a,whereA(respectivelyB)isaconstantn(cid:2)n(respectively,n(cid:2)l)matrix
`andaisaconstantn-vector.ItissaidtobeaUniformRecurrenceEquation
`(ure)ifitisoftheform,f(z)=z+a,whereaisaconstantn-dimensionalvector,
`calledthedependencevector.uresareapropersubsetofares,whereAisthe
`identitymatrixandB=.
`De(cid:12)nition.. Asystemofrecurrenceequations(sre)isasetofmsuchequa-
`tions,de(cid:12)ningthedatavariablesX :::Xm.Eachvariable,Xiisofdimensionni,
`andsincetheequationsmaynowbemutuallyrecursive,thedependencefunctionsf
`mustnowhavetheappropriate(notnecessarilysquare)matrices.Weareinterested
`insystemsofares(sares)wherealldependencefunctionsarea(cid:14)ne,andalsoa
`propersubset,systemsofures(sures).Notethatinasure,thedomainsofall
`variablesmusthavethesamenumberofdimensions,sinceAhastobetheidentity
`matrix.
`DomainsAnimportantpartofthesreformalismisthenotionofdomain,i.e.,theset
`ofindiceswhereaparticularcomputationisde(cid:12)ned.Thedomainofthevariables
`ofansreareusuallyspeci(cid:12)edexplicitly.Thedomainsthatweusearepolyhedra
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 5
`
`

`

`Chapter
`
`(i.e.,thesetofintegralindicesthatsatisfya(cid:12)nitenumberoflinear(in)-equality
`constraints),oraunionof(cid:12)nitelymanypolyhedra.Thedomainsmayhaveone
`ormoresizeparameters.Often,otherfamiliesofdomainssuchaslattices,sparse
`polyhedra(theintersectionsoflatticesandpolyhedra)andlinearlyboundedlattices
`(theimageofpolyhedrabya(cid:14)nefunctions)areused,butwillnotbediscussed
`hereforthesakeofsimplicity.Thedomainsareoftenparameterizedbyanumber
`ofsizeparameters.Animportantgoaloftheresearchinthis(cid:12)eldistoelaborate
`parameter-independentanalysistechniquesandarchitectures.
`Asubtlepointtonoteisthatinansreeach(sub)expressionitselfcanbe
`assignedadomain,andthiscanbededucedusingsimplerules.Forexample,let
`usdothisforeachsub-expressionontheright-handside(rhs)ofEqn..First,
`notethattheexpressionwithinthesummationhasa-dimensionaldomain,since
`itinvolvestwoindicesiandj.Thejindexislocal,andnotvisibleoutsidethe
`summation,thedomainoftheexpressionXjwjxi(cid:0)jisthustheprojectionalongj
`ofthedomainoftheexpressionwjxi(cid:0)j.Wenowreasonasfollows:
`Thedomainofwisfij(cid:20)i<ng.
`Thedomainoftheexpressionwjisthesetof(-dimensional)indexpoints
`[i;j]forwhichwjisde(cid:12)ned,i.e.,pointssuchthatwjbelongstothedomain
`ofw.Thisimposestheconstraintthatjmustbebetweenandn(andno
`constraintoni),andgivesusthe`in(cid:12)nitestrip',fi;jj(cid:20)j<ng.
`Thedomainofxisfij(cid:0)n<ig(rememberitispaddedbyn(cid:0) zeros).
`Thedomainofxi(cid:0)jisthesetofpoints[i;j]suchthati(cid:0)jbelongstothe
`domainofx,yieldingfi;jj(cid:0)n<i(cid:0)jg.
`Theexpressionwjxi(cid:0)jistheproductofthetwosub-expressions,wj,and
`xi(cid:0)j.Itisthusde(cid:12)nedonlywherebothoperandsarepresent,andhenceits
`domainistheintersectionoftherespectivedomainsofwjandxi(cid:0)j,namely
`thedomainfi;jj(cid:20)j<n;j(cid:0)n<ig.
`Finally,thedomainoftheentirerhsistheprojectionofthedomainofwjxi(cid:0)j
`ontheiaxis,andisgivenasfij(cid:0)n<ig.Observethatthisincludesthedomain
`ofthevariabley,namelythehalflinefij(cid:20)ig,andhencetheequationis
``wellformed',i.e.thereexistsavalidde(cid:12)nitionforeverypointinthedomain
`ofy.Finally,wepreciselyde(cid:12)newhichsrescorrespondtoasystolicarray.Actu-
`ally,itisnoteasytogiveaformalde(cid:12)nitionwithoutbeingeithertoorestrictiveor
`overlygeneral.Thefollowingde(cid:12)nitioncorrespondstoarestrictiveversion.Some
`oftherestrictionsbelowcanberelaxed(yielding,whatmaynotbeane(cid:14)cientor
`practicallyviablesystolicarray).
`De(cid:12)nition..Asystolicarrayisde(cid:12)nedasasrewhere
`Allvariables,exceptpossiblyinputsandoutputs,havethesamenumberof
`dimensions.
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 6
`
`

`

`
`EmergingTechnologies
`Alldependencyfunctions(exceptpossiblythoseinvolvinginputandoutput
`variables)areuniform.
`Eachinstanceofeachinputvariableisusedexactlyonce(typically,butnot
`exclusivelyatboundaryprocessors).
`Oneindexvariable,canbeinterpretedasthetime,i.e.,alldependencyvectors
`arestrictlynegativeinthisdimension.
`Theotherindicesareinterpretedasprocessorcoordinates.
`Thespatialcomponentsofalldependencyvectorsareinterpretedasintercon-
`nectionsbetweenprocessors.
`..SynthesisbyFormalManipulationofsres:AWalk-through
`Wenowdescribethevarioustransformationsthatcanbeappliedtoansre.
`Wewillfocusprimarilyondescribingthebasicformalmanipulationsandonen-
`suringthattheresultingsreiscomputationallyequivalenttotheoriginalone.
`Anothersubject,thatwewillnotconsiderhere,isthechoiceoftheparticular
`transformationtoapply.Asarunningexample,wewillusetheconvolutionsreof
`Eqn.,andwewillillustratehowthesreofEqns. { aresystematicallyderived
`fromit.
`SerializationofReductions
`Inthistransformationwereplacethereductionoperationsbyasequenceof
`binaryoperations.Thisisneededbecausethehardwareelementsthatoneusesto
`implementthealgorithmhavebounded`fan-in'.Thechoiceofthetransformation
`involvesthe`directionofaccumulation':eithertheincreasingordecreasingorder
`ofj,fortheconvolutionexample(recallthatjisthelocalindexintroducedbythe
`reduction),andalsothenameofthetemporaryvariableinwhichtoaccumulate
`thepartialresults.Supposeweaccumulateintheincreasingjdirection,anduse
`thenameYforournewvariable,wewillobtainthefollowingsre:
`()
`yi=Y(i;n(cid:0) )
`Y(i;j)=(cid:26)fj=;(cid:20)ig
`:wj(cid:3)xi(cid:0)j
`()
`f<j<n;(cid:20)ig:Y(i;j(cid:0) )+wj(cid:3)xi(cid:0)j
`Tounderstandwhythissreisequivalenttothereduction,andhowitcan
`bederivedautomatically,considertheexpressionwithinthereductionofEqn.,
`namelywjxi(cid:0)j.Ateachpoint[i;j]initsdomain,weassociate,inadditiontothe
`productofthewandxterms,apartialsum,whichistheaccumulationofthe
`terms`uptoj'.Thedomainoftheauxiliaryvariableisthusthesameasthat
`ofwjxi(cid:0)j.Moreover,weseethattheaccumulationcorrespondstoarecurrence
`de(cid:12)ningY(i;j)withthe(uniform)dependencyY(i;j(cid:0) ),andthisfollowsfrom
`ourchoiceoftheaccumulationdirection.TheinitializationoftheYvariable(the
`clauseinitsrecurrencewherethereisnodependencyonY)correspondstothe
`(sub)domainwherej=,andtheresultisavailableinthej=nsubdomain
`(thisisre(cid:13)ectedintheY(i;n(cid:0) )dependencyinEqn.).These`boundaries'of
`thedomainofYcanbededucedbytranslatingitby(cid:6) inthejdirectionand
`determiningthedi(cid:11)erence.
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 7
`
`

`

`Chapter
`
`Alignment
`Inthesre({),variablesxandw,de(cid:12)nedover -dimensionaldomains,are
`usedinEqn.de(cid:12)ningY,a-dimensionalvariable.Wethereforealignxandw
`withthe-dimensionaldomainofY.Inparticular,letuschoosetoalignwwith
`thei=line,andxwiththej=line.Sincexandwareinputsofthesre,we
`willintroducetwoauxiliaryvariables,sayWandX,yieldingthefollowingsre.
`( )
`yi=Y(i;n(cid:0) )
`Y(i;j)=(cid:26)fj=;(cid:20)ig
`:W(;j)(cid:3)X(i(cid:0)j;)
`f<j<n;(cid:20)ig:Y(i;j(cid:0) )+W(;j)(cid:3)X(i(cid:0)j;)( )
`W(i;j)=(cid:8)fi=;(cid:20)j<ng:wj
`( )
`X(i;j)=(cid:8)fj=;(cid:20)ig:xi
`( )
`Wewillseelaterthatalignmentisaparticularcaseofageneraltransformation
`calledchangeofbasis.
`Localization
`Thissredoesnothaveuniformdependenciesandlocalizationenablesusto
`deriveanequivalentsure.Toillustratethistransformation,notethatalthough
`thedependencyofYonXisnotuniform,itismanytoone.Indeed,allpointsin
`thedomainofY,lyingonthestraightlinei(cid:0)j=cdependonX(c;).Moreover,
`thepoint[c;]alsoliesonthisline.Asaresult,wecanintroduceanauxiliary
`variable,Xinoursrewhosedomainisfi;jj(cid:20)j<n;(cid:20)igand`pipelinethe
`Xvaluetoallthepointswhereitisneeded'.UsingasimilarargumentforW,we
`obtainthefollowingsre.
`yi=Y(i;n(cid:0) )
`( )
`Y(i;j)=(cid:26)fj=;(cid:20)ig
`:W(i;j)(cid:3)X(i;j)
`( )
`f<j<n;(cid:20)ig:Y(i;j(cid:0) )+W(i;j)(cid:3)X(i;j)
`W(i;j)=(cid:8)fi=;(cid:20)j<ng:wj
`( )
`X(i;j)=(cid:8)fj=;(cid:20)ig:xi
`( )
`W(i;j)=(cid:26)fi=;(cid:20)j<ng:W(i;j)
`( )
`fi>;(cid:20)j<ng:W(i(cid:0) ;j)
`X(i;j)=(cid:26)fj=;(cid:20)ig
`:X(i;j)
`( )
`f<j<n;(cid:20)ig:X(i(cid:0) ;j(cid:0) )
`Thelegalityofthistransformationcanbeshownbyaninductiveargument,
`providedthatthenewpipeliningandinitializationdependency`spans'thesetof
`pointsthatdependonagivenvalue{anypointthatneedsagivenvaluecan
`bereachedbycomposingonecopyoftheinitializationdependencyandapositive
`numberofcopiesofthepipeliningdependency.
`srescanbeviewedasfunctionalprograms.Inparticular,theyarereferentially
`transparentandonecansubstituteequalsforequals.Hence,manipulationsthatone
`performsonmathematicalequations(epitomizedbyphrasessuchas\substituting
`forxandsimplifying"thatone(cid:12)ndsinmathematicaldiscourse)canequallywell
`beperformedonsres.Forexample,wecansubstitutetheusesofW(i;j)and
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 8
`
`

`

`
`EmergingTechnologies
`X(i;j)(ontherhsofEqns. and )bytheirde(cid:12)nitions,andthenEqns.( {
` )canbedroppedsincetheyarenotusedinanyotherequation.This(followed
`bydroppingtheprimesontheremainingvariables)yieldsthesregivenbelow.
`Althoughthisseemsutterlyobvioussincewearemanipulatingsresbyhand,such
`transformationscanbeperformedautomaticallyonanysre.
`yi=Y(i;n(cid:0) )
`( )
`Y(i;j)=(cid:26)fj=;(cid:20)ig
`:W(i;j)(cid:3)X(i;j)
`()
`f<j<n;(cid:20)ig:Y(i;j(cid:0) )+W(i;j)(cid:3)X(i;j)
`W(i;j)=(cid:26)fi=;(cid:20)j<ng:wj
`( )
`fi>;(cid:20)j<ng:W(i(cid:0) ;j)
`X(i;j)=(cid:26)fj=;(cid:20)ig
`:xi
`()
`f<j<n;(cid:20)ig:X(i(cid:0) ;j(cid:0) )
`Changeofbasis
`Finally,wemayapplyareindexingtransformation(alsocalledachangeof
`basisoraspace-timemapping)tothevariablesofansre.Thetransformation,T
`mustadmitaleftinverseforallpointsinthedomainofthevariable.Whenapplied
`tothevariableXofansrede(cid:12)nedasfollows:
`X(z)=>><>>:
`...
`DXi
`gi(:::Y(f(z)):::)
`:
`...
`weobtainanequivalentsreasfollows:
`ReplaceeachDXibyT(DXi),itsimagebyT.
`IntheoccurrencesofavariableontherhsoftheequationforX,replacethe
`dependencyfbyf(cid:14)T(cid:0) ,thecomposition offandT(cid:0) .
`InalloccurrencesofthevariableXontherhsofanyequation,replacethe
`dependencyfbyT(cid:14)f.
`TheoccurrencesofXontherhsoftheequationforXitselfconstituteaspecial
`casewherethelasttworulesarebothapplicable,andwereplacethedependency
`fbyT(cid:14)f(cid:14)T(cid:0) .Similartransformationrulescanbedevelopedforsreswith
`reductionoperations,butarebeyondthescopeofthisoverview.
`Forourexample,letuschooseatransformationT(i;j)=i+j;j,whichmaps
`thepoint[i;j]to[i+j;j].Here,T(cid:0) (j;k)=(i(cid:0)j;j)istherequiredinverse.Let
`usapplyittoallthevariables,X,WandY.Thisgivesusthefollowingsre:
`yi=Y(i+n(cid:0) ;n(cid:0) )
`( )
`Y(i;j)=(cid:26)fj=;j(cid:20)ig
`:W(i;j)(cid:3)X(i;j)
`f<j<n;j(cid:20)ig:Y(i(cid:0) ;j(cid:0) )+W(i;j)(cid:3)X(i;j)()
`W(i;j)=(cid:26)fi=j;(cid:20)j<ng:wj
`()
`fi>j;(cid:20)j<ng:W(i(cid:0) ;j)
` Recallthatfunctioncompositionisrightassociative,i.e.,(g(cid:14)h)(z)=g(h(z)).
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 9
`
`

`

`Chapter
`
`X(i;j)=(cid:26)fj=;j(cid:20)ig
`:xi
`()
`f<j<n;j(cid:20)ig:X(i(cid:0);j(cid:0) )
`Wenowseethatifwesimplyrenamethenewindicesastandp,weobtain
`thesreofEqns.( { )whichdescribesthearrayofFig.. .Asmentionedabove,
`thetransformation,Tmustadmitanintegralleftinverse(forallpointsinthe
`domain),andhenceitisrestrictedtowhatarecalledunimodulartransformations.
`Often,onemaydesiretoapplyothertransformations.Aparticularexampleis
`non-unimodularnon-singularintegertransformations.Thisleadstosreswhose
`domainsare\sparse"(eg.allevenpointsfrom ton),butadetaileddiscussionis
`beyondthescopeofthischapter.
`Weemphasizethatwhatwehavepresentedfortheconvolutionexampleisa
`particularseriesoftransformations.Ingeneral,thetransformationscanbeapplied
`indi(cid:11)erentorders,andindeed,in(cid:12)nitelymanyequivalentsrescanbederivedfrom
`theoriginalspeci(cid:12)cation.
`.. Scheduling
`Wehavesofardiscussedtheformalmanipulationsfortransformingsres
`andobtainingprovablyequivalentsres.Wenowaddressthesecondquestion,
`whattransformationtoapply.Ingeneral,theanswertothisquestionconsistsof
`twointerrelatedparts:thechoiceofascheduling,andthechoiceofaprocessor
`allocation.
`Schedulingtechniquesarebasedonrepresentingeachsystemofrecurrence
`equationswithdependencegraphs,de(cid:12)nedasfollows.Thedependencegraph(or
`reduceddependencegraph)G=(VG;EG)ofasreisthegraphwhoseverticesVG
`arethevariablesofthesystem,andthereexistsanarceEGbetweentwovertices,
`YandXifandonlyifthereexistsanequationwhereXistheleft-handside,and
`Yisanargumentoftheright-handside.Thearcsofthisgrapharelabeledwith
`informationdescribingthedependencies.Forexample,inthesystemofEqns.( {
`),thenodesofthedependencegraph(shownin(cid:12)gure.)arey,x,w,Y,X,
`andW.ThereexistsanarcfromYtoX,sincethede(cid:12)nitionofYdependsonthat
`ofX.
`Schedulingsures
`sures(SystemsofUniformRecurrenceEquations,seeDef... )areofpar-
`ticularinterestinordertostudyscheduling.
`Inasure,thedependenciescan
`berepresentedbyn-dimensionalvectorsgivingthetranslationbetweentheindex
`pointsofdependentinstancesoftwovariables.Thus,dependenciescanbesumma-
`rizedinadependencematrixDwhosecolumnsarethedependencevectorsofthe
`sure.IntheexampleofEqns( {),ifweconsideronlyvariableswhichareneither
`inputs(x,andw)noroutput(i.e.,y),wecanseethatthesystemisuniform:Y[i;j]
`dependsonX[i;j],W[i;j]andY[i;j(cid:0) ],X[i;j]dependsonX[i(cid:0) ;j(cid:0) ],and
`(cid:12)nally,W[i;j]dependsonW[i(cid:0) ;j].Wenoticealsothatthedependencybetween
`Y[i;j]andX[i;j]issomehow\arti(cid:12)cial",inthesensethatY[i;j]dependsinfact
`Amatrixissaidtobeunimodularifitsdeterminantis(cid:6) .Henceanintegralunimodular
`matrixadmitsanintegralinverse.
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 10
`
`

`

`EmergingTechnologies
`
`y
`
`Y
`
`
`
`W
`
`w
`
`X
`
`x
`
`Figure.DependencegraphassociatedwiththesystemofEqns.( {)
`onX[i(cid:0) ;j(cid:0) ],throughthede(cid:12)nitionofX.AsimilarremarkshowsthatY[i;k]
`depends,infact,onW[i(cid:0) ;k].
`Thus,thefollowingdependencematrixsummarizesalltheinformationwhich
`isneededtoschedulethissystemofequations:
`
`D=(cid:18) (cid:19):
`Thedependencegraphcanbeanalyzedviatheso-calleddependencerelation.
`ConsiderthesetIofpairs(X;x),whereXisavariable,andxDX.Suchapair
`iscalledaninstanceinthefollowing.Giventwoinstances(X;x)and(Y;y)wesay
`that(X;x)dependsdirectlyon(Y;y),denotedby(X;x) (Y;y),ifandonlyif
`thereexistsaninstanceofequationX(x)=f[:::;Y(y);:::].Therelations\depend
`intstepson"(X;x)t (Y;y),and\dependon"(X;x) (Y;y)arede(cid:12)nedinthe
`usualway,bytransitivity.Theextendeddependencegraphofthesare,denoted
`(cid:0),isthegraphofthedependencerelation(seeforexampleFig.. ).Itdescribes
`exactlythesetofvariableinstancestobecalculatedinordertocomputethesare.
`Ascheduleisamappingt:I!Zsuchthat
` .(X;x) (Y;y))t(X;x)>t(Y;y)(causalitycondition).
`.t(X;x)(cid:21),(X;x)I(positivitycondition).
`Inthisde(cid:12)nition,thecausalityconditionguaranteesthatdependentcomputa-
`tionsareperformedinaconsistentorder,whereasthepositivityconditionensures
`thatwecansetastartingtimeforthecomputations.
`Observethataschedulingfunctionmaynotexist.Thisisthecase,forex-
`ample,whentheextendeddependencegraphcontainsacycle.Thismotivatesthe
`computabilityanalysisofrecurrenceequations,whichisdirectlyrelatedtoschedul-
`ing.AvariableXissaidtobecomputable{or,explicitlyde(cid:12)ned{ifandonlyif
`forallxD,thereexistsaschedulingfunctiontsuchthatt(X;x)<+ .
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 11
`
`

`

` 
`
`Chapter
`
`y
`
`j
`y
`y
`y
`y
`y
`y
`y
`w ww
`w
`i
`x
`x
`x
`x
`x
`x
`x
`x
`Figure. Extendeddependencegraphforthesureoftheconvolutionalgo-
`rithm,Eqns.( {),forn= .
`Thefollowingtheorem(duetoKarp,MillerandWinograd[ ]),setsaframe-
`workforthestudyofscheduling,inthecaseofasinglerecurrenceequation:
`Theorem Givenauniformrecurrenceequation
`X(z)=f[X(z(cid:0)d );:::;X(z(cid:0)ds)];
`()
`de(cid:12)nedonthedomainD=Nn,thefollowingconditionsareequivalent:
` .X(z)iscomputable.
`.Thereexistsnosemi-positivevector (u ;:::;us)suchthat
`sXj= (cid:0)ujdj(cid:21)
`:
` .If(cid:28)=((cid:28) ;:::;(cid:28)n)isavectorofZn,thesystemofinequalitiesdj:(cid:28)(cid:21) ,
` (cid:20)j(cid:20)s,and(cid:28)i(cid:21), (cid:20)i(cid:20)nhasasolution.
`.ForallzNn,thefollowinglinearprogramshaveanoptimalcommonvalue
`m(z):
`II<:(cid:28)i(cid:21); (cid:20)i(cid:20)n;
`I<:z(cid:0)Psj= ujdj(cid:21);
`dj:(cid:28)(cid:21) ; (cid:20)j(cid:20)s;
`uj(cid:21); (cid:20)j(cid:20)s;
`maxPsi= uj
`minz:(cid:28)
` Avectorissaidtobesemi-positiveifitisnonnullandallitscomponentsarenonnegative.
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 12
`
`

`

`
`EmergingTechnologies
`Condition hasasimplegeometricinterpretation:theextremitiesofvectors
`di, (cid:20)i(cid:20)smustallbestrictlyonthesamesideofthehyperplane(cid:28):z=,as
`showninFig...
`j
`d
`(cid:28)
`i
`d
`t
`Figure.Illustrationoftheorem .Pointsareindexedbyiandj,anddepen-
`dencevectorared andd.Thevector(cid:28)isnormaltotheiso-temporallines.
`Moreover,the(cid:28)vectorforwhichthesolutionofthelinearprogramIIis
`obtainedde(cid:12)nesalinearschedule.Inotherwords,wecancomputeX(z)attime
`(cid:28):z.Fig..depictsatwodimensionalcase,wherethedashedlines(oftencalled
`iso-temporallines)representcalculationsdoneatsuccessiveinstantstandt+(cid:28):di.
`Inpractice,oneisofteninterestedinlinear(ora(cid:14)ne)schedules,asthen,
`thevelocityofthedataisconstant,andtheimplementationofthecorresponding
`parallelprogramiseasier.
`Letusillustratetheapplicationofthistechniquetotheconvolutionexample
`ofEqns.( {).Asfarasschedulingisconcerned,wemayconsiderthedepen-
`dencyinformationofthissystemasbeingequivalenttothatofthefollowingsingle
`equation. @Y(i;k)W(i;k)
`X(i;k) A=g(Y(i;j(cid:0) );W(i(cid:0) ;j);X(i(cid:0) ;j(cid:0) ))
`()
`Applyingcondition ofTheorem leadstothefollowinginequalities:
`(cid:28) (cid:21)
`(cid:28)(cid:21)
`(cid:28) +(cid:28)(cid:21)
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 13
`
`

`

`Chapter
` 
`Avalidscheduleisthereforet(i;j)=i+j.
`Linearschedules
`Inpractice,systemsofrecurrenceequationsareseldomsures,andmoreover,
`thereisnoreasonwhyequationsarealignedinsuchawaythatacommonschedule
`exist,e.g.,onemayhavetoaligntwovariablesinoppositedirectionsinorderto
`beabletocomputethem.Forthesereasons,onehastoassociatedi(cid:11)erentlinear
`schedulestodi(cid:11)erentvariables.
`Considerourstandardsre(Def... )parameterizedbyanintegerp
`X(z;p)=>><>>:
`...
`DXi
`:
`gi(:::Y(f(z;p)):::)
`...
`wheref(z)=Az+Bp+C.Toeachedgeeofthereduceddependencegraph
`GdescribingthedependencebetweenvariablesYandXofthisequation,letus
`associatethedomainDXi,andtheindexfunctionf.Theedgeecanthereforebe
`isasub-domainofDXwhere
`representedbya-tuple(Y;X;DXi;f),whereDXi
`thedependencee(cid:11)ectivelyoccurs.
`De(cid:12)nition..(A(cid:14)nebyvariableparameterizedschedule)Ana(cid:14)neby
`variableparameterizedschedulehastheform
`tX(z)=(cid:28)Xz+(cid:27)Xp+(cid:11)X
`( )
`where(cid:28)Xand(cid:27)Xare(cid:12)xedvectorswithrationalcoordinates,and(cid:11)Xisarational
`number.Forthesakeofsimplicity,wewillconsideronlyrationaltimingfunctions.All
`thefollowingresultscanbeextendedtothecasewheretX(z)=b(cid:28)Xz+(cid:27)Xp+(cid:11)Xc
`withoutdi(cid:14)culty.
`Def. assumesthatthescheduledependslinearlyontheparameter.The
`problemistocharacterizethevaluesof(cid:28)X,(cid:27)Xand(cid:11)Xsuchthatthecausalityand
`positivityconditionsaresatis(cid:12)ed,andthentoselectoneparticularsolutionsoas
`tooptimizesomequalitycriterionlikemaximalthroughputorminimumlatency.
`Inordertosatisfythecausalityconditionforeachedge(Y;X;Di;f)ofEG
`onemustsatisfythecondition:
`zDXi
`:tX(z)(cid:21)tY(I(z))+ :
`( )
`Theabovede(cid:12)nitionoftiming-functionmapsdirectlytothestructureoftheequa-
`tions:itisimplicitlyassumedthattheevaluationofthesystemistobeperformed
`onanarchitecturewherethecombinationaloperatorsareseparatedbyatleastone
`delay.However,thishypothesisexcludespracticalcaseswhenonemaywishto
`cascadeseveralcombinationalelements.Changingthewaydelaywillbemapped
`isnotaproblemprovidedtheinformationisknownstatically.Therealproblemis
`toensurethatcondition( )issatis(cid:12)edonandonlyondomainDXi.
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2040, p. 14
`
`

`

` 
`EmergingTechnologies
`Whenonesubstitutesappropriatevaluesforzin( ),oneobtainssystems
`oflinearinequalitiesinthecoe(cid:14)cientsoftXandtY.Thissituationissimilarwith
`thepositivityconstraints,tX(z)(cid:21).Theproblemisthatthereisaverylarge
`number(perhapsanin(cid:12)nity)ofvaluesofzandI(z)whichresultinalargenumber
`ofinequalities.Weneedtoexpressalltheseinequalitiesusinga(cid:12)nitedescription.
`Therearetwobasicmethodsforthat.Oneusesafundamentalresultofthe
`theoryoflinearinequalities,thea(cid:14)neformofFarkas'lemma[],andiscalledthe
`Farkasmethod.Theotherone,calledthevertexmethod,usesthedualrepresenta-
`tionofpolyhedra,andissummarizedasfollows.
`Therearetwowaysofdescribingapolyhedron.Oneisasthesetofpoints
`whichsatisfya(cid:12)nitesetoflinearinequalities,andtheotheroneisastheconvex
`hullofa(cid:12)nitesetofpoints.Foreachpolyhedron,thereexistsa(cid:12)nitesetofpoints,
`calleditsvertices,whichtogethergeneratethepolyhedron,noneofthembeing
`aconvexcombinationoftheother.Suchasetofpointsisaminimalgenerating
`system.Ifthepolyhedronisunbounded,someoftheverticeswillbeatin(cid:12)nity,
`theyarecalledrays(arayisanin(cid:12)nitedirectioninthepolyhedron).Thereare
`standardalgorithmsforgoingfromonerepresentationtotheotherone[].
`Thekeyideabehindthevertexmethodisthefactthatalinearinequality
`issatis(cid:12)edatallpointsofaconvexpolyhedronifandonlyifitissatis(cid:12)edatall
`pointsofthegeneratingsystem(whichis(cid:12)nite).Hencethemethod:(cid:12)ndgenerating
`systemsforthepolyhedraDXi,write( )ateachofthesepoints,andsolveforthe
`unknowncoe(cid:14)cients.Themethodissummarizedasfollows:
`ComputeageneratingsystemforthepolyhedronDofanyedge(Y;X;D;f)
`ofthedependencegraph,andforallpolyhedraDX,XV.
`Writeinstancesof( )atallverticesofpolyhedraDofeachedge.
`WriteinstancesoftX(z)(cid:21)atallverticesandraysofDX,
`solvetheresulting(cid:12)nitesystemoflinearinequalitiesbyoptimizinganap-
`propriatecriterion.Forexample,minimizingthetotalcomputationtimefor
`asystemofequationscanbeachievedbywritingthetotaltimeasalinear
`functionoftheparameters,andexpressingthatthisfunct

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket