LAT&T
`AT&TLabsResearch
`subject:DynamicReplicationontheInternet
`date:March,
`WorkProjectNo. - -
`from:MichaelRabinovich
`Dept.HA
`FPA
` - -
`misha@research.att.com
`HA - - TM
`IrinaRabinovich
`TheDunandBradstreet
`Corp.Rabinovich@mail.dbs.com
`RajmohanRajaraman
`DIMACSCenter,Rutgers
`University
`rraj@dimacs.rutgers.edu
`TECHNICALMEMORANDUM
`ThispaperproposesaprotocolsuitefordynamicreplicationandmigrationofInternet
`objects.Itconsistsofanalgorithmfordecidingonthenumberandlocationofobject
`replicasandanalgorithmfordistributingrequestsamongcurrentlyavailableobject
`replicas.Ourapproachattemptstoplacereplicasinthevicinityofamajorityofrequests
`whileensuringatthesametimethatnoserversbeoverloaded.Therequestdistribution
`algorithmusesthesamesimplemechanismtotakeintoaccountbothserverproximity
`andload,withoutactuallyknowingthelatter.Thereplicaplacementalgorithmexecutes
`autonomouslyoneachnode,withouttheknowledgeofotherobjectreplicasinthe
`system.
`Theproposedalgorithmsrelyontheinformationavailableindatabasesmaintainedby
`Internetrouters.Asimulationstudyusingsyntheticworkloadsandthenetworkback-
`boneofUUNET,oneofthelargestInternetserviceproviders,showsthattheproposed
`protocolise(cid:11)ectiveineliminatinghotspotsandachievesasigni(cid:12)cantreductionofback-
`bonetra(cid:14)candserverresponsetimeattheexpenseofcreatingonlyasmallnumberof
`extrareplicas.
`
`1
`
`Petitioner Limelight - LN1005
`
`
`AT&TLabsResearch
`subject:DynamicReplicationontheInternet
`date:March,
`WorkProjectNo. - -
`from:MichaelRabinovich
`Dept.HA
`FPA
` - -
`misha@research.att.com
`HA - - TM
`IrinaRabinovich
`TheDunandBradstreet
`Corp.Rabinovich@mail.dbs.com
`RajmohanRajaraman
`DIMACSCenter,Rutgers
`University
`rraj@dimacs.rutgers.edu
`TECHNICALMEMORANDUM
`ThispaperproposesaprotocolsuitefordynamicreplicationandmigrationofInternet
`objects.Itconsistsofanalgorithmfordecidingonthenumberandlocationofobject
`replicasandanalgorithmfordistributingrequestsamongcurrentlyavailableobject
`replicas.Ourapproachattemptstoplacereplicasinthevicinityofamajorityofrequests
`whileensuringatthesametimethatnoserversbeoverloaded.Therequestdistribution
`algorithmusesthesamesimplemechanismtotakeintoaccountbothserverproximity
`andload,withoutactuallyknowingthelatter.Thereplicaplacementalgorithmexecutes
`autonomouslyoneachnode,withouttheknowledgeofotherobjectreplicasinthe
`system.
`Theproposedalgorithmsrelyontheinformationavailableindatabasesmaintainedby
`Internetrouters.Asimulationstudyusingsyntheticworkloadsandthenetworkback-
`boneofUUNET,oneofthelargestInternetserviceproviders,showsthattheproposed
`protocolise(cid:11)ectiveineliminatinghotspotsandachievesasigni(cid:12)cantreductionofback-
`bonetra(cid:14)candserverresponsetimeattheexpenseofcreatingonlyasmallnumberof
`extrareplicas.
`
`1
`
`Petitioner Limelight - LN1005
`
`
`
`
`DynamicReplicationontheInternet
` Introduction
`Replication,ormirroringinInternetparlance,iscommonlyusedtoaddressascalabilityproblemof
`popularInternetsites.Currently,mirroringisdonemanuallybysystemadministrators,whomonitor
`thedemandforinformationontheirsitesanddecidewhatdatashouldbereplicatedandwhere.
`Thisdauntingtaskisespeciallydi(cid:14)cultandimportantforagrowingnumberofcompaniesthat
`o(cid:11)erahostingservice,i.e.,theserviceofmaintainingandprovidingaccesstoobjectsbelongingto
`third-partyinformationproviders.Asthescaleofthesesystemsincreases(intermsofthenumber
`ofobjectsandhostingservers),thedecisionspaceforreplicaplacementincreaseswhileabrute-
`forceworstcasedesignbecomesprohibitivelyexpensive.Withoutappropriatenewtechnology,
`systemadministrationrelatedtoobjectplacementmaybecomeafactorlimitingthescaleofhosting
`platforms.
`Inthispaper,weproposeasuiteofalgorithmsthatdynamicallyreplicateInternetobjectsin
`responsetochangingdemand.Inadditiontoautomatingtheplacementdecisionsbysystemadmin-
`istrators,automaticdynamicreplicationwouldmakethesystemmoreresponsivetochangesinthe
`demandpatterns.
`Twofundamentalissuesanydynamicreplicationsystemmustaddressaredecidingonthenumber
`andplacementofreplicasanddistributingrequestsamongobjectreplicas.Inourprotocol,replica
`placementdecisionsaremadebyeachnodeautonomously,basedontherequestsservicedbythe
`nodeandwithoutanyknowledgeofotherreplicasofthesameobjectsthatmightexist.Infact,
`eachnodehasverylimitedknowledgeaboutotherhoststhatexistinthesystem.Thisisessential
`fortheintendedscaleofthesystem.Forinstance,thismakesaddingordeletinghostsinthesystem
`alocalizedadministrativeaction.
`Ourrequestdistributionalgorithmtakesbothserverloadandproximityintoaccountwithout
`knowingtheactualloadoftheservers.Toillustratetheproblem,asimpleround-robinrequest
`distributionwoulddistributetheloadamongallreplicasbutwouldbeoblivioustotheproximityof
`requesterstoservers.Ontheotherhand,alwaysdirectingrequeststotheclosestreplicawouldcreate
`problemswhenaserverisswampedwithrequestsoriginatingfromitsclosevicinity:nomatterhow
`manyadditionalreplicastheservercreates,allrequestswillbesenttoitanyway.
`Someexistingprotocolsaddressthisproblembycollectingloadreportsfromhostingserversand
`weighingserverloadsintotheproximity-basedrequestdistribution.Thisapproachinnotwellsuited
`fordynamicreplicationonaglobalscale(whichisevidencedbythefactthatproductsfollowingthis
`approach,e.g.,CISCO'sLocalDirector,assume(cid:12)xedreplicasets).First,inthesystemofthisscale,
`therequestre-directionsubsystemwillbehighlydistributed.Therefore,eachhostingserverwould
`havetosenditsloadreporttoalargenumberofredirectingservers,increasingthenetworktra(cid:14)c
`andthedelaysinreachingalltheredirectors.Second,andmoreimportantly,requestdistribution
`foragivenobjectbecomesdependentonthepopularityofmanyotherobjectsco-locatedatthe
`samehostingservers,greatlycomplicatingautonomousreplicaplacementdecisions.
`Inourprotocol,asimpledeterministicrequestdistributionalgorithmallowsahostingserverto
`deriveboundsonthenumberofrequestsgoingtopotentialnewreplicasbasedonlocalknowledge
`only.Usingtheseboundshastwomainadvantages.First,itallowsahosttodecideonobject
`migrationandreplicationautonomously.Second,itenablesobjectrelocationtobedoneenmass,
`withoutwaitingfornewaccessstatisticsaftereachmove.Withoutthis,asystemofourintended
`scalewouldbehopelesslyslowinadjustingtodemandchanges.
` . PastWork
`Thereisagrowingnumberofcommercialproductsthato(cid:11)ertransparentloadbalancingamong
`multipleInternetsites[ ,, ].Theydi(cid:11)erinthenetworklevelwhereindirectionofrequests
`
`2
`
`
`
`
`DynamicReplicationontheInternet
`tophysicalreplicasoccur:CISCO'sDistributedDirectorperformsre-directionattheDNSlevel(a
`similarideaisusedin[ ]),whileIBM'sNetDispatcherandCISCO'sLocalDirectorre-directat
`thefront-endrouterlevel,andWinddance'sWebChallengerdoesitattheapplicationlevelusing
`redirectionfeaturesoftheHTTPprotocol.Noneoftheseproductso(cid:11)erdynamicreplicationor
`migrationofreplicas.
`Anup-comingproductWebRepeaterfromtheSandpiperstart-upcompany[ ]redirectsevery
`requesttotheclosesthost,andreplicatestheobjecttothishost.Thehopeisthattherewillbe
`otherrequestsfromthevicinityofthathost,whichwouldamortizereplicationcosts.Suchaggressive
`replicationmayresultinsigni(cid:12)cantperformancedegradationwhentheaboveassumptiondoesnot
`hold.Anevenmoreaggressivereplicationapproachisdescribedin[ ]
`inthecontextof\push
`caching",whereasingleobjectaccessresultsincreatingcopiesoftheobjectatmultipleproxy
`caches.Muchofexistingworkondynamicreplicationhasconcentratedonquorum-basedreplicaman-
`agement.Thisworkfocusesonalgorithmsforadjustingquorumswhenreplicasetschange(e.g.,
`[ , , ])andonregeneratingobjectreplicasinplaceoffailedones(e.g.,[]).Incontrast,our
`workemploysreplicationandmigrationforperformance,andassumesthatasynchronousupdate
`propagationandnotquorumsisusedforconsistency.
`Existingprotocolsforperformance-motivateddynamicreplicationrelyonassumptionsthatare
`unrealisticintheInternetcontext.Wolfsonetal[]proposeanADRprotocolthatdynamically
`replicatesobjectstominimizecommunicationcostsduetoreadsandwrites.Giventhatmostof
`Internetobjectsarerarelywritten ,thisisnotasuitablecostmetricfortheInternet.Inaddition,
`theprotocolimposeslogicaltreestructuresonhostingserversandrequiresthatrequeststravel
`alongtheedgesofthesetrees.Becauseofamis-matchbetweenthelogicalandphysicaltopology,
`andespeciallybecauseeachnodeonthewaymustinterprettherequesttocollectstatistics(which
`requiresinpracticeaseparateTCPconnectionbetweeneachpairofnodes),thiswouldresultin
`impracticallyhighdelayofrequestpropagation.
`HeddayaandMirdad'sWebWavedynamicreplicationprotocol[ ]wasproposedspeci(cid:12)callyfor
`theWeb.However,itburdenstheInternetrouterswiththetaskofmaintainingreplicalocationsfor
`WebobjectsandinterceptingandinterpretingrequestsforWebobjects.Italsoassumesthateach
`requestarrivesinasinglepacket.Astheauthorsnote,thisprotocolcannotbedeployedintoday's
`networks.
`Algorithmically,bothADRandWebWavedecideonreplicaplacementbasedontheassumption
`thatrequestsarealwaysservicedbytheclosestreplica.Therefore,neitherprotocolallowsload
`sharingwhenaserverisoverloadedwithrequestsfromitsvicinity.Objectsarereplicatedonly
`betweenneighborservers,whichwouldresultinhighdelaysandoverheadsforcreatingdistant
`replicas,acommoncaseformirroringontheInternet.Also,ADRrequiresreplicasetstobe
`contiguous,makingitexpensivetomaintainreplicasindistantcornersoftheglobalnetworkeven
`ifinternalreplicasmaintainonlycontrolinformation.
`TheworksofBestavros[]andBestavrosandCunha[]appeartobethepredecessorsof
`[]proposestoreducenetworktra(cid:14)cwithinanintranetbycachingorganization's
`WebWave.
`popularobjectsclosetotheintranet'sentrypoint.
`Inourcontext,anybackbonenodeinthe
`hostingplatformcanbesuchanentrypoint.Ourprotocolsaddresstheproblemsofchoosingentry
`pointsatwhichtoplaceobjectreplicasandallocatingrequeststothosereplicas.Thesequestions
`arenotconsideredin[].In[],BestavrosandCunhadiscussthebene(cid:12)tsofreplicatingpopular
`objectsfromthehostserveruptherequesttree.Noalgorithmsaredescribed.
`Baentschetal[]proposeaninfrastructureforperformingreplicationontheWeb,without
`describingalgorithmsfordecidingonreplicasets.Also,theinfrastructureassumesgraduallearning
` Recenttracestudies(e.g.,[])consistentlyshowthat %ofrequestsaretostaticobjects,andmanyofthe
`remainingobjectsaredynamicallygeneratedresponsestoread-onlyqueries.
`
`3
`
`
`
`
`DynamicReplicationontheInternet
`ofthereplicasetbyclients,whichmayhurttheresponsivenessofthesystem.Ourprotocolis
`complimentarytothiswork.GwertzmanandSeltzer[ ]motivatetheneedforgeography-based
`objectreplication.Theyproposetobasereplicationdecisionsonthegeographicaldistance(inmiles)
`betweenclientsandservers.Thismeasuremaynotcorrectlyre(cid:13)ectcommunicationcostsforfetching
`anobject,sincethenetworktopologyoftendoesnotcorrespondtothegeographicaldistances.
`Theproblemofplacingobjectsintheproximityofrequestingclientshasalsobeenaddressedin
`researchon(cid:12)leallocation(see[]
`forearlysurveyand[ ,,]
`formorerecentwork).Earlywork
`inthisareaassumesacentralpointwheredecisionsonobjectplacementaremadebysolvingan
`integerprogrammingoptimizationproblem.Evenwhenthesearchspaceisheuristicallypruned,the
`scaleofourapplicationwouldmakesuchapproachesimpractical.Also,thisapproachrequiresthe
`decision-makingpointtohavecompleteinformationonnetworktopology,serverloads,anddemand
`patterns.However,itwouldbeaninterestingquestionforfutureworktoseehowmuchworsethe
`performanceofourprotocoliscomparedtotheoptimalplacementobtainedbysolvingtheglobal
`integerprogrammingoptimizationproblem.
`Morerecently,theproblemofobtainingdistributedsolutionsforthe(cid:12)leallocationhasbeen
`addressed[ ,,].In[],Awerbuch,Bartal,andFiatdesignadistributed(cid:12)leallocationprotocol
`andusetheframeworkofcompetitiveanalysis[ ]toshowthattheirprotocolisnearlyoptimalin
`termsoftotalcommunicationcostandstoragecapacityofthenodes.However,theydonotaddress
`theissueofloadbalancingamongdi(cid:11)erentservers.Moreover,whiletheirworkissigni(cid:12)cantfroma
`theoreticalstandpoint,issuesconcerningimplementationoftheirprotocolovertheInternetarenot
`addressed.
` .OurContributions
`Ourmaincontributionsareasfollows.
`(cid:15)WepresentanewprotocolfordynamicreplicationontheInternetbyintegratinganalgorithm
`fordecidingonthenumberandplacementofobjectreplicaswithanalgorithmfordistributing
`requestsamongreplicas.Therequestdistributionalgorithmusesthesamesimplemechanism
`totakeintoaccountboththeproximityofrequeststoserversandserverload,withoutactually
`knowingthelatter.Thereplicaplacementalgorithmreliesontheknowledgeofhowrequests
`wouldbedistributedamongpotentialreplicasinmakingplacementdecisionsautonomously
`oneachnode,withoutevenknowingoftheexistenceofotherreplicas.
`(cid:15)OurprotocolreliesonpracticalassumptionsderivedfromtheexistingInternettechnology.
`WeuseinformationavailablefromroutingdatabasesandIPheaderstodrivetheprotocol.No
`newfunctionalityisrequiredfromtherouters.
`(cid:15)Systemresponsivenesstochangesindemandpatternsisoneoftheexplicitdesigngoals.
`Responsivenessisimprovedintwoways.First,unlikeexistingwork,ourprotocolperforms
`replicationbetweendistantnodesdirectly,withoutreplicatingwithallintermediatenodes
`onthepath.Second,basedonboundsontheloadofthenewobjectreplicas,theprotocol
`relocatesmultipleobjectsatonce,withoutwaitingfornewaccessstatisticsaftereachmove.
`Giventheintendedscaleofthesystem,aprotocolwithoutthisfeaturecouldbesoslowthat
`bythetimeitarrivestoadesiredreplicationscheme,thedemandpatternmaychangeagain.
`(cid:15)Theprotocolemploysinformationhidingtokeepthesystemmanageabledespiteitsscale.It
`usesthehierarchicaldivisionintoadministrativeunits(autonomoussystemsandOSPFareas)
`supportedbytheInternetforthispurpose.Inourprotocol,eachadministrativeunithasa
`representative(calledareplicator).Eachreplicatormaintainsa(cid:12)xedamountofinformation
`
`4
`
`
`
`
`DynamicReplicationontheInternet
`pereachchildreplicator.Ahostknowsonlyaboutotherhostswithinitslowest-leveladmin-
`istrativeunit,thereplicatorsonthepathfromitselftotherootofthereplicatortree,and
`thesiblingsofthosereplicators.Thisarrangementlocalizeshostadditionsanddeletionsand
`reducesnetworktra(cid:14)cfor(cid:12)ndinglower-loadedhostswheno(cid:15)oadingofahostisneeded.
`AsimulationstudyusingsyntheticworkloadsonthebackbonetopologyofUUNET,oneofthe
`largestcommercialInternetServiceProviders,showsthatourprotocolise(cid:11)ectiveineliminating
`hotspotsamongserversandachievesasigni(cid:12)cantreductionofbackbonetra(cid:14)candserverresponse
`timeattheexpenseofcreatingonlyasmallnumberofextrareplicas.Theprotocolalsoshowed
`acceptableresponsivenesstochangesindemandpatterns.
`TheSystemModel
`WeassumetheInternetenvironment[ ],whereindividualdestinations(nodesandnetworks)are
`organizedintoOSPFareas(calledbelowjustareasforshort),whichintheirturnareorganizedinto
`autonomoussystems,ASs(seeFigure ).Thereisalsoaproposaltoorganizeautonomoussystems
`intofederations.Ourprotocolextendsdirectlyformorelevelsofthehierarchy.Therearetwokinds
`ofnodes-routers,whichforwarddatapackets,andprocessingnodes.
`Weconsiderahostingsystemthatmaintainsandprovidesaccesstoobjects.Weassumethat
`thehostingplatformisdistributedgloballyacrossmultipleASs.Inparticular,amessagebetween
`twonodesbelongingtothesystemmaytraversethird-partyautonomoussystemsenroute.As
`anexample,onFigure ,thehostingsystemcontainsautonomoussystemsAS ,AS,andAS .
`AutonomoussystemAS comprisesareasAR ,AR,andAR .AmessagefromnodesinAS to
`nodesinASmusttravelthroughoutsideroutersr andr.
`Infact,hostingsystemsareoften
`providedbyInternetserviceproviders(ISP),inwhichcasethehostingplatformmaybecontained
`inasingleautonomoussystem.
`Nodes,areasandASsbelongingtothehostingsystemwillbecalledinternalnodes,areasand
`ASs.Othernodes,areasandASswillbereferredtoasexternal.Internalnodesthatmaintainobjects
`andserviceclientrequestsarecalledhostingserversorhostsforshort.Forsimplicity,wewillassume
`homogeneoushosts.Heterogeneitycouldbeintroducedbyincorporatingintotheprotocolweights
`correspondingtorelativepowerofhosts.
`. HostProximity
`Weabstractthenotionofhostproximitybyde(cid:12)ningproximityfunctions.Inthefollowingde(cid:12)nition,
`thenotionof\closeness"re(cid:13)ectscommunicationdistance(communicationcosts/delay)accordingto
`theroutingprotocolused.Notethatwedonotassumethatthevaluesofthesefunctionsarealways
`known.Wewillstateourassumptionsaboutwhatinformationisavailablelater.
`De(cid:12)nition (Proximityfunctions)
`(cid:15)ForasetofinternalnodesSandagivenInternet
`nodec,functionbestNode(S;c)de(cid:12)nesthesetofallnodesfromSthatarethe\closest"toc;
`(cid:15)ForagiveninternalautonomoussystemAandanodecwithinthisAS,functionbestArea(A;c)
`de(cid:12)nesthesetofallareaswithinAthatarethe\closest"toc.Weassumethatanynodein
`bestArea(A;c)isclosertocthananynodeinAoutsidebestAS(A;c).Thisassumptionholds
`forsocalledbroadcastareas(e.g.,basedontheEthernetlocalareanetwork)sincecommunica-
`tionbetweenanynodesintheareahasthesamecost.Forpoint-to-pointareasthisassumption
`byitselfmaynothold,inwhichcaseourprotocolsmayarriveatsuboptimalreplicaplacement.
`Topreventthis,thenetworkadministratorshouldorganizeinternalautonomoussystemsinto
`
`5
`
`
`
`
`DynamicReplicationontheInternet
`areasinsuchawaythatcommunicationwithinareasisnevermuchhigherthancommunication
`betweenareas.
`(cid:15)ForasetofinternalautonomoussystemsAandagivenInternetnodec,functionbestAS(A;c)
`de(cid:12)nesthesetofallautonomoussystemsfromAthatarethe\closest"toc.Weassumethat
`anynodeinbestAS(A;c)isclosertocthananyinternalnodeoutsidebestAS(A;c).This
`assumptionagreeswiththewayInternetroutingprotocolsconsidercosts.
`WediscussproximityfunctionsforInternetroutingprotocolsinSection.Notethatthevalues
`ofproximityfunctionsaresetsof,respectively,nodes,autonomoussystems,andareas.Thisre(cid:13)ects
`thefactthatmultiplenodes,ASsorareasmaybeequidistanttonodec.
`Ourheuristicsforreplicaplacementarebasedontheroutesmessagestakegettingfromthe
`sourcetothedestination.
`(Granteddi(cid:11)erentmessagesbetweenagivenpairofnodescantake
`di(cid:11)erentrouteseverytime,inpracticetheseroutesareusuallythesameorverysimilar[ ].)
`De(cid:12)nition(Preferencepath)Lets!r !:::!r n!r !:::!rm!r l!:::!r k!c
`betherouterpathfroman(internal)hoststoanexternalclientc,wherefr igarerouterswithin
`s'sarea,frigareroutersins'sASbutoutsides'sarea,andfr igareroutersoutsides'sAS.
`ThepreferencepathbetweensandcisasequencebestNode(ARs;r )!:::!bestNode(ARs;r n)!
`bestArea(ASs;r )!:::!bestArea(ASs;rm)!bestAS(R;r )!:::!bestAS(R;r k)!s;
`whereARsisthesetofallhostsiss'sarea,ASsistheAStowhichsbelongs,andRisthesetof
`allinternalautonomoussystems.
`Insomecases,thesameinternalASmaybetheclosesttodi(cid:11)erentroutersonthepath.Inthese
`caseswecansimplifythepreferencepathbyconsideringthecanonicalpreferencepathinstead.
`De(cid:12)nition (Canonicalpreferencepath)Lets!E !:::!En!cbethepreferencepath
`fromstoc.
`Foralli<j,ifEiTEj=;,replaceEiwithEinEj.IfEibecomesempty,removeitfromthe
`path.RepeatthisuntilEiTEj=;forallpairsofthepathelements.Theresultingpathiscalled
`thecanonicalpreferencepathfromstoc.
`Fromnowon,wewillusethetermpreferencepathmeaningcanonicalpreferencepath.
`Example ConsidertheinternetworkonFigure ,wherelinesrepresenttheproximityfunctions.
`Amessagefromhoststoanexternalclientctravelsviarouterpathr !r!r!r!
`r!r!r ,resultinginthepreferencepaths!fh ;hg!fh g!fARg!fARg!
`fAS g!fAS;AS g!c.Thecorrespondingcanonicalpathiss!fh ;hg!fh g!fARg!
`fAS;AS g!c.
`Amessagefromhoststoclientcpassesbythehostsonthepreferencepathfromstoc,assuming
`thatproximityfunctionsarede(cid:12)nedappropriately.GiventhatInternetroutingprotocolsattemptto
`choosetheshortestroutesandtheroutesfromthesehoststocareshorterthanfroms,itwouldhave
`beenadvantageousforthisrequestiftherequestwasservicedbyoneofthesehosts.Furthermore,
`thecloserthedataisonthepreferencepathtocthegreaterthebene(cid:12)ts.Itisthislastobservation
`thatmotivatestransformingthepreferencepathtothecanonicalform.
`Wewillassumetheavailabilityofthefollowinginformation:
`(cid:15)ForanyclientcandanysubsetSof(internal)hosts,thevalueofBestNode(S;c);
`
`6
`
`
`
`
`
`DynamicReplicationontheInternet
`
`c
`
`AS3
`
`r4
`
`r1
`
`r3
`
`AS2
`h4
`
`r2
`
`r5
`
`r7
`r8
`
`r8
`
`r6
`
`AR2
`
`AR3
`
`h3
`
`h2
`
`AR1
`
`Figure :Anexampleofaninternetwork.
`
`h1
`
`r9
`
`s
`
`AS1
`
`7
`
`
`
`
`DynamicReplicationontheInternet
`(cid:15)Foranyclientcandanyhosts,the(canonical)preferencepathbetweencands.
`Wewilldiscusshowtoobtainthisinformatione(cid:14)cientlyinthecontextofactualIProutingpro-
`tocolsinSection.Thisinformationwillbeextractedfromtheroutesdatabasesmaintainedbythe
`routers,andhence,corresponddirectlytothenotionsof\closeness"usedbytherouters,currently
`thenumberofhopstakenbymessagesenroutefromonenodetoanother.Asroutersbecomemore
`sophisticatedandstartusingmoreelaboratemetrics(e.g.,linkbandwidth,linkcongestion,usage
`fees),thesemetricswillbere(cid:13)ectedintheroutesdatabasesandthereforewillbepickedupbyour
`proximityfunctionsautomatically.
`.LoadMetrics
`Weassumetheexistenceofauniformloadmeasurethatallowsloadcomparisonofdi(cid:11)erentservers.
`Ingeneral,theloadmetricmayre(cid:13)ectmultiplecomponents,notablycomputationalloadandstorage
`utilization.Withmultipleloadcomponents,onecanuseloadvectors,withthefollowingoperation
`rules:(cid:15)load >loadifload isgreaterthanloadinsomecomponent;load <loadifload is
`smallerthanloadinallcomponents.Notethatload >loaddoesnotentailload<load ,
`soonemustbecarefulwiththeorderofoperands.Theseoperationsareused,e.g.,tocompare
`theserverloadwithitscapacityload.
`(cid:15)load +loadisaloadvectorwhosecomponentsareobtainedbycomponent-wiseaddition
`ofoperands.Thisoperationisused,e.g.,toexpressthecombinedloadduetohostingtwo
`objects.
`(cid:15)N(cid:3)load,whereNisthenumber,isaloadvectorwhosecomputationalcomponentsare
`obtainedbymultiplyingthecorrespondingcomponentsofvectorloadbyN,andwhosestorage
`componentsareequaltothecorrespondingcomponentsofvectorload.Thisoperationisused,
`e.g.,toexpresstheloadduetohostinganobjectwhentheobjectaccessrateincreasesN
`times.
`Forcompactnessofthepresentation,wewillassumethatloadmetricrepresentsasinglecom-
`putationalcomponent.Thelengthofthereadyqueue(e.g.,theoutputoftheuptimecommandin
`UNIX)canbeusedasthemeasureofcomputationalload[].Extendingthealgorithmstoallow
`vectorloadsasde(cid:12)nedaboveistrivial.
`Weassumethatanindividualservercanestimatethefractionofitstotalloadduetoagiven
`objectonthisserver.Thiscanbedonebykeepingtrackofresourceconsumption(CPUtime,IO
`operations,etc.)duetorequestsforindividualobjectsanddividingupthetotalloadbetween
`objectsproportionallytotheirconsumption.
`Onelasttechnicalissueisthatloadmetricsareusuallyaveragedoversomesamplinginterval
`precedingthetimethemetricisrecorded.So,ameasurementtakenrightafteranobjectrelocation
`eventonahostwillnotre(cid:13)ectthechangeinthesetofhosteddocuments.Todealwiththis
`technicality,weassumethatonceahostacceptsanobject,itusesanupper-limitestimateofwhat
`itsloadwouldbeafteracquiringtheobjectindecidingwhetherornottohonorfurtherrequests
`foracceptingobjectsfromotherhosts.Thehostreturnstousingactualloadmetricsonlywhenits
`samplingintervalstartsafterthelastobjecthadbeenacquired.Similarly,thehostdecidesitneeds
`too(cid:15)oadbasedonalower-limitestimateofitsload.Thederivationoftheseestimatesisenabled
`byourrequestdistributionalgorithm.
`Whenfrequentobjectrelocationsmakemostofsamplingintervalscontainarelocationevent,ahostcanalways
`periodicallyhaltrelocationstotakefreshloadmeasurements.
`
`8
`
`
`
`
`DynamicReplicationontheInternet
`ChooseReplica(c;x):
`/*Executedbythenameservice*/
`letXbethesetofhoststhathaveareplicaofx;
`letpbeanodeinbestNode(X;c)withthesmallestvalueofratio =rcnt(xp)
`aff(xp)andqbethehostthat
`hasareplicaofxwiththesmallestvalueofratio=rcnt(xq)
`aff(xq);
`ifratio >ratio
`choosep;
`rcnt(xp)=rcnt(xp)+ ;
`elsechooses;
`rcnt(xs)=rcnt(xs)+ ;
`endif
`end
`Figure:Thealgorithmforchoosingareplica.
`. NotationandTerminology
`Intherestofthepaper,xpwilldenoteareplicaofobjectxonserverp;load(p)willdenotethe
`loadofnodep,andload(xp)willdenotetheloadonnodepduetoobjectx.Anobjectissaidto
`begeo-migratedorgeo-replicatedtohostsifitisplacedonsforthereasonofproximitytoclient
`requests.Otherwise(whenanobjectismigratedorreplicatedduetoloadconsiderations),theobject
`issaidtobeload-migratedorreplicatedtos.
`ArequestfromclientciscalledlocaltohostsifsBestNode(Allhosts;c).Thepreference
`pathofalocalrequestcontainsjustthehosts.
` RequestDistributionAlgorithm
`Thechallengeindesigningthealgorithmforassigningrequeststoreplicasthatwouldservicethem
`isthatthealgorithmmustcombinethegoalofdistributingloadwiththegoalofchoosingareplica
`thatisclosetotherequest.Asanexample,considerahostingsystemwithjusttwohosts,one
`inAmericaandtheotherinEurope.Choosingreplicasintheround-robinmannerwouldneglect
`theproximityfactor.Forinstance,ifanobjecthasreplicasonbothhosts,withroughlyhalfof
`requestscomingfromeachregion,choosingreplicasintheround-robinmannermaywellresultin
`directingAmericanrequeststotheEuropeanreplicaandviceversa.Ontheotherhand,always
`choosingtheclosestreplicascouldresultinpoorloaddistribution.AssumethattheAmericansite
`isoverloadeddueto\local"requests.Then,iftheclosestreplicaswerealwayschosen,creating
`additionalreplicasinEuropewouldnothelptheoverloadedsite-allrequestswouldstillbedirected
`tothissiteanyway.Ourgoalisanalgorithmthatwoulddirectrequeststotheirclosesthostsinthe
`(cid:12)rstcasewhiledistributingrequestsamongbothhosts(regardlessoftheoriginofrequests)inthe
`secondcase.Finally,thealgorithmmustbesimple,fortworeasons.First,itliesonthecriticalpath
`ofservicingtherequest.Second,itshouldallowsimplereasoningaboutloadboundsonexistingand
`potentialobjectreplicas,whichwouldenablethereplicaplacementalgorithmtomakeautonomous
`placementdecisions.
`ThealgorithmisshowninFigure.Foreachreplicaxs,thealgorithmkeepsacountofthe
`numberoftimesitchoosesthisreplica,therequestcountrcnt(xs).Italsomaintainsthereplica
`
`9
`
`
`
`
`DynamicReplicationontheInternet
`a(cid:14)nity,aff(xs).Replicaa(cid:14)nityisinitiallyequaltooneandthenismodi(cid:12)edbythereplica
`placementalgorithm(seeSection.)Werefertotheratiorcnt(xs)
`aff(xs)asarelativerequestcount,since
`itre(cid:13)ectstherequestcountpera(cid:14)nityunit.
`Whenarequestfromclientcarrives,thealgorithmbeginsbyidentifyingareplicaxqwiththe
`smallestrelativerequestcount,andareplicaxpthatistheclosesttotheclient.Itthenchooses
`thereplica(amongthesetwo)bycomparingtherelativerequestcountoftheleast-requestedreplica
`withtherelativerequestcountdividedbyoftheclosestreplica(adi(cid:11)erentconstantcanbechosen,
`withcorrespondingmodi(cid:12)cationstothereplicaplacementalgorithm).
`Applyingthisalgorithmtotheaboveexample,inthe(cid:12)rstcase,bothreplicaswillhaveroughly
`thesamerequestcount,andthereforeeveryrequestwillbedirectedtotheclosestreplica(assuming
`bothreplicashavea(cid:14)nityone).Inthesecondcase,theAmericansitewillreceiveallrequestsuntil
`itsrequestcountexceedstherequestcountoftheEuropeansitebyafactoroftwo,atwhichpoint
`theEuropeansitewillbechosen.Therefore,theloadontheAmericansitewillbereducedby
`one-third.Creatingmorereplicaswouldreduceitsloadevenfurther.Assumethatnreplicasof
`anobjectarecreated.Evenifthesamereplicaistheclosesttoallrequests,itiseasytoseethat
`thisreplicawillhavetoserviceonlyN=(n+ ),whereNisthetotalnumberofrequests.Thus,
`byincreasingthenumberofreplicas,wecanmaketheloadonthisreplicaarbitrarilylow.Still,
`wheneveranoddrequestarrivesfromanotherreplica'sregion,thisrequestwillbedirectedtoits
`localreplica.
`Replicaa(cid:14)nitiesallowtheprotocoltobevery(cid:13)exibleinrequestdistribution.Continuingwith
`ourexample,assumethatrequestpatternschangefrombeingequallydividedbetweentheAmerican
`andEuropeanreplicastothe %- %split.Ifneithersiteisoverloaded,thereplicaplacement
`algorithmcansetthea(cid:14)nityoftheAmericanreplicato.Withregularrequestinterspacing(i.e.,
`whenarequestfromEuropearrivesaftereveryninerequestsfromAmerica),therequestdistribution
`algorithmwoulddirect = ( %)ofallrequests,includingallthosefromEurope,totheEuropean
`siteandtheresttotheAmericansite.
`Oneproblemwiththisalgorithmisthatwhenanewreplicaiscreated,itwillbechosenforall
`requestsuntilitsrequestcountcatchesupwiththerestofreplicas.Thismaycauseatemporary
`overloadingofnewreplicas.Toavoidthat,thealgorithmresetsallrequestcountsto whenever
`thereplicasetfortheobjectchanges.
`ReplicaPlacementAlgorithm
`Decisionsonreplicaplacementaredoneincooperationbetweenhostsandthereplicationservice,
`whichisimplementedasthereplicatorhierarchy(Figure ).Thereisonereplicatorineachinternal
`area,oneineachinternalautonomoussystem,andonerootreplicator.
`Itisalsoconvenientto
`considerhostsastrivialreplicators,withasinglesubordinatehost.
`Replicatorsactasrepresentativesoftheirregionstooutsidehosts.Ifahostdecidestoplace
`areplicainanotherareaorAS,itsendsitsrequesttothereplicatorofthisareaorAS,andthe
`replicatorchoosesthehost(withintheregionitrepresents)toplacetheobject.Thisarrangement
`facilitatesinformationhidingandisessentialformanageabilityandscalabilityofthesystemwhenit
`spansmultipleareasandASs.Addinganddeletinghostsbecomesanadministrativeactionlocalized
`totheareainvolved;addingordeletinganentireareaconcernsonlythereplicatoroftheparent
`AS.Computingpreferencepathsbyahostbecomesreliantonlyoninformationavailablefromthe
`routingdatabaseofthehost'sareaandautonomoussystem(seeSection).
`Notethatourrequestdistributionalgorithmprecludesatrivialsolutionthatwouldreplicate
`everyobjectoneveryserver.Indeed,sincethisalgorithmisoblivioustoserverloads,itdistributes
`
`10
`
`
`
`Area Repl
`
`...
`
`Area Repl
`
`...
`
`Area Repl
`
`...
`
`...
`
`...
`
`...
`
`...
`
`...
`
`Hosts
`
`Root Repl
`
`AS Repl
`
`. . .
`
`AS Repl
`
`
`DynamicReplicationontheInternet
`Figure :Thereplicatorhierarchy.
`requeststoallavailablereplicas.Thus,excessivereplicaswouldcausemanyrequeststobesentto
`distanthosts.Thereplicaplacementprotocolthereforecreatesnewreplicasonlyifitislikelytobe
`bene(cid:12)cialforeitherclientproximityorserverloadreasons.
`. TheControlState
`Eachhostsmaintainsthefollowingstateforeachobjectitkeeps,xs.
`ForeachentityE(whichcanbehosts,areas,orautonomoussystems)thatappearedonpreference
`pathsofsomerequeststoxsinthelastsamplinginterval,hostskeepsthecountofthenumberof
`theseappearances,cnt(E;xs),referredtoastheaccesscountofE.Inparticular,cnt(s;xs)=cnt(xs)
`givesthetotalaccesscountforxs.Weknowthat,foragivenrequest,nodesonitspreferencepath
`representpreferablelocationsfortheobject.So,anentitythatfrequentlyappearsinpreference
`pathsmaybeagoodcandidateforplacinganobjectreplica.
`HostsalsokeepsthedistancefromstoEonpreferencepaths,averagedovertherequestsfor
`xs,dist(E;xs).Thedistanceisaveragedbecausepreferencepathsmaychangeovertime.
`Finally,smaintainstheloadduetoeachobjectxsitkeeps,load(xs),andreplicaa(cid:14)nity,
`aff(xs).A(cid:14)nityisacompactwayofrepresentingmultiplereplicasofthesameobjectonthesame
`host.Whenthereplicais(cid:12)rstcreated,itsa(cid:14)nityisinitializedto ;whenanobjectismigratedor
`replicatedtoahostthatalreadyhasareplicaofthisobject,itsa(cid:14)nityisincremented.Similarto
`relativerequestcounts,wewillcallaratiocnt(E;xs)
`aff(xs)arelativeaccesscountofcandidateE.
`Areplicatorrmaintainstheminimumhostload,minload(ri),foreachofitschildreplicatorsri.
`Forachildri,minload(ri)isthehigher-boundloadestimateofthetheleast-loadedhostamongri's
`descendanthosts.Thisstateismaintainedasfollows:thelowest-levelreplicators(hosts)periodically
`reporttheirloadtotheirparentreplicators.Uponcollectingallreports,theparentreplicatorrecords
`theminimumhostloads,andinturnsendsthesmallestreportedloadtoitsparent.Betweenrounds
`ofloadreports,areplicatormayalsomodifyitsstateduringexecutionoftheprotocolsbelow.
`
`Area Repl
`
`11
`
`
`
`
`
`DynamicReplicationontheInternet
`DecidePlacement():
`/*Executedbyhosts*/
`ifload(s)>hwoffloading=Yes;
`ifload(s)<lwoffloading=No;
`foreachxs
`ifcnt(s;xs)
`aff(xs)<u
`decrementaff(xs)ifitwasgreaterthan ,ordropxsotherwise
`(unlessxsisthesolereplicaofxinthesystem);
`elseLoopthroughcandidates(xs),inthedecreasingorderofdist(E;xs).
`ForeachcandidateEsuchthatcnt(E;xs)
`cnt(s;xs)>