
ACM SIGACT News Distributed Computing Column 34
Distributed Computing in the Clouds

Idit Keidar
Dept. of Electrical Engineering, Technion
Haifa, 32000, Israel
idish@ee.technion.ac.il

It seems like "computation clouds" are cropping up everywhere nowadays... well, except perhaps, actually "in the clouds", as a recent April Fool's joke by Amazon suggested (footnote 1).
While there is no commonly agreed-upon definition of what exactly constitutes a cloud, it is clear that there are some pretty interesting mega-scale distributed computing environments out there. Such environments require, and already deploy, many distributed services and applications. This column examines distributed computing research that seeks to develop new solutions for clouds, as well as to improve existing ones.

Our main contribution is by Ken Birman, Gregory (Grisha) Chockler, and Robbert van Renesse, who identify a research agenda for cloud computing, based on insights gained at the 2008 LADIS workshop. They question whether contemporary research in distributed computing, which sometimes targets cloud environments, is indeed relevant for cloud computing. Some researchers will be disappointed by (and might disagree with) the conclusions they reach. They then proceed to define a new agenda for cloud computing research. Their article, however, does not consider issues of security and trust. This perhaps stems from the fact that the paper is written from the perspective of cloud service providers, rather than users, whereas trust is a concern for the latter. In the next contribution, Christian Cachin, Yours Truly, and Alexander (Alex) Shraer examine the trust that users have (or can have) in cloud services where they store their data, surveying risks as well as solutions that are being proposed to address them.

The column then turns to a more applied perspective. The next contribution, by Edward (Eddie) Bortnikov from Yahoo! Research, surveys open source technologies that are used for web-scale computing, highlighting some technology transfer from the research community to actual implementations. The column concludes with an announcement, provided by Roger Barga and Jose Bernabeu-Auban from Microsoft, about a Cloud Computing tutorial that will be given at DISC 2009 in September, in Elche, Spain.

Many thanks to Ken, Grisha, Robbert, Christian, Alex, Eddie, Roger, and Jose for their contributions!
Footnote 1: http://aws.typepad.com/aws/2009/03/up-up-and-away-cloud-computing-reaches-for-the-sky.html

Toward a Cloud Computing Research Agenda

Ken Birman, Cornell University, Ithaca, NY, USA, ken@cs.cornell.edu
Gregory Chockler, IBM Research, Haifa, Israel, chockler@il.ibm.com
Robbert van Renesse, Cornell University, Ithaca, NY, USA, rvr@cs.cornell.edu

Abstract

The 2008 LADIS workshop on Large Scale Distributed Systems brought together leaders from the commercial cloud computing community with researchers working on a variety of topics in distributed computing. The dialog yielded some surprises: some hot research topics seem to be of limited near-term importance to the cloud builders, while some of their practical challenges seem to pose new questions to us as systems researchers. This brief note summarizes our impressions.

1 Workshop Background

LADIS is an annual workshop focusing on the state of the art in distributed systems. The workshops are by invitation, with the organizing committee setting the agenda. In 2008, the committee included ourselves, Eliezer Dekel, Paul Dantzig, Danny Dolev, and Mike Spreitzer. The workshop website (footnote 1) includes the detailed agenda, white papers, and slide sets (footnote 2); proceedings are available electronically from the ACM Portal web site [21].

2 LADIS 2008 Topic

The 2008 LADIS topic was Cloud Computing, and more specifically:

- Management infrastructure tools (examples would include Chubby [4], Zookeeper [25], Paxos [22], [19], Boxwood [23], Group Membership Services, Distributed Registries, Byzantine State Machine Replication [6], etc.),
- Scalable data sharing and event notification (examples include Pub/Sub platforms, Multicast [31], Gossip [30], Group Communication [8], DSM solutions like Sinfonia [1], etc.),
- Network-level and other resource-managed technologies (Virtualization and Consolidation, Resource Allocation, Load Balancing, Resource Placement, Routing, Scheduling, etc.),
- Aggregation, Monitoring (Astrolabe [29], SDIMS [32], Tivoli, Reputation).

Footnote 1: http://www.cs.cornell.edu/projects/ladis2008
Footnote 2: http://www.cs.cornell.edu/projects/LADIS2008/presentations.htm

ACM SIGACT News, June 2009, Vol. 40, No. 2

In 2008, LADIS had three keynote speakers, one of whom shared his speaking slot with a colleague:

- Jerry Cuomo, IBM Fellow, VP, and CTO for IBM's Websphere product line. Websphere is IBM's flagship product in the web services space, and consists of a scalable platform for deploying and managing demanding web services applications. Cuomo has been a key player in the effort since its inception.
- James Hamilton, at that time a leader within Microsoft's new Cloud Computing Initiative. Hamilton came to the area from a career spent designing and deploying scalable database systems and clustered data management platforms, first at Oracle and then at Microsoft. (Subsequent to LADIS, he joined Amazon.com.)
- Franco Travostino and Randy Shoup, who lead eBay's architecture and scalability effort. Both had long histories in the parallel database arena before joining eBay, and both participated in eBay's scale-out from early in that company's launch.

We won't try to summarize the three talks (slide sets for all of them are online at the LADIS web site, along with additional materials such as blogs (footnote 3) and videotaped talks (footnote 4)). Rather, we want to focus on three insights we gained by comparing the perspectives articulated in the keynote talks with the cloud computing perspective represented by our research speakers:

- We were forced to revise our "definition" of cloud computing.
- The keynote speakers seemingly discouraged work on some currently hot research topics.
- Conversely, they left us thinking about a number of questions that seem new to us.

3 Cloud Computing Defined

Not everyone agrees on the meaning of cloud computing. Broadly, the term has an "outward looking" and an "inward looking" face. From the perspective of a client outside the cloud, one could cite the Wikipedia definition:

    Cloud computing is Internet (cloud) based development and use of computer technology (computing), whereby dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure "in the cloud" that supports them.

The definition is broad enough to cover everything from web search to photo sharing to social networking. Perhaps the key point is simply that cloud computing resources should be accessible by the end user anytime, anywhere, and from any platform (be it a cell phone, mobile computing platform or desktop).

Footnote 3: Perspectives, James Hamilton's blog: http://perspectives.mvdirona.com
Footnote 4: Randy Shoup on eBay's Architectural Principles: http://www.infoq.com/presentations/shoup-ebay-architectural-principles

The outward facing side of cloud computing has a growing set of associated standards. By and large:

- Cloud resources are accessed from browsers, "minibrowsers" running JavaScript/AJAX or similar code, or at a program level using web services standards. For example, many cloud platforms employ SOAP as a request encoding standard, and HTTP as the preferred way to actually transmit the SOAP request to the cloud platform, and to receive a reply.
- The client thinks of the cloud as a single entity. Of course, this is just an illusion: in reality, the implementation typically requires one or more data centers, composed of potentially huge numbers of service instances running on a large amount of hardware. Inexpensive commodity PCs structured into clusters are popular. A typical data center has an outward facing bank of servers with which client systems interact directly. Cloud systems implement a variety of DNS and load-balancing/routing mechanisms to control the routing of client requests to actual servers, in a manner that masks the structure of the cloud from its users.
- Further isolating the clients of a cloud system from its internal structure, the external servers may actually be the only ones that a client can access directly. This occurs because those servers often run in a "demilitarized zone" (outside any firewall), and are limited to executing stateless "business logic." This typically involves extracting the client request and parallelizing it within some set of services that do the work and maintain any state associated with the cloud or the transaction. The external server collects replies, combines them into a single "result," and sends it back to the client.
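
The scatter/gather role of the external server described in the last item can be illustrated with a minimal sketch. Everything named here is hypothetical: the three back-end functions stand in for remote, stateful services behind the firewall, and the request format is invented for illustration. The only point is that the front end itself keeps no state between requests.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical back-end services (assumptions for illustration). In a real
# data center these would be remote calls to services that own the state.
def inventory_service(request):
    return {"in_stock": request["item"] != "unicorn"}

def pricing_service(request):
    return {"price_cents": 100 * len(request["item"])}

def recommendation_service(request):
    return {"related": [request["item"] + "-case"]}

BACKENDS = [inventory_service, pricing_service, recommendation_service]

def handle_client_request(request):
    """Stateless front end: fan the request out, gather replies, combine."""
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        # Scatter: hand the same request to every back-end in parallel.
        futures = [pool.submit(service, request) for service in BACKENDS]
        # Gather: wait for every reply (a real system would add timeouts).
        replies = [future.result() for future in futures]
    # Combine the partial replies into the single "result" sent to the client.
    result = {}
    for reply in replies:
        result.update(reply)
    return result

print(handle_client_request({"item": "phone"}))
```

Because the handler holds no state of its own, any front-end server can process any request, which is what lets the load-balancing layer route requests freely.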

There is also an inside facing perspective:

- A cloud service is implemented by some sort of pool of servers that either share a database subsystem or replicate data [13].
- The replication technology is very often supported by some form of scalable, high-speed update propagation technology, such as a publish/subscribe message bus (in web services, the term Enterprise Service Bus or ESB is a catch-all for such mechanisms).
- Cloud platforms are highly automated: management of these server pools (including such tasks as launching servers, shutting them down, load balancing, failure detection and handling) is performed by standardized infrastructure mechanisms.
- A cloud system will often provide its servers with some form of shared global file system, or in-memory store services. For example, Google's GFS [15], Yahoo!'s HDFS [3], Amazon.com's S3 [2], memcached [18], and Amazon Dynamo [12] are widely cited. These are specific solutions; the more general statement is simply that servers share files, databases, and other forms of content.
- Server pools often need ways to coordinate when shared configuration or other shared state is updated. In support of this, many cloud systems provide some form of locking or atomic multicast mechanism with strong properties [4], [25].
- Some very large-scale services use tools like Distributed Hash Tables (DHTs) to rapidly find information shared within a pool of servers, or even as part of a workload partitioning scheme (for example, Amazon's shopping-cart service uses a DHT to spread the shopping cart function over a potentially huge number of server machines).
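
To make the DHT-style workload partitioning in the last item concrete, here is a minimal sketch using consistent hashing, one common way to build such a mapping. It is not a description of how Amazon's service is implemented; the class name, server names, key format, and the number of virtual points per server are all assumptions chosen for illustration.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string to a point on the ring (a large integer)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring: each server owns many points on the ring."""

    def __init__(self, servers, replicas=64):
        ring = []
        for server in servers:
            # Several virtual points per server smooth out the load.
            for i in range(replicas):
                ring.append((_hash(f"{server}#{i}"), server))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._owners = [server for _, server in ring]

    def server_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owners[idx]

ring = HashRing(["server-1", "server-2", "server-3"])
print(ring.server_for("cart:alice"))
```

The design choice worth noting is that adding a server moves only the keys that now fall to the new server; every other key keeps its owner, so a growing pool does not reshuffle all shopping carts.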

We've focused the above list on the interactive side of a data center, which supports the clustered server pools with which clients actually interact. But these in turn will often depend upon "back office" functionality: activities that run in the background and prepare information that will be used by the servers actually handling client requests. At Google, these back office roles include computing search indices. Examples of widely known back-office supporting technologies include:

- Scheduling mechanisms that assign tasks to machines, but more broadly, play the role of provisioning the data center as a whole. As we'll see below, this aspect of cloud computing is of growing importance because of its organic connection to power consumption: both to spin disks and run machines, but also because active machines produce heat and demand cooling. Scheduling, it turns out, comes down to "deciding how to spend money."
- Storage systems that include not just the global file system but also scalable database systems and other scalable transactional subsystems and middleware such as Google's BigTable [7], which provides an extensive (conceptually unlimited) table structure implemented over GFS.
- Control systems for large-scale distributed data processing like MapReduce [11] and DryadLINQ [33].
- Archival data organization tools, applications that compress information or compute indexes, applications that look for duplicate versions of objects, etc.

In summary, cloud computing lacks any crisp or simple definition. Trade publications focus on cloud computing as a realization of a form of ubiquitous computing and storage, in which such functionality can be viewed as a new form of cyber-supported "utility". One often reads about the cloud as an analog of the electric power outlet or the Internet itself. From this perspective, the cloud is defined not by the way it was constructed, but rather by the behavior it offers. Technologists, in turn, have a tendency to talk about the components of a cloud (like GFS, BigTable, Chubby), but doing so can lose track of the context in which those components will be used, a context that is often very peculiar when compared with general enterprise computing systems.

4 Is the Distributed Systems Research Agenda Relevant?

We would like to explore this last point in greater detail. If the public perception of the cloud is largely oblivious to the implementation of the associated data centers, the research community can seem oblivious to the way mechanisms are used. Researchers are often unaware that cloud systems have overarching design principles that guide developers towards a cloud-computing mindset quite distinct from what we may have been familiar with from our work in the past, for example on traditional client/server systems or traditional multicast protocols. Failing to keep the broader principles in mind can have the effect of overemphasizing certain cloud computing components or technologies, while losing track of the way that the cloud uses those components and technologies. Of course, if the use were arbitrary or similar enough to those older styles of client/server system, this wouldn't matter. But because the cloud demands obedience to those overarching design goals (either because the cloud was built with tools that only support certain styles of system, or because operators such as eBay or Microsoft impose and enforce "rules of practice", as we discuss further below), what might normally seem like mere application-level detail instead turns out to be dominant and to have all sorts of lower level implications.

Just as one could criticize the external perspective ("ubiquitous computing") as an oversimplification, LADIS helped us appreciate that when the research perspective overlooks the roles of our technologies, we can sometimes wander off on tangents by proposing "new and improved" solutions to problems that actually run contrary to the overarching spirit of the cloud mechanisms that will use these technologies.

To see how this can matter, consider the notion of distributed systems consistency. The research community thinks of consistency in terms of very carefully specified models such as the transactional database model, atomic broadcast, Consensus, etc. We tend to reason along the following lines: Google uses Chubby (a locking service) and Chubby uses State Machine Replication based on Paxos. Thus Consensus, an essential component of State Machine Replication, should be seen as a legitimate cloud computing topic: Consensus is "relevant" by virtue of its practical application to a major cloud computing infrastructure. We then generalize: research on Consensus, new Consensus protocols and tools, and alternatives to Consensus are all "cloud computing topics".

While all of this is true, our point is that Consensus, for Google, wasn't the goal. Sure, locking matters in Google; this is why they built a locking service. But the bigger point is that even though large data centers need locking services, if one can trust our keynote speakers, application developers are under huge pressure not to use them.

We're reminded of the old story of the blind men touching the elephant. When we reason that "Google needed Chubby, so Consensus as used to support locking is a key cloud computing technology," we actually skip past the actual design principle and jump directly to the details: this way of building a locking service versus that one. In doing so, we lose track of the broader principle, which is that distributed locking is a bad thing that must be avoided!

This particular example is a good one because, as we'll see shortly, if there was a single overarching theme within the keynote talks, it turns out to be that strong synchronization of the sort provided by a locking service must be avoided like the plague. This doesn't diminish the need for a tool like Chubby; when locking actually can't be avoided, one wants a reliable, standard, provably correct solution. Yet it does emphasize the sense in which what we as researchers might have thought of as the main point ("the vital role of consistency and Consensus") is actually secondary in a cloud setting.

Seen in this light, one realizes that while research on Consensus remains valuable, it was a mistake to portray it as if it was research on the most important aspect of cloud computing. Our keynote speakers made it clear that in focusing overly narrowly, the research community often misses the bigger point. This is ironic: most of the researchers who attended LADIS are the sorts of people who teach their students to distinguish a problem statement from a solution to that problem, and yet by overlooking the reasons that cloud platforms need various mechanisms, we seem to be guilty of fine-tuning specific solutions without adequately thinking about the context in which they are used and the real needs to which they respond, aspects that can completely reshape a problem statement.

To go back to Chubby: once one realizes that locking is a technology of last resort, while building a great locking service is clearly the right thing to do, one should also ask what research questions are posed by the need to support applications that can safely avoid locking. Sure, Consensus really matters, but if we focus too strongly on it, we risk losing track of its limited importance in the bigger picture.

Let's look at a second example just to make sure this point is clear. During his LADIS keynote, Microsoft's James Hamilton commented that for reasons of autonomous control, large data centers have adopted a standard model resembling the well-known Recovery-Oriented Computing (ROC) paradigm [24, 5]. In this model, every application must be designed with a form of automatic fault handling mechanism. In short, this mechanism suspects an application if any other component complains that it is misbehaving. Once suspected by a few components, or suspected strenuously by even a single component, the offending application is rebooted, with no attempt to warn its clients or ensure that the reboot will be graceful or transparent or non-disruptive. The focus apparently is on speed: just push the reboot button. If this doesn't clear the problem, Hamilton described a series of next steps: the application might be automatically reinstalled on a fresh operating system instance, or even moved to some other node, again without the slightest effort to warn clients.
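
The suspect-then-escalate policy just described can be sketched in a few lines. This is an illustration under stated assumptions, not the mechanism any of the keynote speakers' platforms actually use: the threshold of three complaints for "a few components", the weight given to a "strenuous" complaint, and the class and method names are all invented here.

```python
from collections import defaultdict

# Recovery steps in the order described in the text.
ESCALATION = ["reboot", "reimage", "migrate"]
COMPLAINT_THRESHOLD = 3  # assumed meaning of "a few components"
SEVERE_WEIGHT = 3        # assumed: one strenuous complaint is enough

class Supervisor:
    """Toy ROC-style supervisor: count complaints, then escalate recovery."""

    def __init__(self):
        self._suspicion = defaultdict(int)  # app -> accumulated complaints
        self._level = defaultdict(int)      # app -> next escalation step
        self.actions = []                   # log of (app, action) taken

    def complain(self, app, severe=False):
        self._suspicion[app] += SEVERE_WEIGHT if severe else 1
        if self._suspicion[app] >= COMPLAINT_THRESHOLD:
            self._recover(app)

    def _recover(self, app):
        # No attempt is made to warn clients: just take the next step.
        step = min(self._level[app], len(ESCALATION) - 1)
        self.actions.append((app, ESCALATION[step]))
        self._level[app] += 1
        self._suspicion[app] = 0

    def report_healthy(self, app):
        # A clean bill of health resets the escalation ladder.
        self._level[app] = 0
        self._suspicion[app] = 0

supervisor = Supervisor()
for _ in range(3):
    supervisor.complain("cart-service")
supervisor.complain("cart-service", severe=True)
print(supervisor.actions)
```

Note how little the supervisor needs to know about the application; the cost of that simplicity is pushed onto clients, which is the subject of the next paragraph.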

What do the clients do? Well, they are forced to accept that services behave this way, and developers code around the behavior. They try to use idempotent operations, or implement ways to resynchronize with a server when a connection is abruptly broken.
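
One common way to make an operation idempotent is to tag each logical request with a client-chosen identifier and have the server remember which identifiers it has already applied. The sketch below assumes that technique; the source does not say which technique the developers in question use, and the service, connection wrapper, and retry helper are hypothetical. The in-memory table of seen requests would also have to be stored durably to survive the abrupt restarts described above.

```python
import uuid

class CartService:
    """Toy server that deduplicates requests by client-chosen request ID."""

    def __init__(self):
        self.items = {}   # item -> quantity
        self._seen = {}   # request_id -> reply already sent for it

    def add_item(self, request_id, item, quantity):
        if request_id in self._seen:
            # Duplicate of an applied request: return the old reply.
            return self._seen[request_id]
        self.items[item] = self.items.get(item, 0) + quantity
        reply = dict(self.items)
        self._seen[request_id] = reply
        return reply

class FlakyConnection:
    """Applies the request, then loses the reply for the first few calls."""

    def __init__(self, service, failures):
        self.service = service
        self.failures = failures

    def add_item(self, request_id, item, quantity):
        reply = self.service.add_item(request_id, item, quantity)
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("connection reset before reply arrived")
        return reply

def add_item_with_retry(conn, item, quantity, attempts=5):
    request_id = str(uuid.uuid4())  # the same ID is reused on every retry
    for _ in range(attempts):
        try:
            return conn.add_item(request_id, item, quantity)
        except ConnectionError:
            continue  # safe to resend: the server ignores duplicates
    raise ConnectionError("service unavailable")

service = CartService()
print(add_item_with_retry(FlakyConnection(service, failures=2), "book", 1))
```

Without the request ID, each retry after a lost reply would add the book again; with it, the client can resend blindly.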

Against this backdrop, Hamilton pointed to the body of research on transparent task migration: technology for moving a running application from one node to another without disrupting the application or its clients. His point? Not that the work in question isn't good, hard, or publishable, but simply that cloud computing systems don't need such a mechanism: if a client can (somehow) tolerate a service being abruptly restarted, reimaged or migrated, there is no obvious value to adding "transparent online migration" to the menu of options. Hamilton sees this as analogous to the end-to-end argument: if a low level mechanism won't simplify the higher level things that use it, how can one justify the complexity and cost of the low level tool?

Interestingly, although this wasn't really our intention when we organized LADIS 2008, Byzantine Consensus turned out to be a hot topic. It was treated, at least in passing, by surprisingly many LADIS researchers in their white papers and talks. Clearly, our research community is not only interested in Byzantine Consensus, but also perceives Byzantine fault tolerance to be of value in cloud settings.

What about our keynote speakers? Well, the quick answer is that they seemed relatively uninterested in Consensus, let alone Byzantine Consensus. One could imagine many possible explanations. For example, some industry researchers might be unaware of the Consensus problem and associated theory. Such a person might plausibly become interested once they learn more about the importance of the problem. Yet this turns out not to be the case for our four keynote speakers, all of whom have surprisingly academic backgrounds, and any of whom could deliver a nuanced lecture on the state of the art in fault-tolerance. The underlying issue was quite the opposite: the speakers believe themselves to understand something we didn't understand. They had no issue with Byzantine Consensus, but it just isn't a primary question for them.

We can restate this relative to Chubby. One of the LADIS attendees commented at some point that Byzantine Consensus could be used to improve Chubby, making it tolerant of faults that could disrupt it as currently implemented. But for our keynote speakers, enhancing Chubby to tolerate such faults turns out to be of purely academic interest. The bigger, overarching challenge is to find ways of transforming services that might seem to need locking into versions that are loosely coupled and can operate correctly without locking [17], so as to get Chubby (and here we're picking on Chubby: the same goes for any synchronization protocol) off the critical path.

The principle in question was most clearly expressed by Randy Shoup, who presented the eBay system as an evolution that started with a massive parallel database, but then diverged from the traditional database model over time. As Shoup explained, to scale out, eBay services started with the steps urged by Jim Gray in his famous essay on terminology for scalable systems [13]: they partitioned the enterprise into multiple disjoint subsystems, and then used small clusters to parallelize the handling of requests within these. But this wasn't enough, Shoup argued, and eventually eBay departed from the transactional ACID properties entirely, moving towards a decentralized convergence behavior in which server nodes are (as much as possible) maintained in loosely consistent but transiently divergent states, from which they will converge back towards a consistent state over time.

Shoup argued, in effect, that scalability and robustness in cloud settings arise not from tight synchronization and fault-tolerance of the ACID type, but rather from loose synchronization and self-healing convergence mechanisms.
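
As a rough illustration of "transiently divergent states that converge over time", the sketch below has replicas accept writes locally with no coordination, then reconcile by pairwise exchange with a last-writer-wins rule. This is one simple convergence scheme chosen for illustration, not a description of eBay's mechanisms; the version scheme (a logical counter plus the replica name as tie-breaker) and all names are assumptions.

```python
class Replica:
    """Accepts writes locally; converges by merging state with peers."""

    def __init__(self, name):
        self.name = name
        self.store = {}   # key -> (version, value)
        self._clock = 0   # logical clock used to order writes

    def write(self, key, value):
        # No locking and no coordination: just stamp and store locally.
        self._clock += 1
        self.store[key] = ((self._clock, self.name), value)

    def merge_from(self, other):
        # Anti-entropy step: keep whichever version is newer for each key.
        for key, (version, value) in other.store.items():
            if key not in self.store or self.store[key][0] < version:
                self.store[key] = (version, value)
        # Advance the clock so later local writes supersede merged ones.
        self._clock = max(self._clock, other._clock)

    def view(self):
        return {key: value for key, (_, value) in self.store.items()}

a, b, c = Replica("a"), Replica("b"), Replica("c")
a.write("price", 10)
b.write("price", 12)   # concurrent, conflicting write
c.write("stock", 5)
print([r.view() for r in (a, b, c)])   # transiently divergent

for x in (a, b, c):                    # one round of pairwise exchange
    for y in (a, b, c):
        if x is not y:
            x.merge_from(y)
print([r.view() for r in (a, b, c)])   # all replicas now agree
```

The trade-off is visible in the example: one of the two concurrent price writes is silently discarded. Deciding when that is acceptable is exactly the kind of application-level question that loose consistency pushes onto developers.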

Shoup was far from the only speaker to make this point. Hamilton, for example, commented that when a Microsoft cloud computing group wants to use a strong consistency property in a service... his executive team had the policy of sending that group home to find some other way to build the service. As he explained it, one can't always completely eliminate strong ACID-style consistency properties, but the first principle of successful scalability is to batter the consistency mechanisms down to a minimum, move them off the critical path, hide them in a rarely visited corner of the system, and then make it as hard as possible for application developers to get permission to use them. As he said this, Shoup beamed: he has the same role at eBay.

The LADIS audience didn't take these "fighting words" passively. Alvisi and Guerraoui both pointed out that Byzantine fault-tolerance protocols are more and more scalable and more and more practical, citing work to optimize these protocols for high load, sustained transaction streams, and to create optimistic variants that will terminate early if an execution experiences no faults [10], [20]. Yet the keynote speakers pushed back, reiterating their points. Shoup, for example, noted that much the same can be said of modern transaction protocols: they too scale well, can sustain extremely high transaction rates, and are more and more optimized for typical execution scenarios. Indeed, these are just the kinds of protocols on which eBay depended in its early days, and that Hamilton "cut his teeth" developing at Oracle and then as a technical leader of the Microsoft SQL Server team. But for Shoup, performance isn't the reason that eBay avoids these mechanisms. His worry is that no matter how fast the protocol, it can still cause problems.

This is a surprising insight: for our research community, the prevailing assumption has been that Byzantine protocols would be used pervasively if only people understood that they no longer need to be performance-limiting bottlenecks. But Shoup's point is that eBay avoids them for a different reason. His worry involves what could be characterized as "spooking correlations" and "self-synchronization". In effect, any mechanism capable of "coupling" the behavior of multiple nodes, even loosely, would increase the risk that the whole data center might begin to thrash.

Shoup related stories about the huge effort that eBay invested to eliminate convoy effects, in which large parts of a system go idle waiting for some small number of backlogged nodes to work their way through a seemingly endless traffic jam. Then he spoke of feedback oscillations of all kinds: multicast storms, chaotic load fluctuations, thrashing. And from this, he reiterated, eBay had learned the hard way that any form of synchronization must be limited to small sets of nodes and used rarely.

In fact, the three of us are aware of this phenomenon from projects on which we've collaborated over the years. We know of many episodes in which data center operators have found their large-scale systems debilitated by internal multicast "storms" associated with publish/subscribe products that destabilized on a very large scale, ultimately solving those problems by legislating that UDP multicast would not be used as a transport. The connection? Multicast storms are another form of self-synchronizing, destructive behavior that can arise when coordinated actions (in this case, loss recovery for a reliable multicast protocol) are unleashed on a large scale.

Thus for our keynote speakers, "fear of synchronization" was an overarching consideration that, in their eyes, mattered far more than the theoretical peak performance of such-and-such an atomic multicast or Consensus protocol, Byzantine-tolerant or not. In effect, the question that mattered wasn't actually performance, but rather the risk of destabilization that even using mechanisms such as these introduces.

Reflecting on these comments, which were echoed by Cuomo and Hamilton in other contexts, we find ourselves back in that room with the elephant. Perhaps as researchers focused on the performance and scalability of multicast protocols, or Consensus, or publish/subscribe, we're in the position of mistaking the tail of the beast for the critter itself. Our LADIS keynote speakers weren't naive about the properties of the kinds of protocols on which we work. If anything, we're the ones being naive, about the setting in which those protocols are used.

To our cloud operators, the overarching goal is scalability, and they've painfully learned one overarching principle of scale: decoupling. The key is to enable nodes to quietly go about their work, asynchronously receiving streams of updates from the back-office systems, synchronously handling client requests, and avoiding even the most minor attempt to interact with, coordinate with, agree with or synchronize with other nodes. However simple or fast a consistency mechanism might be, they still view such mechanisms as potential threats to this core principle of decoupled behavior. And thus their insistence on asynchronous convergence as an alternative to stronger consistency: yes, over time, one wants nodes to be consistent. But putting consistency ahead of decoupling is, they emphasized, just wrong.

5 Towards a Cloud Computing Research Agenda

Our workshop may have served to deconstruct some aspects of the traditional research agenda, but it also left us with elements of a new agenda, and one not necessarily less exciting than the one we are being urged by these leaders to shift away from. Some of the main research themes that emerge are:

1. Power management. Hamilton was particularly emphatic on this topic, arguing that a ten-fold reduction in the power needs of data centers may be possible if we can simply learn to build systems that are optimized with power management as their primary goal, and that this savings opportunity may be the most exciting way to have impact today [14].
Examples of ideas that Hamilton floated were:

   - Explore ways to simply do less during surge load periods.
   - Explore ways to migrate work in time. The point here was that load on modern cloud platforms is very cyclical, with infrequent peaks and deep valleys. It turns out that the need to provide acceptable quality of service during the peaks inflates costs continuously: even valley time is made more expensive by the need to own a power supply able to handle the peaks, a number of nodes adequate to handle surge loads, a network provisioned for worst-case demand, etc. Hamilton suggested that rather than think about task migration for fault-tolerance (a topic mentioned above), we should be thinking about task decomposition with the goal of moving work from peak to trough. Hamilton's point was that in a heavily loaded data center coping with a 95% peak load, surprisingly little is really known about the actual tasks being performed. As in any system, a few tasks probably represent the main load, so one could plausibly learn a great deal, perhaps even automatically. Having done this, one could attack those worst-case offenders. Maybe they can precompute some data, or defer some work to be finished up later, when the surge has ended. The potential seems to be very great, and the topic largely unexplored.
   - Even during surge loads, some machines turn out to be very lightly loaded. Hamilton argued that if one owns a node, it should do its share of the work. This argues for migrating portions of some tasks in space: breaking overloaded services into smaller components that can operate in parallel and be shifted around to balance load on the overall data center. Here, Hamilton observed that we lack software engineering solutions aimed at making it easy for the data center development team to delay these decisions until late in the game. After all, when building an application it may not be at all clear that, three years down the road, the application will account for most of the workload during surge loads that in turn account for most of the cost of the data center. Thus, long after an application is built, one needs ways to restructure it with power management as a goal.

- New models and protocols for convergent consistency, perhaps along the lines of [28] or [5]. As noted earlier, Shoup energetically argued against traditional consistency mechanisms related to the ACID properties, and grouped Consensus into this technology area. But it was not so clear to us what alternative eBay would prefer, and in fact we see this as a research opportunity.

   - We need to adapt existing models for convergent behavior (self-stabilization, perhaps, or the forms of probabilistic convergence used in some gossip protocols) to create a formal model that could capture the desired behavior of loosely coupled systems. Such a model would let us replace "loose consistency" with strong statements about precisely when a system is indeed loosely consistent, and when it is merely broken!
   - We need a proof methodology and metrics for comparison, so that when distinct teams solve this new problem statement, we can convince ourselves that the solutions really work and compare their costs, performance, scalability and other properties.
   - Conversely, the Byzantine Consensus community has value on the table that one would not wish to sweep to the floor. Consider the recent, highly publicized Amazon.com outage in which that company's S3 storage system was disabled for much of a day when a corrupted value slipped into a gossip-based subsystem and was then hard to eliminate without fully restarting the subsystem. Because that subsystem is needed by much of Amazon, the restart forced Amazon to basically shut down and restart. The Byzantine community would be justified, we think, in arguing that this example illustrates not just a weakness in loose consistency, but also a danger associated with working in a model that has never been rigorously specified. It seems entirely feasible to import ideas from Byzantine Consensus into a world of loose consistency; indeed, one can imagine a system that achieves "eventual Byzantine Consensus." One of the papers at LADIS (Rodrigues et al. [26], [27]) presented a specification of exactly such a service. Such steps could be fertile areas for further study: topics close enough to today's hot areas to publish upon, and yet directly relevant to cloud computing.

2. Not enough is known about stability of large-scale event notification platforms, management technologies, or other cloud computing solutions. As we scale these kinds of tools to encompass hundreds or thousands of nodes spread over perhaps tens of data centers, worldwide, we as researchers can't help but be puzzled: how do our solutions work today, in such settings?

   - Very large-scale eventing deployments are known to be prone to destabilizing behavior, a communications-level equivalent of thrashing. Not known are the conditions that trigger such thrashing, the best ways to avoid it, the general styles of protocols that might be inherently robust or inherently fragile, etc.
   - Not very much is known about testing protocols to determine their scalability. If we invent a solution, how can we demonstrate its relevance without first taking leave of our day jobs and signing on at Amazon, Google, MSN or Yahoo? Today, realistically, it seems nearly impossible to validate scalable protocols without working at some company that operates a massive but proprietary infrastructure.
   - Another emerging research direction looks into studying subscription patterns exhibited by the nodes participating in a large-scale publish/subscribe system. Researchers (including the authors of this article) are finding that in real-world workloads, the subscription patterns associated with individual nodes are highly correlated, forming clusters of nearly identical or highly similar subscriptions. These structures can be discovered and exploited (through, e.g., overlay network clustering [9], [16], or channelization [31]). LADIS researchers reported on opportunities to amortize message dissemination costs by aggregating multiple topics and nodes, with the potential of dramatically improving scalability and stability of a pub/sub system.
3. Our third point leads to an idea that Mahesh Balakrishnan has promoted: we should perhaps begin to treat virtualization as a first-class research topic even with respect to seemingly remote questions such as the scalability of an eventing solution or tolerating Byzantine failures. The point Mahesh makes runs roughly as follows: For reasons of cost management and platform management, the data center of the future seems likely to be fully virtualized. Today, one assumes that it makes no sense to talk about a scalable protocol that was actually evaluated on 200 virtual nodes hosted on 4 physical ones: one presumes that internal scheduling and contention effects could be more dominant than the scalability of the protocols per se. But perhaps tomorrow, it will make no sense to talk about protocols that aren't designed for virtualized settings in which nodes will often be co-located. After all, if Hamilton is right and cost factors will dominate all other decisions in all situations, how could this not be true for nodes too? Are there deep architectural principles waiting to be uncovered (perhaps even entirely new operating systems or virtualization architectures) when one thinks about support for massively scalable protocols running in such settings?
4. In contrast to enterprise systems, the only economically sustainable way of supporting Internet-scale services is to employ a huge hardware base consisting entirely of cheap off-the-shelf hardware components, such as low-end PCs and network switches. As Hamilton pointed out, this reflects simple economies of scale: i.e., it is much cheaper to obtain the necessary computational and storage power by putting together a bunch of inexpensive PCs than to invest in high-end enterprise-level equipment, such as a mainframe. This trend has important architectural implications for cloud platform design: scalability emerges as a crosscutting concern affecting all the building blocks used in cloud settings (and not restricted to those requiring strong consistency). Those blocks should be either redesigned with scalability in mind (e.g., by using peer-to-peer techniques and/or dynamically adjustable partitioning), or replaced with new middleware abstractions known to perform well when scaled out.
As we scale systems up, sheer numbers confront us with a growing frequency of faults within the cloud platform as a whole. Consequently, cloud services must be designed under the assumption that they will experience frequent and often unpredictable failures. Services must recover from failures autonomously (without human intervention), and this implies that cloud computing platforms must offer standard, simple and fast recovery procedures [17]. We pointed to a seeming connection to recovery oriented computing (ROC) [24], yet ROC was proposed in much smaller scale settings. A rigorously specified, scalable form of ROC is very much needed.
Those of us who design protocols for cloud settings may need to think hard about churn and handling of other forms of sudden disruptions, such as sudden load surges. Existing protocols are too often prone to destabilized behaviors such as oscillation, and this may prevent their use in large data centers, where such events run the risk of disrupting even applications that don't use those protocols directly.
We could go on at some length, but these points already touch on the highlights we gleaned from the LADIS workshop. Clearly, cloud computing is here to stay, and poses tremendously interesting research questions and opportunities. The distributed systems community, up until now at least, owns just a portion of this research space (indeed, some of the topics mentioned above are entirely outside of our area, or at best tangential).
6 LADIS 2009
In conclusion, LADIS 2008 seems to have been an unqualified success, and indeed, a far more thought-provoking workshop than we three have attended in some time. The key was that LADIS generated spirited dialog between distributed systems researchers and practitioners, but also that the particular practitioners who participated shared so much of our background and experience. When researchers and system builders meet, there is often an impedance mismatch, but in the case of LADIS 2008 we managed to fill a room with people who share a common background and way of thinking, and yet see the cloud computing challenge from very distinct perspectives.
LADIS 2009 is now being planned, running just before the ACM Symposium on Operating Systems Principles (SOSP) in October 2009, at Big Sky Resort in Montana. In contrast to the two previous workshops, this year's event is officially sponsored by ACM SIGOPS, and the papers are being solicited through both an open Call For Papers and targeted solicitation. If SOSP 2009 isn't already enough of an attraction, we would hope that readers of this essay might consider LADIS 2009 to be absolutely irresistible! You are most cordially invited to submit a paper and attend the workshop. More information about the upcoming LADIS workshop can be found on its website (footnote 5) and in the call for papers (footnote 6). The first LADIS was held at IBM Haifa Research Lab in 2007 (footnote 7).
References
[1] M. K. Aguilera, A. Merchant, M. Shah, A. Veitch, and C. Karamanolis. Sinfonia: a new paradigm for building scalable distributed systems. In Proceedings of SOSP '07, pages 159–174, Stevenson, WA, 2007.
[2] Amazon.com. Amazon simple storage service (Amazon S3). 2009. http://aws.amazon.com/s3.
[3] Apache.org. HDFS architecture. 2009. http://hadoop.apache.org/core/docs/current/hdfs_design.html.
[4] M. Burrows. The Chubby lock service for loosely-coupled distributed systems. In OSDI '06: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Seattle, WA, 2006. USENIX Association.
Footnotes:
5 http://www.sigops.org/sosp/sosp09/workshops/cfp/ladis09/cfp.pdf
6 http://www.cs.cornell.edu/projects/ladis2009
7 http://www.haifa.ibm.com/conferences/ladis2007/index.html
[5] C. Cachin, I. Keidar, and A. Shraer. Fail-aware untrusted storage. In DSN '09: the thirty-ninth Annual International Conference on Dependable Systems and Networks, Lisbon, Portugal, 2009. IEEE/IFIP.
[6] M. Castro and B. Liskov. Practical Byzantine Fault Tolerance. In OSDI '99: Proceedings of the third Symposium on Operating Systems Design and Implementation, pages 173–186, New Orleans, LA, 1999. USENIX Association.
[7] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. In OSDI '06: Seventh Symposium on Operating System Design and Implementation, Seattle, WA, November 2006.
[8] G. Chockler, I. Keidar, and R. Vitenberg. Group communication specifications: a comprehensive study. ACM Computing Surveys, 33(4):427–469, 2001.
[9] G. Chockler, R. Melamed, Y. Tock, and R. Vitenberg. SpiderCast: a scalable interest-aware overlay for topic-based pub/sub communication. In DEBS '07: Proceedings of the 2007 inaugural International Conference on Distributed Event-Based Systems, pages 14–25, Toronto, ON, 2007.
[10] A. Clement, M. Marchetti, E. Wong, L. Alvisi, and M. Dahlin. BFT: the time is now. In LADIS '08 [21]. http://doi.acm.org/10.1145/1529974.1529992.
[11] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. In OSDI '04: Proceedings of the 6th Symposium on Operating Systems Design and Implementation, San Francisco, CA, 2004. USENIX Association.
[12] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In Proceedings of SOSP '07, pages 205–220, Stevenson, WA, 2007.
[13] B. Devlin, J. Gray, B. Laing, and G. Spix. Scalability terminology: Farms, clones, partitions, and packs: RACS and RAPS. Technical Report MS-TR-99-85, Microsoft Research, 1999. ftp://ftp.research.microsoft.com/pub/tr/tr-99-85.doc.
[14] X. Fan, W.-D. Weber, and L. A. Barroso. Power provisioning for a warehouse-sized computer. In ISCA '07: Proceedings of the 34th annual International Symposium on Computer Architecture, pages 13–23, San Diego, CA, 2007. ACM.
[15] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In Proceedings of SOSP '03, pages 29–43, Bolton Landing, NY, 2003. ACM.
[16] S. Girdzijauskas, G. Chockler, R. Melamed, and Y. Tock. Gravity: An interest-aware publish/subscribe system based on structured overlays (Fast Abstract). In DEBS '08: The 2nd International Conference on Distributed Event-Based Systems, Rome, Italy, 2008.
[17] J. Hamilton. On designing and deploying Internet-scale services. In LISA '07: Proceedings of the 21st conference on Large Installation System Administration, pages 1–12, Dallas, TX, 2007. USENIX Association.
[18] Danga Interactive. memcached: a distributed memory object caching system. 2009. http://www.danga.com/memcached.
[19] J. Kirsch and Y. Amir. Paxos for system builders: An overview. In LADIS '08 [21]. http://doi.acm.org/10.1145/1529974.1529979.
[20] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: speculative Byzantine fault tolerance. In Proceedings of SOSP '07, pages 45–58, Stevenson, WA, 2007. ACM.
[21] LADIS '08: Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware. Yorktown Heights, NY, USA, 2008. ACM. http://doi.acm.org/10.1145/1529974.
[22] L. Lamport. The part-time parliament. ACM Transactions on Computer Systems, 16(2):133–169, 1998.
[23] J. MacCormick, N. Murphy, M. Najork, C. A. Thekkath, and L. Zhou. Boxwood: abstractions as the foundation for storage infrastructure. In OSDI '04: Proceedings of the 6th Symposium on Operating Systems Design and Implementation, San Francisco, CA, 2004. USENIX Association.
[24] D. Patterson. Recovery Oriented Computing. 2009. http://roc.cs.berkeley.edu.
[25] B. Reed and F. P. Junqueira. A simple totally ordered broadcast protocol. In LADIS '08 [21]. http://doi.acm.org/10.1145/1529974.1529978.
[26] A. Singh, P. Fonseca, P. Kuznetsov, R. Rodrigues, and P. Maniatis. Defining weakly consistent Byzantine fault-tolerant services. In LADIS '08 [21]. http://doi.acm.org/10.1145/1529974.1529990.
[27] A. Singh, P. Fonseca, P. Kuznetsov, R. Rodrigues, and P. Maniatis. Zeno: Eventually consistent Byzantine fault tolerance. In NSDI '09: Proceedings of USENIX Networked Systems Design and Implementation, Boston, MA, 2009. USENIX Association.
[28] D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser. Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proceedings of SOSP '95, pages 172–182, Copper Mountain, CO, 1995. ACM.
[29] R. Van Renesse, K. P. Birman, and W. Vogels. Astrolabe: A robust and scalable technology for distributed systems monitoring, management, and data mining. ACM Transactions on Computer Systems, 21(3), May 2003.
[30] R. Van Renesse, D. Dumitriu, V. Gough, and C. Thomas. Efficient reconciliation and flow control for anti-entropy protocols. In LADIS '08 [21]. http://doi.acm.org/10.1145/1529974.1529983.
[31] Y. Vigfusson, H. Abu-Libdeh, M. Balakrishnan, K. Birman, and Y. Tock. Dr. Multicast: Rx for data-center communication scalability. In HotNets VII: Seventh ACM Workshop on Hot Topics in Networks. ACM, 2008.
[32] P. Yalagandula and M. Dahlin. A Scalable Distributed Information Management System. In Proceedings of SIGCOMM '04, Portland, OR, August 2004. ACM.
[33] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of OSDI '08, San Diego, CA, December 2008. http://research.microsoft.com/en-us/projects/DryadLINQ.
Trusting the Cloud
Christian Cachin, IBM Research, Zurich, Switzerland, cca@zurich.ibm.com
Idit Keidar, Technion, Haifa, Israel, idish@ee.technion.ac.il
Alexander Shraer, Technion, Haifa, Israel, shralex@tx.technion.ac.il
Abstract
More and more users store data in "clouds" that are accessed remotely over the Internet. We survey well-known cryptographic tools for providing integrity and consistency for data stored in clouds and discuss recent research in cryptography and distributed computing addressing these problems.
Storing data in clouds
Many providers now offer a wide variety of flexible online data storage services, ranging from passive ones, such as online archiving, to active ones, such as collaboration and social networking. They have become known as computing and storage "clouds." Such clouds allow users to abandon local storage and use online alternatives, such as Amazon S3, Nirvanix CloudNAS, or Microsoft SkyDrive. Some cloud providers utilize the fact that online storage can be accessed from any location connected to the Internet, and offer additional functionality; for example, Apple MobileMe allows users to synchronize common applications that run on multiple devices. Clouds also offer computation resources, such as Amazon EC2, which can significantly reduce the cost of maintaining such resources locally. Finally, online collaboration tools, such as Google Apps or versioning repositories for source code, make it easy to collaborate with colleagues across organizations and countries, as practiced by the authors of this paper.
What can go wrong?
Although the advantages of using clouds are unarguable, there are many risks involved with releasing control over your data. One concern that many users are aware of is loss of privacy. Nevertheless, the popularity of social networks and online data sharing repositories suggests that many users are willing to forfeit privacy, at least to some extent. Setting privacy aside, in this article we survey what else "can go wrong" when your data is stored in a cloud.
Availability is a major concern with any online service, as such services are bound to have some downtime. This was recently the case with Google Mail (footnote 1), Hotmail (footnote 2), Amazon S3 (footnote 3) and MobileMe (footnote 4). Users must also understand their service contract with the storage provider. For example, what happens if your payment for the storage is late? Can the storage provider decide that one of your documents violates its policy and terminate your service, denying you access to the data? Even the worst scenarios sometimes come true: a cloud storage provider named LinkUp (MediaMax) went out of business last year after losing 45% of stored client data due to an error of a system administrator (footnote 5). This incident also revealed that it is sometimes very costly for storage providers to keep storing old client data, and they look for ways to offload this responsibility to a third party. Can a client make sure that his data is safe and available?
No less important is guaranteeing the integrity of remotely stored data. One risk is that data can be damaged while in transit to or from the storage provider. Additionally, cloud storage, like any remote service, is exposed to malicious attacks from both outside and inside the provider's organization. For example, the servers of the Red Hat Linux distribution were recently attacked and the intruder managed to introduce a vulnerability and even sign some packages of the Linux operating-system distribution (footnote 6). In its Security Advisory about the incident, Red Hat stated:
    ... we remain highly confident that our systems and processes prevented the intrusion from compromising RHN or the content distributed via RHN and accordingly believe that customers who keep their systems updated using Red Hat Network are not at risk.
Unauthorized access to user data can occur even when no hackers are involved, e.g., resulting from a software malfunction at the provider. Such a data breach occurred in Google Docs (footnote 7) during March 2009 and led the Electronic Privacy Information Center to petition (footnote 8) the Federal Trade Commission, asking it to "open an investigation into Google's Cloud Computing Services, to determine the adequacy of the privacy and security safeguards...". Another example, where data integrity was compromised as a result of provider malfunctions, is a recent incident with Amazon S3, where users experienced silent data corruption (footnote 9). Later Amazon stated in response to user complaints (footnote 10):
    We've isolated this issue to a single load balancer that was brought into service at 10:55pm PDT on Friday, 6/20. It was taken out of service at 11am PDT Sunday, 6/22. While it was in service it handled a small fraction of Amazon S3's total requests in the US. Intermittently, under load, it was corrupting single bytes in the byte stream... Based on our investigation with both internal and external customers, the small amount of traffic received by this particular load balancer, and the intermittent nature of the above issue on this one load balancer, this appears to have impacted a very small portion of PUTs during this time frame.
A further complication arises when multiple users collaborate using cloud storage (or simply when one user synchronizes multiple devices). Here, consistency under concurrent access must be guaranteed.
A possible solution that comes to mind is using a Byzantine fault-tolerant replication protocol within the cloud (e.g., [14]); indeed this solution can provide perfect consistency and at the same time prevent data corruption caused by some threshold of faulty components within the cloud. However, since it is reasonable to assume that most of the servers belonging to a particular cloud provider run the same system installation and are most likely to be physically located in the same place (or even run on the same machine), such protocols might be inappropriate. Moreover, cloud-storage providers might have other reasons to avoid Byzantine fault-tolerant consensus protocols, as explained by Birman et al. [3]. Finally, even if this solves the problem from the perspective of the storage provider, here we are more interested in the users' perspective. A user perceives the cloud as a single trust domain and puts trust in it, whatever the precautions taken by the provider internally might be; in this sense, the cloud is not different from a single remote server. Note that when multiple clouds from different providers are used, running Byzantine-fault-tolerant protocols across several clouds might be appropriate (see next section).
Footnotes:
1 http://googleblog.blogspot.com/2009/02/current-gmail-outage.html
2 http://www.datacenterknowledge.com/archives/2009/03/12/downtime-for-hotmail
3 http://status.aws.amazon.com/s3-20080720.html
4 http://blogs.zdnet.com/projectfailures/?p=908
5 http://blogs.zdnet.com/projectfailures/?p=999
6 https://rhn.redhat.com/errata/RHSA-2008-0855.html
7 http://blogs.wsj.com/digits/2009/03/08/1214/
8 http://cloudstoragestrategy.com/2009/03/trusting-the-cloud-the-ftc-and-google.html
9 http://blogs.sun.com/gbrunett/entry/amazon_s3_silent_data_corruption
10 http://developer.amazonwebservices.com/connect/thread.jspa?threadID=22709
What can we do?
Users can locally maintain a small amount of trusted memory and use well-known cryptographic methods in order to significantly reduce the need for trust in the storage cloud. A user can verify the integrity of his remotely stored data by keeping a short hash in local memory and authenticating server responses by re-calculating the hash of the received data and comparing it to the locally stored value. When the volume of data is large, this method is usually implemented using a hash tree [25], where the leaves are hashes of data blocks, and internal nodes are hashes of their children in the tree. A user is then able to verify any data block by storing only the root hash of the tree corresponding to his data [4]. This method requires a logarithmic number of cryptographic operations in the number of blocks, as only one branch of the tree from the root to the hash of an actual data block needs to be checked. Hash trees have been employed in many storage-system prototypes (TDB [22] and SiRiUS [13] are just two examples) and are used commercially in the Solaris ZFS file system (footnote 11). Research on efficient cryptographic methods for authenticating data stored on servers is an active area [26, 28].
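The hash-tree check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the construction used by any of the cited systems: SHA-256 is assumed as the hash function, an odd-sized level is padded by repeating its last hash, and the helper names (`build_levels`, `prove`, `verify`) are hypothetical. The client keeps only `root`; the untrusted server would hold the blocks and produce the proof.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash function used for both leaves and internal nodes (assumed: SHA-256)."""
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return every level of the tree, leaves first; the last level is [root]."""
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur.append(cur[-1])  # pad odd levels so every node has a sibling
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def prove(levels, index):
    """Collect the sibling hashes on the branch from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(root, block, path):
    """Recompute the branch from the received block and compare with the root."""
    digest = h(block)
    for sibling, sibling_is_left in path:
        digest = h(sibling + digest) if sibling_is_left else h(digest + sibling)
    return digest == root


blocks = [b"block-%d" % i for i in range(8)]
levels = build_levels(blocks)
root = levels[-1][0]            # the only value the client must keep locally
proof = prove(levels, 5)        # in practice computed by the storage server
ok = verify(root, blocks[5], proof)
tampered = verify(root, b"corrupted", proof)
```

For 8 blocks the proof holds 3 sibling hashes, which matches the logarithmic cost mentioned in the text.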
Although these methods permit a user to verify the integrity of data returned by a server, they do not allow a user to ascertain that the server is able to answer a query correctly without actually issuing that particular query. In other words, they do not assure the user that all the data is "still there". As the amount of data stored by the cloud for a client can be enormous, it is impractical (and might also be very costly) to retrieve all the data, if one's purpose is just to make sure that it is stored correctly. In recent work, Juels and Kaliski [18] and Ateniese et al. [2] introduced protocols for assuring a client that his data is retrievable with high probability, under the name of Proofs of Retrievability (PORs) and Proofs of Data Possession (PDP), respectively. They incur only a small, nearly constant overhead in communication complexity and some computational overhead by the server. The basic idea in such protocols is that additional information is encoded in the data prior to storing it. To make sure that the server really stores the data, a user submits challenges for a small sample of data blocks, and verifies server responses using the additional information encoded in the data. Recently, some improved schemes have been proposed and prototype systems have been implemented [29, 6, 5].
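The challenge-and-sample idea can be illustrated with a deliberately simplified Python sketch. It is not the POR or PDP construction of [18] or [2]: those schemes return compact proofs, whereas here the server simply returns the sampled blocks together with per-block HMAC tags that the client attached before upload. The `Server` class, the tag format, and the function names are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets


def tag(key: bytes, index: int, block: bytes) -> bytes:
    """Additional information encoded with each block: a MAC over (index, block)."""
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


class Server:
    """Stand-in for the untrusted storage provider (hypothetical interface)."""

    def __init__(self):
        self.store = {}

    def put(self, index, block, block_tag):
        self.store[index] = (block, block_tag)

    def respond(self, indices):
        return [self.store.get(i) for i in indices]


def upload(key, blocks, server):
    for i, block in enumerate(blocks):
        server.put(i, block, tag(key, i, block))


def audit(key, n_blocks, server, sample_size, rng):
    """Challenge a random sample of block indices and check every answer."""
    indices = rng.sample(range(n_blocks), sample_size)
    for i, answer in zip(indices, server.respond(indices)):
        if answer is None:
            return False
        block, block_tag = answer
        if not hmac.compare_digest(block_tag, tag(key, i, block)):
            return False
    return True


key = secrets.token_bytes(32)          # kept in the client's trusted memory
blocks = [b"data-%d" % i for i in range(16)]
server = Server()
upload(key, blocks, server)
rng = secrets.SystemRandom()

honest = audit(key, len(blocks), server, 4, rng)

# Simulate corruption of one block, then challenge every block so that the
# outcome of this demonstration does not depend on the random sample.
server.store[3] = (b"garbage", server.store[3][1])
cheating = audit(key, len(blocks), server, len(blocks), rng)
```

With a sample of k blocks and a fraction p of the blocks missing or corrupted, an audit of this kind detects the problem with probability of at least 1 - (1 - p)^k, which is why a small sample already gives high confidence against large-scale loss.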
The above tools allow a single user to verify the integrity and availability of his own data. But when multiple users access the same data, they cannot guarantee integrity between a writer and multiple readers. Digital signatures may be used by a client to verify integrity of data created by others. Using this method, each client needs to sign all his data, as well as to store an authenticated public key of the others or the root certificate of a public-key infrastructure in trusted memory. This method, however, does not rule out all attacks by a faulty or malicious storage service. Even if all data is signed during write operations, the server might omit the latest update when responding to a reader, and even worse, it might "split its brain," hiding updates of different clients from each other. Some solutions use trusted components in the system [11, 31] which allow clients to audit the server, guaranteeing atomicity even if the server is faulty. Without additional trust assumptions, the atomicity of all operations in the sense of linearizability [16] cannot be guaranteed; in fact, even weaker consistency notions, like sequential consistency [19], are not possible either [9]. Though a user may become suspicious when he does not see any updates from a collaborator, the user can only be certain that the server is not holding back information by communicating with the collaborator directly; such user-to-user communication is indeed employed in some systems for this purpose.
Footnote:
11 http://blogs.sun.com/bonwick/entry/zfs_end_to_end_data
If not atomicity, then what consistency can be guaranteed to clients? The first to address this problem were Mazières and Shasha [24], who defined a so-called forking consistency condition. This condition ensures that if certain clients' perception of the execution becomes different, for example if the server hides a recent value of a completed write from a reader, then these two clients will never again see each other's newer operations, or else the server will be exposed as faulty. This prevents a situation where one user sees part of the updates issued by another user, and the server can choose which ones. Moreover, fork-consistency prevents Alice from seeing new updates by Bob and by Carol, while Bob sees only Alice's updates, where Alice and Bob might think they are mutually consistent, though they actually see different states. Essentially, with fork consistency, each client has a linearizable view of a sub-sequence of the execution, and client views can only become disjoint once they diverge from a common prefix; a simple definition can be found in [7]. The first protocol of this kind, realizing fork-consistent storage, was implemented in the SUNDR system [20].
To save cost and to improve performance, several weaker consistency conditions have been proposed. The notion of fork-sequential-consistency, introduced by Oprea and Reiter [27], allows client views to violate real-time order of the execution. The fork-* consistency condition due to Li and Mazières [21] allows the views of clients to include one more operation without detecting an attack after their views have diverged. This condition was used to provide meaningful service in a Byzantine-fault-tolerant replicated system, even when more than a third of the replicas are faulty [21].
Although consistency in the face of failures is crucial, it is no less important that the service is unaffected in the common case by the precautions taken to defend against a faulty server. In recent work [8, 7], we show that for all previously existing forking consistency conditions, and thus in the protocols that implement them with a single remote server, concurrent operations by different clients may block each other even if the provider is correct. More formally, these consistency conditions do not allow for protocols that are wait-free [15] when the storage provider is correct. We have also introduced a new consistency notion, called weak fork-linearizability, that does not suffer from this limitation, and yet provides meaningful semantics to clients [7].
One disadvantage of forking consistency conditions is that they are not as intuitive to understand as atomicity, for example. Aiming to provide simpler guarantees, we have introduced the notion of a Fail-Aware Untrusted Service [7]. Its basic idea is that each user should know which of his operations are seen consistently by each of the other users, and in addition, find out whenever the server violates atomicity. When all goes well, each operation of a user eventually becomes "stable" with respect to every other correct user, in the sense that they have a common view of the execution up to this operation. Thus, in all cases, users get either positive notifications indicating operation stability, or negative notifications when the server violates atomicity. Our Fail-Aware Untrusted Services rely on the well-established notions of eventual consistency [30] and fail-awareness [12], and adapt them to this setting. The FAUST protocol [7] implements this notion for a storage service, using an underlying weak fork-linearizable storage protocol. Intuitively, FAUST indicates stability as soon as additional information is gathered, either through the storage protocol, or whenever the clients communicate directly. However, all complete operations, even those not yet known to be stable, preserve causality [17]. Moreover, when the storage server is correct, FAUST guarantees strong safety (linearizability) and liveness (wait-freedom).
Obviously, if the cloud provider violates its specification or simply does not respond, not much can be done other than detecting this and taking one's business elsewhere in the future. It is, however, possible to be more prudent, and use multiple cloud providers from the outset, and here one can benefit from the fruitful research on Byzantine-fault-tolerant protocols. One possibility is running Byzantine-fault-tolerant state-machine replication, where each cloud maintains a single replica [10, 14]. This approach, however, requires computing resources within the cloud, as provided, e.g., by Amazon EC2, and not only storage. When only a simple storage interface is available, one can work with Byzantine Quorum Systems [23], e.g., by using Byzantine Disk Paxos [1]. However, in order to guarantee the atomicity of user operations and to tolerate the failure of one cloud, such protocols must employ at least four different clouds.
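The figure of four clouds follows from the usual resilience bound of n >= 3f + 1 replicas for tolerating f Byzantine faults. The short Python sketch below works through that arithmetic under two assumptions that are not spelled out in the text: any two quorums must overlap in at least f + 1 clouds (so the overlap contains at least one correct cloud), and a full quorum must still be reachable when f clouds do not respond. The function names are hypothetical.

```python
def quorum_size(n: int, f: int) -> int:
    """Smallest q with 2q - n >= f + 1, i.e. any two quorums share f + 1 clouds."""
    return (n + f + 2) // 2  # integer form of ceil((n + f + 1) / 2)


def is_feasible(n: int, f: int) -> bool:
    """A quorum must remain available even if f clouds stay silent."""
    return quorum_size(n, f) <= n - f


def min_clouds(f: int) -> int:
    """Search for the smallest number of clouds that tolerates f faulty ones."""
    n = f + 1
    while not is_feasible(n, f):
        n += 1
    return n


one_fault = min_clouds(1)
two_faults = min_clouds(2)
```

With three clouds and one possible fault, a quorum would need all three clouds, so a single unresponsive cloud would block every operation; with four clouds, quorums of three suffice.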
Summary
Though clouds are becoming increasingly popular, we have seen that some things can "go wrong" when one trusts a cloud provider with his data. Providing defenses for these is an active area of research. We presented a brief survey of solutions being proposed in this context. Nevertheless, these solutions are, at this point in time, academic. There are still questions regarding how well these protections can work in practice, and moreover, how easy-to-use they can be. Finally, we have yet to see how popular storing data in clouds will become, and what protections users will choose to use, if any.
References
[1] I. Abraham, G. Chockler, I. Keidar, and D. Malkhi. Byzantine disk Paxos: Optimal resilience with Byzantine shared memory. Distributed Computing, 18(5):387–408, 2006.
[2] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song. Provable data possession at untrusted stores. In Proc. ACM CCS, pages 598–609, 2007.
[3] K. Birman, G. Chockler, and R. van Renesse. Towards a cloud computing research agenda. SIGACT News, 40(2), June 2009.
[4] M. Blum, W. Evans, P. Gemmell, S. Kannan, and M. Naor. Checking the correctness of memories. Algorithmica, 12:225–244, 1994.
[5] K. D. Bowers, A. Juels, and A. Oprea. HAIL: A high-availability and integrity layer for cloud storage. Cryptology ePrint Archive, Report 2008/489, 2008. http://eprint.iacr.org/.
[6] K. D. Bowers, A. Juels, and A. Oprea. Proofs of retrievability: Theory and implementation. Cryptology ePrint Archive, Report 2008/175, 2008. http://eprint.iacr.org/.
[7] C. Cachin, I. Keidar, and A. Shraer. Fail-aware untrusted storage. In Proc. DSN 2009, to appear. Full paper available as Tech. Report CCIT 712, Department of Electrical Engineering, Technion, Dec. 2008.
[8] C. Cachin, I. Keidar, and A. Shraer. Fork sequential consistency is blocking. IPL, 109(7), 2009.
[9] C. Cachin, A. Shelat, and A. Shraer. Efficient fork-linearizable access to untrusted shared memory. In Proc. PODC, pages 129–138, 2007.
[10] M. Castro and B. Liskov. Practical Byzantine fault tolerance. In Proc. OSDI, pages 173–186, 1999.
[11] B.-G. Chun, P. Maniatis, S. Shenker, and J. Kubiatowicz. Attested append-only memory: Making adversaries stick to their word. In Proc. SOSP, pages 189–204, 2007.
[12] C. Fetzer and F. Cristian. Fail-awareness in timed asynchronous systems. In Proc. PODC, 1996.
[13] E.-J. Goh, H. Shacham, N. Modadugu, and D. Boneh. SiRiUS: Securing remote untrusted storage. In Proc. NDSS, 2003.
[14] J. Hendricks, G. R. Ganger, and M. K. Reiter. Low-overhead Byzantine fault-tolerant storage. In Proc. SOSP, 2007.
[15] M. Herlihy. Wait-free synchronization. ACM TOPLAS, 13(1), 1991.
[16] M. P. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM TOPLAS, 12(3), 1990.
[17] P. W. Hutto and M. Ahamad. Slow memory: Weakening consistency to enhance concurrency in distributed shared memories. In Proc. ICDCS, 1990.
[18] A. Juels and B. S. Kaliski Jr. PORs: proofs of retrievability for large files. In Proc. ACM CCS, pages 584–597, 2007.
[19] L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., 28(9):690–691, 1979.
[20] J. Li, M. Krohn, D. Mazières, and D. Shasha. Secure untrusted data repository (SUNDR). In Proc. OSDI, 2004.
[21] J. Li and D. Mazières. Beyond one-third faulty replicas in Byzantine fault tolerant systems. In Proc. NSDI, 2007.
[22] U. Maheshwari, R. Vingralek, and W. Shapiro. How to build a trusted database system on untrusted storage. In Proc. OSDI, 2000.
[23] D. Malkhi and M. K. Reiter. Byzantine quorum systems. Distributed Computing, 11(4):203–213, 1998.
[24] D. Mazières and D. Shasha. Building secure file systems out of Byzantine storage. In Proc. PODC, 2002.
[25] R. C. Merkle. Protocols for public key cryptosystems. In IEEE Symposium on Security and Privacy, pages 122–134, 1980.
[26] E. Mykletun, M. Narasimha, and G. Tsudik. Authentication and integrity in outsourced databases. Trans. Storage, 2(2):107–138, 2006.
[27] A. Oprea and M. K. Reiter. On consistency of encrypted files. In Proc. DISC, 2006.
[28] C. Papamanthou, R. Tamassia, and N. Triandopoulos. Authenticated hash tables. In Proc. ACM CCS, pages 437–448, 2008.
[29] H. Shacham and B. Waters. Compact proofs of retrievability. In J. Pieprzyk, editor, Proceedings of Asiacrypt 2008, volume 5350 of LNCS, pages 90–107. Springer-Verlag, Dec. 2008.
[30] D. B. Terry, M. Theimer, K. Petersen, A. J. Demers, M. Spreitzer, and C. Hauser. Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proc. SOSP, 1995.
[31] A. R. Yumerefendi and J. S. Chase. Strong accountability for network storage. ACM Transactions on Storage, 3(3), 2007.
Open-Source Grid Technologies for Web-Scale Computing
Edward Bortnikov, Yahoo! Research, ebortnik@yahoo-inc.com
1 Introduction
Analyzing web-scale datasets has become a key routine for all leading Internet companies. For example, machine-learning of search relevance results from web query logs provides vital feedback for improving the quality of Internet search. Consider, for example, the task of ingesting the events generated by online users. Billions of interesting events (e.g., web queries and ad clicks) happening daily translate to multi-terabyte data collections. Real-time capturing, storage, and analysis of this data are common needs of all high-end online applications.
Grid computing technologies that emerged in recent years (e.g., [5, 16, 17, 20, 21]) address these requirements, thus enabling data-intensive supercomputing at web scale. They allowed establishing data centers with hundreds of thousands of CPU cores, terabytes of RAM, and petabytes of disk space (e.g., [1]), in which multiple data processing applications share a common infrastructure. Typically, these data centers harness commodity hardware: off-the-shelf PCs with directly-attached storage (footnote 1). Grid computing software lets developers easily write, deploy, and run data-intensive applications, which commonly require:
1. Storage management of petabytes.
2. Parallel high-speed access to the stored data.
3. Reliability of (1) and (2) in the face of hardware, software, and networking failures.
While the first generation of grid middleware was mainly proprietary (e.g., [17, 21]), the open-source community is rapidly catching up with its own technology, Apache Hadoop [5], which allows much wider exposure and faster innovation. Hadoop was started by Doug Cutting in 2005, and became a top-level Apache project in 2008. Nowadays, Hadoop is embraced by a variety of academic and industrial users, including (in alphabetical order) Amazon A9, Cornell University, ETH, Facebook, IBM, Microsoft, Yahoo!, and many others [4].
Hadoop Core, the most mature part of this technology, provides two main abstractions: a distributed file system, HDFS (Section 2), and a MapReduce programming framework for processing large datasets (Section 3). Higher levels in the software stack feature Pig [8] and Hive [7], user-friendly parallel data processing languages, Zookeeper (Section 4), a high-availability directory and configuration service, and HBase [6], a web-scale distributed column-oriented store modeled after its proprietary predecessors [13, 14].
Parts of the Hadoop ecosystem emerged from original research [22, 25]. Researchers and developers in the open-source community are working on a variety of open problems, ranging from new paradigms for large-scale information processing to systems-related issues like fault tolerance, task scheduling, and power management. Section 5 discusses some of these efforts.
Footnote:
1 Recently, Google unveiled a custom server design which uses standard components [1].
2 HDFS
The Hadoop distributed file system is designed for batch processing applications that need streaming access to very large files across multiple machines (nodes). These objectives lead to a few clear design choices:
1. Moving computation closer to the data. Experience in developing high-performance computing systems teaches that moving computation is more efficient than moving data. This is why HDFS does not separate data nodes from computation nodes (as many enterprise storage systems do). Instead, it opts for storing data on inexpensive directly-attached disks, and provides API-level visibility into data placement, thus enabling data-driven application migration.
2. Relaxed file access semantics. Datasets stored in HDFS are typically accessed in a write-once-read-many pattern, in contrast with the mixed read/write access that is typical in traditional multi-user file systems. For this reason, some hard consistency requirements of the traditional POSIX API can be sacrificed for the sake of improved performance. For example, writes can be lost (and redone prior to the first read), and there is no need for locking for concurrency control.
3. Large read-only files. HDFS's performance is tuned to large sequential scans, which affects the disk layout and access optimizations (e.g., local caching).
4. Handling hardware failures. Finally, the file system overcomes a constant fraction of computer, disk, and network failures through checksums and data replication.
TheHDFSarchitecturedistinguishesbetweennamenodes,whichhostthelesystem'smetadata,anddatan-odes,whichstoredatablocks.
AnHDFSclusterconsistsofasinglenamenodeandmultipledatanodes.
Anamenodeperformslesystemmanagementoperations(allocatingblockstorage,manipulatingleshan-dles,etc.
).
Adatanodemanagesitsdiskasperthemasternamenode'sinstructions,andservestheclients'datapath(read/write)requests.
Thus,thenamenodeisasinglepointofcontrolbutnotanI/Obottleneck.
Namenodeskeeppersistentlogsofcommittedcontroloperationstoenablefastrecovery.
TheBookkeeperproject(Section4)introducesadditionalfault-tolerancefeaturesfornamenodes,e.
g.
,reliableremotelog-ging.
HDFS replicates the data blocks for resilience. The API allows independent control of the replication factor of each file, i.e., the number of copies of each block within the file. Optimizing replica placement distinguishes HDFS from most other distributed file systems. I/O-efficient replica placement policies must deal with multiple constraints like the datacenter's topology, LAN speeds versus disk speeds, etc. The currently adopted rack-aware policy is a first effort in this direction. It exploits the fact that datanodes are organized into hardware racks interconnected by switches. The policy employs a replication factor of 3. It places two replicas within the same rack, and one more in a remote rack. Reads are served from the closest replica to reduce latency, whereas writes are pipelined among replicas to increase throughput.
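The rack-aware policy described above can be illustrated with a minimal sketch. It is not HDFS code: the function name place_replicas, the cluster dictionary, and the choice of the writer's rack as the "same rack" are assumptions made only for this illustration.

```python
import random
from typing import Dict, List


def place_replicas(racks: Dict[str, List[str]], writer_rack: str,
                   rng: random.Random) -> List[str]:
    """Pick three datanodes: two in the writer's rack, one in a remote rack."""
    local_nodes = racks[writer_rack]
    if len(local_nodes) < 2:
        raise ValueError("need at least two datanodes in the local rack")
    remote_racks = [r for r in racks if r != writer_rack and racks[r]]
    if not remote_racks:
        raise ValueError("need at least one non-empty remote rack")
    local = rng.sample(local_nodes, 2)                    # two distinct local nodes
    remote = rng.choice(racks[rng.choice(remote_racks)])  # one node in another rack
    return local + [remote]


# Hypothetical three-rack cluster used only for this example.
cluster = {
    "rack-a": ["a1", "a2", "a3"],
    "rack-b": ["b1", "b2"],
    "rack-c": ["c1"],
}
replicas = place_replicas(cluster, "rack-a", random.Random(0))
print(replicas)
```

With this layout, the loss of a single datanode leaves two copies, and the loss of the whole local rack still leaves the remote copy.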
Each HDFS namenode monitors the liveness of the managed datanodes, as well as their disk utilization. It can initiate re-distribution of data within the cluster, e.g., re-replication in case some replicas fail, or automatic re-partitioning in case of uneven use of storage. Adaptive allocation policies are a subject for future research.
Large datasets tend to be written once by a single user, mostly in streaming mode [17]. A typical block size in HDFS is 64MB (compared to 4KB for mainstream Linux file systems). The namenode tries to spread multiple blocks of the same file to different datanodes.
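To give a feel for the block-size difference, the following sketch counts the blocks needed for a single file under both block sizes. The 1 TiB file size and the use of binary units are assumptions chosen for the example, not figures from the survey.

```python
KIB = 1024
MIB = 1024 ** 2
TIB = 1024 ** 4


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


file_size = 1 * TIB                                  # hypothetical dataset size
hdfs_blocks = block_count(file_size, 64 * MIB)       # HDFS-style 64MB blocks
small_blocks = block_count(file_size, 4 * KIB)       # 4KB blocks
print(hdfs_blocks, small_blocks, small_blocks // hdfs_blocks)
```

Under these assumptions the namenode tracks 16,384 block records instead of roughly 268 million, which is one reason a single metadata server can manage very large files.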
The write-once-read-many semantics allow relaxing some of the traditional POSIX consistency requirements. For instance, HDFS does not implement advisory locks for concurrent updates, nor does it support random writes.
Another example in which HDFS deviates from traditional file systems is its staging optimization, which trades durability guarantees for client-side caching. Under this policy, a file creation request does not reach the namenode immediately. Instead, the HDFS client creates a temporary local file. Only when enough data has been written to fill a data block does the client contact the namenode, which inserts the file into the HDFS namespace. The namenode allocates a datanode to store the block (and more datanodes for more data). If the namenode crashes in the interim, the file is lost.
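The staging behaviour described above can be sketched with a toy in-memory model. ToyNamenode, StagingClient, and the 8-byte block size are hypothetical names and values for illustration; they are not part of the HDFS client API.

```python
class ToyNamenode:
    """In-memory stand-in for the namenode's namespace (hypothetical API)."""

    def __init__(self) -> None:
        self.namespace = {}  # path -> number of blocks allocated so far

    def add_block(self, path: str) -> None:
        # The first call for a path also inserts the file into the namespace.
        self.namespace[path] = self.namespace.get(path, 0) + 1


class StagingClient:
    """Buffers writes locally and contacts the namenode once per full block."""

    def __init__(self, namenode: ToyNamenode, block_size: int) -> None:
        self.namenode = namenode
        self.block_size = block_size
        self.staged = {}  # path -> bytes still held in the local temporary file

    def write(self, path: str, data: bytes) -> None:
        buffered = self.staged.get(path, b"") + data
        while len(buffered) >= self.block_size:
            self.namenode.add_block(path)          # a full block is shipped out
            buffered = buffered[self.block_size:]
        self.staged[path] = buffered


namenode = ToyNamenode()
client = StagingClient(namenode, block_size=8)
client.write("/logs/a", b"12345")                    # less than one block
visible_before = "/logs/a" in namenode.namespace
client.write("/logs/a", b"67890")                    # crosses the block boundary
visible_after = "/logs/a" in namenode.namespace
print(visible_before, visible_after)
```

Until the first block fills, the file exists only in the client's local buffer, which is exactly the window in which it can be lost.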
3 MapReduce
Map-reduce frameworks (e.g., [21]) proved to be very natural for processing large datasets in streaming mode, e.g., web index building or training email spam filters. A map-reduce job usually splits the input dataset into independent chunks, which are processed by the map tasks in parallel. A map task executes a user function to transform input (key, value) pairs into a new set of (key, value) pairs. The framework sorts the outputs of the maps, and forwards them to the reduce tasks. A reduce task combines all (key, value) pairs with the same key into new (key, value) pairs. Finally, the reduced outputs are stored in a file system.
The word-count computer program, which outputs the number of instances of each word within a collection of files, is often used to demonstrate the paradigm. In this context, each map task processes a subset of files. For each file, it emits a (term, "1") pair for each word term in the file. All pairs with the same term key are mapped to the same reduce task, which sums the "1"s and outputs the count. A slight variation of this example builds term posting lists, which contain all locations of each term within a document corpus. This document inversion operation is the basis for building an efficient web search index.
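The word-count example can be sketched in a few lines of single-process Python. This is an illustration of the paradigm only, not the Hadoop Java API: the function names are invented for the example, the shuffle step stands in for the framework's sort-and-forward phase, and the map emits the integer 1 rather than the string "1".

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_task(text: str) -> Iterator[Tuple[str, int]]:
    """Emit a (term, 1) pair for every word in one input file."""
    for term in text.lower().split():
        yield term, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key and sort the keys, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_task(term: str, ones: List[int]) -> Tuple[str, int]:
    """Sum all the 1s emitted for a single term."""
    return term, sum(ones)


def word_count(files: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for text in files for pair in map_task(text))
    return dict(reduce_task(term, ones) for term, ones in shuffle(pairs).items())


counts = word_count(["the quick fox", "the lazy dog and the fox"])
print(counts)
```

Replacing the emitted 1 with a (file, position) value and the sum with list concatenation would give the posting-list variant mentioned above.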
Hadoop implements a map-reduce Java API, and the supporting runtime system. For non-Java programmers, it offers Hadoop Streaming, a utility that allows users to create and run jobs with arbitrary executables (e.g., shell utilities) as the mapper and the reducer. Programmers looking for higher-level data processing abstractions can resort either to Pig [8, 25], a procedural yet powerful query language, or Hive [7], a SQL-like declarative language.
Hadoop map-reduce tasks store their final and intermediate outputs in HDFS. Data compression is used aggressively to reduce the required I/O bandwidth. The runtime system exploits the visibility of data placement within HDFS to move computation tasks closer to their data. The framework takes care of scheduling tasks, monitoring them, and re-executing the failed ones. Task granularity is configurable, which allows trading fine-grained load balancing and fast recovery for I/O efficiency. Hadoop supports speculative execution: it runs duplicates of slow tasks, and picks those that finish first. This alleviates the need for accurate failure detectors, and suppresses the long tail of job latencies.
Yahoo!'s Webmap application is an example of successful deployment of map-reduce. This system maintains a gigantic table of information about every web site, page, and link the search engine knows about. Webmap provides infrastructure for various algorithms, e.g., ranking, de-duplication, region classification, and spam detection. Porting Webmap to Hadoop allowed researchers to focus on these applications rather than on the platform. The new system achieved a 33% improvement in job latency compared to a similar-size cluster built with the previous technology. Its largest jobs run more than 100K maps and 10K reduces, handling 300TB of data, and producing 200TB of compressed output [10].
4 ZooKeeper and BookKeeper
ZooKeeper [9] is a coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can use to share data reliably, e.g., to implement a distributed configuration repository. ZooKeeper is optimized for read-dominated access to small objects (e.g., application metadata). In practice, it leverages many achievements of the distributed algorithms community.
ZooKeeper provides to its clients the abstraction of a set of data objects (znodes), organized in a hierarchical namespace resembling a file system structure. The client API includes object manipulation (create/delete), data access (read/write), and change notifications (watch). For fault-tolerance, all znodes are replicated across multiple servers. Znodes are essentially distributed shared read/write registers [23], extended with the watch abstraction.
ZooKeeper provides sequential consistency [23], i.e., all clients observe the same order of writes, but reads may return stale data. This approach allows for local reads, i.e., a server can reply to a client request without coordinating with the other servers. A client that wishes to receive fresh data can force its server to synchronize (sync) with the rest of the cluster.
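The difference between a local read and a read preceded by sync can be shown with a toy model. This is neither ZooKeeper's protocol nor its client API: Leader and Replica are hypothetical classes, and update propagation is modelled as an explicit pull so that the example stays deterministic.

```python
from typing import Dict, List, Optional, Tuple


class Leader:
    """Holds the single agreed order of writes."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []

    def write(self, key: str, value: str) -> None:
        self.log.append((key, value))


class Replica:
    """Serves reads from local state, which may lag behind the leader."""

    def __init__(self, leader: Leader) -> None:
        self.leader = leader
        self.applied = 0                  # number of log entries applied locally
        self.data: Dict[str, str] = {}

    def sync(self) -> None:
        # Apply the missing writes in the leader's order.
        for key, value in self.leader.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.leader.log)

    def read(self, key: str) -> Optional[str]:
        return self.data.get(key)         # local read, no coordination


leader = Leader()
replica = Replica(leader)
leader.write("/config", "v1")
replica.sync()
leader.write("/config", "v2")
stale = replica.read("/config")           # replica has not yet seen v2
replica.sync()
fresh = replica.read("/config")
print(stale, fresh)
```

Every replica applies writes in the same order, so a client never sees v2 followed by v1; it can only see an older prefix of the history until it syncs.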
ZooKeeper servers implement a leader-based atomic broadcast protocol to guarantee agreement on the order of writes. This implementation is not wait-free (i.e., some write requests may theoretically block forever [23]). However, to optimize for read-dominated workloads, it has been preferred over a wait-free implementation of a linearizable shared register [11] in which reads cannot be served locally.
The service implements watches to avoid frequent probes for changes on znodes. The order of change notifications received by watch clients is identical to the order of writes. Servers manage their watches locally, i.e., a server notifies its clients upon learning about a change. Therefore, some clients might not receive notifications in real time. Similarly to reads, a client must explicitly sync in order to receive fresh notifications.
The ZooKeeper API allows building a variety of synchronization primitives on top of the shared object API, e.g., a distributed lock service like Chubby [12]. It offers a neat mechanism of ephemeral nodes to track group membership changes, e.g., for systems that wish to implement leader election [23].
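How ephemeral nodes support membership tracking and leader election can be sketched with an in-memory stand-in. ToyCoordinator and its methods are hypothetical and do not reproduce the ZooKeeper client API; electing the lexicographically smallest member is an assumption made for the illustration.

```python
from typing import Dict, List, Optional


class ToyCoordinator:
    """Keeps ephemeral nodes that vanish when their owning session ends."""

    def __init__(self) -> None:
        self._owners: Dict[str, int] = {}   # znode path -> owning session id

    def create_ephemeral(self, path: str, session: int) -> None:
        self._owners[path] = session

    def close_session(self, session: int) -> None:
        # All ephemeral nodes created by the session disappear with it.
        self._owners = {p: s for p, s in self._owners.items() if s != session}

    def children(self, parent: str) -> List[str]:
        prefix = parent.rstrip("/") + "/"
        return sorted(p for p in self._owners if p.startswith(prefix))


def current_leader(coord: ToyCoordinator, group: str) -> Optional[str]:
    """Assumed rule: the member with the smallest path is the leader."""
    members = coord.children(group)
    return members[0] if members else None


coord = ToyCoordinator()
for session, name in enumerate(["w1", "w2", "w3"], start=1):
    coord.create_ephemeral(f"/workers/{name}", session)

leader_before = current_leader(coord, "/workers")
coord.close_session(1)                      # w1 crashes or disconnects
leader_after = current_leader(coord, "/workers")
print(leader_before, leader_after)
```

In a real deployment the surviving members would typically learn about the departure through a watch rather than by polling the member list.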
ZooKeeper is successfully deployed within production systems, e.g., Web crawlers and publish-subscribe platforms. The project's roadmap includes new optimizations for read and write scaling, dynamic cluster re-configuration [18], and replacing atomic broadcast with quorum systems [23].
BookKeeper
BookKeeper is a service for reliable storage of write-ahead logs. Many critical systems, e.g., relational databases and journal file systems, employ write-ahead logging (WAL) to guarantee recoverability [19]. With WAL, a transaction appends a state change record to the persistent log; this change may be applied to the main storage asynchronously after the transaction's completion. In case of system crash, recovery is achieved through replaying changes committed to the log. WAL also reduces transaction latencies, because it replaces random I/O with sequential writes to the log.
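A minimal single-machine sketch of write-ahead logging follows. It illustrates the general technique only, not BookKeeper: WalStore, the JSON record format, and the file name are assumptions, and the in-memory dictionary stands in for the main storage (it is updated right after the log append instead of asynchronously).

```python
import json
import os
import tempfile


class WalStore:
    """Key-value state rebuilt from an append-only log on startup."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.state = {}
        self._replay()

    def _replay(self) -> None:
        # Recovery: re-apply every committed record in log order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.state[record["key"]] = record["value"]

    def put(self, key: str, value: int) -> None:
        # Append and force the record to disk before touching the state.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.state[key] = value


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wal.log")
    store = WalStore(path)
    store.put("balance", 100)
    store.put("balance", 70)
    recovered = WalStore(path)            # simulates a restart after a crash
    recovered_state = dict(recovered.state)

print(recovered_state)
```

Every put is a sequential append, which is the source of the latency benefit mentioned above; the cost is that recovery time grows with the length of the log.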
The remote BookKeeper service replicates the log across multiple servers, or bookies. It can handle arbitrary (Byzantine) failures of less than 1/4 of all bookies, as well as Byzantine clients. BookKeeper employs a read-write quorum system for accessing bookies [15], and stores its metadata in ZooKeeper. As a proof of concept, BookKeeper has been integrated into the HDFS namenode (Section 2), in which it replaced the non-fault-tolerant logging to a local file. Experiments show that this change boosts the namenode throughput by 30% in typical configurations.
5 What Next
Hadoop's resource management is still in its infancy. For example, the system can successfully control, through its job tracker component, the execution of multiple tasks within a single job on a dedicated cluster. The Hadoop-on-demand (HOD) technology [2] allows provisioning such isolated virtual clusters within a datacenter. However, this approach makes resource sharing among multiple jobs problematic.
More recent research and development target a variety of issues, like:
1. Better scheduling policies (e.g., job pooling by size, and fair scheduling within the pools [26]).
2. Improving the scheduling policies in heterogeneous environments, e.g., virtualized instances deployed within a remote grid infrastructure like Amazon EC2 [27].
3. Sharing tasks among multiple map-reduce jobs, to run common computations once.
On the data processing side, supporting interactive queries over Web-scale data is the next challenge. For example, batch queries over gigantic datasets like Webmap (Section 3) can take hours to evaluate. Recent research [24] suggests splitting the querying process into two stages: first supply a query template, and later supply specific instantiations of the template. With this approach, the pre-processing stage, which need not run in real time, can pre-compute and cache the (partial) results of instantiations that incur high query latencies.
Hadoop's wiki [3] outlines many more interesting research directions. These include enhanced data placement, map-reduce performance modeling, improvement of parallel sort algorithms, HDFS namespace expansion, integration with external resource management services, and more.
6 Conclusion
Innovation in leading Internet companies revolves around analyzing huge datasets. Modern grid technologies offer tools for building scalable and reliable web-scale datacenters for this purpose. We surveyed the recent achievements in this multidisciplinary area, focusing on the open-source Hadoop suite. We reviewed the fundamentals of the Hadoop technology, and focused on selected research projects in distributed computing, ZooKeeper and BookKeeper, which emerged around it. Finally, we outlined some open research problems that await resolution to support next-generation web datacenter infrastructure.
Acknowledgments
This survey is partially based on Ajay Anand's talk at the Usenix'2008 conference [10]. I thank Brian Cooper, Flavio Junqueira, Christopher Olston and Benjamin Reed for providing many useful inputs.
References
[1] Google container data center tour. http://www.youtube.com/watch?v=zRwPSFpLX8I.
[2] Hadoop on Demand. http://hadoop.apache.org/core/docs/current/hod_user_guide.html.
[3] Hadoop Research Project Suggestions. http://wiki.apache.org/hadoop/ProjectSuggestions#research_projects.
[4] Technologies powered by Hadoop. http://wiki.apache.org/hadoop/PoweredBy.
[5] The Apache Hadoop Project. http://hadoop.apache.org.
[6] The HBase Project. http://hadoop.apache.org/hbase.
[7] The Hive Project. http://hadoop.apache.org/hive/.
[8] The Pig Project. http://hadoop.apache.org/pig.
[9] The Zookeeper Project. http://hadoop.apache.org/zookeeper.
[10] A. Anand. Using Hadoop for Webscale Computing. ACM Usenix, 2008. http://www.usenix.org/events/usenix08/tech/slides/anand.pdf.
[11] H. Attiya, A. Bar-Noy, and D. Dolev. Sharing memory robustly in message-passing systems. J. Assoc. Comput. Mach., 42:124–142, 1995.
[12] M. Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. OSDI, 2006.
[13] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. OSDI, 2006.
[14] B. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s Hosted Data Serving Platform. VLDB, 2008.
[15] D. Malkhi and M. Reiter. Byzantine Quorum Systems. Distributed Computing, 11(4), 1998.
[16] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, S. Sivasubramanian, A. Pilchin, P. Vosshall, and W. Vogels. Dynamo: Amazon's Highly Available Key-Value Store. SOSP, 2007. http://aws.amazon.com/s3/.
[17] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. SOSP, 2003.
[18] S. Gilbert, N. Lynch, and A. Shvartsman. RAMBO II: Rapidly Reconfigurable Atomic Memory for Dynamic Networks. DSN, 2003.
[19] J. Gray. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
[20] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed Data-parallel Programs from Sequential Building Blocks. EuroSys, 2007.
[21] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, 2004.
[22] F. Junqueira and B. Reed. A simple totally ordered broadcast protocol. Workshop on Large-Scale Distributed Systems and Middleware (LADIS), 2008.
[23] Nancy A. Lynch. Distributed Algorithms (The Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann, 2002.
[24] C. Olston, E. Bortnikov, K. Elmeleegy, F. Junqueira, and B. Reed. Interactive Analysis of Web-Scale Data. CIDR, 2009.
[25] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A Not-So-Foreign Language for Data Processing. SIGMOD, 2008.
[26] M. Zaharia. Hadoop Fair Scheduler. http://developer.yahoo.net/blogs/hadoop/FairSharePres.ppt.
[27] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce Performance in Heterogeneous Environments. OSDI, 2008.
Cloud Computing Architecture and Application Programming
DISC'09 Tutorial, half day, Sept. 22nd 2009
Roger Barga, Jose Bernabeu-Auban, Dennis Gannon, Christophe Poulain
Microsoft Corporation
Contact: barga@microsoft.com

Background
Over the past decade, scientific and engineering research via computing has emerged as the third pillar of the scientific process, complementing theory and experiment. Several studies have highlighted the importance of computational science as a critical enabler of scientific discovery and competitiveness in the physical and biological sciences, medicine and healthcare, and design and manufacturing. The ability to create rich, detailed models of natural and artificial phenomena and to process large volumes of experimental data, itself created by a new generation of scientific instruments that are themselves powered by computing, makes computing a universal intellectual amplifier, advancing all of science and engineering and powering the knowledge economy. This revolution has been enabled by the availability of inexpensive, powerful processors; low cost, large capacity storage devices; and flexible software tools, each driven by a vibrant consumer and industry marketplace.
The explosive growth of research computing systems has created major management, technical and fiscal challenges for both funding agencies and research universities. Typically, faculty members acquire research computing systems, usually small to medium (32–256 nodes) clusters, via research grants and contracts and departmental funds. This distributed acquisition of research computing, together with inadequate plans for long-term sustainability and technology refresh, means that universities, and the funding agencies that support university research, are now struggling to create and maintain compute and data centers to house these systems and to operate and maintain them reliably, in energy-efficient, environmentally friendly ways. Moreover, university budget constraints make efficiency ever more necessary. A growing challenge is satisfying the ever rising demand for research computing and data management, the enabler of scientific discovery. Fortuitously, the emergence of cloud computing (software and services hosted by networks of commercial data centers and accessible over the Internet) offers a solution to this conundrum.
Cloud Computing
The explosive growth and rapid development of cloud services are driven by technology and business economics. Consolidating computing and storage in very large data centers creates economies of scale in facility design and construction, equipment acquisition, and operations and maintenance that are not possible when the elements are distributed. However, the benefits of cloud services extend far beyond economies of scale. First, optimized and consolidated facilities reduce total energy consumption, and they can be designed to exploit environmentally friendly and renewable energy sources. Second, cloud computing enables a "pay only for use" strategy where users bear no cost unless they use the cloud services, and then pay only for the number of service units consumed. Third, groups can deploy and expand services rapidly (in minutes, rather than the weeks or months needed to procure and install local infrastructure) to meet rising demand or to address time-critical needs. Finally, the elasticity of cloud services means that time and computing are interchangeable: the user cost to use 10,000 processors for one hour is the same as using ten processors for 1,000 hours. This is a transformative equivalence; even individuals and small companies can exploit computing resources at a scale heretofore accessible only to large companies and governments.
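The equivalence between time and computing follows directly from per-unit pricing, as the short calculation below shows. The price of 10 cents per processor-hour is a hypothetical figure chosen for the example; only the processor and hour counts come from the text.

```python
PRICE_CENTS_PER_PROCESSOR_HOUR = 10   # hypothetical price, for illustration only


def cost_cents(processors: int, hours: int) -> int:
    """Pay-per-use cost: processor-hours consumed times the unit price."""
    return processors * hours * PRICE_CENTS_PER_PROCESSOR_HOUR


wide = cost_cents(processors=10_000, hours=1)      # finish in one hour
narrow = cost_cents(processors=10, hours=1_000)    # finish in about six weeks
print(wide, narrow, wide == narrow)
```

Both runs consume 10,000 processor-hours, so under linear pricing the bill is identical while the waiting time differs by a factor of 1,000.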
By outsourcing computing, data management and business intelligence services to cloud software plus services providers, businesses are increasing operational efficiencies and decreasing costs, allowing them to focus on their core competencies. Similar opportunities exist in academic and research computing, but these opportunities are not being exploited.
DISC'09 Tutorial Description
The goal of this tutorial is to demonstrate how clouds can augment traditional supercomputing by expanding access to data and tools to a broader community of users than are currently served by the conventional HPC centers. Supercomputers provide the capability to conduct massive simulation and analysis computations for a few users at a time. They are not designed for on-demand access by hundreds or thousands of simultaneous users. In addition, supercomputers are not configurable by their users. Thus supercomputer applications must be modified and, in some cases, refactored as hardware and systems software are upgraded. Clouds offer the ability for each user to customize the execution environment, and to archive that customization for future use independently of the infrastructure's lifecycle.
Currently, a number of publicly accessible computational platforms provide instant access to cloud-hosted services such as web search, maps, photo galleries and social networks. There are now hundreds of cloud-based services we use in our everyday life and we are starting to see some of them also touch our scientific lives. For example, Google and Live maps have been used to gain insight from geo-distributed sensor data, and the Sloan Digital Sky Survey and the SkyServer have provided scientific data and tools to thousands of astronomy users.
We are now at an important inflection point in the capability of cloud computing to serve the research community. Not only has the total capacity of the commercial data centers exceeded that of supercomputing centers, we now have the software infrastructure in place to allow anybody to build scalable scientific services for broad classes of users, without having to deploy, maintain and upgrade dedicated and expensive compute and data servers.
This tutorial will introduce the attendees to this new technology. This tutorial will be of value to those interested in exposing data and services to a broader audience of users without incurring the costs of acquiring and maintaining scalable infrastructure.
This tutorial will introduce the attendees to the key concepts and technologies used to build and deploy scientific data analysis applications on cloud platforms. The tutorial begins with general concepts of data center architecture including the use of virtualization; the role of low power, multicore and packaging; and web service architectures. We will look at the cloud storage models with a detailed look at the Azure XStore and a brief look at Google's BigTable and GFS. We will then focus on models of application programming. We will describe both commercial and open source tools for "map reduce" computation including Hadoop and Dryad and workflow tools for orchestrating remote data services. Following this we will examine cloud application frameworks by looking at Google's App Engine and Microsoft Azure. Throughout the tutorial we will use scientific examples to illustrate the potential applications. The tutorial concludes with a view of the future for the cloud in science.