
THE ADVANCED COMPUTING SYSTEMS ASSOCIATION

The following paper was originally published in the Proceedings of the 1999 USENIX Annual Technical Conference, Monterey, California, USA, June 6–11, 1999

Flash: An Efficient and Portable Web Server
Vivek S. Pai, Peter Druschel, and Willy Zwaenepoel
Rice University

© 1999 by The USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.

For more information about the USENIX Association: Phone: 1 510 528 8649 FAX: 1 510 548 5738 Email: office@usenix.org WWW: http://www.usenix.org

Flash: An efficient and portable Web server

Vivek S. Pai (Department of Electrical and Computer Engineering), Peter Druschel and Willy Zwaenepoel (Department of Computer Science)
Rice University

Abstract

This paper presents the design of a new Web server architecture called the asymmetric multi-process event-driven (AMPED) architecture, and evaluates the performance of an implementation of this architecture, the Flash Web server. The Flash Web server combines the high performance of single-process event-driven servers on cached workloads with the performance of multi-process and multi-threaded servers on disk-bound workloads. Furthermore, the Flash Web server is easily portable since it achieves these results using facilities available in all modern operating systems.

The performance of different Web server architectures is evaluated in the context of a single implementation in order to quantify the impact of a server's concurrency architecture on its performance. Furthermore, the performance of Flash is compared with two widely-used Web servers, Apache and Zeus. Results indicate that Flash can match or exceed the performance of existing Web servers by up to 50% across a wide range of real workloads. We also present results that show the contribution of various optimizations embedded in Flash.
1 Introduction

The performance of Web servers plays a key role in satisfying the needs of a large and growing community of Web users. Portable high-performance Web servers reduce the hardware cost of meeting a given service demand and provide the flexibility to change hardware platforms and operating systems based on cost, availability, or performance considerations.

Web servers rely on caching of frequently-requested Web content in main memory to achieve throughput rates of thousands of requests per second, despite the long latency of disk operations. Since the data set size of Web workloads typically exceeds the capacity of a server's main memory, a high-performance Web server must be structured such that it can overlap the serving of requests for cached content with concurrent disk operations that fetch requested content not currently cached in main memory.
Web servers take different approaches to achieving this concurrency. Servers using a single-process event-driven (SPED) architecture can provide excellent performance for cached workloads, where most requested content can be kept in main memory. The Zeus server [32] and the original Harvest/Squid proxy caches employ the SPED architecture (footnote 1). On workloads that exceed the capacity of the server cache, servers with multi-process (MP) or multi-threaded (MT) architectures usually perform best. Apache, a widely-used Web server, uses the MP architecture on UNIX operating systems and the MT architecture on the Microsoft Windows NT operating system.
This paper presents a new portable Web server architecture, called asymmetric multi-process event-driven (AMPED), and describes an implementation of this architecture, the Flash Web server. Flash nearly matches the performance of SPED servers on cached workloads while simultaneously matching or exceeding the performance of MP and MT servers on disk-intensive workloads. Moreover, Flash uses only standard APIs and is therefore easily portable.

Flash's AMPED architecture behaves like a single-process event-driven architecture when requested documents are cached and behaves similarly to a multi-process or multi-threaded architecture when requests must be satisfied from disk. We qualitatively and quantitatively compare the AMPED architecture to the SPED, MP, and MT approaches in the context of a single server implementation. Finally, we experimentally compare the performance of Flash to that of Apache and Zeus on real workloads obtained from server logs, and on two operating systems.
The rest of this paper is structured as follows: Section 2 explains the basic processing steps required of all Web servers and provides the background for the following discussion. In Section 3, we discuss the asymmetric multi-process event-driven (AMPED), the single-process event-driven (SPED), the multi-process (MP), and the multi-threaded (MT) architectures. We then discuss the expected architecture-based performance characteristics in Section 4 before discussing the implementation of the Flash Web server in Section 5. Using real and synthetic workloads, we evaluate the performance of all four server architectures and the Apache and Zeus servers in Section 6.

Footnote 1: Zeus can be configured to use multiple SPED processes, particularly when running on multiprocessor systems.

[Figure 1: Simplified Request Processing Steps. Stages: Start, Accept Conn, Read Request, Find File, Send Header, Read File, Send Data, End.]
2 Background

In this section, we briefly describe the basic processing steps performed by an HTTP (Web) server. HTTP clients use the TCP transport protocol to contact Web servers and request content. The client opens a TCP connection to the server, and transmits an HTTP request header that specifies the requested content.

Static content is stored on the server in the form of disk files. Dynamic content is generated upon request by auxiliary application programs running on the server. Once the server has obtained the requested content, it transmits an HTTP response header followed by the requested data, if applicable, on the client's TCP connection.

For clarity, the following discussion focuses on serving HTTP/1.0 requests for static content on a UNIX-like operating system. However, all of the Web server architectures discussed in this paper are fully capable of handling dynamically-generated content. Likewise, the basic steps described below are similar for HTTP/1.1 requests, and for other operating systems, like Windows NT.
The basic sequential steps for serving a request for static content are illustrated in Figure 1, and consist of the following:

Accept client connection - accept an incoming connection from a client by performing an accept operation on the server's listen socket. This creates a new socket associated with the client connection.

Read request - read the HTTP request header from the client connection's socket and parse the header for the requested URL and options.

Find file - check the server filesystem to see if the requested content file exists and the client has appropriate permissions. The file's size and last modification time are obtained for inclusion in the response header.

Send response header - transmit the HTTP response header on the client connection's socket.

Read file - read the file data (or part of it, for larger files) from the filesystem.

Send data - transmit the requested content (or part of it) on the client connection's socket. For larger files, the "Read file" and "Send data" steps are repeated until all of the requested content is transmitted.
All of these steps involve operations that can potentially block. Operations that read data or accept connections from a socket may block if the expected data has not yet arrived from the client. Operations that write to a socket may block if the TCP send buffers are full due to limited network capacity. Operations that test a file's validity (using stat()) or open the file (using open()) can block until any necessary disk accesses complete. Likewise, reading a file (using read()) or accessing data from a memory-mapped file region can block while data is read from disk.

Therefore, a high-performance Web server must interleave the sequential steps associated with the serving of multiple requests in order to overlap CPU processing with disk accesses and network communication. The server's architecture determines what strategy is used to achieve this interleaving. Different server architectures are described in Section 3.

In addition to its architecture, the performance of a Web server implementation is also influenced by various optimizations, such as caching. In Section 5, we discuss specific optimizations used in the Flash Web server.
3 Server Architectures

In this section, we describe our proposed asymmetric multi-process event-driven (AMPED) architecture, as well as the existing single-process event-driven (SPED), multi-process (MP), and multi-threaded (MT) architectures.

3.1 Multi-process

In the multi-process (MP) architecture, a process is assigned to execute the basic steps associated with serving a client request sequentially. The process performs all the steps related to one HTTP request before it accepts a new request. Since multiple processes are employed (typically 20-200), many HTTP requests can be served concurrently. Overlapping of disk activity, CPU processing and network connectivity occurs naturally, because the operating system switches to a runnable process whenever the currently active process blocks.
[Figure 2: Multi-Process - In the MP model, each server process handles one request at a time. Processes execute the processing stages sequentially.]

[Figure 3: Multi-Threaded - The MT model uses a single address space with multiple concurrent threads of execution. Each thread handles a request.]
Since each process has its own private address space, no synchronization is necessary to handle the processing of different HTTP requests (footnote 2). However, it may be more difficult to perform optimizations in this architecture that rely on global information, such as a shared cache of valid URLs. Figure 2 illustrates the MP architecture.
3.2 Multi-threaded

Multi-threaded (MT) servers, depicted in Figure 3, employ multiple independent threads of control operating within a single shared address space. Each thread performs all the steps associated with one HTTP request before accepting a new request, similar to the MP model's use of a process.

The primary difference between the MP and the MT architecture, however, is that all threads can share global variables. The use of a single shared address space lends itself easily to optimizations that rely on shared state. However, the threads must use some form of synchronization to control access to the shared data.

The MT model requires that the operating system provides support for kernel threads. That is, when one thread blocks on an I/O operation, other runnable threads within the same address space must remain eligible for execution. Some operating systems (e.g., FreeBSD 2.2.6) provide only user-level thread libraries without kernel support. Such systems cannot effectively support MT servers.
Footnote 2: Synchronization is necessary inside the OS to accept incoming connections, since the accept queue is shared.

3.3 Single-process event-driven

The single-process event-driven (SPED) architecture uses a single event-driven server process to perform concurrent processing of multiple HTTP requests. The server uses non-blocking system calls to perform asynchronous I/O operations. An operation like the BSD UNIX select or the System V poll is used to check for I/O operations that have completed. Figure 4 depicts the SPED architecture.

A SPED server can be thought of as a state machine that performs one basic step associated with the serving of an HTTP request at a time, thus interleaving the processing steps associated with many HTTP requests. In each iteration, the server performs a select to check for completed I/O events (new connection arrivals, completed file operations, client sockets that have received data or have space in their send buffers). When an I/O event is ready, it completes the corresponding basic step and initiates the next step associated with the HTTP request, if appropriate.
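The state-machine loop described above can be sketched in a few lines of Python. This is a minimal illustration only, not code from any of the servers discussed: the function name run_event_loop, the handle_request callback, and the newline-terminated request format are assumptions made for the example, and accepting new connections and file access are left out.

```python
import select
import socket


def run_event_loop(connections, handle_request):
    """Serve every socket in `connections` from one thread using select().

    Each connection passes through two states: reading a request line,
    then sending the response. Returns once every connection is closed.
    """
    inbuf = {conn: b"" for conn in connections}   # state 1: bytes read so far
    outbuf = {}                                   # state 2: bytes still to send
    for conn in connections:
        conn.setblocking(False)

    while inbuf or outbuf:
        # One select() call reports every socket that can make progress.
        readable, writable, _ = select.select(list(inbuf), list(outbuf), [])
        for conn in readable:
            data = conn.recv(4096)
            inbuf[conn] += data
            if b"\n" in inbuf[conn] or not data:
                # Request complete: advance this connection to the send state.
                outbuf[conn] = handle_request(inbuf.pop(conn))
        for conn in writable:
            sent = conn.send(outbuf[conn])
            outbuf[conn] = outbuf[conn][sent:]
            if not outbuf[conn]:
                del outbuf[conn]
                conn.close()
```

Each pass through the loop performs at most one basic step per ready connection, so a slow client delays only its own request. Nothing in the sketch touches the disk, which is the case where this architecture works as intended.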
In principle, a SPED server is able to overlap the CPU, disk and network operations associated with the serving of many HTTP requests, in the context of a single process and a single thread of control. As a result, the overheads of context switching and thread synchronization in the MP and MT architectures are avoided. However, a problem associated with SPED servers is that many current operating systems do not provide suitable support for asynchronous disk operations.

In these operating systems, non-blocking read and write operations work as expected on network sockets and pipes, but may actually block when used on disk files. As a result, supposedly non-blocking read operations on files may still block the caller while disk I/O is in progress. Both operating systems used in our experiments exhibit this behavior (FreeBSD 2.2.6 and Solaris 2.6). To the best of our knowledge, the same is true for most versions of UNIX.

Many UNIX systems provide alternate APIs that implement true asynchronous disk I/O, but these APIs are generally not integrated with the select operation. This makes it difficult or impossible to simultaneously check for completion of network and disk I/O events in an efficient manner. Moreover, operations such as open and stat on file descriptors may still be blocking.

For these reasons, existing SPED servers do not use these special asynchronous disk interfaces. As a result, file read operations that do not hit in the file cache may cause the main server thread to block, causing some loss in concurrency and performance.
3.4 Asymmetric Multi-Process Event-Driven

The Asymmetric Multi-Process Event-Driven (AMPED) architecture, illustrated in Figure 5, combines the event-driven approach of the SPED architecture with multiple helper processes (or threads) that handle blocking disk I/O operations. By default, the main event-driven process handles all processing steps associated with HTTP requests. When a disk operation is necessary (e.g., because a file is requested that is not likely to be in the main memory file cache), the main server process instructs a helper via an inter-process communication (IPC) channel (e.g., a pipe) to perform the potentially blocking operation. Once the operation completes, the helper returns a notification via IPC; the main server process learns of this event like any other I/O completion event via select.

[Figure 4: Single Process Event Driven - The SPED model uses a single process to perform all client processing and disk activity in an event-driven manner.]
The AMPED architecture strives to preserve the efficiency of the SPED architecture on operations other than disk reads, but avoids the performance problems suffered by SPED due to inappropriate support for asynchronous disk reads in many operating systems. AMPED achieves this using only support that is widely available in modern operating systems.

In a UNIX system, AMPED uses the standard non-blocking read, write, and accept system calls on sockets and pipes, and the select system call to test for I/O completion. The mmap operation is used to access data from the filesystem and the mincore operation is used to check if a file is in main memory.

Note that the helpers can be implemented either as kernel threads within the main server process or as separate processes. Even when helpers are implemented as separate processes, the use of mmap allows the helpers to initiate the reading of a file from disk without introducing additional data copying. In this case, both the main server process and the helper mmap a requested file. The helper touches all the pages in its memory mapping. Once finished, it notifies the main server process that it is now safe to transmit the file without the risk of blocking.
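The helper mechanism can be illustrated with a short Python sketch for a UNIX-like system. It is a simplified illustration under stated assumptions, not Flash's implementation: helpers are threads rather than processes, one helper is started per request instead of drawing on a reserve pool, the main loop reads the file with read() instead of transmitting from its own mapping, and the names helper and serve_files are hypothetical.

```python
import mmap
import os
import select
import threading


def helper(path, request_id, notify_fd):
    """Helper: fault the file's pages into memory, then notify via the pipe."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size:  # an empty file cannot be mapped and needs no disk read
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                for offset in range(0, size, mmap.PAGESIZE):
                    m[offset]  # touching a page may block on disk I/O
    # Only a one-byte completion notice crosses the pipe, never file data.
    os.write(notify_fd, bytes([request_id]))


def serve_files(paths):
    """Main loop: hand each file to a helper, collect completions via select()."""
    read_fd, write_fd = os.pipe()
    pending = dict(enumerate(paths))  # request_id -> path (ids must fit in a byte)
    for request_id, path in pending.items():
        threading.Thread(target=helper, args=(path, request_id, write_fd)).start()

    results = {}
    while pending:
        # A real server would pass its client sockets to this same select() call.
        readable, _, _ = select.select([read_fd], [], [])
        for request_id in os.read(read_fd, 64):
            path = pending.pop(request_id)
            with open(path, "rb") as f:  # pages were just touched by the helper
                results[path] = f.read()
    os.close(read_fd)
    os.close(write_fd)
    return results
```

The point of the design is that the pipe's read end is just another descriptor in the select set, so disk completions and network events are handled by the same loop.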
4 Design comparison

In this section, we present a qualitative comparison of the performance characteristics and possible optimizations in the various Web server architectures presented in the previous section.

[Figure 5: Asymmetric Multi-Process Event Driven - The AMPED model uses a single process for event-driven request processing, but has other helper processes to handle some disk operations.]
4.1 Performance characteristics

Disk operations - The cost of handling disk activity varies between the architectures based on what, if any, circumstances cause all request processing to stop while a disk operation is in progress. In the MP and MT models, only the process or thread that causes the disk activity is blocked. In AMPED, the helper processes are used to perform the blocking disk actions, so while they are blocked, the server process is still available to handle other requests. The extra cost in the AMPED model is due to the inter-process communication between the server and the helpers. In SPED, one process handles all client interaction as well as disk activity, so all user-level processing stops whenever any request requires disk activity.
Memory effects - The server's memory consumption affects the space available for the filesystem cache. The SPED architecture has small memory requirements, since it has only one process and one stack. When compared to SPED, the MT model incurs some additional memory consumption and kernel resources, proportional to the number of threads employed (i.e., the maximal number of concurrently served HTTP requests). AMPED's helper processes cause additional overhead, but the helpers have small application-level memory demands and a helper is needed only per concurrent disk operation, not for each concurrently served HTTP request. The MP model incurs the cost of a separate process per concurrently served HTTP request, which has substantial memory and kernel overheads.
Disk utilization - The number of concurrent disk requests that a server can generate affects whether it can benefit from multiple disks and disk head scheduling. The MP/MT models can cause one disk request per process/thread, while the AMPED model can generate one request per helper. In contrast, since all user-level processing stops in the SPED architecture whenever it accesses the disk, it can only generate one disk request at a time. As a result, it cannot benefit from multiple disks or disk head scheduling.
4.2 Cost/Benefits of optimizations & features

The server architecture also impacts the feasibility and profitability of certain types of Web server optimizations and features. We compare the tradeoffs necessary in the various architectures from a qualitative standpoint.

Information gathering - Web servers use information about recent requests for accounting purposes and to improve performance, but the cost of gathering this information across all connections varies in the different models. In the MP model, some form of interprocess communication must be used to consolidate data. The MT model either requires maintaining per-thread statistics and periodic consolidation or fine-grained synchronization on global variables. The SPED and AMPED architectures simplify information gathering since all requests are processed in a centralized fashion, eliminating the need for synchronization or interprocess communications when using shared state.
Application-level Caching - Web servers can employ application-level caching to reduce computation by using memory to store previous results, such as response headers and file mappings for frequently requested content. However, the cache memory competes with the filesystem cache for physical memory, so this technique must be applied carefully. In the MP model, each process may have its own cache in order to reduce interprocess communication and synchronization. The multiple caches increase the number of compulsory misses and they lead to less efficient use of memory. The MT model uses a single cache, but the data accesses/updates must be coordinated through synchronization mechanisms to avoid race conditions. Both AMPED and SPED can use a single cache without synchronization.
Long-lived connections - Long-lived connections occur in Web servers due to clients with slow links (such as modems), or through persistent connections in HTTP 1.1. In both cases, some server-side resources are committed for the duration of the connection. The cost of long-lived connections on the server depends on the resource being occupied. In AMPED and SPED, this cost is a file descriptor, application-level connection information, and some kernel state for the connection. The MT and MP models add the overhead of an extra thread or process, respectively, for each connection.
5 Flash implementation

The Flash Web server is a high-performance implementation of the AMPED architecture that uses aggressive caching and other techniques to maximize its performance. In this section, we describe the implementation of the Flash Web server and some of the optimization techniques used.

5.1 Overview

The Flash Web server implements the AMPED architecture described in Section 3. It uses a single non-blocking server process assisted by helper processes. The server process is responsible for all interaction with clients and CGI applications [26], as well as control of the helper processes. The helper processes are responsible for performing all of the actions that may result in synchronous disk activity. Separate processes were chosen instead of kernel threads to implement the helpers, in order to ensure portability of Flash to operating systems that do not (yet) support kernel threads, such as FreeBSD 2.2.6.

The server is divided into modules that perform the various request processing steps mentioned in Section 2 and modules that handle various caching functions. Three types of caches are maintained: filename translations, response headers, and file mappings. These caches and their function are explained below.

The helper processes are responsible for performing pathname translations and for bringing disk blocks into memory. These processes are dynamically spawned by the server process and are kept in reserve when not active. Each process operates synchronously, waiting on the server for new requests and handling only one request at a time. To minimize interprocess communication, helpers only return a completion notification to the server, rather than sending any file content they may have loaded from disk.
5.2 Pathname Translation Caching

The pathname translation cache maintains a list of mappings between requested filenames (e.g., "/~bob") and actual files on disk (e.g., /home/users/bob/public_html/index.html). This cache allows Flash to avoid using the pathname translation helpers for every incoming request. It reduces the processing needed for pathname translations, and it reduces the number of translation helpers needed by the server. As a result, the memory spent on the cache can be recovered by the reduction in memory used by helper processes.
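A translation cache of this kind can be sketched as a bounded dictionary placed in front of the slow translation path. The sketch below is illustrative only: the class name PathnameCache, the oldest-entry eviction rule, and the translate_user_dir rule are assumptions, since the text does not specify how Flash evicts entries or performs the translation itself. The default limit of 6000 entries matches the configuration reported in Section 6.

```python
class PathnameCache:
    """Maps requested names to on-disk paths, calling `translate` on a miss."""

    def __init__(self, translate, max_entries=6000):
        self._translate = translate      # slow path (a helper's job in Flash)
        self._max_entries = max_entries
        self._entries = {}               # requested name -> file path
        self.misses = 0

    def lookup(self, name):
        path = self._entries.get(name)
        if path is None:
            self.misses += 1
            path = self._translate(name)
            if len(self._entries) >= self._max_entries:
                # Assumed policy: evict the oldest entry
                # (dicts preserve insertion order).
                del self._entries[next(iter(self._entries))]
            self._entries[name] = path
        return path


def translate_user_dir(name):
    """Hypothetical rule: "/~user" maps to that user's public_html index."""
    user = name[2:]
    return "/home/users/%s/public_html/index.html" % user
```

Only a miss reaches the translate callback, so repeated requests for the same name need no helper at all.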
5.3 Response Header Caching

HTTP servers prepend file data with a response header containing information about the file and the server, and this information can be cached and reused when the same files are repeatedly requested. Since the response header is tied to the underlying file, this cache does not need its own invalidation mechanism. Instead, when the mapping cache detects that a cached file has changed, the corresponding response header is regenerated.
5.4 Mapped Files

Flash retains a cache of memory-mapped files to reduce the number of map/unmap operations necessary for request processing. Memory-mapped files provide a convenient mechanism to avoid extra data copying and double-buffering, but they require extra system calls to create and remove the mappings. Mappings for frequently-requested files can be kept and reused, but unused mappings can increase kernel bookkeeping and degrade performance.

The mapping cache operates on "chunks" of files and lazily unmaps them when too much data has been mapped. Small files occupy one chunk each, while large files are split into multiple chunks. Inactive chunks are maintained in an LRU free list, and are unmapped when this list grows too large. We use LRU to approximate the "clock" page replacement algorithm used in many operating systems, with the goal of mapping only what is likely to be in memory. All mapped file pages are tested for memory residency via mincore() before use.
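The chunked mapping cache with its LRU free list might look roughly like the following Python sketch. Everything named here is an assumption made for illustration: the chunk size CHUNK, the class MappingCache, reference counting of active chunks, and a limit expressed as a number of idle chunks rather than bytes. The sketch assumes non-empty files and valid chunk indices, and it omits the mincore() residency test, which Python's standard library does not expose.

```python
import mmap
import os
from collections import OrderedDict

CHUNK = 16 * mmap.ALLOCATIONGRANULARITY   # hypothetical chunk size


class MappingCache:
    """Caches mmap'ed file chunks; idle chunks wait on an LRU free list."""

    def __init__(self, max_idle_chunks):
        self.max_idle_chunks = max_idle_chunks
        self.active = {}            # (path, index) -> [mapping, refcount]
        self.idle = OrderedDict()   # (path, index) -> mapping, oldest first

    def acquire(self, path, index):
        key = (path, index)
        if key in self.active:
            self.active[key][1] += 1
        elif key in self.idle:
            # Reuse an existing mapping: no mmap system call is needed.
            self.active[key] = [self.idle.pop(key), 1]
        else:
            with open(path, "rb") as f:
                size = os.fstat(f.fileno()).st_size
                length = min(CHUNK, size - index * CHUNK)
                m = mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                              offset=index * CHUNK)
            self.active[key] = [m, 1]
        return self.active[key][0]

    def release(self, path, index):
        key = (path, index)
        entry = self.active[key]
        entry[1] -= 1
        if entry[1] == 0:
            # Move the chunk to the free list; unmap only when the list overflows.
            self.idle[key] = self.active.pop(key)[0]
            while len(self.idle) > self.max_idle_chunks:
                _, oldest = self.idle.popitem(last=False)
                oldest.close()      # lazily unmap the least recently used chunk
```

Releasing a chunk does not unmap it, so a file requested again shortly afterwards costs a dictionary lookup instead of a map/unmap pair.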
5.5 Byte Position Alignment

The writev() system call allows applications to send multiple discontiguous memory regions in one operation. High-performance Web servers use it to send response headers followed by file data. However, its use can cause misaligned data copying within the operating system, degrading performance. The extra cost for misaligned data is proportional to the amount of data being copied.

The problem arises when the OS networking code copies the various memory regions specified in a writev operation into a contiguous kernel buffer. If the HTTP response header stored in the first region has a length that is not a multiple of the machine's word size, then the copying of all subsequent regions is misaligned.

Flash avoids this problem by aligning all response headers on 32-byte boundaries and padding their lengths to be a multiple of 32 bytes. It adds characters to variable length fields in the HTTP response header (e.g., the server name) to do the padding. The choice of 32 bytes rather than word-alignment is to target systems with 32-byte cache lines, as some systems may be optimized for copying on cache boundaries.
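The length padding amounts to a small piece of modular arithmetic. The following Python sketch pads the Server field with trailing spaces so that the complete header is a multiple of 32 bytes long. The function name build_padded_header, the set of header fields, and the choice of spaces as padding characters are assumptions; the text says only that characters are added to variable-length fields such as the server name. Alignment of the buffer's starting address is not shown.

```python
ALIGN = 32  # padding granularity named in the text


def build_padded_header(status, content_length, server="Flash"):
    """Return an HTTP/1.0 response header whose length is a multiple of ALIGN."""

    def render(server_value):
        return ("HTTP/1.0 %s\r\n"
                "Server: %s\r\n"
                "Content-Length: %d\r\n"
                "\r\n" % (status, server_value, content_length)).encode("ascii")

    unpadded = render(server)
    pad = -len(unpadded) % ALIGN        # bytes needed to reach the next multiple
    return render(server + " " * pad)   # grow a variable-length field by `pad`
```

A header built this way ends on a 32-byte boundary, so when it is followed by file data in a single writev-style call, the data is copied starting at an aligned offset in the destination buffer.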
5.6 Dynamic Content Generation

The Flash Web server handles the serving of dynamic data using mechanisms similar to those used in other Web servers. When a request arrives for a dynamic document, the server forwards the request to the corresponding auxiliary (CGI-bin) application process that generates the content via a pipe. If a process does not currently exist, the server creates (e.g., forks) it.

The resulting data is transmitted by the server just like static content, except that the data is read from a descriptor associated with the CGI process' pipe, rather than a file. The server process allows the CGI application process to be persistent, amortizing the cost of creating the application over multiple requests. This is similar to the FastCGI [27] interface and it provides similar benefits. Since the CGI applications run in separate processes from the server, they can block for disk activity or other reasons and perform arbitrarily long computations without affecting the server.
5.7 Memory Residency Testing

Flash uses the mincore() system call, which is available in most modern UNIX systems, to determine if mapped file pages are memory resident. In operating systems that don't support this operation but provide the mlock() system call to lock memory pages (e.g., Compaq's Tru64 UNIX, formerly Digital Unix), Flash could use the latter to control its file cache management, eliminating the need for memory residency testing.

Should no suitable operations be available in a given operating system to control the file cache or test for memory residency, it may be possible to use a feedback-based heuristic to minimize blocking on disk I/O. Here, Flash could run the clock algorithm to predict which cached file pages are memory resident. The prediction can adapt to changes in the amount of memory available to the file cache by using continuous feedback from performance counters that keep track of page faults and/or associated disk accesses.
6 Performance Evaluation

In this section, we present experimental results that compare the performance of the different Web server architectures presented in Section 3 on real workloads. Furthermore, we present comparative performance results for Flash and two state-of-the-art Web servers, Apache [1] and Zeus [32], on synthetic and real workloads. Finally, we present results that quantify the performance impact of the various performance optimizations included in Flash.

To enable a meaningful comparison of different architectures by eliminating variations stemming from implementation differences, the same Flash code base is used to build four servers, based on the AMPED (Flash), MT (Flash-MT), MP (Flash-MP), and SPED (Flash-SPED) architectures. These four servers represent all the architectures discussed in this paper, and they were developed by replacing Flash's event/helper dispatch mechanism with the suitable counterparts in the other architectures. In all other respects, however, they are identical to the standard, AMPED-based version of Flash and use the same techniques and optimizations.

In addition, we compare these servers with two widely-used production Web servers, Zeus v1.30 (a high-performance server using the SPED architecture), and Apache v1.3.1 (based on the MP architecture), to provide points of reference.

In our tests, the Flash-MP and Apache servers use 32 server processes and Flash-MT uses 64 threads. Zeus was configured as a single process for the experiments using synthetic workloads, and in a two-process configuration advised by Zeus for the real workload tests. Since the SPED-based Zeus can block on disk I/O, using multiple server processes can yield some performance improvements even on a uniprocessor platform, since it allows the overlapping of computation and disk I/O.

Both Flash-MT and Flash use a memory-mapped file cache with a 128 MB limit and a pathname cache limit of 6000 entries. Each Flash-MP process has a mapped file cache limit of 4 MB and a pathname cache of 200 entries. Note that the caches in an MP server have to be configured smaller, since they are replicated in each process.
The experiments were performed with the servers running on two different operating systems, Solaris 2.6 and FreeBSD 2.2.6. All tests use the same server hardware, based on a 333 MHz Pentium II CPU with 128 MB of memory and multiple 100 Mbit/s Ethernet interfaces. A switched Fast Ethernet connects the server machine to the client machines that generate the workload. Our client software is an event-driven program that simulates multiple HTTP clients [3]. Each simulated HTTP client makes HTTP requests as fast as the server can handle them.
6.1 Synthetic Workload

In the first experiment, a set of clients repeatedly request the same file, where the file size is varied in each test. The simplicity of the workload in this test allows the servers to perform at their highest capacity, since the requested file is cached in the server's main memory. The results are shown in Figures 6 (Solaris) and 7 (FreeBSD). The left-hand side graphs plot the servers' total output bandwidth against the requested file size. The connection rate for small files is shown separately on the right.

Results indicate that the choice of architecture has little impact on a server's performance on a trivial, cached workload. In addition, the Flash variants compare favorably to Zeus, affirming the absolute performance of the Flash-based implementation. The Apache server achieves significantly lower performance on both operating systems and over the entire range of file sizes, most likely the result of the more aggressive optimizations employed in the Flash versions and presumably also in Zeus.

Flash-SPED slightly outperforms Flash because the AMPED model tests the memory-residency of files before sending them. Slight lags in the performance of Flash-MT and Flash-MP are likely due to the extra kernel overhead (context switching, etc.) in these architectures. Zeus' anomalous behavior on FreeBSD for file sizes between 10 and 100 KB appears to stem from the byte alignment problem mentioned in Section 5.5.

All servers enjoy substantially higher performance when run under FreeBSD as opposed to Solaris. The relative performance of the servers is not strongly affected by the operating system.
6.2 Trace-based experiments

While the single-file test can indicate a server's maximum performance on a cached workload, it gives little indication of its performance on real workloads. In the next experiment, the servers are subjected to a more realistic load. We generate a client request stream by replaying access logs from existing Web servers.

Figure 8 shows the throughput in Mb/sec achieved with various Web servers on two different workloads. The "CS trace" was obtained from the logs of Rice University's Computer Science departmental Web server. The "Owlnet trace" reflects traces obtained from a Rice Web server that provides personal Web pages for approximately 4500 students and staff members. The results were obtained with the Web servers running on Solaris.

The results show that Flash with its AMPED architecture achieves the highest throughput on both workloads. Apache achieves the lowest performance. The comparison with Flash-MP shows that this is only in part the result of its MP architecture, and mostly due to its lack of aggressive optimizations like those used in Flash.

The Owlnet trace has a smaller data set size than the CS trace, and it therefore achieves better cache locality in the server. As a result, Flash-SPED's relative performance is much better on this trace, while MP performs well on the more disk-intensive CS trace. Even though the Owlnet trace has high locality, its average transfer size is smaller than the CS trace, resulting in roughly comparable bandwidth numbers.
A second experiment evaluates server performance under realistic workloads with a range of data set sizes (and therefore working set sizes). To generate an input stream with a given data set size, we use the access logs from Rice's ECE departmental Web server and truncate them as appropriate to achieve a given data set size. The clients then replay this truncated log as a loop to generate requests. In both experiments, two client machines with 32 clients each are used to generate the workload.
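One plausible reading of the truncation step is sketched below in Python. It assumes that "data set size" means the total size of the distinct files referenced by the log prefix, and that each log entry carries a path and a file size; neither detail is spelled out in the text, and the function name truncate_log is hypothetical.

```python
def truncate_log(entries, target_bytes):
    """Return the longest log prefix whose distinct files total <= target_bytes.

    `entries` is a sequence of (path, size_in_bytes) pairs in request order.
    """
    seen = set()       # files already counted toward the data set
    total = 0          # combined size of the distinct files seen so far
    prefix = []
    for path, size in entries:
        if path not in seen:
            if total + size > target_bytes:
                break  # this new file would push the data set past the target
            seen.add(path)
            total += size
        prefix.append((path, size))
    return prefix
```

Repeated requests for a file already counted extend the prefix without enlarging the data set. Under this reading, cutting the log at different points changes the mix of file sizes in the replayed stream as well as the data set size.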
Figures 9 (BSD) and 10 (Solaris) show the performance, measured as the total output bandwidth, of the various servers under real workload and various data set sizes. We report output bandwidth instead of request/sec in this experiment, because truncating the logs at different points to vary the data set size also changes the size distribution of requested content. This causes fluctuations in the throughput in requests/sec, but the output bandwidth is less sensitive to this effect.

[Figure 6: Solaris single file test - On this trivial test, server architecture seems to have little impact on performance. The aggressive optimizations in Flash and Zeus cause them to outperform Apache. Left panel: Bandwidth (Mb/s) vs. File size (KBytes); right panel: Connection rate (reqs/sec) vs. File size (kBytes). Curves: SPED, Flash, Zeus, MT, MP, Apache.]

[Figure 7: FreeBSD single file test - The higher network performance of FreeBSD magnifies the difference between Apache and the rest when compared to Solaris. The shape of the Zeus curve between 10 kBytes and 100 kBytes is likely due to the byte alignment problem mentioned in Section 5.5. Left panel: Bandwidth (Mb/s) vs. File size (KBytes); right panel: Connection rate (reqs/sec) vs. File size (kBytes). Curves: SPED, Flash, Zeus, MP, Apache.]

The performance of all the servers declines as the data set size increases, and there is a significant drop at the point when the working set size (which is related to the data set size) exceeds the server's effective main memory cache size. Beyond this point, the servers are essentially disk bound.
Severalobservationcanbemadebasedontheseresults:FlashisverycompetitivewithFlash-SPEDoncachedworkloads,andatthesametimeexceedsormeetstheperformanceoftheMPserversondisk-boundworkloads.
ThisconrmsthatFlashwithitsAMPEDarchitectureisabletocombinethebestofotherarchitecturesacrossawiderangeofworkloads.
ThisgoalwascentraltothedesignoftheAMPEDarchitecture.
TheslightperformancedifferencebetweenFlashandFlash-SPEDonthecachedworkloadsreectstheoverheadofcheckingforcacheresidencyofre-questedcontentinFlash.
Sincethedataisalreadyinmemory,thistestcausesunnecessaryoverheadoncachedworkloads.
The SPED architecture performs well for cached workloads but its performance deteriorates quickly as disk activity increases. This confirms our earlier reasoning about the performance tradeoffs associated with this architecture. The same behavior can be seen in the SPED-based Zeus' performance, although its absolute performance falls short of the various Flash-derived servers.
The performance of the Flash-MP server falls significantly short of that achieved with the other architectures on cached workloads. This is likely the result of the smaller user-level caches used in Flash-MP as compared to the other Flash versions.
The choice of an operating system has a significant impact on Web server performance. Performance results obtained on Solaris are up to 50% lower than those obtained on FreeBSD. The operating system also has some impact on the relative performance of the various Web servers and architectures, but the trends are less clear.

Figure 8: Performance on Rice Server Traces/Solaris (two bar charts, CS trace and Owlnet trace: bandwidth in Mb/s for Apache, MP, MT, SPED, and Flash).

Figure 9: FreeBSD Real Workload (bandwidth in Mb/s versus dataset size in MB for SPED, Flash, Zeus, MP, and Apache). The SPED architecture is ideally suited for cached workloads, and when the working set fits in cache, Flash mimics Flash-SPED. However, Flash-SPED's performance drops drastically when operating on disk-bound workloads.
Flash achieves higher throughput on disk-bound workloads because it can be more memory-efficient and causes less context switching than MP servers. Flash only needs enough helper processes to keep the disk busy, rather than needing a process per connection. Additionally, the helper processes require little application-level memory. The combination of fewer total processes and small helper processes reduces memory consumption, leaving extra memory for the file system cache.
The performance of Zeus on FreeBSD appears to drop only after the dataset exceeds 100 MB, while the other servers drop earlier. We believe this phenomenon is related to Zeus's request handling, which appears to give priority to requests for small documents. Under full load, this tends to starve requests for large documents and thus causes the server to process a somewhat smaller effective working set. The overall lower performance under Solaris appears to mask this effect on that OS.
As explained above, Zeus uses a two-process configuration in this experiment, as advised by the vendor. It should be noted that this gives Zeus a slight advantage over the single-process Flash-SPED, since one process can continue to serve requests while the other is blocked on disk I/O.
Results for the Flash-MT servers could not be provided for FreeBSD 2.2.6, because that system lacks support for kernel threads.

Figure 10: Solaris Real Workload (bandwidth in Mb/s versus dataset size in MB for SPED, Flash, Zeus, MT, MP, and Apache). The Flash-MT server has comparable performance to Flash for both in-core and disk-bound workloads. This result was achieved by carefully minimizing lock contention, adding complexity to the code. Without this effort, the disk-bound results otherwise resembled Flash-SPED.

6.3 Flash Performance Breakdown

Figure 11: Flash Performance Breakdown (connection rate in reqs/sec versus file size in KBytes for eight configurations: all (Flash), path & mmap, path & resp, path only, mmap & resp, mmap only, resp only, and no caching). Without optimizations, Flash's small-file performance would drop in half. The eight lines show the effect of various combinations of the caching optimizations.

The next experiment focuses on the Flash server and measures the contribution of its various optimizations to the achieved throughput. The configuration is identical to the single file test on FreeBSD, where clients repeatedly request a cached document of a given size. Figure 11 shows the throughput obtained by various versions of Flash with all combinations of the three main optimizations (pathname translation caching, mapped file caching, and response header caching).
The results show that each of the optimizations has a significant impact on server throughput for cached content, with pathname translation caching providing the largest benefit. Since each of the optimizations avoids a per-request cost, the impact is strongest on requests for small documents.
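
The reason a saved per-request cost matters most for small documents can be illustrated with a simple cost model: the time to serve a request is taken to be a fixed per-request cost plus a transfer time proportional to the document size. The following Python sketch uses this model; the function names and all numbers are assumptions chosen for illustration, not measurements from the experiments.

```python
def request_time(size_bytes: float, fixed_cost_sec: float, bytes_per_sec: float) -> float:
    """Modeled service time: fixed per-request cost plus size-proportional transfer."""
    return fixed_cost_sec + size_bytes / bytes_per_sec


def speedup(size_bytes: float, fixed_before: float, fixed_after: float,
            bytes_per_sec: float) -> float:
    """Throughput gain from lowering only the fixed per-request cost."""
    before = request_time(size_bytes, fixed_before, bytes_per_sec)
    after = request_time(size_bytes, fixed_after, bytes_per_sec)
    return before / after


if __name__ == "__main__":
    # Hypothetical values: an optimization halves a 400 us per-request cost,
    # and data is sent at 10 MB/s.
    for size in (1_000, 20_000, 200_000):
        gain = speedup(size, fixed_before=400e-6, fixed_after=200e-6, bytes_per_sec=10e6)
        print(f"{size:>7} bytes: modeled speedup {gain:.2f}x")
```

Under this model the gain shrinks toward 1 as the document grows, because the transfer term dominates the fixed term.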

6.4 Performance under WAN conditions

Figure 12: Adding clients (bandwidth in Mb/s versus number of simultaneous clients for SPED, Flash, MT, and MP). The low per-client overheads of the MT, SPED and AMPED models cause stable performance when adding clients. Multiple application-level caches and per-process overheads cause the MP model's performance to drop.

Web server benchmarking in a LAN environment fails to evaluate an important aspect of real Web workloads, namely the fact that clients contact the server through a wide-area network. The limited bandwidth and packet losses of a WAN increase the average HTTP connection duration, when compared to a LAN environment. As a result, at a given throughput in requests/second, a real server handles a significantly larger number of concurrent connections than a server tested under LAN conditions [24]. The number of concurrent connections can have a significant impact on server performance [4].
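
The link between connection duration and concurrency follows from Little's law: the average number of connections in progress equals the request rate multiplied by the average connection duration. The short Python sketch below works through one example; the request rate and both durations are hypothetical values chosen only to show the effect.

```python
def concurrent_connections(requests_per_sec: float, avg_duration_sec: float) -> float:
    """Little's law: mean number in the system = arrival rate * mean time in system."""
    return requests_per_sec * avg_duration_sec


# Hypothetical figures, not taken from the measurements in this paper.
RATE = 500.0          # requests/second, identical in both settings
LAN_DURATION = 0.01   # assumed 10 ms per connection on a LAN
WAN_DURATION = 2.0    # assumed 2 s per connection over a WAN

lan = concurrent_connections(RATE, LAN_DURATION)
wan = concurrent_connections(RATE, WAN_DURATION)
print(f"LAN: about {lan:.0f} concurrent connections")
print(f"WAN: about {wan:.0f} concurrent connections")
```

At the same request rate, the server in the WAN setting holds two orders of magnitude more connections open in this example, which is the condition the following experiment approximates.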
Our next experiment measures the impact of the number of concurrent HTTP connections on our various servers. Persistent connections were used to simulate the effect of long-lasting WAN connections in a LAN-based testbed. We replay the ECE logs with a 90 MB dataset size to expose the performance effects of a limited file cache size. In Figure 12 we see the performance under Solaris as the number of simultaneous clients is increased.
The SPED, AMPED and MT servers display an initial rise in performance as the number of concurrent connections increases. This increase is likely due to the added concurrency and various aggregation effects. For instance, a large number of connections increases the average number of completed I/O events reported in each select system call, amortizing the overhead of this operation over a larger number of I/O events.
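
The aggregation effect of select can be observed directly. The following Python sketch opens a set of local socket pairs as stand-ins for HTTP connections, makes some of them readable, and counts how many ready descriptors a single select() call returns. The function name and the assumed fixed cost per call are illustrative only; in practice the cost of select also grows with the number of descriptors passed in, which this simplification ignores.

```python
import select
import socket


def ready_events_per_select(total_conns: int, active_conns: int) -> int:
    """Return how many readable sockets a single select() call reports.

    total_conns socket pairs stand in for open HTTP connections;
    active_conns of them have data pending when select() is called.
    """
    pairs = [socket.socketpair() for _ in range(total_conns)]
    try:
        for writer, _reader in pairs[:active_conns]:
            writer.send(b"x")  # one pending byte makes the peer end readable
        readers = [reader for _writer, reader in pairs]
        ready, _, _ = select.select(readers, [], [], 1.0)
        return len(ready)
    finally:
        for writer, reader in pairs:
            writer.close()
            reader.close()


if __name__ == "__main__":
    SELECT_COST_US = 50.0  # assumed fixed cost of one select() call, in microseconds
    for active in (1, 5, 25):
        n = ready_events_per_select(total_conns=50, active_conns=active)
        print(f"{n} events per call, about {SELECT_COST_US / n:.1f} us of overhead per event")
```

With more connections active at once, each call reports more events, so the per-call overhead is spread over more useful work.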
As the number of concurrent connections exceeds 200, the performance of SPED and AMPED flattens while the MT server suffers a gradual decline in performance. This decline is related to the per-thread switching and space overhead of the MT architecture. The MP model suffers from additional per-process overhead, which results in a significant decline in performance as the number of concurrent connections increases.

7 Related Work

James Hu et al. [17] perform an analysis of Web server optimizations. They consider two different architectures, the multi-threaded architecture and one that employs a pool of threads, and evaluate their performance on UNIX systems as well as Windows NT using the WebStone benchmark.
Various researchers have analyzed the processing costs of the different steps of HTTP request serving and have proposed improvements. Nahum et al. [25] compare existing high-performance approaches with new socket APIs and evaluate their work on both single-file tests and other benchmarks. Yiming Hu et al. [18] extensively analyze an earlier version of Apache and implement a number of optimizations, improving performance especially for smaller requests. Yates et al. [31] measure the demands a server places on the operating system for various workload types and service rates. Banga et al. [5] examine operating system support for event-driven servers and propose new APIs to remove bottlenecks observed with large numbers of concurrent connections.
The Flash server and its AMPED architecture bear some resemblance to Thoth [9], a portable operating system and environment built using "multi-process structuring." This model of programming uses groups of processes called "teams" which cooperate by passing messages to indicate activity. Parallelism and asynchronous operation can be handled by having one process synchronously wait for an activity and then communicate its occurrence to an event-driven server. In this model, Flash's disk helper processes can be seen as waiting for asynchronous events (completion of a disk access) and relaying that information to the main server process.
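
The division of labor between an event-driven process and a helper that absorbs blocking operations can be sketched in a few lines. The Python example below is a minimal illustration for UNIX-like systems, not Flash's implementation: the names `helper` and `serve`, the use of pipes, and the line-based message format are all assumptions made for this sketch. The main process waits only in select(), while the helper performs the disk reads and reports each completion over a pipe.

```python
import os
import select
import tempfile


def helper(request_fd: int, done_fd: int) -> None:
    """Helper process: perform the potentially blocking disk reads."""
    try:
        with os.fdopen(request_fd, "r") as requests:
            for line in requests:
                path = line.rstrip("\n")
                with open(path, "rb") as f:
                    size = len(f.read())  # may block on disk; only the helper waits
                os.write(done_fd, f"{path} {size}\n".encode())
    finally:
        os._exit(0)  # never fall through into the parent's code


def serve(paths):
    """Main process: hand paths to one helper, collect completions via select()."""
    req_r, req_w = os.pipe()
    done_r, done_w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child becomes the helper
        os.close(req_w)
        os.close(done_r)
        helper(req_r, done_w)
    os.close(req_r)
    os.close(done_w)
    # Simplification: all requests are written up front, which is only safe
    # for request lists that fit in the pipe buffer.
    os.write(req_w, "".join(p + "\n" for p in paths).encode())
    os.close(req_w)  # no more requests: lets the helper exit when done

    buf = b""
    while True:
        # A real server would also list its client sockets in this call.
        select.select([done_r], [], [])
        chunk = os.read(done_r, 4096)
        if not chunk:  # helper closed its end of the pipe
            break
        buf += chunk
    os.close(done_r)
    os.waitpid(pid, 0)

    results = {}
    for line in buf.decode().splitlines():
        path, size = line.rsplit(" ", 1)
        results[path] = int(size)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "index.html")
        with open(path, "wb") as f:
            f.write(b"hello")
        print(serve([path]))
```

Because completions arrive as ordinary readable events on a descriptor, the main loop needs no special handling for disk I/O beyond one more entry in its select set.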
The Harvest/Squid project [8] also uses the model of an event-driven server combined with helper processes waiting on slow actions. In that case, the server keeps its own DNS cache and uses a set of "dnsserver" processes to perform calls to the gethostbyname() library routine. Since the DNS lookup can cause the library routine to block, only the dnsserver process is affected. Whereas Flash uses the helper mechanism for blocking disk accesses, Harvest attempts to use the select() call to perform non-blocking file accesses. As explained earlier, most UNIX systems do not support this use of select() and falsely indicate that the disk access will not block. Harvest also attempts to reduce the number of disk metadata operations.
Given the impact of disk accesses on Web servers, new caching policies have been proposed in other work. Arlitt et al. [2] propose new caching policies by analyzing server access logs and looking for similarities across servers. Cao et al. [7] introduce the Greedy Dual Size caching policy which uses both access frequency and file size in making cache replacement decisions. Other work has also analyzed various aspects of Web server workloads [11, 23].
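
To give a sense of how a size-aware replacement policy operates, the following Python sketch implements a GreedyDual-Size-style cache as the algorithm is commonly described: each object carries a priority equal to a running inflation value plus its cost divided by its size, the lowest-priority object is evicted first, and the inflation value is raised to the priority of each evicted object. The class name, the uniform default cost of 1, and the linear scan for the victim are assumptions of this sketch rather than details taken from [7].

```python
class GreedyDualSizeCache:
    """Minimal GreedyDual-Size-style cache (illustrative sketch only)."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0  # running value raised on every eviction
        self.entries = {}     # key -> (priority, size, cost)

    def _priority(self, size: int, cost: float) -> float:
        return self.inflation + cost / size

    def access(self, key, size: int, cost: float = 1.0) -> bool:
        """Return True on a hit; on a miss, insert the object if it can fit."""
        if key in self.entries:
            _, size, cost = self.entries[key]
            # A hit restores the object's priority relative to current inflation.
            self.entries[key] = (self._priority(size, cost), size, cost)
            return True
        if size > self.capacity:
            return False  # too large to cache at all
        while self.used + size > self.capacity:
            # Linear scan for brevity; a priority queue would avoid the O(n) cost.
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            priority, victim_size, _ = self.entries.pop(victim)
            self.inflation = priority
            self.used -= victim_size
        self.entries[key] = (self._priority(size, cost), size, cost)
        self.used += size
        return False


if __name__ == "__main__":
    cache = GreedyDualSizeCache(capacity_bytes=100)
    cache.access("big.tar", size=80)
    cache.access("small.html", size=10)
    cache.access("new.gif", size=30)  # forces an eviction
    print(sorted(cache.entries))
```

With a uniform cost, large objects receive low priorities and are evicted before small ones, while objects that are accessed again have their priorities refreshed and so tend to stay resident.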
Data copying within the operating system is a significant cost when processing large files, and several approaches have been proposed to alleviate the problem. Thadani et al. [30] introduce a new API to read and send memory-mapped files without copying. IO-Lite [29] extends the fbufs [14] model to integrate file system, networking, interprocess communication, and application-level buffers using a set of uniform interfaces. Engler et al. [20] use low-level interaction between the Cheetah Web server and their exokernel to eliminate copying and streamline small-request handling. The Lava project uses similar techniques in a microkernel environment [22].
Other approaches for increasing Web server performance employ multiple machines. In this area, some work has focused on using multiple server nodes in parallel [6, 10, 13, 16, 19, 28], or sharing memory across machines [12, 15, 21].

8 Conclusion

This paper presents a new portable high-performance Web server architecture, called asymmetric multi-process event-driven (AMPED), and describes an implementation of this architecture, the Flash Web server. Flash nearly matches the performance of SPED servers on cached workloads while simultaneously matching or exceeding the performance of MP and MT servers on disk-intensive workloads. Moreover, Flash uses only standard APIs available in modern operating systems and is therefore easily portable.
We present results of experiments to evaluate the impact of a Web server's concurrency architecture on its performance. For this purpose, various server architectures were implemented from the same code base. Results show that Flash with its AMPED architecture can nearly match or exceed the performance of other architectures across a wide range of realistic workloads.
Results also show that the Flash server's performance exceeds that of the Zeus Web server by up to 30%, and it exceeds the performance of Apache by up to 50% on real workloads. Finally, we perform experiments to show the contribution of the various optimizations embedded in Flash to its performance.

Acknowledgments

We are grateful to Erich Nahum, Jeff Mogul, and the anonymous reviewers, whose comments have helped to improve this paper. Thanks to Michael Pearlman for our Solaris testbed configuration. Special thanks to Zeus Technology for use of their server software and Damian Reeves for feedback and technical assistance with it. Thanks to Jef Poskanzer for the thttpd web server, from which Flash derives some infrastructure. This work was supported in part by NSF Grants CCR-9803673, CCR-9503098, MIP-9521386, by Texas TATP Grant 003604, and by an IBM Partnership Award.

References

[1] Apache. http://www.apache.org
[2] M. F. Arlitt and C. L. Williamson. Web Server Workload Characterization: The Search for Invariants. In Proceedings of the ACM SIGMETRICS '96 Conference, pages 126–137, Philadelphia, PA, Apr. 1996.
[3] G. Banga and P. Druschel. Measuring the capacity of a Web server. In Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS), Monterey, CA, Dec. 1997.
[4] G. Banga and P. Druschel. Measuring the capacity of a Web server under realistic loads. World Wide Web Journal (Special Issue on World Wide Web Characterization and Performance Evaluation), 1999. To appear.
[5] G. Banga, P. Druschel, and J. C. Mogul. Resource containers: A new facility for resource management in server systems. In Proc. 3rd USENIX Symp. on Operating Systems Design and Implementation, Feb. 1999.
[6] T. Brisco. DNS Support for Load Balancing. RFC 1794, Apr. 1995.
[7] P. Cao and S. Irani. Cost-aware WWW proxy caching algorithms. In Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS), Monterey, CA, Dec. 1997.
[8] A. Chankhunthod, P. B. Danzig, C. Neerdaels, M. F. Schwartz, and K. J. Worrell. A Hierarchical Internet Object Cache. In Proceedings of the 1996 Usenix Technical Conference, Jan. 1996.
[9] D. R. Cheriton. The Thoth System: Multi-Process Structuring and Portability. Elsevier Science Publishing Co., Inc., 1982.
[10] Cisco Systems Inc. LocalDirector. http://www.cisco.com
[11] M. Crovella and A. Bestavros. Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes. In Proceedings of the ACM SIGMETRICS '96 Conference, pages 160–169, Philadelphia, PA, Apr. 1996.
[12] M. Dahlin, R. Yang, T. Anderson, and D. Patterson. Cooperative caching: Using remote client memory to improve file system performance. In Proc. USENIX Symp. on Operating Systems Design and Implementation, Monterey, CA, Nov. 1994.
[13] O. P. Damani, P.-Y. E. Chung, Y. Huang, C. Kintala, and Y.-M. Wang. ONE-IP: Techniques for hosting a service on a cluster of machines. Computer Networks and ISDN Systems, 29:1019–1027, 1997.
[14] P. Druschel and L. L. Peterson. Fbufs: A high-bandwidth cross-domain transfer facility. In Proceedings of the Fourteenth ACM Symposium on Operating System Principles, pages 189–202, Dec. 1993.
[15] M. J. Feeley, W. E. Morgan, F. H. Pighin, A. R. Karlin, H. M. Levy, and C. A. Thekkath. Implementing global memory management in a workstation cluster. In Proceedings of the Fifteenth ACM Symposium on Operating System Principles, Copper Mountain, CO, Dec. 1995.
[16] A. Fox, S. D. Gribble, Y. Chawathe, E. A. Brewer, and P. Gauthier. Cluster-based scalable network services. In Proceedings of the Sixteenth ACM Symposium on Operating System Principles, Saint-Malo, France, Oct. 1997.
[17] J. C. Hu, I. Pyarali, and D. C. Schmidt. Measuring the impact of event dispatching and concurrency models on web server performance over high-speed networks. In Proceedings of the 2nd Global Internet Conference, Phoenix, AZ, Nov. 1997.
[18] Y. Hu, A. Nanda, and Q. Yang. Measurement, analysis and performance improvement of the Apache web server. In Proceedings of the 18th IEEE International Performance, Computing and Communications Conference (IPCCC'99), February 1999.
[19] IBM Corporation. IBM eNetwork dispatcher. http://www.software.ibm.com/network/dispatcher
[20] M. F. Kaashoek, D. R. Engler, G. R. Ganger, and D. A. Wallach. Server Operating Systems. In Proceedings of the 1996 ACM SIGOPS European Workshop, pages 141–148, Connemara, Ireland, Sept. 1996.
[21] H. Levy, G. Voelker, A. Karlin, E. Anderson, and T. Kimbrel. Implementing Cooperative Prefetching and Caching in a Globally-Managed Memory System. In Proceedings of the ACM SIGMETRICS '98 Conference, Madison, WI, June 1998.
[22] J. Liedtke, V. Panteleenko, T. Jaeger, and N. Islam. High-performance caching with the Lava hit-server. In Proceedings of the USENIX 1998 Annual Technical Conference, New Orleans, LA, June 1998.
[23] S. Manley and M. Seltzer. Web Facts and Fantasy. In Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS), pages 125–134, Monterey, CA, Dec. 1997.
[24] J. C. Mogul. Network behavior of a busy web server and its clients. Technical Report WRL 95/5, DEC Western Research Laboratory, Palo Alto, CA, 1995.
[25] E. Nahum, T. Barzilai, and D. Kandlur. Performance Issues in WWW Servers. Submitted for publication.
[26] National Center for Supercomputing Applications. Common Gateway Interface. http://hoohoo.ncsa.uiuc.edu/cgi
[27] Open Market, Inc. FastCGI specification. http://www.fastcgi.com
[28] V. S. Pai, M. Aron, G. Banga, M. Svendsen, P. Druschel, W. Zwaenepoel, and E. Nahum. Locality-aware request distribution in cluster-based network servers. In Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, Oct. 1998. ACM.
[29] V. S. Pai, P. Druschel, and W. Zwaenepoel. IO-Lite: A unified I/O buffering and caching system. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation, New Orleans, LA, Feb. 1999.
[30] M. N. Thadani and Y. A. Khalidi. An efficient zero-copy I/O framework for UNIX. Technical Report SMLI TR-95-39, Sun Microsystems Laboratories, Inc., May 1995.
[31] D. Yates, V. Almeida, and J. Almeida. On the interaction between an operating system and Web server. Technical Report TR-97-012, Boston University, CS Dept., Boston MA, 1997.
[32] Zeus Technology Limited. Zeus Web Server. http://www.zeus.co.uk
