
A first look at scalable I/O in Linux commands

Ken Matney, Shane Canon, and Sarp Oral
Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37831

Abstract. Data created from and used by terascale and petascale applications continues to increase, but our ability to handle and manage these files is still limited by the capabilities of the standard serialized Linux command set. This paper introduces efforts by the Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL) towards providing parallelized and more efficient versions of the commonly used Linux commands. The design and implementation details, as well as a performance analysis, of spdcp, an in-house developed distributed parallelized version of the cp tool, are presented. Tests show that our spdcp utility can achieve 73 times the performance of its serialized counterpart. In addition, we introduce current work to extend this approach to other tools.

1 Introduction

Users of HPC systems with parallel file systems still rely on legacy serial tools to perform many day-to-day operations. Parallel file systems such as Lustre and GPFS are today capable of delivering hundreds of gigabytes per second (GB/s) in aggregate bandwidth, but standard serial Linux utilities cannot harness this capability. For example, making a backup copy of checkpoint files, compressing output, or creating a tar file of results is typically carried out with standard Linux tools. Consequently, users are limited to the performance that can be sustained by a single node for these tasks and cannot take advantage of the extensive capabilities of the parallel file system.

The Center for Computational Sciences at Oak Ridge National Laboratory has begun working on tools to address these issues. In this paper we describe the approach used in developing these tools and present some early performance results. We also discuss work in progress and future plans.

2 Motivation

The National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory operates a number of the most powerful computer systems used for open research [1][2]. The flagship system, Jaguar, is a Cray XT4 with over 20,000 cores and 40 TB of memory. It is configured with a parallel file system with nearly 1 PB of disk capacity and over 40 GB/s of file system bandwidth. The system uses the Lustre file system [3].

The Lustre file system aggregates distributed storage units into one logical file system. Files are striped transparently by the file system across multiple storage targets to aggregate both capacity and bandwidth. As a result, users can achieve high throughput to storage for critical I/O operations such as writing or reading a checkpoint file. Applications such as the Gyrokinetic Toroidal Code (GTC) have demonstrated over 10 GB/s of aggregate bandwidth. However, many day-to-day operations fail to achieve even a small fraction of this capability because the underlying utilities, such as cp, bzip2, and tar, are confined to a single node. A fully striped file (a single file striped across all storage targets) can be written at over 20 GB/s on a Jaguar file system. However, using cp to copy this file between two local Lustre file systems might only sustain 200 MB/s. As a result, while it might have taken around 50 seconds to create a 1 TB checkpoint file, it would take more than 80 minutes to make a copy of the file.
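
The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (1 TB = 10^12 bytes) and treats the quoted bandwidths as sustained rates; the function name is illustrative only.

```python
# Back-of-the-envelope transfer times for the 1 TB checkpoint example.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9, 1 MB = 10**6).
TB, GB, MB = 10**12, 10**9, 10**6


def transfer_seconds(size_bytes, bandwidth_bytes_per_s):
    """Time needed to move size_bytes at a sustained bandwidth."""
    return size_bytes / bandwidth_bytes_per_s


write_s = transfer_seconds(1 * TB, 20 * GB)   # fully striped parallel write
copy_s = transfer_seconds(1 * TB, 200 * MB)   # serial cp between file systems

print(f"parallel write: {write_s:.0f} s")        # 50 s
print(f"serial cp:      {copy_s / 60:.1f} min")  # 83.3 min
```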

The user would likely encounter similar problems when compressing and uncompressing files, creating a tar file, or performing other operations that rely on serial tools.

From discussions with our users, it has become evident that these bottlenecks in day-to-day operations are the source of some very real barriers to productivity and that there is a clear and growing need for parallel versions of these common tools. Furthermore, if a generalized framework could be created for parallelizing many of these common tasks, it could be extended to other use cases. Fortunately, many of these tools lend themselves to parallelization, with very clear ways to decompose the input domain. We chose to focus on those utilities that would quickly provide the most benefit to our user community.

3 Approach

There are some limiting factors in parallelizing Linux commands. First, the source data must be randomly accessible. Data from a checkpoint file in a file system is an example, while data from a socket or pipe is not. Second, the data set must reside on multiple independent physical devices. Since the performance improvement is based on parallel I/O, accessing multiple independent physical devices concurrently increases the achievable aggregate bandwidth.

There are two types of parallelization that can be exploited. First, there is the parallelism associated with processing multiple files simultaneously. Second, there is the parallelism associated with using multiple processors to cooperatively map the data of a single file. Obviously, the gain from the latter depends on how well the file has been distributed across multiple servers and on whether the work can be easily decomposed.

Another critical factor in performance is the size of the data buffers that are employed. Like most file systems, parallel file systems prefer large buffers. For example, the Lustre file system achieves its best performance with 1 MB buffers. Parallel file systems are typically more sensitive to buffer sizes since these file systems rely on networks to transport data from the storage servers to the clients. Furthermore, byte-range locking is typically used to ensure consistency. Larger buffers require less overhead in managing these locks, resulting in better performance.
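
To make the role of buffer size concrete, the following sketch splits a file into fixed-size I/O requests and deals them out to clients in turn. The 1 MB default follows the figure quoted above; the function names and the round-robin policy are assumptions for illustration, not a description of how spdcp assigns work.

```python
MIB = 1024 * 1024  # 1 MB buffer, the size quoted above for Lustre


def split_into_requests(file_size, buffer_size=MIB):
    """Return (offset, length) pairs covering [0, file_size) in buffer-sized pieces."""
    return [(offset, min(buffer_size, file_size - offset))
            for offset in range(0, file_size, buffer_size)]


def assign_round_robin(requests, n_clients):
    """Deal the requests out to n_clients in turn (hypothetical policy)."""
    return [requests[i::n_clients] for i in range(n_clients)]


# Fewer, larger requests mean fewer byte-range locks to manage.
size = 100 * 10**6  # a 100 MB file
for buf in (4096, MIB):
    print(f"buffer {buf:>8} bytes -> {len(split_into_requests(size, buf))} requests")
```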

Since the details of how to decompose the work depend on the specific command targeted, each command has to be examined individually. However, the basis of the algorithms for performing I/O in parallel remains the same. In addition, a method for communicating between the various participating processors must be established. While system-specific low-level protocols such as Portals on a Cray XT or Verbs on an InfiniBand cluster might provide the best performance, they lack portability. Therefore, MPI is used to ensure portability while sacrificing some degree of performance. Our parallelized utilities can easily be ported and compiled for most parallel systems.

While a Lustre file system was used in the development and testing of the initial implementation, these techniques can be applied to other parallel file systems. In certain cases, Lustre-specific calls to query the layout of the data are used to improve efficiency. However, good performance and efficiency can still be achieved without these Lustre-specific calls.

Lustre is a POSIX-compliant, object-based file system composed of three components:

Metadata Server: A single Metadata Server (MDS) per file system stores and manages Lustre file metadata, such as file names, directories, permissions, striping pattern, and file layout.

Object Storage Target: One or more Object Storage Targets (OSTs) are block devices that actually store the file data. OSTs are managed by the Object Storage Servers (OSSs). In any given configuration there can be one or more OSTs controlled by a given OSS.

Client: Clients access and use the data. Lustre provides all clients with standard POSIX semantics and concurrent read and write access to the files in the file system.

Currently, Lustre uses an enhanced version of the ext3 file system on the MDS and OSTs to store Lustre file data.

Lustre achieves high read and write performance by distributing the file data over multiple OSTs. This is known as striping. The number of OSTs that a file is striped across is known as the stripe count. With striping, the maximum file size is not limited by the size of a single block device, and the aggregate I/O bandwidth scales with the number of OSSs.
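
As an illustration of striping, the sketch below maps a byte offset to a stripe index under a simple round-robin (RAID-0 style) layout and lists which stripes a byte range touches. The stripe size parameter, the function names, and the numbering of stripes from 0 to stripe count minus 1 are assumptions made here; a real client would obtain the actual OST list from the file layout.

```python
MIB = 1024 * 1024


def stripe_index(offset, stripe_size, stripe_count):
    """Index (0 .. stripe_count-1) of the stripe holding the byte at offset,
    assuming fixed-size chunks placed round-robin across the stripes."""
    return (offset // stripe_size) % stripe_count


def stripes_touched(offset, length, stripe_size, stripe_count):
    """Sorted stripe indices that the range [offset, offset + length) falls on."""
    if length <= 0:
        return []
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    if last - first + 1 >= stripe_count:
        return list(range(stripe_count))  # range wraps around every stripe
    return sorted({chunk % stripe_count for chunk in range(first, last + 1)})


# A 2 MB read starting at 3 MB, on a file with 1 MB stripes over 4 OSTs:
print(stripes_touched(3 * MIB, 2 * MIB, MIB, 4))  # [0, 3]
```

A client whose requests are aligned to the stripe size touches one storage target per request, which is one reason stripe-aligned decomposition is attractive.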

A more detailed description of the Lustre file system is beyond the scope of this document. Interested readers are encouraged to read [3].

The Linux cp utility was selected as the first tool for parallelization, as it is a commonly used function, and the decomposition is simple since the mapping of input data to output data is direct. Consequently, there are almost no dependencies between the individual threads carrying out the copy. The parallel version of cp is termed spdcp, for streaming parallel distributed cp. Currently, spdcp only works on the Lustre file system, but our future plans involve extending it to other file systems, such as GPFS. We are in the process of publicly releasing the spdcp source code under an open source license.

4 Prototype for a Parallel Distributed Copy

In preparing the prototype, there are two possible ways in which to proceed. The first is to take the source for GNU cp and modify it. The second is to write the function from scratch. It is unlikely that a patch to rework cp could make it into the mainstream given the number of changes that are needed to parallelize it. Therefore we chose to implement a new copy command starting from scratch. However, we tried to preserve many of the command-line options and the general behaviour of cp.

The overall design consists of several components. A diagram of the components is shown in Fig. 1. The base component is the "launch process", which invokes the MPI-based components. In addition to launching the MPI job, it also performs a number of other operations, as described below. The "rank 0 process" in the MPI job is designated as the master. It is responsible for managing the work. A number of slave processes are responsible for copying the file data from source to target. How this work is distributed across the slave nodes is described below.

There are a number of design considerations to be made. First, the prototype needs to be aware of the parallel characteristics of the source file(s). It needs to be able to acquire these attributes for the source file(s) and set them on the target file(s). Next, it needs to be aware of the available resources. That is to say, if the Linux command is not run within the context of a batch job, it needs to spawn a batch job and request appropriate resources.

Another design choice was how meta-data operations would be decomposed. Currently, Lustre employs a single Metadata Server (MDS) for a file system. Consequently, having multiple clients interact with the MDS may not improve performance and may even reduce it. Therefore, the prototype performs many of the meta-data-specific tasks in the launch process. For example, the Linux command that launches the MPI job performs the search for source file(s), acquires both Linux meta-data and Lustre meta-data for these, and sends all of this information to the MPI master via a pipe. Furthermore, this process creates the target directory hierarchy before sending the list of files to the MPI-based components. This avoids duplication of effort and race conditions, e.g., multiple processes requesting creation of the same target directory. Finally, the launch process handles correctly setting timestamps on target directories when needed. The advantage of this strategy may not be obvious. Since the launch process has already traversed the source hierarchy, it only needs to retain a list of the directories and their meta-data. The launch process must allow the MPI job to complete so that it can ensure any updates to the access time are not overwritten by any of the slave processes.
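
The division of labour described above can be sketched as follows: a single process walks the source tree once, creates the target directories, records directory timestamps, and builds the list of files to hand to the copy workers. This is a minimal sketch using only the Python standard library; the function names are hypothetical, Lustre-specific meta-data is omitted, and it is not the spdcp implementation.

```python
import os


def plan_copy(src_root, dst_root):
    """Walk src_root once, create the matching directory tree under dst_root,
    and return (files, dir_times).

    files     : list of (source_path, target_path, size_in_bytes)
    dir_times : list of (target_dir, atime_ns, mtime_ns) taken from the source
    """
    files, dir_times = [], []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        # Only this process creates directories, so workers cannot race on mkdir.
        os.makedirs(target_dir, exist_ok=True)
        st = os.stat(dirpath)
        dir_times.append((target_dir, st.st_atime_ns, st.st_mtime_ns))
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            files.append((src, os.path.join(target_dir, name), os.stat(src).st_size))
    return files, dir_times


def restore_dir_times(dir_times):
    """Apply the recorded timestamps. Call this only after every worker has
    finished, since creating files in a directory updates its timestamps."""
    for target_dir, atime_ns, mtime_ns in reversed(dir_times):
        os.utime(target_dir, ns=(atime_ns, mtime_ns))
```

In this arrangement the file list returned by `plan_copy` plays the role of the information sent to the master, and `restore_dir_times` corresponds to the final timestamp pass once the parallel job has completed.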

The prototype employs a variable strategy for decomposing work to determine the number of clients to employ in copying each file. It makes this determination based on a performance prediction model of the data set. For small files or files with only a single Lustre stripe, the entire operation is carried out by a single slave node. For files that are distributed over multiple stripes, the work is distributed across a subset of processes. The master process waits until the appropriate number of slave processes are available and then schedules the copy operation across the subset. A "team leader" is selected within the subset. The team leader ensures that the target file has been created with the appropriate Lustre meta-data parameters, such as the stripe count and stripe width. If the typical file meta-data (modification date, etc.) is to be an exact copy of the original, then all of the team members report to the team leader that they have completed all of their I/O requests. Otherwise, the team members report directly back to the master node for their next assignment. Likewise, after the team members report back to their team leader for completion notification, they await further instructions from the master node. The team leader reports to the master node to indicate that the copy has completed and the team members are ready for the next assignment.
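
One way to express the team-sizing rule described above is shown below. The threshold for a small file, the cap at the stripe count, and all names are assumptions made for illustration; the paper's performance prediction model is not reproduced here.

```python
MIB = 1024 * 1024


def desired_team_size(file_size, stripe_count, total_workers, small_file_limit=MIB):
    """Number of workers to assign to one file (hypothetical rule).

    Small files and single-stripe files go to one worker; otherwise use one
    worker per stripe, capped by the number of workers in the job.
    """
    if file_size <= small_file_limit or stripe_count <= 1:
        return 1
    return min(stripe_count, total_workers)


def pick_team(idle_workers, size):
    """Take `size` workers from the idle pool and return (leader, members),
    or None if the master has to keep waiting for workers to become idle."""
    if len(idle_workers) < size:
        return None
    team = [idle_workers.pop() for _ in range(size)]
    return team[0], team[1:]


idle = [1, 2, 3, 4]                      # worker ranks; rank 0 is the master
size = desired_team_size(100 * MIB, stripe_count=3, total_workers=15)
print(pick_team(idle, size))             # (4, [3, 2]); rank 1 stays idle
```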

The techniques described above allow the load on the target OSTs to be managed. By instructing the prototype command to use only a specified number of processors for the parallel part, in conjunction with specifying the buffer width, we can ensure that the ideal number of clients is participating in the copy operation for a given file. Contention can still arise from other copy threads having stripes that overlap on the same OST. However, preventing this would increase the complexity and likely provide only marginal improvements in performance.

The prototype implementation of spdcp strives to mimic the standard cp command that users are familiar with. The intent is to create a drop-in replacement for cp that users can easily employ in their existing scripts.

Fig. 1. Diagram of the components used in the parallel distributed copy. All compute nodes access the file system. The number of team members used for a source input file depends on the source file Lustre stripe pattern.

However, some additional command-line options have been added to control aspects of the parallel execution of the utility. For example, there are options to control the number of tasks and buffer sizes. Furthermore, since our environment requires submitting a batch job to run a parallel job, the utility can transparently submit itself to the batch queue. Consequently, there are options related to the batch submission as well. A sample execution is shown in Fig. 2.

5 Performance

A series of performance measurements was carried out on the spdcp tool. Three reference data sets were created for this purpose. The first data set (workload 1) consisted of 2400 files, each of size 100 MB. This is representative of files typically created by a modeling application that are later analyzed or visualized. The second data set (workload 2) consisted of 10 files, each of size 24,000 MB. This is representative of a checkpoint that is written to a shared file. The third data set (workload 3) consisted of 1200 files of size 100 MB and 5 files of size 24,000 MB. This was done to demonstrate the ability to efficiently copy a non-uniform data set.

The Linux cp command was used to establish baseline performance. Then, we evaluated the performance at various scales in order to understand the scaling behavior of the prototype. These measurements were performed on a 3500-socket Cray XT3 system against its local Lustre file system. The Lustre file system consisted of 80 OSTs served by 20 Object Storage Servers (OSSs). The backend storage was provided by 10 couplets of DDN 8500 [10]. This file system has been measured using the IOR [11] benchmark to sustain over 10 GB/s on a file-per-process run.

    spdcp -s 16 -r /source/directory/ /target/directory/

    spdcp -h
    Usage: spdcp [options] SRC DEST
       or  spdcp [options] SRC... DIRECTORY
    Copy file SRC to file DEST or list of files SRC... to directory DIRECTORY,
    replicating Lustre stripe information where possible. Copy is performed in
    parallel by distributed clients using MPI message passing for
    synchronization and control. When compute node resources are accessible
    only in batch mode, command will stage job and retain control until job
    finishes.
    The following options offer control over command:
      -h      Print this message (disables copy)
      -V      Print command and agent versions (disables copy)
      -d      Use dummy form (disables copy, prints targets)
      -v      Increase verbosity level (maximum 2)
      -p      Preserve mode, ownership, and timestamps
      -r, -R  Copy recursively
      -c      Reduce OST count at destination to source usage
      -n      Do not offset initial OST at destination
      -b {F}  Increase I/O request size by a factor of F
      -s {M}  Employ M parallel clients
      -A {P}  If spawning batch job, charge run to project, P
      -w {T}  If spawning batch job, limit walltime to T seconds
      -q {Q}  If spawning batch job, direct to batch queue, Q

Fig. 2. Sample execution of spdcp (top). The total number of clients requested is identified by the -s switch. Note that this number also includes the "master" (or rank 0) node. The spdcp help menu (bottom).

As can be seen in Fig. 3, spdcp achieves good parallel speedup. The data exhibit a certain amount of variation because they were obtained during the course of normal production operation of the Cray XT3. It should be noted that the stock Linux cp utility achieved 324 MB/s, 126 MB/s, and 177 MB/s for workload 1, workload 2, and workload 3, respectively.

In terms of peak performance, as can be seen in Fig. 3, workload 2 achieves the best performance with spdcp, at around 9300 MB/s. This is a 73x performance increase compared to the Linux cp utility. The peak performance is 7300 MB/s for workload 1, a 22x speedup compared to the Linux cp utility. For workload 3 the peak is approximately 9100 MB/s, a 51x speedup over the Linux cp utility.

Also, as can be seen in Fig. 3, the peak performance is obtained at 160 to 256 clients. However, from a practical point of view, the scaling of performance levels off at around 100 clients. This makes sense given that the number of clients and the number of OSTs are then roughly equivalent. Consequently, the OSTs have nearly reached their peak bandwidth. This is further demonstrated by the fact that the aggregate bandwidth is 73% to 93% of the peak bandwidth as measured by IOR.
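
The speedup factors and the fraction of IOR peak follow directly from the numbers quoted above, as this short calculation shows. The IOR figure is taken as exactly 10 GB/s (the text says "over 10 GB/s"), so the fractions are approximate.

```python
# Bandwidths in MB/s, as reported in the text.
baseline = {"workload 1": 324, "workload 2": 126, "workload 3": 177}    # Linux cp
peak = {"workload 1": 7300, "workload 2": 9300, "workload 3": 9100}     # spdcp
IOR_PEAK = 10_000  # assumption: "over 10 GB/s" taken as exactly 10 GB/s

speedup = {name: peak[name] / baseline[name] for name in baseline}
fraction = {name: peak[name] / IOR_PEAK for name in baseline}

for name in baseline:
    print(f"{name}: {speedup[name]:.1f}x over cp, "
          f"{fraction[name]:.0%} of the IOR peak")
```

The results truncate to the 22x, 73x, and 51x factors quoted in the text, and the lowest and highest fractions give the 73% to 93% range.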

[Figure 3: aggregate bandwidth (MB/s) versus number of clients for workload 1, workload 2, and workload 3.]

Fig. 3. spdcp performance for clients up to 512. Workload 1 is composed of small files, workload 2 is composed of large files, and workload 3 is a mix of large and small files. The stock Linux cp utility achieved 324 MB/s, 126 MB/s, and 177 MB/s for workload 1, workload 2, and workload 3, respectively (not shown on the figure).

6 On-going work

The parallel implementation of the copy utility is just the first step in a broader initiative to create a suite of parallelized tools. Towards this end, we have started to create a framework to generalize the approaches used in spdcp so that they can easily be applied to other common utilities.
The spdcp utility does not currently use the framework, but may be re-implemented using the framework in the near future. This framework, which is called spdframe, has already been used for compression and decompression of bzip2 files [12]. This presents slightly more difficulty than the copy tool, as the decomposition for decompression is more difficult. Preliminary tests show that our bzip2 implementation is promising and, under the right configurations (e.g., 64 processors with a 20 MB file), can achieve 15 times the compression performance of its serialized version. Future work will focus on applying the framework to tar and other common file-based utilities.

While we are focusing on applying the framework to common tools, the framework lends itself to other uses as well. The framework provides an easy way for users to apply a function over multiple files in parallel. So, for example, a user could easily apply the framework to perform a parallel grep on a set of files.
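
As a rough illustration of applying a function over many files in parallel, the sketch below runs a grep-like search with a thread pool from the Python standard library. It is a stand-in only: the spdframe interface is not described in this paper, and the function names here are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def grep_file(pattern, path):
    """Return (path, line_number, line) for every line of path matching pattern."""
    regex = re.compile(pattern)
    hits = []
    with open(path, "r", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if regex.search(line):
                hits.append((path, lineno, line.rstrip("\n")))
    return hits


def parallel_grep(pattern, paths, workers=4):
    """Search all paths concurrently; results keep the order of paths."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_file = pool.map(lambda p: grep_file(pattern, p), paths)
        return [hit for hits in per_file for hit in hits]
```

Here each file is an independent unit of work, which corresponds to the first kind of parallelism discussed in the approach (processing multiple files simultaneously); splitting a single large file among workers would need the byte-range decomposition used for the copy.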

7 Related work

Increasing the performance of common Linux utilities has gathered some attention from the research community over the years. William Gropp and Ewing Lusk [4] first recognized the limitations of legacy serial UNIX utilities in parallel environments. They introduced several parallel versions of commonly used UNIX utilities with parallel rsh as the underlying parallel synchronization and communication mechanism. As a follow-up to their work, Emil Ong, Ewing Lusk, and William Gropp developed an MPI-based version of their parallelized UNIX utilities [5]. However, there is a clear distinction between our goal and theirs. The target for Gropp and Lusk was to increase efficiency by executing the same command with the same argument list and parameters in parallel over multiple independent nodes with independent operating systems and file systems. In many respects, they implemented SIMD-like versions of the common UNIX tools. Our approach departs from theirs in that our goal was to increase the efficiency of a single execution of a given Linux utility by parallelizing and distributing its workload over multiple worker/compute nodes, all sharing a common file system but running independent OSes.

Jeff Gilchrist and Aysegul Cuhadar [7] introduced two parallelized versions of the BWT-based bzip2 block-sorting file compressor, namely pbzip2 and mpibzip2. pbzip2 is a thread-parallel version of bzip2 for use on shared memory machines. It produces compatible but larger archives compared to the original bzip2. mpibzip2 is an MPI-based parallel implementation of the bzip2 block-sorting file compressor for clusters.

The bzip2smp program is another parallelized version of the bzip2 compressor [8]. It is specifically targeted at SMP systems. It is very cache-dependent and does not perform well on hyperthreaded systems. It is similar to pbzip2 in nature, but unlike pbzip2, bzip2smp supports compression from stdin.

Conclusion

Increasing parallelism in file systems paves the way for processing larger data sets in shorter times. However, while capabilities for generating larger data sets are constantly increasing, our tools for handling and managing such files remain serial and limited in performance. The Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL) has started an initiative to provide high-performance, parallel versions of commonly used Linux commands. The cp command was our starting point. We have developed and implemented an MPI-based, batch-processing-capable parallel version of the standard cp command. Tests show that our version can achieve 73 times the performance of its standard serialized counterpart. This paper also introduces our efforts towards developing a parallelized, distributed version of the bzip2 command. The implementation follows a framework which, if successful, will be used for developing and parallelizing other Linux commands.

Acknowledgments

The authors would like to thank the staff and colleagues who have contributed material to this paper. Research sponsored by the Mathematical, Information, and Computational Sciences Division, Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.

About the Authors

Ken Matney is a researcher in the Technology Integration Group, which is part of the National Center for Computational Sciences at Oak Ridge National Lab. He can be reached by e-mail: matneykdsr@ornl.gov.

Shane Canon is the Group Leader for the Technology Integration Team. He can be reached by e-mail: canonrs@ornl.gov.

Sarp Oral is a researcher in the Technology Integration Group, which is part of the National Center for Computational Sciences at Oak Ridge National Lab. He can be reached by e-mail: oralhs@ornl.gov.

References

1. National Center for Computational Sciences. Web page. http://nccs.gov.
2. Top500 Supercomputer sites, November 2007 list. Web page. http://www.top500.org/list/2007/11.
3. Cluster File Systems, Inc. Lustre manual. Web page. http://www.lustre.org/manual.html.
4. William Gropp and Ewing L. Lusk. Scalable Unix tools on parallel processors. In Proceedings of the Scalable High-Performance Computing Conference, pp. 56-62, 1994.
5. Emil Ong, Ewing L. Lusk, and William Gropp. Scalable Unix Commands for Parallel Processors: A High-Performance Implementation. In Proceedings of the 8th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing, pp. 410-418, 2001.
6. M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, Digital Systems Research Center, 1994.
7. Jeff Gilchrist and Aysegul Cuhadar. Parallel Lossless Data Compression Based on the Burrows-Wheeler Transform. In 21st International Conference on Advanced Networking and Applications (AINA '07), pp. 877-884, 2007.
8. Web page. http://bzip2smp.sourceforge.net/
9. R. S. Canon and H. Sarp Oral. A Center-wide File System using Lustre. In CUG Proceedings, 2006.
10. Data Direct Networks. Web page. http://datadirectnetworks.com/
11. Hedges et al. Parallel file system testing for the lunatic fringe: the care and feeding of restless I/O power users. In IEEE Mass Storage Systems and Technologies Proceedings, 2005.
12. Julian Seward. The bzip2 and libbzip2 official home page. Web page. http://sources.redhat.com/bzip2
