Measuring Energy and Power with PAPI

Vincent M. Weaver, Matt Johnson, Kiran Kasichayanula, James Ralph, Piotr Luszczek, Dan Terpstra, and Shirley Moore
Innovative Computing Laboratory, University of Tennessee
{vweaver1,mrj,kirankk,ralph,luszczek,terpstra,shirley}@eecs.utk.edu

Abstract: Energy and power consumption are becoming critical metrics in the design and usage of high performance systems. We have extended the Performance API (PAPI) analysis library to measure and report energy and power values. These values are reported using the existing PAPI API, allowing code previously instrumented for performance counters to also measure power and energy. Higher level tools that build on PAPI will automatically gain support for power and energy readings when used with the newest version of PAPI. We describe in detail the types of energy and power readings available through PAPI. We support external power meters, as well as values provided internally by recent CPUs and GPUs. Measurements are provided directly to the instrumented process, allowing immediate code analysis in real time. We provide examples showing results that can be obtained with our infrastructure.

Index Terms: energy measurement; power measurement; performance analysis

I. INTRODUCTION

The Performance API (PAPI) [1] framework has traditionally provided low-level cross-platform access to the hardware performance counters available on most modern CPUs.
With the advent of component PAPI (PAPI-C) [2], PAPI has been extended to provide a wider variety of performance data from various sources. Recently a number of new components have been added that provide the ability to measure a system's energy and power usage.

Energy and power have become increasingly important components of overall system behavior in high-performance computing (HPC). Power and energy concerns were once primarily of interest to embedded developers. Now that HPC machines have hundreds of thousands of cores [3], the ability to reduce consumption by just a few Watts per CPU quickly adds up to major power, cooling, and monetary savings. There has been a lot of HPC interest in this area recently, including the Green500 [4] list of energy-efficient supercomputers.

PAPI's ability to be extended by components allows adding support for energy and power measurements without any changes needed to the core infrastructure. Existing code that is already instrumented for measuring performance counters can be re-used; the new power and energy events will show up in event listings just like other performance events, and can be measured with the same existing PAPI API. This will allow current users of PAPI on HPC systems to analyze power and energy with little additional effort.

There are many existing tools that provide access to power and energy measurements (often these come with the power measuring hardware). PAPI's advantage is that it allows measuring a diverse set of hardware with one common interface. Users only instrument their code once, and then can use it with minimal changes as their code is moved between different machines with different hardware. Without PAPI the instrumented code would have to be re-written depending on what power measurement hardware it is running on.

Another benefit of PAPI is that in addition to measuring energy and power, it also provides access to other values, such as CPU performance counters, GPU counters, network, and I/O. All of these can be measured at the same time, providing for a richer analysis environment. Many of the other advanced PAPI features, such as sampling and profiling, can potentially be used in conjunction with these new power and energy events.

Higher-level tools that build on top of PAPI (such as TAU [5], HPCToolkit [6], or Vampir [7]) automatically get support for these new measurements as soon as they are paired with an updated PAPI version.

We will describe in detail the various types of power and energy measurements that will be available in the PAPI 5.0 release, as well as showing examples of the data that can be gathered.

II. RELATED WORK

There are various existing tools that provide access to power and energy values. In general these tools do not have a cross-platform API like PAPI, nor are they deployed as widely. PAPI has the benefit of allowing energy measurements at the same time as CPU and other performance counter measurements, allowing analysis of low-level energy behavior at the source code level. PAPI can also act as an abstraction library, so most of the tools listed below could be given PAPI component interfaces.

The tool that provides the most similar functionality to PAPI is the Intel Energy Checker SDK [8]. It provides an API for instrumenting code and gathering energy information from a variety of external power meters and system counters. It provides support for various operating systems, but is limited to Intel architectures.

PowerPack [9] provides an interface for measuring power from a variety of external power sources. The API provides routines for starting and stopping the gathering of data on the remote machine. Unlike PAPI, the measurements are gathered out-of-band (on a separate machine) and thus cannot be directly provided to the running process in real time.

Appeared in the 2012 PASA Workshop.

IBM PowerExecutive [10] allows monitoring power and energy on IBM blade servers. As with PowerPack, the data is gathered and analyzed by a tool (in this case IBM Director) running on a separate machine.

Shin et al. [11] construct a power board for an ARM system that estimates power and communicates with a front-end tool via PCI. Various tools are described that use the gathered information, but there is not a generic API for accessing it.

The Linux Energy Attribution and Accounting Platform (LEA2P) [12] acquires data on a system with hardware custom-modified to provide power readings via a data acquisition board. These values are passed into the Linux kernel and made available via the /proc filesystem and can be read in-band.

PowerScope [13] uses a digital multimeter to perform off-line analysis using statistical sampling. It provides a kernel-level interface (via system calls) to start and stop measurements; this requires modifying the operating system. The benefit of this system is that power information is kept in the process table, allowing one to map energy usage in a detailed per-process way.

The Energy Endoscope [14] is an embedded wireless sensor network that provides detailed real-time energy measurements via a custom-designed helper chip. The Linux kernel is modified to report energy in /proc/stat along with other processor stats.

Isci and Martonosi [15] combine external power meter measurements with performance counter results to generate power readings with a modeled CPU. The readings are gathered on an external machine.

Bellosa [16] proposes Joule Watcher, an infrastructure that uses hardware performance counters to estimate power and provide this information to the kernel for scheduling decisions. He proposes a generic API to provide this information to users.

III. BACKGROUND

PAPI users have recently become more concerned with energy and power measurements. Part of this is due to the addition of embedded system support (including ARM and MIPS processors) and part is from the current interest in energy efficiency in PAPI's traditional HPC environment.

With PAPI-C (component PAPI) it is straightforward to add extra PAPI "components" that report values outside of the usual hardware performance counters that were long the mainstay of PAPI. The PAPI API returns unsigned 64-bit integers; as long as a power or energy value can fit that constraint no changes at all need to be made to existing PAPI code.

A. New PAPI Interfaces

The existing PAPI interface is sufficient for providing power and energy values, but the recent PAPI 5.0 release adds many features that improve the collection of this information.

The most important new feature is enhanced event information support. The user can query an event and obtain far richer details than were available previously. The new interface allows specifying units for a returned value, allowing a user to know if the values they are getting are in "Watts", "Joules" or perhaps even "nano-Joules" without having to look in the system documentation.

Another new feature is the ability to return values other than unsigned integers, including floating point. This allows returning power values in human-friendly amounts such as 96.45 Watts rather than 96450 milliwatts.

Additional event information is provided that will help external tools analyze the results, especially when trying to correlate power results with other measurements. PAPI now provides the frequency with which the value is updated and whether the value returned is instantaneous (like an average power reading) or cumulative (total energy).
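
To make the role of this metadata concrete, the following is a minimal Python sketch of how a tool might use per-event units and value type to present a reading. The `EventInfo` record and its field names are hypothetical stand-ins chosen for illustration; they are not the PAPI event information structure itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventInfo:
    """Hypothetical per-event metadata; field names are illustrative only."""
    name: str
    units: str         # e.g. "W", "mW", "J", "nJ"
    cumulative: bool   # True for running totals (energy), False for instantaneous
    update_hz: float   # how often the underlying source refreshes the value


# Scale factors from a reported unit to its base unit (Watts or Joules).
_SCALE = {"W": 1.0, "mW": 1e-3, "J": 1.0, "mJ": 1e-3, "uJ": 1e-6, "nJ": 1e-9}


def to_base_units(value, info):
    """Convert a raw reading to Watts or Joules using the declared units."""
    return value * _SCALE[info.units]


def describe(value, info):
    """Format a reading in human-friendly form, e.g. 96.45 W."""
    base = "W" if info.units.endswith("W") else "J"
    kind = "cumulative" if info.cumulative else "instantaneous"
    return f"{info.name}: {to_base_units(value, info):.2f} {base} ({kind})"
```

With such metadata a tool can print 96450 milliwatts as 96.45 W, and can tell that a cumulative energy value must be differenced over time before it is compared with a power reading.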

B. Limitations

There are some limitations when measuring power and energy using PAPI. Typically these readings are system-wide: it is not possible to map the results exactly to the user's code, especially on multi-core systems.

Often a user is interested in knowing where the power usage comes from: power supply inefficiencies, the CPU, network card, memory, etc. With external power meters it is not possible to break down the full-system power measurements into per-component values. Since power optimization for various hardware components requires different strategies, having only total system power might not provide enough information to allow optimization.

Ideally one could correlate power and energy with CPU and other PAPI measurements. This can be done; values can be measured at the same time (although in separate event sets). However, due to the nature of the measurements it is hard to get an exact correlation.

Another issue is that of measurement overhead. Since PAPI has to run on the system gathering the results, it contributes to the overall power budget of the system. Tools that measure power externally do not have this problem.

IV. PAPI ENERGY AND POWER COMPONENTS

The new PAPI 5.0 release adds support for various power and energy components.

PAPI components measure power and energy in-band: a program is instrumented with PAPI calls and can read measurement data into the running process. The data can be stored to disk for later offline analysis, but by default it is available for immediate action. This contrasts with other tools that only support out-of-band measurements: they can only analyze code at a later time, and the program being profiled is not aware of its current power or energy status.

We use linear algebra routines that perform one-sided factorization of dense matrices to compare various methods of measuring energy. In particular, we test Cholesky factorization from PLASMA [17] on the processor side and LU factorization on the GPU using MAGMA [18]. Both of these are computationally bound and thus show variable power draw by the computing device: either CPU or GPU. Our tests also show memory effects by including memory bound operations such as filling the matrices with initial values.

[Figure 1: power (Watts) versus time (seconds), with separate traces for CPU, memory, motherboard, and fan.]
Fig. 1. PLASMA Cholesky power usage gathered by PowerPack (not PAPI). Results were gathered out-of-band; PAPI can gather similar data in-band.

For comparison purposes, Figure 1 shows PLASMA Cholesky results gathered with PowerPack [9] (not PAPI) on a machine custom-wired for power measurement. Results are gathered on a separate machine (which has the advantage of not including the overhead of the measurement in the power readings). We show that PAPI can generate similar results from a variety of power measurement devices.

A. External Measurement

The most common type of power measurement infrastructure is one where an external power meter is used. For PAPI to access the data, the values have to be passed back to the machine being measured. This is usually done via a serial or USB connection. The easiest type of equipment to use in this case is one where a power pass-through is used; this device looks like a power strip, and allows measuring the power consumption of anything plugged into the device. More intrusive full-system instrumentation can be done, where wires are hooked into power supplies, disks, processor sockets, and DIMM sockets. This enables fine-grained power measurement but usually requires extensive installation costs.

1) Watt's Up Pro Power Meter: The Watt's Up Pro power meter is an external measurement device that a system plugs into instead of a wall outlet; it provides various measurements via a USB serial connection. The metrics collected include average power, voltage, current, and various others. Energy can be derived based on the average power and time. The results are system-wide and low resolution, with updates only once a second.
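
As an illustration of deriving energy from average power and time, the sketch below sums per-interval average power readings into Joules. It assumes each reading is the average over one fixed-length interval (one second for this meter); the function name and the sample values in the usage note are illustrative, not measured data.

```python
def energy_from_average_power(readings_watts, interval_s=1.0):
    """Energy in Joules from a sequence of per-interval average power readings.

    Each reading is assumed to be the average power over one interval of
    length `interval_s`, so energy is the sum of (power * interval).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return sum(readings_watts) * interval_s


def mean_power(readings_watts):
    """Mean power in Watts over the whole run."""
    readings = list(readings_watts)
    if not readings:
        raise ValueError("need at least one reading")
    return sum(readings) / len(readings)
```

For example, four one-second readings of 40, 50, 50, and 45 Watts give 185 Joules over four seconds, or a mean power of 46.25 Watts.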

Writing a PAPI driver for this device is nontrivial, as the results become available every second whether requested or not. Data can be lost if the on-board logging memory is full and a read does not happen within the one-second time window. Since PAPI users cannot be expected to have their code interrupt itself once a second to measure data, the PAPI component forks a helper thread that reads the data on a regular basis, and then returns overall values when an instrumented program requests it.
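
The helper-thread pattern described above can be sketched as follows. This is a simplified Python illustration, not the PAPI component's code: `read_power` is a hypothetical callable standing in for a read from the meter's serial stream, and the class name is invented for the example.

```python
import threading


class BackgroundMeter:
    """Poll a meter on a helper thread and keep running totals.

    `read_power` is a hypothetical callable returning the latest average
    power in Watts; a real component would parse the device's serial data.
    """

    def __init__(self, read_power, interval_s=1.0):
        self._read_power = read_power
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._energy_j = 0.0
        self._samples = 0
        self._thread = threading.Thread(target=self._run, daemon=True)

    def poll_once(self):
        """Take one reading and fold it into the totals."""
        watts = self._read_power()
        with self._lock:
            self._energy_j += watts * self._interval_s
            self._samples += 1

    def _run(self):
        # Event.wait doubles as an interruptible sleep between readings.
        while not self._stop.wait(self._interval_s):
            self.poll_once()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def totals(self):
        """Return (energy in Joules, number of samples) gathered so far."""
        with self._lock:
            return self._energy_j, self._samples
```

The instrumented program only calls `start()`, `totals()`, and `stop()`; the polling loop keeps draining the device in the background so that readings are not dropped between requests.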

Some data gathered from a Watt's Up Pro device are shown in Figure 2. The results are coarse due to the one-second sampling interval of the device. This can be good enough for doing validation and global investigations, but is probably not detailed enough when tuning code for energy efficiency. However, the general trends in power consumption for the code in question (Cholesky factorization from PLASMA [17]) are similar to the much finer-grain graph in Figure 1. In Figure 2 the initial spike in power consumption to about 50W (two seconds into the run) represents data generation (creation of a random matrix) and corresponds to a flat ledge at about 130W in Figure 1. Four seconds into the run, both figures indicate a fluctuation around the maximum power level for the whole run. The fluctuations are much more accurately portrayed in Figure 1, indicating the need for granularity substantially finer than the 1 second available from the Watt's Up Pro device.

2) PowerMon 2: The PowerMon 2 [19] card sits between a system's power supply and its various components. It measures voltage and current on 8 different lines, monitoring most of the power going into the computer. Measurements happen at a frequency of up to 3kHz; this is multiplexed across a user-selected subset of the 8 channels. We are working on a PAPI component for this device, but support is currently not available. We foresee using this device to provide energy results at a detail not available with other external power meters.

B. Internal Measurement

Recent computer hardware includes support for measuring energy and power consumption internally. This allows fine-grained power analysis without having to custom-instrument the hardware. Access to the measurements usually requires direct low-level hardware reads, although sometimes the operating system or a library will do this for you.

[Figure 2: average power (Watts) versus time (seconds); PLASMA Cholesky factorization, N=10,000, threads=2.]
Fig. 2. PLASMA Cholesky power gathered with a Watt's Up Pro device on an Intel Core2 laptop. Coarse results due to one-second sampling frequency.

1) Intel RAPL: Recent Intel Sandy Bridge chips include the "Running Average Power Limit" (RAPL) interface, which is described in the Intel Software Developer's Manual [20]. RAPL's overall design goal is to provide an infrastructure for keeping processors inside of a given user-specified power envelope. The internal circuitry can estimate current energy usage based on a model driven by hardware counters, temperature, and leakage models. The results of this model are available to the user via a model specific register (MSR), with an update frequency on the order of milliseconds. The power model has been validated by Intel [21] to closely follow actual energy being used. PAPI provides access to the values returned by the power model.

Accessing MSRs requires ring-0 access to the hardware; typically only the operating system kernel can do this. This means accessing the RAPL values requires a kernel driver. Currently Linux does not provide such a driver; one has been proposed [22] but it is unlikely it will be merged into the main kernel tree anytime soon. To get around this problem, we use the Linux "MSR driver" that exports MSR access to userspace via a special device driver. If the MSR driver is enabled and given proper read-only permissions then PAPI can access these registers directly without needing a dedicated RAPL kernel driver.

There are some limitations to accessing RAPL this way. The results are system-wide values and cannot easily be attributed to individual threads. This is no worse than measurements of any shared resource; on modern Intel chips last level caches and the uncore events share this limitation.

RAPL reports various energy readings. These include the energy usage for the total processor package and the total combined energy used by all the cores (referred to as Power-Plane 0 (PP0)). PP0 also includes all of the processor caches. Some versions of Sandy Bridge chips also report power usage by the on-board GPU (Power-Plane 1 (PP1)). Sandy Bridge EP chips do not support the GPU measurement, but instead report energy readings for the DRAM interface.
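
The following Python sketch shows one way raw RAPL readings obtained through the Linux MSR driver could be turned into Joules and average Watts. It is an illustration under stated assumptions rather than the PAPI component's code: the register addresses and bit layout follow the Intel Software Developer's Manual, the helper names are invented for this example, and `read_msr` requires the `msr` kernel module and read permission on `/dev/cpu/*/msr`.

```python
import os
import struct

MSR_RAPL_POWER_UNIT = 0x606    # unit definitions for RAPL counters
MSR_PKG_ENERGY_STATUS = 0x611  # package energy counter (low 32 bits)


def read_msr(cpu, reg):
    """Read a 64-bit MSR through /dev/cpu/<cpu>/msr (needs read permission)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def energy_unit_joules(power_unit_msr):
    """Bits 12:8 hold the energy status unit: one count is 1 / 2**ESU Joules."""
    esu = (power_unit_msr >> 8) & 0x1F
    return 1.0 / (1 << esu)


def energy_delta_joules(before, after, unit_j):
    """Energy between two raw readings, allowing for one 32-bit wraparound."""
    counts = ((after & 0xFFFFFFFF) - (before & 0xFFFFFFFF)) % (1 << 32)
    return counts * unit_j


def average_power_watts(before, after, unit_j, elapsed_s):
    """Average power over the interval between two energy readings."""
    return energy_delta_joules(before, after, unit_j) / elapsed_s
```

Because the energy counters are cumulative and only 32 bits wide, a monitoring process has to sample them often enough that at most one wraparound occurs between reads; the average power plotted over time is then the energy difference divided by the sampling interval.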

While the RAPL values can be measured in-band and consumed by the program, since RAPL is system-wide a separate process may be used to measure energy and power. In this way the running code does not need to be instrumented and some of the PAPI overhead can be avoided. We use this method to gather the results presented.

We take measurements on a Sandy Bridge EP machine. It has 2 CPU packages, each with 8 cores, and each core with 2 threads. Figure 3 shows some average power measurements gathered while doing Cholesky factorization using the PLASMA library. Notice that the energy usage by each package varies, despite all of the cores doing similar work. Part of this is likely due to variations in the cores at the silicon level, as noticed by Rountree et al. [23]. Figure 4 shows the same measurements using the Intel MKL library [24].

Figure 5 shows some energy measurements comparing the same Cholesky factorization using both PLASMA and Intel MKL on the same hardware. The PAPI results show that for this case, PLASMA uses energy more quickly, but finishes faster and uses less total energy for the calculation.

2) AMD Application Power Management: Recent AMD Family 15h processors can report "Current Power In Watts" [25] via the "Processor Power in TDP" MSR. We are investigating PAPI support for this and hope to deploy a component similar in nature and scope to the Intel RAPL component.

[Figure 3: average power (Watts) versus time (seconds); PLASMA Cholesky factorization, N=30,000, threads=16; traces for DRAM, PP0, and total on Package 0 and Package 1.]
Fig. 3. PLASMA Cholesky power usage measured with RAPL on Sandy Bridge EP. Power Plane 0 (PP0) is total usage for all 8 cores in a package.

[Figure 4: average power (Watts) versus time (seconds); MKL Cholesky factorization, N=30,000, threads=16; traces for DRAM, PP0, and total on Package 0 and Package 1.]
Fig. 4. Intel MKL Cholesky power usage measured with RAPL on Sandy Bridge. Power Plane 0 (PP0) is total usage for all 8 cores in a package.

[Figure 5: total energy (Joules) versus time (seconds); Cholesky factorization, N=30,000, threads=16; traces for PLASMA and MKL on Package 0 and Package 1.]
Fig. 5. Energy usage of two different implementations (PLASMA and MKL) of Cholesky on Sandy Bridge EP measured with RAPL.

[Figure 6: average power (Watts) versus time (seconds).]
Fig. 6. MAGMA LU with size 10,000 power measurement on an NVIDIA Fermi C2075, gathered with NVML.

3) NVIDIA Management Library: Recent NVIDIA GPUs can report power usage via the NVIDIA Management Library (NVML) [26]. The nvmlDeviceGetPowerUsage() routine exports the current power; on Fermi C2075 GPUs it has milliwatt resolution, is accurate to within ±5W, and is updated at roughly 60Hz. The power reported is that for the entire board, including GPU and memory.

Gathering detailed performance information from a GPU is difficult: once you dispatch code to a GPU the running CPU has no control over it until the GPU returns upon completion. This means that it is not generally possible to attribute what GPU code corresponds to what power readings. NVIDIA provides a high-level utility called nvidia-smi which can be used to measure power, but its sampling interval is too long to obtain useful measurements. In order to provide better power measurements we have constructed an NVML component [27] for PAPI and have validated the results using a "Kill-A-Watt" power meter.

Figure 6 shows data gathered on an NVIDIA Fermi C2075 card running a MAGMA [28] kernel using the LU algorithm [29] with a matrix size of 10k.

The MAGMA LU factorization is a compute bound algorithm (expressed in terms of GEMMs); it uses a hybridization methodology to split the computation between the CPU host and GPU. The split aims to match LU's algorithmic requirements to the architectural strengths of the GPU and the CPU. In the case of LU, this translates into having all matrix-matrix (GEMM) multiplication done on the GPU, and the panel factorizations on the CPU. The design of the algorithm allows for big enough matrices to totally overlap the CPU work with the large matrix-matrix multiplications on the GPU. As a result, the MAGMA LU algorithm runs at the speed of performing GEMMs on the GPU. Our experiments have shown that MAGMA GEMM operations fully utilize the GPU, maximizing its power consumption. This explains why the hybrid LU factorization also maximizes the GPU power consumption, which reduces the time taken so that the overall energy consumption is minimized.

C. Estimated Power

Various researchers have proposed using hardware performance counters to model energy and power consumption [15], [30], [31], [32], [33], [16], [34], [35], [36]. Goel et al. [36] have shown that power can be modeled to within 10% using just four hardware performance counters. Using the PAPI user-defined events infrastructure [37] an event can be created that derives an estimated power value from the hardware counters. This can be used to estimate power on systems that do not have hardware power measurement available.
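
As a rough illustration of a counter-derived power event, the sketch below evaluates a generic linear model: idle power plus a weighted sum of counter rates. This is an assumed model form chosen for simplicity, not the model of Goel et al. or of any other cited work; the counter names and coefficients are placeholders that would have to come from calibration against a real power meter.

```python
def estimate_power_watts(rates, weights, idle_watts):
    """Estimate power as idle power plus a weighted sum of counter rates.

    `rates` maps counter names to events per second; `weights` maps the same
    names to Watts per (event per second). All names and values are
    placeholders obtained, in practice, by calibrating against a power meter.
    """
    missing = set(weights) - set(rates)
    if missing:
        raise KeyError(f"missing counter rates: {sorted(missing)}")
    return idle_watts + sum(weights[name] * rates[name] for name in weights)
```

A derived event of this kind is only as good as its calibration, so the estimate should be validated against measured power on each target system before it is relied upon.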

V. CONCLUSION

The PAPI library can now provide transparent access to power and energy measurements via existing interfaces. Existing programs that already have instrumentation for PAPI for CPU performance measurements can quickly be adapted to measure power, and existing tools will gain access to the new power events with a simple PAPI upgrade.

With larger and larger clusters being built, energy consumption has become one of the defining constraints. PAPI has been continually extended to provide support for the most up-to-date performance measurements on modern systems. The addition of power and energy measurements allows PAPI users to stay on top of this increasingly important area in the always rapidly changing HPC environment.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 0910899 and the U.S. Department of Energy Office of Science under contract DE-FC02-06ER25761.

REFERENCES

[1] S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci, "A portable programming interface for performance evaluation on modern processors," International Journal of High Performance Computing Applications, vol. 14, no. 3, pp. 189–204, 2000.
[2] D. Terpstra, H. Jagode, H. You, and J. Dongarra, "Collecting performance data with PAPI-C," in 3rd Parallel Tools Workshop, 2009, pp. 157–173.
[3] "Top 500 supercomputing sites," http://www.top500.org/.
[4] "Top green500 list :: Environmentally responsible supercomputing," http://www.green500.org/.
[5] S. Shende and A. Malony, "The Tau parallel performance system," International Journal of High Performance Computing Applications, vol. 20, no. 2, pp. 287–311, 2006.
[6] L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. Tallent, "HPCToolkit: Tools for performance analysis of optimized parallel programs," Concurrency and Computation: Practice and Experience, vol. 22, no. 6, pp. 685–701, 2010.
[7] W. Nagel, A. Arnold, M. Weber, H.-C. Hoppe, and K. Solchenbach, "VAMPIR: Visualization and analysis of MPI resources," Supercomputer, vol. 12, no. 1, pp. 69–80, 1996.
[8] Intel, Intel Energy Checker: Software Developer Kit User Guide, 2010.
[9] R. Ge, X. Feng, S. Song, H.-C. Chang, D. Li, and K. Cameron, "PowerPack: Energy profiling and analysis of high-performance systems and applications," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 6, May 2010.
[10] P. Popa, "Managing server energy consumption using IBM PowerExecutive," IBM Systems and Technology Group, Tech. Rep., 2006.
[11] D. Shin, H. Shim, Y. Joo, H.-S. Yun, J. Kim, and N. Chang, "Energy-monitoring tool for low-power embedded programs," IEEE Design & Test of Computers, vol. 19, no. 4, pp. 7–17, July/August 2002.
[12] S. Ryffel, "LEA2P: The Linux energy attribution and accounting platform," Master's thesis, Swiss Federal Institute of Technology, Jan. 2009.
[13] J. Flinn and M. Satyanarayanan, "PowerScope: a tool for profiling the energy usage of mobile applications," in Proc. of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, Feb. 1999, pp. 2–10.
[14] T. Stathopoulos, D. McIntire, and W. Kaiser, "The energy endoscope: Real-time detailed energy accounting for wireless sensor nodes," in Proc. of the International Conference on Information Processing in Sensor Networks, Apr. 2008, pp. 383–394.
[15] C. Isci and M. Martonosi, "Runtime power monitoring in high-end processors: Methodology and empirical data," in Proc. IEEE/ACM 36th Annual International Symposium on Microarchitecture, Dec. 2003.
[16] F. Bellosa, "The benefits of event-driven energy accounting in power-sensitive systems," in Proceedings of the 9th workshop on ACM SIGOPS European workshop, 2000.
[17] PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures, Version 2.3, University of Tennessee Knoxville, Nov. 2010.
[18] S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, "Dense linear algebra solvers for multicore with GPU accelerators," in Proc. 24th IEEE/ACM International Parallel and Distributed Processing Symposium, Apr. 2010.
[19] D. Bedard, R. Fowler, M. Linn, and A. Porterfield, "PowerMon 2: Fine-grained, integrated power measurement," Renaissance Computing Institute, Tech. Rep. TR-09-04, 2009.
[20] Intel, Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide, 2009.
[21] E. Rotem, A. Naveh, D. Rajwan, A. Ananthakrishnan, and E. Weissmann, "Power-management architecture of the Intel microarchitecture code-named Sandy Bridge," IEEE Micro, vol. 32, no. 2, pp. 20–27, 2012.
[22] Z. Rui. (2011, May) [patch 2/3] introduce intel_rapl driver. linux-kernel mailing list. [Online]. Available: http://thread.gmane.org/gmane.linux.kernel/1145973
[23] B. Rountree, D. Ahn, B. de Supinski, D. Lowenthal, and M. Schulz, "Beyond DVFS: A first look at performance under a hardware-enforced power bound," in Proc. of 8th Workshop on High-Performance, Power-Aware Computing, May 2012.
[24] Intel, Intel Math Kernel Library (MKL), http://www.intel.com/software/products/mkl/.
[25] AMD, AMD Family 15h Processor BIOS and Kernel Developer Guide, 2011.
[26] NVML Reference Manual, NVIDIA, 2012.
[27] K. Kasichayanula, "Power aware computing on GPUs," Master's thesis, University of Tennessee, Knoxville, May 2012.
[28] E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, S. Thibault, and S. Tomov, "Faster, cheaper, better - a hybridization methodology to develop linear algebra software for GPUs," LAPACK Working Note 230.
[29] S. Yamazaki, S. Tomov, and J. Dongarra, "One-sided dense matrix factorizations on a multicore with multiple GPU accelerators," in Proc. of the 2012 International Conference on Computational Science, Jun. 2012.
[30] K. Singh, M. Bhadauria, and S. McKee, "Real time power estimation of multi-cores via performance counters," Proc. Workshop on Design, Architecture and Simulation of Chip Multi-Processors, Nov. 2008.
[31] I. Kadayif, T. Chinoda, M. Kandemir, N. Vijaykrishnan, M. Irwin, and A. Sivasubramaniam, "vEC: virtual energy counters," in Proc. of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, Jun. 2001.
[32] V. Tiwari, S. Malik, and A. Wolfe, "Power analysis of embedded software: a first step towards software power minimization," IEEE Transactions on VLSI, vol. 3, no. 4, pp. 437–445, 1994.
[33] J. Russell and M. Jacome, "Software power estimation and optimization for high performance, 32-bit embedded processors," in Proc. IEEE International Conference on Computer Design, Oct. 1998, pp. 328–333.
[34] R. Joseph and M. Martonosi, "Run-time power estimation in high-performance microprocessors," in Proc. IEEE/ACM International Symposium on Low Power Electronics and Design, Aug. 2001, pp. 135–140.
[35] J. Haid, G. Kaefer, C. Steger, and R. Weiss, "Run-time energy estimation in system-on-a-chip designs," in Proc. of the Asia and South Pacific Design Automation Conference, Jan. 2003, pp. 595–599.
[36] B. Goel, S. McKee, R. Gioiosa, K. Singh, M. Bhadauria, and M. Cesati, "Portable, scalable, per-core power estimation for intelligent resource management," in First International Green Computing Conference, Aug. 2010.
[37] S. Moore and J. Ralph, "User-defined events for hardware performance monitoring," in Proc. 11th Workshop on Tools for Program Development and Analysis in Computational Science, Jun. 2011.
