GPU Technology Conference, May 14-17, 2012
McEnery Convention Center, San Jose, California
www.gputechconf.com

Sessions on Computational Physics (subject to change)

IMPORTANT: Visit http://www.gputechconf.com/page/sessions.html for the most up-to-date schedule.

S0268 - Virtual Process Engineering - Realtime Simulation of Multiphase Systems
Wei Ge (Institute of Process Engineering, Chinese Academy of Sciences)
Day: Tuesday, 05/15 | Time: 9:00 am - 9:50 am
Topic Areas: Computational Fluid Dynamics; Molecular Dynamics; Computational Physics; Algorithms & Numerical Techniques
Session Level: Advanced

Realtime simulation and virtual reality with quantitatively correct physics for industrial processes involving multi-scale and multiphase systems was once a remote dream for process engineering, but it is now becoming reality with CPU-GPU hybrid supercomputing. Numerical and visualization methods for such simulations on thousands of GPUs will be reported, with applications in the chemical and energy industries.

S0258 - Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python
Michal Januszewski (University of Silesia in Katowice; Google Switzerland)
Day: Tuesday, 05/15 | Time: 9:30 am - 9:55 am
Topic Areas: Computational Fluid Dynamics; Computational Physics; Development Tools & Libraries
Session Level: Intermediate

Learn how Run-Time Code Generation (RTCG) techniques allowed for fast development of a lattice Boltzmann (LB) fluid dynamics solver called Sailfish. Sailfish is completely open source, supports a wide variety of LB models (single and multiple relaxation times, the entropic model; single and binary fluids) and can take advantage of multiple GPUs. Even though the project is written predominantly in Python, no performance compromises are made. This talk will introduce the basic design principles of Sailfish and illustrate how RTCG makes it possible to exploit the power of GPUs with minimal programmer effort.

S0031 - Unstructured Grid Numbering Schemes for GPU Coalescing Requirements
Andrew Corrigan (Naval Research Laboratory), Johann Dahm (University of Michigan)
Day: Tuesday, 05/15 | Time: 10:00 am - 10:25 am
Topic Areas: Computational Fluid Dynamics; Algorithms & Numerical Techniques; Computational Physics
Session Level: Advanced

Learn how to achieve high performance for computational fluid dynamics (CFD) solvers over unstructured grids using numbering schemes tailored for GPU coalescing requirements. Using these techniques, unstructured grid CFD solvers can make more effective use of memory bandwidth, which is an otherwise significant performance bottleneck that has so far led to relatively limited performance gains on GPUs in comparison to structured grid CFD solvers. Performance benchmarks will be shown using the Jet Engine Noise Reduction (JENRE) code.

S0321 - GPU-Based Monte Carlo Ray Tracing Simulation for Solar Power Plants
Claus Nilsson (Tietronix Software, Inc.), Michel Izygon (Tietronix Software, Inc.)
Day: Tuesday, 05/15 | Time: 2:00 pm - 2:25 pm
Topic Areas: Energy Exploration; Computational Physics; Ray Tracing
Session Level: Beginner

Learn about realtime simulations of Concentrating Thermal Solar Power using GPU technology to enable performance optimization of these utility-scale plants. By leveraging the power of GPUs and the parallel nature of a field of thousands of sun-tracking mirrors, we have succeeded in cutting the computation time by orders of magnitude compared with the minutes to hours of runtime previously required. We will present an overview of the problem domain and describe how we used the GPU to derive a Monte Carlo physics ray tracing method to simulate the flux reflected by the mirrors onto the solar receiver.

S0046 - Application of the GPU to a Two-Part Computational Electromagnetic Algorithm
Eric Dunn (SAIC)
Day: Tuesday, 05/15 | Time: 2:30 pm - 2:55 pm
Topic Areas: Computational Physics; Algorithms & Numerical Techniques; Ray Tracing
Session Level: Beginner

The shooting and bouncing ray (SBR) method is one way to simulate electromagnetic field radiation. As with all methods, there are certain problems where it does not yield accurate results. In this presentation, we will explain one such case, which consists of an antenna resonating between two metal plates. We will discuss how we used the graphics processing unit (GPU) to separate the problem into two parts. Each part is simulated individually with SBR, producing an improved result. Such a GPU-accelerated, two-part approach can be applied to other, more general hybrid simulations.

S0379 - GPU-based High-Performance Simulations for Spintronics
Jan Jacob (University of Hamburg - Institute of Applied Physics and Microstructure Research Center)
Day: Tuesday, 05/15 | Time: 2:30 pm - 2:55 pm
Topic Areas: General Interest; Computational Physics; Application Design & Porting Techniques
Session Level: Intermediate

The joint utilization of the electron's charge and spin in "spintronics" represents a promising technology for data processing and storage in nanostructures. Complex quantum effects in these devices, such as the spin-Hall effect, require demanding numerical simulations that provide a convenient link between idealized analytical models and the often very complex results of measurements. The simulations involve multiplications and inversions of large matrices. They are therefore an ideal showcase for the performance gained by running the algebraic routines on GPGPUs, in computing environments where algorithms are executed across multiple nodes with multiple GPGPUs and CPU cores.

S0036 - Multiparticle Collision Dynamics on GPUs
Elmar Westphal (Forschungszentrum Juelich)
Day: Tuesday, 05/15 | Time: 3:00 pm - 3:50 pm
Topic Areas: Computational Physics; Computational Fluid Dynamics; Molecular Dynamics
Session Level: Intermediate

See how we employ GPUs to simulate the interaction of millions of solvent and solute particles of a fluid system. Often the domain of large cluster systems, the most time-consuming part of our simulations can now be done on desktop PCs in reasonable time. This contribution shows how GPUs can effectively be used to accelerate existing programs and how techniques like streaming and increased data locality significantly enhance calculation throughput. It also shows how a GPU-optimized program structure yields usually expensive additional functionality "almost free". Furthermore, a well-scaling single-node/multi-GPU implementation of the program is presented.

S0067 - PIConGPU - Bringing large-scale Laser Plasma Simulations to GPU Supercomputing
Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf), Guido Juckeland (Center for Information Services and High Performance Computing, Technical University Dresden)
Day: Tuesday, 05/15 | Time: 3:00 pm - 3:50 pm
Topic Areas: Computational Physics; Algorithms & Numerical Techniques; Application Design & Porting Techniques; Supercomputing
Session Level: Advanced

With powerful lasers breaking the Petawatt barrier, applications for laser-accelerated particle beams are gaining more interest than ever. Ion beams accelerated by intense laser pulses foster new ways of treating cancer and make them available to more people than ever before. Laser-generated electron beams can drive new compact x-ray sources to create snapshots of ultrafast processes in materials. With PIConGPU, laser-driven particle acceleration can be computed in hours compared to weeks on standard CPU clusters. We present the techniques behind PIConGPU, detailed performance analysis and the benefits of PIConGPU for real-world physics cases.

S0221 - 1024 Bit Parallel Rational Arithmetic Operators for the GPU
Robert Zigon (Beckman Coulter)
Day: Tuesday, 05/15 | Time: 4:00 pm - 4:50 pm
Topic Areas: Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

Learn how to create a set of rational arithmetic operators that manipulate 1024 bit operands on a Tesla C2050. These operators are used to create a numerically stable implementation for Bessel functions. Naive implementations of the Bessel functions produce unreliable results when they are used to solve Maxwell's equations by way of Mie theory. Maxwell's equations are used to model the scattering of light by small particles. Light scatter is used in Particle Characterization to measure the quality of materials like cocoa, cement and pharmaceuticals.

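To see why exact rational arithmetic can stabilize a Bessel function evaluation, consider the sketch below. It is not the speaker's 1024-bit GPU operators: it uses Python's arbitrary-precision `fractions.Fraction` on the CPU, and the function name `bessel_j0_series` and the choice of the power series for J0 are assumptions made only for illustration. The series J0(x) = sum over k of (-1)^k (x/2)^(2k) / (k!)^2 has alternating terms that grow very large before they decay when x is large, so a floating-point sum loses most of its digits to cancellation, while a rational sum is exact up to truncation of the series.

```python
from fractions import Fraction


def bessel_j0_series(x, terms=120):
    """Partial sum of J0(x) = sum_k (-1)^k (x/2)^(2k) / (k!)^2.

    Accepts a float or a Fraction. With a Fraction every operation is
    exact, so the only error is the truncation of the series.
    """
    q = (x / 2) ** 2      # (x/2)^2, same numeric type as x
    total = q * 0         # zero of that type
    term = q * 0 + 1      # k = 0 term of the series
    for k in range(terms):
        total += term
        # Ratio of consecutive terms: -(x/2)^2 / (k+1)^2
        term = -term * q / ((k + 1) ** 2)
    return total


if __name__ == "__main__":
    exact = bessel_j0_series(Fraction(40))   # exact rational partial sum
    naive = bessel_j0_series(40.0)           # same algorithm in double precision
    print("rational:", float(exact))
    print("float   :", naive)
```

At x = 40 the largest terms are of order 10^15 while |J0(40)| is below 0.01, which is why the double-precision sum is unreliable even though both calls run the same recurrence.
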
S0245 - Porting Legacy Plasma Codes to GPU
Peng Wang (NVIDIA)
Day: Tuesday, 05/15 | Time: 4:00 pm - 4:25 pm
Topic Areas: Computational Physics
Session Level: Intermediate

Learn how to port legacy Fortran plasma codes to GPU. Many legacy plasma codes are written in Fortran and have many lines of code. We will discuss techniques for porting such legacy codes easily and efficiently to CUDA C/C++. Performance analysis of major algorithmic patterns in plasma codes will be discussed. The discussion will use the GTC and GeFi plasma codes as realistic examples.

S0058 - Advancing GPU Molecular Dynamics: Rigid Bodies in HOOMD-blue
Joshua Anderson (University of Michigan), Trung Dac Nguyen (University of Michigan)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:50 am
Topic Areas: Molecular Dynamics; Computational Physics
Session Level: Intermediate

Learn how rigid body dynamics are implemented in HOOMD-blue. Previous releases were capable of executing classical molecular dynamics, where free particles interact via smooth potentials and their motion through time is computed using Newton's laws. The latest version allows particles to be grouped into bodies that move as rigid units. Users can now simulate materials made of cubes, rods, bent rods, jacks, plates, patchy particles, buckyballs, or any other arbitrary shapes. This talk covers how these algorithms are implemented on the GPU and tuned to perform well for bodies of any size, and discusses several use cases relevant to research.

S0125 - Memory Efficient Reverse Time Migration in 3D
Chris Leader (Stanford Exploration Project)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:25 am
Topic Areas: Energy Exploration; Computational Physics
Session Level: Intermediate

Learn how we can image the interior of the Earth in three dimensions using Reverse Time Migration. We discuss how GPUs accelerate this method using parallel wave propagation kernels, texture memories and minimal device-to-host transfers. Further, we discuss how the progression to 3D presents a multitude of new problems, particularly memory-based ones, which cause the system to be I/O limited. By manipulating boundary positions and values to a pseudo-random form, we show how many of these memory restrictions can be diminished and how detailed subsurface images can be fully constructed using GPUs.

S0236 - Advanced Optimization Techniques On a CUDA Implementation of Conjugate Gradient Solvers
Eri Rubin (OptiTex)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:25 am
Topic Areas: Algorithms & Numerical Techniques; Computational Physics; Application Design & Porting Techniques
Session Level: Intermediate

Linear systems are at the heart of a lot of compute problems. For large sparse systems there are two distinct approaches: direct and iterative solvers. After many years of researching and testing both approaches on CPU and GPU, we have implemented a highly efficient CG solver on the GPU using a combination of unique techniques. In this talk we will go over these techniques and the improved performance they bring.

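For readers unfamiliar with the iterative approach named above, the following is a minimal, unoptimized sketch of the textbook conjugate gradient (CG) iteration in pure Python. It does not reproduce the speaker's GPU techniques; the function names and the 1D Poisson test matrix are assumptions chosen for illustration. The matrix is accessed only through a matrix-vector product, which is what makes CG attractive for large sparse systems.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def poisson_1d_matvec(v):
    """y = A v for the tridiagonal matrix with 2 on the diagonal, -1 beside it."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given only as matvec."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)                        # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                             # direction update factor
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    b = [1.0] * 50
    x = conjugate_gradient(poisson_1d_matvec, b)
    residual = max(abs(yi - bi) for yi, bi in zip(poisson_1d_matvec(x), b))
    print("max residual:", residual)
```

Each iteration costs one matrix-vector product plus a few vector updates and reductions, which are the operations a GPU implementation would need to make fast.
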
S0312 - GPU Implementation for Rapid Iterative Image Reconstruction in Nuclear Medicine
Jakub Pietrzak (University of Warsaw)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:25 am
Topic Areas: Medical Imaging & Visualization; Computational Physics; Computer Graphics
Session Level: Intermediate

GPU implementation can greatly accelerate iterative techniques of 3D image reconstruction in nuclear medicine imaging. Single Photon Emission Computed Tomography (SPECT) is a functional imaging modality widely used in clinical diagnosis. To obtain high quality images within reduced scanning times, high sensitivity collimators need to be used and their response function modeled in the reconstruction. This is in general very computationally intensive and unfeasible with CPU and algorithm implementations. Our software is able to perform the reconstruction of patient data within clinically acceptable times using relatively low cost and widely available hardware.

S0352 - GPU-Accelerated Parallel Computing for Simulation of Seismic Wave Propagation
Taro Okamoto (Department of Earth and Planetary Sciences, Tokyo Institute of Technology)
Day: Wednesday, 05/16 | Time: 10:30 am - 10:55 am
Topic Areas: Computational Physics; General Interest
Session Level: Advanced

We adopted GPUs to accelerate large-scale, parallel finite-difference (FDTD) simulation of seismic wave propagation. Effective parallel implementation is needed because the memory of a single GPU is too small for real applications. We therefore describe the memory optimization, the three-dimensional domain decomposition, and the overlapping of communication and computation adopted in our program. So far we have achieved a high (single-precision) performance of about 61 TFlops using 1200 GPUs of TSUBAME-2.0, the GPU supercomputer at the Tokyo Institute of Technology, Japan. As an important application, we show the results of the simulation of the 2011 Tohoku-Oki mega-quake.

S0269 - Accelerating 3D-RISM Calculations using GPUs
Yutaka Maruyama (Institute for Molecular Science), Fumio Hirata (Institute for Molecular Science)
Day: Wednesday, 05/16 | Time: 3:00 pm - 3:25 pm
Topic Areas: Life Sciences; Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

The three-dimensional reference interaction site model (3D-RISM) theory is a powerful tool for investigating biomolecular processes in solution. Unfortunately, 3D-RISM calculations are often both memory-intensive and time-consuming. We sought to accelerate these calculations using GPUs. To work around the problem of limited memory size in GPUs, we modified the less memory-intensive Anderson method for faster convergence of 3D-RISM calculations. Using this method on a C2070, we reduced the computational time by a factor of eight compared to an Intel Xeon (8 cores, 3.33 GHz) with the conventional method.

S0055 - Particle Dynamics with MBD and FEA using CUDA
Graham Sanborn (FunctionBay)
Day: Wednesday, 05/16 | Time: 4:00 pm - 4:25 pm
Topic Areas: Computational Structural Mechanics; Computational Physics; Computational Fluid Dynamics
Session Level: Intermediate

Systems of many spherical particles are solved with the Discrete Element Method (DEM) and simulated with GPU technology. A fast algorithm is applied to calculate Hertzian contact forces between many spherical particles (from 100,000 to 1,000,000), and NVIDIA's CUDA is used to accelerate the calculation. The particles, together with MBD and FEA entities, are simulated within the commercial software RecurDyn. Several models are built and simulated: a forklift with sand, oil in an oil tank, an oil-filled engine system, and a water-filled washing machine. All models are simulated on NVIDIA GPUs and the results are shown.

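As background for the Hertzian contact forces mentioned above, the sketch below evaluates the classical Hertz normal force between two elastic spheres, F = (4/3) E* sqrt(R*) d^(3/2), where d is the overlap, 1/R* = 1/R1 + 1/R2 and 1/E* = (1 - nu1^2)/E1 + (1 - nu2^2)/E2. It is a per-pair CPU illustration only; the function names are hypothetical, and it says nothing about how RecurDyn or the speaker's fast algorithm finds contacting pairs or handles damping and friction.

```python
import math


def contact_overlap(center1, center2, r1, r2):
    """Overlap depth of two spheres; positive when they are in contact."""
    return r1 + r2 - math.dist(center1, center2)


def hertz_normal_force(r1, r2, e1, e2, nu1, nu2, overlap):
    """Magnitude of the Hertz normal contact force between two elastic spheres.

    r1, r2   : sphere radii
    e1, e2   : Young's moduli
    nu1, nu2 : Poisson's ratios
    overlap  : penetration depth (no force if the spheres do not touch)
    """
    if overlap <= 0.0:
        return 0.0
    r_eff = 1.0 / (1.0 / r1 + 1.0 / r2)                              # effective radius R*
    e_eff = 1.0 / ((1.0 - nu1 ** 2) / e1 + (1.0 - nu2 ** 2) / e2)    # effective modulus E*
    return (4.0 / 3.0) * e_eff * math.sqrt(r_eff) * overlap ** 1.5


if __name__ == "__main__":
    # Illustrative values only: two 1 cm spheres with assumed material constants.
    d = contact_overlap((0.0, 0.0, 0.0), (0.0199, 0.0, 0.0), 0.01, 0.01)
    print(hertz_normal_force(0.01, 0.01, 70e9, 70e9, 0.3, 0.3, d))
```

Because each pair is evaluated independently, a force law of this kind maps naturally onto one GPU thread per contact.
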
S0363 - Efficient Molecular Dynamics on Heterogeneous GPU Architectures in GROMACS
Szilárd Páll (KTH Royal Institute of Technology), Berk Hess (KTH Royal Institute of Technology)
Day: Wednesday, 05/16 | Time: 4:00 pm - 4:25 pm
Topic Areas: Molecular Dynamics; Computational Physics; Life Sciences
Session Level: Intermediate

Molecular Dynamics is an important application for GPU acceleration, but many algorithmic optimizations and features still rely on code that prefers traditional CPUs. It is only with the latest hardware and software that we have been able to realize a heterogeneous GPU/CPU implementation and reach performance significantly beyond the state of the art of hand-tuned CPU code in our GROMACS program. The sub-millisecond iteration time poses challenges on all levels of parallelization. Come and learn about our new atom-cluster pair interaction approach for non-bonded force evaluation that achieves 60% work-efficiency, and other innovative solutions for heterogeneous GPU systems.

S0139 - GPU-Based Molecular Dynamics Simulations of Protein and RNA Assembly
Samuel Cho (Wake Forest University)
Day: Wednesday, 05/16 | Time: 5:00 pm - 5:25 pm
Topic Areas: Molecular Dynamics; Computational Physics
Session Level: Intermediate

Protein and RNA biomolecular folding and assembly problems have important applications because misfolding is associated with diseases like Alzheimer's and Parkinson's. However, simulating complex biomolecules on the same timescales as experiments is an extraordinary challenge due to a bottleneck in the force calculations. To overcome these hurdles, we perform coarse-grained molecular dynamics simulations where biomolecules are reduced into simpler components. Furthermore, our GPU-based simulations show a significant performance improvement over CPU-based simulations, which are limited to systems of 50-150 residues/nucleotides. The GPU-based code can simulate protein/RNA systems of 400-10,000+ residues/nucleotides, and we present ribosome assembly simulations.

S0129 - A Monte Carlo Thermal Radiation Solver in GPU/CPU Hybrid Architecture
Gaofeng Wang (Laboratoire E.M2.C, Ecole Centrale Paris), Oliver Gicquel (Laboratoire E.M2.C, Ecole Centrale Paris)
Day: Thursday, 05/17 | Time: 9:00 am - 9:25 am
Topic Areas: Computational Fluid Dynamics; Computational Physics; Ray Tracing
Session Level: Intermediate

A Monte Carlo ray-tracing code is developed to predict radiative heat transfer in CFD simulations of combustion phenomena. Using the emission-reciprocal method, the random ray casting for each node can be conducted independently, which enables parallel computation. The code is efficiently implemented on hybrid GPU/CPU HPC resources using a dedicated dynamic load balancing strategy. Linear speedup scaling on hybrid HPC resources is demonstrated for the calculation of radiative heat transfer in a helicopter engine's combustion chamber; adding one GPU to the HPC resource pool is equivalent to adding nine CPU cores.

S0508 - Faster Finite Elements for Wave Propagation Codes
Max Rietmann (Institute for Computational Science / USI Lugano, Switzerland)
Day: Thursday, 05/17 | Time: 10:00 am - 10:25 am
Topic Areas: Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

Learn how to develop faster and better finite-element codes for wave propagation using GPUs and MPI combined with overlapping techniques to hide the cost of communications and of host/device memory copies. Different options based on mesh coloring or on atomic operations will be presented. The difficulty of defining speedup will also be discussed (speedup versus what, using what definition of "cost"?). Examples will be given using SPECFEM3D, a highly optimized spectral finite-element code that has won the Gordon Bell Supercomputing award and the BULL Joseph Fourier award, and that can run on CPU or GPU clusters.

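The mesh coloring option mentioned above can be illustrated with a small sketch. In finite-element assembly, elements that share a node write to the same entries of the global vector, so processing them in parallel needs either atomic updates or a guarantee that concurrently processed elements touch disjoint nodes. Coloring provides that guarantee: elements of one color share no nodes. The greedy scheme and the name `color_elements` below are assumptions for illustration, not the scheme used in SPECFEM3D.

```python
def color_elements(elements):
    """Greedy coloring so that elements sharing a node get different colors.

    elements : list of tuples of node ids, one tuple per element
    returns  : list with one color index per element
    """
    node_colors = {}   # node id -> colors already used by elements touching it
    colors = []
    for nodes in elements:
        used = set()
        for n in nodes:
            used |= node_colors.get(n, set())
        color = 0
        while color in used:      # smallest color not used by any neighbor
            color += 1
        colors.append(color)
        for n in nodes:
            node_colors.setdefault(n, set()).add(color)
    return colors


def quad_mesh(nx, ny):
    """Connectivity of an nx-by-ny grid of 4-node quadrilateral elements."""
    return [
        (j * (nx + 1) + i, j * (nx + 1) + i + 1,
         (j + 1) * (nx + 1) + i, (j + 1) * (nx + 1) + i + 1)
        for j in range(ny) for i in range(nx)
    ]


if __name__ == "__main__":
    mesh = quad_mesh(4, 4)
    colors = color_elements(mesh)
    print("colors used:", max(colors) + 1)
```

Assembly would then loop over colors sequentially and process all elements of one color in parallel; the trade-off against atomic operations is more kernel launches and less regular memory access in exchange for race-free writes.
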
S0039 - Data-Driven GPGPU Ideology Extension
Alexandr Kosenkov (University of Geneva), Bela Bauer (Microsoft Research)
Day: Thursday, 05/17 | Time: 10:00 am - 10:25 am
Topic Areas: Application Design & Porting Techniques; Computational Physics; Parallel Programming Languages & Compilers; Development Tools & Libraries
Session Level: Advanced

In this session we will demonstrate how the GPGPU ideology can be extended so that it can be used on the scale of an InfiniBand hybrid system. The approach that we are presenting combines delayed execution, scheduling techniques and, most importantly, casts the CPU multi-core ideology down to that of the streaming multiprocessor, enforcing a full-fledged "GPGPU as a co-processor" way of programming for large-scale MPI hybrid applications. Staying compatible with modern CPU/GPGPU libraries, it provides more than fine-grained control over resources (more than you wanted, that is).

S0217 - Efficient Implementation of CFD Algorithms on GPU Accelerated Supercomputers
Ali Khajeh-Saeed (University of Massachusetts, Amherst), Blair Perot (University of Massachusetts, Amherst)
Day: Thursday, 05/17 | Time: 10:30 am - 10:55 am
Topic Areas: Computational Fluid Dynamics; Computational Physics; Supercomputing; Application Design & Porting Techniques
Session Level: Intermediate

The goal of this session is to introduce the concepts necessary to perform large computational fluid dynamic (CFD) problems on collections of many GPUs. Communication and computation overlapping schemes become even more critical when using fast compute engines such as GPUs that are connected via a relatively slow interconnect (such as MPI on InfiniBand). The algorithms presented are validated on unsteady CFD simulations of turbulence using 192 graphics processors to update half a billion unknowns per computational time step. The performance results from three different GPU accelerated supercomputers (Lincoln, Forge, and Keeneland) are compared with a large CPU based supercomputer (Ranger).

S0378 - VASP Accelerated with GPUs
Maxwell Hutchinson (University of Chicago)
Day: Thursday, 05/17 | Time: 2:00 pm - 2:50 pm
Topic Areas: Quantum Chemistry; Application Design & Porting Techniques; Computational Physics
Session Level: Intermediate

This session will detail the performance and capabilities of GPU-accelerated VASP, explain design decisions made in porting VASP to CUDA, and present a roadmap for GPU accelerated VASP development. We've achieved performance improvements up to around 20x on systems of around 100 ions and have implemented exact-exchange. We are working on ports of more conventional functionality.

S0071 - The High-Level Linear Algebra Library ViennaCL And Its Applications
Karl Rupp (TU Wien)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:50 pm
Topic Areas: Development Tools & Libraries; Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

Get to know ViennaCL, a high-level OpenCL linear algebra library that lets you get the speed of GPU computing at the convenience level of the C++ Boost libraries. Decrease the development and execution time of applications by utilizing our well-tested and widely used library, instead of spending days on learning details of GPU architectures and debugging. We provide examples that demonstrate not only how quickly existing applications are ported efficiently from single-threaded execution to fully utilizing multi-threaded environments, but also how to utilize the rich set of functionalities ranging from common BLAS routines to iterative solvers.

S0087 - GPU Acceleration of Dense Stellar Clusters Simulation
Bharath Pattabiraman (Northwestern University), Stefan Umbreit (Northwestern University)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:25 pm
Topic Areas: Astronomy & Astrophysics; Computational Physics; Algorithms & Numerical Techniques
Session Level: Intermediate

Computing the interactions between stars within dense stellar clusters is a problem of fundamental importance in theoretical astrophysics. This paper presents the parallelization of a Monte Carlo algorithm for simulating stellar cluster evolution using programmable Graphics Processing Units. The kernels of this algorithm exhibit high levels of data-dependent decision making and unavoidable non-contiguous memory accesses. However, we adopt various parallelization strategies and utilize the high computing power of the GPU to obtain substantial near-linear speedups which cannot be easily achieved on a CPU-based system. This acceleration allows us to explore physical regimes which were out of reach of current simulations.

S0368 - Unraveling the Mysteries of Quarks with Hundreds of GPUs
Ronald Babich (NVIDIA)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:50 pm
Topic Areas: Computational Physics; Application Design & Porting Techniques; Algorithms & Numerical Techniques; Supercomputing
Session Level: Intermediate

Dive into the world of quarks and gluons, and hear how GPU computing is revolutionizing the way many calculations in lattice quantum chromodynamics (lattice QCD) are performed. The main computational challenge in such calculations is to repeatedly solve large systems of linear equations arising from a four-dimensional finite-difference problem. In this session, we'll discuss strategies for parallelizing such a solver across hundreds of GPUs. These include techniques and algorithms for reducing memory traffic and inter-GPU communication. The net result is an implementation that achieves better than 20 Tflops on 256 GPUs, realized in the open-source "QUDA" library.

S0091 - Sustainable Hybrid Parallelization of an Unstructured Hydrodynamic Code
Raphaël Poncet (Commissariat à l'Energie Atomique et aux Energies Alternatives)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:25 pm
Topic Areas: Application Design & Porting Techniques; Algorithms & Numerical Techniques; Computational Fluid Dynamics; Computational Physics
Session Level: Advanced

The goal of this presentation is to share our methodology for porting a numerical code to hybrid supercomputing architectures using MPI coupled with directive-based languages (OpenMP for multicore CPUs, and HMPP for GPUs). Our code, VOLNA, is an unstructured partial differential equation hydrodynamic solver developed for the simulation of tsunamis. Our results demonstrate that using directive-based languages such as HMPP for GPU programming, one can retain good performance (e.g. a speedup of 15 compared to 1 CPU core, 3 compared to 8 CPU cores) with minimal modifications of the original CPU source code (about 30 lines of directives in our case).

S0334 - The Fast Multipole Method on CPU and GPU Processors
Eric Darve (Stanford)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:25 pm
Topic Areas: Computational Physics; Molecular Dynamics; Algorithms & Numerical Techniques
Session Level: Advanced

The fast multipole method (FMM) is a widely used numerical algorithm in computational engineering. Accelerating the FMM on CUDA-enabled GPUs is challenging because the FMM has a complicated data access pattern, mostly during the so-called multipole-to-local (M2L) operation. We have created several schemes to optimize the M2L and have attained a performance of over 350 Gflop/s for single-precision arithmetic and 160 Gflop/s for double-precision arithmetic. The optimal algorithm was incorporated into a complete FMM code, which can accept any smooth kernel as specified by the user, making it very flexible. We have also developed a highly efficient CPU version.

S0282 - Leveraging NVIDIA GPUDirect on APEnet+ 3D Torus Cluster Interconnect
Davide Rossetti (Italian National Institute for Nuclear Physics)
Day: Thursday, 05/17 | Time: 4:30 pm - 4:55 pm
Topic Areas: Supercomputing; Computational Physics
Session Level: Intermediate

APEnet+ is a novel cluster interconnect, based on a custom PCI card which features a PCI Express Gen2 x8 link and a re-configurable HW component (FPGA). It supports a 3D Torus topology and has special acceleration features specifically developed for NVIDIA Fermi GPUs. An introduction to the basic features and the programming model of APEnet+ will be followed by a description of its performance on some numerical simulations, e.g. High Energy Physics simulations.

S0218 - ASI Parallel Fortran: A General-Purpose Fortran to GPU Translator
Rainald Lohner (George Mason University)
Day: Thursday, 05/17 | Time: 4:30 pm - 4:55 pm
Topic Areas: Development Tools & Libraries; Computational Fluid Dynamics; Computational Physics; Parallel Programming Languages & Compilers
Session Level: Advanced

Over the last 3 years we have developed a general-purpose Fortran to GPU translator: ASI Parallel Fortran. The talk will detail its purpose, design layout and capabilities, and show how it is used and implemented. The use of ASI Parallel Fortran will be shown for large-scale CFD/CEM codes as well as other general purpose Fortran codes.