GPU Technology Conference, May 14-17, 2012
McEnery Convention Center, San Jose, California
www.gputechconf.com

Sessions on Computational Physics (subject to change)

IMPORTANT: Visit http://www.gputechconf.com/page/sessions.html for the most up-to-date schedule.

S0268 - Virtual Process Engineering - Realtime Simulation of Multiphase Systems
Wei Ge (Institute of Process Engineering, Chinese Academy of Sciences)
Day: Tuesday, 05/15 | Time: 9:00 am - 9:50 am
Topic Areas: Computational Fluid Dynamics; Molecular Dynamics; Computational Physics; Algorithms & Numerical Techniques
Session Level: Advanced

Realtime simulation and virtual reality with quantitatively correct physics for industrial processes with multi-scale, multiphase systems was once a remote dream for process engineering, but is now becoming reality with CPU-GPU hybrid supercomputing. Numerical and visualization methods for such simulations on thousands of GPUs will be reported, with applications in the chemical and energy industries.

S0258 - Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python
Michal Januszewski (University of Silesia in Katowice; Google Switzerland)
Day: Tuesday, 05/15 | Time: 9:30 am - 9:55 am
Topic Areas: Computational Fluid Dynamics; Computational Physics; Development Tools & Libraries
Session Level: Intermediate

Learn how Run-Time Code Generation (RTCG) techniques allowed for fast development of a lattice Boltzmann (LB) fluid dynamics solver called Sailfish. Sailfish is completely open source, supports a wide variety of LB models (single and multiple relaxation times, the entropic model; single and binary fluids) and can take advantage of multiple GPUs. Even though the project is written predominantly in Python, no performance compromises are made. This talk will introduce the basic design principles of Sailfish and illustrate how RTCG makes it possible to exploit the power of GPUs with minimal programmer effort.
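
As a rough illustration of the run-time code generation idea named in this abstract, the sketch below builds the source of a small function at run time, specialized to one lattice velocity set, and compiles it with `exec`. It is not Sailfish code: the names `generate_moments_source` and `compile_moments` are hypothetical, the generated target is plain Python rather than a GPU kernel, and the D2Q9 velocity ordering is an assumption made for the example.

```python
def generate_moments_source(velocities):
    """Return Python source for a moments() function specialized to one velocity set."""
    n = len(velocities)
    rho_terms = " + ".join(f"f[{i}]" for i in range(n))

    def unrolled(axis):
        # Drop terms whose velocity component is zero and fold +1/-1 into the sign,
        # so the generated code contains no loop and no multiplications by 0 or 1.
        terms = []
        for i, c in enumerate(velocities):
            if c[axis] == 1:
                terms.append(f"+ f[{i}]")
            elif c[axis] == -1:
                terms.append(f"- f[{i}]")
            elif c[axis] != 0:
                terms.append(f"+ ({c[axis]}) * f[{i}]")
        return " ".join(terms) or "0.0"

    return (
        "def moments(f):\n"
        f"    rho = {rho_terms}\n"
        f"    jx = {unrolled(0)}\n"
        f"    jy = {unrolled(1)}\n"
        "    return rho, jx, jy\n"
    )


def compile_moments(velocities):
    """Compile the generated source at run time and return the resulting function."""
    namespace = {}
    exec(generate_moments_source(velocities), namespace)
    return namespace["moments"]


# Assumed ordering of the nine D2Q9 lattice velocities (rest, axis, diagonal).
D2Q9 = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
        (1, 1), (-1, 1), (-1, -1), (1, -1)]

moments = compile_moments(D2Q9)
rho, jx, jy = moments([0, 1, 0, 0, 0, 0, 0, 0, 0])  # one unit of mass moving in +x
```

Printing `generate_moments_source(D2Q9)` shows the unrolled expressions that a generator of this kind emits; a GPU-oriented generator would emit kernel source in the same spirit and hand it to a device compiler instead of `exec`.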

S0031 - Unstructured Grid Numbering Schemes for GPU Coalescing Requirements
Andrew Corrigan (Naval Research Laboratory), Johann Dahm (University of Michigan)
Day: Tuesday, 05/15 | Time: 10:00 am - 10:25 am
Topic Areas: Computational Fluid Dynamics; Algorithms & Numerical Techniques; Computational Physics
Session Level: Advanced

Learn how to achieve high performance for computational fluid dynamics (CFD) solvers over unstructured grids using numbering schemes tailored for GPU coalescing requirements. Using these techniques, unstructured grid CFD solvers can make more effective use of memory bandwidth, which is an otherwise significant performance bottleneck that has so far led to relatively limited performance gains on GPUs in comparison to structured grid CFD solvers. Performance benchmarks will be shown using the Jet Engine Noise Reduction (JENRE) code.

S0321 - GPU-Based Monte Carlo Ray Tracing Simulation for Solar Power Plants
Claus Nilsson (Tietronix Software, Inc.), Michel Izygon (Tietronix Software, Inc.)
Day: Tuesday, 05/15 | Time: 2:00 pm - 2:25 pm
Topic Areas: Energy Exploration; Computational Physics; Ray Tracing
Session Level: Beginner

Learn about realtime simulations of Concentrating Thermal Solar Power using GPU technology to enable performance optimization of these utility-scale plants. By leveraging the power of GPUs and the parallel aspect of the field of thousands of sun-tracking mirrors, we have been successful in cutting the computation time by orders of magnitude versus the previously required minutes and hours of runtime. We will present an overview of the problem domain and describe how we used the GPU to derive a Monte Carlo physics ray tracing method to simulate the flux reflected by the mirrors onto the solar receiver.

S0046 - Application of the GPU to a Two-Part Computational Electromagnetic Algorithm
Eric Dunn (SAIC)
Day: Tuesday, 05/15 | Time: 2:30 pm - 2:55 pm
Topic Areas: Computational Physics; Algorithms & Numerical Techniques; Ray Tracing
Session Level: Beginner

The shooting and bouncing ray (SBR) method is one way to simulate electromagnetic field radiation. Like all methods, there are certain problems where it does not yield accurate results. In this presentation, we will explain one such case that consists of an antenna resonating between two metal plates. We will discuss how we used the graphics processing unit (GPU) to separate the problem into two parts. Each part is simulated individually with SBR, producing an improved result. Such a GPU-accelerated, two-part approach can be applied to other more general hybrid simulations.

S0379 - GPU-based High-Performance Simulations for Spintronics
Jan Jacob (University of Hamburg - Institute of Applied Physics and Microstructure Research Center)
Day: Tuesday, 05/15 | Time: 2:30 pm - 2:55 pm
Topic Areas: General Interest; Computational Physics; Application Design & Porting Techniques
Session Level: Intermediate

The joint utilization of the electron's charge and spin in "spintronics" represents a promising technology for data processing and storage in nanostructures. Complex quantum effects in these devices, such as the spin-Hall effect, require demanding numerical simulations that provide a convenient link between idealized analytical models and the often very complex results from measurements. The simulations involve multiplications and inversions of large matrices, which makes them an ideal showcase for the performance gained by executing the algebraic routines on GPGPUs in computing environments where algorithms are shared across multiple nodes with multiple GPGPUs and CPU cores.

S0036 - Multiparticle Collision Dynamics on GPUs
Elmar Westphal (Forschungszentrum Juelich)
Day: Tuesday, 05/15 | Time: 3:00 pm - 3:50 pm
Topic Areas: Computational Physics; Computational Fluid Dynamics; Molecular Dynamics
Session Level: Intermediate

See how we employ GPUs to simulate the interaction of millions of solvent and solute particles of a fluid system. Often the domain of large cluster systems, the most time-consuming part of our simulations can now be done on desktop PCs in reasonable time. This contribution shows how GPUs can effectively be used to accelerate existing programs and how techniques like streaming and increased data locality significantly enhance calculation throughput. It also shows how a GPU-optimized program structure yields usually expensive additional functionality "almost free". Furthermore, a well-scaling single-node/multi-GPU implementation of the program is presented.

S0067 - PIConGPU - Bringing large-scale Laser Plasma Simulations to GPU Supercomputing
Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf), Guido Juckeland (Center for Information Services and High Performance Computing, Technical University Dresden)
Day: Tuesday, 05/15 | Time: 3:00 pm - 3:50 pm
Topic Areas: Computational Physics; Algorithms & Numerical Techniques; Application Design & Porting Techniques; Supercomputing
Session Level: Advanced

With powerful lasers breaking the Petawatt barrier, applications for laser-accelerated particle beams are gaining more interest than ever. Ion beams accelerated by intense laser pulses foster new ways of treating cancer and make them available to more people than ever before. Laser-generated electron beams can drive new compact x-ray sources to create snapshots of ultrafast processes in materials. With PIConGPU, laser-driven particle acceleration can be computed in hours compared to weeks on standard CPU clusters. We present the techniques behind PIConGPU, detailed performance analysis and the benefits of PIConGPU for real-world physics cases.

S0221 - 1024 Bit Parallel Rational Arithmetic Operators for the GPU
Robert Zigon (Beckman Coulter)
Day: Tuesday, 05/15 | Time: 4:00 pm - 4:50 pm
Topic Areas: Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

Learn how to create a set of rational arithmetic operators that manipulate 1024 bit operands on a Tesla C2050. These operators are used to create a numerically stable implementation for Bessel functions. Naive implementations of the Bessel functions produce unreliable results when they are used to solve Maxwell's equations by way of Mie theory. Maxwell's equations are used to model the scattering of light by small particles. Light scatter is used in Particle Characterization to measure the quality of materials like cocoa, cement and pharmaceuticals.
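
To make the idea of rational arithmetic operators concrete, here is a minimal CPU-side sketch in Python. It stores each value as a reduced numerator/denominator pair and rejects results that no longer fit in 1024 bits. The class name `Rational`, the constant `BITS`, and the choice to raise an error on overflow are assumptions for illustration only; the abstract does not describe how the GPU operators represent operands or handle overflow.

```python
from math import gcd

BITS = 1024  # operand width from the abstract; the overflow policy below is an assumption


class Rational:
    """Exact fraction num/den, kept in lowest terms with a positive denominator."""

    def __init__(self, num, den=1):
        if den == 0:
            raise ZeroDivisionError("zero denominator")
        if den < 0:
            num, den = -num, -den
        g = gcd(num, den)  # gcd(0, den) == den, so zero normalizes to 0/1
        self.num, self.den = num // g, den // g
        if max(abs(self.num).bit_length(), self.den.bit_length()) > BITS:
            raise OverflowError(f"operand exceeds {BITS} bits")

    def __add__(self, other):
        return Rational(self.num * other.den + other.num * self.den,
                        self.den * other.den)

    def __sub__(self, other):
        return Rational(self.num * other.den - other.num * self.den,
                        self.den * other.den)

    def __mul__(self, other):
        return Rational(self.num * other.num, self.den * other.den)

    def __truediv__(self, other):
        # Dividing by zero surfaces as ZeroDivisionError from the constructor.
        return Rational(self.num * other.den, self.den * other.num)

    def __eq__(self, other):
        return (self.num, self.den) == (other.num, other.den)

    def __repr__(self):
        return f"Rational({self.num}, {self.den})"


half = Rational(1, 3) + Rational(1, 6)

total = Rational(0)
for _ in range(10):
    total = total + Rational(1, 10)  # stays exact, unlike repeated float addition
```

Because every intermediate result is exact, subtractions of nearly equal quantities lose no digits, which is the property that makes this kind of arithmetic attractive for numerically delicate evaluations.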

S0245 - Porting Legacy Plasma Codes to GPU
Peng Wang (NVIDIA)
Day: Tuesday, 05/15 | Time: 4:00 pm - 4:25 pm
Topic Areas: Computational Physics
Session Level: Intermediate

Learn how to port legacy Fortran plasma codes to GPU. Many legacy plasma codes are written in Fortran and have many lines of code. We will discuss techniques for porting such legacy codes easily and efficiently to CUDA C/C++. Performance analysis of major algorithmic patterns in plasma codes will be discussed. The discussion will use the GTC and GeFi plasma codes as realistic examples.

S0058 - Advancing GPU Molecular Dynamics: Rigid Bodies in HOOMD-blue
Joshua Anderson (University of Michigan), Trung Dac Nguyen (University of Michigan)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:50 am
Topic Areas: Molecular Dynamics; Computational Physics
Session Level: Intermediate

Learn how rigid body dynamics are implemented in HOOMD-blue. Previous releases were capable of executing classical molecular dynamics, where free particles interact via smooth potentials and their motion through time is computed using Newton's laws. The latest version allows particles to be grouped into bodies that move as rigid units. Users can now simulate materials made of cubes, rods, bent rods, jacks, plates, patchy particles, buckyballs, or any other arbitrary shapes. This talk covers how these algorithms are implemented on the GPU, tuned to perform well for bodies of any size, and discusses several use-cases relevant to research.

S0125 - Memory Efficient Reverse Time Migration in 3D
Chris Leader (Stanford Exploration Project)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:25 am
Topic Areas: Energy Exploration; Computational Physics
Session Level: Intermediate

Learn how we can image the interior of the Earth in three dimensions using Reverse Time Migration. We discuss how GPUs accelerate this method using parallel wave propagation kernels, texture memories and minimal device-to-host transfers. Further, we discuss how the progression to 3D presents a multitude of new problems, particularly memory-based ones that cause the system to be I/O limited. By manipulating boundary positions and values to a pseudo-random form, we show how many of these memory restrictions can be diminished and how detailed subsurface images can be fully constructed using GPUs.

S0236 - Advanced Optimization Techniques On a CUDA Implementation of Conjugate Gradient Solvers
Eri Rubin (OptiTex)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:25 am
Topic Areas: Algorithms & Numerical Techniques; Computational Physics; Application Design & Porting Techniques
Session Level: Intermediate

Linear systems are at the heart of a lot of compute problems. For large sparse systems there are two distinct approaches: direct and iterative solvers. After many years of researching and testing both approaches on CPU and GPU, we have implemented a highly efficient CG solver on the GPU using a combination of unique techniques. In this talk we will go over these techniques and the improved performance they bring.
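
For readers unfamiliar with the method, the following is a minimal, unoptimized sketch of the textbook conjugate gradient iteration for a symmetric positive definite sparse matrix stored in CSR form. It runs on the CPU in plain Python and does not reflect the optimization techniques of the talk; the function names and the small test matrix are assumptions for illustration.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix A stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)           # residual b - A x, equal to b for the zero initial guess
    p = list(r)           # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:   # stop once the residual norm is small enough
            break
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, Ap)                       # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                            # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# 3x3 tridiagonal matrix [[2, -1, 0], [-1, 2, -1], [0, -1, 2]] in CSR form.
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]

solution = conjugate_gradient(values, col_idx, row_ptr, [1.0, 0.0, 1.0])
```

Each iteration is dominated by one sparse matrix-vector product and a few dot products and vector updates, which is why these operations are the usual targets when such a solver is tuned for a GPU.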

S0312 - GPU Implementation for Rapid Iterative Image Reconstruction in Nuclear Medicine
Jakub Pietrzak (University of Warsaw)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:25 am
Topic Areas: Medical Imaging & Visualization; Computational Physics; Computer Graphics
Session Level: Intermediate

GPU implementation can greatly accelerate iterative techniques of 3D image reconstruction in nuclear medicine imaging. Single Photon Emission Computed Tomography (SPECT) is a functional imaging modality widely used in clinical diagnosis. To obtain high quality images within reduced scanning times, high sensitivity collimators need to be used and their response function modeled in the reconstruction. This is in general very computationally intensive and unfeasible with CPU and algorithm implementations. Our software is able to perform the reconstruction of patient data within clinically acceptable times using relatively low cost and widely available hardware.

S0352 - GPU-Accelerated Parallel Computing for Simulation of Seismic Wave Propagation
Taro Okamoto (Department of Earth and Planetary Sciences, Tokyo Institute of Technology)
Day: Wednesday, 05/16 | Time: 10:30 am - 10:55 am
Topic Areas: Computational Physics; General Interest
Session Level: Advanced

We adopted GPUs to accelerate large-scale, parallel finite-difference (FDTD) simulation of seismic wave propagation. An effective parallel implementation is needed because the memory of a single GPU is too small for real applications. We therefore describe the memory optimization, the three-dimensional domain decomposition, and the overlapping of communication and computation adopted in our program. So far we have achieved a high performance (single-precision) of about 61 TFlops using 1200 GPUs of TSUBAME-2.0, the GPU supercomputer at Tokyo Institute of Technology, Japan. As an important application, we show the results of the simulation of the 2011 Tohoku-Oki mega-quake.

S0269 - Accelerating 3D-RISM Calculations using GPUs
Yutaka Maruyama (Institute for Molecular Science), Fumio Hirata (Institute for Molecular Science)
Day: Wednesday, 05/16 | Time: 3:00 pm - 3:25 pm
Topic Areas: Life Sciences; Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

The three-dimensional reference interaction site model (3D-RISM) theory is a powerful tool to investigate biomolecular processes in solution. Unfortunately, 3D-RISM calculations are often both memory intensive and time-consuming. We sought to accelerate these calculations using GPUs. To work around the problem of limited memory size in GPUs, we modified the less memory-intensive Anderson method for faster convergence of 3D-RISM calculations. Using this method on a C2070, we reduced the computational time by a factor of eight compared to an Intel Xeon (8 cores, 3.33 GHz) with the conventional method.

S0055 - Particle Dynamics with MBD and FEA using CUDA
Graham Sanborn (FunctionBay)
Day: Wednesday, 05/16 | Time: 4:00 pm - 4:25 pm
Topic Areas: Computational Structural Mechanics; Computational Physics; Computational Fluid Dynamics
Session Level: Intermediate

Many sphere particles are solved with DEM (Discrete Element Method) and simulated with GPU technology. A fast algorithm is applied to calculate Hertzian contact forces between many sphere particles (from 100,000 to 1,000,000), and NVIDIA's CUDA is used to accelerate the calculation. The sphere particles are simulated together with MBD and FEA entities within the commercial software RecurDyn. Several models are built and simulated: a forklift with sand, oil in an oil tank, an oil-filled engine system, and a water-filled washing machine. All models are simulated on NVIDIA GPUs and the results are shown.

S0363 - Efficient Molecular Dynamics on Heterogeneous GPU Architectures in GROMACS
Szilárd Páll (KTH Royal Institute of Technology), Berk Hess (KTH Royal Institute of Technology)
Day: Wednesday, 05/16 | Time: 4:00 pm - 4:25 pm
Topic Areas: Molecular Dynamics; Computational Physics; Life Sciences
Session Level: Intermediate

Molecular Dynamics is an important application for GPU acceleration, but many algorithmic optimizations and features still rely on code that prefers traditional CPUs. It is only with the latest hardware and software that we have been able to realize a heterogeneous GPU/CPU implementation and reach performance significantly beyond the state-of-the-art of hand-tuned CPU code in our GROMACS program. The sub-millisecond iteration time poses challenges on all levels of parallelization. Come and learn about our new atom-cluster pair interaction approach for non-bonded force evaluation that achieves 60% work-efficiency and other innovative solutions for heterogeneous GPU systems.

S0139 - GPU-Based Molecular Dynamics Simulations of Protein and RNA Assembly
Samuel Cho (Wake Forest University)
Day: Wednesday, 05/16 | Time: 5:00 pm - 5:25 pm
Topic Areas: Molecular Dynamics; Computational Physics
Session Level: Intermediate

Protein and RNA biomolecular folding and assembly problems have important applications because misfolding is associated with diseases like Alzheimer's and Parkinson's. However, simulating complex biomolecules on the same timescales as experiments is an extraordinary challenge due to a bottleneck in the force calculations. To overcome these hurdles, we perform coarse-grained molecular dynamics simulations where biomolecules are reduced into simpler components. Furthermore, our GPU-based simulations have a significant performance improvement over CPU-based simulations, which are limited to systems of 50-150 residues/nucleotides. The GPU-based code can simulate protein/RNA systems of 400-10,000+ residues/nucleotides, and we present ribosome assembly simulations.

S0129 - A Monte Carlo Thermal Radiation Solver in GPU/CPU Hybrid Architecture
Gaofeng Wang (Laboratoire E.M2.C, Ecole Centrale Paris), Oliver Gicquel (Laboratoire E.M2.C, Ecole Centrale Paris)
Day: Thursday, 05/17 | Time: 9:00 am - 9:25 am
Topic Areas: Computational Fluid Dynamics; Computational Physics; Ray Tracing
Session Level: Intermediate

A Monte Carlo ray-tracing code is developed to predict radiative heat transfer behaviors in CFD simulation of combustion phenomena. Using the emission-reciprocal method, the random ray casting for each node can be conducted independently, which allows parallel computation. The code is efficiently implemented on hybrid GPU/CPU HPC resources using a dedicated dynamic load balancing strategy. Linear speedup scaling on hybrid HPC resources is shown in a demonstration calculation of radiative heat transfer in a helicopter engine's combustion chamber, where adding one GPU to the HPC resource pool is equivalent to adding nine CPU cores.

S0508 - Faster Finite Elements for Wave Propagation Codes
Max Rietmann (Institute for Computational Science / USI Lugano, Switzerland)
Day: Thursday, 05/17 | Time: 10:00 am - 10:25 am
Topic Areas: Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

Learn how to develop faster and better finite-element codes for wave propagation using GPUs and MPI combined with overlapping techniques to hide the cost of communications and of host/device memory copies. Different options based on mesh coloring or on atomic operations will be presented. The difficulty of defining speedup will also be discussed (speedup versus what, using what definition of "cost"). Examples will be given using SPECFEM3D, a highly optimized spectral finite-element code that has won the Gordon Bell Supercomputing award and the BULL Joseph Fourier award, and that can run on CPU or GPU clusters.
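
The mesh coloring option mentioned in this abstract can be sketched as follows. Elements that share a node must not update that node's entry in a global array at the same time; assigning them different colors lets all elements of one color be processed in parallel without atomic operations. The greedy routine below is a generic illustration in plain Python, not the scheme used in SPECFEM3D, and the function names are hypothetical.

```python
def color_elements(elements):
    """Greedy coloring: elements that share a node never receive the same color.

    elements: list of tuples of node ids, one tuple per element.
    Returns a list with one color index per element.
    """
    node_colors = {}  # node id -> colors already used by elements touching that node
    colors = []
    for nodes in elements:
        used = set()
        for n in nodes:
            used |= node_colors.get(n, set())
        c = 0
        while c in used:  # smallest color not used by any neighbouring element
            c += 1
        colors.append(c)
        for n in nodes:
            node_colors.setdefault(n, set()).add(c)
    return colors


def group_by_color(colors):
    """Return element indices grouped by color, in increasing color order."""
    groups = {}
    for element, c in enumerate(colors):
        groups.setdefault(c, []).append(element)
    return [groups[c] for c in sorted(groups)]


# A 1D chain of four two-node elements: 0-1, 1-2, 2-3, 3-4.
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
colors = color_elements(chain)   # [0, 1, 0, 1]
batches = group_by_color(colors)  # [[0, 2], [1, 3]]
```

Each batch would correspond to one kernel launch in a GPU code: within a batch no two elements touch the same node, so their contributions can be added to the global array without conflicts. The atomic-operation alternative skips the coloring step and resolves conflicting updates in hardware instead.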

S0039 - Data-Driven GPGPU Ideology Extension
Alexandr Kosenkov (University of Geneva), Bela Bauer (Microsoft Research)
Day: Thursday, 05/17 | Time: 10:00 am - 10:25 am
Topic Areas: Application Design & Porting Techniques; Computational Physics; Parallel Programming Languages & Compilers; Development Tools & Libraries
Session Level: Advanced

In this session we will demonstrate how the GPGPU ideology can be extended so that it can be used at the scale of an InfiniBand hybrid system. The approach that we are presenting combines delayed execution and scheduling techniques and, most importantly, casts the CPU multi-core ideology down to that of the streaming multiprocessor, enforcing a full-fledged "GPGPU as a co-processor" way of programming for large-scale MPI hybrid applications. While staying compatible with modern CPU/GPGPU libraries, it provides more than fine-grained control over resources; more than you wanted, that is.

S0217 - Efficient Implementation of CFD Algorithms on GPU Accelerated Supercomputers
Ali Khajeh-Saeed (University of Massachusetts, Amherst), Blair Perot (University of Massachusetts, Amherst)
Day: Thursday, 05/17 | Time: 10:30 am - 10:55 am
Topic Areas: Computational Fluid Dynamics; Computational Physics; Supercomputing; Application Design & Porting Techniques
Session Level: Intermediate

The goal of this session is to introduce the concepts necessary to solve large computational fluid dynamics (CFD) problems on collections of many GPUs. Communication and computation overlapping schemes become even more critical when using fast compute engines such as GPUs that are connected via a relatively slow interconnect (such as MPI on InfiniBand). The algorithms presented are validated on unsteady CFD simulations of turbulence using 192 graphics processors to update half a billion unknowns per computational timestep. The performance results from three different GPU accelerated supercomputers (Lincoln, Forge, and Keeneland) are compared with a large CPU based supercomputer (Ranger).

S0378 - VASP Accelerated with GPUs
Maxwell Hutchinson (University of Chicago)
Day: Thursday, 05/17 | Time: 2:00 pm - 2:50 pm
Topic Areas: Quantum Chemistry; Application Design & Porting Techniques; Computational Physics
Session Level: Intermediate

This session will detail the performance and capabilities of GPU-accelerated VASP, explain design decisions made in porting VASP to CUDA, and present a roadmap for GPU-accelerated VASP development. We've achieved performance improvements up to around 20x on systems of around 100 ions and have implemented exact-exchange. We are working on ports of more conventional functionality.

S0071 - The High-Level Linear Algebra Library ViennaCL And Its Applications
Karl Rupp (TU Wien)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:50 pm
Topic Areas: Development Tools & Libraries; Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

Get to know ViennaCL, an OpenCL high-level linear algebra library that delivers the speed of GPU computing at the convenience level of the C++ Boost libraries. Decrease the development and execution time of applications by utilizing our well-tested and widely used library, instead of spending days on learning details of GPU architectures and debugging. We provide examples that demonstrate not only how quickly existing applications are ported efficiently from single-threaded execution to fully utilizing multi-threaded environments, but also how to utilize the rich set of functionalities ranging from common BLAS routines to iterative solvers.

S0087 - GPU Acceleration of Dense Stellar Clusters Simulation
Bharath Pattabiraman (Northwestern University), Stefan Umbreit (Northwestern University)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:25 pm
Topic Areas: Astronomy & Astrophysics; Computational Physics; Algorithms & Numerical Techniques
Session Level: Intermediate

Computing the interactions between stars within dense stellar clusters is a problem of fundamental importance in theoretical astrophysics. This paper presents the parallelization of a Monte Carlo algorithm for simulating stellar cluster evolution using programmable Graphics Processing Units. The kernels of this algorithm exhibit high levels of data-dependent decision making and unavoidable non-contiguous memory accesses. However, we adopt various parallelization strategies and utilize the high computing power of the GPU to obtain substantial near-linear speedups which cannot be easily achieved on a CPU-based system. This acceleration allows us to explore physical regimes which were out of reach of current simulations.

S0368 - Unraveling the Mysteries of Quarks with Hundreds of GPUs
Ronald Babich (NVIDIA)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:50 pm
Topic Areas: Computational Physics; Application Design & Porting Techniques; Algorithms & Numerical Techniques; Supercomputing
Session Level: Intermediate

Dive into the world of quarks and gluons, and hear how GPU computing is revolutionizing the way many calculations in lattice quantum chromodynamics (lattice QCD) are performed. The main computational challenge in such calculations is to repeatedly solve large systems of linear equations arising from a four-dimensional finite-difference problem. In this session, we'll discuss strategies for parallelizing such a solver across hundreds of GPUs. These include techniques and algorithms for reducing memory traffic and inter-GPU communication. The net result is an implementation that achieves better than 20 Tflops on 256 GPUs, realized in the open-source "QUDA" library.

S0091 - Sustainable Hybrid Parallelization of an Unstructured Hydrodynamic Code
Raphaël Poncet (Commissariat à l'Energie Atomique et aux Energies Alternatives)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:25 pm
Topic Areas: Application Design & Porting Techniques; Algorithms & Numerical Techniques; Computational Fluid Dynamics; Computational Physics
Session Level: Advanced

The goal of this presentation is to share our methodology for porting a numerical code to hybrid supercomputing architectures using MPI coupled with directive-based languages (OpenMP for multicore CPUs, and HMPP for GPUs). Our code, VOLNA, is an unstructured partial differential equation hydrodynamic solver developed for the simulation of tsunamis. Our results demonstrate that using directive-based languages such as HMPP for GPU programming, one can retain good performance (e.g. a speedup of 15 compared to 1 CPU core, and of 3 compared to 8 CPU cores) with minimal modifications of the original CPU source code (about 30 lines of directives in our case).

S0334 - The Fast Multipole Method on CPU and GPU Processors
Eric Darve (Stanford)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:25 pm
Topic Areas: Computational Physics; Molecular Dynamics; Algorithms & Numerical Techniques
Session Level: Advanced

The fast multipole method (FMM) is a widely used numerical algorithm in computational engineering. Accelerating the FMM on CUDA-enabled GPUs is challenging because the FMM has a complicated data access pattern, mostly during the so-called multipole-to-local (M2L) operation. We have created several schemes to optimize the M2L and have attained a performance of over 350 Gflop/s for single precision and over 160 Gflop/s for double precision arithmetic. The optimal algorithm was incorporated into a complete FMM code, which can accept any smooth kernel as specified by the user, making it very flexible. We have also developed a highly efficient CPU version.

S0282 - Leveraging NVIDIA GPUDirect on APEnet+ 3D Torus Cluster Interconnect
Davide Rossetti (Italian National Institute for Nuclear Physics)
Day: Thursday, 05/17 | Time: 4:30 pm - 4:55 pm
Topic Areas: Supercomputing; Computational Physics
Session Level: Intermediate

APEnet+ is a novel cluster interconnect, based on a custom PCI card which features a PCI Express Gen2 X8 link and a re-configurable HW component (FPGA). It supports a 3D Torus topology and has special acceleration features specifically developed for NVIDIA Fermi GPUs. An introduction to the basic features and the programming model of APEnet+ will be followed by a description of its performance on some numerical simulations, e.g. High Energy Physics simulations.

S0218 - ASI Parallel Fortran: A General-Purpose Fortran to GPU Translator
Rainald Lohner (George Mason University)
Day: Thursday, 05/17 | Time: 4:30 pm - 4:55 pm
Topic Areas: Development Tools & Libraries; Computational Fluid Dynamics; Computational Physics; Parallel Programming Languages & Compilers
Session Level: Advanced

Over the last 3 years we have developed a general-purpose Fortran to GPU translator: ASI Parallel Fortran. The talk will detail its purpose, design layout and capabilities, and show how it is used and implemented. The use of ASI Parallel Fortran will be shown for large-scale CFD/CEM codes as well as other general purpose Fortran codes.
