Introduction of a Stabilized Bi-Conjugate Gradient iterative solver for Helmholtz's Equation on the CMA GRAPES Global and Regional models
Peng Hong Bo (IBM), Zaphiris Christidis (Lenovo) and Zhiyan Jin (CMA)
IBM Technical Computing, ECMWF 16th HPC Workshop, October 2014

Outline
Introduction.
– Helmholtz's Equation.
The CMA GRAPES models and the Generalized Conjugate Residual Method (GCR).
– GCR implementation on GRAPES-GLOBAL and GRAPES-MESO models.
– GRAPES profiles.
Introduction of the Biconjugate Gradient Stabilized Method (BiCGSTAB) on GRAPES.
– Properties, implementation and profile information in both GLOBAL and MESO models.
– Performance of BiCGSTAB on GRAPES-GLOBAL and GRAPES-MESO models.
Accuracy verification and statistics.
– Verification challenges of the 10-day forecast of GRAPES-GLOBAL.
– Accuracy behavior on introduced code changes as a function of forecast days.
  Area averaged errors and correlation coefficients of optimized vs base results.
– Chaotic behavior in the verification of results for more than 7 forecast days.
Conclusions

Helmholtz or Pressure Equation
Helmholtz's equation is commonly used in Numerical Weather Prediction (NWP) models.
$\Delta\phi + k^2\phi = 0$
– $\Delta$ is the Laplacian operator, $\phi$ is a 3D pressure function and $k$ is a positive function.
Using finite differences, the above equation is reduced to a system of linear equations: $Ax - b = 0$
– $A$ is an $MN \times NM$ block tridiagonal matrix, for a grid of $M \times N$ horizontal points.
– The approximate solution of the linear equations is $\tilde{x}$; the residual is $r = b - A\tilde{x}$.
– When a preconditioner $L$ is used, the discretized Helmholtz equation is formulated as $L^{-1}Ax = L^{-1}b$.
– Large horizontal grids in NWP models call for efficient iterative methods for solutions.
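As a rough illustration of the discretization step, the sketch below assembles the matrix for a small M x N grid with a 5-point finite-difference stencil and evaluates the residual r = b − A x for a trial solution. It is written in Python purely for illustration and is not taken from GRAPES: the operator form (Laplacian plus a constant coefficient `k2` times the unknown), the unit grid spacing, the zero boundary values, and all function names are assumptions. The GRAPES operator itself is three-dimensional.

```python
def helmholtz_matrix(m, n, k2, h=1.0):
    """Dense (m*n) x (m*n) matrix for the 5-point discretization of
    laplacian(phi) + k2 * phi on an m x n grid (illustrative form only)."""
    size = m * n
    a = [[0.0] * size for _ in range(size)]
    inv_h2 = 1.0 / (h * h)
    for j in range(n):            # j selects the block row (one grid row)
        for i in range(m):        # i is the position inside that block
            row = j * m + i
            a[row][row] = -4.0 * inv_h2 + k2
            if i > 0:
                a[row][row - 1] = inv_h2      # west neighbour, same block
            if i < m - 1:
                a[row][row + 1] = inv_h2      # east neighbour, same block
            if j > 0:
                a[row][row - m] = inv_h2      # south neighbour, previous block
            if j < n - 1:
                a[row][row + m] = inv_h2      # north neighbour, next block
    return a


def residual(a, x, b):
    """Return r = b - A x for a trial solution x."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(a, b)]


def is_block_tridiagonal(a, m):
    """True if every nonzero entry lies in a diagonal or adjacent m x m block."""
    for r, row in enumerate(a):
        for c, value in enumerate(row):
            if value != 0.0 and abs(r // m - c // m) > 1:
                return False
    return True
```

Each grid point couples only to its four neighbours, so nonzero entries appear only in the diagonal blocks and the blocks directly next to them, which is the block tridiagonal structure that makes iterative solvers attractive on large grids.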

Helmholtz Equation in GRAPES
GRAPES (Global/Regional Assimilation Prediction System).
– It is a Numerical Weather Prediction system developed by the China Meteorological Administration (CMA).
– It includes a Global and a Regional weather model as well as data assimilation systems for them.
Dynamic core features in GRAPES
– Fully compressible equations.
– Height-based terrain-following coordinates.
– Option for hydrostatic and non-hydrostatic schemes.
– Arakawa "C" staggered lat-lon horizontal grid.
– Charney-Phillips vertical scheme for prognostic variables.
– Polar Filter and Mass Fixing scheme.
– 2-time-level Semi-Implicit Semi-Lagrangian time-stepping.
– GCR solver for the Helmholtz Equation.
Generalized Conjugate Residual (GCR) algorithm.
Uses an Incomplete sparse Lower and Upper triangular (ILU) matrix factorization as a pre-conditioner.
B1, B2, …, B19 represent the coefficient matrix of Helmholtz's equation, which is discretized into a large sparse matrix.

GRAPES-GLOBAL Profile, GCR
    %time    self  descendents  called/total  name [index]
             2.31      1811.41       384/384  __module_integrate_NMOD_integrate [4]
     48.6    2.31      1811.41           384  solver_grapes
             0.31       236.34       384/384  pbl_driver
             0.00       166.60       384/384  *__module_gcr_NMOD_solve_helmholts_stub_in_solver_grapes
             0.03       151.48       384/384  radiation_driver
             0.00        78.31       384/384  microphysics_driver
             0.00        69.13       384/384  *__module_semi_lag_NMOD_semi_lag_interp_stub_in_solver_grapes
             0.00        52.03       384/384  *__module_semi_lag_NMOD_upstream_interp_jin_stub_in_solver_grapes
             0.00        19.46       384/384  cumulus_driver
             0.00        12.03       384/384  *__module_semi_lag_NMOD_semi_get_upstream_jin_stub_in_solver_grapes
Min communication time: MPI task 649
Max communication time: MPI task 939

GRAPES-MESO Profile, GCR
     self  descendents  called/total  name [index]
     1.68       504.70     1080/1080  *__module_integrate_NMOD_solver_grapes_stub_in___module_integrate_NMOD_solve_interface [5]
     1.68       504.70          1080  __module_integrate_NMOD_solver_grapes [6]
     0.00       221.07     1080/1080  __module_gcr_NMOD_solve_helmholts [8]
     0.06        67.78     1080/1080  __module_semi_lag_NMOD_semi_lag_interp [9]
     0.51        32.82     1079/1079  __module_semi_lag_NMOD_upstream_interp_phy [18]
    33.27         0.00     1080/1080  __module_prm_wangmh_NMOD_prm_y_xiao [19]
    30.93         0.00     1080/1080  __module_prm_wangmh_NMOD_prm_x_xiao [21]
     0.00        28.63     1080/1080  microphysics_driver [22]
    23.44         0.00     1080/1080  __module_prm_wangmh_NMOD_prm_z_xiao [27]
Min communication time: MPI task 0
Max communication time: MPI task 1080

Convergence of Bi-conjugate Gradient Stabilized algorithm
Convergence of the BiCGSTAB and GCR algorithms for 1 and 25 steps of GRAPES.
– BiCGSTAB(2) converges in fewer iterations than GCR, but is more computationally intensive.
– The introduction of BiCGSTAB improved overall performance in the GRAPES models.
  Used as a precursor to the GCR algorithm (an extra pre-conditioner).
  The number of iterations required for GCR convergence decreased significantly.
  GRAPES executed much faster (with the help of VSX primitives in coding).
  Same or even better accuracy than the original GCR algorithm.
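The precursor idea can be sketched on a toy problem: run BiCGSTAB to a loose tolerance, then hand its result to GCR as the starting guess for the tight tolerance. The Python below is only a minimal illustration under stated assumptions, not the GRAPES code: the operator is a hypothetical 1-D tridiagonal model, neither solver is preconditioned, plain BiCGSTAB stands in for BiCGSTAB(2), and the tolerances and function names are invented for the example.

```python
def matvec(x, shift=1.0):
    """Apply a 1-D model operator: tridiagonal (-1, 2 + shift, -1)."""
    n = len(x)
    return [(2.0 + shift) * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def norm(a):
    return dot(a, a) ** 0.5

def bicgstab(b, x, tol, max_iter=200):
    """Unpreconditioned BiCGSTAB; stops when the residual norm is <= tol."""
    r = axpy(-1.0, matvec(x), b)
    if norm(r) <= tol:
        return x, 0
    r_hat = r[:]
    rho = alpha = omega = 1.0
    v = [0.0] * len(b)
    p = [0.0] * len(b)
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = axpy(beta, axpy(-omega, v, p), r)      # p = r + beta * (p - omega * v)
        v = matvec(p)
        alpha = rho_new / dot(r_hat, v)
        s = axpy(-alpha, v, r)
        if norm(s) <= tol:                         # converged after the half step
            return axpy(alpha, p, x), it
        t = matvec(s)
        omega = dot(t, s) / dot(t, t)
        x = axpy(alpha, p, axpy(omega, s, x))
        r = axpy(-omega, t, s)
        rho = rho_new
        if norm(r) <= tol:
            return x, it
    return x, max_iter

def gcr(b, x, tol, max_iter=200):
    """Full (non-restarted) GCR; stops when the residual norm is <= tol."""
    r = axpy(-1.0, matvec(x), b)
    ps, aps = [], []
    for it in range(1, max_iter + 1):
        if norm(r) <= tol:
            return x, it - 1
        p, ap = r[:], matvec(r)
        for pj, apj in zip(ps, aps):               # make A p orthogonal to earlier A p_j
            beta = dot(ap, apj) / dot(apj, apj)
            p = axpy(-beta, pj, p)
            ap = axpy(-beta, apj, ap)
        alpha = dot(r, ap) / dot(ap, ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, ap, r)
        ps.append(p)
        aps.append(ap)
    return x, max_iter

def two_stage(b, loose_tol=1e-6, tight_tol=1e-10):
    """BiCGSTAB to loose_tol, then GCR from that iterate to tight_tol."""
    x1, n_bicg = bicgstab(b, [0.0] * len(b), loose_tol)
    x2, n_gcr = gcr(b, x1, tight_tol)
    return x2, n_bicg, n_gcr

if __name__ == "__main__":
    rhs = [1.0] * 200
    _, n_alone = gcr(rhs, [0.0] * 200, 1e-10)
    _, n_bicg, n_gcr = two_stage(rhs)
    print("GCR alone:", n_alone, "| BiCGSTAB:", n_bicg, "then GCR:", n_gcr)
```

On this toy system the GCR stage needs fewer iterations when it starts from the BiCGSTAB iterate than when it starts from zero. Whether the combination is faster overall depends on the cost per iteration of each solver, which this sketch does not measure.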

Updated Helmholtz Solver implementation

GRAPES-GLOBAL
    ! First pass: BiCGSTAB (BCGSL variant when BCGSL is defined)
    #ifdef BCGSL
    ep = max(1.D-10, DBLE(grid%ep))
    CALL psolve_bcgsl_main(grid, gcr, ep, a_helm, b_helm, pi, &
                           idep, jdep, ids, ide, jds, jde, kds, kde, &
                           ims, ime, jms, jme, kms, kme, &
                           its, ite, jts, jte, kts, kte)
    #else
    ep = max(1.D-8, DBLE(grid%ep))
    CALL psolve_bicgstab_main(grid, gcr, ep, a_helm, b_helm, pi, &
                              idep, jdep, ids, ide, jds, jde, kds, &
                              kde, ims, ime, jms, jme, kms, kme, &
                              its, ite, jts, jte, kts, kte)
    #endif
    ! Second pass: GCR
    ep = grid%ep
    d = 1.0d0
    CALL psolve_gcr_main(grid, gcr, ep, a_helm, b_helm, &
                         iter_max, pi, d, idep, jdep, ids, ide, &
                         jds, jde, kds, kde, ims, ime, jms, jme, &
                         kms, kme, its, ite, jts, jte, kts, kte)

GRAPES-MESO
    ! First pass: BiCGSTAB (BCGSL variant when BCGSL is defined)
    #ifdef BCGSL
    ep = 1.D-8
    CALL psolve_bcgsl_main(grid, gcr, ep, a_helm, b_helm, &
                           pi, ids, ide, jds, jde, kds, kde, &
                           ims, ime, jms, jme, kms, kme, &
                           its, ite, jts, jte, kts, kte)
    #else
    ep = 1D-8
    CALL psolve_bicgstab_main(grid, gcr, ep, a_helm, b_helm, &
                              pi, ids, ide, jds, jde, kds, kde, &
                              ims, ime, jms, jme, kms, kme, &
                              its, ite, jts, jte, kts, kte)
    #endif
    ! Second pass: GCR
    ep = 1.D-19
    CALL psolve_gcr_main(grid, gcr, ep, a_helm, b_helm, &
                         iter_max, pi, d, ids, ide, jds, jde, &
                         kds, kde, ims, ime, jms, jme, kms, kme, &
                         its, ite, jts, jte, kts, kte)

Convergence of BiCGSTAB in GRAPES-GLOBAL

Un-optimized Code
    begin of gcr 0.328934647159688379E-03
    RES of gcr 0.951769473740471055E-09 in 54 iterations
    Timing for processing for step 1: 105.43999 elapsed seconds.
    begin of gcr 0.307738677760282797E-01
    RES of gcr 0.985465629245594409E-09 in 64 iterations
    Timing for processing for step 2: 3.56000 elapsed seconds.
    begin of gcr 0.466354355510276777E-01
    RES of gcr 0.987319218430061550E-09 in 55 iterations
    Timing for processing for step 3: 3.54000 elapsed seconds.
    begin of gcr 0.419494279764634215E-01
    RES of gcr 0.952816344175419192E-09 in 45 iterations
    Timing for processing for step 4: 3.39000 elapsed seconds.
    begin of gcr 0.298146267204818100E-01
    RES of gcr 0.955547301333094658E-09 in 49 iterations
    Timing for processing for step 5: 3.44000 elapsed seconds.

Optimized Code
    begin of bcgsl 0.328934356968701958E-03
    RES of bcgsl 0.698006138227474393E-09 in 16 iterations
    begin of gcr 0.102067544683602406E-08
    RES of gcr 0.969841675518509429E-09 in 1 iterations
    Timing for processing for step 1: 108.25000 elapsed seconds.
    begin of bcgsl 0.307101071999445543E-01
    RES of bcgsl 0.998788656259226276E-09 in 11 iterations
    begin of gcr 0.131913191092197407E-08
    RES of gcr 0.889851041683508861E-09 in 2 iterations
    Timing for processing for step 2: 2.50000 elapsed seconds.
    begin of bcgsl 0.370215569337918604E-01
    RES of bcgsl 0.728471243819791556E-09 in 12 iterations
    begin of gcr 0.104455860894560670E-08
    RES of gcr 0.948845550215151657E-09 in 1 iterations
    Timing for processing for step 3: 2.50000 elapsed seconds.
    begin of bcgsl 0.348878083179526982E-01
    RES of bcgsl 0.829610442476401725E-09 in 12 iterations
    begin of gcr 0.114433762484590935E-08
    RES of gcr 0.635845995011923888E-09 in 2 iterations
    Timing for processing for step 4: 2.50000 elapsed seconds.
    begin of bcgsl 0.266947703233833440E-01
    RES of bcgsl 0.688385709819754403E-09 in 12 iterations
    begin of gcr 0.100135435371643626E-08
    RES of gcr 0.875385663076386664E-09 in 1 iterations
    Timing for processing for step 5: 2.46000 elapsed seconds.

GRAPES-GLOBAL Profile Comparison
    %time    self  descendents  called/total  name [index]
             2.09       682.41       384/384  __module_integrate_NMOD_integrate [4]
     52.3    2.09       682.41           384  solver_grapes
             0.24       214.12       384/384  pbl_driver
             0.04       157.94       384/384  radiation_driver
             0.00        83.80       384/384  *__module_gcr_NMOD_solve_helmholts_stub_in_solver_grapes
             0.00        67.05       384/384  *__module_semi_lag_NMOD_semi_lag_interp_stub_in_solver_grapes
             0.00        54.97       384/384  microphysics_driver
             0.01        50.33       384/384  *__module_semi_lag_NMOD_upstream_interp_jin_stub_in_solver_grapes
             0.00        17.91       384/384  cumulus_driver
             0.00        11.72       384/384  *__module_semi_lag_NMOD_semi_get_upstream_jin_stub_in_solver_grapes
Min communication time: MPI task 649

Convergence of BiCGSTAB in GRAPES-MESO

Un-optimized Code
    0: begin of gcr 0.118096356906410122E-03
    0: RES of gcr 0.785681906255938855E-19 in 49 iterations
    0: Timing for processing for step 1: 18.15000 elapsed seconds.
    0: Timing for processing for step 1: 14.52999 cpu seconds.
    0: begin of gcr 0.180227130734546867E-03
    0: RES of gcr 0.690132004197575959E-19 in 49 iterations
    0: Timing for processing for step 2: 0.90000 elapsed seconds.
    0: Timing for processing for step 2: 0.75000 cpu seconds.
    0: begin of gcr 0.712260919191608395E-04
    0: RES of gcr 0.966563876032326532E-19 in 48 iterations
    0: Timing for processing for step 3: 0.68000 elapsed seconds.
    0: Timing for processing for step 3: 0.57000 cpu seconds.
    0: begin of gcr 0.337160794746152708E-04
    0: RES of gcr 0.877018965782972674E-19 in 47 iterations
    0: Timing for processing for step 4: 0.67000 elapsed seconds.
    0: Timing for processing for step 4: 0.57000 cpu seconds.
    0: begin of gcr 0.196107554793862609E-04
    0: RES of gcr 0.635560985222081976E-19 in 47 iterations
    0: Timing for processing for step 5: 0.71000 elapsed seconds.
    0: Timing for processing for step 5: 0.60000 cpu seconds.

Optimized Code
    0: begin of bicgstab 0.118096453737757547E-03
    0: RES of bicgstab 0.380226254620264712E-08 in 3 iterations
    0: begin of gcr 0.394720884628083064E-08
    0: RES of gcr 0.746418612263664838E-19 in 16 iterations
    0: Timing for processing for step 1: 18.99000 elapsed seconds.
    0: Timing for processing for step 1: 18.69000 cpu seconds.
    0: begin of bicgstab 0.168370346746922749E-03
    0: RES of bicgstab 0.166872655366664435E-08 in 3 iterations
    0: begin of gcr 0.181367330318505421E-08
    0: RES of gcr 0.465501345880251435E-19 in 16 iterations
    0: Timing for processing for step 2: 0.67000 elapsed seconds.
    0: Timing for processing for step 2: 0.68000 cpu seconds.
    0: begin of bicgstab 0.696717378252718038E-04
    0: RES of bicgstab 0.137254158106719979E-08 in 3 iterations
    0: begin of gcr 0.151730006467615455E-08
    0: RES of gcr 0.322109698287421177E-19 in 16 iterations
    0: Timing for processing for step 3: 0.45000 elapsed seconds.
    0: Timing for processing for step 3: 0.44000 cpu seconds.
    0: begin of bicgstab 0.320771797557436878E-04
    0: RES of bicgstab 0.950087839437367948E-09 in 3 iterations
    0: begin of gcr 0.109450945243131875E-08
    0: RES of gcr 0.881479429351996220E-19 in 15 iterations
    0: Timing for processing for step 4: 0.50000 elapsed seconds.
    0: Timing for processing for step 4: 0.50000 cpu seconds.
    0: begin of bicgstab 0.193261775264966473E-04
    0: RES of bicgstab 0.985010942067601368E-08 in 2 iterations
    0: begin of gcr 0.996454289745865310E-08
    0: RES of gcr 0.365415647279281880E-19 in 17 iterations
    0: Timing for processing for step 5: 0.48000 elapsed seconds.
    0: Timing for processing for step 5: 0.49000 cpu seconds.

GRAPES-MESO Profile Comparison

Optimization Verification
Accuracy of the computations.
How does one check the accuracy of computations in optimized codes?
– GRAPES MESO accuracy verification was set for a 48-hour forecast.
– GRAPES GLOBAL accuracy verification was set for a 10-day forecast.
Major changes were introduced into both the GRAPES GLOBAL and MESO codes.
– Helmholtz's equation solution algorithm, Vector MASS in Microphysics routines.
Qualitative and quantitative verification methods.
– Visual inspection of the GRAPES GLOBAL and MESO generated results.
– Apply statistics, and define limits for acceptable results.
Proceed slowly with caution.
Correlation coefficients (ρ) between base (C) and optimized results (I).
Area averaged normalized differences (σ) between base (C) and optimized results (I).
500 mb Geopotential Height (Φ) fields and Surface Precipitation are good candidates.
KMA range for σ … 0.98, all models.
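The two statistics can be sketched as follows. The slides do not give the formulas, so everything here is an assumption for illustration: ρ is taken as an area-weighted Pearson correlation, σ as the area-weighted mean of |I − C| divided by the area-weighted mean of |C|, and the cosine-of-latitude weights and all function names are hypothetical choices for a regular lat-lon grid.

```python
import math


def cos_lat_weights(lats_deg, n_lon):
    """Weights proportional to cell area on a regular lat-lon grid (row-major)."""
    return [math.cos(math.radians(lat)) for lat in lats_deg for _ in range(n_lon)]


def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def correlation(base, opt, weights):
    """Area-weighted Pearson correlation between base (C) and optimized (I) fields."""
    mc = weighted_mean(base, weights)
    mi = weighted_mean(opt, weights)
    cov = sum(w * (c - mc) * (i - mi) for c, i, w in zip(base, opt, weights))
    var_c = sum(w * (c - mc) ** 2 for c, w in zip(base, weights))
    var_i = sum(w * (i - mi) ** 2 for i, w in zip(opt, weights))
    return cov / math.sqrt(var_c * var_i)


def normalized_difference(base, opt, weights):
    """Assumed form of sigma: weighted mean |I - C| over weighted mean |C|."""
    num = sum(w * abs(i - c) for c, i, w in zip(base, opt, weights))
    den = sum(w * abs(c) for c, w in zip(base, weights))
    return num / den
```

With fields flattened row by row, a run would be compared against the base by computing `correlation(base, opt, w)` and `normalized_difference(base, opt, w)` for each forecast day and checking them against the chosen acceptance limits.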

GRAPES-MESO Verification
[Figure panels: Base: 42-hour forecast; Optimized: 42-hour forecast]
500 mb Geopotential Height: σ and ρ are within acceptable range.
Surface Precipitation: σ and ρ are within acceptable range.

GRAPES-GLOBAL Verification
Global models for 10-day forecasts are impossible to verify.
– http://www.washingtonpost.com/blogs/capital-weather-gang/wp/2013/06/25/new-weather-service-supercomputer-faces-chaos/
– GFS 7-day forecast differences between POWER6 and Intel systems at NCEP.
– Even a small change in compiler version, node count, system architecture, or algorithm, or bit losses from using less accurate representations (vector MASS), can cause a global weather model to diverge from base results beyond 7 forecast days.
– Global weather model verification beyond 7 days for ρ > 0.98 is hopeless.
– GRAPES-GLOBAL verification was examined from 1-10 days of forecast.

10-Day GRAPES-GLOBAL verification
Correlation coefficients and Area Averaged Differences are used to compare runs.
– 192-core unmodified code runs were used as base for comparisons.
– 10-day forecasts of the 500 mb Geopotential Heights for 2048 cores, unmodified.
– 10-day forecasts of the 500 mb Geopotential Heights for 4096 cores, modified.
– Microphysics (WSM6), BiCGSTAB, and a combination of both were tested.
– VSX intrinsic calls were introduced and tested in the BiCGSTAB routine.
– Vector MASS in WSM6 drives the forecast in a slightly different direction.

GRAPES-GLOBAL: 10-Day Geopotential Heights Forecast
10-day 500 mb Geopotential Heights forecast.
– 2048-core unmodified code, 4096-core optimized code (WSM6, BiCGSTAB_SIMD)
[Figure: Unoptimized Run, 2048 cores, 500 mb Geopotential Heights]
[Figure: Optimized Run, 4096 cores, 500 mb Geopotential Heights]

GRAPES-GLOBAL: 10-Day Surface Precipitation Forecast
10-day Surface Precipitation forecast.
– 2048-core unmodified code, 4096-core optimized code (WSM6, BiCGSTAB_SIMD)
[Figure: Unoptimized Run, 2048 cores, Surface Precipitation]
[Figure: Optimized Run, 4096 cores, Surface Precipitation]

Summary and Conclusions
The GRAPES-GLOBAL and GRAPES-MESO models were optimized for performance.
– Both models used the Generalized Conjugate Residual (GCR) iterative solver.
  GCR: very efficient code, moderate convergence rates.
– The Bi-conjugate Gradient Stabilized (BiCGSTAB) iterative solver was introduced.
  BiCGSTAB: less efficient code, but fast convergence rates.
– The stand-alone BiCGSTAB solver did not improve performance.
  When BiCGSTAB was used ahead of GCR, significant improvements were realized.
  Increased accuracy, as seen from convergence residuals.
  Fewer total iterations to achieve convergence, better overall performance.
– Vector MASS intrinsic functions were applied in the microphysics routines.
Accuracy verification was a challenge for GRAPES-GLOBAL for up to 10 days.
– GRAPES-MESO verified successfully for 7 days, unlike WSM6.
– VSX primitives (single precision) in BiCGSTAB were not critical to either performance or accuracy.