E5sandybridge

sandybridge  Date: 2021-03-27
2014 LENOVO. ALL RIGHTS RESERVED.

Outline

Parallel system description.
  p775, p460 and dx360 M4: hardware and software.
  Compiler options and libraries used.
WRF tunable parameters for scaling runs.
  nproc_x, nproc_y, numtiles, nio_groups, nio_tasks_per_group
WRF I/O.
  Serial and parallel netCDF libraries for data I/O.
WRF runtime parameters for scaling runs.
  SMT, MPI task grids, OpenMP threads, quilting tasks.
WRF performance components for scaling runs.
  Computation, communication, I/O, load imbalance.
WRF scaling run results on p775, p460 and dx360 M4.
Conclusions.
16TH ECMWF HPC WORKSHOP.

IBM POWER7 p775 system (P7-IH)

[Diagram: management and network rack (7042-CR6 HMCs 1 to 3, EMS-1 and EMS-2 p750 nodes, Ethernet switches) and Compute Frames A, B and C (Racks 001 to 003).]
Similar to the old ECMWF system, water cooled.
Rack: 3 supernodes; Supernode: 4 drawers; Drawer: 32 POWER7 chips; Chip: 8 POWER7 cores.
32 cores/node @ 3.86 GHz, 256 GB/node, 8 GB/core, 1024 cores/supernode, 3072 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 4 MB/core; dual memory channel.
Diskless nodes, Torrent POWER7 chip, GigE adaptors on service nodes.
HFI fibre network, 4 x storage (GPFS) nodes, 3 x disk enclosures.
AIX 7.1, XLF 14, VAC 12, GPFS, IBM Parallel Environment, LoadLeveler.
MPI profile library, Floating Point Monitor, vector and scalar MASS libraries.

IBM POWER7 p460 PureFlex system (Firebird)

CMA system partition, air cooled, rear door heat exchangers.
Rack: 4 chassis; Chassis: 7 nodes; Node: 4 POWER7 chips; Chip: 8 POWER7 cores.
32 cores/node @ 3.55 GHz, 128 GB/node, 4 GB/core, 224 cores/chassis, 896 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 4 MB/core; single memory channel.
2 HD per node, 2 QDR dual-port InfiniBand adaptors, GigE.
Dual-rail fat-tree InfiniBand network, 10 x p740 storage (GPFS) nodes, 8 x DSC3700 devices.
AIX 7.1, XLF 14, VAC 12, GPFS, IBM Parallel Environment, LoadLeveler.
MPI profile library, vector and scalar MASS libraries.
[Diagram: Compute Racks 1 to 10 (4 chassis, 28 nodes each), Storage Rack 1 (10 x p740 nodes), Storage Rack 2 (8 x DSC3700), QDR InfiniBand Rack 1 (4 switches/management).]

IBM iDataPlex dx360 M4 system (Sandy Bridge)

NCEP WCOSS system partition, air cooled, rear door heat exchangers.
Rack: 72 nodes; Node: 2 Intel dx360 M4 sockets; Socket: 8 Intel E5-2670 Sandy Bridge cores.
16 cores/node @ 2.6 GHz, 32 GB/node, 2 GB/core, 1152 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 20 MB shared per 8 cores.
1 HD per node, Mellanox ConnectX-FDR, 10 GigE ports.
Mellanox IB 4X FDR full-bisection fat-tree network, 10 x 3650 M4 storage nodes, 20 x DSC3700.
RHEL 6.2, Intel Fortran 13, C 11, GPFS, IBM Parallel Environment, Platform LSF.
MPI profile library.
[Diagram: Compute Racks 1 to 8 (72 nodes each), InfiniBand Rack 1 (FDR14), I/O Rack 2 (10 nodes), I/O Rack 2 (12 DSC3700).]

Tested Systems Summary

Attribute | p775 | p460 | dx360 M4
Core frequency (GHz) | 3.86 | 3.55 | 2.6
Memory (GB/core) | 8 (all DIMMs occupied) | 4 (all DIMMs occupied) | 2 (all DIMMs occupied)
Cores tested | 6144 | 8192 | 6144
Vector | VSX only for intrinsics | VSX only for intrinsics | AVX everywhere
SMT/HT settings | SMT4 (SMT2 to 2048 cores) | SMT4 (SMT2 to 512 cores) | HT (not used)
IC fabric | HFI | QDR dual-rail InfiniBand fat tree | FDR single-rail InfiniBand fat tree
OS | AIX 7.1 | AIX 7.1 | RHEL 6.2
Storage | GPFS, 4 NSD, 3 disk enclosures | GPFS, 10 NSD, 8 DSC3700 | GPFS, 10 NSD, 20 DSC3700
Compilers | IBM XL compilers (AIX v14) | IBM XL compilers (V14) | Intel Studio 13 compilers
Parallel environment | IBM PE for AIX | IBM PE for AIX | IBM PE for Linux
Queueing system | LoadLeveler | LoadLeveler | LSF
Libraries | MPI profile, MASS, hardware performance monitor | MPI profile, MASS | MPI profile

WRF v3.3 Compiler Options

Identical WRF code was used.

System | Compiler options | Libraries
p775 | FCOPTIM = -O3 -qhot -qarch=pwr7 -qtune=pwr7; FCBASEOPTS = -qsmp=omp -qcache=auto -qfloat=rsqrt | netcdf, pnetcdf, massp7_simd, massvp7, mpihpm
p460 | FCOPTIM = -O3 -qhot -qarch=pwr7 -qtune=pwr7; FCBASEOPTS = -qsmp=omp -qcache=auto -qfloat=rsqrt | netcdf, pnetcdf, massp7_simd, massvp7, mpitrace
dx360 M4 | FCOPTIM = -O3 -xAVX -fp-model fast=2 -ip; FCBASEOPTS = -ip -fno-alias -w -ftz -no-prec-div -no-prec-sqrt -align all -openmp | netcdf, pnetcdf, mpitrace

WRF tunables for scaling runs

WRF run specifics as defined in namelist.input:
  5 km horizontal resolution, 6 sec time step, 12-hour forecast.
  2200 x 1200 x 28 grid points.
  One output file per forecast hour.
  Four boundary reads, one every three forecast hours.
The same WRF tunables were used on every system, based on selections that yielded optimal performance on the p775 system.
  nproc_x: logical MPI task partition in the x-direction.
  nproc_y: logical MPI task partition in the y-direction.
  numtiles: number of tiles that can be used in OpenMP.
  nproc_x * nproc_y = number of MPI tasks.
  Critical in comparing communication characteristics.
SMT was used only on the IBM POWER systems.
  SMT was employed on POWER systems by using two OpenMP threads per MPI task.
  SMT2 was used for runs with fewer than 2048 cores on p775 (best performance).
  SMT2 was used for runs with fewer than 512 cores on p460 (best performance).
Hyper-Threading was not beneficial on the dx360 M4 system (not used).
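
To make the effect of nproc_x and nproc_y concrete, the following minimal sketch estimates the per-task patch size on the 2200 x 1200 grid and the number of time steps in the 12-hour forecast. The function names are hypothetical, and the even split is an assumption for illustration only; it is not WRF's actual decomposition routine, whose patch bounds may differ by a point.

```python
import math

# Horizontal grid from the run description: 2200 x 1200 points.
NX, NY = 2200, 1200


def patch_shape(nproc_x, nproc_y, nx=NX, ny=NY):
    """Upper bound on the per-task patch size (points in x, points in y).

    Assumes the grid is split as evenly as possible across the task grid.
    """
    return math.ceil(nx / nproc_x), math.ceil(ny / nproc_y)


def time_steps(forecast_hours=12, dt_seconds=6):
    """Number of model time steps for the given forecast length and step."""
    return forecast_hours * 3600 // dt_seconds


if __name__ == "__main__":
    # Two task grids used in the scaling runs (smallest and largest run).
    for nproc_x, nproc_y in [(4, 31), (18, 85)]:
        px, py = patch_shape(nproc_x, nproc_y)
        print(f"{nproc_x} x {nproc_y} tasks: patch of about {px} x {py} points")
    print("time steps:", time_steps())
```

With nproc_x much smaller than nproc_y, each patch is long in x and short in y, which is the thin rectangular shape the runs rely on.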

WRF I/O

I/O reading and writing of WRF variables.
  13 I/O write steps, each writing a 7.5 GB file.
  1 read for the initial conditions (6.74 GB file).
  4 reads for the boundary conditions (1.47 GB each).
Data ingests cannot be done asynchronously.
  Parallel netCDF was used for data ingests (MPI-IO) for all scaling runs.
  Read option 11 for initial and boundary data.
I/O netCDF quilting was used to write data files.
  The same I/O tasks and groups were assigned on all three systems for each of the scaling runs.
  The last I/O step is done synchronously, since the WRF computations have finished.
  Quilting I/O times on WRF timers report I/O synchronization time only.
  I/O is done by quilting tasks on the I/O subsystem while compute tasks compute.
I/O parallel netCDF quilting was not used.
  It can further improve I/O writing steps, especially the last I/O step.
  Early WRF versions had problems with IBM Parallel Environment and parallel netCDF.
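
As a quick worked example, the file counts and sizes listed on this slide give the total data volume per run. This is only arithmetic on the quoted figures; the function name is hypothetical.

```python
# File counts and sizes as quoted on the WRF I/O slide.
WRITE_STEPS = 13
WRITE_FILE_GB = 7.5
INITIAL_READ_GB = 6.74
BOUNDARY_READS = 4
BOUNDARY_FILE_GB = 1.47


def io_volume_gb():
    """Return (GB written, GB read) for one forecast run, rounded to 2 places."""
    written = WRITE_STEPS * WRITE_FILE_GB
    read = INITIAL_READ_GB + BOUNDARY_READS * BOUNDARY_FILE_GB
    return round(written, 2), round(read, 2)


if __name__ == "__main__":
    written, read = io_volume_gb()
    print(f"written: {written} GB, read: {read} GB")
```

The write volume is roughly eight times the read volume, which is consistent with the slide's focus on quilting for the write steps.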

WRF uniform variables (tunables)

Variables left unchanged on all systems for the same number of physical cores:
  nproc_x, nproc_y, nio_groups, nio_tasks_per_group, numtiles
Performance on POWER systems was found to be always better when:
  nproc_x [...] 1, even for runs with a single OpenMP thread.
  numtiles > 1 acts as a cache-blocking mechanism (like NPROMA).
p775 and p460 are favored against dx360 M4.
  For the testing scenarios, if nproc_x [...] 2 did not have an effect on performance.
Choice of nproc_x, nproc_y, numtiles:
  Was based on best performance on the p775.
  numtiles = 4 was set as an advantage on dx360 M4.

WRF runtime parameters

Task affinity (binding) was used in all test runs.
SMT/HT: ON, green background (one extra OpenMP thread); OFF, yellow background.
Variables: OpenMP threads, numtiles, MPI tasks, nproc_x, nproc_y, nio_groups, nio_tasks_per_group.
Physical/logical cores = OpenMP_threads * (nproc_x * nproc_y + nio_groups * nio_tasks_per_group)
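
The core-count relation above can be checked with a short sketch. The function name is hypothetical; the sample configurations are taken from the smallest and largest entries of the run table in this section.

```python
def cores_used(omp_threads, nproc_x, nproc_y, nio_groups, nio_tasks_per_group):
    """Cores occupied by a run: threads times (compute tasks + I/O tasks)."""
    compute_tasks = nproc_x * nproc_y
    io_tasks = nio_groups * nio_tasks_per_group
    return omp_threads * (compute_tasks + io_tasks)


if __name__ == "__main__":
    # 128-core run, one thread per task (dx360 M4, no Hyper-Threading).
    print(cores_used(1, 4, 31, 1, 4))
    # Same task layout with SMT on (POWER): one extra OpenMP thread per task,
    # so the result counts logical cores on the same 128 physical cores.
    print(cores_used(2, 4, 31, 1, 4))
    # 6144-core run, four threads per task.
    print(cores_used(4, 18, 85, 1, 6))
```

With SMT enabled the same formula yields logical rather than physical cores, which is why the slide labels the left-hand side "Physical/logical cores".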
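
p775 nodes | OpenMP threads | p460 nodes | OpenMP threads | dx360 nodes | OpenMP threads | numtiles | Cores | MPI tasks | nproc_x x nproc_y | nio_groups x nio_tasks_per_group
4 | 2 | 4 | 2 | 8 | 1 | 4 | 128 | 124 | 4x31 | 1x4
8 | 2 | 8 | 2 | 16 | 1 | 4 | 256 | 252 | 6x42 | 1x4
16 | 2 | 16 | 2 | 32 | 1 | 4 | 512 | 504 | 8x63 | 1x8
32 | 4 | 32 | 2 | 64 | 2 | 4 | 1024 | 506 | 11x46 | 1x6
48 | 4 | 48 | 2 | 96 | 2 | 4 | 1536 | 760 | 19x40 | 1x8
64 | 4 | 64 | 2 | 128 | 2 | 4 | 2048 | 1020 | 20x51 | 1x4
96 | 4 | 96 | 4 | 192 | 4 | 4 | 3072 | 760 | 19x40 | 1x8
128 | 4 | 128 | 4 | 256 | 4 | 4 | 4096 | 1020 | 17x60 | 1x4
192 | 4 | 192 | 4 | 384 | 4 | 4 | 6144 | 1530 | 18x85 | 1x6

WRF Run statistics

Runs on p775 were done with the MPI HPM library:
  Collect hardware performance monitor data (small overhead).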
  Scale HPM data to the p460 and dx360 M4 systems by frequency ratios.
  Estimate sustained GFLOP rates on all systems.
  System peak rate = (number of cores x 8 x core frequency).
  Collect MPI communication statistics.
Runs on p460 and dx360 M4 were done with the MPITRACE library:
  Collect MPI communication statistics.
MPI communication from trace libraries can help estimate:
  Communication: (minimum communication among all MPI tasks involved).
  Load imbalance: (median communication - minimum communication).
Accumulation of internal WRF timers can help estimate:
  Read I/O times (initial file read time + boundary read times).
  Write I/O times (I/O write quilting time from synchronization + last I/O write time step).
  Last I/O write step: ~(total elapsed time - total time from internal timers).
  Total computation (pure computation + communication + load imbalance).
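
The communication and load-imbalance estimates described above can be sketched as follows. The function names and the sample per-task times are hypothetical and serve only to illustrate the arithmetic; they are not measurements from these runs, and no real trace-library API is used.

```python
from statistics import median


def split_mpi_time(per_task_mpi_seconds):
    """Split per-task MPI times into (communication, load imbalance).

    Communication is the minimum MPI time over all tasks; load imbalance is
    the median MPI time minus that minimum.
    """
    communication = min(per_task_mpi_seconds)
    load_imbalance = median(per_task_mpi_seconds) - communication
    return communication, load_imbalance


def pure_computation(total_computation, communication, load_imbalance):
    """Total computation = pure computation + communication + load imbalance."""
    return total_computation - communication - load_imbalance


if __name__ == "__main__":
    # Hypothetical MPI times (seconds) reported for five tasks.
    sample = [10.0, 12.0, 15.0, 11.0, 30.0]
    comm, imbalance = split_mpi_time(sample)
    print("communication:", comm, "load imbalance:", imbalance)
    print("pure computation:", pure_computation(100.0, comm, imbalance))
```

Using the minimum as the communication cost treats any extra MPI time on other tasks as waiting, and the median keeps a single slow task from dominating the imbalance figure.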

WRF Scaling Results

[Figure: p775 total elapsed time (seconds, axis 0 to 20000) versus number of p775 cores (128, 256, 512, 1024, 1536, 2048, 3072, 4096, 6144), stacked by component: write I/O, read I/O, initialization + termination, load imbalance, communication, computation.]

WRF Scaling Results

[Figure: p460 total elapsed time (seconds, axis 0 to 20000) versus number of p460 cores (128 to 6144), stacked by component: write I/O, read I/O, initialization + termination, load imbalance, communication, computation.]

WRF Scaling Results

[Figure: dx360 M4 total elapsed time (seconds, axis 0 to 20000) versus number of dx360 M4 cores (128 to 6144), stacked by component: write I/O, read I/O, initialization + termination, load imbalance, communication, computation.]

WRF Run statistics

[Figure: total elapsed time (sec, axis 0 to 25000) versus number of cores (128 to 6144) for p775, dx360 M4 and p460.]
[Figure: total communication time (sec, axis 0 to 1000) versus number of cores (128 to 6144) for p775, dx360 M4 and p460.]
[Figure: total pure computation time (sec, axis 0 to 20000) versus number of cores (128 to 6144) for p775, dx360 M4 and p460.]
[Figure: total load imbalance time (sec, axis 0 to 2500) versus number of cores (128 to 6144) for p775, dx360 M4 and p460.]

WRF Run statistics

[Figure: average compute time per time step (seconds, axis 0 to 3.00) versus number of cores (128 to 6144) for p775, dx360 M4 and p460.]
[Figure: average read time per read time step (seconds, axis 0 to 25) versus number of cores (128 to 6144) for p775, dx360 M4 and p460.]
[Figure: average write time per write time step (seconds, axis 0 to 25) versus number of cores (128 to 6144) for p775, dx360 M4 and p460.]

WRF GFLOP Rates

[Figure: percent (%) sustained of peak performance (axis 0 to 16) versus number of cores (128 to 6144) for p775, dx360 M4 and p460.]
[Figure: GFLOPS sustained (axis 0 to 9000) versus number of cores (128 to 6144) for p775, dx360 M4 and p460.]
Peak GFLOPS = number of cores x 8 x core frequency.
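
The GFLOPS definitions on this slide can be expressed as a minimal sketch. The function and argument names are hypothetical; the counter arguments stand for the PM_VSU_*FLOP hardware counters named on the slide, and the sustained-rate inputs in the usage lines are placeholder values, not results from these runs.

```python
def peak_gflops(cores, core_ghz, flops_per_cycle=8):
    """Peak GFLOPS = number of cores x 8 x core frequency (GHz)."""
    return cores * flops_per_cycle * core_ghz


def sustained_gflops_from_counters(run_time_s, flop1, flop2, flop4, flop8):
    """Sustained GFLOPS on p775 from counts of 1-, 2-, 4- and 8-flop operations."""
    total_flops = flop1 + 2 * flop2 + 4 * flop4 + 8 * flop8
    return 1e-9 * total_flops / run_time_s


def scale_by_run_time(p775_gflops, p775_run_time_s, other_run_time_s):
    """Estimate another system's sustained GFLOPS from the run-time ratio."""
    return p775_gflops * (p775_run_time_s / other_run_time_s)


def percent_of_peak(sustained, peak):
    """Sustained rate as a percentage of the peak rate."""
    return 100.0 * sustained / peak


if __name__ == "__main__":
    # Peak rates at 6144 cores, using the core frequencies of the tested systems.
    for name, ghz in [("p775", 3.86), ("p460", 3.55), ("dx360 M4", 2.6)]:
        print(name, "peak GFLOPS:", peak_gflops(6144, ghz))
    # Placeholder counter values and run times, for illustration only.
    p775_rate = sustained_gflops_from_counters(10.0, 1e12, 0, 0, 0)
    print("p775 sustained GFLOPS:", p775_rate)
    print("scaled estimate:", scale_by_run_time(p775_rate, 10.0, 20.0))
```

Because the identical WRF code performs the same floating-point work on every system, the sustained rates for p460 and dx360 M4 follow from the p775 rate scaled by the ratio of run times.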
p775 sustained GFLOPS = (10^-9 / p775_run_time) * (PM_VSU_1FLOP + 2*PM_VSU_2FLOP + 4*PM_VSU_4FLOP + 8*PM_VSU_8FLOP)
p460 sustained GFLOPS = p775 sustained GFLOPS * (p775_run_time / p460_run_time)
dx360 M4 sustained GFLOPS = p775 sustained GFLOPS * (p775_run_time / dx360M4_run_time)

Conclusions

WRF scales and performs well on all tested systems.
  Quilting I/O with netCDF works very well on all systems.
  Parallel netCDF for data ingest improves data read times.
WRF is a popular single-precision code.
It runs very well on the dx360 M4 system.
  -xAVX works very well.
  Intel compilers do a great job producing optimal and fast binaries.
  numtiles ~ cache-block parameter for additional performance.
  Hyper-Threading gives no benefit towards overall performance.
  Near-neighbor communication is handled effectively by FDR IB.
It runs OK on the p775 and p460 systems.
  VSX does not work well.
  The code crashes if compiled with -qsimd.
  IBM XL compilers do OK with -O3 -qhot (vector MASS library).
  Thin rectangular decompositions work OK (caching and vector MASS).
  SMT works well on p775, due to the available memory-to-core bandwidth.
  Near-neighbor communication is an overkill for p775, but OK for p460.
Performance odds were stacked against dx360 M4.
  Runs with 6144 cores and different nproc_x, nproc_y yield even better performance.