2014 LENOVO. ALL RIGHTS RESERVED.

Outline
- Parallel system description
  - p775, p460 and dx360M4: hardware and software
  - Compiler options and libraries used
- WRF tunable parameters for scaling runs
  - nproc_x, nproc_y, numtiles, nio_groups, nio_tasks_per_group
  - WRF I/O: serial and parallel netCDF libraries for data I/O
- WRF runtime parameters for scaling runs
  - SMT, MPI task grids, OpenMP threads, quilting tasks
- WRF performance components for scaling runs
  - Computation, communication, I/O, load imbalance
- WRF scaling run results on p775, p460 and dx360M4
- Conclusions
2014 LENOVO. ALL RIGHTS RESERVED. 16TH ECMWF HPC WORKSHOP.

IBM POWER7 p775 system (P7-IH)

[Figure: system layout with a management and network rack (7316-TF3 console, Ethernet switches, three 7042-CR6 HMCs, two p750 EMS servers) and compute frames A, B and C (racks 001 to 003). 2012 IBM Corporation.]

Similar to the old ECMWF system. Water cooled.
Rack: 3 supernodes; supernode: 4 drawers; drawer: 32 POWER7 chips; chip: 8 POWER7 cores.
32 cores/node @ 3.86 GHz, 256 GB/node, 8 GB/core, 256 cores/drawer, 1024 cores/supernode, 3072 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 4 MB/core; dual memory channel.
Diskless nodes, Torrent POWER7 chip, GigE adapters on service nodes, HFI fibre network, 4 storage (GPFS) nodes, 3 disk enclosures.
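The per-drawer and per-rack core counts follow from multiplying out each packaging hierarchy. A minimal Python sketch, using only the figures quoted on the three hardware slides (the p460 and dx360M4 numbers come from their own slides; the dictionary layout is an illustrative assumption), checks that a p775 drawer holds 256 cores, a supernode 1024 and a rack 3072:

```python
from math import prod

# Packaging hierarchy quoted on the hardware slides. Each entry is
# (unit, how many next-lower units it contains); the last count is
# cores per chip (POWER7) or per socket (Sandy Bridge).
SYSTEMS = {
    "p775": {"levels": [("rack", 3), ("supernode", 4), ("drawer", 32), ("chip", 8)],
             "cores_per_node": 32, "gb_per_node": 256},
    "p460": {"levels": [("rack", 4), ("chassis", 7), ("node", 4), ("chip", 8)],
             "cores_per_node": 32, "gb_per_node": 128},
    "dx360M4": {"levels": [("rack", 72), ("node", 2), ("socket", 8)],
                "cores_per_node": 16, "gb_per_node": 32},
}


def cores_per(system: str, unit: str) -> int:
    """Cores in one `unit` of `system`: product of its fan-out and all lower fan-outs."""
    levels = SYSTEMS[system]["levels"]
    start = [name for name, _ in levels].index(unit)
    return prod(count for _, count in levels[start:])


def gb_per_core(system: str) -> float:
    """Memory per core from the per-node figures."""
    spec = SYSTEMS[system]
    return spec["gb_per_node"] / spec["cores_per_node"]


for name, spec in SYSTEMS.items():
    parts = ", ".join(f"{cores_per(name, unit)} cores/{unit}" for unit, _ in spec["levels"])
    print(f"{name}: {parts}; {gb_per_core(name):g} GB/core")
```

For the p775 it prints 3072 cores/rack, 1024 cores/supernode and 256 cores/drawer; the same function reproduces 224 cores/chassis and 896 cores/rack for the p460 and 1152 cores/rack for the dx360M4.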
AIX 7.1, XLF 14, VAC 12, GPFS, IBM Parallel Environment, LoadLeveler, MPI profile library, Floating Point Monitor, vector and scalar MASS libraries.

IBM POWER7 p460 PureFlex system (Firebird)
CMA system partition. Air cooled, rear-door heat exchangers.
Rack: 4 chassis; chassis: 7 nodes; node: 4 POWER7 chips; chip: 8 POWER7 cores.
32 cores/node @ 3.55 GHz, 128 GB/node, 4 GB/core, 224 cores/chassis, 896 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 4 MB/core; single memory channel.
2 HDD per node, 2 QDR dual-port InfiniBand adapters, GigE. Dual-rail fat-tree InfiniBand network, 10 p740 storage (GPFS) nodes, 8 DCS3700 devices.
AIX 7.1, XLF 14, VAC 12, GPFS, IBM Parallel Environment, LoadLeveler, MPI profile library, vector and scalar MASS libraries.
[Figure: p460 system layout with compute racks 1 to 10 (4 chassis, 28 nodes each), storage rack 1 (10 p740 nodes), storage rack 2 (8 DCS3700 + QDR InfiniBand) and an InfiniBand rack (switches/management).]

IBM iDataPlex dx360M4 system (Sandy Bridge)
NCEP WCOSS system partition. Air cooled, rear-door heat exchangers.
Rack: 72 nodes; node (dx360M4): 2 Intel sockets; socket: 8 Intel E5-2670 Sandy Bridge cores.
16 cores/node @ 2.6 GHz, 32 GB/node, 2 GB/core, 1152 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 20 MB shared per 8 cores.
1 HDD per node, Mellanox ConnectX FDR, 10GigE ports. Mellanox IB 4X FDR full-bisection fat-tree network, 10 x3650 M4 storage nodes, 20 DCS3700.
RHEL 6.2, Intel Fortran 13, C 11, GPFS, IBM Parallel Environment, Platform LSF, MPI profile library.

[Figure: dx360M4 system layout with compute racks 1 to 8 (72 nodes each), an FDR InfiniBand rack, and I/O racks with storage nodes and DCS3700 enclosures.]

Tested Systems Summary
Attribute | p775 | p460 | dx360M4
Core frequency (GHz) | 3.86 | 3.55 | 2.6
Memory (GB/core) | 8 (all DIMMs occupied) | 4 (all DIMMs occupied) | 2 (all DIMMs occupied)
Cores tested | 6144 | 8192 | 6144
Vector | VSX only for intrinsics | VSX only for intrinsics | AVX everywhere
SMT/HT settings | SMT4 (SMT2 up to 2048 cores) | SMT4 (SMT2 up to 512 cores) | HT (not used)
Interconnect fabric | HFI | QDR dual-rail InfiniBand fat tree | FDR single-rail InfiniBand fat tree
OS | AIX 7.1 | AIX 7.1 | RHEL 6.2
Storage | GPFS, 4 NSD, 3 disk enclosures | GPFS, 10 NSD, 8 DCS3700 | GPFS, 10 NSD, 20 DCS3700
Compilers | IBM XL compilers (AIX v14) | IBM XL compilers (v14) | Intel Studio 13 compilers
Parallel environment | IBM PE for AIX | IBM PE for AIX | IBM PE for Linux
Queueing system | LoadLeveler | LoadLeveler | LSF
Libraries | MPI Profile, MASS, Hardware Performance Monitor | MPI Profile, MASS | MPI Profile

WRF v3.3 Compiler Options
Identical WRF code was used on all systems.

System | Compiler options | Libraries
p775 | FCOPTIM = -O3 -qhot -qarch=pwr7 -qtune=pwr7; FCBASEOPTS = -qsmp=omp -qcache=auto -qfloat=rsqrt | netcdf, pnetcdf, massp7_simd, massvp7, mpihpm
p460 | FCOPTIM = -O3 -qhot -qarch=pwr7 -qtune=pwr7; FCBASEOPTS = -qsmp=omp -qcache=auto -qfloat=rsqrt | netcdf, pnetcdf, massp7_simd, massvp7, mpitrace
dx360M4 | FCOPTIM = -O3 -xAVX -fp-model fast=2 -ip; FCBASEOPTS = -ip -fno-alias -w -ftz -no-prec-div -no-prec-sqrt -align all -openmp | netcdf, pnetcdf, mpitrace

WRF tunables for scaling runs
WRF run specifics, as defined in namelist.input:
- 5 km horizontal resolution, 6 s time step, 12-hour forecast.
- 2200 x 1200 x 28 grid points.
- One output file per forecast hour.
- Four boundary reads, one every three forecast hours.
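The run specifics fix the amount of work and I/O per forecast. A short Python sketch derives them from the namelist values above; it assumes history is also written at the initial time, which is what yields the 13 write steps quoted on the WRF I/O slide:

```python
import math

# Run specifics from namelist.input as quoted on the slide.
FORECAST_HOURS = 12
TIME_STEP_S = 6
NX, NY, NZ = 2200, 1200, 28
OUTPUT_INTERVAL_H = 1    # one output file per forecast hour
BOUNDARY_INTERVAL_H = 3  # boundary data every three forecast hours


def run_counts() -> dict[str, int]:
    """Derived sizes of the 12-hour, 5 km benchmark run."""
    return {
        "time_steps": FORECAST_HOURS * 3600 // TIME_STEP_S,
        "grid_points": NX * NY * NZ,
        # Assumption: history is also written at t = 0, giving t = 0, 1, ..., 12 h.
        "write_steps": FORECAST_HOURS // OUTPUT_INTERVAL_H + 1,
        # One boundary read at the start of each 3-hour interval: t = 0, 3, 6, 9 h.
        "boundary_reads": math.ceil(FORECAST_HOURS / BOUNDARY_INTERVAL_H),
    }


print(run_counts())
```

So each run advances roughly 74 million grid points through 7200 time steps.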
The same WRF tunables were used on every system, based on the selections that yielded optimal performance on the p775.
- nproc_x: logical MPI task partition in the x-direction.
- nproc_y: logical MPI task partition in the y-direction.
- numtiles: number of tiles that can be used by OpenMP.
- nproc_x x nproc_y = number of MPI tasks. This is critical when comparing communication characteristics across systems.
SMT was used only on the IBM POWER systems.
- SMT was employed on the POWER systems by using two OpenMP threads per MPI task.
- SMT2 was used for runs of up to 2048 cores on the p775 (best performance).
- SMT2 was used for runs of up to 512 cores on the p460 (best performance).
- Hyper-threading was not beneficial on the dx360M4 system (not used).
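How nproc_x, nproc_y and numtiles carve up the 2200 x 1200 grid can be sketched in Python. This assumes x is the 2200-point west-east direction, that patches are split as evenly as possible (sizes differ by at most one point, which may differ slightly from WRF's own partitioning), and that OpenMP tiles subdivide a patch along y only; the task grids are taken from the runtime-parameter table for 128 and 6144 cores:

```python
def split(n: int, parts: int) -> list[int]:
    """Split n points into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    return [base + 1 if i < extra else base for i in range(parts)]


def decompose(nx: int, ny: int, nproc_x: int, nproc_y: int, numtiles: int):
    """Largest MPI patch and OpenMP tile, in grid points, for an nproc_x x nproc_y grid."""
    patch = (max(split(nx, nproc_x)), max(split(ny, nproc_y)))
    # Assumption: tiles subdivide each patch along y only.
    tile = (patch[0], max(split(patch[1], numtiles)))
    # Perimeter/area of the patch: a rough proxy for halo exchange per grid point.
    halo_ratio = 2 * (patch[0] + patch[1]) / (patch[0] * patch[1])
    return patch, tile, halo_ratio


# Task grids for 128 and 6144 cores, taken from the runtime-parameter table.
for nproc_x, nproc_y in [(4, 31), (18, 85)]:
    patch, tile, ratio = decompose(2200, 1200, nproc_x, nproc_y, numtiles=4)
    print(f"{nproc_x}x{nproc_y}: {nproc_x * nproc_y} MPI tasks, "
          f"patch {patch}, tile {tile}, perimeter/area {ratio:.3f}")
```

With nproc_x much smaller than nproc_y the patches are thin strips (550 x 39 points at 128 cores): more halo per grid point than a square patch of the same area, but long contiguous x-rows for caching and the vector MASS routines, and numtiles shrinks the per-thread working set further.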
WRF I/O
I/O: reading and writing of WRF variables.
- 13 I/O write steps, each writing a 7.5 GB file.
- 1 read for the initial conditions (6.74 GB file).
- 4 reads for the boundary conditions (1.47 GB each).
Data ingest cannot be done asynchronously.
- Parallel netCDF (MPI-IO) was used for data ingest in all scaling runs: read option 11 (io_form = 11) for initial and boundary data.
I/O netCDF quilting was used to write the data files.
- The same I/O tasks and groups were assigned on all three systems for each scaling run.
- The last I/O step is done synchronously, since the WRF computation has finished and there is nothing left to overlap it with.
- Quilting I/O times in the WRF timers report only the I/O synchronization time.
- I/O is done by the quilting tasks on the I/O subsystem while the compute tasks compute.
Parallel netCDF quilting was not used.
- It could further improve the I/O write steps, especially the last one.
- Early WRF versions had problems with IBM Parallel Environment and parallel netCDF.
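The file sizes above give the total data volume per run and show why quilting matters. A minimal sketch, under the idealization (an assumption, not a measurement) that quilting hides every write step behind computation except the synchronous last one, while all reads stay on the critical path:

```python
# File sizes quoted on the WRF I/O slide, in GB.
WRITE_STEPS, WRITE_GB = 13, 7.5
INITIAL_READ_GB = 6.74
BOUNDARY_READS, BOUNDARY_GB = 4, 1.47


def io_volumes() -> dict[str, float]:
    """Per-run I/O volumes in GB."""
    written = WRITE_STEPS * WRITE_GB
    # Reads are synchronous: data ingest cannot be done asynchronously.
    read = INITIAL_READ_GB + BOUNDARY_READS * BOUNDARY_GB
    # Idealization: quilting tasks hide every write except the last one,
    # which runs after the computation has finished.
    exposed_write = WRITE_GB
    return {
        "written": written,
        "read": read,
        "hidden_write": written - exposed_write,
        "exposed_write": exposed_write,
    }


for key, gb in io_volumes().items():
    print(f"{key:>13}: {gb:6.2f} GB")
```

Under that idealization only about 7.5 of the 97.5 GB written, plus the 12.62 GB read, would add directly to the elapsed time, which is why the last write step is singled out as the target for parallel netCDF quilting.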
WRF uniform tunable variables
Variables unchanged on all systems for the same number of physical cores: nproc_x, nproc_y, nio_groups, nio_tasks_per_group, numtiles.
Performance on the POWER systems was found to be always better when:
- nproc_x 1, even for runs with a single OpenMP thread.
- numtiles > 1: acts as a cache-blocking mechanism (like NPROMA).
The p775 and p460 are favored over the dx360M4. For the testing scenarios, nproc_x 2 did not have an effect on performance.
Choice of nproc_x, nproc_y and numtiles:
- Based on best performance on the p775.
- numtiles = 4 was set as an advantage for the dx360M4.

WRF runtime parameters
- Task affinity (binding) was used in all test runs.
- SMT/HT: ON (green background, one extra OpenMP thread); OFF (yellow background).
- Variables: OpenMP threads, numtiles, MPI tasks, nproc_x, nproc_y, nio_groups, nio_tasks_per_group.
- Physical/logical cores = OpenMP_threads x (nproc_x x nproc_y + nio_groups x nio_tasks_per_group).
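The core-count formula can be checked against every row of the runtime-parameter table on this slide. The Python sketch below transcribes those rows and computes logical cores divided by physical cores for each system: a ratio of 2 means SMT2 was in use, 1 means one thread per physical core. The function names are illustrative:

```python
# Rows of the runtime-parameter table:
# (physical cores, threads on p775, p460, dx360M4,
#  nproc_x, nproc_y, nio_groups, nio_tasks_per_group)
ROWS = [
    (128, 2, 2, 1, 4, 31, 1, 4),
    (256, 2, 2, 1, 6, 42, 1, 4),
    (512, 2, 2, 1, 8, 63, 1, 8),
    (1024, 4, 2, 2, 11, 46, 1, 6),
    (1536, 4, 2, 2, 19, 40, 1, 8),
    (2048, 4, 2, 2, 20, 51, 1, 4),
    (3072, 4, 4, 4, 19, 40, 1, 8),
    (4096, 4, 4, 4, 17, 60, 1, 4),
    (6144, 4, 4, 4, 18, 85, 1, 6),
]


def logical_cores(threads, nproc_x, nproc_y, nio_groups, nio_tasks_per_group):
    """OpenMP_threads x (compute tasks + quilting I/O tasks)."""
    return threads * (nproc_x * nproc_y + nio_groups * nio_tasks_per_group)


def threads_per_core(physical, threads, *grid):
    """1 = one thread per physical core, 2 = SMT2 (two hardware threads per core)."""
    ratio, rest = divmod(logical_cores(threads, *grid), physical)
    if rest:
        raise ValueError(f"{physical} cores: logical count is not a multiple")
    return ratio


for physical, t775, t460, tdx, *grid in ROWS:
    print(physical, [threads_per_core(physical, t, *grid) for t in (t775, t460, tdx)])
```

Every row divides exactly, and the ratios confirm SMT2 on the p775 for runs of up to 2048 cores, on the p460 for runs of up to 512 cores, and never on the dx360M4.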
p775 nodes | OpenMP threads | p460 nodes | OpenMP threads | dx360M4 nodes | OpenMP threads | numtiles | Number of cores | MPI tasks | nproc_x x nproc_y | nio_groups x nio_tasks_per_group
4 | 2 | 4 | 2 | 8 | 1 | 4 | 128 | 124 | 4 x 31 | 1 x 4
8 | 2 | 8 | 2 | 16 | 1 | 4 | 256 | 252 | 6 x 42 | 1 x 4
16 | 2 | 16 | 2 | 32 | 1 | 4 | 512 | 504 | 8 x 63 | 1 x 8
32 | 4 | 32 | 2 | 64 | 2 | 4 | 1024 | 506 | 11 x 46 | 1 x 6
48 | 4 | 48 | 2 | 96 | 2 | 4 | 1536 | 760 | 19 x 40 | 1 x 8
64 | 4 | 64 | 2 | 128 | 2 | 4 | 2048 | 1020 | 20 x 51 | 1 x 4
96 | 4 | 96 | 4 | 192 | 4 | 4 | 3072 | 760 | 19 x 40 | 1 x 8
128 | 4 | 128 | 4 | 256 | 4 | 4 | 4096 | 1020 | 17 x 60 | 1 x 4
192 | 4 | 192 | 4 | 384 | 4 | 4 | 6144 | 1530 | 18 x 85 | 1 x 6

WRF Run statistics
Runs on the p775 were done with the MPI HPM library:
- Collect hardware performance monitor data (small overhead).
- Scale the HPM data to the p460 and dx360M4 systems by frequency ratios.
- Estimate sustained GFLOP rates on all systems.
- System peak rate = number of cores x 8 x core frequency.
- Collect MPI communication statistics.
Runs on the p460 and dx360M4 were done with the MPI TRACE library:
- Collect MPI communication statistics.
MPI communication times from the trace libraries help estimate:
- Communication: the minimum communication time among all MPI tasks involved.
- Load imbalance: median communication time minus minimum communication time.
Accumulating the internal WRF timers helps estimate:
- Read I/O time: initial file read time + boundary read times.
- Write I/O time: quilting write time from synchronization + last I/O write step.
- Last I/O write step: approximately total elapsed time minus the total time from the internal timers.
- Total computation: pure computation + communication + load imbalance.
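The estimates listed above reduce to a few arithmetic rules. A minimal Python sketch of them follows; the function names are hypothetical, and the inputs in the usage lines are invented for illustration, not measurements from these runs:

```python
from statistics import median


def mpi_components(comm_per_task: list[float]) -> tuple[float, float]:
    """Communication and load imbalance (seconds) from per-task MPI times.

    Communication = minimum MPI time over all tasks;
    load imbalance = median MPI time - minimum MPI time.
    """
    comm = min(comm_per_task)
    return comm, median(comm_per_task) - comm


def io_components(elapsed, initial_read, boundary_reads, quilt_sync, timers_total):
    """Read and write I/O (seconds) from accumulated WRF timers.

    The last write step is not covered by the internal timers, so it is
    approximated as total elapsed time minus the total of the timers.
    """
    read_io = initial_read + sum(boundary_reads)
    last_write = elapsed - timers_total
    write_io = quilt_sync + last_write
    return read_io, write_io, last_write


def total_computation(pure, comm, imbalance):
    """Total computation = pure computation + communication + load imbalance."""
    return pure + comm + imbalance


# Hypothetical inputs for illustration only (not measurements from these runs).
comm, imbalance = mpi_components([10.0, 12.0, 11.0, 15.0])
read_io, write_io, last_write = io_components(1000.0, 20.0, [5.0] * 4, 8.0, 970.0)
print(comm, imbalance, read_io, write_io, last_write,
      total_computation(900.0, comm, imbalance))
```

Taking the minimum as "communication" treats the fastest task's MPI time as the unavoidable cost; anything the median task spends beyond that is attributed to waiting on slower neighbors.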
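The peak-rate rule, together with the sustained-GFLOPS formulas on the GFLOP-rates slide, can be written as a short Python sketch. PM_VSU_1FLOP to PM_VSU_8FLOP are the POWER7 counter names used in those formulas; the counter totals and run times in the usage lines are invented for illustration. Following those formulas, the other systems are assumed to execute the same FLOP count, so their rate scales with the inverse of their run time:

```python
FREQ_GHZ = {"p775": 3.86, "p460": 3.55, "dx360M4": 2.6}
FLOPS_PER_CYCLE = 8  # per core, as in the slide's peak-rate formula


def peak_gflops(system: str, cores: int) -> float:
    """Peak GFLOPS = number of cores x 8 x core frequency (GHz)."""
    return cores * FLOPS_PER_CYCLE * FREQ_GHZ[system]


def p775_sustained_gflops(run_time_s, vsu_1flop, vsu_2flop, vsu_4flop, vsu_8flop):
    """Weighted PM_VSU_{1,2,4,8}FLOP counter totals per second, in GFLOPS."""
    flops = vsu_1flop + 2 * vsu_2flop + 4 * vsu_4flop + 8 * vsu_8flop
    return flops / run_time_s / 1e9


def scaled_sustained_gflops(p775_gflops, p775_run_time_s, other_run_time_s):
    """Same FLOP count assumed on the other system, so the rate scales with 1/run time."""
    return p775_gflops * (p775_run_time_s / other_run_time_s)


def percent_of_peak(sustained, peak):
    return 100.0 * sustained / peak


# Hypothetical counter totals and run times, for illustration only.
p775 = p775_sustained_gflops(1000.0, 4e14, 2e14, 1e14, 0.0)
dx360 = scaled_sustained_gflops(p775, 1000.0, 1250.0)
print(p775, dx360, percent_of_peak(dx360, peak_gflops("dx360M4", 2048)))
```

Because the counters are only read on the p775, the sustained rates of the p460 and dx360M4 are inferred rather than measured; they are exact only if the FLOP count of the identical WRF source is the same under both compilers.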
WRF Scaling Results
[Figure: p775 total elapsed time (seconds) versus number of p775 cores (128, 256, 512, 1024, 1536, 2048, 3072, 4096, 6144), stacked into computation, communication, load imbalance, initialization + termination, read I/O and write I/O.]

WRF Scaling Results
[Figure: p460 total elapsed time (seconds) versus number of p460 cores (128 to 6144), stacked into computation, communication, load imbalance, initialization + termination, read I/O and write I/O.]

WRF Scaling Results
[Figure: dx360M4 total elapsed time (seconds) versus number of dx360M4 cores (128 to 6144), stacked into computation, communication, load imbalance, initialization + termination, read I/O and write I/O.]

WRF Run statistics
[Figure: four panels comparing the p775, dx360M4 and p460 versus number of cores (128 to 6144): total elapsed time (sec), total communication time (sec), total pure computation time (sec) and total load imbalance time (sec).]

WRF Run statistics
[Figure: three panels comparing the p775, dx360M4 and p460 versus number of cores (128 to 6144): average compute time per time step (seconds), average read time per read step (seconds) and average write time per write step (seconds).]

WRF GFLOP Rates
[Figure: two panels comparing the p775, dx360M4 and p460 versus number of cores (128 to 6144): sustained performance as a percentage (%) of peak, and sustained GFLOPS.]

$$\mathrm{Peak\ GFLOPS} = N_{\mathrm{cores}} \times 8 \times f_{\mathrm{core}}$$
$$\mathrm{GFLOPS}_{\mathrm{p775}} = \frac{10^{-9}}{T_{\mathrm{p775}}}\left(\mathrm{PM\_VSU\_1FLOP} + 2\,\mathrm{PM\_VSU\_2FLOP} + 4\,\mathrm{PM\_VSU\_4FLOP} + 8\,\mathrm{PM\_VSU\_8FLOP}\right)$$
$$\mathrm{GFLOPS}_{\mathrm{p460}} = \mathrm{GFLOPS}_{\mathrm{p775}} \times \frac{T_{\mathrm{p775}}}{T_{\mathrm{p460}}}$$
$$\mathrm{GFLOPS}_{\mathrm{dx360M4}} = \mathrm{GFLOPS}_{\mathrm{p775}} \times \frac{T_{\mathrm{p775}}}{T_{\mathrm{dx360M4}}}$$
where T is each system's run time in seconds and f_core is the core frequency in GHz.

Conclusions
- WRF scales and performs well on all tested systems.
- Quilting I/O with netCDF works very well on all systems.
- Parallel netCDF for data ingest improves data read times.
- WRF is a popular single-precision code.
It runs very well on the dx360M4 system:
  - -xAVX works very well.
  - The Intel compilers do a great job of producing optimal, fast binaries.
  - numtiles, a cache-blocking parameter, gives additional performance.
  - Hyper-threading gives no benefit to overall performance.
  - Near-neighbor communication is handled effectively by FDR IB.
It runs OK on the p775 and p460 systems:
  - VSX does not work well: the code crashes if compiled with -qsimd.
  - The IBM XL compilers do OK with -O3 -qhot (vector MASS library).
  - Thin rectangular decompositions work OK (caching and vector MASS).
  - SMT works well on the p775, due to the available memory-to-core bandwidth.
  - Near-neighbor communication is an overkill for the p775, but OK for the p460.
Performance odds were stacked against the dx360M4: runs with 6144 cores and different nproc_x, nproc_y yield even better performance.
