Document Number: 340608-001US
Utilizing Linux Swap with Intel Optane DC SSDs as a Memory Overcommit Technique
Solutions Blueprint
June 2019
Version 1

Team Contacts:
Andrzej Jakowski, andrzej.jakowski@intel.com, Kernel Development
Tim C. Chen, tim.c.chen@intel.com, Kernel Development
Ying Huang, ying.huang@intel.com, Kernel Development
Frank Ober, frank.ober@intel.com, Testing and Outreach
David J. Leone, david.j.leone@intel.com, Testing and Outreach
Andrew Ruffin, andrew.ruffin@intel.com, Market Analysis and Outreach
Pragathi Narendra, pragathi.narendra@intel.com, Performance Test and Test Development
Mariusz Barczak, mariusz.barczak@intel.com, Kernel Development
Gert Pauwels, gert.pauwels@intel.com, Field Technical Support EMEA Region
Steven Briscoe, steven.briscoe@intel.com, Field Technical Support EMEA Region
Farib Khondoker, farib.khondoker@intel.com, Testing and Support

Revision History
Revision Number   Description        Revision Date
001               Initial release.   June 2019

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Intel, the Intel logo, Optane, and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Intel Corporation

Contents
Introduction
    Scope
Memory Overcommit Use Cases
Example Server Cost Model
The Kernel Build Process
    Development Tools Required for menuconfig (Possible Pre-requisites)
Appendix A Automation Scripts and How-to Guide
Appendix B Memory Management Fundamentals
    B.1 Memory Management System Overview
Appendix C Linux Kernel Innovations to Leverage Fast SSDs as Memory Extension
    C.1 Swap Improvements Completed in v4.14 of Linux Kernel
Appendix D Swap Improvements Patch Lists
    D.1 References
Introduction
This solutions blueprint explains how to use Intel Optane DC SSDs in memory extension configurations, or as a memory replacement. We describe recent performance improvements that were first introduced in version 4.11 and completed in version 4.14 of the Linux* kernel. For simplicity, we refer to version 4.14 or newer as the kernel version needed to evaluate high-performance swap usage.
Very high endurance, low latency devices like Intel Optane DC SSDs can be used efficiently as swap devices, enabling the system to exceed its minimum required system-level performance in various memory overcommit use cases. Intel Optane SSDs used as swap devices are expected to have a long lifespan of five or more years in this usage.
If you intend to implement and test the use cases outlined in this document right away, please jump to the Appendix sections, and visit the following GitHub link for tools, instructions, and test code:
http://github.com/fxober/LinuxSwap

Scope
We focus on how the Linux operating system (OS) can use Intel Optane DC SSDs as swap devices, allowing storage device capacity to be used in conjunction with DRAM so that memory pages are stored on both DRAM and non-volatile media. The process of moving memory pages between the storage device and main memory is called paging. Paging allows system administrators to manage system resources (memory, CPU, storage) efficiently at the desired cost and service levels.
With recent advancements in storage media and Linux kernel improvements, Intel Optane DC SSDs provide a new opportunity to offset DRAM costs and allow more flexible process memory oversubscription, at higher performance levels than before. This solutions blueprint explores those usages.
Target Audience
This document is intended for system administrators, system operators, DevOps teams, and application developers who want to configure their underlying software and hardware resources to maximize system performance at a better cost. It assumes familiarity with basic computer architecture terminology and with the techniques operating systems use to manage physical resources such as CPU, memory, and storage. It also explains fundamental concepts of the memory management techniques used in modern operating systems, focusing on the Linux environment.
The improved implementation of Linux Swap* and newer, higher endurance media, such as Intel Optane memory, are essentially what make such a solution effective in a modern data center environment.
Document Organization
First, this document introduces use cases in which the Intel Optane DC SSD is used as memory augmentation. Next, a server cost model is presented, which can be adopted or adjusted to calculate potential cost savings when using an Intel Optane DC SSD as a DRAM replacement. We then describe the OS upgrades necessary to maximize system performance when using an Intel Optane DC SSD as a swap device. Specifically, we provide guidance on the minimum required versions of common Linux distributions that include the swap and memory management subsystem improvements, along with details on building the Linux kernel manually to maximize swap performance.
The Additional Considerations for OS Configuration section explores system configuration details for maximizing swap performance. Then we compare the performance of the different swap devices.
Finally, the Appendix sections explain the details of the memory management subsystem and of the Linux kernel innovations that improve swap performance, and provide a kernel patch list for advanced users willing to backport the changes into their own kernel fork.
Glossary
Physical memory: Fast, byte-addressable memory (as opposed to disk storage, which is sector- or block-addressable). This fast, dynamic system memory is typically provided by DRAM technology.
Swap device: Dedicated space on a storage device for storing memory pages of process data or process code. It can be a whole block storage device, a partition of one, or a file in a file system (swap file).
Virtual memory: A memory management technique implemented in modern OSs. It gives each running process the illusion that it operates on a contiguous block of memory, while in reality the hardware and the OS manage the translations from virtual addresses to physical addresses, and the transfers of memory pages from the storage device to physical memory. OS virtual memory hides those complexities from the application programmer.
Total Cost of Ownership (TCO): A defined, but often not standardized, approach to analyzing the financial impact of a purchase, and possibly the ongoing expenses of hardware and software infrastructure over its life cycle. TCO models typically include various factors affecting cost, e.g., the cost to purchase hardware (capital spending) and operational costs related to the electricity used to power and cool a building and data center equipment. This paper focuses on a simplified server cost model. You can consider it a Bill of Materials optimization, since the target is not a full analysis of all server operation or acquisition costs.
Memory Overcommit Use Cases
This chapter introduces example use cases in which an Intel Optane DC SSD can be used as a memory extension, or as a memory replacement, by using the Linux swap mechanism. It also provides an example server cost model, developed to illustrate potential cost savings when considering the purchase of new hardware infrastructure. Use this server cost model as a framework to calculate potential cost savings at the server capital expenditure level.

Memory Overcommit for Virtualized Environments
One common technique widely used among cloud service providers (CSPs) is to overcommit physical resources, including physical CPU, storage, and memory. The following figure illustrates virtual machine differentiation based on the overcommit ratio, that is, on how much of the guest physical memory is actually backed by physical DRAM. For example, the guest physical memory of "Gold" VMs is fully backed by DRAM, while for "Silver" VMs half of the guest physical memory is backed by DRAM and the remaining portion is backed by the swap device. Finally, for "Bronze" VMs, a quarter of the guest physical memory is backed by DRAM, and the remaining portion can be paged out to the swap device.
With a Linux-based hypervisor (KVM), this type of differentiation can be achieved using a mechanism called control groups (cgroups), which controls resource usage (e.g., system memory) for a group of processes, in this case a class of VMs.
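To make the Gold/Silver/Bronze ratios concrete, here is a minimal sketch that turns a VM's guest memory size into a DRAM cap per class. It assumes cgroup v2, where the memory.max file limits a cgroup's memory, and hypothetical per-class cgroups named gold, silver, and bronze under /sys/fs/cgroup; neither the names nor the layout come from this blueprint. The script only prints the commands instead of writing to cgroupfs.

```shell
#!/bin/sh
# Sketch: per-class DRAM caps for the Gold/Silver/Bronze example.
# Assumes cgroup v2 (memory.max); the cgroup paths are hypothetical.

# dram_fraction CLASS -> prints "NUM DEN", the DRAM-backed share of guest memory
dram_fraction() {
    case "$1" in
        gold)   echo "1 1" ;;   # fully backed by DRAM
        silver) echo "1 2" ;;   # half DRAM, the rest may be swapped out
        bronze) echo "1 4" ;;   # a quarter DRAM, the rest may be swapped out
        *)      echo "unknown class: $1" >&2; return 1 ;;
    esac
}

# dram_limit_bytes CLASS GUEST_GIB -> prints the byte value for memory.max
dram_limit_bytes() {
    frac=$(dram_fraction "$1") || return 1
    num=${frac% *}
    den=${frac#* }
    echo $(( $2 * 1024 * 1024 * 1024 * num / den ))
}

# print_limit_cmd CLASS GUEST_GIB -> prints (does not run) the cgroupfs write
print_limit_cmd() {
    bytes=$(dram_limit_bytes "$1" "$2") || return 1
    echo "echo $bytes > /sys/fs/cgroup/$1/memory.max"
}

for class in gold silver bronze; do
    print_limit_cmd "$class" 16
done
```

With such a cap in place, guest memory above the limit has to be reclaimed, which for anonymous guest memory means paging it out to the swap device.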
Figure 1: Example of Virtual Machine Differentiation Based on Memory Overcommit Ratio

Example Server Cost Model
This chapter derives an example server cost model, from a system memory hardware cost perspective, for two example server configurations: server "A" and server "B". The server cost model does not take into account the varied and unique operational expenses or other capital expenditures related to the larger scope of running a data center. For simplicity of comparison, differences in space, power, operating costs, and other variable factors are ignored.
Server "A" and server "B" configurations are almost identical with regard to CPU, networking, and storage (both boot disks and data volumes). There are only two differences between them. First, server "A" has 384 GiB of physical DRAM (24x 16 GB RDIMMs), while server "B" is populated with only 192 GiB (12x 16 GB RDIMMs) of physical DRAM. Second, server "A" does not use an Intel Optane DC SSD as a swap device, whereas server "B" uses Intel Optane DC SSDs (2x 100 GiB devices) as swap devices.
One of the data points most interesting to a system administrator is the relative cost of server "B" to server "A", which illustrates the potential hardware component cost savings on the purchase or lease of new servers for the data center. The following server cost calculations therefore focus on the relative cost of the server "B" configuration compared to server "A". For simplicity, this costing model takes into account only the memory components (DRAM and Intel Optane DC SSD capacities), because all other components of the two server configurations are identical.
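The relative cost of the server "B" configuration to the server "A" configuration can be defined as follows:

$$\text{RelativeCost}_{B/A} = \frac{\text{Cost}_B}{\text{Cost}_A} = \frac{\text{Capacity}_{DRAM,B} \cdot \text{cost}_{DRAM} + \text{Capacity}_{Optane,B} \cdot \text{cost}_{Optane}}{\text{Capacity}_{DRAM,A} \cdot \text{cost}_{DRAM}}$$

Dividing the numerator and denominator of the above equation by cost_Optane leads to the following formula:

$$\text{RelativeCost}_{B/A} = \frac{\text{Capacity}_{DRAM,B} \cdot \frac{\text{cost}_{DRAM}}{\text{cost}_{Optane}} + \text{Capacity}_{Optane,B}}{\text{Capacity}_{DRAM,A} \cdot \frac{\text{cost}_{DRAM}}{\text{cost}_{Optane}}}$$

Substituting cost_DRAM / cost_Optane (both as prices per GiB) with the normalized per-GiB DRAM-to-Optane price ratio (DRAM_to_Optane) leads to this final formula:

$$\text{RelativeCost}_{B/A} = \frac{\text{Capacity}_{DRAM,B} \cdot \text{DRAM\_to\_Optane} + \text{Capacity}_{Optane,B}}{\text{Capacity}_{DRAM,A} \cdot \text{DRAM\_to\_Optane}}$$

Note: Please do your own price calculations using the formula above to calculate your server cost savings.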
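The final formula is easy to evaluate for your own prices. The sketch below implements it with awk; capacities are in GiB, and the DRAM_to_Optane value of 4 in the example call is purely hypothetical, not a price quoted by this blueprint.

```shell
#!/bin/sh
# Sketch: relative memory cost of server "B" vs server "A".
# relative_cost DRAM_A_GIB DRAM_B_GIB OPTANE_B_GIB DRAM_TO_OPTANE
#   DRAM_TO_OPTANE = (price per GiB of DRAM) / (price per GiB of Optane SSD)
relative_cost() {
    awk -v a="$1" -v b="$2" -v o="$3" -v r="$4" 'BEGIN {
        if (a <= 0 || r <= 0) { print "invalid input" > "/dev/stderr"; exit 1 }
        printf "%.3f\n", (b * r + o) / (a * r)
    }'
}

# Capacities from the text; the price ratio of 4 is a hypothetical example.
relative_cost 384 192 200 4
```

A result below 1.0 means the server "B" memory configuration costs less than server "A". With the hypothetical ratio of 4, the script prints 0.630, i.e., about 37% lower memory cost.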
The Kernel Build Process

Recommended Software Upgrades
To maximize Intel Optane DC SSD performance in a memory extension configuration (as a swap device), Intel recommends upgrading your Linux distribution to a recent version that contains the backported series of patches added to the upstream Linux kernel in version 4.11 and later.
The following table lists the common Linux distribution versions that adopted the swap performance improvements.
Table 1: Linux Distributions Containing Swap Performance Improvements
Linux Distribution   OS Version
RHEL/CentOS          Starting with version 7.5 and forward; starting with version 8.0 and forward
Ubuntu               Starting with version 18.10 and forward
SLES                 Starting with SLES 15, SLES 12 SP4 and forward
Oracle* Linux        Starting with Oracle Linux 7.5 and later with UEK R5 and RHCK

How to Build Your Kernel Based on the Upstream Linux Kernel
This section provides instructions on building a Linux kernel image based on the upstream Linux kernel project.
This may be especially useful for those interested in further exploring the Linux kernel improvements relating to swap device performance, and who are willing to upgrade their infrastructure's Linux kernel.
Please note that these instructions are based on an Ubuntu* Server 18.04.2 system build; the exact steps may differ between Linux distributions, e.g., in the use of the distribution package manager.
Approximate time needed: 1 hour

Development Tools Required for menuconfig (Possible Pre-requisites)
To clone, compile, and build a new kernel or driver, the following packages must be installed. You must be logged in as root to install these packages.

## Dependencies needed to run kernel menuconfig
# apt-get install flex bison
# apt-get install libncurses5-dev libncursesw5-dev
## Dependencies needed to perform kernel build
# apt-get install libssl-dev libelf-dev
# dpkg -i linux-*.deb

Build New Linux Kernel with RCU Setting for Swap
Download Linux kernel 4.14, 5.x, or newer from this repository into your Linux distribution:
https://www.kernel.org/pub/linux/kernel/
It is best to choose the latest stable kernel.
From a working directory:

## Use wget to download the kernel and unpack it (here the example is 4.18.20)
# wget https://mirrors.edge.kernel.org/pub/linux/kernel/v4.x/linux-4.18.20.tar.xz
# tar -xvf linux-4.18.20.tar.xz
# cd linux-4.18.20
## Alternatively, clone the whole Linux kernel git repository and check out a specific tag
# git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
# cd linux
# git checkout -b v4.18.20_local v4.18.20

Build and install
To create the kernel configuration file (.config) based on the running kernel, using the default setting for all new options, run the following command:

# yes "" | make oldconfig

To obtain maximum performance, offload read-copy-update (RCU) callback processing, as this processing may otherwise introduce delays. To do so, set "CONFIG_RCU_NOCB_CPU=y" in your local kernel .config file.
See Offloading RCU Processing to Dedicated Kernel Threads for details on editing RCU settings. Alternatively, you can make the change by running menuconfig and selecting the option in the user interface, as shown in the image below.

# make menuconfig

Under "General Setup and Features > RCU Subsystem", set the "Offload RCU callback…" flag as shown in the image below, then save and exit menuconfig.
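If you prefer a non-interactive alternative to menuconfig, the option can be set directly in .config. The kernel tree ships a helper for this (scripts/config --enable RCU_NOCB_CPU); the sed-based function below is a minimal, self-contained sketch of the same edit. Because RCU_NOCB_CPU has Kconfig dependencies that vary between kernel versions, run make olddefconfig afterwards and check that the option is still set.

```shell
#!/bin/sh
# Sketch: enable a boolean Kconfig option in a .config file without menuconfig.
# enable_kconfig FILE OPTION   (OPTION without the CONFIG_ prefix)
enable_kconfig() {
    file=$1
    opt=CONFIG_$2
    if grep -q "^$opt=" "$file"; then
        # Option present with some value: force it to y.
        sed -i "s/^$opt=.*/$opt=y/" "$file"
    elif grep -q "^# $opt is not set" "$file"; then
        # Option explicitly disabled: turn the comment into an assignment.
        sed -i "s/^# $opt is not set/$opt=y/" "$file"
    else
        # Option absent: append it; make olddefconfig validates dependencies.
        echo "$opt=y" >> "$file"
    fi
}

# Usage from the top of the kernel source tree:
#   enable_kconfig .config RCU_NOCB_CPU && make olddefconfig
#   grep RCU_NOCB_CPU .config
```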
Build the kernel and kernel modules, and install the new kernel on the system.

## Build the kernel image and loadable kernel modules
# make
# make modules_install
## Install the newly built kernel into the operating system
# make install

After a successful install, reboot the system to load the new kernel image and kernel modules. Usually the new kernel becomes the default boot selection. After booting the OS, use "uname -a" to verify that the running kernel version matches the newly installed kernel version. If a different kernel version is loaded, you can change this by reconfiguring the system loader, usually GRUB 2. Refer to the system loader documentation for your specific distribution.
Additional Considerations for OS Configuration
This section explores OS configuration considerations for maximizing the performance of the swap device(s).

Offloading RCU Processing to Dedicated Kernel Threads
To offload RCU processing to dedicated kernel threads, edit the kernel command line options in the system loader. When using GRUB 2 as the system loader, open the /etc/default/grub file and add "rcu_nocbs=" to the GRUB_CMDLINE_LINUX_DEFAULT entry. See the /etc/default/grub listing below for an example:

...
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="rcu_nocbs=0-n maybe-ubiquity"
GRUB_CMDLINE_LINUX=""
...

Note: n is the highest CPU number in your system, that is, the number of CPUs (or hardware threads) minus one, because CPUs are numbered from 0.
After saving the edits, run either "update-grub" or "grub2-mkconfig -o /boot/grub2/grub.cfg" (the output path can differ, e.g., on UEFI systems) to update the GRUB 2 settings in the boot partition. Reboot the system and verify that the new settings have been applied to the kernel:
# dmesg | grep -i offload
[    0.000000] Offload RCU callbacks from CPUs: 0-63.

The reason for this step is to avoid RCU processing in the I/O completion path, as RCU processing will likely increase paging latency.
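Because CPU IDs start at 0, the last CPU in the rcu_nocbs= range is the CPU count minus one, which is why a 64-CPU system reports 0-63 above. The small helper below, a sketch rather than part of the original procedure, builds the parameter value; called without an argument it uses nproc --all to count every installed CPU. It only prints the value; editing /etc/default/grub is left to you.

```shell
#!/bin/sh
# Sketch: build the rcu_nocbs= boot parameter covering every CPU.
# CPU IDs are numbered from 0, so the range ends at (CPU count - 1).
# rcu_nocbs_param [CPU_COUNT]   (defaults to `nproc --all`)
rcu_nocbs_param() {
    count=${1:-$(nproc --all)}
    case "$count" in
        ''|*[!0-9]*) echo "invalid CPU count: '$count'" >&2; return 1 ;;
    esac
    if [ "$count" -lt 1 ]; then
        echo "invalid CPU count: $count" >&2
        return 1
    elif [ "$count" -eq 1 ]; then
        echo "rcu_nocbs=0"
    else
        echo "rcu_nocbs=0-$((count - 1))"
    fi
}

rcu_nocbs_param 64    # a 64-CPU system: prints rcu_nocbs=0-63
```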
Turning Off Transparent Huge Pages
To minimize the overhead of coalescing memory pages into huge pages and later breaking them up on the swap device, run the following commands:

# echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
# echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag

Watermark Scale Factor
It is important to increase the watermark scale factor (/proc/sys/vm/watermark_scale_factor), as it determines the free memory levels at which kswapd is woken up to reclaim memory. The value is expressed in units of 1/10,000 of memory, and the default is 10 (0.1%). We recommend setting it to 400, that is, 4% of memory; doing so makes kswapd start reclaiming, and therefore swapping, when free memory falls to roughly 4% of system memory above the minimum watermark, instead of waiting until memory runs much lower.

# echo '400' > /proc/sys/vm/watermark_scale_factor

NUMA Considerations
When dealing with multiple swap devices on a multi-socket system, we recommend distributing the swap devices evenly among the CPU sockets to avoid QPI/UPI transfers.
Moreover, to avoid software overhead, we recommend creating many swap devices on a partitioned NVMe device. Each swap partition must have the same priority. In most cases there can be up to 28 swap partitions, depending on the kernel configuration.
When setting up your system, we recommend adhering to the NUMA locality rules for maximum performance.
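The partition layout above can be scripted. The following sketch assumes the NVMe device has already been split into partitions; the device name nvme0n1, the partition count, and the priority value 10 are placeholders, not values from this blueprint. It prints mkswap/swapon commands and matching /etc/fstab lines rather than executing them. Giving every swap area the same priority makes the kernel allocate pages from them round-robin.

```shell
#!/bin/sh
# Sketch: print commands that enable COUNT swap partitions with one shared priority.
# swap_cmds DEVICE COUNT PRIORITY   e.g. swap_cmds nvme0n1 4 10
swap_cmds() {
    dev=$1 count=$2 prio=$3
    i=1
    while [ "$i" -le "$count" ]; do
        part=/dev/${dev}p$i       # NVMe partitions are named <disk>p<N>
        echo "mkswap $part"
        echo "swapon --priority $prio $part"
        i=$((i + 1))
    done
}

# fstab_lines DEVICE COUNT PRIORITY -> equivalent persistent /etc/fstab entries
fstab_lines() {
    dev=$1 count=$2 prio=$3
    i=1
    while [ "$i" -le "$count" ]; do
        printf '/dev/%sp%d none swap defaults,pri=%d 0 0\n' "$dev" "$i" "$prio"
        i=$((i + 1))
    done
}

swap_cmds nvme0n1 2 10
```

On a multi-socket system, run it once per NVMe device attached to each socket so that the swap partitions stay evenly distributed.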
Performance Data of 4.18.20 Linux Swap
We used the pmbench utility to test the allocation and access of 4 KiB memory pages on a Linux system. Our test system used the Ubuntu 18.04.2 distribution of Linux, which we upgraded to version 4.18.20 of the kernel, since the Ubuntu release ships with a 4.15.x kernel. We upgraded using the methods noted in Appendix A, Automation Scripts and How-to Guide.
There should be no issue running kernel 4.14 or newer, as the kernel patches to Linux swap were upstreamed (public on kernel.org) in 4.14. You cannot gain this level of performance on kernels prior to 4.14.
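To confirm that a machine already runs a kernel with these improvements, compare uname -r with 4.14. The sketch below relies on GNU sort -V for version ordering and strips distribution suffixes such as -46-generic before comparing.

```shell
#!/bin/sh
# Sketch: check whether a kernel release string is at least a given version.
# kernel_at_least RELEASE MINIMUM -> exit status 0 if RELEASE >= MINIMUM
kernel_at_least() {
    # Keep the leading numeric part, e.g. "4.15.0-46-generic" -> "4.15.0".
    have=$(printf '%s\n' "$1" | sed 's/^\([0-9][0-9.]*\).*/\1/')
    # sort -V orders version strings; RELEASE is new enough if MINIMUM sorts first.
    lowest=$(printf '%s\n%s\n' "$2" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

if kernel_at_least "$(uname -r)" 4.14; then
    echo "kernel $(uname -r) is 4.14 or newer"
else
    echo "kernel $(uname -r) is older than 4.14"
fi
```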
We tested the in-box kernel of Ubuntu 18.04.2 (kernel 4.15.0-46-generic) and saw minimal difference.

Here is an example variable setting from /etc/default/grub; it is specific to the CPU count:
GRUB_CMDLINE_LINUX_DEFAULT="rcu_nocbs=0-[n] maybe-ubiquity"
where [n] is the highest CPU number, that is, the total number of CPU cores or virtual CPU threads in your system minus one. Configure the kernel with these .config settings if you are able to compile your own kernel.
4. EXPERIMENTAL: Generally speaking, it is best to set the I/O scheduler to [none] on the NVMe SSDs you are testing, rather than a blk-mq scheduler such as kyber. In most cases your build shows [none], which is fine.
# more /sys/block/nvme1n1/queue/scheduler
[none]
5. Newer kernels allow an NVMe queue size of 1,023, which is sufficient and recommended.
6. If you are seeing NVMe block merges, change your NVMe sector size to 4 KiB instead of 512 bytes. If block merges still occur after making this change, try the following. First, check the nomerges value:
# cat /sys/block/<device>/queue/nomerges
The nomerges value should be set to 2, which disables all merge attempts. Verify and change it if necessary:
# echo 2 > /sys/block/<device>/queue/nomerges

Appendix B Memory Management Fundamentals
This chapter introduces the basic memory management concepts used in the Linux kernel.
It explains the system-level bottlenecks observed when Intel Optane DC SSDs are used as swap devices with versions of the upstream Linux kernel prior to v4.14. Finally, it explains the techniques used in version 4.14 to overcome those bottlenecks, so users can experience improved performance when using Intel Optane DC SSDs as swap devices.
B.1 Memory Management System Overview
Modern operating systems implement a virtual memory model, which provides many advantages to application developers. The virtual memory model simplifies software development by leaving the complexity of physical memory allocation and data placement to the underlying operating system. The operating system kernel deals with that complexity by giving every running process the impression that it has a big chunk of memory (for example, 4 GiB on a 32-bit system) available for its exclusive use. In reality, the OS kernel maps process virtual memory to physical DRAM, and potentially overflows to a swap device, which extends the available physical memory.
The process of transferring data between the swap device and physical memory is called paging. It consists of page-ins, when data is read from the swap device into physical memory, and page-outs, when data is moved out of memory. Note that page-outs may require data to be written out to the swap device, depending on the state of the page.
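Page-ins and page-outs to swap are visible in the pswpin and pswpout counters of /proc/vmstat, which count pages swapped in and out since boot. The following sketch samples them twice and prints per-second rates; the file is a parameter so you can also point it at a saved copy.

```shell
#!/bin/sh
# Sketch: swap-in/swap-out rates from the /proc/vmstat counters.
# swap_counters FILE -> prints "PSWPIN PSWPOUT" (cumulative pages since boot)
swap_counters() {
    awk '$1 == "pswpin"  { i = $2 }
         $1 == "pswpout" { o = $2 }
         END { if (i == "") i = 0; if (o == "") o = 0; print i, o }' "$1"
}

# swap_rates SECONDS [FILE] -> average pages/s swapped in and out over SECONDS
swap_rates() {
    interval=$1
    file=${2:-/proc/vmstat}
    set -- $(swap_counters "$file")
    in1=$1 out1=$2
    sleep "$interval"
    set -- $(swap_counters "$file")
    echo "pswpin/s=$(( ($1 - in1) / interval )) pswpout/s=$(( ($2 - out1) / interval ))"
}

# Example: swap_rates 5    (samples /proc/vmstat five seconds apart)
```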
Figure 2 below provides a conceptual diagram of virtual memory and paging.
Figure 2: Virtual Memory Concept through Paging
The paging process is managed by the OS and is heavily supported by CPU hardware through the memory management unit (MMU). For example, the MMU contains the translation lookaside buffer (TLB), a cache of recent virtual-to-physical memory translations. This significantly reduces the time needed to access data in memory.
Another CPU feature that assists the OS with memory management is a mechanism called the page fault. A page fault is an exception raised by the CPU hardware when a process tries to access a virtual memory location that is not mapped to a physical address. There are different types of page faults:
Minor: raised when a page exists in main memory but there is no entry describing the virtual-to-physical address mapping. The page fault handler implemented in the OS creates a new mapping entry.
Major: raised when a page does not exist in main memory. The page fault handler needs to bring the required data from the swap device (or, for file-backed pages, from the file on storage) into memory and create the corresponding mapping entry. For example, this happens in a freshly started process, because the OS kernel defers loading the whole program into memory. This technique, called on-demand paging, accelerates process startup.
A major page fault is a performance-draining procedure: the OS page fault handler must find an available location in physical memory, which can involve paging out other pages and then loading the program content from the swap device into memory, before the process can continue its execution.
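The kernel counts both fault types per process: minflt and majflt are fields 10 and 12 of /proc/<pid>/stat (see proc(5)), and ps -o min_flt,maj_flt -p <pid> reports the same counters. This sketch extracts them from a stat file, taking care that the process name in field 2 may contain spaces.

```shell
#!/bin/sh
# Sketch: minor and major page-fault counts from a /proc/<pid>/stat file.
# minflt is field 10 and majflt is field 12. The process name (field 2) may
# contain spaces, so strip everything up to the last ") " first; field 3 then
# becomes $1, which makes minflt $8 and majflt $10.
fault_counts() {
    sed 's/.*) //' "$1" | awk '{ print "minflt=" $8, "majflt=" $10 }'
}

fault_counts "/proc/$$/stat"    # counters of the current shell
```

A steadily growing majflt value for a workload that is not reading new files is a quick indicator that it is waiting on page-ins from the swap device.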
There are two different types of pages:
File system pages, or pages backed by files. These are memory pages that contain file data; for example, database files directly mapped into the process address space, or library files containing executable program code. These pages can be paged in to physical memory; for example, when the program starts executing instructions stored on the disk (i.e., program usage of a shared library). The Linux page cache is a cache of these file pages, both resident pages to be read and changed (dirty) pages that need to be synchronized to a storage device. Direct I/O routines, which do not use the page cache, are also available on Linux. Since the page cache is an opportunistic, general-purpose cache, it is not appropriate for all usages.
Anonymous pages. These are memory pages that contain private process information, such as the heap or stack, and have no device or file system backing them. When the system runs into low memory conditions (high memory pressure), anonymous pages can be paged out (swapped out) to the swap file or swap device by the kswapd kernel thread and its related kernel threads. This process can be more or less aggressive depending on the swappiness parameter, which sets how readily the kernel swaps instead of reclaiming page cache. The parameter can be set from 0 to 100 on the 4.x kernels discussed here (0 to 200 on kernel 5.8 and later); the higher the value, the more swap is used relative to page cache memory reclamation. In our performance study, the OS is configured with its default value of 60, which is the typical setting recommended for production. A value of 100 means that the OS reclaims memory pages from the page cache and swap equally. You can print the /proc/sys/vm/swappiness proc variable to view its current value.
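One way to see why 100 means "equal" and 60 favors page cache reclaim: the kernel's reclaim code (get_scan_count() in mm/vmscan.c) starts from a weight of swappiness for anonymous pages and 200 minus swappiness for file pages. The real balance is further adjusted by recent reclaim statistics, so this sketch only reproduces that starting ratio; treat it as an illustration, not a prediction of actual swap traffic.

```shell
#!/bin/sh
# Sketch: starting anon/file reclaim weights derived from vm.swappiness.
# Mirrors the swappiness vs. (200 - swappiness) split in get_scan_count();
# the kernel further scales both by recent reclaim feedback (not modeled).
# scan_weights SWAPPINESS
scan_weights() {
    s=$1
    case "$s" in
        ''|*[!0-9]*) echo "swappiness must be a number" >&2; return 1 ;;
    esac
    if [ "$s" -gt 200 ]; then
        echo "swappiness must be 0..200 (0..100 before kernel 5.8)" >&2
        return 1
    fi
    anon=$s
    file=$((200 - s))
    echo "anon=$anon file=$file anon_share=$(( anon * 100 / 200 ))%"
}

scan_weights 60     # default: anon=60 file=140 anon_share=30%
scan_weights 100    # equal weights: anon=100 file=100 anon_share=50%
```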
Another important parameter used to control when the kswapd kernel threads are activated is watermark_scale_factor. With it, the user can set the lower limit of available memory at which kswapd activity starts. More details are available in the Watermark Scale Factor section.
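kswapd wakes up when free memory drops below the "low" watermark and reclaims until free memory reaches the "high" watermark; watermark_scale_factor sets the distance between these watermarks in units of 1/10,000 of memory. The sketch below converts a factor into that distance for a given amount of memory. It is simplified: the kernel applies the factor per memory zone and uses the larger of this value and a quarter of the minimum watermark, so treat the result as an estimate.

```shell
#!/bin/sh
# Sketch: estimate the watermark gaps implied by vm.watermark_scale_factor.
# The factor is in units of 1/10000 of memory: 10 (default) = 0.1%, 400 = 4%.
# Simplified: the kernel applies it per zone and uses
#   max(min_watermark / 4, zone_pages * factor / 10000);
# only the second term is modeled here.
# wmark_gap_mib MEM_MIB FACTOR
wmark_gap_mib() {
    mem=$1 factor=$2
    gap=$(( mem * factor / 10000 ))
    echo "low=min+${gap}MiB high=min+$(( 2 * gap ))MiB"
}

wmark_gap_mib 196608 10     # 192 GiB of DRAM, default factor
wmark_gap_mib 196608 400    # 192 GiB of DRAM, recommended factor
```

For the 192 GiB server "B" configuration, raising the factor from 10 to 400 moves the low watermark from about 196 MiB to about 7.7 GiB above the minimum, so reclaim and swapping start well before memory is nearly exhausted.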
Appendix C Linux Kernel Innovations to Leverage Fast SSDs as Memory Extension
Until recently, the Linux kernel had been primarily optimized for rotational disks, because they were the predominant storage devices. One of the techniques used to maximize swap performance on rotational hard disk drives (HDDs) was to keep swap data in contiguous locations on the disk to minimize disk seek time. This technique performed well for HDDs but is inadequate for solid state drives (SSDs). With recent advancements in non-volatile memory (NVM) technologies like Intel Optane technology, new techniques and methods are needed to take advantage of the increased performance of the media and devices.
While testing Linux swap against these new devices, many system-level bottlenecks were discovered in Linux swap. Kernel developers have addressed some of these performance bottlenecks in the release of Linux kernel 4.14. In this section we explore some of those enhancements.
A swap device in the Linux kernel is represented by a dedicated data structure (swap_info_struct) that contains information on how memory pages are stored on the swap device; see Figure 3 below. This information is stored in an array called swap_map, which is part of swap_info_struct. swap_map stores the usage count for each page stored on the swap device. swap_map entries are aggregated into clusters; these clusters effectively assign specific portions of the swap device to specific CPU cores. Updates to the usage count of individual swap_map entries require per-cluster locks to be taken instead of holding a single lock protecting the whole swap_map.
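To get a feel for the lock granularity, the number of clusters in a swap area is roughly its page count divided by the cluster size. This sketch assumes 4 KiB pages and 256 pages per cluster, the SWAPFILE_CLUSTER value used when CONFIG_THP_SWAP is disabled (512 on x86-64 with THP swap); both are kernel build details rather than values stated in this blueprint.

```shell
#!/bin/sh
# Sketch: approximate number of swap clusters (each with its own lock).
# Assumes 4 KiB pages; PAGES_PER_CLUSTER defaults to 256
# (SWAPFILE_CLUSTER without CONFIG_THP_SWAP; 512 with THP swap on x86-64).
# swap_clusters SIZE_GIB [PAGES_PER_CLUSTER]
swap_clusters() {
    size_gib=$1 per_cluster=${2:-256}
    pages=$(( size_gib * 1024 * 1024 / 4 ))    # GiB -> 4 KiB pages
    echo "pages=$pages clusters=$(( (pages + per_cluster - 1) / per_cluster ))"
}

swap_clusters 100        # one 100 GiB swap device from the cost model
swap_clusters 100 512    # same device with THP swap clusters
```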
Figure 3: Primary Swap Device Data Structures
Even though there are dedicated swap entries per CPU cluster, in kernels without the improvements described below, accesses to swap_map are protected by a single lock, which limits scalability and performance when concurrent accesses to the swap device are made. The negative impact of this single lock is especially visible under high memory pressure. When the single lock is used to protect critical information in the swap_info_struct data structure, latencies for handling page faults from the swap device increase significantly. This heavily affects end-user performance and renders the latest hardware latency improvements ineffective because of system-level bottlenecks.
The next section explains techniques to minimize contention on the single lock that protects the swap_info_struct data structure, and to improve system-level latencies.
As previously discussed in the Performance Data section, access latencies on swap average below 20 microseconds when using a higher performance drive.
C.1 Swap Improvements Completed in v4.14 of Linux Kernel
There are many software techniques to address performance problems related to lock contention. These approaches typically rely on the following principles:
Replacement of a single coarse-grained lock on the swap partition with multiple finer-grained locks on the swap clusters: when many pieces of data are protected from concurrent access by a single, big lock, the concurrent threads attempting to read or write data are serialized in a queue while awaiting their turn. In such cases, to improve parallelism, a big lock can be split into many smaller locks that protect independent sub-pieces of data. This approach may yield significant performance improvements, especially when multiple threads access independent pieces of data; however, when more than one thread attempts to access the same piece of data, those attempts are still serialized in a queue.
Reduction of the time spent holding a lock (time spent in the critical section): when multiple threads attempt to enter a critical section protected by an exclusive lock held by another thread, they are all paused until the lock is released. The longer the critical section, the longer the other threads wait before they can continue. Reducing the time a given thread spends in the critical section is another useful technique for increasing parallelism and reducing latency.
Kernel developers determined that the increased system-level latencies observed while swapping to an Intel Optane DC SSD were caused by a single lock protecting the swap_info_struct data structure. They applied the principles discussed above in the series of swap improvements available in Linux kernel version 4.14 and later. The following techniques have been developed to reduce lock contention on the swap_info_struct lock.
1. Bulk operations and per-CPU lock cluster improvements: multiple swap_map entries that represent free space on the swap device have been aggregated into larger units and stored in a swap slot cache. The swap slot cache is managed by a specific CPU core, which is why it is called the "per-CPU swap slot cache". When a software thread requests new swap space, it first tries to allocate it from the swap slot cache of the current CPU. This operation does not require locking. Because a single swap slot cache contains multiple swap_map entries, a swap_map entry is likely to be allocated from it successfully. When allocation from the swap slot cache is not possible, the swap software performs a bulk allocation of multiple swap_map entries from swap_map and assigns those entries to the swap slot cache. The swap_info lock is acquired when doing bulk operations on the swap_map data structure. Please refer to Figure 4 below for details of the changes.
Figure 4: Swap Bulk Operations Improvements
2. Radix tree split: another source of lock contention that existed in the Linux kernel prior to version 4.14 was the radix tree used for the swap cache. The swap cache is an optimization of swapping behavior that reduces the number of writes to the swap device or swap file, and maintains the mapping between a memory page and its swap_map entry when the page is swapped in or swapped out. A swap write is considered unnecessary when a page exists both in a swap device or swap file and in main memory, because both locations contain the same data. When Linux considers a page for reclamation, it can simply check whether the page exists both in the swap device or swap file and in main memory, and whether the data in those two locations match. In such a case, the page in main memory can simply be marked as invalid and reclaimed. A radix tree data structure is used to check whether a swap entry has a corresponding page stored in main memory. Prior to version 4.14 of Linux, the swap cache radix tree was protected by a single swap cache lock, which reduced parallelism. In version 4.14, the single swap cache radix tree has been split into multiple smaller trees. This modification introduced a separate lock for each smaller radix tree and increased parallelism.
The current design works best with many swap partitions on the physical swap device. See Appendix A and the automation scripts on GitHub to implement the maximum number of Linux swap partitions, typically 28.
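The swap cache split (see the commit "mm/swap: split swap cache into 64MB trunks" in Appendix D) gives each 64 MiB of swap space its own tree and lock. This sketch counts the resulting trees for a swap area of a given size, assuming 4 KiB pages so that each trunk covers 16384 swap entries. Splitting a device into many partitions additionally gives each partition its own swap_info_struct and lock.

```shell
#!/bin/sh
# Sketch: number of swap cache trees after the split into 64 MiB trunks.
# Assumes 4 KiB pages, so one trunk covers 16384 swap entries; each trunk
# has its own tree and lock.
# swap_cache_trunks SIZE_MIB
swap_cache_trunks() {
    size_mib=$1
    echo $(( (size_mib + 63) / 64 ))    # round up to whole trunks
}

swap_cache_trunks $(( 100 * 1024 ))     # one 100 GiB swap device -> 1600
```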
Appendix D Swap Improvements Patch Lists
This section provides a list of kernel patches pertaining to the swap improvements that were introduced in Linux kernel 4.11 and 4.14. This list of patches may be useful when considering creating a unique kernel image based on kernel versions older than 4.11 and backporting the swap improvements into it.

commit 322b8afe4a65906c133102532e63a278775cc5f0
Author: Huang Ying
Date: Wed May 3 14:52:49 2017 -0700
    mm, swap: Fix a race in free_swap_and_cache()

commit 0ccfece6ed507738c0e7e4414c3688b78d4e3756
Author: Huang Ying
Date: Wed May 3 14:56:16 2017 -0700
    mm/swapfile.c: fix swap space leak in error path of swap_free_entries()

commit ba81f83842549871cbd7226fc11530dc464500bb
Author: Huang Ying
Date: Wed Feb 22 15:45:46 2017 -0800
    mm/swap: skip readahead only when swap slot cache is enabled

commit 039939a65059852242c823ece685579370bc574f
Author: Tim Chen
Date: Wed Feb 22 15:45:43 2017 -0800
    mm/swap: enable swap slots cache usage

commit 67afa38e012e9581b9b42f2a41dfc56b1280794d
Author: Tim Chen
Date: Wed Feb 22 15:45:39 2017 -0800
    mm/swap: add cache for swap slots allocation

commit 7c00bafee87c7bac7ed9eced7c161f8e5332cb4e
Author: Tim Chen
Date: Wed Feb 22 15:45:36 2017 -0800
    mm/swap: free swap slots in batch

commit 36005bae205da3eef0016a5c96a34f10a68afa1e
Author: Tim Chen
Date: Wed Feb 22 15:45:33 2017 -0800
    mm/swap: allocate swap slots in batches

commit e8c26ab60598558ec3a626e7925b06e7417d7710
Author: Tim Chen
Date: Wed Feb 22 15:45:29 2017 -0800
    mm/swap: skip readahead for unreferenced swap slots

commit 4b3ef9daa4fc0bba742a79faecb17fdaaead083b
Author: Huang, Ying
Date: Wed Feb 22 15:45:26 2017 -0800
    mm/swap: split swap cache into 64MB trunks

commit 235b62176712b970c815923e36b9a9cc05d4d901
Author: Huang, Ying
Date: Wed Feb 22 15:45:22 2017 -0800
    mm/swap: add cluster lock

commit 6a991fc72d1243b8da0c644d3147d3ec41a0b281
Author: Huang, Ying
Date: Wed Feb 22 15:45:19 2017 -0800
    mm/swap: fix kernel message in swap_info_get()

commit f6498b3f33123a6ee1c81a1b29b9c07964cb95c1
Author: Huang Ying
Date: Fri Oct 8 16:59:30 2016 -0700
    mm: don't use radix tree writeback tags for pages in swap cache

D.1 References
See the following links for important reference information.
Most of the original patches: https://kernelnewbies.org/Linux_4.11#Memory_management
Second-step swap optimization notes: https://kernelnewbies.org/Linux_4.14#Memory_management
White paper on PMBench (2018): https://www.semanticscholar.org/paper/Pmbench%3A-A-Micro-Benchmark-for-Profiling-Paging-on-Yang-Seymour/dd0adcde7d074a414a9df76fb20d52a0d8aa8c71#paper-header
White paper with deeper analysis of persistent memory's applicability to memory page access performance: https://web.cs.unlv.edu/jisooy/paper/yang_pmbench.pdf
