expensiveamd裁员

amd裁员  时间:2021-04-02  阅读:()
FAQ1of13AMDAPPSDKv2.
8FAQ1GeneralQuestions1.
DoIneedtouseadditionalsoftwarewiththeSDKTorunanOpenCLapplication,youmusthaveanOpenCLruntimeonyoursystem.
IfyoursystemincludesarecentAMDdiscreteGPU,oranAPU,youalsoshouldinstallthelatestCatalystdrivers,whichcanbedownloadedfromAMD.
com.
Informationonsupporteddevicescanbefoundatdeveloper.
amd.
com/appsdk.
IfyoursystemdoesnotincludearecentAMDdiscreteGPU,orAPU,theSDKinstallsaCPU-onlyOpenCLrun-time.
Also,werecommendusingthedebuggingprofilingandanalysistoolscontainedintheAMDCodeXLheterogeneouscomputetoolssuite.
2.
WhichversionsoftheOpenCLstandarddoesthisSDKsupportAMDAPPSDK2.
8supportsdevelopmentofapplicationsusingtheOpenCLSpecificationv1.
2.
AsallOpenCL1.
1APIsaresupportedwithinOpenCL1.
2,youalsocandevelopOpenCL1.
1-compliantapplications.
3.
WillapplicationsdevelopedtoexecuteonOpenCL1.
1stilloperateinanOpenCL1.
2environmentOpenCLisdesignedtobebackwardscompatible.
TheOpenCL1.
2run-timedeliveredwiththeAMDCatalystdriversrunanyOpenCL1.
1-compliantapplication.
However,anOpenCL1.
2-compliantapplicationwillnotexecuteonanOpenCL1.
1run-timeifAPIsonlysupportedbyOpenCL1.
2areused.
4.
DoesAMDprovideanyadditionalOpenCLsamples,otherthanthosecontainedwithintheSDKThemostrecentversionsofallofthesamplescontainedwithintheSDKarealsoavailableforindividualdownloadfromthedeveloper.
amd.
com/appsdk"Samples&Demos"page.
ThispagealsocontainsadditionalsamplesthateitherweretoolargetoincludeintheSDK,orwhichhavebeendevelopedsincethemostrecentSDKrelease.
Checkthiswebpagefornew,updated,orlargesamples.
5.
HowoftencanIexpecttogetAMDAPPSDKupdatesDeveloperscanexpectthattheAMDAPPSDKmaybeupdatedtwotothreetimesayear.
Actualreleaseintervalsmayvarydependingonavailablenewfeaturesandproductupdates.
AMDiscommittedtoprovidingdeveloperswithregularupdatestoallowthemtotakeadvantageofthelatestdevelopmentsinAMDAPPtechnology.
2of13FAQ6.
WhatisthedifferencebetweentheCPUandGPUcomponentsofOpenCLthatarebundledwiththeAMDAPPSDKTheCPUcomponentusesthecompatibleCPUcoresinyoursystemtoaccelerateyourOpenCLcomputekernels;theGPUcomponentusesthecompatibleGPUcoresinyoursystemtoaccelerateyourOpenCLcomputekernels.
7.
WhatCPUsdoestheAMDAPPSDKv2.
8withOpenCL1.
2supportworkonTheCPUcomponentofOpenCLbundledwiththeAMDAPPSDKworkswithanyx86CPUwithSSE3orlater,aswellasSSE2.
xorlater.
AMDCPUshavesupportedSSE3(andlater)since2005.
SomeexamplesofAMDCPUsthatsupportSSE3(orlater)aretheAMDAthlon64(startingwiththeVenice/SanDiegosteppings),AMDAthlon64X2,AMDAthlon64FX(startingwithSanDiegostepping),AMDOpteron(startingwithE4stepping),AMDSempron(startingwithPalermostepping),AMDPhenom,AMDTurion64,andAMDTurion64X2.
8.
WhatAPUsandGPUsdoestheAMDAPPSDKv2.
8withOpenCL1.
2supportworkonForthelistofsupportedAPUsandGPUs,seetheAMDAPPSDKv2.
8SystemRequirementslistat:http://developer.
amd.
com/appsdk9.
CanmyOpenCLcoderunonGPUsfromothervendorsAtthistime,AMDdoesnotplantohavetheAMDAPPSDKsupportGPUproductsfromothervendors;however,sinceOpenCLisanindustrystandardprogramminginterface,programswritteninOpenCL1.
2canberecompiledandrunwithanyOpenCL-compliantcompilerandruntime.
10.
WhatversionofMSVisualStudioissupportedTheAMDAPPSDKv2.
8withOpenCL1.
2supportsMicrosoftVisualStudio2008ProfessionalEdition,MicrosoftVisualStudio2010ProfessionalEdition,andMicrosoftVisualStudio2012.
11.
IsitpossibletorunmultipleAMDAPPapplications(computeandgraphics)concurrentlyMultipleAMDAPPapplicationscanberunconcurrently,aslongastheydonotaccessthesameGPUatthesametime.
AMDAPPapplicationsthatattempttoaccessthesameGPUatthesametimeareautomaticallyserializedbytheruntimesystem.
12.
WhichgraphicsdriverisrequiredforthecurrentAMDAPPSDKv2.
8withOpenCL1.
2CPUsupportFortheminimumrequiredgraphicsdriver,seetheAMDAPPSDKv2.
8SystemRequirementslistat:http://developer.
amd.
com/appsdk.
Ingeneral,itisadvisedthatyouupdateyoursystemtousethemostrecentgraphicsdriversthatareavailableforit.
13.
HowdoesOpenCLcomparetootherAPIsandprogrammingplatformsforparallelcomputing,suchasOpenMPandMPIWhichoneshouldIuseOpenCLisdesignedtotargetparallelismwithinasinglesystemandprovideportabilitytomultipledifferenttypesofdevices(GPUs,multi-coreCPUs,etc.
).
OpenMPtargetsmulti-coreFAQ3of13CPUsandSMPsystems.
MPIisamessagepassingprotocolmostoftenusedforcommunicationbetweennodes;itisapopularparallelprogrammingmodelforclustersofmachines.
Eachprogrammingmodelhasitsadvantages.
ItisanticipatedthatdevelopersmixAPIs,forexampleprogrammingaclusterofmachineswithGPUswithMPIandOpenCL.
14.
IfIwritemycodeontheCPUversion,doesitworkontheGPUversion,ordoIhavetomakechanges.
AssumingthesizelimitationsforCPUsisconsidered,thecodeworksonboththeCPUandGPUcomponents.
Performancetuning,however,isdifferentforeach.
15.
WhatistheprecisionofmathematicaloperationsSeeChapter7,"OpenCLNumericalCompliance,"oftheOpenCL1.
2Specificationforexactmathematicaloperationsprecisionrequirements.
http://developer.
amd.
com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.
aspxID=8816.
Arebyte-addressablestoressupportedByte-addressablestoresaresupported.
17.
ArelongintegerssupportedYes,64-bitintegersaresupported.
18.
AreoperationsonvectorssupportedYes,operationsonvectorsaresupported.
19.
IsswizzlingsupportedYes,swizzling(therearrangingofelementsinavector)issupported.
20.
HowdoesoneverifyifOpenCLhasbeeninstalledcorrectlyonthesystemRunclinfofromthecommand-lineinterface.
TheclinfotoolshowstheavailableOpenCLdevicesonasystem.
21.
HowdoIknowifIhaveinstalledthelatestversionoftheAMDAPPSDKOninstallationoftheSDK,ifaninternetconnectionisavailable,theinstallerstateswhetherornotanewerSDKisavailableinthefileVersionInfo.
txtindirectoryC:\ProgramFiles(x86)\AMDAPP\docs\.
Alternatively,youcancheckforthelatestavailableversionoftheAMDAPPSDKathttp://developer.
amd.
com/appsdk.
2Optimizations22.
HowdoIuseconstantsinakernelforbestperformanceForperformanceusingconstants,highesttolowestperformanceisachievedusing:-Literalvalues-Constantpointerwithcompiletimeconstantsindexing.
-Constantpointerwithruntimeconstantindexingthatisthesameforallthreads.
4of13FAQ-Constantpointerwithlinearaccessindexingthatisthesameforallthreads.
-Constantpointerwithlinearaccessindexingthatisdifferentbetweenthreads.
-Constantpointerwithrandomaccessindexing.
23.
WhyareliteralvaluesthefastestwaytouseconstantsUpto96bitsofliteralvaluesareembeddedintheinstruction;thus,intheory,thereisnolimitonthenumberofusableliterals.
Inpractice,thelimitis16Kuniqueliteralsinacompilationunit.
24.
Whydoesa*b+cnotgenerateamadinstructionDependingonthehardwareandthefloatingpointprecision,thecompilermaynotgenerateamadinstructionforthecomputationofa*b+cduetothefloating-pointprecisionrequirementsintheOpenCLspecification.
Here,developerswhowanttoexploitthemadinstructionforperformancecanreplacethatcomputationwiththemad()built-infunctioninOpenCL.
25.
Whatisthepreferredwork-groupsizeThepreferredwork-groupsizeontheAMDplatformisamultipleof64.
3OpenCLQuestions26.
WhatisOpenCLOpenCL(OpenComputingLanguage)isthefirsttrulyopenandroyalty-freeprogrammingstandardforgeneral-purposecomputationsonheterogeneoussystems.
OpenCLletsprogrammerspreservetheirexpensivesourcecodeinvestmentandeasilytargetbothmulti-coreCPUsandthelatestGPUs,suchasthosefromAMD.
Developedinanopenstandardscommitteewithrepresentativesfrommajorindustryvendors,OpenCLgivesusersacross-vendor,non-proprietarysolutionforacceleratingtheirapplicationsontheirCPUandGPUcores.
27.
HowmuchdoestheAMDOpenCLdevelopmentplatformcostAMDbundlessupportforOpenCLaspartofitsAMDAPPSDKproductoffering.
TheAMDAPPSDKisofferedtodevelopersandusersfreeofcharge.
28.
WhatoperatingsystemsdoestheAMDAPPSDKv2.
8withOpenCL1.
2supportAMDAPPSDKv2.
8runson32-bitand64-bitversionsofWindowsandLinux.
Fortheexactlistofsupportedoperatingsystems,seetheAMDAPPSDKv2.
8SystemRequirementslistat:http://developer.
amd.
com/appsdk29.
CanIwriteanOpenCLapplicationthatworksonbothCPUandGPUApplicationsthatprogramtothecoreOpenCL1.
2APIandkernellanguageshouldbeabletotargetbothCPUsandGPUs.
Atruntime,theappropriatedevice(CPUorGPU)mustbeselectedbytheapplication.
FAQ5of1330.
DoestheAMDOpenCLcompilerautomaticallyvectorizeforSSEontheCPUTheCPUcomponentofOpenCLthatisbundledwiththeAMDAPPSDKtakesadvantageofSSE3instructionsontheCPU.
ItalsotakesadvantageoftheAVXinstructionswheresupported.
InadditiontoAVX,OpenCLmathlibraryfunctionsalsoleverageXOPandFMA4capabilitiesonCPUsthatsupportthem.
31.
DoestheAMDAPPSDKv2.
8withOpenCL1.
2supportworkonmultipleGPUs(ATICrossFire)OpenCLapplicationscanexplicitlyinvokeseparatecomputekernelsonmultiplecompatibleGPUsinasinglesystem.
Thepartitioningofthealgorithmtomultipleparallelcomputekernelsmustbedonebythedeveloper.
ItisrecommendedthatATICrossFirebeturnedoffinmostsystemconfigurationssothatAMDAPPapplicationscanaccessallavailableGPUsinthesystem.
ATICrossFiretechnologyallowsmultipleAMDGPUstoworktogetheronasinglegraphics-renderingtask.
ThismethoddoesnotapplytoAMDAPPcomputationaltasksbecauseitisnotcompatiblewiththecomputemodelusedforAMDAPPapplications.
32.
CanIshippre-compiledOpenCLapplicationbinariesthatworkoneitherCPUorGPUByusingOpenCLruntimeAPIs,developerscanwriteOpenCLapplicationsthatcandetecttheavailablecompatibleCPUsandGPUsinthesystem.
Thisletsdeveloperspre-compileapplicationsintobinariesthatdynamicallyworkoneitherCPUsorGPUsthatexecuteontargeteddevices.
IncludingLLVMIRinthebinaryprovidesameansforthebinarytosupportdevicesforwhichtheapplicationwasnotexplicitlypre-compiled.
33.
IstheOpenCLdoubleprecisionoptionalextensionsupportedTheKhronosandAMDdoubleprecisionextensionsaresupportedoncertaindevices.
YourapplicationcanusetheOpenCLAPItoqueryifthisfunctionalityissupportedonthedeviceinuse.
34.
IsitpossibletowriteOpenCLcodethatscalestransparentlyovermultipledevicesForOpenCLprogramsthattargetonlymulti-coreCPUs,scalingcanbedonetransparently;however,scalingacrossmultipleGPUsrequiresthedevelopertoexplicitlypartitionthealgorithmintomultiplecomputekernels,aswellasexplicitlylaunchthecomputekernelsontoeachcompatibleGPU.
35.
WhatshouldIdoifIgetwrongresultsontheAppleplatformwithAMDdevicesApplehandlessupportfortheAppleplatform;pleasecontactthem.
36.
IsitpossibletodynamicallyindexintoavectorNo,thisisnotpossiblebecauseavectorisnotanarray,butarepresentationofahardwareregister.
6of13FAQ37.
Whatisthedifferencebetweenlocalinta[4]andinta[4]localinta[4]useshardwarelocalmemory,whichisasmall,low-latency,high-bandwidthmemory;inta[4]usesper-threadhardwarescratchmemory,whichislocatedinuncachedglobalmemory.
38.
Whydoesusingabarriercausethemaxkernelwork-groupsizetodropto64onHD4XXXchipsThesupportedHD4XXXchipsdonothaveahardwarebarrier,sotheOpenCLruntimecannotexecutemorethanasinglewavefrontpergrouptosatisfytheOpenCLmemoryconsistencymodel.
NotethatHD4XXXdevicesupportisEOL.
Catalystdriversnolongerincludesupportforthesedevices.
SeetheOpenCLSDKdriverandcompatibilitypageformoredetails.
39.
HowcomemyprogramrunsslowerinOpenCLthaninCUDA/Brook+/ILWhencomparingperformance,itisbettertocomparecodeoptimizedforourOpenCLplatformagainstcodeoptimizedagainstanothervendor'sOpenCLplatforms.
Bycomparingthesametoolchainondifferentvendors,youcanfindoutwhichvendorshardwareworksthebestforyourproblemset.
40.
WhycanInotusetextureonRV7XXdevicesinOpenCLRV7XXdevicesdonotsupportallofthetexturemodesandprecisionrequirementsthatOpenCLrequires.
SincetexturesaremappedtoimagesinOpenCLandisan"allornothing"approach,wedonotsupportimagesonRV7XXdevices;thus,thereisnoaccesstotextures.
NotethatHD4XXXdevicesupportisEOL.
Catalystdriversnolongerincludesupportforthesedevices.
SeetheOpenCLSDKdriverandcompatibilitypageformoredetails.
41.
Whydoread-writeimagesnotexistinOpenCLOpenCLhasamemoryconsistencymodelthatrequirescertainconstraints(seetheOpenCLSpecificationformoreinformation).
Sinceimagesarespecialfunctionalhardwareunits,theyaredifferentforreadingandwriting.
Thisisdifferentfrompointers,whichforthemostpartusethesamehardwareunitsandcanguaranteetheconsistencythatOpenCLrequires.
42.
DoesprefetchingworkontheGPUPrefetchisnotneededontheGPUbecausethehardwarehasabuilt-inmechanismtohidelatencywhenmanywork-groupsarerunning.
TheLDScanbeusedasasoftware-controlledcache.
43.
Howdoyoudeterminethemaxnumberofconcurrentwork-groupsThemaximumnumberofconcurrentwork-groupsisdeterminedbyresourceusage.
Thisincludesnumberofregisters,amountofLDSspace,andnumberofthreadsperworkgroup.
Thereisnowaytodirectlyspecifythenumberofregistersusedbyakernel.
EachSIMDhasa64-wideregisterfile,witheachcolumnconsistingof256x32x4registers.
44.
IsitpossibletotellOpenCLnottouseallCPUCoresYes,usethedevicefissionextension.
FAQ7of134OpenCLOptimizations45.
Whatismoreefficient,theternaryoperator:ortheselectfunctionTheselectfunctioncompilestothesinglecycleinstruction,cmov_logical;inmostcases,:alsocompilestothesameinstruction.
Insomecases,whenmemoryisinoneoftheoperands,the:operatoriscompileddowntoanIF/ELSEblock.
AnIF/ELSEblocktakesmorethanasingleinstructiontoexecute.
46.
Whatisthedifferencebetween24-bitand32-bitintegeroperations24-bitoperationsarefasterbecausetheyusefloatingpointhardwareandcanexecuteonallcomputeunits.
Many32-bitintegeroperationsalsorunonallstreamprocessors,butifbotha24-bitanda32-bitversionexistforthesameinstruction,the32-bitinstructionexecutesonlyonepercycle.
5HardwareInformation47.
Howare8/16-bitoperationshandledinhardwareThe8/16-bitoperationsareemulatedwith32-bitregisters.
48.
Do24-bitintegersexistinhardwareNo,thereare24-bitinstructions,suchasMUL24/MAD24,butthesmallestintegerinhardwareregistersis32-bits.
49.
Whatarethebenefitsofusing8/16-bittypesover32-bitintegers8/16-bittypestakelessmemorythana32-bitintegertype,increasingtheamountofdatayouareabletoloadwithasingleinstruction.
TheOpenCLcompilerup-convertsto32-bitsonloadanddown-convertstothecorrecttypeonstore.
50.
WhatisthedifferencebetweenaGPRandasharedregisterAlthoughtheyarephysicallyequivalent,thedifferenceiswhethertheregisteroffsetinthehardwareisabsolutetotheregisterfileorrelativetothewavefrontID.
51.
HowoftenarewavefrontscreatedWavefrontsarecreatedbythehardwaretoexecuteaslongasresourcesareavailable.
Iftheyarecreatedbutcannotexecuteimmediately,theyareputinawaitqueuewheretheystayuntilcurrentlyrunningwavefrontsarefinished.
52.
WhatisthemaximumnumberofwavefrontsThemaximumnumberofwavefrontsisdeterminedbywhichresourcelimitsthenumberofwavefrontsthatcanbespawned.
Thiscanbethenumberofregisters,amountoflocalmemory,requiredstacksize,orotherfactors.
Computeshaderwithlocalmemoryusagehasahardcapat16wavefronts.
8of13FAQ53.
WhydoIgetblueorblackscreenswhenexecutinglongerrunningkernelsTheGPUisnotapreemptabledevice.
IfyouarerunningtheGPUasyourdisplaydevice,ensurethatacomputeprogramdoesnotusetheGPUpastacertaintimelimitsetbyWindows.
Exceedingthetimelimitcausesthewatchdogtimertotrigger;thiscanresultinundefinedprogramresults.
54.
WhatisthecostofaclauseswitchIngeneral,thelatencyofaclauseswitchisaround40cycles.
NotethatthisisrelevantonlyforEvergreenandNorthernIslanddevices.
55.
HowcanIhideclauseswitchlatencyByexecutingmultiplewavefrontsinparallel.
NotethatthisisrelevantonlyforEvergreenandNorthernIslanddevices.
56.
HowcanIreduceclauseswitchesClauseswitchesarealmostdirectlyrelatedtosourceprogramcontrolflow.
Byreducingsourceprogramcontrolflow,clauseswitchescanalsobereduced.
ThisisonlyrelevantforEvergreenandNorthernIslandsdevices.
57.
HowdoesthehardwareexecutealooponawavefrontThelooponlyendsexecutionforawavefrontonceeverythreadinthewavefrontbreaksoutoftheloop.
Onceathreadbreaksoutoftheloop,allofitsexecutionresultsaremasked,buttheexecutionstilloccurs.
58.
HowdoesflowcontrolworkwithwavefrontsTherearenoflowcontrolunitsforeachindividualthread,sothewholewavefrontmustexecutethebranchifanythreadinthewavefrontexecutesthebranch.
Iftheconditionisfalse,theresultsarenotwrittentomemory,buttheexecutionstilloccurs.
59.
WhatistheconstantbuffersizeonGPUhardware64kB.
60.
Whathappenswithout-of-boundmemoryoperationsWritesaredropped,andreadsreturnapre-definedvalue.
61.
For7XXdevices,whydoes64x1givebadperformanceinOpenCLkernelsOneofthereasonsisbecauseofhowthecachesaresetuponRV7XXdevices.
Thecachesareoptimizedtoworkinatiledmode,notinlinearmode(whichisthemodeOpenCLkernelsuse).
Togetoptimalcachere-usefromthetextureincomputeshadermodeonRV7XXdevices,reblockyourthreadIDs.
A16x4,8x8,or4x16shouldgiveyougoodenoughblockingtogetsimilarcacheperformanceasyourpixelshaderkernel.
Thisisbecauseacachelinecanbethoughtofasa4x2blockofdatacominginatonce.
So,forpixelshaders,64threadsareblockedina8x8blockthatusesexactlyeightcachelines.
ForOpenCLkernels,your64x1blockpatternuses16cachelines,butonlyuseshalfthedataineachcacheline.
NotethatHD4XXXdevicesupportisEOL.
CatalystdriversnolongerincludeFAQ9of13supportforthesedevices.
SeetheOpenCLSDKdriverandcompatibilitypageformoredetails.
62.
WhatisuniqueabouttheLDSinHD4XXXdevices,andwhatareitsperformancecharacteristicsTheLDSintheHD4XXXdevicesisanowner'swritemodelwithlimitedapplications.
Whenusedcorrectly,ithasverysimilarperformancecharacteristicstotheL1cache,buttheusergainscontroloverwhatdataexistsinthememory.
TheLDS_TransposesampleintheSDKusestheLDSintheHD4XXXdevicesveryefficiently.
NotethatHD4XXXdevicesupportisEOL.
Catalystdriversnolongerincludesupportforthesedevices.
SeetheOpenCLSDKdriverandcompatibilitypageformoredetails.
6MicrosoftVisualStudio63.
CanIusetheSDKwithMicrosoftVisualStudio2012ExpressDuetolimitationsinMicrosoftVisualStudioExpress,itisonlypossibletousebuildfilesforindividualsamples.
MicrosoftVisualStudioExpressdoesnotsupportbuildingofallofthesamplesatthesametime.
TheprojectfilesthatbuildallofthesamplesareonlysupportedbyfullversionsofMicrosoftVisualStudio2008,2010,or2012.
7Bolt64.
WhatGPUsisBoltoptimizedforBoltisoptimizedforAMD"SouthernIslands"familyofGPUs.
65.
WhichGPUcomputetechnologiesaresupportedbyBoltThepreviewversionofBoltusesOpenCLasitscomputeback-end;however,futurereleasesshouldsupportbothC++AMPandOpenCLcomputeback-ends.
66.
AretheBoltAPIsfinalTheAPIsintheBoltpreviewarenotfinal.
TheBoltAPIshouldbestableandshoulduseformaldeprecationproceduresfromBoltrelease1.
0onwards.
PleasehelpustoimproveBolt;weencourageyoutoprovidesuggestionsonhowwecanimprovetheBoltAPI.
67.
DoesBoltrequirelinkingtoalibraryYes,Boltincludesasmall,staticlibrarythatmustbelinkedintoauserprogramtoproperlyresolvesymbols.
SinceBoltisatemplatelibrary,almostallfunctionalityiswritteninheaderfiles,butOpenCLfollowsanonlinecompilationmodelwhichmakessensetoincludeinalibraryandnotinheaderfiles.
68.
DoesBoltdependonanyotherlibrariesYes,BoltcontainsdependenciesonBoost.
Alldependentheaderfilesandpre-compiledlibrariesareavailableintheBoltSDK.
10of13FAQ69.
WhatalgorithmsdoesBoltcurrentlysupportTransform,reduce,transform_reduce,count,count_if,inclusive_scan,exclusive_scan,andsort.
70.
WhatversionofOpenCLdoesBoltcurrentlyrequireBoltusesfeaturesavailableinOpenCLv1.
2.
71.
DoesBoltrequiretheAMDOpenCLruntimeYes,Boltrequirestheuseoftemplatesinkernelcode.
Atthetimeofthiswriting,AMDistheonlyvendortoprovidethissupport.
72.
WhichCatalystpackageshouldIusewithBoltGenerallyspeaking,downloadingandinstallingthelatestCatalystpackagecontainsthemostrecentOpenCLruntime.
Asofthetimeofthiswriting,therecommendedCatalystpackageis12.
10.
73.
WhichlicenseisBoltlicensedunderTheApacheLicense,Version2.
0.
74.
WhenshouldIusedevice_vectorvsregularhostmemorybolt::cl::device_vectorisusedtomanagedevice-localmemoryandcandeliverhigherperformanceondiscreteGPUsystem.
However,thehostmemoryinterfaceseliminatetheneedtocreateandmanagedevice_vectors.
Ifmemoryisre-usedacrossmultipleBoltcallsorisreferencedbyotherkernels,usingdevice_vectordelivershigherperformance75.
HowdoImeasuretheperformanceofOpenCLBoltlibrarycallsThefirsttimethataBoltlibraryroutineiscalled,theruntimecallstheOpenCLcompilertocompilethekernel.
EachuniquetemplateinstantiationreusesthesameOpenCLcompilation,sotwoBoltcallswiththesamefunctorsarecompiledonlyonetime.
WhenmeasuringtheperformanceofBolt,itisimportanttoexcludethefirstcall(withthecompilationoverhead)fromthetimingmeasurement.
HerearetwowaystotimeBoltfunctioncalls:Thefirstexamplepullsthecalloutsidethetimerloop.
std::vectora;bolt::cl::transform(a.
begin,a.
end,z.
begin(),bolt::cl::negate);startTimer(.
.
.
);for(inti=0;i);}stopTimer(.
.
.
);Asecondalternativeistoexplicitlyexcludethefirstcompilationfromthetimingregion://Explicitlystarttimerafterfirstiteration:std::vectora,z;for(inti=0;i);FAQ11of13}stopTimer(.
.
.
);76.
TheBoltlibraryAPIsaccepthostdatastructuressuchasstd::vector().
ArethesecopiedtotheGPUorkeptinhostmemoryForhostdatastructures,BoltcreatesanOpenCLbufferwiththeCL_MEM_USE_HOST_PTRflagandusestheOpenCLmapandunmapAPIstomanagethehostanddeviceaccess.
OntheAMDOpenCLimplementation,thisresultsindatabeingkeptinhostmemorywhenthehostmemoryisalignedon256-byteboundary.
Inothercases,theruntimecopiesthedataasneededtotheGPU.
Seesection4.
5.
2oftheAMDAcceleratedParallelProcessingOpenCLProgrammingGuideformoreinformationonmemoryoptimization.
Higher-performancemaybeobtainedbyaligningthehostdatastructuresto256-byteboundariesorbyusingthedevice_vectorclassforfinercontrolovertheallocationproperties.
77.
BoltAPIsalsoacceptdevice_vectorinputs.
Whenshouldusedevice_vectorsformyinputsThebolt::cl::device_vectorinterfaceisathinwrapperaroundthecl::Bufferclass;itprovidesaninterfacesimilartostd::vector.
Also,thedevice_vectoroptionallyacceptsaflagsargumentthatispassedtotheOpenCLbuffercreation.
Thefollowingtwoflagscanbeuseful:CL_MEM_ALLOC_HOST_POINTER:Forcesmemorytobeallocatedinhostmemory,evenondiscreteGPUs.
GPUsaccesstothedataisslower(overthePCIebus),buttheallocationavoidstheoverheadofcopyingdatato,andfrom,theGPU.
ThiscanbeusefulwhenthedataisusedinonlyasingleBoltcallbeforebeingprocessedagainontheCPU.
CL_MEM_USE_PERSISTENT_MEM_AMD:Providesdevice-residentzerocopymemory.
Usethisfordatathatiswrite-onlybytheCPU.
Seesection4.
5.
2oftheAMDAcceleratedParallelProcessingOpenCLProgrammingGuideformoreinformation.
8C++AMP78.
WhatisC++AMPC++AcceleratedMassiveParallelism(C++AMP)acceleratesexecutionofC++codebytakingadvantageofdata-parallelhardware,suchasthecomputeunitsonandAPU.
C++AMPisanopenspecificationthatextendsregularC++throughtheuseoftherestrictkeywordtodenoteportionsofcodetobeexecutedontheaccelerateddevice.
79.
WhatAMDdevicessupportC++AMPAnyAMDAPUorGPUthatsupportsMicrosoftDirectX11FeatureLevel11.
0andhigher.
80.
CanyourunaC++AMPprogramonaCPUYoucanrunaC++AMPprogramonaCPU,butitsusageisnotrecommendedinaproductionenvironment.
Formoreinformation,see:http://blogs.
msdn.
com/b/nativeconcurrency/archive/2012/03/10/cpu-accelerator-in-c-amp.
aspx81.
WherecanIgetsupportforC++AMPC++AMPissupportedbytheC++compilerinMicrosoftVisualStudio2012.
12of13FAQ82.
IsC++AMPsupportedonLinuxAtthistime,C++AMPisnotsupportedonLinux;however,asC++AMPisdefinedasanopenstandard,MicrosofthasopenedthedoorforimplementationsonoperatingsystemsotherthanWindows.
83.
WherecanIlearnmoreaboutC++AMPThereisanMSDNpageinC++AMP.
Also,thebookC++AMP:AcceleratedMassiveParallelismwithMicrosoftVisualC++,byKateGregoryandAdeMiller,maybeuseful.
9Aparapi84.
WhatisAparapiAparapiisanAPIforexpressingdataparallelalgorithmsinJavaandaruntimecomponentcapableofconvertingJavabytecodetoOpenCLforexecutionontheGPU.
IfAparapicannotexecuteontheGPUatruntime,itexecutesthedeveloper'salgorithminaJavathreadpool.
Forappropriateworkloads,thisextendsJava's'WriteOnceRunAnywhere'toincludeGPUdevices.
"85.
WhatAMDdevicessupportAparapiAnyAMDAPU,CPU,orGPUthatsupportsOpenCL1.
1andhigherwillwork.
SeethelistofAMDAPPSDKv2.
8SystemRequirementslistat:http://developer.
amd.
com/appsdk.
86.
WherecanIgetsupportforAparapiSee:http://code.
google.
com/p/aparapi/87.
IsAparapisupportedonLinuxYes.
88.
WhichJDK/JREisrecommendedforAparapiTheOracleJDK/JRE,althoughnotarequirementforAparapi,ishighlyrecommendedforperformanceandforstabilityreasons.
89.
WherecanIlearnmoreaboutAparapiYoucanfindoutmoreaboutAparapifrom:http://code.
google.
com/p/aparapi/AMD'sproductsarenotdesigned,intended,authorizedorwarrantedforuseascomponentsinsystemsintendedforsurgicalimplantintothebody,orinotherapplicationsintendedtosupportorsustainlife,orinanyotherapplicationinwhichthefailureofAMD'sproductcouldcreateasituationwherepersonalinjury,death,orseverepropertyorenvironmentaldamagemayoccur.
AMDreservestherighttodiscontinueormakechangestoitsproductsatanytimewithoutnotice.
CopyrightandTrademarks2012AdvancedMicroDevices,Inc.
Allrightsreserved.
AMD,theAMDArrowlogo,ATI,theATIlogo,Radeon,FireStream,andcombinationsthereofaretrade-marksofAdvancedMicroDevices,Inc.
OpenCLandtheOpenCLlogoaretrade-marksofAppleInc.
usedbypermissionbyKhronos.
Othernamesareforinfor-mationalpurposesonlyandmaybetrademarksoftheirrespectiveowners.
ThecontentsofthisdocumentareprovidedinconnectionwithAdvancedMicroDevices,Inc.
("AMD")products.
AMDmakesnorepresentationsorwarrantieswithrespecttotheaccuracyorcompletenessofthecontentsofthispublicationandreservestherighttomakechangestospecificationsandproductdescriptionsatanytimewithoutnotice.
Theinformationcontainedhereinmaybeofapreliminaryoradvancenatureandissubjecttochangewithoutnotice.
Nolicense,whetherexpress,implied,arisingbyestoppelorotherwise,toanyintellectualpropertyrightsisgrantedbythispublication.
ExceptassetforthinAMD'sStandardTermsandConditionsofSale,AMDassumesnoliabilitywhatsoever,anddisclaimsanyexpressorimpliedwar-ranty,relatingtoitsproductsincluding,butnotlimitedto,theimpliedwar-rantyofmerchantability,fitnessforaparticularpurpose,orinfringementofanyintellectualpropertyright.
ContactAdvancedMicroDevices,Inc.
OneAMDPlaceP.
O.
Box3453Sunnyvale,CA,94088-3453Phone:+1.
408.
749.
400013of13FAQForAMDAcceleratedParallelProcessing:URL:developer.
amd.
com/appsdkDeveloping:developer.
amd.
com/Forum:developer.
amd.
com/openclforum

Webhosting24:$1.48/月起,日本东京NTT直连/AMD Ryzen 高性能VPS/美国洛杉矶5950X平台大流量VPS/1Gbps端口/

Webhosting24宣布自7月1日起开始对日本机房的VPS进行NVMe和流量大升级,几乎是翻倍了硬盘和流量,价格依旧不变。目前来看,日本VPS国内过去走的是NTT直连,服务器托管机房应该是CDN77*(也就是datapacket.com),加上高性能平台(AMD Ryzen 9 3900X+NVMe),还是有相当大的性价比的。此外在6月30日,又新增了洛杉矶机房,CPU为AMD Ryzen 9...

CloudCone,美国洛杉矶独立服务器特价优惠,美国洛杉矶MC机房,100Mbps带宽不限流量,可选G口,E3-1270 v2处理器32G内存1Gbps带宽,69美元/月

今天CloudCone发布了最新的消息,推送了几款特价独立服务器/杜甫产品,美国洛杉矶MC机房,分配100Mbps带宽不限流量,可以选择G口限制流量计划方案,存储分配的比较大,选择HDD硬盘的话2TB起,MC机房到大陆地区线路还不错,有需要美国特价独立服务器的朋友可以关注一下。CloudCone怎么样?CloudCone服务器好不好?CloudCone值不值得购买?CloudCone是一家成立于2...

Megalayer(48元)新增 美国CN2优化线路特价服务器和VPS方案

Megalayer 商家算是新晋的服务商,商家才开始的时候主要是以香港、美国独立服务器。后来有新增菲律宾机房,包括有VPS云服务器、独立服务器、站群服务器等产品。线路上有CN2优化带宽、全向带宽和国际带宽,这里有看到商家的特价方案有增加至9个,之前是四个的。在这篇文章中,我来整理看看。第一、香港服务器系列这里香港服务器会根据带宽的不同区别。我这里将香港机房的都整理到一个系列里。核心内存硬盘IP带宽...

amd裁员为你推荐
关键字编程中,什么是关键字地陷裂口山崩地裂的意思www.haole012.com012qq.com真的假的ip在线查询我要用eclipse做个ip在线查询功能,用QQwry数据库,可是我不知道怎么把这个数据库放到我的程序里面去,高手帮忙指点下,小弟在这谢谢了www.765.com有没好的学习网站www.niuav.com在那能找到免费高清电影网站呢 ?www.kanav001.com翻译为日文: 主人,请你收养我一天吧. 带上罗马音标会更好wwwm.kan84.net电视剧海派甜心全集海派甜心在线观看海派甜心全集高清dvd快播迅雷下载抓站工具仿站必备软件有哪些工具?最好好用的仿站工具是那个几个?www.javlibrary.com跪求一个JAVHD.com的帐号
已备案域名注册 绍兴服务器租用 新网域名解析 瓦工 kvmla 分销主机 linode代购 服务器日志分析 parseerror 新站长网 nerds php空间购买 idc查询 网站在线扫描 raid10 yundun 沈阳主机托管 永久免费空间 广州主机托管 fatcow 更多