expensiveamd裁员
amd裁员 时间:2021-04-02 阅读:(
)
FAQ1of13AMDAPPSDKv2.
8FAQ1GeneralQuestions1.
DoIneedtouseadditionalsoftwarewiththeSDKTorunanOpenCLapplication,youmusthaveanOpenCLruntimeonyoursystem.
IfyoursystemincludesarecentAMDdiscreteGPU,oranAPU,youalsoshouldinstallthelatestCatalystdrivers,whichcanbedownloadedfromAMD.
com.
Informationonsupporteddevicescanbefoundatdeveloper.
amd.
com/appsdk.
IfyoursystemdoesnotincludearecentAMDdiscreteGPU,orAPU,theSDKinstallsaCPU-onlyOpenCLrun-time.
Also,werecommendusingthedebuggingprofilingandanalysistoolscontainedintheAMDCodeXLheterogeneouscomputetoolssuite.
2.
WhichversionsoftheOpenCLstandarddoesthisSDKsupportAMDAPPSDK2.
8supportsdevelopmentofapplicationsusingtheOpenCLSpecificationv1.
2.
AsallOpenCL1.
1APIsaresupportedwithinOpenCL1.
2,youalsocandevelopOpenCL1.
1-compliantapplications.
3.
WillapplicationsdevelopedtoexecuteonOpenCL1.
1stilloperateinanOpenCL1.
2environmentOpenCLisdesignedtobebackwardscompatible.
TheOpenCL1.
2run-timedeliveredwiththeAMDCatalystdriversrunanyOpenCL1.
1-compliantapplication.
However,anOpenCL1.
2-compliantapplicationwillnotexecuteonanOpenCL1.
1run-timeifAPIsonlysupportedbyOpenCL1.
2areused.
4.
DoesAMDprovideanyadditionalOpenCLsamples,otherthanthosecontainedwithintheSDKThemostrecentversionsofallofthesamplescontainedwithintheSDKarealsoavailableforindividualdownloadfromthedeveloper.
amd.
com/appsdk"Samples&Demos"page.
ThispagealsocontainsadditionalsamplesthateitherweretoolargetoincludeintheSDK,orwhichhavebeendevelopedsincethemostrecentSDKrelease.
Checkthiswebpagefornew,updated,orlargesamples.
5.
HowoftencanIexpecttogetAMDAPPSDKupdatesDeveloperscanexpectthattheAMDAPPSDKmaybeupdatedtwotothreetimesayear.
Actualreleaseintervalsmayvarydependingonavailablenewfeaturesandproductupdates.
AMDiscommittedtoprovidingdeveloperswithregularupdatestoallowthemtotakeadvantageofthelatestdevelopmentsinAMDAPPtechnology.
2of13FAQ6.
WhatisthedifferencebetweentheCPUandGPUcomponentsofOpenCLthatarebundledwiththeAMDAPPSDKTheCPUcomponentusesthecompatibleCPUcoresinyoursystemtoaccelerateyourOpenCLcomputekernels;theGPUcomponentusesthecompatibleGPUcoresinyoursystemtoaccelerateyourOpenCLcomputekernels.
7.
WhatCPUsdoestheAMDAPPSDKv2.
8withOpenCL1.
2supportworkonTheCPUcomponentofOpenCLbundledwiththeAMDAPPSDKworkswithanyx86CPUwithSSE3orlater,aswellasSSE2.
xorlater.
AMDCPUshavesupportedSSE3(andlater)since2005.
SomeexamplesofAMDCPUsthatsupportSSE3(orlater)aretheAMDAthlon64(startingwiththeVenice/SanDiegosteppings),AMDAthlon64X2,AMDAthlon64FX(startingwithSanDiegostepping),AMDOpteron(startingwithE4stepping),AMDSempron(startingwithPalermostepping),AMDPhenom,AMDTurion64,andAMDTurion64X2.
8.
WhatAPUsandGPUsdoestheAMDAPPSDKv2.
8withOpenCL1.
2supportworkonForthelistofsupportedAPUsandGPUs,seetheAMDAPPSDKv2.
8SystemRequirementslistat:http://developer.
amd.
com/appsdk9.
CanmyOpenCLcoderunonGPUsfromothervendorsAtthistime,AMDdoesnotplantohavetheAMDAPPSDKsupportGPUproductsfromothervendors;however,sinceOpenCLisanindustrystandardprogramminginterface,programswritteninOpenCL1.
2canberecompiledandrunwithanyOpenCL-compliantcompilerandruntime.
10.
WhatversionofMSVisualStudioissupportedTheAMDAPPSDKv2.
8withOpenCL1.
2supportsMicrosoftVisualStudio2008ProfessionalEdition,MicrosoftVisualStudio2010ProfessionalEdition,andMicrosoftVisualStudio2012.
11.
IsitpossibletorunmultipleAMDAPPapplications(computeandgraphics)concurrentlyMultipleAMDAPPapplicationscanberunconcurrently,aslongastheydonotaccessthesameGPUatthesametime.
AMDAPPapplicationsthatattempttoaccessthesameGPUatthesametimeareautomaticallyserializedbytheruntimesystem.
12.
WhichgraphicsdriverisrequiredforthecurrentAMDAPPSDKv2.
8withOpenCL1.
2CPUsupportFortheminimumrequiredgraphicsdriver,seetheAMDAPPSDKv2.
8SystemRequirementslistat:http://developer.
amd.
com/appsdk.
Ingeneral,itisadvisedthatyouupdateyoursystemtousethemostrecentgraphicsdriversthatareavailableforit.
13.
HowdoesOpenCLcomparetootherAPIsandprogrammingplatformsforparallelcomputing,suchasOpenMPandMPIWhichoneshouldIuseOpenCLisdesignedtotargetparallelismwithinasinglesystemandprovideportabilitytomultipledifferenttypesofdevices(GPUs,multi-coreCPUs,etc.
).
OpenMPtargetsmulti-coreFAQ3of13CPUsandSMPsystems.
MPIisamessagepassingprotocolmostoftenusedforcommunicationbetweennodes;itisapopularparallelprogrammingmodelforclustersofmachines.
Eachprogrammingmodelhasitsadvantages.
ItisanticipatedthatdevelopersmixAPIs,forexampleprogrammingaclusterofmachineswithGPUswithMPIandOpenCL.
14.
IfIwritemycodeontheCPUversion,doesitworkontheGPUversion,ordoIhavetomakechanges.
AssumingthesizelimitationsforCPUsisconsidered,thecodeworksonboththeCPUandGPUcomponents.
Performancetuning,however,isdifferentforeach.
15.
WhatistheprecisionofmathematicaloperationsSeeChapter7,"OpenCLNumericalCompliance,"oftheOpenCL1.
2Specificationforexactmathematicaloperationsprecisionrequirements.
http://developer.
amd.
com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.
aspxID=8816.
Arebyte-addressablestoressupportedByte-addressablestoresaresupported.
17.
ArelongintegerssupportedYes,64-bitintegersaresupported.
18.
AreoperationsonvectorssupportedYes,operationsonvectorsaresupported.
19.
IsswizzlingsupportedYes,swizzling(therearrangingofelementsinavector)issupported.
20.
HowdoesoneverifyifOpenCLhasbeeninstalledcorrectlyonthesystemRunclinfofromthecommand-lineinterface.
TheclinfotoolshowstheavailableOpenCLdevicesonasystem.
21.
HowdoIknowifIhaveinstalledthelatestversionoftheAMDAPPSDKOninstallationoftheSDK,ifaninternetconnectionisavailable,theinstallerstateswhetherornotanewerSDKisavailableinthefileVersionInfo.
txtindirectoryC:\ProgramFiles(x86)\AMDAPP\docs\.
Alternatively,youcancheckforthelatestavailableversionoftheAMDAPPSDKathttp://developer.
amd.
com/appsdk.
2Optimizations22.
HowdoIuseconstantsinakernelforbestperformanceForperformanceusingconstants,highesttolowestperformanceisachievedusing:-Literalvalues-Constantpointerwithcompiletimeconstantsindexing.
-Constantpointerwithruntimeconstantindexingthatisthesameforallthreads.
4of13FAQ-Constantpointerwithlinearaccessindexingthatisthesameforallthreads.
-Constantpointerwithlinearaccessindexingthatisdifferentbetweenthreads.
-Constantpointerwithrandomaccessindexing.
23.
WhyareliteralvaluesthefastestwaytouseconstantsUpto96bitsofliteralvaluesareembeddedintheinstruction;thus,intheory,thereisnolimitonthenumberofusableliterals.
Inpractice,thelimitis16Kuniqueliteralsinacompilationunit.
24.
Whydoesa*b+cnotgenerateamadinstructionDependingonthehardwareandthefloatingpointprecision,thecompilermaynotgenerateamadinstructionforthecomputationofa*b+cduetothefloating-pointprecisionrequirementsintheOpenCLspecification.
Here,developerswhowanttoexploitthemadinstructionforperformancecanreplacethatcomputationwiththemad()built-infunctioninOpenCL.
25.
Whatisthepreferredwork-groupsizeThepreferredwork-groupsizeontheAMDplatformisamultipleof64.
3OpenCLQuestions26.
WhatisOpenCLOpenCL(OpenComputingLanguage)isthefirsttrulyopenandroyalty-freeprogrammingstandardforgeneral-purposecomputationsonheterogeneoussystems.
OpenCLletsprogrammerspreservetheirexpensivesourcecodeinvestmentandeasilytargetbothmulti-coreCPUsandthelatestGPUs,suchasthosefromAMD.
Developedinanopenstandardscommitteewithrepresentativesfrommajorindustryvendors,OpenCLgivesusersacross-vendor,non-proprietarysolutionforacceleratingtheirapplicationsontheirCPUandGPUcores.
27.
HowmuchdoestheAMDOpenCLdevelopmentplatformcostAMDbundlessupportforOpenCLaspartofitsAMDAPPSDKproductoffering.
TheAMDAPPSDKisofferedtodevelopersandusersfreeofcharge.
28.
WhatoperatingsystemsdoestheAMDAPPSDKv2.
8withOpenCL1.
2supportAMDAPPSDKv2.
8runson32-bitand64-bitversionsofWindowsandLinux.
Fortheexactlistofsupportedoperatingsystems,seetheAMDAPPSDKv2.
8SystemRequirementslistat:http://developer.
amd.
com/appsdk29.
CanIwriteanOpenCLapplicationthatworksonbothCPUandGPUApplicationsthatprogramtothecoreOpenCL1.
2APIandkernellanguageshouldbeabletotargetbothCPUsandGPUs.
Atruntime,theappropriatedevice(CPUorGPU)mustbeselectedbytheapplication.
FAQ5of1330.
DoestheAMDOpenCLcompilerautomaticallyvectorizeforSSEontheCPUTheCPUcomponentofOpenCLthatisbundledwiththeAMDAPPSDKtakesadvantageofSSE3instructionsontheCPU.
ItalsotakesadvantageoftheAVXinstructionswheresupported.
InadditiontoAVX,OpenCLmathlibraryfunctionsalsoleverageXOPandFMA4capabilitiesonCPUsthatsupportthem.
31.
DoestheAMDAPPSDKv2.
8withOpenCL1.
2supportworkonmultipleGPUs(ATICrossFire)OpenCLapplicationscanexplicitlyinvokeseparatecomputekernelsonmultiplecompatibleGPUsinasinglesystem.
Thepartitioningofthealgorithmtomultipleparallelcomputekernelsmustbedonebythedeveloper.
ItisrecommendedthatATICrossFirebeturnedoffinmostsystemconfigurationssothatAMDAPPapplicationscanaccessallavailableGPUsinthesystem.
ATICrossFiretechnologyallowsmultipleAMDGPUstoworktogetheronasinglegraphics-renderingtask.
ThismethoddoesnotapplytoAMDAPPcomputationaltasksbecauseitisnotcompatiblewiththecomputemodelusedforAMDAPPapplications.
32.
CanIshippre-compiledOpenCLapplicationbinariesthatworkoneitherCPUorGPUByusingOpenCLruntimeAPIs,developerscanwriteOpenCLapplicationsthatcandetecttheavailablecompatibleCPUsandGPUsinthesystem.
Thisletsdeveloperspre-compileapplicationsintobinariesthatdynamicallyworkoneitherCPUsorGPUsthatexecuteontargeteddevices.
IncludingLLVMIRinthebinaryprovidesameansforthebinarytosupportdevicesforwhichtheapplicationwasnotexplicitlypre-compiled.
33.
IstheOpenCLdoubleprecisionoptionalextensionsupportedTheKhronosandAMDdoubleprecisionextensionsaresupportedoncertaindevices.
YourapplicationcanusetheOpenCLAPItoqueryifthisfunctionalityissupportedonthedeviceinuse.
34.
IsitpossibletowriteOpenCLcodethatscalestransparentlyovermultipledevicesForOpenCLprogramsthattargetonlymulti-coreCPUs,scalingcanbedonetransparently;however,scalingacrossmultipleGPUsrequiresthedevelopertoexplicitlypartitionthealgorithmintomultiplecomputekernels,aswellasexplicitlylaunchthecomputekernelsontoeachcompatibleGPU.
35.
WhatshouldIdoifIgetwrongresultsontheAppleplatformwithAMDdevicesApplehandlessupportfortheAppleplatform;pleasecontactthem.
36.
IsitpossibletodynamicallyindexintoavectorNo,thisisnotpossiblebecauseavectorisnotanarray,butarepresentationofahardwareregister.
6of13FAQ37.
Whatisthedifferencebetweenlocalinta[4]andinta[4]localinta[4]useshardwarelocalmemory,whichisasmall,low-latency,high-bandwidthmemory;inta[4]usesper-threadhardwarescratchmemory,whichislocatedinuncachedglobalmemory.
38.
Whydoesusingabarriercausethemaxkernelwork-groupsizetodropto64onHD4XXXchipsThesupportedHD4XXXchipsdonothaveahardwarebarrier,sotheOpenCLruntimecannotexecutemorethanasinglewavefrontpergrouptosatisfytheOpenCLmemoryconsistencymodel.
NotethatHD4XXXdevicesupportisEOL.
Catalystdriversnolongerincludesupportforthesedevices.
SeetheOpenCLSDKdriverandcompatibilitypageformoredetails.
39.
HowcomemyprogramrunsslowerinOpenCLthaninCUDA/Brook+/ILWhencomparingperformance,itisbettertocomparecodeoptimizedforourOpenCLplatformagainstcodeoptimizedagainstanothervendor'sOpenCLplatforms.
Bycomparingthesametoolchainondifferentvendors,youcanfindoutwhichvendorshardwareworksthebestforyourproblemset.
40.
WhycanInotusetextureonRV7XXdevicesinOpenCLRV7XXdevicesdonotsupportallofthetexturemodesandprecisionrequirementsthatOpenCLrequires.
SincetexturesaremappedtoimagesinOpenCLandisan"allornothing"approach,wedonotsupportimagesonRV7XXdevices;thus,thereisnoaccesstotextures.
NotethatHD4XXXdevicesupportisEOL.
Catalystdriversnolongerincludesupportforthesedevices.
SeetheOpenCLSDKdriverandcompatibilitypageformoredetails.
41.
Whydoread-writeimagesnotexistinOpenCLOpenCLhasamemoryconsistencymodelthatrequirescertainconstraints(seetheOpenCLSpecificationformoreinformation).
Sinceimagesarespecialfunctionalhardwareunits,theyaredifferentforreadingandwriting.
Thisisdifferentfrompointers,whichforthemostpartusethesamehardwareunitsandcanguaranteetheconsistencythatOpenCLrequires.
42.
DoesprefetchingworkontheGPUPrefetchisnotneededontheGPUbecausethehardwarehasabuilt-inmechanismtohidelatencywhenmanywork-groupsarerunning.
TheLDScanbeusedasasoftware-controlledcache.
43.
Howdoyoudeterminethemaxnumberofconcurrentwork-groupsThemaximumnumberofconcurrentwork-groupsisdeterminedbyresourceusage.
Thisincludesnumberofregisters,amountofLDSspace,andnumberofthreadsperworkgroup.
Thereisnowaytodirectlyspecifythenumberofregistersusedbyakernel.
EachSIMDhasa64-wideregisterfile,witheachcolumnconsistingof256x32x4registers.
44.
IsitpossibletotellOpenCLnottouseallCPUCoresYes,usethedevicefissionextension.
FAQ7of134OpenCLOptimizations45.
Whatismoreefficient,theternaryoperator:ortheselectfunctionTheselectfunctioncompilestothesinglecycleinstruction,cmov_logical;inmostcases,:alsocompilestothesameinstruction.
Insomecases,whenmemoryisinoneoftheoperands,the:operatoriscompileddowntoanIF/ELSEblock.
AnIF/ELSEblocktakesmorethanasingleinstructiontoexecute.
46.
Whatisthedifferencebetween24-bitand32-bitintegeroperations24-bitoperationsarefasterbecausetheyusefloatingpointhardwareandcanexecuteonallcomputeunits.
Many32-bitintegeroperationsalsorunonallstreamprocessors,butifbotha24-bitanda32-bitversionexistforthesameinstruction,the32-bitinstructionexecutesonlyonepercycle.
5HardwareInformation47.
Howare8/16-bitoperationshandledinhardwareThe8/16-bitoperationsareemulatedwith32-bitregisters.
48.
Do24-bitintegersexistinhardwareNo,thereare24-bitinstructions,suchasMUL24/MAD24,butthesmallestintegerinhardwareregistersis32-bits.
49.
Whatarethebenefitsofusing8/16-bittypesover32-bitintegers8/16-bittypestakelessmemorythana32-bitintegertype,increasingtheamountofdatayouareabletoloadwithasingleinstruction.
TheOpenCLcompilerup-convertsto32-bitsonloadanddown-convertstothecorrecttypeonstore.
50.
WhatisthedifferencebetweenaGPRandasharedregisterAlthoughtheyarephysicallyequivalent,thedifferenceiswhethertheregisteroffsetinthehardwareisabsolutetotheregisterfileorrelativetothewavefrontID.
51.
HowoftenarewavefrontscreatedWavefrontsarecreatedbythehardwaretoexecuteaslongasresourcesareavailable.
Iftheyarecreatedbutcannotexecuteimmediately,theyareputinawaitqueuewheretheystayuntilcurrentlyrunningwavefrontsarefinished.
52.
WhatisthemaximumnumberofwavefrontsThemaximumnumberofwavefrontsisdeterminedbywhichresourcelimitsthenumberofwavefrontsthatcanbespawned.
Thiscanbethenumberofregisters,amountoflocalmemory,requiredstacksize,orotherfactors.
Computeshaderwithlocalmemoryusagehasahardcapat16wavefronts.
8of13FAQ53.
WhydoIgetblueorblackscreenswhenexecutinglongerrunningkernelsTheGPUisnotapreemptabledevice.
IfyouarerunningtheGPUasyourdisplaydevice,ensurethatacomputeprogramdoesnotusetheGPUpastacertaintimelimitsetbyWindows.
Exceedingthetimelimitcausesthewatchdogtimertotrigger;thiscanresultinundefinedprogramresults.
54.
WhatisthecostofaclauseswitchIngeneral,thelatencyofaclauseswitchisaround40cycles.
NotethatthisisrelevantonlyforEvergreenandNorthernIslanddevices.
55.
HowcanIhideclauseswitchlatencyByexecutingmultiplewavefrontsinparallel.
NotethatthisisrelevantonlyforEvergreenandNorthernIslanddevices.
56.
HowcanIreduceclauseswitchesClauseswitchesarealmostdirectlyrelatedtosourceprogramcontrolflow.
Byreducingsourceprogramcontrolflow,clauseswitchescanalsobereduced.
ThisisonlyrelevantforEvergreenandNorthernIslandsdevices.
57.
HowdoesthehardwareexecutealooponawavefrontThelooponlyendsexecutionforawavefrontonceeverythreadinthewavefrontbreaksoutoftheloop.
Onceathreadbreaksoutoftheloop,allofitsexecutionresultsaremasked,buttheexecutionstilloccurs.
58.
HowdoesflowcontrolworkwithwavefrontsTherearenoflowcontrolunitsforeachindividualthread,sothewholewavefrontmustexecutethebranchifanythreadinthewavefrontexecutesthebranch.
Iftheconditionisfalse,theresultsarenotwrittentomemory,buttheexecutionstilloccurs.
59.
WhatistheconstantbuffersizeonGPUhardware64kB.
60.
Whathappenswithout-of-boundmemoryoperationsWritesaredropped,andreadsreturnapre-definedvalue.
61.
For7XXdevices,whydoes64x1givebadperformanceinOpenCLkernelsOneofthereasonsisbecauseofhowthecachesaresetuponRV7XXdevices.
Thecachesareoptimizedtoworkinatiledmode,notinlinearmode(whichisthemodeOpenCLkernelsuse).
Togetoptimalcachere-usefromthetextureincomputeshadermodeonRV7XXdevices,reblockyourthreadIDs.
A16x4,8x8,or4x16shouldgiveyougoodenoughblockingtogetsimilarcacheperformanceasyourpixelshaderkernel.
Thisisbecauseacachelinecanbethoughtofasa4x2blockofdatacominginatonce.
So,forpixelshaders,64threadsareblockedina8x8blockthatusesexactlyeightcachelines.
ForOpenCLkernels,your64x1blockpatternuses16cachelines,butonlyuseshalfthedataineachcacheline.
NotethatHD4XXXdevicesupportisEOL.
CatalystdriversnolongerincludeFAQ9of13supportforthesedevices.
SeetheOpenCLSDKdriverandcompatibilitypageformoredetails.
62.
WhatisuniqueabouttheLDSinHD4XXXdevices,andwhatareitsperformancecharacteristicsTheLDSintheHD4XXXdevicesisanowner'swritemodelwithlimitedapplications.
Whenusedcorrectly,ithasverysimilarperformancecharacteristicstotheL1cache,buttheusergainscontroloverwhatdataexistsinthememory.
TheLDS_TransposesampleintheSDKusestheLDSintheHD4XXXdevicesveryefficiently.
NotethatHD4XXXdevicesupportisEOL.
Catalystdriversnolongerincludesupportforthesedevices.
SeetheOpenCLSDKdriverandcompatibilitypageformoredetails.
6MicrosoftVisualStudio63.
CanIusetheSDKwithMicrosoftVisualStudio2012ExpressDuetolimitationsinMicrosoftVisualStudioExpress,itisonlypossibletousebuildfilesforindividualsamples.
MicrosoftVisualStudioExpressdoesnotsupportbuildingofallofthesamplesatthesametime.
TheprojectfilesthatbuildallofthesamplesareonlysupportedbyfullversionsofMicrosoftVisualStudio2008,2010,or2012.
7Bolt64.
WhatGPUsisBoltoptimizedforBoltisoptimizedforAMD"SouthernIslands"familyofGPUs.
65.
WhichGPUcomputetechnologiesaresupportedbyBoltThepreviewversionofBoltusesOpenCLasitscomputeback-end;however,futurereleasesshouldsupportbothC++AMPandOpenCLcomputeback-ends.
66.
AretheBoltAPIsfinalTheAPIsintheBoltpreviewarenotfinal.
TheBoltAPIshouldbestableandshoulduseformaldeprecationproceduresfromBoltrelease1.
0onwards.
PleasehelpustoimproveBolt;weencourageyoutoprovidesuggestionsonhowwecanimprovetheBoltAPI.
67.
DoesBoltrequirelinkingtoalibraryYes,Boltincludesasmall,staticlibrarythatmustbelinkedintoauserprogramtoproperlyresolvesymbols.
SinceBoltisatemplatelibrary,almostallfunctionalityiswritteninheaderfiles,butOpenCLfollowsanonlinecompilationmodelwhichmakessensetoincludeinalibraryandnotinheaderfiles.
68.
DoesBoltdependonanyotherlibrariesYes,BoltcontainsdependenciesonBoost.
Alldependentheaderfilesandpre-compiledlibrariesareavailableintheBoltSDK.
10of13FAQ69.
WhatalgorithmsdoesBoltcurrentlysupportTransform,reduce,transform_reduce,count,count_if,inclusive_scan,exclusive_scan,andsort.
70.
WhatversionofOpenCLdoesBoltcurrentlyrequireBoltusesfeaturesavailableinOpenCLv1.
2.
71.
DoesBoltrequiretheAMDOpenCLruntimeYes,Boltrequirestheuseoftemplatesinkernelcode.
Atthetimeofthiswriting,AMDistheonlyvendortoprovidethissupport.
72.
WhichCatalystpackageshouldIusewithBoltGenerallyspeaking,downloadingandinstallingthelatestCatalystpackagecontainsthemostrecentOpenCLruntime.
Asofthetimeofthiswriting,therecommendedCatalystpackageis12.
10.
73.
WhichlicenseisBoltlicensedunderTheApacheLicense,Version2.
0.
74.
WhenshouldIusedevice_vectorvsregularhostmemorybolt::cl::device_vectorisusedtomanagedevice-localmemoryandcandeliverhigherperformanceondiscreteGPUsystem.
However,thehostmemoryinterfaceseliminatetheneedtocreateandmanagedevice_vectors.
Ifmemoryisre-usedacrossmultipleBoltcallsorisreferencedbyotherkernels,usingdevice_vectordelivershigherperformance75.
HowdoImeasuretheperformanceofOpenCLBoltlibrarycallsThefirsttimethataBoltlibraryroutineiscalled,theruntimecallstheOpenCLcompilertocompilethekernel.
EachuniquetemplateinstantiationreusesthesameOpenCLcompilation,sotwoBoltcallswiththesamefunctorsarecompiledonlyonetime.
WhenmeasuringtheperformanceofBolt,itisimportanttoexcludethefirstcall(withthecompilationoverhead)fromthetimingmeasurement.
HerearetwowaystotimeBoltfunctioncalls:Thefirstexamplepullsthecalloutsidethetimerloop.
std::vectora;bolt::cl::transform(a.
begin,a.
end,z.
begin(),bolt::cl::negate);startTimer(.
.
.
);for(inti=0;i);}stopTimer(.
.
.
);Asecondalternativeistoexplicitlyexcludethefirstcompilationfromthetimingregion://Explicitlystarttimerafterfirstiteration:std::vectora,z;for(inti=0;i);FAQ11of13}stopTimer(.
.
.
);76.
TheBoltlibraryAPIsaccepthostdatastructuressuchasstd::vector().
ArethesecopiedtotheGPUorkeptinhostmemoryForhostdatastructures,BoltcreatesanOpenCLbufferwiththeCL_MEM_USE_HOST_PTRflagandusestheOpenCLmapandunmapAPIstomanagethehostanddeviceaccess.
OntheAMDOpenCLimplementation,thisresultsindatabeingkeptinhostmemorywhenthehostmemoryisalignedon256-byteboundary.
Inothercases,theruntimecopiesthedataasneededtotheGPU.
Seesection4.
5.
2oftheAMDAcceleratedParallelProcessingOpenCLProgrammingGuideformoreinformationonmemoryoptimization.
Higher-performancemaybeobtainedbyaligningthehostdatastructuresto256-byteboundariesorbyusingthedevice_vectorclassforfinercontrolovertheallocationproperties.
77.
BoltAPIsalsoacceptdevice_vectorinputs.
Whenshouldusedevice_vectorsformyinputsThebolt::cl::device_vectorinterfaceisathinwrapperaroundthecl::Bufferclass;itprovidesaninterfacesimilartostd::vector.
Also,thedevice_vectoroptionallyacceptsaflagsargumentthatispassedtotheOpenCLbuffercreation.
Thefollowingtwoflagscanbeuseful:CL_MEM_ALLOC_HOST_POINTER:Forcesmemorytobeallocatedinhostmemory,evenondiscreteGPUs.
GPUsaccesstothedataisslower(overthePCIebus),buttheallocationavoidstheoverheadofcopyingdatato,andfrom,theGPU.
ThiscanbeusefulwhenthedataisusedinonlyasingleBoltcallbeforebeingprocessedagainontheCPU.
CL_MEM_USE_PERSISTENT_MEM_AMD:Providesdevice-residentzerocopymemory.
Usethisfordatathatiswrite-onlybytheCPU.
Seesection4.
5.
2oftheAMDAcceleratedParallelProcessingOpenCLProgrammingGuideformoreinformation.
8C++AMP78.
WhatisC++AMPC++AcceleratedMassiveParallelism(C++AMP)acceleratesexecutionofC++codebytakingadvantageofdata-parallelhardware,suchasthecomputeunitsonandAPU.
C++AMPisanopenspecificationthatextendsregularC++throughtheuseoftherestrictkeywordtodenoteportionsofcodetobeexecutedontheaccelerateddevice.
79.
WhatAMDdevicessupportC++AMPAnyAMDAPUorGPUthatsupportsMicrosoftDirectX11FeatureLevel11.
0andhigher.
80.
CanyourunaC++AMPprogramonaCPUYoucanrunaC++AMPprogramonaCPU,butitsusageisnotrecommendedinaproductionenvironment.
Formoreinformation,see:http://blogs.
msdn.
com/b/nativeconcurrency/archive/2012/03/10/cpu-accelerator-in-c-amp.
aspx81.
WherecanIgetsupportforC++AMPC++AMPissupportedbytheC++compilerinMicrosoftVisualStudio2012.
12of13FAQ82.
IsC++AMPsupportedonLinuxAtthistime,C++AMPisnotsupportedonLinux;however,asC++AMPisdefinedasanopenstandard,MicrosofthasopenedthedoorforimplementationsonoperatingsystemsotherthanWindows.
83.
WherecanIlearnmoreaboutC++AMPThereisanMSDNpageinC++AMP.
Also,thebookC++AMP:AcceleratedMassiveParallelismwithMicrosoftVisualC++,byKateGregoryandAdeMiller,maybeuseful.
9Aparapi84.
WhatisAparapiAparapiisanAPIforexpressingdataparallelalgorithmsinJavaandaruntimecomponentcapableofconvertingJavabytecodetoOpenCLforexecutionontheGPU.
IfAparapicannotexecuteontheGPUatruntime,itexecutesthedeveloper'salgorithminaJavathreadpool.
Forappropriateworkloads,thisextendsJava's'WriteOnceRunAnywhere'toincludeGPUdevices.
"85.
WhatAMDdevicessupportAparapiAnyAMDAPU,CPU,orGPUthatsupportsOpenCL1.
1andhigherwillwork.
SeethelistofAMDAPPSDKv2.
8SystemRequirementslistat:http://developer.
amd.
com/appsdk.
86.
WherecanIgetsupportforAparapiSee:http://code.
google.
com/p/aparapi/87.
IsAparapisupportedonLinuxYes.
88.
WhichJDK/JREisrecommendedforAparapiTheOracleJDK/JRE,althoughnotarequirementforAparapi,ishighlyrecommendedforperformanceandforstabilityreasons.
89.
WherecanIlearnmoreaboutAparapiYoucanfindoutmoreaboutAparapifrom:http://code.
google.
com/p/aparapi/AMD'sproductsarenotdesigned,intended,authorizedorwarrantedforuseascomponentsinsystemsintendedforsurgicalimplantintothebody,orinotherapplicationsintendedtosupportorsustainlife,orinanyotherapplicationinwhichthefailureofAMD'sproductcouldcreateasituationwherepersonalinjury,death,orseverepropertyorenvironmentaldamagemayoccur.
AMDreservestherighttodiscontinueormakechangestoitsproductsatanytimewithoutnotice.
CopyrightandTrademarks2012AdvancedMicroDevices,Inc.
Allrightsreserved.
AMD,theAMDArrowlogo,ATI,theATIlogo,Radeon,FireStream,andcombinationsthereofaretrade-marksofAdvancedMicroDevices,Inc.
OpenCLandtheOpenCLlogoaretrade-marksofAppleInc.
usedbypermissionbyKhronos.
Othernamesareforinfor-mationalpurposesonlyandmaybetrademarksoftheirrespectiveowners.
ThecontentsofthisdocumentareprovidedinconnectionwithAdvancedMicroDevices,Inc.
("AMD")products.
AMDmakesnorepresentationsorwarrantieswithrespecttotheaccuracyorcompletenessofthecontentsofthispublicationandreservestherighttomakechangestospecificationsandproductdescriptionsatanytimewithoutnotice.
Theinformationcontainedhereinmaybeofapreliminaryoradvancenatureandissubjecttochangewithoutnotice.
Nolicense,whetherexpress,implied,arisingbyestoppelorotherwise,toanyintellectualpropertyrightsisgrantedbythispublication.
ExceptassetforthinAMD'sStandardTermsandConditionsofSale,AMDassumesnoliabilitywhatsoever,anddisclaimsanyexpressorimpliedwar-ranty,relatingtoitsproductsincluding,butnotlimitedto,theimpliedwar-rantyofmerchantability,fitnessforaparticularpurpose,orinfringementofanyintellectualpropertyright.
ContactAdvancedMicroDevices,Inc.
OneAMDPlaceP.
O.
Box3453Sunnyvale,CA,94088-3453Phone:+1.
408.
749.
400013of13FAQForAMDAcceleratedParallelProcessing:URL:developer.
amd.
com/appsdkDeveloping:developer.
amd.
com/Forum:developer.
amd.
com/openclforum
Boomer.Host是一家比较新的国外主机商,虽然LEB自述 we’re now more than 2 year old,商家提供虚拟主机和VPS,其中VPS主机基于OpenVZ架构,数据中心为美国得克萨斯州休斯敦。目前,商家在LET发了两款特别促销套餐,年付最低3.5美元起,特别提醒:低价低配,且必须年付,请务必自行斟酌确定需求再入手。下面列出几款促销套餐的配置信息。CPU:1core内存:...
百纵科技:美国云服务器活动重磅来袭,洛杉矶C3机房 带金盾高防,会员后台可自助管理防火墙,添加黑白名单 CC策略开启低中高.CPU全系列E52680v3 DDR4内存 三星固态盘列阵。另有高防清洗!百纵科技官网:https://www.baizon.cn/联系QQ:3005827206美国洛杉矶 CN2 云服务器CPU内存带宽数据盘防御价格活动活动地址1核1G10M10G10G38/月续费同价点击...
RepriseHosting是成立于2012年的国外主机商,提供独立服务器租用和VPS主机等产品,数据中心在美国西雅图和拉斯维加斯机房。商家提供的独立服务器以较低的价格为主,目前针对西雅图机房部分独立服务器提供的优惠仍然有效,除了价格折扣外,还免费升级内存和带宽,商家支持使用支付宝或者PayPal、信用卡等付款方式。配置一 $27.97/月CPU:Intel Xeon L5640内存:16GB(原...
amd裁员为你推荐
著作权登记什么是著作权登记沙滩捡12块石头价值近百万朋友从内蒙古阿拉善那边的戈壁捡了很多石头,求大神们鉴定一下,据说那边产玛瑙。谢谢大神们,大大的悬赏百度商城百度知道一般一天能挣多少钱?百度关键词价格查询百度关键字如何设定竟价价格?冯媛甑谁知道怎么找到冯媛甄的具体资料?同一ip网站最近我们网站老是出现同一个IP无数次的进我们网站,而且是在同一时刻,是不是被人刷了?为什么呀?百度关键词分析百度关键字分析是什么意思?网站检测请问论文检测网站好的有那些?javmoo.com0904-javbo.net_avop210hhb主人公叫什么,好喜欢,有知道的吗www.585ccc.com手机ccc认证查询,求网址
安徽双线服务器租用 仿牌空间 BWH 紫田 优key ubuntu更新源 hnyd 丹弗 天互数据 183是联通还是移动 秒杀汇 免费全能主机 服务器干什么用的 最好的qq空间 空间技术网 国外免费asp空间 gtt 空间首页登陆 德隆中文网 服务器防火墙 更多