CRNet: Cross-Reference Networks for Few-Shot Segmentation

Weide Liu1, Chi Zhang1, Guosheng Lin1, Fayao Liu2
1 Nanyang Technological University, Singapore
2 A*Star, Singapore
E-mail: weide001@e.ntu.edu.sg, chi007@e.ntu.edu.sg, gslin@ntu.edu.sg

Abstract

Over the past few years, state-of-the-art image segmentation algorithms have been based on deep convolutional neural networks. To give a deep network the ability to understand a concept, humans need to collect a large amount of pixel-level annotated data to train the models, which is time-consuming and tedious. Recently, few-shot segmentation has been proposed to solve this problem. Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images. In this paper, we propose a cross-reference network (CRNet) for few-shot segmentation. Unlike previous works, which only predict the mask in the query image, our proposed model concurrently makes predictions for both the support image and the query image. With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images, thus helping the few-shot segmentation task. We also develop a mask refinement module to recurrently refine the prediction of the foreground regions. For k-shot learning, we propose to finetune parts of the network to take advantage of multiple labeled support images. Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.

1. Introduction

Deep neural networks have been widely applied to visual understanding tasks, e.g., object detection, semantic segmentation and image captioning, since the huge success in the ImageNet classification challenge [4]. Because deep models are data-driven, large-scale labeled datasets are often required to train them. However, collecting labeled data can be notoriously expensive in tasks like semantic segmentation, instance segmentation, and video segmentation. Moreover, data is usually collected for a set of specific categories. Knowledge learned on previous classes can hardly be transferred to unseen classes directly. Directly finetuning the trained models still needs a large amount of new labeled data. Few-shot learning, on the other hand, is proposed to solve this problem. In few-shot learning tasks, models trained on previous tasks are expected to generalize to unseen tasks with only a few labeled training images.

Corresponding author: G. Lin (e-mail: gslin@ntu.edu.sg)

Figure 1. Comparison of our proposed CRNet against previous work. Previous work (upper part) unilaterally guides the segmentation of query images with support images, while in our CRNet (lower part) support and query images can guide the segmentation of each other.

In this paper, we target few-shot image segmentation. Given a novel object category, few-shot segmentation aims to find the foreground regions of this category after seeing only a few labeled examples. Many previous works formulate the few-shot segmentation task as a guided segmentation task. The guidance information is extracted from the labeled support set for the foreground prediction in the query image, which is usually achieved by an asymmetric two-branch network structure. The model is optimized with the ground truth query mask as the supervision.

In our work, we argue that the roles of the query and support sets can be switched in a few-shot segmentation model. Specifically, the support images can guide the prediction of the query set, and conversely, the query image can also help make predictions for the support set. Inspired by the image co-segmentation literature [7, 12, 1], we propose a symmetric Cross-Reference Network in which two heads concurrently make predictions for both the query image and the support image. The difference between our network design and previous works is shown in Fig. 1. The key component in our network design is the cross-reference module, which generates reinforced feature representations by comparing the co-occurrent features in two images. The reinforced representations are used for the downstream foreground predictions in the two images. In the meantime, the cross-reference module also makes predictions of co-occurrent objects in the two images. This sub-task provides an auxiliary loss in the training phase to facilitate the training of the cross-reference module.

As there exists huge variance in object appearance, mining foreground regions in images can be a multi-step process. We develop an effective Mask Refinement Module to iteratively refine our predictions. In the initial prediction, the network is expected to locate high-confidence seed regions. Then, the confidence map, in the form of a probability map, is saved as the cache in the module and is used for later predictions. We update the cache every time we make a new prediction. After running the mask refinement module for a few steps, our model can better predict the foreground regions. We empirically demonstrate that such a light-weight module can significantly improve the performance.

When it comes to k-shot image segmentation, where more than one support image is provided, previous methods often use a 1-shot model to make predictions with each support image individually and fuse their features or predicted masks. In our paper, we propose to finetune parts of our network with the labeled support examples. As our network can make predictions for both image inputs at a time, we can use at most $k^2$ image pairs to finetune our network. An advantage of our finetuning based method is that it can benefit from an increasing number of support images, and thus consistently increases the accuracy. In comparison, the fusion-based methods can easily saturate when more support images are provided. In our experiments, we validate our model in the 1-shot, 5-shot, and 10-shot settings.

The main contributions of this paper are listed as follows:

- We propose a novel cross-reference network that concurrently makes predictions for both the query set and the support set in the few-shot image segmentation task. By mining the co-occurrent features in two images, our proposed network can effectively improve the results.
- We develop a mask refinement module with a confidence cache that is able to recurrently refine the predicted results.
- We propose a finetuning scheme for k-shot learning, which turns out to be an effective solution to handle multiple support images.
- Experiments on PASCAL VOC 2012 demonstrate that our method significantly outperforms baseline results and achieves new state-of-the-art performance on the 5-shot segmentation task.

2. Related Work

2.1. Few-shot learning

Few-shot learning aims to learn a model which can be easily transferred to new tasks with limited training data available. Few-shot learning is widely explored in image classification tasks. Previous methods can be roughly divided into two categories based on whether the model needs finetuning at testing time. In non-finetuned methods, parameters learned at training time are kept fixed at the testing stage. For example, [19, 22, 21, 24] are metric based approaches where an embedding encoder and a distance metric are learned to determine the image pair similarity. These methods have the advantage of fast inference without further parameter adaptation. However, when multiple support images are available, the performance can easily become saturated. In finetuning based methods, the model parameters need to be adapted to the new tasks for predictions. For example, the authors of [3] demonstrate that by only finetuning the fully connected layer, models learned on training classes can yield state-of-the-art few-shot performance on new classes. In our work, we use a non-finetuned feed-forward model to handle 1-shot learning and adopt model finetuning in the k-shot setting to benefit from multiple labeled support images. The task of few-shot learning is also related to the open set problem [20], where the goal is only to detect data from novel classes.

2.2. Segmentation

Semantic segmentation is a fundamental computer vision task which aims to classify each pixel in the image. State-of-the-art methods formulate image segmentation as a dense prediction task and adopt fully convolutional networks to make predictions [2, 11]. Usually, a pre-trained classification network is used as the network backbone by removing the fully connected layers at the end. To make pixel-level dense predictions, encoder-decoder structures [9, 11] are often used to reconstruct high-resolution prediction maps. Typically an encoder gradually downsamples the feature maps, which aims to acquire a large field-of-view and capture abstract feature representations. Then, the decoder gradually recovers the fine-grained information. Skip connections are often used to fuse high-level and low-level features for better predictions. In our network, we also follow the encoder-decoder design and opt to transfer the guidance information in the low-resolution maps and use decoders to recover details.

2.3. Few-shot segmentation

Few-shot segmentation is a natural extension of few-shot classification to the pixel level. Since Shaban et al. [17] proposed this task for the first time, many deep learning-based methods have been proposed. Most previous works formulate few-shot segmentation as a guided segmentation task. For example, in [17], the side branch takes the labeled support image as the input and regresses the network parameters in the main branch to make foreground predictions for the query image. The authors of [26] share the same spirit and propose to fuse the embeddings of the support branches into the query branch with a dense comparison module. Dong et al. [5] draw inspiration from the success of the Prototypical Network [19] in few-shot classification, and propose dense prototype learning with Euclidean distance as the metric for segmentation tasks. Similarly, Zhang et al. [27] propose a cosine similarity guidance network to weight features for the foreground predictions in the query branch. There are some previous works using recurrent structures to refine the segmentation predictions [6, 26]. All previous methods only use the foreground mask in the query image as the training supervision, while in our network, the query set and the support set guide each other and both branches make foreground predictions for training supervision.

2.4. Image co-segmentation

Image co-segmentation is a well-studied task which aims to jointly segment the common objects in paired images. Many approaches have been proposed to solve the object co-segmentation problem. Rother et al. [15] propose to minimize an energy function of a histogram matching term with an MRF to enforce similar foreground statistics. Rubinstein et al. [16] capture the sparsity and visual variability of the common object from pairs of images with dense correspondences. Joulin et al. [7] solve the common object problem with an efficient convex quadratic approximation of energy with discriminative clustering. With the prevalence of deep neural networks, many deep learning-based methods have been proposed. In [12], the model retrieves common object proposals with a Siamese network. Chen et al. [1] adopt channel attention to weight features for the co-segmentation task. Deep learning-based approaches have significantly outperformed non-learning based methods.

3. Task Definition

Few-shot segmentation aims to find the foreground pixels in the test images given only a few pixel-level annotated images. The training and testing of the model are conducted on two datasets with no overlapping categories. At both the training and testing stages, the labeled example images are called the support set, which serves as a meta-training set, and the unlabeled meta-testing image is called the query set. To guarantee good generalization performance at test time, the training and evaluation of the model are accomplished by episodically sampling the support set and the query set. Given a network $R_\theta$ parameterized by $\theta$, in each episode, we first sample a target category $c$ from the dataset $C$. Based on the sampled class, we then sample $k+1$ labeled images $\{(x_s^1, y_s^1), (x_s^2, y_s^2), \ldots, (x_s^k, y_s^k), (x_q, y_q)\}$ that all contain the sampled category $c$. Among them, the first $k$ labeled images constitute the support set $S$ and the last one is the query set $Q$. After that, we make predictions on the query image by inputting the support set and the query image into the model: $\hat{y}_q = R_\theta(S, x_q)$. At training time, we learn the model parameters $\theta$ by optimizing the cross-entropy loss $L(\hat{y}_q, y_q)$, and repeat this procedure until convergence.
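
The episodic sampling described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' data loader: the `images_by_class` dictionary layout and the function name `sample_episode` are assumptions made for the example.

```python
import random

def sample_episode(images_by_class, k, rng=random):
    """Sample one k-shot episode: a class c, k support pairs, and one query pair.

    images_by_class maps a class name to a list of (image, mask) pairs that
    contain that class. This layout is a hypothetical convenience for the sketch.
    """
    # Only classes with at least k + 1 labeled images can form an episode.
    eligible = [c for c, items in images_by_class.items() if len(items) >= k + 1]
    if not eligible:
        raise ValueError("no class has at least k + 1 labeled images")
    c = rng.choice(eligible)
    # Draw k + 1 distinct labeled images that all contain class c.
    drawn = rng.sample(images_by_class[c], k + 1)
    support, query = drawn[:k], drawn[k]
    return c, support, query
```

Passing a seeded `random.Random` instance as `rng` makes the sampled episodes reproducible, which may help when comparing runs.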

4. Method

In this section, we introduce the proposed cross-reference network for solving few-shot image segmentation. We first describe our network in the 1-shot case. After that, we describe our finetuning scheme in the case of k-shot learning. Our network includes four key modules: the Siamese encoder, the cross-reference module, the condition module, and the mask refinement module. The overall architecture is shown in Fig. 2.

4.1. Method overview

Different from existing few-shot segmentation methods [26, 17, 5], which unilaterally guide the segmentation of query images with support images, our proposed CRNet enables support and query images to guide the segmentation of each other. We argue that the relationship between support-query image pairs is vital to few-shot segmentation learning. Experiments in Table 2 validate the effectiveness of our new architecture design. As shown in Figure 2, our model learns to perform few-shot segmentation as follows: for every query-support pair, we encode the image pair into deep features with the Siamese encoder, then apply the cross-reference module to mine co-occurrent object features. To fully utilize the annotated mask, the condition module incorporates the category information of the support set annotations for foreground mask predictions, and our mask refinement module caches the confidence region maps recurrently for the final foreground prediction. In the case of k-shot learning, previous works [27, 26, 17] simply average the results of different 1-shot predictions, while we adopt an optimization-based method that finetunes the model to make use of more support data. Table 4 demonstrates the advantages of our method over previous works.

Figure 2. The pipeline of our network architecture. Our network mainly consists of a Siamese encoder, a cross-reference module, a condition module, and a mask refinement module. Our network adopts a symmetric design. The Siamese encoder maps the query and support images into feature representations. The cross-reference module mines the co-occurrent features in two images to generate reinforced representations. The condition module fuses the category-relevant feature vectors into feature maps to emphasize the target category. The mask refinement module saves the confidence maps of the last prediction into the cache and recurrently refines the predicted masks.

4.2. Siamese encoder

The Siamese encoder is a pair of parameter-shared convolutional neural networks that encode the query image and the support image into feature maps. Unlike the models in [17, 14], we use a shared feature encoder to encode the support and the query images. By embedding the images into the same space, our cross-reference module can better mine co-occurrent features to locate the foreground regions. To acquire representative feature embeddings, we use skip connections to utilize multiple-layer features. As observed in the CNN feature visualization literature [26, 23], features in lower layers often relate to low-level cues and features in higher layers often relate to segment cues, so we combine the lower-level and higher-level features and pass them to the following modules.

4.3. Cross-Reference Module

The cross-reference module is designed to mine co-occurrent features in two images and generate updated representations. The design of the module is shown in Fig. 3. Given two input feature maps generated by the Siamese encoder, we first use global average pooling to acquire the global statistics of the two images. Then, the two feature vectors are sent to a pair of two-layer fully connected (FC) networks, one per branch. The Sigmoid activation function attached after the FC layers transforms the vector values into channel importances in the range [0, 1]. After that, the vectors in the two branches are fused by element-wise multiplication. Intuitively, only the common features in the two branches will have a high activation in the fused importance vector. Finally, we use the fused vector to weight the input feature maps to generate reinforced feature representations. In comparison to the raw features, the reinforced features focus more on the co-occurrent representations.
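
The channel re-weighting just described can be illustrated with a small pure-Python sketch. It is not the authors' implementation: feature maps are plain nested lists of shape C x H x W, the FC weights are supplied by the caller, and the ReLU between the two FC layers is an assumption, since the text does not name the intermediate activation.

```python
import math

def global_average_pool(feat):
    """feat: list of C channels, each an H x W nested list. Returns C means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def linear(vec, weight, bias):
    """Dense layer: weight is a list of output rows, each of len(vec)."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]

def channel_importance(feat, fc1, fc2):
    """GAP -> FC -> ReLU (assumed) -> FC -> sigmoid, giving values in [0, 1]."""
    hidden = [max(0.0, h) for h in linear(global_average_pool(feat), *fc1)]
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, *fc2)]

def cross_reference(feat_s, feat_q, fc_s, fc_q):
    """Return reinforced maps (G_s, G_q) and the fused importance vector."""
    imp_s = channel_importance(feat_s, *fc_s)
    imp_q = channel_importance(feat_q, *fc_q)
    fused = [a * b for a, b in zip(imp_s, imp_q)]  # high only if high in both

    def reweight(feat):
        # Scale every pixel of channel c by the fused importance of channel c.
        return [[[fused[c] * v for v in row] for row in ch] for c, ch in enumerate(feat)]

    return reweight(feat_s), reweight(feat_q), fused
```

Because the two importance vectors are multiplied, a channel that is strongly activated in only one image receives a small fused weight, which matches the intuition given in the text.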

Based on the reinforced feature representations, we add a head to directly predict the co-occurrent objects in the two images during training time. This sub-task aims to facilitate the learning of the co-segmentation module to mine better feature representations for the downstream tasks. To generate the predictions of the co-occurrent objects in the two images, the reinforced feature maps in the two branches are sent to a decoder to generate the predicted maps. The decoder is composed of a convolutional layer followed by ASPP [2] layers; finally, a convolutional layer generates a two-channel prediction corresponding to the foreground and background scores.

Figure 3. The cross-reference module. Given the input feature maps from the support and the query sets ($F_s$, $F_q$), the cross-reference module generates updated feature representations ($G_s$, $G_q$) by inspecting the co-occurrent features.

4.4. Condition Module

To fully utilize the support set annotations, we design a condition module to efficiently incorporate the category information for foreground mask predictions. The condition module takes the reinforced feature representations generated by the cross-reference module and a category-relevant vector as inputs. The category-relevant vector is the fused feature embedding of the target category, which is obtained by applying foreground average pooling [26] over the category region. As the goal of few-shot segmentation is to find only the foreground mask of the assigned object category, the task-relevant vector serves as a condition to segment the target category. To achieve a category-relevant embedding, previous works opt to filter out the background regions in the input images [14, 17] or in the feature representations [26, 27]. We choose to do so both at the feature level and in the input image. The category-relevant vector is fused with the reinforced feature maps in the condition module by bilinearly upsampling the vector to the same spatial size as the feature maps and concatenating them. Finally, we add a residual convolution to process the concatenated features. The structure of the condition module can be found in Fig. 4. The condition modules in the support branch and the query branch have the same structure and share all the parameters.
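
As a rough illustration of the two operations in this section, the sketch below computes a category-relevant vector by foreground average pooling and concatenates it to a feature map. It uses nested lists of shape C x H x W and omits the residual convolution. Upsampling a single vector (a 1 x 1 map) bilinearly yields a constant map, so the sketch simply tiles the vector; the function names are assumptions made for the example.

```python
def foreground_average_pool(feat, mask):
    """Average each channel of feat (C x H x W lists) over pixels where mask == 1."""
    area = sum(sum(row) for row in mask)
    if area == 0:
        raise ValueError("mask has no foreground pixels")
    return [
        sum(v * m for frow, mrow in zip(ch, mask) for v, m in zip(frow, mrow)) / area
        for ch in feat
    ]

def condition(feat, vector):
    """Tile the category vector to H x W and concatenate it along channels."""
    h, w = len(feat[0]), len(feat[0][0])
    tiled = [[[v] * w for _ in range(h)] for v in vector]
    return feat + tiled
```

With a C-channel feature map and a C-dimensional vector, the output has 2C channels, which a following convolution could reduce again.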

4.5. Mask Refinement Module

As is often observed in the weakly supervised semantic segmentation literature [26, 8], directly predicting the object masks can be difficult. It is a common principle to first locate seed regions and then refine the results. Based on this principle, we design a mask refinement module to refine the predicted mask step by step. Our motivation is that the probability maps in a single feed-forward prediction can reflect where the confident regions are in the model prediction. Based on the confident regions and the image features, we can gradually optimize the mask and find the whole object regions. As shown in Fig. 5, our mask refinement module has two inputs. One is the saved confidence map in the cache, and the second is the concatenation of the outputs from the condition module and the cross-reference module. For the initial prediction, the cache is initialized with a zero mask, and the module makes predictions solely based on the input feature maps. The module cache is updated with the generated probability map every time the module makes a new prediction. We run this module multiple times to generate a final refined mask.

Figure 4. The condition module. Our condition module fuses the category-relevant features into representations for better predictions of the target category.

The mask refinement module includes three main blocks: the downsample block, the global convolution block, and the combine block. The downsample block downsamples the feature maps by a factor of 2. The downsampled features are then upsampled to the original size and fused with features in the opposite branch. The global convolution block [13] aims to capture features in a large field-of-view while containing few parameters. It includes two groups of 1×7 and 7×1 convolutional kernels. The combine block effectively fuses the feature branch and the cached branch to generate refined feature representations.

4.6. Finetuning for K-Shot Learning

In the case of k-shot learning, we propose to finetune our network to take advantage of multiple labeled support images. As our network can make predictions for two images at a time, we can use at most $k^2$ image pairs to finetune our network. At the evaluation stage, we randomly sample an image pair from the labeled support set to finetune our model. We keep the parameters in the Siamese encoder fixed and only finetune the remaining modules. In our experiments, we demonstrate that our finetuning based method can consistently improve the result when more labeled support images are available, while the fusion-based methods in previous works often reach saturated performance when the number of support images increases.
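
To make the $k^2$ bound concrete, the sketch below enumerates every ordered pair that can be drawn from k labeled support images and samples one of them. Whether the authors include pairs of an image with itself is not stated in the text; counting them is the assumption that gives exactly k * k pairs. The function names are hypothetical.

```python
import itertools
import random

def finetune_pairs(support):
    """All ordered (first, second) pairs from the support set: k * k of them."""
    return list(itertools.product(support, repeat=2))

def sample_pair(support, rng=random):
    """Draw one ordered pair uniformly, as one finetuning step might."""
    return rng.choice(finetune_pairs(support))
```

For a 5-shot episode this yields 25 candidate pairs, and for 10 shots it yields 100, so the amount of finetuning data grows quadratically with the number of support images.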

Figure 5. The mask refinement module. The module saves the generated probability map from the last step into the cache and recurrently optimizes the predictions.

5. Experiment

5.1. Implementation Details

In the Siamese encoder, we exploit multi-level features from the ImageNet pre-trained ResNet-50 as the image representations. We use dilated convolutions to keep the feature maps after layer3 and layer4 at a fixed size of 1/8 of the input image, and concatenate them for the final prediction. All the convolutional layers in our proposed modules have a kernel size of 3×3 and generate features of 256 channels, followed by the ReLU activation function. At test time, we recurrently run the mask refinement module 5 times to refine the predicted masks. In the case of k-shot learning, we fix the Siamese encoder and finetune the remaining parameters.
2.
DatasetandEvaluationMetricWeimplementcross-validationexperimentsonthePAS-CALVOC2012datasettovalidateournetworkdesign.
Tocompareourmodelwithpreviousworks,weadoptthesamecategorydivisionsandtestsettingswhicharerstproposedin[17].
Inthecross-validationexperiments,20objectcat-egoriesareevenlydividedinto4folds,withthreefoldsasthetrainingclassesandonefoldasthetestingclasses.
ThecategorydivisionisshowninTable1.
Wereporttheav-erageperformanceover4testingfolds.
Fortheevaluationmetrics,weusethestandardmeanIntersection-over-Union(mIoU)oftheclassesinthetestingfold.
Formorede-tailsaboutthedatasetinformationandtheevaluationmet-ric,pleasereferto[17].

6. Ablation study

The goal of the ablation study is to inspect each component in our network design. Our ablation experiments are conducted on the PASCAL VOC dataset. We implement cross-validation 1-shot experiments and report the average performance over the four splits.

fold   categories
0      aeroplane, bicycle, bird, boat, bottle
1      bus, car, cat, chair, cow
2      dining table, dog, horse, motorbike, person
3      potted plant, sheep, sofa, train, tv/monitor

Table 1. The class division of the PASCAL VOC 2012 dataset proposed in [17].

In Table 2, we first investigate the contributions of our two important network components: the condition module and the cross-reference module. As shown, there are significant performance drops if we remove either component from the network. Particularly, our proposed cross-reference module has a huge impact on the predictions. Our network can improve the counterpart model without the cross-reference module by more than 10%.

Condition   Cross-Reference Module   1-shot
                                     36.3
                                     43.3
                                     49.1

Table 2. Ablation study on the condition module and the cross-reference module. The cross-reference module brings a large performance improvement over the baseline model (Condition only).

To investigate how much the scale variance of the objects influences the network performance, we adopt a multi-scale test experiment in our network. Specifically, at test time, we resize the support image and the query image to [0.75, 1.25] of the original image size and conduct the inference. The output predicted mask of the resized query image is bilinearly resized to the original image size. We fuse the predictions under different image scales. As shown in Table 3, multi-scale input test brings 1.2 mIoU score improvement in the 1-shot setting. We also investigate the choices of features in the network backbone in Table 3. We compare the multi-level feature embeddings with the features solely from the last layer. Our model with multi-level features provides an improvement of 1.8 mIoU score. This indicates that to better locate the common objects in two images, middle-level features are also important and helpful.

Multi-Level   Mask Refine   Multi-Scale   1-shot
                                          49.1
                                          50.3
                                          53.4
                                          55.2

Table 3. Ablation experiments on the multiple-level feature, multiple-scale input, and the Mask Refine module. Every module brings performance improvement over the baseline model.

To further inspect the effectiveness of the mask refinement module, we design a baseline model that removes the cached branch. In this case, the mask refinement block makes predictions solely based on the input features and we only run the mask refinement module once. As shown in Table 3, our mask refinement module brings 3.1 mIoU score performance increase over our baseline method.

Figure 6. Qualitative examples on the PASCAL VOC dataset. The first row is the support set and the second row is the query set. The third row is our predicted results and the fourth row is the ground truth. Even when the query images contain objects from multiple classes, our network can still successfully segment the target category indicated by the support mask.

In the k-shot setting, we compare our finetuning based method with the fusion-based methods widely used in previous works. For the fusion-based method, we make an inference with each of the support images and average their probability maps as the final prediction. The comparison is shown in Table 4. In the 5-shot setting, the finetuning based method outperforms the 1-shot baseline by 8.4 mIoU score, which is significantly superior to the fusion-based method. When 10 support images are available, our finetuning based method shows more advantages. The performance continues increasing while the fusion-based method's performance begins to drop.

Method              1-shot   5-shot   10-shot
Fusion              49.1     50.2     49.9
Finetune            N/A      57.5     59.1
Finetune + Fusion   N/A      57.6     58.8

Table 4. k-shot experiments. We compare our finetuning based method with the fusion method. Our method yields consistent performance improvement when the number of support images increases. For the 1-shot case, finetune results are not available as CRNet needs at least two images to apply our finetune scheme.

Method        Backbone    mIoU   IoU
OSLM [17]     VGG16       40.8   61.3
co-fcn [14]   VGG16       41.1   60.9
sg-one [27]   VGG16       46.3   63.1
R-DRCN [18]   VGG16       40.1   60.9
PL [5]        VGG16       -      61.2
A-MCG [6]     ResNet-50   -      61.2
CANet [26]    ResNet-50   55.4   66.2
PGNet [25]    ResNet-50   56.0   69.9
CRNet         VGG16       55.2   66.4
CRNet         ResNet-50   55.7   66.8

Table 5. Comparison with the state-of-the-art methods under the 1-shot setting. Our proposed network achieves state-of-the-art performance under both evaluation metrics.

Method        Backbone    mIoU   IoU
OSLM [17]     VGG16       43.9   61.5
co-fcn [14]   VGG16       41.4   60.2
sg-one [27]   VGG16       47.1   65.9
R-DFCN [18]   VGG16       45.3   66.0
PL [5]        VGG16       -      62.3
A-MCG [6]     ResNet-50   -      62.2
CANet [26]    ResNet-50   57.1   69.6
PGNet [25]    ResNet-50   58.5   70.5
CRNet         VGG16       58.5   71.0
CRNet         ResNet-50   58.8   71.5

Table 6. Comparison with the state-of-the-art methods under the 5-shot setting. Our proposed network outperforms all previous methods and achieves new state-of-the-art performance under both evaluation metrics.
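
The fusion baseline used in the k-shot comparison averages the probability maps obtained from each support image. A minimal sketch, assuming each map is an H x W nested list of foreground probabilities; the 0.5 threshold used to binarise the result is an assumption and is not stated in the text.

```python
def fuse_probability_maps(maps):
    """Average a list of H x W probability maps pixel by pixel."""
    n = len(maps)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*maps)]

def to_mask(prob_map, threshold=0.5):
    """Binarise a probability map; the 0.5 threshold is an assumption."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]
```

Averaging leaves the model parameters untouched, so extra support images can only smooth the prediction, which is consistent with the saturation the text reports for fusion-based methods.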

6.1. MS COCO

COCO 2014 [10] is a challenging large-scale dataset, which contains 80 object categories. Following [26], we choose 40 classes for training, 20 classes for validation and 20 classes for test. As shown in Table 7, the results again validate the designs in our network.

Condition   Cross-Reference Module   Mask-Refine   1-shot   5-shot
                                                   43.3     44.0
                                                   38.5     42.7
                                                   44.9     45.6
                                                   45.8     47.2

Table 7. Ablation study on the condition module, cross-reference module and Mask-Refine module on the MS COCO dataset.

6.2. Comparison with the State-of-the-Art Results

We compare our network with state-of-the-art methods on the PASCAL VOC 2012 dataset. Table 5 shows the performance of different methods in the 1-shot setting. We use IoU to denote the evaluation metric proposed in [14]. The difference between the two metrics is that the IoU metric also incorporates the background into the Intersection-over-Union computation and ignores the image category.
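
The difference between the two metrics can be made concrete with a short sketch. It reflects one common reading of the definitions in [17] and [14] rather than the exact evaluation scripts: mIoU pools foreground intersection and union per class and then averages over classes, while the second metric averages foreground and background IoU pooled over all episodes regardless of class. Masks are H x W nested lists with 1 for foreground and 0 for background, and every union is assumed to be non-zero.

```python
def confusion(pred, gt, label):
    """Intersection and union pixel counts for one label (1 = fg, 0 = bg)."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += (p == label) and (g == label)
            union += (p == label) or (g == label)
    return inter, union

def mean_iou(episodes):
    """episodes: list of (class_name, pred, gt). Foreground IoU per class, then mean."""
    totals = {}
    for cls, pred, gt in episodes:
        i, u = confusion(pred, gt, 1)
        ti, tu = totals.get(cls, (0, 0))
        totals[cls] = (ti + i, tu + u)
    return sum(i / u for i, u in totals.values()) / len(totals)

def fb_iou(episodes):
    """Mean of foreground and background IoU, pooled over all episodes."""
    scores = []
    for label in (1, 0):
        pairs = [confusion(pred, gt, label) for _, pred, gt in episodes]
        scores.append(sum(i for i, _ in pairs) / sum(u for _, u in pairs))
    return sum(scores) / 2
```

Because background pixels usually dominate and are easy to classify, the second metric tends to give higher numbers than mIoU for the same predictions, as the two columns in the comparison tables suggest.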

5-Shot Experiments. The comparison of 5-shot segmentation results under two evaluation metrics is shown in Table 6. Our method achieves new state-of-the-art performance under both evaluation metrics.

7. Conclusion

In this paper, we have presented a novel cross-reference network for few-shot segmentation. Unlike previous work, which unilaterally guides the segmentation of query images with support images, our two-head design concurrently makes predictions in both the query image and the support image to help the network better locate the target category. We develop a mask refinement module with a cache mechanism which can effectively improve the prediction performance. In the k-shot setting, our finetuning based method can take advantage of more annotated data and significantly improve the performance. Extensive ablation experiments on the PASCAL VOC 2012 dataset validate the effectiveness of our design. Our model achieves state-of-the-art performance on the PASCAL VOC 2012 dataset.

Acknowledgements

This research is supported by the National Research Foundation Singapore under its AI Singapore Programme (Award Number: AISG-RP-2018-003) and the MOE Tier-1 research grants: RG126/17 (S) and RG22/19 (S). This research is also partly supported by the Delta-NTU Corporate Lab with funding support from Delta Electronics Inc. and the National Research Foundation (NRF) Singapore.

References

[1] Hong Chen, Yifei Huang, and Hideki Nakayama. Semantic aware attention based deep object co-segmentation. arXiv preprint arXiv:1810.06859, 2018.
[2] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4):834–848, 2018.
[3] Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Wang, and Jia-Bin Huang. A closer look at few-shot classification. In International Conference on Learning Representations, 2019.
[4] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, pages 248–255, 2009.
[5] Nanqing Dong and Eric Xing. Few-shot semantic segmentation with prototype learning. In BMVC, 2018.
[6] Tao Hu, Pengwan Yang, Chiliang Zhang, Gang Yu, Yadong Mu, and Cees GM Snoek. Attention-based multi-context guiding for few-shot semantic segmentation. 2019.
[7] Armand Joulin, Francis Bach, and Jean Ponce. Multi-class cosegmentation. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 542–549. IEEE, 2012.
[8] Alexander Kolesnikov and Christoph H Lampert. Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In European Conference on Computer Vision, pages 695–711. Springer, 2016.
[9] Guosheng Lin, Anton Milan, Chunhua Shen, and Ian D Reid. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In CVPR, volume 1, page 5, 2017.
[10] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In ECCV, pages 740–755, 2014.
[11] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
[12] Prerana Mukherjee, Brejesh Lall, and Snehith Lattupally. Object cosegmentation using deep siamese network. arXiv preprint arXiv:1803.02555, 2018.
[13] Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, and Jian Sun. Large kernel matters – improve semantic segmentation by global convolutional network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4353–4361, 2017.
[14] Kate Rakelly, Evan Shelhamer, Trevor Darrell, Alyosha Efros, and Sergey Levine. Conditional networks for few-shot semantic segmentation. In ICLR Workshop, 2018.
[15] Carsten Rother, Tom Minka, Andrew Blake, and Vladimir Kolmogorov. Cosegmentation of image pairs by histogram matching - incorporating a global constraint into mrfs. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 1, pages 993–1000. IEEE, 2006.
[16] Michael Rubinstein, Armand Joulin, Johannes Kopf, and Ce Liu. Unsupervised joint object discovery and segmentation in internet images. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1939–1946, 2013.
[17] Amirreza Shaban, Shray Bansal, Zhen Liu, Irfan Essa, and Byron Boots. One-shot learning for semantic segmentation. arXiv preprint arXiv:1709.03410, 2017.
[18] Mennatullah Siam and Boris Oreshkin. Adaptive masked weight imprinting for few-shot segmentation. arXiv preprint arXiv:1902.11123, 2019.
[19] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NIPS, 2017.
[20] Xin Sun, Zhenning Yang, Chi Zhang, Guohao Peng, and Keck-Voon Ling. Conditional gaussian distribution learning for open set recognition, 2020.
[21] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in neural information processing systems, pages 3630–3638, 2016.
[22] Flood Sung Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M Hospedales. Learning to compare: Relation network for few-shot learning. In CVPR, 2018.
[23] Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson. Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579, 2015.
[24] Chi Zhang, Yujun Cai, Guosheng Lin, and Chunhua Shen. Deepemd: Few-shot image classification with differentiable earth mover's distance and structured classifiers, 2020.
[25] Chi Zhang, Guosheng Lin, Fayao Liu, Jiushuang Guo, Qingyao Wu, and Rui Yao. Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pages 9587–9595, 2019.
[26] Chi Zhang, Guosheng Lin, Fayao Liu, Rui Yao, and Chunhua Shen. Canet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5217–5226, 2019.
[27] Xiaolin Zhang, Yunchao Wei, Yi Yang, and Thomas Huang. Sg-one: Similarity guidance network for one-shot semantic segmentation. arXiv preprint arXiv:1810.09091, 2018.
