When species matches are unavailable are DNA barcodes correctly assigned to higher taxa? An assessment using sphingid moths
John James Wilson1*, Rodolphe Rougerie1,5, Justin Schonfeld1, Daniel H Janzen2, Winnie Hallwachs2, Mehrdad Hajibabaei1, Ian J Kitching3, Jean Haxaire4 and Paul DN Hebert1
Background
Taxonomic assignments are crucial for effective communication of biological research, enabling comparability between studies. Yet, the ability to categorize biodiversity effectively and accurately is hampered by a lack of taxonomic experts [1]. DNA barcoding has been proposed as a method capable of partially alleviating this “taxonomic impediment” by enabling accurate species identifications by non-specialists using nucleotide comparisons across a standard gene region [2].

In a typical scenario, a specimen of unknown species affinity is encountered, the DNA barcode of the query is sequenced and then compared with a reference library
Wilson et al. BMC Ecology 2011, 11:18 Page 2 of 14 http://www.biomedcentral.com/1472-6785/11/18

[...] queries from species not included in the key, DNA barcoding cannot assign a species identification when there are no barcode records for conspecifics in the reference library. Consequently, barcoding appraisal studies usually require a priori knowledge that the species of the query is present in the reference library (e.g. [4-6]). In real life, a consequence of widespread routine use of DNA barcoding is that failed species matches (e.g. < 98% similarity with the closest library sequence [5]) are frequently encountered (e.g. [7]). In such situations it may be tempting to attempt assignment to a higher taxonomic level (i.e. genus, tribe, subfamily). For example, Armstrong and Ball [8] suggested their query barcode sharing 94.6% similarity with the closest library match (Clostera albostigma) was a likely congener but not conspecific of the reference library barcode. There is considerable disagreement over the likely accuracy and appropriateness of such assignment attempts (e.g. [1,5,9-11]), which is not surprising given the different purposes and criteria employed.

DNA barcoding assignment to higher taxa
Hebert et al. [12] expressed optimism for barcode-based assignments to higher taxa in animals. Such assignments are useful as shorthand for phylogenetic hypotheses from which biological characteristics of organisms can be predicted. For example, by assigning a specimen to the genus Aellopos one can predict that as a caterpillar it most likely fed on plants of the family Rubiaceae [13]. The capacity to make predictions based on taxon membership is especially pertinent where fundamental impediments, e.g. an egg or an incomplete specimen, preclude morphology-based detection of characteristics.

While assignment to pre-determined taxa is an operation distinct from the description of taxa, assignment accuracy is related to the ability of the character system used as the basis of assignment to track organismal phylogeny (i.e. display a phylogenetic signal [14]). This operation is confounded by the fact that many currently recognized supraspecific taxa are not natural [10]. In such cases, the failure of a character system to provide accurate assignments can reflect “imperfect” taxonomy rather than the lack of phylogenetic signal.

In this study, we test the ability of DNA barcodes to enable accurate higher taxon assignments. Specifically we ask: If species coverage in the DNA barcode library is incomplete, can the barcode from a sphingid species not represented in the library be assigned to the genus it belongs to, or, recognised as being from a sphingid genus missing from the library? Likewise, can the barcode from a sphingid genus not represented in the library be accurately assigned at the tribe and subfamily level? We address these questions using the moth family Sphingidae because a comprehensive global reference barcode library is available (86% of known species [15]) containing relatively stable and well-studied taxa (Figure 1A). This enables us to assemble sub-libraries with a wide range of different species completeness and also provides a robust taxonomic framework against which to judge assignment accuracy. We evaluated assignment accuracy using concordance with the current classification of Sphingidae [16] while recognising that morphologically derived taxonomy represents falsifiable hypotheses. Consequently, we also examined the assignments a posteriori in light of a more recent phylogenetic study of the family [17]. Sphingidae is the target of a global barcoding campaign [15] and shows high success for species-level barcode identifications (Figure 1B).

Assignment criteria
Since Hebert et al. [12] proposed that DNA barcoding could be used to assign queries to higher taxa, researchers have performed higher taxa assignments using ad hoc criteria based on the frequency of best hits, degree of sequence similarity, bootstrapping or BLAST scores (e.g. [18-22]). However, these studies usually involved fragmentary tissues of unknown taxonomic origin and consequently assignments could not be independently confirmed (i.e. using morphology). Therefore, both the accuracy and optimal approach for such assignments remain unclear. In this study, we test the extent to which assignment accuracy depends on assignment criteria applied by comparing the performance of several approaches employed in prior studies.

Tree-based assignment criteria
While some consider the use of tree-based assignment approaches controversial [23], we consider it justified for supraspecific taxa sharing phylogenetic as opposed to tokogenetic affinities. Using tree-based criteria, queries are successfully assigned when they cluster with barcodes from their correct taxon [24]. Meier et al. [25] use the following example where they imagine a reference library containing a chimp barcode but no human barcode to illustrate the difficulty with such an approach: “Imagine a query clustering with a chimp barcode. Based on the query’s position, one cannot decide whether it comes from Homo sapiens or another chimp, i.e., forming a cluster on a tree is logically insufficient for assigning a sequence.” We address this concern by establishing objective rule sets for our tree-based assignment criteria based on topology (Table 1). We include assignment criteria that require a taxon to be “monophyletic” or “exclusive” for a query to be assigned to that taxon (Table 1). This requires that we overlook the fact that trees based on COI do not perfectly track organismal phylogeny at deeper levels [14] and that many “traditional” taxa are not monophyletic [17]. Ekrem et
[...] assignment is that of the taxon Aus
“Best match”: Q is simply assigned to the genus of the most similar library sequence based on K2P distance
“Best close match”: Q is assigned to the genus of the most similar library [...] unassignable (i.e. “ambiguous”)

Direct sequence comparison assignment criteria
In addition to tree-based assignment we used criteria based on direct sequence comparison. We chose not to [...] assignment criteria we use, both based on K2P [32]
[...] match”. A query is assigned the taxon of the reference barcode that it most closely matches irrespective of how similar the query and library barcodes are. Under this criterion some false assignments are inevitable. A “false-positive” result, where a query barcode is matched to a reference barcode despite significant divergence, is a frequent consequence of using the BLAST algorithm by itself [33]. For example, the query dataset used here contained five monobasic genera. For these barcodes the only possible result for a genus assignment using “best match” is a “false-positive”. These errors can be avoided by using the modified assignment criterion, “best close match”. With “best close match” the best-matching reference barcode is identified, but the query is only assigned the taxon name of that barcode if the barcode is sufficiently similar (i.e. below a threshold). Otherwise, the query remains unassigned (i.e. “ambiguous”). In our case, the threshold value can be selected by plotting the number of “true-positives” and “false-positives” against the K2P distance from the query to the “best match”. We then determine a threshold that maximizes the number of “true-positives” while minimizing the number of “false-positives”. It remains unclear why one would expect that there should be a common threshold across taxonomic groups of the same rank or how this could be implemented in a real-life scenario. Many studies have shown that a universal threshold of genetic distance to distinguish taxa cannot be determined [10]. However, in the absence of better strategies, this method at least provides a rigorously derived threshold value [25].

Library species completeness
Based on their study of species in one family of Diptera, Ekrem et al. [9] concluded that assigning a barcode record to the correct genus or species-group was unlikely unless a “near perfect” match is present in the reference library with the further prediction that a “comprehensive” library is also essential for accurate assignment to family or even order. Furthermore, Ball and Armstrong [4] suggested that the failure of a lymantriine barcode to group with other members of its subfamily was attributable to low taxon sampling in their reference library (also see [5,34]). Considering that growth of the DNA barcode library will take time, a key issue concerns the effect of completeness of the reference library on the accuracy of higher taxon assignments. By using a global and comprehensive barcode reference library of considerable phylogenetic breadth (86% of known species in the family), the Sphingidae, we addressed this uncertainty through simulating different levels of species completeness of the reference library and examining the effect on assignment accuracy.

Methods
Query dataset, 100% reference library and sub-libraries
Using barcode records assembled as part of the global barcoding campaign on Sphingidae [15], we selected one barcode from each species to act as a reference barcode for that taxon. Reference barcodes were available for 1088 of the 1270 described species listed in Kitching and Cadiou [16] and for an additional seven Costa Rican species described or revalidated since 2000 (= 1095 sphingid species). Barcode sequences were selected to maximize length and quality and ranged from 267-658 bp, with 77% being 658 bp and 93% > 600 bp. The sample comprised 200 genera with all the currently recognised tribes and subfamilies (Figure 1A) represented. Three saturniid barcodes (Arsenura drucei, Lonomia electra, Periga cluacina) were also included as this family represents the putative sister family to the Sphingidae [35], taking the full reference library to 1098 barcodes (see additional file 1: Full reference library).

Barcodes from 118 sphingid species collected in Area de Conservacion Guanacaste, northwestern Costa Rica, were used as query barcodes (see additional file 2: Query dataset). DNA was extracted following automated protocols [36] and the DNA barcode amplified and sequenced [37]. These Costa Rican sphingids comprised a well-documented [38,39], diverse subset of the family, with each of the tribes and subfamilies represented among 29 genera. All the queries were correctly assigned to species when using the full reference library and a “best match” assignment criterion.

For the purposes of this study the following were considered libraries of 100% completeness: for genus assignment attempts, the representative from the same species as that of the query was the only barcode removed from the reference library; for tribe and subfamily assignment attempts, the barcodes from all the representatives of species in the genus of the query were removed from the reference library. All contribal genera were not removed in the case of subfamily tests, due to the increased level of uncertainty regarding naturalness of these taxa.

We subsequently created sub-libraries from the full reference library with different levels of species completeness. In an approach termed here “random sampling” barcodes were chosen at random to construct sub-libraries comprising 10, 20, 30, 40, 50, 60, 70, 80 and 90% of the full reference library. Sub-sampling at each species richness level was repeated 30 times. A different approach termed here “constrained sampling” limited the random selection of species to ensure a minimum of one species per genus in the sub-library. This approach was reiterated to construct sub-libraries comprising 20, 30, 40, 50, 60, 70, 80 and 90% of the full reference
library and was repeated 30 times at each species completeness level. For the sub-libraries as with the 100% library, for genus assignment attempts, we removed the reference barcode for the species of the query from the sub-libraries. For tribe and subfamily assignment attempts we removed the reference barcodes for the genus of the query.

Query assignment criteria
In each assignment attempt we allowed two possible outcomes: (i) A “positive” assignment (i.e. the query was assigned to a taxon) or (ii) An “ambiguous” assignment (i.e. the query was not assigned to a taxon). A “positive” assignment was either true (TP) - it matched with the morphology-based identification, or false (FP) - it dis[...]

[...]
Read tree, assign query a taxon or not according to criterion.
Evaluate accuracy of assignment (true or false).

The four tree-based methods were “liberal” (Figure 2A) [40], “strict” (Figure 2B) [25,40], “liberal & exclusive” and “strict & exclusive”. We also performed “best match” for all taxon assignments and “best match” and “best close match” for assignment to genus (with the randomly sampled library) where assignment was based only on the most similar reference library barcode (Table 1). For “best match” only a “positive” assignment is possible (i.e. the assignment is TP or FP) (Table 1).
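The “random sampling” and “constrained sampling” schemes described in the Methods, together with the removal of the query's own species or genus before each assignment attempt, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the function names, the representation of the library as a species-to-genus mapping, and the rounding of the target library size are all hypothetical, and the handling of the three saturniid outgroup barcodes is not specified in the text and is ignored here.

```python
import random


def random_sampling(library, fraction, rng):
    """Pick a fraction of the species at random ("random sampling").

    library: dict mapping species name -> genus name (assumed layout).
    """
    k = round(fraction * len(library))
    return set(rng.sample(sorted(library), k))


def constrained_sampling(library, fraction, rng):
    """Pick a fraction of the species at random while keeping at least
    one species per genus ("constrained sampling")."""
    k = round(fraction * len(library))
    by_genus = {}
    for species, genus in library.items():
        by_genus.setdefault(genus, []).append(species)
    if k < len(by_genus):
        raise ValueError("sub-library too small to hold one species per genus")
    # First guarantee one randomly chosen species for every genus ...
    chosen = {rng.choice(sorted(members)) for members in by_genus.values()}
    # ... then fill the remaining slots at random from the rest.
    remaining = sorted(set(library) - chosen)
    chosen.update(rng.sample(remaining, k - len(chosen)))
    return chosen


def remove_for_query(sub_library, library, query_species, query_genus, level):
    """Drop the query's species (genus tests) or its whole genus
    (tribe and subfamily tests) from a sub-library."""
    if level == "genus":
        return sub_library - {query_species}
    return {s for s in sub_library if library[s] != query_genus}


if __name__ == "__main__":
    rng = random.Random(1)
    toy_library = {f"species_{i}": f"Genus_{i % 4}" for i in range(20)}
    for replicate in range(3):  # the study used 30 replicates per level
        sub = constrained_sampling(toy_library, 0.5, rng)
        print(replicate, len(sub), sorted({toy_library[s] for s in sub}))
```

With 200 sphingid genera among 1098 reference barcodes, a 10% sub-library (about 110 barcodes) could not hold one species per genus, which is presumably why the constrained series starts at 20%; the sketch raises an error in that situation.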
For “best close match” the query was assigned to the taxon of the most similar library barcode based on K2P distance, provided it was within a certain threshold. If there were no barcodes in the library within the threshold, the assignment was “ambiguous”. In order to select a threshold we looked at the results of the “best match” criterion and plotted the number of “true-positives” and “false-positives” against the K2P distance from the query to the “best match”. The distance that maximized the number of TP (which in our case also corresponded to the distance with the lowest proportion of FP) was selected as the threshold.

Measures of accuracy were calculated as follows: 1. Precision, the fraction of barcodes placed in a taxon that belong there, TP/(TP+FP); and 2. Overall Accuracy, the proportion of barcodes placed without any error, (TP+TA)/(TP+FP+TA+FA) [33]. Note, for “best match”, due to the absence of the “ambiguous” category, overall accuracy equals precision. The results are discussed below in terms of these measures.

Results
The results of all the experiments are provided in additional file 3: Results of all experiments.

Correct assignments to genus, tribe and subfamily (100% library)
The overall accuracy of assignment to genus was 0.83 using the “liberal” and 0.75 using the “strict” criterion. The precision of assignment to genus was 0.86 using the “liberal” and 0.98 using the “strict” criterion. A number of query species were consistently assigned to the wrong genus across all analyses resulting in FP. Even though these FP were technically incorrect assignments, Table 2 details how in many cases the assignments made some sense considering the taxonomic structure and

Table 2 False-positive assignments at the genus level
Query sequence | Assignment | Notes
Eupyrrhoglossum sagra (2) | Aellopos (5) | Aellopos and Eupyrrhoglossum are most likely a sister pair [17]. Eupyrrhoglossum differs from Aellopos only in forewing veins Rs3 and Rs4 remaining separate apically and the phallus lacking spines on the right side (Kitching, personal communication).
Madoryx plutonius (4) | Pseudosphinx (1) | Pseudosphinx was close to Madoryx on the Kawahara et al. [17] phylogeny and both genera belong to the same tribe (Dilophonotini). Pseudosphinx is very close to Isognathus; indeed, it could be argued it is just an oversized Isognathus without yellow in the hindwing (Kitching, personal communication).
[...] | [...] | [...] mentioned as an oriental genus expected to fall near the base of the Acherontini/Sphingini clade which included a paraphyletic Manduca
Neococytius cluentius (1) | Amphimoea (1) | Amphimoea was not sampled by Kawahara et al. [17] but was placed by Kitching [49] as the [...]
[...] | [...] | [...] however, Rothschild and Jordan [50] noted a closer morphological similarity of darceta to [...]
[...] | [...] | [...] Smerinthinae and a long way removed from Psyces (Kitching, personal communication)
Pachylioides resumens (1) | Pachylia (3) | Pachylioides and Pachylia were not closely related on the Kawahara et al. [17] phylogeny, but were associated by Kitching and Cadiou [16].
Phryxus caicus (1) | Erinnyis (10) | Not included by Kawahara et al. [17] but linked by Kitching and Cadiou [16], with Phryxus considered just a highly divergent Erinnyis.
Pseudosphinx tetrio (1) | Madoryx (4) | Pseudosphinx and Madoryx were reciprocally mis-assigned. See above.
Xylophanes godmani (80) | Theretra (38) | X. godmani was not sampled by Kawahara et al. [17] but Xylophanes and Theretra are members of the same tribe (Choericampina) and were suggested to be closely related by Hunsdoefer et al. [51] based on their mtDNA phylogeny.
Xylophanes turbata (80) | Chaerocina (3) | X. turbata was not sampled by Kawahara et al. [17], but the unexpected placement of Chaerocina, close to Xylophanes, was observed on their phylogeny.

Species and genus names are followed by a number in brackets which gives the number of species in the genus in the 100% reference library, e.g. Neococytius cluentius (1) is the sole member of Neococytius in the 100% reference library.
phylogeny of the family. These included four species in monobasic genera: Pachylioides resumens, Phryxus caicus, Pseudosphinx tetrio, and Neococytius cluentius, for which the only possible outcomes were FP or TA, since a query belonging to a monobasic genus cannot be a TP. A second group of FP were query barcodes (Madoryx plutonius, Manduca albiplaga, Pachylia darceta and Pachylia ficus) assigned to monobasic genera in the reference library (see Table 2). Because queries belonged to species not present in the reference library, and monobasic genera have only a single species that was present in the library, this group could more correctly be interpreted as TA or FA assignments. Two FP, Xylophanes godmani and Xylophanes turbata, were queries from an exceptionally species-rich genus (104 species globally). The overall accuracy of assignment to genus in this study was similar to that reported by Elias et al. [24] who found 69-81% of their Ithomiinae queries were assigned to the correct genus using tree-based criteria.

Overall accuracy of assignment to tribe was 0.75 using the “liberal” and 0.66 using the “strict” criterion (Figure 3). Precision of assignment to tribe was 0.81 using the “liberal” and 0.95 using the “strict” criterion. Many of the query barcodes placed in the wrong tribe belonged to genera that are positioned as paraphyletic or polyphyletic with respect to their current tribal designations, according to a recent phylogenetic study (e.g. Agrius, Aleuron, Cautethia, Cocytius, Enyo, Eumorpha, Pachygonidia [17]), or were on long branches in a basal position (Pachylia) within their tribe. An instructive example is Eumorpha, a genus currently placed in the tribe Philampelini. Query barcodes belonging to Eumorpha were assigned to tribe Macroglossini. This is consistent with the placement of Eumorpha (+ Enyo) as sister to a clade comprised of Macroglossini on the phylogeny of Kawahara et al. [17].

Overall accuracy of assignment to subfamily was 0.90 using the “liberal” and 0.84 using the “strict” criterion with “best match” having the highest overall accuracy for this taxonomic level (0.92) (Figure 3). Precision of assignment to subfamily was 0.83 using the “liberal” and 0.96 using the “strict” criterion.

Success of tree-based assignment criteria
Considering Figures 4 and 5, it is clear that different criteria produced contrasting results. For example, “liberal” was frequently the highest scoring criterion in terms of overall accuracy (Figure 3), but performed less well in terms of precision with an average of 18% of assignments to genus being FP (Figure 4). “Strict” had lower overall accuracy across all sub-libraries, but higher precision with an average of only 2% of assignments to genus being FP (Figure 4).

The criteria requiring exclusivity resulted in an overwhelming number of FA assignments (Figure 5) and produced very low overall accuracy and precision despite their lower incidence of FP (Figure 5). Note that the success rates for criteria without the exclusivity requirement are higher, because they did not require “monophyly”; i.e. queries can be assigned on trees with congeneric (or contribal and subfamilial) barcodes found in two different “clades” as long as the rules of the criterion are met.
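The precision and overall accuracy values reported in this section follow the definitions given in the Methods, TP/(TP+FP) and (TP+TA)/(TP+FP+TA+FA). A minimal Python sketch of how outcomes might be tallied and scored is given below. All function names and the toy records are hypothetical. The sketch also assumes that an “ambiguous” result counts as true (TA) when the query's taxon is absent from the library and as false (FA) when it is present, a definition that does not survive in the text above, and that “within the threshold” means a K2P distance less than or equal to the threshold.

```python
from collections import Counter


def best_close_match(best_taxon, distance, threshold):
    """Return the taxon of the closest library barcode, or None
    ("ambiguous") when the distance exceeds the threshold.
    An infinite threshold reproduces plain "best match"."""
    return best_taxon if distance <= threshold else None


def classify(assigned, true_taxon, library_taxa):
    """Label one assignment attempt as TP, FP, TA or FA.

    Assumption: an ambiguous result is "true" (TA) only when the
    query's taxon is missing from the reference library.
    """
    if assigned is None:
        return "FA" if true_taxon in library_taxa else "TA"
    return "TP" if assigned == true_taxon else "FP"


def tally(records, library_taxa, threshold):
    """records: iterable of (true_taxon, best_match_taxon, k2p_distance)."""
    counts = Counter()
    for true_taxon, best_taxon, distance in records:
        assigned = best_close_match(best_taxon, distance, threshold)
        counts[classify(assigned, true_taxon, library_taxa)] += 1
    return counts


def precision(counts):
    """TP / (TP + FP)."""
    positives = counts["TP"] + counts["FP"]
    return counts["TP"] / positives if positives else float("nan")


def overall_accuracy(counts):
    """(TP + TA) / (TP + FP + TA + FA)."""
    total = sum(counts[k] for k in ("TP", "FP", "TA", "FA"))
    return (counts["TP"] + counts["TA"]) / total if total else float("nan")


if __name__ == "__main__":
    # Toy genus-level records; the distances are invented for illustration.
    library_genera = {"Xylophanes", "Theretra", "Erinnyis", "Manduca"}
    toy_records = [
        ("Xylophanes", "Xylophanes", 0.02),
        ("Xylophanes", "Theretra", 0.04),
        ("Phryxus", "Erinnyis", 0.08),
        ("Manduca", "Manduca", 0.07),
    ]
    for label, threshold in (("best match", float("inf")),
                             ("best close match", 0.05)):
        counts = tally(toy_records, library_genera, threshold)
        print(label, dict(counts), precision(counts), overall_accuracy(counts))
```

Because “best match” never returns an ambiguous result, TA and FA are both zero and the two measures coincide, as noted in the Methods.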
Success of sequence comparison assignment criteria
Success under “best match” was similar to “strict” at the tribe level but very similar to “liberal” at the subfamily level (Figure 4), where it actually had the highest overall accuracy but was still behind the “strict” criterion in terms of precision (Figure 3).

In order to be able to use “best close match”, we first determined the optimal threshold to be 0.05 K2P distance (Figure 6) and this value was used to decide whether a query had a close enough barcode match to be given a “positive” assignment. “Best close match” successfully reduced the high number of FP seen with “best match” (Figure 6), but, like the “strict” criterion, resulted in a large number of FA. Success under “best close match” was very similar to “strict” but it produced a much lower number of TP with the larger sub-libraries (Figure 6).

Effect of library completeness
The “liberal” and “strict” criteria were generally the highest-scoring criteria in terms of overall accuracy and precision across all taxonomic levels and all sub-libraries (Figures 3 and 4). An exception was the high precision observed for the “strict & exclusive” criterion [...]

[...] with the expectation of authors like Ekrem et al. [9]. This may be explained largely by differences in study design. Our experimental design measures the relative precision and overall accuracy of different assignment criteria across reference libraries of different levels of completeness and structure. No single assignment criterion was superior across the range of taxonomic scenarios examined and there was often a conflict between overall accuracy and precision. Our results discussed below, together with implications for criterion selection, indicate a clear requirement for species to be in taxa that are well-differentiated clades to maximize the number of correct assignments. Whether these success rates are high enough to be useful remains a judgment call for the end-user.

Assessing barcoding accuracy with taxonomic classifications
In this study we have presented simplified examples where the species of the query barcode is missing from reference libraries (and the entire genus for assignments to tribe and subfamily) to ensure we were solely addressing the question of assigning the query to the next least inclusive taxon. By excluding the possibility of a