APPLICATION NOTE    Open Access

Altools: a user friendly NGS data analyser

Salvatore Camiolo1*, Gaurav Sablok2 and Andrea Porceddu1

Abstract

Background: Genotyping by re-sequencing has become a standard approach to estimate single nucleotide polymorphism (SNP) diversity, haplotype structure and biodiversity, and has been defined as an efficient approach to address geographical population genomics of several model species. To access core SNPs and insertion/deletion polymorphisms (indels), and to infer the phyletic patterns of speciation, most such approaches map short reads to the reference genome. Variant calling is important to establish patterns of genome-wide association studies (GWAS) for quantitative trait loci (QTLs), and to determine the population and haplotype structure based on SNPs, thus allowing content-dependent trait and evolutionary analysis. Several tools have been developed to investigate such polymorphisms as well as more complex genomic rearrangements such as copy number variations, presence/absence variations and large deletions. The programs available for this purpose have different strengths (e.g. accuracy, sensitivity and specificity) and weaknesses (e.g. low computation speed, complex installation procedure and absence of a user-friendly interface). Here we introduce Altools, a software package that is easy to install and use, which allows the precise detection of polymorphisms and structural variations.
Results: Altools uses the BWA/SAMtools/VarScan pipeline to call SNPs and indels, and the DNAcopy algorithm to achieve genome segmentation according to local coverage differences in order to identify copy number variations. It also uses insert size information from the alignment of paired-end reads and detects potential large deletions. A double mapping approach (BWA/BLASTn) identifies precise breakpoints while ensuring rapid elaboration. Finally, Altools implements several processes that yield deeper insight into the genes affected by the detected polymorphisms. Altools was used to analyse both simulated and real next-generation sequencing (NGS) data and performed satisfactorily in terms of positive predictive values, sensitivity, the identification of large deletion breakpoints and copy number detection.

Conclusions: Altools is fast, reliable and easy to use for the mining of NGS data. The software package also attempts to link identified polymorphisms and structural variants to their biological functions, thus providing more valuable information than similar tools.
Reviewers: This article was reviewed by Prof. Lee and Prof. Raghava.

Open peer review: Reviewed by Prof. Lee and Prof. Raghava. For the full reviews, please go to the Reviewers' comments section.
Keywords: Next-generation sequencing, Copy number variation, SNPs, Indels, Large deletions, Re-sequencing

*Correspondence: scamiolo@uniss.it
1Università degli studi di Sassari, Dipartimento di Agraria, SACEG, Via Enrico De Nicola 1, Sassari 07100, Italy
Full list of author information is available at the end of the article

2016 Camiolo et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Camiolo et al. Biology Direct (2016) 11:8
DOI 10.1186/s13062-016-0110-0

Implementation

Background

Genome-based polymorphic scans are the standard method to establish the degree of conservation and phylogenetic imprinting among the related plant taxa.
Approaches based on re-sequencing have recently been exploited for the discovery of single nucleotide polymorphisms (SNPs) and insertion/deletion polymorphisms (indels) as a proxy for the phyletic patterns of evolution [1]. In addition to the creation of SNP maps, it is useful to identify SNPs associated with particular traits in order to localize quantitative trait loci (QTLs) suitable for molecular breeding programs [2]. In the last decade, the optimization of next-generation sequencing (NGS) chemistry and platforms has increased the throughput of sequencing while reducing costs. Although the generation of large amounts of sequence data is no longer a bottleneck in scientific investigations, the interpretation of the data remains challenging. Re-sequencing approaches produce millions of short reads 50–400 bp in length, although the latest technologies are likely to yield longer reads. When a target genome (TG) is re-sequenced, the alignment of such reads to a reference genome (RG) results in the detection of sequence variants such as SNPs and indels, and several alignment algorithms have been developed to detect them [3]. NGS platforms also generate sequencing errors, so other tools have been developed to reduce the number of false polymorphisms by introducing suitable statistical tests [4].
Although many aligners such as BWA [5] and Bowtie [6] incorporate algorithms that identify SNPs and indels quickly and accurately, they fail to detect large genomic deletions (hundreds to thousands of bases), possibly due to the segmental duplication of the genome and the retro-transposition of short and long interspersed elements (SINEs and LINEs) [7]. These types of polymorphisms are better highlighted by software that detects anomalous insert sizes in the alignment of paired-end reads, or by long-read sequencing approaches [8]. Alternatively, splitting each read into two portions can identify reads spanning the deleted segment (i.e. the deletion breakpoints) [9]. Tools such as Pindel [10], Breakdancer [11] and PEMer [12] rely on such strategies to identify large deletions, and must deal with the compromise between speed and the accuracy of breakpoint detection. Inferring the deletion coordinates from the distance between two mapped paired-end reads is inaccurate because the insert size is usually part of a distribution rather than a precise value. The identification of split-mapped reads is also an extremely time-consuming and computationally demanding task.
Resequencing data have also been used to detect large genomic rearrangements such as copy number variations (CNVs) and presence/absence variations (PAVs) [13]. CNVs reflect duplication or deletion events that change the copy number of specific genomic sequences when comparing target and reference genomes. Alignment coverage at each reference position will increase in a duplicated segment and decrease in a deleted segment, so the depth of coverage (DOC) is often used to identify CNVs [13]. PAVs are identified by detecting reference positions that are not covered by any target genome reads.

Computational tools for sequence alignment and analysis are often difficult to install and use, particularly for non-specialist researchers with limited experience in the field of bioinformatics. Here we present Altools, a user-friendly software platform for the interpretation of resequencing data. The pipeline helps the user to achieve the alignment of sequenced reads against a reference genome, the discovery of SNPs/indels (at the genomic and transcript levels), CNVs, PAVs and large deletions through an intuitive graphical user interface (GUI). The algorithms included in Altools (Additional file 1: Figure S1) ensure the rapid and accurate analysis of sequence data and produce informative statistics that link the sequence data to biological functions [14].

Materials and methods

Sequence data

The Arabidopsis thaliana reference genome (Col0 ecotype) together with the corresponding gene annotation file was downloaded from the TAIR website (ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR7_genome_release/). Gff2sequence [15] was used to generate FASTA formatted sequences of coding sequences (CDS) and untranslated regions (UTR). Resequencing data for the Tsu1 and Bur0 genotypes were downloaded from the SRA database (http://www.ncbi.nlm.nih.gov/sra/) (Additional file 2: Table S1).

Genome simulation

The R package RSVSim [16] was used with default parameters to generate A. thaliana simulated genomes that included deletions and duplications (maxDups = 10) of variable sizes (2000, 10,000 and 50,000 bp). For such rearranged genomes, dwgsim software (http://davetang.org/wiki/tiki-index.php?page=DWGSIM) was used to simulate Illumina paired-end 70-bp reads at different coverages (parameters: -C cov -c 0 -S 2 -e 0.0001-0.01 -E 0.0001-0.01, with cov equal to 4, 10, 20, 40 and 100). The same tool was used to generate simulated 70-bp paired-end reads for the original A. thaliana genome with 40x coverage.

Evaluation of polymorphism quality

We applied the positive predictive value (PPV) and sensitivity tests to determine the robustness of SNPs and indels. The PPV is the portion of the total number of called polymorphisms that are correct [17]. Sensitivity indicates the ratio between the number of correctly called polymorphisms and the total number of genuine polymorphisms [17]. PPV and sensitivity were also used to evaluate the reliability of predicted large deletions and duplications. In this case, the number of genuine positions included in the identified structural variants was divided by either the total number of bases in the identified structural variants (PPV) or by the total number of bases representing genuine structural variants (sensitivity).
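
The two measures defined above can be illustrated with a minimal Python sketch. This is not Altools code: the function names `ppv_and_sensitivity` and `positions`, the tuple keys used to identify a variant, and the example coordinates are all hypothetical, and the base-level comparison assumes 1-based inclusive intervals.

```python
def ppv_and_sensitivity(called, genuine):
    """Return (PPV, sensitivity) for two collections of variant keys."""
    called, genuine = set(called), set(genuine)
    correct = len(called & genuine)          # calls that match a genuine variant
    ppv = correct / len(called) if called else float("nan")
    sensitivity = correct / len(genuine) if genuine else float("nan")
    return ppv, sensitivity


def positions(intervals):
    """Expand (chrom, start, end) intervals (1-based, inclusive) into single bases."""
    return {(chrom, pos)
            for chrom, start, end in intervals
            for pos in range(start, end + 1)}


# SNP-level example with hypothetical (chromosome, position, allele) keys.
called_snps = [("chr1", 100, "T"), ("chr1", 250, "G"), ("chr2", 40, "A")]
genuine_snps = [("chr1", 100, "T"), ("chr1", 250, "G"),
                ("chr1", 900, "C"), ("chr2", 77, "A")]
snp_ppv, snp_sens = ppv_and_sensitivity(called_snps, genuine_snps)

# Base-level example for one predicted and one genuine 2000-bp deletion.
sv_ppv, sv_sens = ppv_and_sensitivity(
    positions([("chr1", 1001, 3000)]),   # predicted deletion
    positions([("chr1", 1501, 3500)]),   # genuine (simulated) deletion
)
```

For structural variants the same function is applied to individual bases rather than to variant records, which mirrors the base-counting definition given in the text.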

Read alignment: mapping raw reads against a reference genome

The Read alignment tool allows the user to map a set of FASTQ-formatted reads to a reference genome using BWA [5] as the aligner, to sort and index the alignment file with SAMtools [18] and to call statistically significant polymorphisms with VarScan [19]. BWA was preferred over other aligners because it performs better than similar tools (e.g. Bowtie2) when analysing longer reads [20] (a scenario that will become more common for future sequencing technologies). Similarly, VarScan was chosen because of its high sensitivity [21] and better performance in lower-coverage sequencing runs [22]. Both tools have been implemented in Altools without modifications and therefore their performance has not changed. Altools will automatically recognize paired-end and single-end datasets and align them accordingly. Edit distance, number of threads (thus allowing for parallel computing) and any additional BWA flags can be specified by the user. When the alignment of reads is complete, a pileup-formatted file is generated by SAMtools [18] considering only those alignments that fulfil specific user-defined requirements ("minimum alignment quality", "minimum base quality" and "additional pileup parameters" in the GUI). More information can be found in the Altools manual provided with the software.

Pileup analyser: providing faster access to the alignment data

The Pileup analyser tool is used to generate a pileup folder containing files related to each chromosome in the reference genome. Only information about position, reference genome nucleotide, target genome nucleotide, coverage and presence/absence of SNPs and indels is reported in such files, with the aim of reducing disk space usage and data processing times during further analysis. Pileup analyser also offers several configurable filter settings relative to the minimum number of reads, the base quality, the minimum p-value and threshold allele frequency for calling SNPs and indels. A comprehensive summary statistics file is also produced, reporting the percentage of non-covered chromosomes, the frequency of SNPs and indels, specific coverage of bases G|C and A|T, and the frequency of bases involved in selected polymorphisms.

Coverage analyser: detecting CNVs and PAVs

The Coverage analyser tool is designed to investigate CNVs and PAVs based on the local depth of coverage. Anomalous coverage values may reflect the structure of the target genome (i.e. duplications may be present in the reference genome), so CNV detection requires that alignment data from both the target and reference genomes are compared. Coverage analyser initially calculates the average coverage for the reference genome (RGavCov) and target genome (TGavCov) while computing only informative positions (i.e. coverage > 0). A series of adjacent windows is then generated along the chromosomes, and for the ith window an average coverage is calculated for both the reference genome (RGwindCov(i)) and the target genome (TGwindCov(i)) by computing the information reported in the relative pileup folders. Genomic portions that feature TGwindCov(i) = 0 but RGwindCov(i) > 0 are immediately reported in the output as "zero coverage" regions, which highlight potential PAVs.
Furthermore, for each ith window, the value ρ(i) is calculated as the ratio between the average coverage of the target and reference genomes in that window:

$$\rho(i) = \frac{\mathrm{TGwindCov}(i)}{\mathrm{RGwindCov}(i)}$$

The DNAcopy algorithm [23] is then used to split the DNA into segments featuring homogeneous values of ρ(i) (hereafter ρseg). For each segment j, this value is normalized in order to account for the average coverage of the two genomes:

$$\rho_{\mathrm{segNorm}}(j) = \rho_{\mathrm{seg}}(j)\,\frac{\mathrm{RGavCov}}{\mathrm{TGavCov}}$$

Moreover, for each segment, the average coverage of the target genome (TGsegCov(j)) and reference genome (RGsegCov(j)) are also calculated. Coverage analyser then reports losses and gains according to the following rationale: for the jth segment, the hypothetical copy number for both the reference and target genomes is calculated by dividing the segment average coverage by the overall average coverage:

$$\mathrm{TGsegCopy}(j) = \frac{\mathrm{TGsegCov}(j)}{\mathrm{TGavCov}}, \qquad \mathrm{RGsegCopy}(j) = \frac{\mathrm{RGsegCov}(j)}{\mathrm{RGavCov}}$$

If one or more copies of segment j have been lost from the target genome then the following relationship should be satisfied:

$$\mathrm{TGsegCopy}(j) \le \mathrm{RGsegCopy}(j) - 1$$

However, if one considers a diploid organism that loses a segment copy in only one of the homologous chromosomes, the following relationship is more accurate:

$$\mathrm{TGsegCopy}(j) \le \mathrm{RGsegCopy}(j) - 0.5$$

Because TGsegCopy(j) = ρsegNorm(j) · RGsegCopy(j), the above can be reformulated as:

$$\rho_{\mathrm{segNorm}}(j)\,\mathrm{RGsegCopy}(j) \le \mathrm{RGsegCopy}(j) - 0.5$$

This leads to the conclusion that a segment can be defined as lost if the following relationship is satisfied:

$$\rho_{\mathrm{segNorm}}(j)_{\mathrm{loss}} \le 1 - \frac{0.5}{\mathrm{RGsegCopy}(j)}$$

Similarly, a gained segment is reported if the following relationship is satisfied:

$$\rho_{\mathrm{segNorm}}(j)_{\mathrm{gain}} \ge 1 + \frac{0.5}{\mathrm{RGsegCopy}(j)}$$

DNAcopy allows the merging of segments whose ρseg values are less than three standard deviations apart, therefore creating a smoothed dataset. Coverage analyser also performs the search for lost and gained segments on such datasets. Importantly, Coverage analyser not only returns the coverage ratio but also the individual calculated copy number for both the reference and target genomes. This feature provides a deeper insight into the meaning of the ratio value (e.g. a value of 2 may derive from a 2:1 or 4:2 ratio, among others).
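
The loss/gain rule can be sketched for a single segment as follows. This is an illustrative reading of the relationships in this section, not the Altools implementation: the function name `classify_segment`, the `delta` parameter (set to the 0.5 copy step of the diploid case) and the returned dictionary are assumptions, and the segmentation step performed by DNAcopy is not reproduced.

```python
def classify_segment(tg_seg_cov, rg_seg_cov, tg_av_cov, rg_av_cov, delta=0.5):
    """Classify one segment as 'loss', 'gain' or 'unchanged'.

    tg_seg_cov, rg_seg_cov: average coverage of the segment in target/reference.
    tg_av_cov, rg_av_cov:   genome-wide average coverage of target/reference.
    delta: copy-number step; 0.5 models a change on one homologous chromosome.
    """
    if rg_seg_cov <= 0 or tg_av_cov <= 0 or rg_av_cov <= 0:
        raise ValueError("reference segment and average coverages must be > 0")

    tg_copy = tg_seg_cov / tg_av_cov                     # TGsegCopy(j)
    rg_copy = rg_seg_cov / rg_av_cov                     # RGsegCopy(j)
    # Normalized ratio: rho_seg(j) * RGavCov / TGavCov
    rho_norm = (tg_seg_cov / rg_seg_cov) * (rg_av_cov / tg_av_cov)

    if rho_norm <= 1 - delta / rg_copy:
        call = "loss"
    elif rho_norm >= 1 + delta / rg_copy:
        call = "gain"
    else:
        call = "unchanged"
    return {"call": call, "rho_norm": rho_norm,
            "tg_copy": tg_copy, "rg_copy": rg_copy}


# Two segments with the same ratio (2) but different underlying copy numbers.
two_to_one = classify_segment(80, 40, 40, 40)     # 2 copies vs 1 copy
four_to_two = classify_segment(160, 80, 40, 40)   # 4 copies vs 2 copies
half_lost = classify_segment(20, 40, 40, 40)      # one homologue lost
```

Returning the two copy numbers alongside the ratio reflects the point made above: a ratio of 2 is reported with enough context to distinguish a 2:1 from a 4:2 situation.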

Sliding analysis: visualizing coverage and polymorphism data

The Sliding analysis tool computes the average coverage together with the frequency of SNPs and indels within either adjacent or sliding windows along the chromosome. Both the raw data and the corresponding plots are generated, so this tool quickly highlights highly polymorphic regions or sites potentially containing CNVs.
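
The difference between adjacent and sliding windows can be shown with a short Python sketch. The function `window_means` and its parameters are hypothetical and only illustrate the windowing idea; how Altools handles an incomplete final window is not stated in the text, so here it is simply dropped.

```python
def window_means(values, size, step=None):
    """Average per-base values over windows along a chromosome.

    values: per-position numbers (coverage, or 0/1 flags marking SNPs/indels).
    size:   window length in bases.
    step:   distance between window starts; defaults to `size`, which gives
            adjacent windows, while a smaller step gives sliding windows.
    Returns (start, end, mean) tuples with 1-based inclusive coordinates.
    A trailing window shorter than `size` is dropped (an assumption).
    """
    if step is None:
        step = size
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    result = []
    for start in range(0, len(values) - size + 1, step):
        window = values[start:start + size]
        result.append((start + 1, start + size, sum(window) / size))
    return result


coverage = [10, 10, 20, 20, 0, 0, 30, 30]
adjacent = window_means(coverage, size=2)           # non-overlapping windows
sliding = window_means(coverage, size=4, step=2)    # overlapping windows

snp_flags = [0, 1, 0, 0, 1, 1, 0, 0]                # 1 marks a SNP position
snp_frequency = window_means(snp_flags, size=4)
```

Passing 0/1 flags instead of coverage values yields the per-window polymorphism frequency with the same function.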

Large deletions finder: fast identification of deletion breakpoints

Common aligners that use short reads are not suitable for the detection of long deletions. The Large deletions finder tool uses a folder containing SAM-formatted files that are produced following the alignment of paired-end reads to a reference genome. A deletion is called when the mapping distance between two mate reads is higher than a user-defined threshold. Overlapping deletions can be merged if the distance between the first mate for both sets of paired ends does not exceed a user-defined number of nucleotides. Altools returns the approximate coordinates of the deletion boundaries at this stage (Additional file 3: Figure S2A). An additional alignment step is performed using BLASTn to precisely identify the deletion breakpoints. Two ranges are defined that are 2000 nucleotides wide and centred on the approximate start and end positions, respectively (Additional file 3: Figure S2B). All read pairs for which at least one mate is mapped within such ranges are extracted from the SAM-formatted alignment file and mapped onto the reference genome by BLASTn alignment. Reads that did not map onto the reference genome originally, possibly due to a broken alignment, will produce hits that can be used to infer the real deletion boundaries (Additional file 3: Figure S2C).

The Large deletions finder tool carries out an additional test to highlight potential false positive deletions reflecting intrachromosomal duplication events. The first 200 nucleotides beyond the upstream deletion breakpoint are extracted from the reference genome and used again as a BLASTn query to search for additional alignments. In the output file, further fields are reported for each deletion indicating the position of these secondary alignments, their percentage of identity and alignment coverage. We define deletions that feature such supplementary fields as ambiguous, as explained in more detail in the Altools manual (Additional file 4: Figure S3). Finally, the coverage of the deleted regions is reported in order to speculate whether the detected structural variation is homozygous or heterozygous, and to test for the presence of the deleted regions at other positions within the target genome.
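
The first stage of this procedure (calling candidate deletions from the mate distance and merging overlapping candidates) could be sketched as below. This is a simplified illustration, not the Altools algorithm: the input tuples, the names `call_deletions`, `min_distance` and `merge_gap`, and the choice to report the innermost coordinates of a merged cluster are all assumptions, and the BLASTn refinement step is not covered.

```python
def call_deletions(pairs, min_distance, merge_gap):
    """Call approximate deletions from paired-end mapping distances.

    pairs: (chrom, first_mate_end, second_mate_start) for each read pair.
    min_distance: a candidate is kept when the gap between mates exceeds this.
    merge_gap: overlapping candidates are merged when their first mates lie
               within this many nucleotides of each other.
    Returns dicts with the approximate boundaries and the supporting pairs.
    """
    candidates = sorted((chrom, start, end) for chrom, start, end in pairs
                        if end - start > min_distance)
    clusters = []
    for chrom, start, end in candidates:
        last = clusters[-1] if clusters else None
        if (last is not None
                and last["chrom"] == chrom
                and start < last["end"]                   # intervals overlap
                and start - last["start"] <= merge_gap):  # first mates are close
            # Keep the innermost coordinates as the approximate boundaries.
            last["start"] = max(last["start"], start)
            last["end"] = min(last["end"], end)
            last["support"] += 1
        else:
            clusters.append({"chrom": chrom, "start": start,
                             "end": end, "support": 1})
    return clusters


read_pairs = [
    ("chr1", 1000, 3100), ("chr1", 1010, 3090), ("chr1", 1005, 3120),
    ("chr1", 5000, 5300),            # normal insert size, ignored
    ("chr2", 1000, 3100),
]
deletions = call_deletions(read_pairs, min_distance=500, merge_gap=100)
```

The number of supporting pairs is kept per cluster because the Discussion mentions that read support helps to exclude false positive deletions.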

Polymorphism analyser: linking variants to biological functions

When SNPs and indels have been identified using the BWA/SAMtools/VarScan pipeline, the Polymorphism analyser tool can be used to highlight those nucleotide variations that affect the genic portions, i.e. coding sequences (CDS) and untranslated regions (UTR). This tool requires the pileup folder, an additional folder containing FASTA-formatted CDS and UTR sequences, and the gff3-formatted gene annotation file. Polymorphism analyser returns a table that reports information such as: (a) the genic portion of the sequence (CDS, 3′UTR and/or 5′UTR), (b) the gene name, (c) the relative position of the polymorphism, (d) the nucleotides called in the reference genome and in the aligned reads, (e) the zygosity of the mutation, (f) amino acid substitutions due to non-synonymous SNPs, including mutations generating a premature stop codon, and (g) any frameshift caused by indels within the CDS.
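
Items (f) and (g) can be illustrated with a small Python sketch that classifies a SNP by its effect on the affected codon and flags frameshifting indels. The function names `classify_snp` and `causes_frameshift` and the category labels are hypothetical, the standard genetic code is assumed, and the example sequences are invented; this is not the code used by Polymorphism analyser.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(cds, position, alt_base):
    """Classify a SNP at a 0-based CDS position by its effect on the codon."""
    codon_start = position - position % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = position - codon_start
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "premature stop"
    elif ref_aa == "*":
        effect = "stop lost"
    else:
        effect = "substitution"
    return effect, ref_aa, alt_aa


def causes_frameshift(ref_allele, alt_allele):
    """An indel shifts the frame when the length change is not a multiple of 3."""
    return (len(alt_allele) - len(ref_allele)) % 3 != 0


cds = "ATGTGGTAA"                       # Met-Trp-stop (invented example)
stop_gained = classify_snp(cds, 5, "A")   # TGG -> TGA
stop_lost = classify_snp(cds, 8, "C")     # TAA -> TAC
missense = classify_snp(cds, 3, "C")      # TGG -> CGG
silent = classify_snp("ATGCTTTAA", 5, "C")  # CTT -> CTC
```

The sketch assumes the position is given relative to the start of an in-frame CDS on the coding strand; a real annotation pipeline would also have to handle strand, splicing and reading-frame offsets.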

Alignment comparison

The 1:1 Alignment tool compares the pileup folders of two different alignments on the same reference genome and reports the common and unique polymorphisms.

Gene extractor

The Large deletions finder and Coverage analyser tools feature an option to generate a GE file that can be analysed in more detail using the Gene Extractor tool. The latter also requires a gff3-formatted annotation file and returns a list of genes that are partially (marked with the flag 0) or totally (marked with the flag 1) included within a selected structural variation.
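
The 0/1 flag convention can be illustrated with a minimal interval-overlap sketch in Python. The function `genes_in_variant`, the gene tuples and the coordinates are hypothetical; parsing of the GE and gff3 files is deliberately left out.

```python
def genes_in_variant(genes, sv_start, sv_end):
    """Flag genes overlapping a structural variant on one chromosome.

    genes: (name, start, end) tuples with 1-based inclusive coordinates.
    Returns (name, flag) pairs: flag 1 when the gene lies entirely inside
    the variant, flag 0 when it only partially overlaps it. Genes with no
    overlap are not reported.
    """
    hits = []
    for name, gene_start, gene_end in genes:
        if gene_end < sv_start or gene_start > sv_end:
            continue                          # no overlap with the variant
        fully_inside = gene_start >= sv_start and gene_end <= sv_end
        hits.append((name, 1 if fully_inside else 0))
    return hits


annotated_genes = [
    ("geneA", 100, 500),      # upstream of the variant
    ("geneB", 900, 1500),     # spans the left boundary
    ("geneC", 1200, 1800),    # entirely inside
    ("geneD", 2500, 3000),    # downstream of the variant
]
flags = genes_in_variant(annotated_genes, sv_start=1000, sv_end=2000)
```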

Performance

SNP/indel identification in simulated genomes

The A. thaliana genome (TAIR7) was used as a scaffold to generate five sets of paired-end Illumina reads with 4x, 10x, 20x, 40x and 100x coverage, respectively. For each coverage dataset, reads were aligned to the original reference genome using the Read alignment tool with default parameters. The Pileup analyser tool was then used (see Additional file 5: Table S2 for settings) to detect the simulated polymorphisms. Although the PPVs were >0.99 for each of the analysed datasets, sensitivity increased to a plateau at 20x coverage for both SNPs and indels (Table 1). Moreover, whereas the SNP calling sensitivity reached a maximum value of 0.98, indel identification was poor with a maximum value of 0.81 at 40x coverage.

Structural variation identification in simulated genomes

Fifty deletions of 2000 bp were introduced into the A. thaliana genome and the resulting simulated sequence was used to generate five sets of paired-end Illumina reads with 4x, 10x, 20x, 40x and 100x coverage, respectively. The same test was then repeated by simulating 10,000 and 50,000 bp deletions. The Large deletions finder tool was used to localize the simulated deletions in each dataset. The PPV and sensitivity were >0.97 for all the datasets and in many cases they reached their maximum value (Fig. 1 and Additional file 6: Figure S4). Furthermore, we computed the distribution of the differences between the observed and simulated breakpoints. The median was 0 at all parameters for coverage and deletion size, with differences of a few nucleotides between the 10th and 90th percentiles of the distribution (Fig. 1 and Additional file 6: Figure S4). The Large deletions finder tool was compared to the widely-used Pindel software [10] and the former showed superior performance in terms of execution time and, in most cases, also PPV and sensitivity (Additional file 7: Table S3).

We also simulated 50 duplications of 2000 bp in the same reference genome and generated five sets of paired-end Illumina reads with 4x, 10x, 20x, 40x and 100x coverage, respectively. The approach described above was used to investigate duplications of 10,000 and 50,000 bp. In each of the simulated datasets, the maximum number of duplications was 10. Coverage analyser was used to localize the duplicated regions and determine the number of copies based on a reference genome pileup folder derived from the alignment and pileup of A. thaliana simulated reads. A 50-bp window was used and only losses/gains larger than 500 bp were sent to the output file.
The software achieved the best performance when only large duplications were present, resulting in the highest PPVs (0.97–1) and sensitivities (0.99–1) as shown in Fig. 2 and Additional file 8: Figure S5. However, the sensitivity declined to ~0.95 for the duplications of 2000 and 10,000 bp, although the PPV was poor only for the 4x simulated dataset (PPV 2000 bp = 0.21, PPV 10,000 bp = 0.65) as shown in Additional file 8: Figure S5. The copy number was also predicted precisely, with the slope between the detected and expected copy numbers always higher than 0.9 (Fig. 2 and Additional file 8: Figure S5). The comparison of this module with other software for the detection of CNVs, e.g. CNVseq [24], confirmed its excellent performance in terms of execution times, PPV and sensitivity (Additional file 7: Table S3).

Table 1 Performance of the Altools platform (detection of polymorphisms). Statistical analysis of Altools polymorphism calling was carried out at five simulated coverage levels

Coverage                            4x        10x       20x       40x       100x
dwgsim generated polymorphisms      121,388   122,074   121,368   121,540   121,638
dwgsim generated SNPs               107,054   107,411   106,766   107,372   107,277
dwgsim generated indels             14,334    14,663    14,602    14,168    14,361
Altools total called SNPs           35,714    81,647    102,493   105,164   105,580
Altools correctly called SNPs       35,650    81,482    102,274   104,910   105,243
Altools false positive SNPs         64        165       219       254       337
Altools total called indels         3049      8307      11,134    11,542    11,657
Altools correctly called indels     3040      8280      11,112    11,503    11,621
Altools false positive indels       9         27        22        39        36
PPV, SNPs                           1.00      1.00      1.00      1.00      1.00
PPV, indels                         1.00      1.00      1.00      1.00      1.00
Sensitivity, SNPs                   0.33      0.76      0.96      0.98      0.98
Sensitivity, indels                 0.21      0.56      0.76      0.81      0.81

Analysis of A. thaliana resequencing data using Altools

Altools was used to analyse the real resequencing data of two A. thaliana accessions (Bur0 and Tsu1) for the robust detection of polymorphisms and to estimate the scalability of the approach. The Pileup analyser tool identified several key features, such as: (a) a higher coverage of G|C compared to A|T bases (Additional file 9: Table S4), which is a known bias for some Illumina sequencing platforms [25]; (b) a higher frequency of polymorphisms in chromosome 4 (Additional file 10: Figure S6); and (c) maintenance of the genomic structure despite the SNP and indel events (Additional file 11: Figure S7).

The Polymorphism analyser tool highlighted the presence of 133,129 SNPs and 5343 indels within the CDS and UTRs of Bur0 transcripts. Interestingly, 94 % of the SNPs we identified were homozygous, compared to only 61.2 % of the indels (Table 2). The higher degree of SNP homozygosity reflects the status of A. thaliana as an autogamous plant species, whereas the different zygosity ratio in the context of indels suggests they are less likely to become fixed due to their potential deleterious effects, e.g. frameshifts in CDS or regulatory disruption in the UTRs. SNPs in the CDS resulted in 49,369 amino acid substitutions, 573 premature stop codons and the loss of the stop codon in at least one allele of 114 genes (Table 2). A similar picture emerged when the Tsu1 resequencing data were analysed, although the SNP frequency proved to be more homogeneous when comparing the CDS and UTRs in this accession (~0.29 %). The 1:1 Alignment tool was used to compare Bur0 and Tsu1 polymorphisms, revealing that nearly 30 % of the polymorphisms were common to both accessions (Additional file 12: Figure S8).

The Coverage analyser tool was used to investigate loss and gain events in Bur0 by comparing its resequencing data to the A. thaliana simulated data (accession Col0) as previously described (window size = 50, minimum number of windows to merge = 4, minimum structural variant size = 1000 bp). Nearly 4.4 million bp were shown to be lost from the Bur0 genome, whereas 3.4 million bp were gained (Table 3). Gene Extractor was used to investigate whether such structural variations could include annotated genes. Although the identified structural variants comprised more than 6 % of the A. thaliana genome, only a few hundred genes were totally included in the corresponding regions (Table 3). A gene ontology (GO) singular enrichment analysis (SEA) using the web-based server Agrigo (http://bioinfo.cau.edu.cn/agriGO/analysis.php) revealed that the gained genes were mostly involved in the respiration pathway (Additional file 13: Table S5) whereas the missing genes (lost and zero coverage) were enriched in stress-response functions (Additional file 14: Table S6).

Fig. 1 Performance of the Large deletions finder tool (detection of large deletion breakpoints). Distribution of the differences between detected and expected breakpoint positions called by the Large deletions finder tool together with the corresponding PPV and sensitivity. The plots represent the results on simulated read datasets with 10x coverage and three large deletion sizes (2000, 10,000 and 50,000 bp)

Fig. 2 Performance of the Coverage analyser tool (detection of copy number variation). Scatterplot showing differences between detected and expected copy numbers called by the Coverage analyser tool together with the corresponding values of PPV and sensitivity. The plots represent the results on simulated read datasets with 10x coverage and three duplication sizes (2000, 10,000 and 50,000 bp)

Discussion

In this paper we present Altools, a new software pipeline for the analysis and interpretation of NGS data. Altools features a GUI-enabled workflow for variant calling that guides the user through all steps, beginning with reference-assisted alignment and ending with the functional annotation of identified variants. Altools relies on a Java-built GUI that provides a user-friendly bioinformatics environment together with several algorithms developed in C++ that maximize the computational performance. Although many software platforms have been developed to handle NGS data analysis, Altools offers a unique set of advantageous features.

The BWA/SAMtools/VarScan pipeline is used for the alignment and identification of SNPs and indels, and to the best of our knowledge this is the first time these components have been embedded in a single software platform and the overall performance has been verified. We found that the proposed strategy achieved satisfactory results in terms of PPV and sensitivity, although the best performance was achieved at coverages of 10x or more (Table 1). The performance and scalability of the workflow were equivalent to or in some cases even better than other available tools [17]. Detection sensitivity was better for SNPs than for indels (Table 1). This may reflect the low edit distance used in the alignment step (BWA flag -n = 4), which can reduce the probability of alignment for reads featuring longer insertions or deletions.

A new algorithm was developed for the identification of large deletions. This takes into account paired-end reads mapping on the same chromosome but at a distance that is incompatible with the expected insert size, and this can determine the approximate coordinates of large deletions. The BLAST algorithm is then used to accurately detect the deletion breakpoints by using the broken alignment of reads spanning the identified deletions. Two additional features make the Large deletions finder tool superior to similar tools. First, coverage of the deleted segment is also calculated in the reference genome. This can provide a deeper insight into the typology of the lost DNA portion, i.e. the presence of aligned reads within deletions may reflect either a heterozygous structural variation or the presence of a paralogous region elsewhere in the genome. Second, the Large deletions finder tool also tests whether the deletion flanking regions are duplicated in additional positions of the chromosome. This feature, together with the number of reads supporting the structural variation, allowed us to exclude potential false positive deletions and achieve good performance in terms of PPV, sensitivity and precision of breakpoint detection for all the simulated datasets we analysed (Fig. 1 and Additional file 6: Figure S4).

The Coverage analyser tool achieved satisfactory PPV and sensitivity values together with a precise calculation of the copy number in most of the simulated datasets (Fig. 2 and Additional file 8: Figure S5). The performance was poorer when we analysed datasets featuring lower coverage and smaller duplicated segments because the method is sensitive to random coverage fluctuations that are more easily averaged in longer segments.

Table 2 Polymorphisms found in the genomes and transcripts of A. thaliana accessions Bur0 and Tsu1

                              Bur0      Tsu1
# Homozygous SNPs             125,234   107,257
# Heterozygous SNPs           7895      7203
# Homozygous indels           3271      2514
# Heterozygous indels         2072      1677
SNP frequency, CDS            0.32      0.28
SNP frequency, 3′UTR          0.36      0.29
SNP frequency, 5′UTR          0.36      0.29
Indel frequency, CDS          0.003     0.003
Indel frequency, 3′UTR        0.059     0.045
Indel frequency, 5′UTR        0.063     0.049
# Amino acid mutations        49,369    43,215
# Premature stop codons       573       469
# Lost stop codons            114       101

Table 3 Coverage analyser results for A. thaliana accession Bur0. Total number of bases detected as gains, losses and zero coverage areas together with the number of annotated genes found in these areas

                  Total length (bp)   # Included genes
Gains             3,429,100           145
Losses            4,443,400           116
Zero coverage     4,406,500           155

One of the main advantages of Altools is its ability to link SNPs, indels, CNVs, PAVs and large structural variations with biological outcomes. The benefit of this approach emerged from the analysis of two A. thaliana accessions, Bur0 and Tsu1.
First, Pileup analyser produced statistics that were used for the assessment of the sequencing quality (e.g. G|C vs A|T coverage) while revealing that small polymorphisms (SNPs and indels) preserve the general AT-rich nucleotide composition profile (Additional file 11: Figure S7). Because this tool considers single chromosome datasets, chromosome 4 was identified as the most polymorphic in both accessions (Additional file 10: Figure S6).

The Coverage analyser tool allowed the identification of CNVs and PAVs in the Bur0 accession and revealed that almost 6 % of the reference genome is involved in such structural variations. Nevertheless, the Gene extractor tool showed that only a few hundred annotated genes were included completely within the detected CNVs and PAVs as expected, and that most structural variations were intergenic (or non-annotated) sequences. Interestingly, GO enrichment revealed ontologies associated with the respiration pathway (Additional file 13: Table S5), which corresponds to the ability of Bur0 shoots to produce larger amounts of several sugars compared to the Col0 accession under specific conditions [26]. The analysis of CNVs and PAVs also showed that many of the genes that have been lost from the Bur0 accession are related to stress-response functions (Additional file 14: Table S6), matching the more stress-sensitive characteristics of Bur0 compared to Col0 [27].

The Polymorphism analyser tool allowed the identification of genes in which SNPs or indels caused gene loss, premature truncation or amino acid substitutions. A simple evaluation of polymorphism frequencies within transcripts showed how SNPs are more likely than indels to become fixed in the CDS, with indels featuring much less frequently in the CDS compared to the UTRs. This hypothesis was confirmed by the higher percentage of heterozygous indels, contrasting with the autogamy of A. thaliana (Table 2). Finally, polymorphisms in the Bur0 and Tsu1 accessions were compared to find common and unique SNPs and indels, an additional Altools feature that could be used to investigate phylogenetic relationships, develop a DNA barcoding system or conduct genome-wide association studies.

Conclusions

Advances in NGS technologies in recent years have led to the development of streamlined workflows for the analysis and interpretation of NGS data. In this context, Altools offers a unique combination of features including an intuitive GUI, a straightforward installation procedure and user-friendly menus suitable for researchers with only basic informatics skills. The new algorithm for the identification of several types of structural variations was fast, accurate and sensitive, equalling or exceeding the performance of contemporary software platforms. Finally, the Altools pipeline is not solely based on the comparative analysis of sequencing data but also on the biological interpretation of complex datasets.

Availability and requirements

Project name: Altools
Project home page: http://sourceforge.net/projects/altools/
Operating system: Linux 64 bit
Programming language: Java, C++, R
Other requirements: xterm, R package DNAcopy, Java version 1.8.0_45 or later.
License: GNU GPL
Any restriction to use by non-academics: no restriction applied

Reviewer's comments

Reviewer's report 2: Prof. Sanghyuk Lee

Reviewer recommendations to authors: Following points needs to be addressed for improving the quality of the work.

1. Most of pipelines lack an objective comparison with other tools publicly available. For example, they implemented BWA/samtools/Varscan for identifying SNPs and indels and it showed satisfactory performance in terms of PPV and sensitivity in their simulation study. However, its performance should be compared with other programs such as GATK utilities, PINDEL, Scalpel. CNVs are identified with their own in-house developed algorithm. Again, its performance should be compared with other tools for similar purposes (e.g. XHMM, ExomeDepth, Conifer, CONTRA, and exomeCopy). Without such comparison, it is difficult to judge whether Altools' result are superior to those tools and nobody would use the tool.

2. The pipeline is tightly designed with very limited flexibility. Better approach would be to allow users to choose proper tools and processes like the GALAXY workflow engine. New and better tools are constantly released and users should be able to choose such updated tools if necessary. I believe that there exist better tools than Varscan in variant calling. Furthermore, the hard-wired pipeline of Altools is difficult to modify. For example, it is usually recommended to incorporate adaptor trimming, duplicate removal, and alignment recalibration for pre-processing of the NGS data in analyzing well-established model organisms.

3. The packing of tools needs significant improvement. I do not feel that the tool is really user-friendly with poor flexibility, no utility tools for log or process management, and no unique visualization support.

Minor issues: English editing is strongly recommended.

Authors' response to reviewer 2: We would like to thank Professor Lee for his valuable suggestions. Please find hereafter a point by point response to the raised concerns.

Major revisions. We ran a benchmark test on Altools by comparing its performance with that of CNVseq for the detection of CNVs and Pindel for the detection of large deletions. The results (Additional file 7: Table S3) show that our software performed better in terms of execution time and, in general, in terms of PPV and sensitivity. The choice of the BWA aligner and VarScan polymorphism caller is now better explained in the text. We also appreciated the suggestion to improve the GUI by including a utility for log or process management, a visualization tool and a wider collection of aligners, polymorphism callers and read pre-processing tools and we intend to consider these suggestions for future Altools updates. For the time being, we believe that relying on widely-used file formats such as SAM, BAM and SAMtools pileup will already deliver a certain degree of flexibility to the Altools environment. For example, users can apply their favourite tools to generate compatible files and can still submit their data to the Altools structural variation detection algorithm.

Minor issues. A professional scientific editing service has carried out a thorough revision of the manuscript.
Reviewer 2's comments to the revised manuscript: As suggested in the previous review, authors compared the performance of Altools with CNVseq for CNVs and Pindel for large indels, and report better PPV and sensitivity. However, I think that the comparison target programs were not properly chosen. Both CNVseq and Pindel were published in 2009 and I believe that many other programs have been published for the same purpose. Furthermore, the issue of limited flexibility was not resolved yet. Even though Altools can be combined with various file formats in principle, experts with such capability would not use a pipeline tool not supporting recent advanced algorithms.
Authors' response: We would like to thank Professor Lee for his comments. Although we are aware of the most recent algorithms for the identification of polymorphisms and structural variations, we decided to benchmark Altools against Pindel and CNVseq because these software platforms are widely used, their quality is well established, and comparative tests against similar tools have been published in the recent literature (e.g. J. Zhang et al., 2014, Horticulture Research 1:14045; D. H. Ghoneim, 2014, BMC Research Notes 7:864; J. Duan, 2013, PLoS One 8:e59128). Indeed Professor Lee suggested Pindel as one of the platforms we should use for comparison. Finally, as indicated in our previous response, we are already working to improve the flexibility of Altools and compatibility with more recent algorithms will be introduced in a forthcoming update.
Reviewer's report 3: Prof. Gajendra Raghava

Reviewer recommendations to authors: In this manuscript, a pipeline developed for analyzing NGS data has been described. This is an important pipeline for researchers working in the field of genomics. In the present form this manuscript is not publishable as the authors have not justified their claims. In addition, the selection of tools integrated in this manuscript needs to be justified.

Major comments
1. In the past, a number of pipelines have been developed for NGS; the authors should show a comparison of Altools with existing tools.
2. The authors claim that their pipeline is fast (fast in terms of what?). In order to justify their claim they should benchmark their method in terms of the execution time used to process NGS data.
3. In addition, the authors should show the superiority of the individual tools integrated in their pipeline over existing tools. This is important to show the application of this pipeline.
4. The Altools pipeline contains eight major modules or components; the authors should list indigenous and third-party software separately. A graphical flowchart of Altools would be useful for readers to understand the components of the pipeline.
Minor issues:
1) This manuscript needs to be revised thoroughly as it contains several grammatical and typographical mistakes (e.g. "genome wise association (GWAS) studies" should be "genome-wide association studies (GWAS)"). This pipeline has been mentioned as both Altools and ALtools in the manuscript; it should be uniform.
2) Additional file 11: Figure S7 is mentioned at page 14 (Line 41), which is otherwise missing.
3) In Table 2, what is the meaning of values having a comma in between, e.g. 0,003?
4) In Table 1, they show total called, true called and false called SNPs. What about missed SNPs, which were generated by dgwsim software, but not called at all by Altools?
5) Bowtie was not used while it can take care of splice variants. Preference for BWA over Bowtie should be mentioned somewhere.
6) There is a need to generate a comprehensive manual for Altools.

Author's response to reviewer 3: We would like to thank Prof. Raghava for his exhaustive review. Please find hereafter a point by point response to the raised concerns.

Major revisions.
1. Altools was benchmarked against two published software platforms for the determination of copy number variations (CNVs) and large deletions. The results (Additional file 8: Table S3) show that our software performed better in terms of execution time and, in general, in terms of PPV and sensitivity.
2. The execution speed is now reported and compared to similar software platforms (Additional file 8: Table S3).
3. The choice of the different software modules is now better explained in the text.
4. A flowchart illustrating the original and third-party software within Altools has been added to the revised version of the manuscript.
Minor issues
1. A professional scientific editing service has carried out a thorough revision of the manuscript. This included the careful standardization and correction of all software names, the checking of abbreviations and initialisms for accuracy, grammatical corrections and style revision.
2. The missing figure has now been added.
3. "," has been replaced by "." as decimal separator in all the tables.
4. The sensitivity values were calculated as "the fraction of simulated variants which were called from the sequence data" (ref 17) and are intended to address the concern raised by the reviewer.
5. The preference for BWA over Bowtie2 as the aligner is now addressed in the revised manuscript.
6. A comprehensive manual for Altools is included in the software folder.
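
The relation between the sensitivity definition quoted in point 4 and the missed SNPs raised by the reviewer can be illustrated with a minimal Python sketch. It is not part of Altools: the function name, the tuple representation of a variant (chromosome, position, reference base, alternative base) and the example values are assumptions made for illustration only. PPV is taken here in its usual sense, the fraction of called variants that are true.

```python
def ppv_and_sensitivity(called, simulated):
    """Return (PPV, sensitivity, missed) for two collections of variant keys.

    called    -- variants reported by the caller (hypothetical tuples)
    simulated -- variants introduced by the read simulator (the truth set)
    """
    called = set(called)
    simulated = set(simulated)
    true_called = called & simulated      # correctly called variants
    missed = simulated - called           # simulated but never called
    # PPV: fraction of called variants that are real
    ppv = len(true_called) / len(called) if called else 0.0
    # Sensitivity: fraction of simulated variants that were called
    sensitivity = len(true_called) / len(simulated) if simulated else 0.0
    return ppv, sensitivity, missed


# Illustrative values only, not data from the manuscript
simulated = {
    ("Chr1", 100, "A", "G"),
    ("Chr1", 250, "C", "T"),
    ("Chr2", 40, "G", "A"),
    ("Chr2", 900, "T", "C"),
}
called = {
    ("Chr1", 100, "A", "G"),
    ("Chr2", 40, "G", "A"),
    ("Chr3", 7, "A", "C"),
}

ppv, sensitivity, missed = ppv_and_sensitivity(called, simulated)
print(f"PPV={ppv:.3f} sensitivity={sensitivity:.3f} missed={len(missed)}")
```

Under this definition the number of missed variants is simply the number of simulated variants multiplied by (1 - sensitivity), so reporting sensitivity conveys the same information as reporting the missed calls.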
Additional files

Additional file 1: Figure S1. Flowchart describing the eight Altools modules. Blue portions represent novel algorithms, whereas red portions represent third-party embedded software. (DOC 21 kb)

Additional file 2: Table S1. Sequence read archive (SRA) experiments for A. thaliana accessions Bur0 and Tsu1 available at http://www.ncbi.nlm.nih.gov/sra.
(DOC 209 kb)

Additional file 3: Figure S2. Pipeline for the identification of deletion breakpoints. (a) Approximate deletion boundaries are inferred by detecting mapped paired-end reads that align at a distance that is not compatible with the expected insert. Overlapping sets of improperly-mapped mates (i.e. those possibly underlying the same deletion) are merged at this stage. (b) A 2000-bp range is selected in the reference genome at each of the found deletion boundaries (deletion start ± 1000 bp and deletion end ± 1000 bp). Reads that are mapped within these regions are extracted from the alignment file together with the corresponding unmapped mates. (c) BLASTn is used to map reads identified at point (b) onto the reference genome and deletion breakpoints are inferred by the position of the detected partial alignments.
(DOC 21 kb)

Additional file 4: Figure S3. Possible duplication interference affecting the correct identification of a large deletion. In a real deletion, reads mapping to the genomic portion A have their mates mapped to portion B at a distance that is not compatible with their library insert size. However, if a deletion did not occur between A and B, but rather B is duplicated somewhere upstream within the same chromosome, then reads mapping to A may have their mates mapped either in B or in Bdup. Mate pairs aligning in the portions A–Bdup will feature a mapping distance that is not compatible with their insert and, in this case, a deletion may be erroneously called. (DOC 21 kb)

Additional file 5: Table S2. Pileup analyser parameters to detect the simulated polymorphisms in the A. thaliana genome with different reference coverage values. (DOC 207 kb)

Additional file 6: Figure S4. Distribution of the differences (PPV and sensitivity) between detected and expected breakpoint positions derived from Large deletion finder analysis of the simulated reads dataset (coverage 4x, 20x, 40x and 100x) with three large deletion sizes (2000, 10,000 and 50,000 bp). (DOC 21 kb)

Additional file 7: Figure S5. Scatter plot showing differences (PPV and sensitivity) between detected and expected copy numbers calculated by the Coverage analyser tool on simulated reads datasets (coverage 4x, 20x, 40x and 100x) and three duplication sizes (2000, 10,000 and 50,000 bp).
(DOC 21 kb)

Additional file 8: Table S3. Benchmark of Altools for the detection of copy number variations (CNVs) and large deletions. The Coverage analyser module was compared to CNVseq [23] by testing its performance on the simulated A. thaliana genome with 10x coverage and three CNV segment sizes (2000, 10,000 and 50,000 bp). Default parameters were used in CNVseq except the window size (window-size 50) for the sake of uniformity with the Altools settings. The Large deletions finder module was compared to Pindel [10] by testing its performance on the simulated A. thaliana genome with 10x coverage and three deleted segment sizes (2000, 10,000 and 50,000 bp). To compare the software platforms under equivalent conditions, Pindel was set to output only deletions (-r false -t false -l false) while setting all the remaining parameters to their default values (for the detection of 50,000-bp deletions the flag -x 6 was added). Benchmarking was carried out on a server equipped with an Intel(R) Xeon(R) CPU X5660 working at 2.80 GHz.
(DOC 21 kb)

Additional file 9: Table S4. G|C bias in the Bur0 and Tsu1 Illumina NGS datasets. (DOC 21 kb)

Additional file 10: Figure S6. Frequency of (A) SNPs and (B) indels in the alignment of Bur0 and Tsu1 sequences on the A. thaliana reference genome. (DOC 21 kb)

Additional file 11: Figure S7. (Top) Frequency of the four nucleotides in the reference and target genomes at a polymorphic site. (Bottom) Frequency of the four nucleotides among the inserted and deleted bases. (TIFF 142 kb)

Additional file 12: Figure S8. Comparison of polymorphisms (SNPs and indels) found in the A. thaliana accessions Bur0 and Tsu1. (TIFF 68 kb)

Additional file 13: Table S5. Gene Ontology enrichment analysis of the Bur0 accession transcripts that are enclosed in gained regions (P = process and F = function). (DOC 21 kb)

Additional file 14: Table S6. Gene Ontology enrichment analysis of the Bur0 accession transcripts that are enclosed in lost regions, including copy number variation and zero coverage reference genome portions (P = process, F = function and C = cellular component). (DOC 215 kb)

Abbreviations
CNV: copy number variation; GUI: graphical user interface; GWAS: genome-wide association study; PAV: presence/absence variation; SNP: single nucleotide polymorphism.
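
Steps (a) and (b) of the deletion breakpoint pipeline described for Additional file 3 (Figure S2) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Altools implementation: each read pair is reduced to a hypothetical (leftmost, rightmost) coordinate tuple on one chromosome, the expected insert size and tolerance are invented example values, and "merging" is interpreted as taking the union of overlapping spans.

```python
def discordant_spans(pairs, expected_insert, tolerance):
    """Step (a): keep pairs whose mapping distance is too large for the insert.

    pairs -- iterable of (leftmost, rightmost) coordinates of a read pair
    """
    limit = expected_insert + tolerance
    return [(left, right) for left, right in pairs if right - left > limit]


def merge_overlapping(spans):
    """Step (a), continued: merge spans that overlap into one candidate."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous candidate: extend it (union of spans)
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def boundary_windows(deletion, flank=1000):
    """Step (b): a 2000-bp window around each approximate boundary."""
    start, end = deletion
    return (
        (max(0, start - flank), start + flank),
        (max(0, end - flank), end + flank),
    )


# Illustrative coordinates only; insert size 400 bp, tolerance 200 bp (assumed)
pairs = [(1000, 1400), (5000, 9200), (5100, 9300),
         (20000, 20350), (30000, 41000)]
candidates = merge_overlapping(discordant_spans(pairs, 400, 200))
windows = [boundary_windows(c) for c in candidates]
print(candidates)
print(windows)
```

Reads falling inside the returned windows, together with their unmapped mates, would then be the input for the BLASTn step (c), which is not sketched here.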
Competing interests
The authors declare that they have no competing interests.

Authors' contributions
SC designed/produced the software and contributed to the manuscript drafting. GS tested the software and provided suggestions for some of the implemented algorithms. AP contributed to the strategy underlying the software and helped to write the manuscript. All authors read and approved the final manuscript.

Acknowledgements
This project originated from SC's MSc thesis in Digital Biology at the University of Manchester. For this reason, the lead author would like to thank Prof. Andy Brass and Dr. Heather Vincent for guidance and advice. Moreover, we would like to thank Dr. Francesco Vezzi, Prof. Michele Morgante and Dr. Walter Sanseverino for their help and suggestions.

Author details
1 Università degli studi di Sassari, Dipartimento di Agraria, SACEG, Via Enrico De Nicola 1, Sassari 07100, Italy. 2 Plant Functional Biology and Climate Change Cluster (C3), University of Technology Sydney, PO Box 123 Broadway NSW 2007 Sydney, Australia.
Received: 22 October 2015 Accepted: 9 February 2016

References
1. Helyar SJ, Hemmer-Hansen J, Bekkevold D, Taylor MI, Ogden R, Limborg MT, et al. Application of SNPs for population genetics of nonmodel organisms: new opportunities and challenges. Mol Ecol Resour. 2011;11 Suppl 1:123–36.
2. Eathington SR, Crosbie TM, Edwards MD, Reiter RS, Bull JK. Molecular Markers in a Commercial Breeding Program. Crop Sci. 2007;47:S–154.
3. Li H, Homer N. A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform. 2010;11:473–83.
4. Pirooznia M, Kramer M, Parla J, Goes FS, Potash JB, McCombie WR, et al. Validation and assessment of variant calling pipelines for next-generation sequencing. Hum Genomics. 2014;8:14.
5. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60.
6. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
7. Kazazian HH. Mobile elements: drivers of genome evolution. Science. 2004;303:1626–32.
8. Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, et al. Fine-scale structural variation of the human genome. Nat Genet. 2005;37:727–32.
9. Mills RE, Luttig CT, Larkins CE, Beauchamp A, Tsui C, Pittard WS, et al. An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res. 2006;16:1182–90.
10. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009;25:2865–71.
11. Fan X, Abbott TE, Larson D, Chen K. BreakDancer - Identification of Genomic Structural Variation from Paired-End Read Mapping. Curr Protoc Bioinformatics. 2014;2014.
12. Korbel JO, Abyzov A, Mu XJ, Carriero N, Cayting P, Zhang Z, et al. PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biol. 2009;10:R23.
13. Medvedev P, Stanciu M, Brudno M. Computational methods for discovering structural variation with next-generation sequencing. Nat Methods. 2009;6(11 Suppl):S13–20.
14. Aßmus J, Schmitt AO, Bortfeldt RH, Brockmann GA. NovelSNPer: A Fast Tool for the Identification and Characterization of Novel SNPs and InDels. Adv Bioinformatics. 2011;2011:1–11.
15. Camiolo S, Porceddu A. gff2sequence, a new user friendly tool for the generation of genomic sequences. BioData Min. 2013;6:15.
16. Bartenhagen C, Dugas M. RSVSim: an R/Bioconductor package for the simulation of structural variations. Bioinformatics. 2013;29:1679–81.
17. Liu X, Han S, Wang Z, Gelernter J, Yang B-Z. Variant callers for next-generation sequencing data: a comparison study. PLoS One. 2013;8:e75619.
18. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.
19. Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics. 2009;25:2283–5.
20. Hatem A, Bozdağ D, Toland AE, Çatalyürek ÜV. Benchmarking short sequence mapping tools. BMC Bioinformatics. 2013;14:184.
21. Xu H, DiCarlo J, Satya RV, Peng Q, Wang Y. Comparison of somatic mutation calling methods in amplicon and whole exome sequence data. BMC Genomics. 2014;15:244.
22. Pightling AW, Petronella N, Pagotto F. Choice of reference-guided sequence assembler and SNP caller for analysis of Listeria monocytogenes short-read sequence data greatly influences rates of error. BMC Res Notes. 2015;8:748.
23. Bioconductor - DNAcopy [http://www.bioconductor.org/packages/release/bioc/html/DNAcopy.html]
24. Xie C, Tammi MT. CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics. 2009;10:80.
25. Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 2008;36:e105.
26. Ramel F, Sulmon C, Gouesbet G, Couée I. Natural variation reveals relationships between pre-stress carbohydrate nutritional status and subsequent responses to xenobiotic and oxidative stress in Arabidopsis thaliana. Ann Bot. 2009;104:1323–37.
27. Peele HM, Guan N, Fogelqvist J, Dixelius C. Loss and retention of resistance genes in five species of the Brassicaceae family. BMC Plant Biol. 2014;14:298.