SOFTWARE    Open Access

NanoDJ: a Dockerized Jupyter notebook for interactive Oxford Nanopore MinION sequence manipulation and genome assembly

Héctor Rodríguez-Pérez1, Tamara Hernández-Beeftink1, José M. Lorenzo-Salazar2, José L. Roda-García3, Carlos J. Pérez-González4, Marcos Colebrook3* and Carlos Flores1,2,5*

Abstract
Background: The Oxford Nanopore Technologies (ONT) MinION portable sequencer makes it possible to use cutting-edge genomic technologies in the field and the academic classroom.
Results: We present NanoDJ, a Jupyter notebook integration of tools for simplified manipulation and assembly of DNA sequences produced by ONT devices. It integrates basecalling, read trimming and quality control, simulation and plotting routines with a variety of widely used aligners and assemblers, including procedures for hybrid assembly.
Conclusions: Through Jupyter-facilitated access to self-explanatory contents of applications and the interactive visualization of results, as well as through its distribution as a Docker software container, NanoDJ aims to simplify ONT DNA sequence analysis and make it more reproducible. The NanoDJ package code, documentation and installation instructions are freely available at https://github.com/genomicsITER/NanoDJ.
Keywords: Genome analysis, Nanopore sequencing, Jupyter, Docker

*Correspondence: mcolesan@ull.edu.es; cflores@ull.edu.es
Héctor Rodríguez-Pérez and Tamara Hernández-Beeftink contributed equally to this work.
3Departamento de Ingeniería Informática y de Sistemas, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
1Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
Full list of author information is available at the end of the article

Rodríguez-Pérez et al. BMC Bioinformatics (2019) 20:234
https://doi.org/10.1186/s12859-019-2860-z

© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background
It has never been so easy and affordable to access and utilize the genetic variation of any organism for any purpose. This has been driven by the continuous development of high-throughput DNA sequencing technologies, most commonly known as Next Generation Sequencing (NGS). A key improvement is the possibility of obtaining long single-molecule sequences with the fast and cost-efficient technology released by Oxford Nanopore Technologies (ONT), which in 2014 brought to market the MinION, a portable, pocket-size, nanopore-based NGS platform [1]. Since then, several algorithms and software tools have flourished specifically for ONT sequence data. Despite its size, the MinION provides multi-kilobase reads with a throughput comparable to other benchtop sequencers on the market (1–10 Gbases by 2017), and therefore still requires efficient and integrated bioinformatics tools to facilitate the widespread use of the technology.

MinION has shown promise in distinct applications [2]. Because of its low cost, laptop operability, and USB-powered compact design, cutting-edge NGS technology is no longer necessarily tied to the established idea of a large, expensive machine that must be located in a centralized sequencing center or on a laboratory bench. As a consequence, the utility of MinION in field experiments to move from sample to answer on site has been demonstrated in infectious disease studies [3, 4], off-Earth genome sequencing [5], and species identification in extreme environments [6–8], among others. Leveraging MinION capabilities in the academic classroom is a natural extension of these field studies to facilitate the education of undergraduate and graduate students in genomics [9].

To date, there is no specific software solution aimed at facilitating ONT sequence analyses by integrating capabilities for data manipulation, sequence comparison and assembly, either in field experiments or for educational purposes to support the learning of genomics [9]. We have developed NanoDJ, an interactive collection of Jupyter notebooks that integrates a variety of software, advanced computer code, and plain contextual explanations. In addition, NanoDJ is distributed as a Docker software container to simplify the installation of dependencies and improve the reproducibility of results.

Implementation
NanoDJ is distributed as a Docker container built around Jupyter notebooks, which are increasingly popular in the life sciences because they significantly facilitate the interactive exploration of data [10] and have recently been integrated into the widely used Galaxy portal [11]. The Docker container allows NanoDJ to run as an isolated, self-contained package that can be executed seamlessly across a wide range of computing platforms [12], with a negligible impact on execution performance [13]. NanoDJ integrates diverse applications (Additional file 1: Table S1) organized into 12 notebooks grouped into three sections (Fig. 1; Table 1). Main results are presented as embedded objects. In addition, one of the notebooks was conceived for educational purposes by setting a particularly simple problem and including low-level explanations. To facilitate the use of the educational notebook while bypassing the installation of Docker and NanoDJ, a lightweight version of this notebook and small sets of ONT reads can be used from a web browser through Binder (https://mybinder.org) in the NanoDJ GitHub repository. In addition, as part of the CyVerse project (https://www.cyverse.org/), NanoDJ has been incorporated into VICE, a visual and interactive computing environment that facilitates training in ONT data analysis. We illustrate the versatility of NanoDJ in distinct scenarios by providing results from four case studies (Additional file 1: Text S1).

Input, basecalling, and simulations
Input data can be a list of FAST5 files from previously basecalled runs (e.g., a Metrichor output) or event-level signal data to be basecalled using the latest ONT caller. The user can also simulate reads with NanoSim and pre-computed model parameters. This possibility is useful in different scenarios, for example to help design an experiment or to bypass technical difficulties in academic setups [9].
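
To make the idea of read simulation concrete, the following is a minimal sketch of a toy simulator. It is not NanoSim's model, which relies on parameters learned from empirical ONT data; here read lengths are drawn from an exponential distribution and only substitution errors are introduced. The function name `simulate_reads`, the reference sequence, and all parameter values are hypothetical and chosen for illustration only.

```python
import random


def simulate_reads(reference, n_reads, mean_len, error_rate, seed=0):
    """Sample reads from random positions of `reference` and add substitution errors."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    reads = []
    for _ in range(n_reads):
        # Exponentially distributed length, clipped to the range [1, len(reference)]
        length = max(1, min(len(reference), int(rng.expovariate(1.0 / mean_len))))
        start = rng.randint(0, len(reference) - length)
        read = []
        for base in reference[start:start + length]:
            if rng.random() < error_rate:
                # Replace the base with one of the three other bases
                read.append(rng.choice([b for b in "ACGT" if b != base]))
            else:
                read.append(base)
        reads.append("".join(read))
    return reads


reference = "ACGT" * 250  # hypothetical 1 kb reference sequence
reads = simulate_reads(reference, n_reads=5, mean_len=200, error_rate=0.1)
for i, read in enumerate(reads):
    print(f">sim_read_{i} length={len(read)}")
```

A realistic simulator would also model insertions, deletions and quality scores; this sketch only shows the sampling principle.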

Summary, quality control and filtering
For either a simulated or an empirical run, the user will obtain summary data and plots showing read length distribution, GC content vs. length, and read length vs. quality score (when available). If barcodes were used in the experiment, Porechop can be used for demultiplexing, barcode trimming and filtering out reads.
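
The per-read quantities behind these summary plots (read length, GC content and quality score) can be computed along the following lines. This is a minimal sketch and not the code used in the NanoDJ notebooks: the function names `parse_fastq` and `read_stats` and the two example reads are hypothetical, the input is assumed to be four-line FASTQ records with Phred+33 quality encoding, and the quality summary is a plain arithmetic mean of Phred scores.

```python
import io


def parse_fastq(handle):
    """Yield (name, sequence, quality string) from four-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file (or a blank line) ends parsing
        seq = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().strip()
        yield header[1:], seq, qual


def read_stats(seq, qual):
    """Return read length, GC fraction and mean Phred score (Phred+33 assumed)."""
    length = len(seq)
    gc = (seq.count("G") + seq.count("C")) / length if length else 0.0
    mean_q = sum(ord(c) - 33 for c in qual) / len(qual) if qual else 0.0
    return length, gc, mean_q


# Two hypothetical reads used only to exercise the functions
example = io.StringIO("@read1\nACGTGGCC\n+\nIIIIIIII\n@read2\nATAT\n+\n!!!!\n")
stats = [(name, *read_stats(seq, qual)) for name, seq, qual in parse_fastq(example)]
for name, length, gc, mean_q in stats:
    print(f"{name}\tlength={length}\tGC={gc:.2f}\tmeanQ={mean_q:.1f}")
```

The resulting tuples can be passed to any plotting library to draw length histograms or GC content vs. length scatter plots.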

Fig. 1 Simplified scheme of all NanoDJ functionalities

Genome assembly and comparison
Depending on the application, sequence data can be aligned against reference sequences or used for genome assembly using diverse methods. Alignment is performed against either one (BWA and Rebaler) or multiple (BLAST) reference sequences, generating BAM files for downstream applications (e.g., variant identification) or information on species composition. Alternatively, the user may opt for a de novo assembly. NanoDJ allows the use of some of the best-performing algorithms (Canu, Flye, and Miniasm), or the combination of ONT reads with others obtained with second-generation NGS platforms for a hybrid assembly (Unicycler and MaSuRCA). The latter provides more effective assemblies and a reduced error rate compared to assemblies based only on ONT reads [14]. NanoDJ includes the possibility of contig correction (Racon, Nanopolish, and Pilon). Assemblies can be evaluated with the embedded version of QUAST, and represented with Bandage.
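
As an illustration of what assembly evaluation involves, the sketch below computes a few contiguity statistics commonly reported for assemblies, including N50. It is not QUAST's implementation and covers only a small subset of what such tools report; the function names `n50` and `assembly_summary` and the contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_summary(contig_lengths):
    """Collect basic contiguity statistics for a list of contig lengths."""
    return {
        "contigs": len(contig_lengths),
        "total_length": sum(contig_lengths),
        "largest_contig": max(contig_lengths, default=0),
        "N50": n50(contig_lengths),
    }


lengths = [100, 200, 300, 400, 500]  # hypothetical contig lengths in bases
print(assembly_summary(lengths))
# → {'contigs': 5, 'total_length': 1500, 'largest_contig': 500, 'N50': 400}
```

Computing the same summary for each assembler's output gives a quick first comparison before inspecting reference-based metrics.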

Limitations and future directions
For non-expert users, NanoDJ would have been easier to use had it been conceived as an online application. However, our main objective was to integrate major tools for the analysis of ONT sequences in an interactive software environment to facilitate learning the basics behind ONT sequence analysis while providing a useful tool for professionals. Providing it as a Dockerized solution keeps the focus on the use of the tool, reducing the burden on the user of installing all dependencies. At the moment, NanoDJ is set up for the analysis of small genomes and targeted NGS studies, and focuses on the primary and secondary analysis of DNA sequences. The integration of tools for variant identification and tertiary analysis (annotation of variants or sequence elements, interpretation, etc.) [15, 16], as well as for epigenetics [17] and direct RNA sequencing [18], will be the focus of further developments of NanoDJ.

Conclusions
We present NanoDJ as an integrated Jupyter-based toolbox distributed as a Docker software container to facilitate ONT sequence analysis. NanoDJ is best suited for the analyses of small genomes and targeted NGS studies. We anticipate that the Jupyter notebook-based structure will simplify further developments in other applications.

Availability and requirements
Project name: NanoDJ
Project home page: https://github.com/genomicsITER/NanoDJ
Operating system(s): Windows, Linux, Mac OS
Programming language: Bash/Python
Other requirements: Docker installation
License: GPL
Any restrictions to use by non-academics: None

Additional file
Additional file 1: Table S1. Applications integrated in NanoDJ. Text S1. Testing on case study datasets. Table S2. Datasets for illustrative uses of NanoDJ. Table S3. Comparison of de novo assemblies using different inputs or with an assembly corrector. Table S4. Comparison of three de novo assemblers in a high-coverage ONT dataset. Table S5. Comparison of results from two hybrid de novo assemblers. Figure S1. Human mitochondrial DNA variant representation against the reference sequence. Table S6. Source of mitochondrial DNA genomes, simulations and classification results. (DOCX 1544 kb)

Acknowledgements
Not applicable

Funding
This research was funded by the Instituto de Salud Carlos III (grants PI14/00844 and PI17/00610), the Spanish Ministry of Science, Innovation and Universities (grant RTC-2017-6471-1; MINECO/AEI/FEDER, UE), the Spanish Ministry of Economy and Competitiveness (grant MTM2016–74877-P), which were co-financed by the European Regional Development Funds 'A way of making Europe' from the European Union, Area Tenerife 2030 from Cabildo de Tenerife (CGIEU0000219140), and by the agreement OA17/008 with Instituto Tecnológico y de Energías Renovables (ITER) to strengthen scientific and technological education, training, research, development and innovation in Genomics, Personalized Medicine and Biotechnology. The funding entities had no role in the design of the study, analysis, interpretation of data or in manuscript writing.

Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary information files. Raw reads from MinION and Illumina are available from the SRA database (accession number(s) PRJNA451111, PRJNA451107).

Authors' contributions
HRP scripted and tested the software, and contributed to data analysis; THB was involved in data analysis and interpretation; JLS was involved in data analysis; JRG revised and tested the software and revised the manuscript; CPG was involved in visualization, data analysis and revised the manuscript; MC conceived the project, revised and tested the software, and revised the manuscript; CF conceived the project, designed the software, interpreted the data, and critically revised the manuscript. All authors have read and approved the final manuscript.

Table 1 Summary of NanoDJ notebooks
Name                              Functionality
0.0_QualityControl.ipynb          Evaluate the quality control and sequence handling
1.0_Basecalling.ipynb             Translates the events or the raw electrical signal from an ONT sequencer (FAST5 format) to a DNA sequence to obtain a FASTA or a FASTQ file
1.1_Trim+Demux.ipynb              Perform sequence trimming and demultiplexing
2.0_DeNovo_Canu-Miniasm.ipynb     De novo assembly with Canu or Miniasm, and polish with Racon and Pilon
3.0_DeNovo_Canu+polish.ipynb      Nanopolish modules to improve the Canu assembly
4.0_DeNovo_Flye.ipynb             De novo assembly with Flye software
5.0_DeNovo_Hybrid.ipynb           Perform de novo assembly of Nanopore reads in conjunction with Illumina reads using MaSuRCA and/or Unicycler software
6.0_AssemblyCompare.ipynb         Compare distinct assembly results based on QUAST software
7.0_SimulateReads.ipynb           Obtain simulated reads made with NanoSim software and the NanoSim-H fork with precomputed models
8.0_Alignment.ipynb               Reference-based assembly using either BWA, BLAST or Rebaler software
9.0_AssemblyGraph.ipynb           Assembly graph visualization
Educational.ipynb                 Performs basecalling (with Albacore), quality control steps, and a BLAST-based classification of the reads (for educational purposes)

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details
1Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain. 2Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain. 3Departamento de Ingeniería Informática y de Sistemas, Universidad de La Laguna, Santa Cruz de Tenerife, Spain. 4Departamento de Matemáticas, Estadística e Investigación Operativa, Universidad de La Laguna, Santa Cruz de Tenerife, Spain. 5CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain.

Received: 22 June 2018 Accepted: 29 April 2019

References
1. Brown CG, Clarke J. Nanopore development at Oxford Nanopore. Nat Biotechnol. 2016;34:810–1.
2. Jain M, Olsen HE, Paten B, Akeson M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 2016;17:239.
3. Quick J, Loman NJ, Duraffour S, et al. Real-time, portable genome sequencing for Ebola surveillance. Nature. 2016;530:228–32.
4. Faria NR, Quick J, Claro IM, Thézé J, de Jesus JG, Giovanetti M, Kraemer MUG, Hill SC, Black A, da Costa AC, Franco LC, Silva SP, Wu C-H, Raghwani J, Cauchemez S, du Plessis L, Verotti MP, de Oliveira WK, Carmo EH, Coelho GE, Santelli ACFS, Vinhal LC, Henriques CM, Simpson JT, Loose M, Andersen KG, Grubaugh ND, Somasekar S, Chiu CY, Muñoz-Medina JE, Gonzalez-Bonilla CR, Arias CF, Lewis-Ximenez LL, Baylis SA, Chieppe AO, Aguiar SF, Fernandes CA, Lemos PS, Nascimento BLS, Monteiro HAO, Siqueira IC, de Queiroz MG, de Souza TR, Bezerra JF, Lemos MR, Pereira GF, Loudal D, Moura LC, Dhalia R, França RF, Magalhães T, Marques ET Jr, Jaenisch T, Wallau GL, de Lima MC, Nascimento V, de Cerqueira EM, de Lima MM, Mascarenhas DL, Neto JPM, Levin AS, Tozetto-Mendoza TR, Fonseca SN, Mendes-Correa MC, Milagres FP, Segurado A, Holmes EC, Rambaut A, Bedford T, Nunes MRT, Sabino EC, Alcantara LCJ, Loman NJ, Pybus OG. Establishment and cryptic transmission of Zika virus in Brazil and the Americas. Nature. 2017;546:406–10.
5. Castro-Wallace SL, Chiu CY, John KK, Stahl SE, Rubins KH, McIntyre ABR, Dworkin JP, Lupisella ML, Smith DJ, Botkin DJ, Stephenson TA, Juul S, Turner DJ, Izquierdo F, Federman S, Stryke D, Somasekar S, Alexander N, Yu G, Mason CE, Burton AS. Nanopore DNA sequencing and genome assembly on the International Space Station. Sci Rep. 2017;7:18022.
6. Johnson SS, Zaikova E, Goerlitz DS, Bai Y, Tighe SW. Real-time DNA sequencing in the Antarctic dry valleys using the Oxford Nanopore sequencer. J Biomol Tech. 2017;28(1):2–7.
7. Pomerantz A, Peñafiel N, Arteaga A, Bustamante L, Pichardo F, Coloma LA, Barrio-Amoros CL, Salazar-Valenzuela D, Prost S. Real-time DNA barcoding in a remote rainforest using nanopore sequencing. Gigascience. 2018;7(4):giy033.
8. Menegon M, Cantaloni C, Rodriguez-Prieto A, Centomo C, Abdelfattah A, Rossato M, Bernardi M, Xumerle L, Loader S, Delledonne M. On site DNA barcoding by nanopore sequencing. PLoS One. 2017;12:e0184741.
9. Zaaijer S, Columbia University Ubiquitous Genomics 2015 class, Erlich Y. Using mobile sequencers in an academic classroom. Elife. 2016;5:e14258.
10. Almugbel R, Hung LH, Hu J, Almutairy A, Ortogero N, Tamta Y, Yeung KY. Reproducible Bioconductor workflows using browser-based interactive notebooks and containers. J Am Med Inform Assoc. 2018;25:4–12.
11. Grüning BA, Rasche E, Rebolledo-Jaramillo B, Eberhard C, Houwaart T, Chilton J, Coraor N, Backofen R, Taylor J, Nekrutenko A. Jupyter and Galaxy: easing entry barriers into complex data analyses for biomedical researchers. PLoS Comput Biol. 2017;13:e1005425.
12. Boettiger C. An introduction to Docker for reproducible research. Oper Syst Rev. 2015;49:71–9.
13. Di Tommaso P, Palumbo E, Chatzou M, Prieto P, Heuer ML, Notredame C. The impact of Docker containers on the performance of genomic pipelines. PeerJ. 2015;3:e1273.
14. Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex MinION sequencing. Microb Genom. 2017;3:e000132.
15. Cook DE, Valle-Inclan JE, Pajoro A, Rovenich H, Thomma B, Faino L. Long-read annotation: automated eukaryotic genome annotation based on long-read cDNA sequencing. Plant Physiol. 2019;179:38–54.
16. Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, von Haeseler A, Schatz MC. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 2018;15:461–8.
17. Stoiber MH, Quick JF, Egan R, Lee JE, Celniker SE, Neely RK, Loman NJ, Pennacchio LA, Brown JO. De novo identification of DNA modifications enabled by genome-guided Nanopore signal processing. bioRxiv. https://doi.org/10.1101/094672.
18. Garalde DR, Snell EA, Jachimowicz D, Sipos B, Lloyd JH, Bruce M, Pantic N, Admassu T, James P, Warland A, Jordan M, Ciccone J, Serra S, Keenan J, Martin S, McNeill L, Wallace EJ, Jayasinghe L, Wright C, Blasco J, Young S, Brocklebank D, Juul S, Clarke J, Heron AJ, Turner DJ. Highly parallel direct RNA sequencing on an array of nanopores. Nat Methods. 2018;15:201–6.