www.cab.zju.edu.cn/cab/xueyuanxiashubumen/nx/bioinplant.htm

《生物信息学札记》樊龙江
Appendix: Major English Bioinformatics Terms and Definitions
Abstract Syntax Notation (ASN.1) (the internal format used by many NCBI programs, such as Cn3D, which displays three-dimensional protein structures): A language that is used to describe structured data types formally. Within bioinformatics, it has been used by the National Center for Biotechnology Information to encode sequences, maps, taxonomic information, molecular structures, and biographical information in such a way that it can be easily accessed and exchanged by computer software.
Accession number (记录号): A unique identifier that is assigned to a single database entry for a DNA or protein sequence.
Affine gap penalty (a strategy for setting gap penalties): A gap penalty score that is a linear function of gap length, consisting of a gap opening penalty and a gap extension penalty multiplied by the length of the gap. Using this penalty scheme greatly enhances the performance of dynamic programming methods for sequence alignment. See also Gap penalty.
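A minimal sketch of the affine scheme described above. It follows this entry's wording (opening penalty plus extension penalty times gap length); some programs instead charge the extension penalty only for residues after the first, i.e. open + extend × (length − 1). The penalty values 10 and 1 are illustrative, not taken from any particular program.

```python
def affine_gap_penalty(length: int, gap_open: float, gap_extend: float) -> float:
    """Penalty for one gap of `length` residues: gap_open + gap_extend * length."""
    if length <= 0:
        return 0  # no gap, no penalty
    return gap_open + gap_extend * length

# Total penalty for an alignment containing gaps of length 1, 3 and 5.
gaps = [1, 3, 5]
total = sum(affine_gap_penalty(g, gap_open=10, gap_extend=1) for g in gaps)
print(total)  # 3 * 10 + (1 + 3 + 5) * 1 = 39
```

Because the opening cost is paid once per gap, one long gap is cheaper than several short ones of the same total length, which is why this scheme tends to produce biologically plausible alignments.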
Algorithm (算法): A systematic procedure for solving a problem in a finite number of steps, typically involving a repetition of operations. Once specified, an algorithm can be written in a computer language and run as a program.
Alignment (联配/比对): Refers to the procedure of comparing two or more sequences by looking for a series of individual characters or character patterns that are in the same order in the sequences. Of the two types of alignment, local and global, a local alignment is generally the more useful. See also Local and Global alignments.
Alignment score (联配/比对值): An algorithmically computed score based on the number of matches, substitutions, insertions, and deletions (gaps) within an alignment. Scores for matches and substitutions are derived from a scoring matrix such as the BLOSUM and PAM matrices for proteins, and affine gap penalties suitable for the matrix are chosen. Alignment scores are in log odds units, often bit units (log to the base 2). Higher scores denote better alignments. See also Similarity score, Distance in sequence analysis.
Alphabet (字母表): The set of symbols used to write sequences; it contains 4 symbols for DNA sequences and 20 for protein sequences.
Annotation (注释): The prediction of genes in a genome, including the location of protein-encoding genes, the sequence of the encoded proteins, any significant matches to other proteins of known function, and the location of RNA-encoding genes. Predictions are based on gene models, e.g., hidden Markov models of introns and exons in protein-encoding genes, and models of secondary structure in RNA.
Anonymous FTP (匿名FTP): When an FTP service allows anyone to log in, it is said to provide anonymous FTP service. A user can log in to an anonymous FTP server by typing anonymous as the username and his or her e-mail address as the password. Most Web browsers now negotiate anonymous FTP logon without asking the user for a username and password. See also FTP.
ASCII: The American Standard Code for Information Interchange (ASCII) encodes unaccented letters a–z, A–Z, the numbers 0–9, most punctuation marks, space, and a set of control characters such as carriage return and tab. ASCII specifies 128 characters that are mapped to the values 0–127. ASCII files are commonly called plain text, meaning that they only encode text without extra markup.
BAC clone (细菌人工染色体克隆): Bacterial artificial chromosome vector carrying a genomic DNA insert, typically 100–200 kb. Most of the large-insert clones sequenced in the project were BAC clones.
Back-propagation (反向传输): When training feed-forward neural networks, a back-propagation algorithm can be used to modify the network weights. After each training input pattern is fed through the network, the network's output is compared with the desired output and the amount of error is calculated. This error is back-propagated through the network by using an error function to correct the network weights. See also Feed-forward neural network.
Baum-Welch algorithm (Baum-Welch算法): An expectation maximization algorithm that is used to train hidden Markov models.
Bayes' rule (贝叶斯法则): Forms the basis of conditional probability by calculating the likelihood of an event occurring based on the history of the event and relevant background information. In terms of two parameters A and B, the theorem states that the conditional probability of A given B, P(A|B), is equal to the probability of A, P(A), times the conditional probability of B given A, P(B|A), divided by the probability of B, P(B):
$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$
P(A) is the historical or prior distribution value of A, P(B|A) is a new prediction for B for a particular value of A, and P(B) is the sum of the newly predicted values for B over all values of A. P(A|B) is a posterior probability, representing a new prediction for A given the prior knowledge of A and the newly discovered relationships between A and B.
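Bayes' rule can be applied numerically as in this sketch. The two-hypothesis setup (a database hit being a true homolog or not) and all probabilities are hypothetical, chosen to show how a small prior keeps the posterior low even when P(B | A) is high. P(B) is computed as the sum over all values of A of P(A) × P(B | A).

```python
def posterior(prior: dict, likelihood: dict) -> dict:
    """Return P(A=a | B) for every value a, given P(A=a) and P(B | A=a)."""
    # P(B) = sum over a of P(A=a) * P(B | A=a)
    evidence = sum(prior[a] * likelihood[a] for a in prior)
    return {a: prior[a] * likelihood[a] / evidence for a in prior}

# Hypothetical example: A = "sequence is a true homolog", B = "search reports a hit".
prior = {"homolog": 0.01, "unrelated": 0.99}
likelihood = {"homolog": 0.9, "unrelated": 0.05}  # P(hit | each case)
post = posterior(prior, likelihood)
print(round(post["homolog"], 3))  # 0.009 / 0.0585, about 0.154
```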
Bayesian analysis (贝叶斯分析): A statistical procedure used to estimate parameters of an underlying distribution based on an observed distribution. See also Bayes' rule.
Biochips (生物芯片): Miniaturized arrays of large numbers of molecular substrates, often oligonucleotides, in a defined pattern. They are also called DNA microarrays and microchips.
Bioinformatics (生物信息学): (1) The merger of biotechnology and information technology with the goal of revealing new insights and principles in biology. (2) The discipline of obtaining information about genomic or protein sequence data. This may involve similarity searches of databases, comparing your unidentified sequence to the sequences in a database, or making predictions about the sequence based on current knowledge of similar sequences. Databases are frequently made publicly available through the Internet, or locally at your institution.
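Bit score (二进制值/Bit值): The value S' is derived from the raw alignment score S in such a way that the statistical properties of the scoring system used have been taken into account; in BLAST, $S' = (\lambda S - \ln K)/\ln 2$, where λ and K are the statistical parameters described under lambda and K. Because bit scores have been normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.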
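For BLAST-style statistics the normalization is usually written S' = (λS − ln K)/ln 2, with λ and K the parameters listed under lambda and K. The sketch below simply evaluates that formula; the λ and K values are placeholders, since the real values depend on the scoring matrix and gap penalties and are reported by the search program.

```python
import math

def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Placeholder parameters; real lambda and K come from the scoring system in use.
print(round(bit_score(100, lam=0.3, k=0.1), 2))
```

Two searches run with different matrices give different raw scores for the same alignment, but their bit scores are on the same scale.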
Bit units: From information theory, a bit denotes the amount of information required to distinguish between two equally likely possibilities. The number of bits of information, N, required to convey a message that has M possibilities is $N = \log_2 M$ bits.
BLAST (基本局部联配搜索工具; a major database search program): Basic Local Alignment Search Tool. A set of programs used to perform fast similarity searches. Nucleotide sequences can be compared with nucleotide sequences in a database using BLASTN, for example. Complex statistics are applied to judge the significance of each match. Reported sequences may be homologous to, or related to, the query sequence. The BLASTP program is used to search a protein database for a match against a query protein sequence. There are several other flavours of BLAST. BLAST2 is a newer release of BLAST that allows for insertions or deletions in the sequences being aligned; such gapped alignments may be more biologically significant.
Block (a conserved-region block within a protein family): Conserved ungapped patterns approximately 3–60 amino acids in length in a set of related proteins.
BLOSUM matrices (模块替换矩阵; a major substitution matrix): An alternative to PAM tables, BLOSUM tables were derived using local multiple alignments of more distantly related sequences than were used for the PAM matrix. These are used to assess the similarity of sequences when performing alignments.
Boltzmann distribution (Boltzmann分布): Describes the number of molecules that have energies above a certain level, based on the Boltzmann constant and the absolute temperature.
Boltzmann probability function (Boltzmann概率函数): See Boltzmann distribution.
Bootstrap analysis: A method for testing how well a particular data set fits a model. For example, the validity of the branch arrangement in a predicted phylogenetic tree can be tested by resampling columns in a multiple sequence alignment to create many new alignments. The appearance of a particular branch in trees generated from these resampled sequences can then be measured. Alternatively, a sequence may be left out of an analysis to determine how much the sequence influences the results of an analysis.
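The resampling step of a bootstrap analysis can be sketched as below: each pseudo-alignment is built by drawing alignment columns at random with replacement. Building a tree from each replicate and counting how often a branch reappears would be done with a phylogenetics program and is not shown; the function name and the toy alignment are hypothetical.

```python
import random

def bootstrap_alignments(alignment: list[str], replicates: int, seed: int = 0) -> list[list[str]]:
    """Create pseudo-alignments by sampling alignment columns with replacement."""
    rng = random.Random(seed)
    columns = list(zip(*alignment))  # one tuple of characters per alignment column
    result = []
    for _ in range(replicates):
        picked = [rng.choice(columns) for _ in range(len(columns))]
        # Re-assemble rows: sequence j is the j-th character of every picked column.
        result.append(["".join(col[j] for col in picked) for j in range(len(alignment))])
    return result

msa = ["ACGTTG", "ACGATG", "TCGATC"]
for replicate in bootstrap_alignments(msa, replicates=3):
    print(replicate)
```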
Branch length (分支长度): In sequence analysis, the number of sequence changes along a particular branch of a phylogenetic tree.
CDS or cds (编码序列): Coding sequence.
Chebyshev's inequality: The probability that a random variable deviates from its mean by at least k standard deviations is less than or equal to 1/k², i.e., $P(|X - \mu| \ge k\sigma) \le 1/k^2$.
Clone (克隆): Population of identical cells or molecules (e.g., DNA), derived from a single ancestor.
Cloning vector (克隆载体): A molecule that carries a foreign gene into a host, and allows/facilitates the multiplication of that gene in a host. When sequencing a gene that has been cloned using a cloning vector (rather than by PCR), care should be taken not to include the cloning vector sequence when performing similarity searches. Plasmids, cosmids, phagemids, YACs and PACs are example types of cloning vectors.
Cluster analysis (聚类分析): A method for grouping together a set of objects that are most similar from a larger group of related objects. The relationships are based on some criterion of similarity or difference. For sequences, a similarity or distance score or a statistical evaluation of those scores is used.
Cobbler: A single sequence that represents the most conserved regions in a multiple sequence alignment. The BLOCKS server uses the cobbler sequence to perform a database similarity search as a way to reach sequences that are more divergent than would be found using the single sequences in the alignment for searches.
Coding system (neural networks): Regarding neural networks, a coding system needs to be designed for representing input and output. The level of success found when training the model will be partially dependent on the quality of the coding system chosen.
Codon usage: Analysis of the codons used in a particular gene or organism.
COG (直系同源簇): Clusters of Orthologous Groups, a set of groups of related sequences in microorganisms and yeast (S. cerevisiae). These groups are found by whole-proteome comparisons and include orthologs and paralogs. See also Orthologs and Paralogs.
Comparative genomics (比较基因组学): A comparison of gene numbers, gene locations, and biological functions of genes in the genomes of diverse organisms, one objective being to identify groups of genes that play a unique biological role in a particular organism.
Complexity (of an algorithm) (算法的复杂性): Describes the number of steps required by the algorithm to solve a problem as a function of the amount of data; for example, the length of sequences to be aligned.
Conditional probability (条件概率): The probability of a particular result (or of a particular value of a variable) given one or more events or conditions (or values of other variables).
Conservation (保守): Changes at a specific position of an amino acid or (less commonly) DNA sequence that preserve the physico-chemical properties of the original residue.
Consensus (一致序列): A single sequence that represents, at each position, the variation found within the corresponding column of a multiple sequence alignment.
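A simple majority-rule consensus can be computed column by column, as in this sketch. Ties are broken by whichever character appears first in the column, and gaps are treated like any other character; real consensus tools often use thresholds or IUPAC ambiguity codes instead.

```python
from collections import Counter

def consensus(alignment: list[str]) -> str:
    """Majority-rule consensus: the most frequent character in each column."""
    result = []
    for column in zip(*alignment):
        char, _count = Counter(column).most_common(1)[0]
        result.append(char)
    return "".join(result)

msa = ["ACGT-A", "ACCTGA", "AGCTGA"]
print(consensus(msa))  # ACCTGA
```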
Context-free grammars: A recursive set of production rules for generating patterns of strings. These consist of a set of terminal characters that are used to create strings, a set of nonterminal symbols that correspond to rules and act as placeholders for patterns that can be generated using terminal characters, a set of rules for replacing nonterminal symbols with terminal characters, and a start symbol.
Contig (序列重叠群/拼接序列): A set of clones that can be assembled into a linear order. A DNA sequence that overlaps with another contig. The full set of overlapping sequences (contigs) can be put together to obtain the sequence for a long region of DNA that cannot be sequenced in one run in a sequencing assay. Important in genetic mapping at the molecular level.
CORBA (a common standard, set by the Object Management Group, that unifies OOP objects and network interfaces across computers, operating systems, programming languages, and networks): The Common Object Request Broker Architecture (CORBA) is an open industry standard for working with distributed objects, developed by the Object Management Group. CORBA allows the interconnection of objects and applications regardless of computer language, machine architecture, or geographic location of the computers.
Correlation coefficient (相关系数): A numerical measure, falling between -1 and 1, of the degree of the linear relationship between two variables. A positive value indicates a direct relationship, a negative value indicates an inverse relationship, and the distance of the value away from zero indicates the strength of the relationship. A value near zero indicates no relationship between the variables.
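The Pearson correlation coefficient, the most common form of this measure, can be computed directly from its definition (covariance divided by the product of the standard deviations). The expression values below are made up for illustration.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical expression levels of two genes across five conditions.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8]
print(round(pearson_r(gene_a, gene_b), 3))
```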
Covariation (in sequences) (共变): Coincident change at two or more sequence positions in related sequences that may influence the secondary structures of RNA or protein molecules.
Coverage (or depth) (覆盖率/厚度): The average number of times a nucleotide is represented by a high-quality base in a collection of random raw sequence. Operationally, a 'high-quality base' is defined as one with an accuracy of at least 99% (corresponding to a PHRED score of at least 20).
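The PHRED relation Q = −10 log10(P) links the quality threshold to accuracy: Q = 20 means an error probability of 0.01. The sketch shows that conversion and a simple average-coverage count; the list-of-quality-scores input is a hypothetical stand-in for real read files, and it ignores where each read maps on the genome.

```python
def phred_error_probability(q: float) -> float:
    """Error probability P for a PHRED quality score Q, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)

def average_coverage(read_qualities: list[list[int]], genome_length: int, min_q: int = 20) -> float:
    """Average number of high-quality (Q >= min_q) bases per genome position.

    `read_qualities` holds one list of per-base PHRED scores per read (hypothetical format).
    """
    high_quality = sum(1 for read in read_qualities for q in read if q >= min_q)
    return high_quality / genome_length

print(phred_error_probability(20))  # 0.01, i.e. 99% accuracy
print(average_coverage([[30, 30, 10], [25, 15, 40]], genome_length=2))  # 4 good bases / 2 = 2.0
```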
Database (数据库): A computerized storehouse of data that provides a standardized way for locating, adding, removing, and changing data. See also Object-oriented database, Relational database.
Dendrogram: A form of a tree that lists the compared objects (e.g., sequences or genes in a microarray analysis) in a vertical order and joins related ones by levels of branches extending to one side of the list.
Depth (厚度): See Coverage.
Dirichlet mixtures: Defined as the conjugate prior of a multinomial distribution. One use is for predicting the expected pattern of amino acid variation found in the match state of a hidden Markov model (representing one column of a multiple sequence alignment of proteins), based on prior distributions found in conserved protein domains (blocks).
Distance in sequence analysis (序列距离): The number of observed changes in an optimal alignment of two sequences, usually not counting gaps.
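The distance defined above can be counted directly from a pairwise alignment, skipping any column that contains a gap. The gap character "-" is an assumption of this sketch.

```python
def alignment_distance(seq1: str, seq2: str, gap: str = "-") -> int:
    """Count mismatched positions in two aligned sequences, skipping gapped columns."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(seq1, seq2) if gap not in (a, b) and a != b)

print(alignment_distance("ACGT-AGT", "ACTTCAGA"))  # 2
```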
DNA sequencing (DNA测序): The experimental process of determining the nucleotide sequence of a region of DNA. This is done by labelling each nucleotide (A, C, G or T) with either a radioactive or fluorescent marker which identifies it. There are several methods of applying this technology, each with their advantages and disadvantages. For more information, refer to a current textbook. High-throughput laboratories frequently use automated sequencers, which are capable of rapidly reading large numbers of templates. Sometimes, the sequences may be generated more quickly than they can be characterised.
Domain (功能域): A discrete portion of a protein assumed to fold independently of the rest of the protein and possessing its own function.
Dot matrix (点标矩阵图): Dot matrix diagrams provide a graphical method for comparing two sequences. One sequence is written horizontally across the top of the graph and the other along the left-hand side. Dots are placed within the graph at the intersection of the same letter appearing in both sequences. A series of diagonal lines in the graph indicates regions of alignment. The matrix may be filtered to reveal the most-alike regions by scoring a minimal threshold number of matches within a sequence window.
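A dot matrix with the window filter described above can be sketched as follows: a cell is marked only when at least `threshold` of the `window` compared positions match. Marking the cell at the start of the window (rather than its centre) is a simplification chosen here, and the sequences are arbitrary examples.

```python
def dot_matrix(seq1: str, seq2: str, window: int = 1, threshold: int = 1) -> list[list[bool]]:
    """Cell (i, j) is True if the windows starting at seq1[i] and seq2[j]
    share at least `threshold` identical positions."""
    rows = len(seq1) - window + 1
    cols = len(seq2) - window + 1
    matrix = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            matrix[i][j] = matches >= threshold
    return matrix

def show(matrix: list[list[bool]]) -> None:
    """Print '*' for a dot and '.' for an empty cell; seq1 runs down the left side."""
    for row in matrix:
        print("".join("*" if cell else "." for cell in row))

show(dot_matrix("ACGTACGT", "ACGTTCGT", window=3, threshold=3))
```

With `window=1, threshold=1` this reduces to the unfiltered plot; larger windows remove isolated random matches and leave the diagonals.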
Draft genome sequence (基因组序列草图): The sequence produced by combining the information from the individual sequenced clones (by creating merged sequence contigs and then employing linking information to create scaffolds) and positioning the sequence along the physical map of the chromosomes.
DUST (a program for filtering low-complexity regions): A program for filtering low complexity regions from nucleic acid sequences.
Dynamic programming (动态规划法): A dynamic programming algorithm solves a problem by combining solutions to sub-problems that are computed once and saved in a table or matrix. Dynamic programming is typically used when a problem has many possible solutions and an optimal one needs to be found. This algorithm is used for producing sequence alignments, given a scoring system for sequence comparisons.
EMBL (欧洲分子生物学实验室; the EMBL database is one of the main public nucleic acid sequence databases): European Molecular Biology Laboratory. Maintains the EMBL database, one of the major public sequence databases.
EMBnet (欧洲分子生物学网络): The European Molecular Biology Network (http://www.embnet.org/) was established in 1988, and provides services including local molecular databases and software for molecular biologists in Europe. There are several large outposts of EMBnet, including ExPASy.
Entropy (熵): From information theory, a measure of the unpredictable nature of a set of possible elements. The higher the level of variation within the set, the higher the entropy.
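For a set of symbols, such as one column of an alignment, the Shannon entropy in bits is H = Σ p log2(1/p), summed over the distinct symbols with frequency p. A minimal sketch:

```python
import math
from collections import Counter

def shannon_entropy(symbols: str) -> float:
    """Shannon entropy in bits: sum of p * log2(1/p) over distinct symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(shannon_entropy("AAAA"))  # 0.0: no variation
print(shannon_entropy("ACGT"))  # 2.0: four equally likely bases
```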
Erdős and Rényi law: In tosses of a "fair" coin, the longest run of heads that can be expected is approximately the logarithm to the base 2 of the number of tosses. The law may be generalized for more than two possible outcomes by changing the base of the logarithm to the number of outcomes. This law was used to analyze the number of matches and mismatches that can be expected between random sequences as a basis for scoring the statistical significance of a sequence alignment.
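The law can be checked by simulation: the sketch counts the longest run of heads in a string of random tosses and prints it next to log2 n. A single simulated run fluctuates around the expected value, so the two numbers will usually be close rather than equal; the seed and n are arbitrary.

```python
import math
import random

def longest_run(tosses: str, symbol: str = "H") -> int:
    """Length of the longest run of consecutive `symbol` characters."""
    best = current = 0
    for t in tosses:
        current = current + 1 if t == symbol else 0
        best = max(best, current)
    return best

rng = random.Random(1)
n = 1024
tosses = "".join(rng.choice("HT") for _ in range(n))
print(longest_run(tosses), "vs log2(n) =", math.log2(n))
```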
EST (abbreviation of expressed sequence tag): See Expressed Sequence Tag.
Expect value (E) (E值): E value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score. In a database similarity search, it is the probability that an alignment score as good as the one found between a query sequence and a database sequence would be found in as many comparisons between random sequences as was done to find the matching sequence. In other types of sequence analysis, E has a similar meaning.
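In the Karlin-Altschul statistics used by BLAST, the expect value for a raw score S is E = K m n e^(−λS), where m and n are the query and database lengths; equivalently E = m n 2^(−S') in terms of the bit score S'. This sketch evaluates both forms. The parameter values are placeholders, and real programs use effective lengths corrected for edge effects.

```python
import math

def expect_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance hits scoring >= S: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)

def expect_from_bits(bit_score: float, m: int, n: int) -> float:
    """Same quantity from the normalized bit score S': E = m * n * 2 ** (-S')."""
    return m * n * 2 ** (-bit_score)

# Placeholder values: query length m, database length n, lambda and K of the scoring system.
e = expect_value(50, m=300, n=10_000_000, lam=0.3, k=0.1)
print(f"{e:.3g}")
```

Doubling the database size doubles E for the same score, which is why the same alignment looks less significant in a larger database.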
Expectation maximization (sequence analysis): An algorithm for locating similar sequence patterns in a set of sequences. A guessed alignment of the sequences is first used to generate an expected scoring matrix representing the distribution of sequence characters in each column of the alignment, this pattern is matched to each sequence, and the scoring matrix values are then updated to maximize the alignment of the matrix to the sequences. The procedure is repeated until there is no further improvement.
Exon (外显子): Coding region of DNA. See CDS.
Expressed Sequence Tag (EST) (表达序列标签): Randomly selected, partial cDNA sequence; represents its corresponding mRNA. dbEST is a large database of ESTs at GenBank, NCBI.
FASTA (a major database search program): The first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning the sequence for small matches called "words". Initially, the scores of segments in which there are multiple word hits are calculated ("init1"). Later the scores of several segments may be summed to generate an "initn" score. An optimized alignment that includes gaps is shown in the output as "opt". The sensitivity and speed of the search are inversely related and controlled by the "k-tup" variable, which specifies the size of a "word". (Pearson and Lipman)
Extreme value distribution (极值分布): Some measurements are found to follow a distribution that has a long tail which decays at high values much more slowly than that found in a normal distribution. This slow-falling type is called the extreme value distribution. The alignment scores between unrelated or random sequences are an example. These scores can reach very high values, particularly when a large number of comparisons are made, as in a database similarity search. The probability of a particular score may be accurately predicted by the extreme value distribution, which follows a double negative exponential function after Gumbel.
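For the extreme value (Gumbel) distribution, the probability of a score of at least x is P(S ≥ x) = 1 − exp(−e^(−λ(x − μ))). The sketch compares this tail with the upper tail of a standard normal distribution to show how much more slowly it decays. The parameters are arbitrary and the two distributions are not matched in mean or variance; the point is only the shape of the tail.

```python
import math

def gumbel_tail(x: float, mu: float, lam: float) -> float:
    """P(S >= x) = 1 - exp(-exp(-lam * (x - mu))), computed stably with expm1."""
    return -math.expm1(-math.exp(-lam * (x - mu)))

def normal_tail(x: float, mean: float, sd: float) -> float:
    """Upper-tail probability P(X >= x) of a normal distribution, for comparison."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))

for x in (2, 4, 6, 8):
    print(x, f"{gumbel_tail(x, mu=0, lam=1):.2e}", f"{normal_tail(x, mean=0, sd=1):.2e}")
```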
False negative (假阴性): A data point recorded as negative in a data set because of a failure of the test, although it is actually positive. If the test had correctly measured the data point, the data would have been recorded as positive.
False positive (假阳性): A positive data point collected in a data set that was incorrectly reported due to a failure of the test. If the test had correctly measured the data point, the data would have been recorded as negative.
Feed-forward neural network (前馈神经网络): Organizes nodes into sequential layers in which the nodes in each layer are fully connected with the nodes in the next layer, except for the final output layer. Input is fed from the input layer through the layers in sequence in a "feed-forward" direction, resulting in output at the final layer. See also Neural network.
Filtering (window size): During pair-wise sequence alignment using the dot matrix method, random matches can be filtered out by using a sliding window to compare the two sequences. Rather than comparing a single sequence position at a time, a window of adjacent positions in the two sequences is compared and a dot, indicating a match, is generated only if a certain minimal number of matches occur.
Filtering (过滤): Also known as Masking. The process of hiding regions of (nucleic acid or amino acid) sequence having characteristics that frequently lead to spurious high scores. See SEG and DUST.
Finished sequence (完成序列): Complete sequence of a clone or genome, with an accuracy of at least 99.99% and no gaps.
Fourier analysis: Studies the approximation and decomposition of functions using trigonometric polynomials.
Format (file) (格式): Different programs require that information be specified to them in a formal manner, using particular keywords and ordering. This specification is a file format.
Forward-backward algorithm: Used to train a hidden Markov model by aligning the model with training sequences. The algorithm then refines the model to reduce the error when fitted to the given data using a gradient descent approach.
FTP (File Transfer Protocol) (文件传输协议): Allows a person to transfer files from one computer to another across a network using an FTP-capable client program. The FTP client program can only communicate with machines that run an FTP server. The server, in turn, will make a specific portion of its file system available for FTP access, provided that the client is able to supply a recognized username and password to the server.
Full shotgun clone (鸟枪法克隆): A large-insert clone for which full shotgun sequence has been produced.
Functional genomics (功能基因组学): Assessment of the function of genes identified by between-genome comparisons. The function of a newly identified gene is tested by introducing mutations into the gene and then examining the resultant mutant organism for an altered phenotype.
Gap (空位/间隙/缺口): A space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another. To prevent the accumulation of too many gaps in an alignment, introduction of a gap causes the deduction of a fixed amount (the gap score) from the alignment score. Extension of the gap to encompass additional nucleotides or amino acids is also penalized in the scoring of an alignment.
Gap penalty (空位罚分): A numeric score used in sequence alignment programs to penalize the presence of gaps within an alignment. The value of a gap penalty affects how often gaps appear in alignments produced by the algorithm. Most alignment programs suggest gap penalties that are appropriate for particular scoring matrices.
Genetic algorithm (遗传算法): A kind of search algorithm that was inspired by the principles of evolution. A population of initial solutions is encoded and the algorithm searches through these by applying a pre-defined fitness measurement to each solution, selecting those with the highest fitness for reproduction. New solutions can be generated during this phase by crossover and mutation operations, defined in the encoded solutions.
Genetic map (遗传图谱): A genome map in which polymorphic loci are positioned relative to one another on the basis of the frequency with which they recombine during meiosis. The unit of distance is centimorgans (cM), denoting a 1% chance of recombination.
Genome (基因组): The genetic material of an organism, contained in one haploid set of chromosomes.
Gibbs sampling method: An algorithm for finding conserved patterns within a set of related sequences. A guessed alignment of all but one sequence is made and used to generate a scoring matrix that represents the alignment. The matrix is then matched to the left-out sequence, and a probable location of the corresponding pattern is found. This prediction is then input into a new alignment and another scoring matrix is produced and tested on a new left-out sequence. The process is repeated until there is no further improvement in the matrix.
Global alignment (整体联配): Attempts to match as many characters as possible, from end to end, in a set of two or more sequences.
Gopher (a document distribution system that allows text files to be retrieved and displayed)
Graph theory (图论): A branch of mathematics which deals with problems that involve a graph or network structure. A graph is defined by a set of nodes (or points) and a set of arcs (lines or edges) joining the nodes. In sequence and genome analysis, graph theory is used for sequence alignments and clustering alike genes.
GSS (基因综述序列): Genome survey sequence.
GUI (图形用户界面): Graphical user interface.
H (相对熵值): H is the relative entropy of the target and background residue frequencies (Karlin and Altschul, 1990). H can be thought of as a measure of the average information (in bits) available per position that distinguishes an alignment from chance. At high values of H, short alignments can be distinguished from chance, whereas at lower H values, a longer alignment may be necessary (Altschul, 1991).
Half-bits: Some scoring matrices are in half-bit units. These units are logarithms to the base 2 of odds scores times 2.
Heuristic (启发式方法): A procedure that progresses along empirical lines by using rules of thumb to reach a solution. The solution is not guaranteed to be optimal.
Hexadecimal system (十六进制系统): The base 16 counting system that uses the digits 0–9 followed by the letters A–F.
HGMP (人类基因组图谱计划): Human Genome Mapping Project.
Hidden Markov Model (HMM) (隐马尔可夫模型): In sequence analysis, an HMM is usually a probabilistic model of a multiple sequence alignment, but can also be a model of periodic patterns in a single sequence, representing, for example, patterns found in the exons of a gene. In a model of multiple sequence alignments, each column of symbols in the alignment is represented by a frequency distribution of the symbols called a state, and insertions and deletions by other states. One then moves through the model along a particular path from state to state trying to match a given sequence. The next matching symbol is chosen from each state, recording its probability (frequency) and also the probability of going to that particular state from a previous one (the transition probability). State and transition probabilities are then multiplied to obtain a probability of the given sequence. Generally speaking, an HMM is a statistical model for an ordered sequence of symbols, acting as a stochastic state machine that generates a symbol each time a transition is made from one state to the next. Transitions between states are specified by transition probabilities.
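Multiplying state and transition probabilities along one path gives the probability of that path; summing over all paths gives the probability of the sequence under the model. The forward algorithm below does that sum efficiently by dynamic programming. The two-state model (a GC-rich and an AT-rich state) and all of its probabilities are hypothetical.

```python
def forward_probability(seq, states, start_p, trans_p, emit_p):
    """Forward algorithm: total probability that the HMM emits `seq`,
    summed over all possible state paths."""
    # alpha[s] = P(symbols seen so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical two-state model: a GC-rich state and an AT-rich state.
states = ("GC", "AT")
start_p = {"GC": 0.5, "AT": 0.5}
trans_p = {"GC": {"GC": 0.9, "AT": 0.1}, "AT": {"GC": 0.1, "AT": 0.9}}
emit_p = {
    "GC": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}
print(forward_probability("GGCA", states, start_p, trans_p, emit_p))
```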
Hidden layer (隐藏层): An inner layer within a neural network that receives its input and sends its output to other layers within the network. One function of the hidden layer is to detect covariation within the input data, such as patterns of amino acid covariation that are associated with a particular type of secondary structure in proteins.
Hierarchical clustering (分级聚类): The clustering or grouping of objects based on some single criterion of similarity or difference. An example is the clustering of genes in a microarray experiment based on the correlation between their expression patterns. The distance method used in phylogenetic analysis is another example.
Hill climbing: A nonoptimal search algorithm that selects the single best possible solution at a given state or step. The solution may result in a locally best solution that is not a globally best solution.
Homology (同源性): A similar component in two organisms (e.g., genes with strongly similar sequences) that can be attributed to a common ancestor of the two organisms during evolution.
Horizontal transfer (水平转移): The transfer of genetic material between two distinct species that do not ordinarily exchange genetic material. The transferred DNA becomes established in the recipient genome and can be detected by a novel phylogenetic history and codon content compared to the rest of the genome.
HSP (高比值片段对): High-scoring segment pair. Local alignments with no gaps that achieve one of the top alignment scores in a given search.
HTGS/HGT (高通量基因组序列): High-throughput genome sequences.
HTML (超文本标识语言): The HyperText Markup Language (HTML) provides a structural description of a document using a specified tag set. HTML currently serves as the Internet lingua franca for describing hypertext Web page documents.
Hyperplane: A generalization of the two-dimensional plane to N dimensions.
Hypercube: A generalization of the three-dimensional cube to N dimensions.
Identity (相同性/相同率): The extent to which two (nucleotide or amino acid) sequences are invariant.
Indel (abbreviation of insertion or deletion): An insertion or deletion in a sequence alignment.
Information content (of a scoring matrix): A representation of the degree of sequence conservation in a column of a scoring matrix representing an alignment of related sequences. It is also the number of questions that must be asked to match the column to a position in a test sequence. For bases, the maximum possible number is 2, and for proteins, 4.32 (logarithm to the base 2 of the number of possible sequence characters).
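Following this entry, the information content of a column can be computed as log2 of the alphabet size minus the Shannon entropy of the column, which reproduces the maxima of 2 bits for DNA and about 4.32 bits for proteins. The sketch assumes a uniform background distribution and no pseudocounts or small-sample correction, which real motif tools usually add.

```python
import math
from collections import Counter

def information_content(column: str, alphabet_size: int) -> float:
    """Information content of one alignment column, in bits:
    log2(alphabet size) minus the Shannon entropy of the column."""
    counts = Counter(column)
    n = len(column)
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return math.log2(alphabet_size) - entropy

print(information_content("AAAA", 4))                # 2.0: fully conserved DNA column
print(round(information_content("L" * 10, 20), 2))   # 4.32: fully conserved protein column
print(information_content("ACGT", 4))                # 0.0: no conservation
```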
Information theory (信息理论): A branch of mathematics that measures information in terms of bits, the minimal amount of structural complexity needed to encode a given piece of information.
Input layer (输入层): The initial layer in a feed-forward neural net. This layer encodes input information that will be fed through the network model.
Interface definition language: Used to define an interface to an object model in a programming-language-neutral form, where an interface is an abstraction of a service defined only by the operations that can be performed on it.
Internet (因特网): The network infrastructure, consisting of cables interconnected by routers, that provides global connectivity for individual computers and private networks of computers. A second sense of the word internet is the collective computer resources available over this global network.
Interpolated Markov model: A type of Markov model of sequences that examines sequences for patterns of variable length in order to discriminate best between genes and non-gene sequences.
Intranet (内部网)
Intron (内含子): Non-coding region of DNA.
Iterative (反复的/迭代的): A sequence of operations in a procedure that is performed repeatedly.
Java (a programming language developed by Sun Microsystems)
K (a statistical parameter of the BLAST programs): A statistical parameter used in calculating BLAST scores that can be thought of as a natural scale for search space size. The value K is used in converting a raw score (S) to a bit score (S').
K-tuple (字/字长): Identical short stretches of sequences, also called words.
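Finding identical k-tuples (words) shared by two sequences is the first step of word-based search programs such as FASTA. This is a sketch of that lookup step using a dictionary index of the subject sequence; the function names are hypothetical, and the later steps (joining hits that lie on the same diagonal and scoring them) are not shown.

```python
from collections import defaultdict

def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every word (k-tuple) of length k to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)

def shared_words(query: str, subject: str, k: int) -> list[tuple[int, int, str]]:
    """All (query position, subject position, word) hits for identical k-tuples."""
    subject_index = kmer_index(subject, k)
    hits = []
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        for j in subject_index.get(word, []):
            hits.append((i, j, word))
    return hits

print(shared_words("ACGTAC", "TTACGT", k=3))
# [(0, 2, 'ACG'), (1, 3, 'CGT'), (3, 1, 'TAC')]
```

Hits with the same offset j − i (here the first two) lie on one diagonal of a dot matrix and hint at a longer matching region. A smaller k finds more hits (higher sensitivity) at the cost of speed.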
lambda (λ; a statistical parameter of the BLAST programs): A statistical parameter used in calculating BLAST scores that can be thought of as a natural scale for the scoring system. The value lambda is used in converting a raw score (S) to a bit score (S').
LAN (局域网): Local area network.
Likelihood (似然性): The hypothetical probability that an event which has already occurred would yield a specific outcome. Unlike probability, which refers to future events, likelihood refers to past events.
Linear discriminant analysis: An analysis in which a straight line is located on a graph between two sets of data points in a location that best separates the data points into two groups.
Local alignment (局部联配): Attempts to align regions of sequences with the highest density of matches. In doing so, one or more islands of subalignments are created in the aligned sequences.
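Local alignment is usually computed with the Smith-Waterman dynamic programming algorithm, sketched below with a linear gap penalty and arbitrary match/mismatch/gap scores. A cell is reset to 0 whenever the running score would become negative, which is what lets an "island" of matches start anywhere; the traceback then stops at the first zero cell.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment score and one optimal local alignment (linear gap penalty)."""

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0,  # start a new local alignment here
                h[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                h[i - 1][j] + gap,
                h[i][j - 1] + gap,
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the island.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

print(smith_waterman("TTTACGTAAA", "GGACGTGG"))  # (8, 'ACGT', 'ACGT')
```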
Log odds score (概率对数值): The logarithm of an odds score. See also Odds score.
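A log-odds score compares how often a pair of residues is observed aligned (q) with how often it would be expected by chance (p_a × p_b). The sketch scales the base-2 logarithm by 2, giving half-bit units as described under Half-bits; the frequencies are hypothetical and the function name is illustrative.

```python
import math

def log_odds_score(q_ab: float, p_a: float, p_b: float, scale: float = 2.0) -> int:
    """Rounded log-odds score: scale * log2(observed pair frequency / chance expectation).
    scale=2 gives half-bit units."""
    return round(scale * math.log2(q_ab / (p_a * p_b)))

# Hypothetical frequencies: a pair seen 4x more often in alignments than expected by chance.
print(log_odds_score(q_ab=0.004, p_a=0.05, p_b=0.02))  # 2 * log2(4) = 4
```

Positive scores mark pairs that occur more often than chance in related sequences, negative scores pairs that occur less often.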
Low Complexity Region (LCR) (低复杂性区段): Regions of biased composition including homopolymeric runs, short-period repeats, and more subtle overrepresentation of one or a few residues. The SEG program is used to mask or filter LCRs in amino acid queries. The DUST program is used to mask or filter LCRs in nucleic acid queries.
Machine learning (机器学习): The training of a computational model of a process or classification scheme to distinguish between alternative possibilities.
Markov chain (马尔可夫链): Describes a process that can be in one of a number of states at any given time. The Markov chain is defined by probabilities for each transition occurring; that is, probabilities of the occurrence of state j given that the current state is i. Substitutions in nucleic acid and protein sequences are generally assumed to follow a Markov chain in that each site changes independently of the previous history of the site. With this model, the number and types of substitutions observed over a relatively short period of evolutionary time can be extrapolated to longer periods of time. In performing sequence alignments and calculating the statistical significance of alignment scores, sequences are assumed to be Markov chains in which the choice of one sequence position is not influenced by another.
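The extrapolation described above amounts to raising the one-step transition matrix to a power. The sketch uses a hypothetical symmetric substitution matrix for A, C, G, T; the same idea underlies extrapolating the PAM1 matrix to PAM250. For this particular matrix the result can be checked against the closed form 1/4 + 3/4 × 0.96^n.

```python
def mat_mult(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_power(m: list[list[float]], steps: int) -> list[list[float]]:
    """m raised to `steps` (steps >= 1) by repeated multiplication."""
    result = m
    for _ in range(steps - 1):
        result = mat_mult(result, m)
    return result

# Hypothetical one-step matrix: each base stays the same with probability 0.97,
# otherwise it changes to each of the other three bases with probability 0.01.
p_stay, p_change = 0.97, 0.01
one_step = [[p_stay if i == j else p_change for j in range(4)] for i in range(4)]
after_100 = mat_power(one_step, 100)
print(round(after_100[0][0], 3))  # probability that an A is still an A after 100 steps
```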
Masking (过滤): Also known as Filtering. The removal of repeated or low complexity regions from a sequence in order to improve the sensitivity of sequence similarity searches performed with that sequence.
Maximum likelihood (phylogeny, alignment) (最大似然法): The most likely outcome (tree or alignment), given a probabilistic model of evolutionary change in DNA sequences.
Maximum parsimony (最大简约法): The minimum number of evolutionary steps required to generate the observed variation in a set of sequences, as found by comparison of the number of steps in all possible phylogenetic trees.
Method of moments: The mean or expected value of a variable is the first moment of the values of the variable, defined as that number from which the sum of deviations to all values is zero. The variance (the square of the standard deviation) is the second moment of the values about the mean, and so on.
Minimum spanning tree: Given a set of related objects classified by some similarity or difference score, the minimum spanning tree joins the most-alike objects on adjacent outer branches of a tree and then sequentially joins less-alike objects by more inward branches. The tree branch lengths are calculated by the same neighbor-joining algorithm that is used to build phylogenetic trees of sequences from a distance matrix. The sum of the resulting branch lengths between each pair of objects will be approximately that found by the classification scheme.
MMDB (分子建模数据库): Molecular Modelling Database. A taxonomy-assigned database of PDB (see PDB) files, and related information.
Molecular clock hypothesis (分子钟假设): The hypothesis that sequences change at the same rate in the branches of an evolutionary tree.
Monte Carlo (蒙特卡罗法): A method that samples possible solutions to a complex problem as a way to estimate a more general solution.
Motif (模序): A short conserved region in a protein sequence. Motifs are frequently highly conserved parts of domains.
Multiple Sequence Alignment (多序列联配): An alignment of three or more sequences with gaps inserted in the sequences such that residues with common structural positions and/or ancestral residues are aligned in the same column. Clustal W is one of the most widely used multiple sequence alignment programs.
Mutation data matrix (突变数据矩阵, i.e., the PAM matrix): A scoring matrix compiled from the observation of point mutations between aligned sequences. Also refers to a Dayhoff PAM matrix in which the scores are given as log odds scores.
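N50 length (N50长度, i.e., the largest contig length such that contigs of at least that length cover 50% of all nucleotides): A measure of the contig length (or scaffold length) containing a 'typical' nucleotide. Specifically, it is the maximum length L such that 50% of all nucleotides lie in contigs (or scaffolds) of size at least L.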
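The N50 definition translates directly into code: sort contig lengths from longest to shortest and return the length at which the running total first reaches half of all nucleotides. The contig lengths below are made up.

```python
def n50(lengths: list[int]) -> int:
    """Largest length L such that contigs of length >= L contain at least 50% of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # at least half of all nucleotides reached
            return length
    return 0  # empty assembly

print(n50([80, 70, 50, 40, 30, 20, 10]))  # 70: 80 + 70 = 150 of 300 bases
```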
Nats (natural logarithm): A number expressed in units of the natural logarithm.
NCBI (美国国家生物技术信息中心): National Center for Biotechnology Information (USA). Created by the United States Congress in 1988 to develop information systems to support the biological research community.
Needleman-Wunsch algorithm (Needleman-Wunsch算法): Uses dynamic programming to find global alignments between sequences.
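Below is a compact sketch of the global dynamic programming recurrence. It assumes a simple linear gap penalty rather than the affine scheme. The scores are illustrative: +1 for a match, -1 for a mismatch, and -2 per gap position. The function name and parameters are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score and one optimal alignment (linear gap cost)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap                 # a[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # match or substitution
                          F[i - 1][j] + gap,     # gap in b
                          F[i][j - 1] + gap)     # gap in a
    # Trace back from the bottom-right cell, since the alignment is global.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

With these scores, `needleman_wunsch("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`.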
Neighbor-joining method (邻接法): Clusters together alike pairs within a group of related objects (e.g., genes with similar sequences) to create a tree whose branches reflect the degrees of difference among the objects.
Neural network (神经网络): A class of artificial intelligence techniques built from many simple units that hold symbolic data and are interconnected by a network of links with associated numeric weights. Units operate only on their own data and on the inputs they receive through their connections. Most neural networks use a training algorithm (see Back-propagation) to adjust connection weights, allowing the network to learn associations between various input and output patterns. See also Feed-forward neural network.
NIH (美国国家卫生研究院): National Institutes of Health (USA).
Noise (噪音): In sequence analysis, a small amount of randomly generated variation that is added to a model of the sequences (e.g., a hidden Markov model or scoring matrix) in order to avoid the model overfitting the sequences. See also Overfitting.
Normal distribution (正态分布): The distribution found for many types of data such as body weight, size, and exam scores. The distribution is a bell-shaped curve that is described by its mean and standard deviation. Local sequence alignment scores between unrelated or random sequences do not follow this distribution but instead follow the extreme value distribution, which has a much extended tail for higher scores. See also Extreme value distribution.
Object Management Group (OMG) (国际对象管理协作组): A not-for-profit corporation that was formed to promote component-based software by introducing standardized object software. The OMG establishes industry guidelines and detailed object management specifications in order to provide a common framework for application development. Within the OMG is a Life Sciences Research group, a consortium representing pharmaceutical companies, academic institutions, software vendors, and hardware vendors who are working together to improve communication and interoperability among computational resources in life sciences research. See CORBA.
Object-oriented database (面向对象数据库): Unlike relational databases (see Relational database), which use a tabular structure, object-oriented databases attempt to model the structure of a given data set as closely as possible. In doing so, object-oriented databases tend to reduce the duplicated data and the complex query structure often found in relational databases.
Odds score (概率/几率值): The ratio of the likelihoods of two events or outcomes. In sequence alignments and scoring matrices, the odds score for matching two sequence characters is the frequency with which the characters are aligned in related sequences divided by the frequency with which those same two characters align by chance alone, given the frequency of occurrence of each in the sequences. Odds scores for a set of individually aligned positions are obtained by multiplying the odds scores for each position. Odds scores are often converted to logarithms to create log odds scores, which can be added to obtain the log odds score of a sequence alignment.
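The point that multiplying odds is equivalent to adding log odds can be checked numerically. In this sketch the frequencies are made up for illustration, and `log_odds` is a hypothetical helper; `math.prod` requires Python 3.8 or later.

```python
import math

def log_odds(pair_freq, freq_a, freq_b, base=2.0):
    """Log-odds score for aligning characters a and b (in bits when base=2)."""
    # Odds = observed pair frequency in related sequences / frequency expected by chance.
    odds = pair_freq / (freq_a * freq_b)
    return math.log(odds, base)

# Hypothetical (pair frequency, background freq of a, background freq of b) per position.
positions = [
    (0.02, 0.1, 0.1),   # chance expectation 0.01, odds about 2
    (0.004, 0.1, 0.1),  # chance expectation 0.01, odds about 0.4
    (0.04, 0.1, 0.2),   # chance expectation 0.02, odds about 2
]
odds_product = math.prod(p / (fa * fb) for p, fa, fb in positions)
log_sum = sum(log_odds(p, fa, fb) for p, fa, fb in positions)
print(odds_product, 2 ** log_sum)
```

The product of the per-position odds and 2 raised to the summed log odds agree up to floating-point rounding.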
OMIM (a database of human genetic diseases): Online Mendelian Inheritance in Man. A database of genetic diseases with references to molecular medicine, cell biology, biochemistry, and clinical details of the diseases.
Optimal alignment (最佳联配): The highest-scoring alignment found by an algorithm capable of producing multiple solutions. This is the best possible alignment that can be found, given any parameters supplied by the user to the sequence alignment program.
ORF (开放阅读框): Open Reading Frame. A series of codons (base triplets) that can be translated into a protein. An unidentified sequence has six potential reading frames, three on each strand. BLASTX (see BLAST) translates a nucleotide query in all six reading frames and aligns the translations to sequences in a protein database. TBLASTN performs the reverse search, comparing a protein query with a nucleotide database translated in all six frames. The most likely reading frame can be identified using online software (e.g., ORF Finder).
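Below is a minimal six-frame ORF scan. It assumes the standard genetic code with ATG as the only start codon and TAA, TAG, and TGA as stops. Dedicated tools such as ORF Finder offer more options than this sketch. The function names and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=2):
    """Return (strand, frame, start, end, orf) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based and end-exclusive on the strand that was scanned;
    the ORF length counted against min_codons includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                          # first ATG opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None                       # look for the next ATG
    return orfs
```

For example, `find_orfs("CCATGAAATGACC")` finds the single ORF `ATGAAATGA` in forward frame 2.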
Orthologous (直系同源): Homologous sequences in different species that arose from a common ancestral gene during speciation; they may or may not be responsible for a similar function. A pair of genes found in two species is likely to be orthologous when the encoded proteins are 60-80% identical in an alignment. Such proteins almost certainly have the same three-dimensional structure, domain structure, and biological function, and the encoding genes originated from a common ancestor gene at an earlier evolutionary time. Two orthologs, I and II in genomes A and B respectively, may be identified when the complete genomes of the two species are available: (1) in a database similarity search of the entire proteome of B using I as a query, II is the best hit found, and (2) I is the best hit when II is used as a query of the proteome of A. The best hit is the database sequence with the lowest expect value (E). Orthology is also predicted by a very close phylogenetic relationship between sequences or by a cluster analysis. Compare to Paralogs. See also Cluster analysis.
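The two-way best-hit test for orthologs can be sketched as follows. We assume the search results have already been collected as dictionaries mapping each query to `{subject: E value}`. The best hit is taken as the subject with the lowest E value, since a smaller E is more significant. The function names and the E values are hypothetical.

```python
def best_hits(scores):
    """{query: {subject: e_value}} -> {query: subject with the lowest E value}."""
    return {q: min(hits, key=hits.get) for q, hits in scores.items() if hits}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (I, II) where II is I's best hit in B and I is II's best hit in A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((i, ii) for i, ii in best_ab.items() if best_ba.get(ii) == i)

# Hypothetical searches: proteome A queried against B, and B against A.
a_vs_b = {"A1": {"B1": 1e-50, "B2": 1e-10}, "A2": {"B1": 1e-30, "B2": 1e-5}}
b_vs_a = {"B1": {"A1": 1e-48, "A2": 1e-29}, "B2": {"A2": 1e-6, "A1": 1e-9}}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here only A1 and B1 pick each other. A2's best hit, B1, points back to A1, so A2 has no ortholog by this criterion.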
Output layer (输出层): The final layer of a neural network, in which signals from lower levels in the network are input into output states where they are weighted and summed to give an output signal. For example, the output signal might be the prediction of one type of protein secondary structure for the central amino acid in a sequence window.
Overfitting: Can occur when using a learning algorithm to train a model such as a neural net or hidden Markov model. Overfitting refers to the model becoming too highly representative of the training data and thus no longer representative of the overall range of data that is supposed to be modeled.
P value (P值/概率值): The probability of an alignment occurring with the score in question or better. The P value is calculated by relating the observed alignment score, S, to the expected distribution of HSP scores from comparisons of random sequences of the same length and composition as the query to the database. The most highly significant P values will be those close to 0. P values and E values are different ways of representing the significance of the alignment.
Pairwise sequence alignment (双序列联配): An alignment performed between two sequences.
PAM (可接受突变百分率, the percentage of accepted, i.e., observed, mutations; it can also serve as a unit of evolutionary time): Percent Accepted Mutation. A unit introduced by Dayhoff et al. to quantify the amount of evolutionary change in a protein sequence. 1.0 PAM unit is the amount of evolution that will change, on average, 1% of the amino acids in a protein sequence. A PAM(x) substitution matrix is a look-up table in which the score for each amino acid substitution has been calculated from the frequency of that substitution in closely related proteins that have experienced a certain amount (x) of evolutionary divergence.
Paralogous (旁系同源): Homologous sequences within a single species that arose by gene duplication, i.e., genes that are related through gene duplication events. These events may lead to the production of a family of related proteins with similar biological functions within a species. Paralogous gene families within a species are identified by using an individual protein as a query in a database similarity search of the entire proteome of an organism. The process is repeated for the entire proteome, and the resulting sets of related proteins are then searched for clusters that are most likely to have a conserved domain structure and should represent a paralogous gene family.
Parametric sequence alignment: An algorithm that finds a range of possible alignments by varying the parameters of the scoring system for matches, mismatches, and gap penalties. An example is the Bayes block aligner.
PDB (one of the major protein structure databases): Brookhaven Protein Data Bank. A database, and the file format it uses, describing the 3D structure of a protein or nucleic acid as determined by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. The molecules described by the files are usually viewed locally with dedicated software, but can sometimes be visualised on the World Wide Web.
Pearson correlation coefficient (Pearson相关系数): A measure of the linear correlation between two variables that reflects the degree to which the two variables are related. For example, the coefficient is used as a measure of the similarity of gene expression in a microarray experiment. See also Correlation coefficient.
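Here is a minimal sketch of the Pearson coefficient, r = cov(x, y) / (s_x s_y), applied to two expression profiles. The gene names and values are invented for illustration. On Python 3.10 or later, the standard library's `statistics.correlation` computes the same quantity.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # Covariance and standard deviations; the common 1/n factors cancel in r.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (sx * sy)

# Hypothetical expression levels of two genes across four microarray conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.1, 3.9, 6.2, 7.8]
print(round(pearson(gene_a, gene_b), 3))
```

Values near +1 indicate co-expression, values near -1 indicate opposite trends, and values near 0 indicate no linear relationship.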
Percent identity: The percentage of the columns in an alignment of two sequences that include identical amino acids. Columns in the alignment that include gaps are not scored in the calculation.
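The convention just described can be sketched directly: drop every column that contains a gap, then count identities among the remaining columns. The function name is hypothetical. Some programs divide by the alignment length or by the shorter sequence length instead, so reported values can differ between tools.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Columns that include a gap are not scored.
    scored = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not scored:
        return 0.0
    identical = sum(x == y for x, y in scored)
    return 100.0 * identical / len(scored)
```

For the alignment `ACDEFG` / `ACD-FW`, the gapped column is skipped, and 4 of the remaining 5 columns are identical, giving 80%.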
Percent similarity (相似百分率): The percentage of the columns in an alignment of two sequences that include either identical amino acids or amino acids that are frequently found substituted for each other in sequences of related proteins (conservative substitutions). These substitutions may be found in an amino acid substitution matrix such as the Dayhoff PAM and Henikoff BLOSUM matrices. Columns in the alignment that include gaps are not scored in the calculation.
Perceptron (感知器, a pattern-recognition machine modeled on the human visual nervous system): A neural network in which input and output states are directly connected without intervening hidden layers.
PHRED (a widely used raw-sequence analysis program that calls each base and assesses its quality): A widely used computer program that analyses raw sequence to produce a 'base call' with an associated 'quality score' for each position in the sequence. A PHRED quality score of X corresponds to an error probability of approximately $10^{-X/10}$. Thus, a PHRED quality score of 30 corresponds to 99.9% accuracy for the base call in the raw read.
PHRAP (a widely used raw-sequence assembly program): A widely used computer program that assembles raw sequence into sequence contigs and assigns an associated 'quality score' to each position in the sequence, on the basis of the PHRED scores of the raw sequence reads. A PHRAP quality score of X corresponds to an error probability of approximately $10^{-X/10}$. Thus, a PHRAP quality score of 30 corresponds to 99.9% accuracy for a base in the assembled sequence.
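The score-to-probability relation shared by PHRED and PHRAP, $p = 10^{-X/10}$, and its inverse $X = -10\log_{10} p$ can be sketched as follows. The helper names are our own.

```python
import math

def phred_to_error(q):
    """Error probability for a PHRED/PHRAP quality score: p = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Quality score for an error probability: q = -10 * log10(p)."""
    return -10 * math.log10(p)

for q in (10, 20, 30):
    p = phred_to_error(q)
    print(f"Q{q}: error {p:g}, accuracy {100 * (1 - p):.1f}%")
```

Each 10-point increase in quality corresponds to a tenfold drop in error probability: Q10, Q20, and Q30 correspond to 90%, 99%, and 99.9% accuracy.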
Phylogenetic studies (系统发育研究)
PIR (one of the major protein sequence databases, translated from GenBank): A database of translated GenBank nucleotide sequences. PIR is a redundant (see Redundancy) protein sequence database. The database is divided into four categories: PIR1, classified and annotated; PIR2, annotated; PIR3, unverified; PIR4, unencoded or untranslated.
Poisson distribution (帕松分布): Used to predict the occurrence of infrequent events over a long period of time or when there is a large number of trials. In sequence analysis, it is used to calculate the chance that one pair out of a large number of pairs of unrelated sequences may give a high local alignment score.
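In BLAST-style statistics, the number of chance hits scoring at least S is treated as Poisson-distributed with mean E, the expect value. The probability of at least one such hit is therefore one minus the probability of zero events: $P = 1 - e^{-E}$. The sketch below shows this relation; the helper names are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events when `mean` events are expected."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def p_from_e(e_value):
    """Chance of at least one chance hit when e_value hits are expected.

    P(at least one) = 1 - P(zero events) = 1 - exp(-E); expm1 keeps
    precision when E is very small.
    """
    return -math.expm1(-e_value)

for e in (10.0, 1.0, 0.01):
    print(e, p_from_e(e))
```

For small E, P and E are nearly equal (P is about E). For large E, P approaches 1, which is why E values are easier to compare for weak hits.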
Position-specific scoring matrix (PSSM) (特定位点记分矩阵, used by search programs such as PSI-BLAST): The PSSM gives the log-odds score for finding a particular matching amino acid in a target sequence. It represents the variation found in the columns of an alignment of a set of related sequences. Each successive matrix column corresponds to the next column in the alignment, and each row corresponds to a particular sequence character (one of four bases in DNA sequences or 20 amino acids in protein sequences). Matrix values are log odds scores, obtained by dividing the count of each residue in an alignment column by the count expected from sequence composition and converting the ratio to a log score. The matrix is moved along sequences to find similar regions by adding the matching log odds scores and looking for high values. There is no allowance for gaps. Also called a weight matrix or scoring matrix.
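The construction and scanning steps above can be sketched for DNA as follows. The sketch makes several assumptions: the input is a gap-free alignment, the background composition is uniform, and a pseudocount of 1 is added to every cell so that unseen bases do not produce log(0). The function names and the example alignment are hypothetical.

```python
import math

ALPHABET = "ACGT"

def build_pssm(aligned, background=None, pseudocount=1.0):
    """Log2-odds PSSM from equal-length, gap-free aligned DNA sequences.

    Returns one dict per alignment column: {base: log2(observed / expected)}.
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}   # assumed uniform composition
    n = len(aligned)
    pssm = []
    for col in range(len(aligned[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for seq in aligned:
            counts[seq[col]] += 1
        total = n + pseudocount * len(ALPHABET)
        pssm.append({b: math.log2((counts[b] / total) / background[b]) for b in ALPHABET})
    return pssm

def scan(pssm, seq):
    """Slide the matrix along seq without gaps; return (best_start, best_score)."""
    w = len(pssm)
    best = None
    for start in range(len(seq) - w + 1):
        score = sum(pssm[i][seq[start + i]] for i in range(w))
        if best is None or score > best[1]:
            best = (start, score)
    return best

pssm = build_pssm(["TATAAT", "TATAAT", "TAAAAT", "TATATT"])
print(scan(pssm, "GCGCTATAATGCGC"))
```

The best-scoring window starts at position 4, where the consensus `TATAAT` occurs. Every other window contains G or C, which were never observed and therefore score negatively.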
Posterior (Bayesian analysis): A conditional probability based on prior knowledge and newly evaluated relationships among variables, using Bayes rule. See also Bayes rule.
Prior (Bayesian analysis): The expected distribution of a variable based on previous data.
Profile (分布型): A matrix representation of a conserved region in a multiple sequence alignment that allows for gaps in the alignment. The rows include scores for matching sequential columns of the alignment to a test sequence. The columns include substitution scores for amino acids and gap penalties. See also PSSM.
Profile hidden Markov model (分布型隐马尔可夫模型): A hidden Markov model of a conserved region in a multiple sequence alignment that includes gaps and may be used to search new sequences for similarity to the aligned sequences.
Proteome (蛋白质组): The entire collection of proteins that are encoded by the genome of an organism. Initially the proteome is estimated by gene prediction and annotation methods, but it will eventually be revised as more information on the sequence of the expressed genes is obtained.
Proteomics (蛋白质组学): Systematic analysis of protein expression in normal and diseased tissues that involves the separation, identification, and characterization of all of the proteins in an organism.
Pseudocounts: A small number of counts that is added to the columns of a scoring matrix to increase the variability, either to avoid zero counts or to add more variation than was found in the sequences used to produce the matrix.
PSI-BLAST (one of the BLAST family of programs): Position-Specific Iterative BLAST. An iterative search using the BLAST algorithm. A profile is built after the initial search and is then used in subsequent searches. The process may be repeated if desired, with new sequences found in each cycle used to refine the profile. Details are given in the description of PSI-BLAST by Altschul et al.
PSSM (特定位点记分矩阵): See Position-specific scoring matrix and Profile.
Public sequence databases (公共序列数据库): The three coordinated international sequence databases: GenBank, the EMBL data library, and DDBJ.
Q20 (Quality score 20): A quality score of 20 or more indicates that there is at most a 1 in 100 chance that the base call is incorrect; these are consequently high-quality bases. Specifically, the quality value q assigned to a base call is defined as $q = -10\log_{10}(p)$, where p is the estimated error probability for that base call. Note that high quality values correspond to low error probabilities, and conversely.
Quality trimming: An algorithm that uses a sliding window of 50 bases and trims first from the 5' end of the read and then from the 3' end. For each window, the number of low-quality bases (quality 10 or less) is determined. If more than 5 bases are below the threshold quality, the window is moved by one base and the process is repeated. When the low-quality test fails (that is, the window contains 5 or fewer low-quality bases), the position where it stopped is recorded. The parameters for window length, low-quality threshold, and number of low-quality bases tolerated are fixed. The positions of the 5' and 3' boundaries of the quality region are noted in the plot of quality values presented in the "Chromatogram Details" report.
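The trimming procedure above can be sketched with its fixed parameters: a 50-base window, a low-quality threshold of 10 or less, and at most 5 low-quality bases tolerated. One point is our own interpretation rather than something the source specifies. We take the 5' boundary to be the start of the first passing window, and the 3' boundary to be the end of the last passing window. The function name is hypothetical.

```python
def quality_trim(quals, window=50, low_q=10, max_low=5):
    """Return (start, end) of the high-quality region, 0-based and end-exclusive.

    A base is low quality if its score is <= low_q. From the 5' end, the
    window moves one base at a time until it holds no more than max_low
    low-quality bases; the same is then done from the 3' end.
    Returns None if the read is shorter than the window or no window passes.
    """
    n = len(quals)
    if n < window:
        return None

    def n_low(i):
        return sum(q <= low_q for q in quals[i:i + window])

    start = next((i for i in range(n - window + 1) if n_low(i) <= max_low), None)
    if start is None:
        return None
    # Scan windows from the 3' end back toward the 5' boundary (start always passes).
    last = next(i for i in range(n - window, start - 1, -1) if n_low(i) <= max_low)
    return start, last + window

# A read with 20 poor bases at each end around 100 good ones.
read = [5] * 20 + [30] * 100 + [5] * 20
print(quality_trim(read))
```

Because up to 5 low-quality bases are tolerated in a window, the retained region, (15, 125) in this example, still includes a few poor bases at each edge.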
Query (待查序列/搜索序列): The input sequence (or other type of search term) with which all of the entries in a database are to be compared.
Radiation hybrid (RH) map (辐射杂交图谱): A genome map in which STSs are positioned relative to one another on the basis of the frequency with which they are separated by radiation-induced breaks. The frequency is assayed by analysing a panel of human–hamster hybrid cell lines. Each line is produced by lethally irradiating human cells and fusing them with recipient hamster cells, so that each carries a collection of human chromosomal fragments. The unit of distance is the centiray (cR), denoting a 1% chance of a break occurring between two loci.
Raw score (初值, the initially obtained alignment score S): The score of an alignment, S, calculated as the sum of substitution and gap scores. Substitution scores are given by a look-up table (see PAM, BLOSUM). Gap scores are typically calculated from G, the gap opening penalty, and L, the gap extension penalty: for a gap of length n, the gap cost is $G + Ln$. The choice of gap costs G and L is empirical, but it is customary to choose a high value for G (10-15) and a low value for L (1-2).
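A raw score with affine gap costs of $G + Ln$ can be sketched as follows. The substitution function here is a hypothetical DNA scheme (+5 for a match, -4 for a mismatch) standing in for a PAM or BLOSUM look-up table. The defaults G = 10 and L = 1 are taken from the customary ranges given above.

```python
def raw_score(top, bottom, subst, gap_open=10, gap_extend=1, gap="-"):
    """Raw alignment score S: substitution scores minus affine gap costs.

    A gap of length n (a run of '-' in one row) costs gap_open + gap_extend * n.
    """
    if len(top) != len(bottom):
        raise ValueError("rows must have equal length")
    score = 0
    prev = None                      # which row held a gap in the previous column
    for x, y in zip(top, bottom):
        if x == gap and y == gap:
            raise ValueError("column with gaps in both rows")
        if x == gap or y == gap:
            row = "top" if x == gap else "bottom"
            if row != prev:          # a new gap opens
                score -= gap_open
            score -= gap_extend      # every gapped position adds one extension
            prev = row
        else:
            score += subst(x, y)
            prev = None
    return score

def simple_dna(x, y):
    # Hypothetical substitution scores: +5 for a match, -4 for a mismatch.
    return 5 if x == y else -4

print(raw_score("ACGT--TA", "ACCTGGTA", simple_dna))
```

In the example, the substitutions contribute 21 and the single 2-base gap costs 10 + 1 × 2 = 12, giving S = 9.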
Raw sequence (原始序列/读胶序列): Individual unassembled sequence reads, produced by sequencing of clones containing DNA inserts.
Receiver operator characteristic: The receiver operator characteristic (ROC) curve describes the probability that a test will correctly declare the condition present against the probability that the test will declare the condition present when it is actually absent. This is shown through a graph of the test's sensitivity against one minus the test's specificity for different possible threshold values.
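An ROC curve can be traced by trying each observed score as a threshold and recording (1 - specificity, sensitivity) at each one, as the following sketch shows. Here sensitivity = TP / all positives and 1 - specificity = FP / all negatives. The scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) pairs, one per threshold.

    scores: higher means 'condition present'; labels: True if actually present.
    Each distinct score is used as a threshold (predict present if score >= t).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both positive and negative examples")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and not l)
        points.append((fp / negatives, tp / positives))
    return points

print(roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False]))
```

A perfect test rises straight to sensitivity 1 before any false positives appear. A useless test follows the diagonal.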
Redundancy (冗余): The presence of more than one identical item represents redundancy. In bioinformatics, the term is used with reference to the sequences in a sequence database. If a database is described as being redundant, more than one identical (redundant) sequence may be found. If the database is said to be non-redundant (nr), the database managers have attempted to reduce the redundancy. The term is ambiguous with reference to genetics, and as such, the degree of non-redundancy varies according to the database manager's interpretation of the term. One can argue whether or not two alleles of a locus define the limit of redundancy, or whether the same locus in different, closely related organisms constitutes redundancy. Non-redundant databases are, in some ways, superior, but are less complete. These factors should be taken into consideration when selecting a database to search.
Regular expressions: This computational tool provides a method for expressing the variations found in a set of related sequences, including a range of choices at one position, insertions, repeats, and so on. For example, these expressions are used to characterize variations found in protein domains in the PROSITE catalog.
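The link between PROSITE patterns and regular expressions can be sketched by translating the common pattern elements into Python `re` syntax. These elements are: [..] for allowed residues, {..} for forbidden residues, x for any residue, repeat counts such as x(2,4), and the < and > terminal anchors. The function name is hypothetical, and the sketch ignores rarer syntax such as ">" inside brackets. The example is the PROSITE N-glycosylation site pattern, N-{P}-[ST]-{P}.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (e.g. 'N-{P}-[ST]-{P}.') into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):            # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):              # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (2) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except those listed
        # [..] and single residues are already valid regex syntax.
        parts.append(prefix + core + ("{" + repeat + "}" if repeat else "") + suffix)
    return "".join(parts)

regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex, re.search(regex, "MKNGSA"))
```

The pattern becomes `N[^P][ST][^P]`, which matches `NGSA` in `MKNGSA` but nothing in `MKNPSA`.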
Regularization: A set of techniques for reducing data overfitting when training a model. See also Overfitting.
Relational database (关系数据库): Organizes information into tables where each column represents the fields of information that can be stored in a single record. Each row in the table corresponds to a single record. A single database can have many tables, and a query language is used to access the data. See also Object-oriented database.
Scaffold (支架, built by joining sequence contigs): The result of connecting contigs by linking information from paired-end reads from plasmids, paired-end reads from BACs, known messenger RNAs, or other sources. The contigs in a scaffold are ordered and oriented with respect to one another.
Scoring matrix (记分矩阵): See Position-specific scoring matrix.
SEG (a program for filtering low-complexity segments of proteins): A program for filtering low-complexity regions in amino acid sequences. Residues that have been masked are represented as "X" in an alignment. SEG filtering is performed by default in the blastp subroutine of BLAST 2.0 (Wootton and Federhen).
Selectivity (in database similarity searches) (数据库相似性搜索的选择准确性): The ability of a search method to locate members of a protein family without making a false-positive classification of members of other families.
Sensitivity (in database similarity searches) (数据库相似性搜索的灵敏性): The ability of a search method to locate as many members of a protein family as possible, including distant members of limited sequence similarity.
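Sequence Tagged Site (序列标签位点): A short, unique DNA sequence from a region that has been physically mapped; some STSs are derived from cDNA sequences, and each can be detected by PCR. STSs provide unique landmarks, or identifiers, throughout the genome and are useful as a framework for further sequencing.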
Significance (显著水平): A significant result is one that is unlikely to have arisen by chance alone. Significance levels show how likely a result is to be due to chance, expressed as a probability. In sequence analysis, the significance of an alignment score may be calculated as the chance that such a score would be found between random or unrelated sequences. See Expect value.
Similarity score (sequence alignment) (相似性值): Similarity means the extent to which nucleotide or protein sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation. In BLAST, similarity refers to a positive matrix score. The similarity score is the sum of the number of identical matches and conservative (high-scoring) substitutions in a sequence alignment, divided by the total number of aligned sequence characters. Gaps are usually ignored.
Simulated annealing: A search algorithm that attempts to solve the problem of finding global extrema. The algorithm was inspired by the physical cooling process of metals and the freezing process in liquids, where atoms slow down in movement and line up to form a crystal. The algorithm traverses the energy levels of a function, always accepting energy levels that are smaller than previous ones but sometimes accepting energy levels that are greater, according to the Boltzmann probability distribution.
Single-linkage cluster analysis: An analysis of a group of related objects (e.g., similar proteins in different genomes) to identify both close and more distant relationships, represented on a tree or dendrogram. The method joins the most closely related pairs by the neighbor-joining algorithm, representing these pairs as outer branches on the tree. More distant objects are then progressively added to lower tree branches. The method is also used to predict phylogenetic relationships by distance methods. See also Hierarchical clustering, Neighbor-joining method.
Smith-Waterman algorithm (Smith-Waterman算法): Uses dynamic programming to find local alignments between sequences. The key feature is that all negative scores calculated in the dynamic programming matrix are changed to zero. This avoids extending poorly scoring alignments and helps identify local alignments that start and stop anywhere within the matrix.
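The zero floor that distinguishes local from global alignment appears in the sketch below as `max(0, ...)`. Traceback starts at the best-scoring cell and stops at the first zero. The scores are illustrative: +2 for a match, -1 for a mismatch, and a linear -2 per gap position. The function name is hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score and aligned substrings (linear gap penalty)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Negative values are replaced by 0, so an alignment may start anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For `smith_waterman("TTTACGTGGG", "CCACGTCC")`, the shared core `ACGT` is reported with score 8. The unrelated flanks are left out rather than penalized.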
SNP (单核苷酸多态性): Single nucleotide polymorphism, a single nucleotide position in the genome sequence for which two or more alternative alleles are present at appreciable frequency (traditionally, at least 1%) in the human population.
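Space or time complexity (时间或空间复杂性): An algorithm's space or time complexity is the maximum amount of computer memory or running time it requires to solve a problem, usually expressed as a function of the input size; for example, aligning sequences of lengths m and n by dynamic programming takes time proportional to mn.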
Specificity (in database similarity searches) (数据库相似性搜索的特异性): The ability of a search method to locate members of one protein family, including distantly related members.
SSR (简单序列重复): Simple sequence repeat, a sequence consisting largely of a tandem repeat of a specific k-mer (such as (CA)$_{15}$). Many SSRs are polymorphic and have been widely used in genetic mapping.
Stochastic context-free grammar: A formal representation of groups of symbols in different parts of a sequence, i.e., not in the same context. An example is complementary regions in RNA that will form secondary structures. The stochastic feature introduces variability into such regions.
Stringency: Refers to the minimum number of matches required within a window. See also Filtering.
STS (abbreviation for Sequence Tagged Site): See Sequence Tagged Site.
Substitution (替换): The presence of a non-identical amino acid at a given position in an alignment. If the aligned residues have similar physico-chemical properties, the substitution is said to be "conservative".
Substitution matrix (替换矩阵): A matrix containing values proportional to the probability that amino acid i mutates into amino acid j, for all pairs of amino acids. Such matrices are constructed by assembling a large and diverse sample of verified pairwise alignments of amino acids. If the sample is large enough to be statistically significant, the resulting matrices should reflect the true probabilities of mutations occurring through a period of evolution.
Sum of pairs method: Sums the substitution scores of all possible pairwise combinations of sequence characters in one column of a multiple sequence alignment.
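The sum-of-pairs rule can be sketched for a single alignment column. A column of n sequences has n(n-1)/2 unordered pairs. How gaps are scored is our own assumption here, though it is a common convention: gap-gap pairs score 0 and residue-gap pairs receive a fixed penalty. The function name and scores are hypothetical.

```python
from itertools import combinations

def sum_of_pairs(column, subst, gap_score=-2, gap="-"):
    """Sum-of-pairs score for one column of a multiple sequence alignment.

    Each unordered pair of characters in the column is scored once.
    Gap-gap pairs score 0; residue-gap pairs score gap_score (assumed convention).
    """
    total = 0
    for x, y in combinations(column, 2):
        if x == gap and y == gap:
            continue
        if x == gap or y == gap:
            total += gap_score
        else:
            total += subst(x, y)
    return total

def unit(x, y):
    # Hypothetical substitution scores: +1 identical, -1 different.
    return 1 if x == y else -1

print(sum_of_pairs("AAC-", unit))
```

For the column A, A, C, - the six pairs score 1, -1, -1, -2, -2, -2, for a total of -7. The score of a whole alignment is the sum over its columns.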
SWISS-PROT (one of the major protein sequence databases): A non-redundant (see Redundancy) protein sequence database, thoroughly annotated and cross-referenced. A subdivision is TrEMBL.
Synteny: The presence of a set of homologous genes in the same order on two genomes.
Threading: In protein structure prediction, the aligning of the sequence of a protein of unknown structure with a known three-dimensional structure to determine whether the amino acid sequence is spatially and chemically compatible with that structure.
TrEMBL (a protein database translated from EMBL): A protein sequence database of Translated EMBL nucleotide sequences.
Uncertainty (不确定性): From information theory, a logarithmic measure of the average number of choices that must be made for identification purposes. See also Information content.
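The information-theoretic uncertainty of a set of symbols, for example one alignment column, is the Shannon entropy $H = -\sum_i p_i \log_2 p_i$ in bits. It equals log2 of the number of equally likely choices when all choices are equally likely. A minimal sketch follows; the function name is hypothetical. For a DNA column, the information content is commonly taken as 2 - H bits.

```python
import math
from collections import Counter

def uncertainty(symbols):
    """Shannon uncertainty H = -sum(p * log2 p), in bits, of the observed symbol mix."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

for column in ("ACGT", "AACC", "AAAA"):
    print(column, uncertainty(column))
```

Four equally frequent bases give 2 bits, equivalent to two yes/no choices. A fully conserved column gives 0 bits.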
Unified Modeling Language (UML): A standard sanctioned by the Object Management Group that provides a formal notation for describing object-oriented design.
UniGene (one of the human gene databases): A database of unique human genes at NCBI. Entries are selected by near-identical presence in the GenBank and dbEST databases. The clusters of sequences produced are considered to represent a single gene.
Unitary matrix (一元矩阵): Also known as the identity matrix. A scoring system in which only identical characters receive a positive score.
URL (统一资源定位符): Uniform resource locator.
Viterbi algorithm: Calculates the optimal path of a sequence through a hidden Markov model of sequences using a dynamic programming algorithm.
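The Viterbi recurrence can be sketched in log space so that long sequences do not underflow. At each position, it keeps the best-scoring path into every state, records a back-pointer, and finally traces the best path back. The two-state model below is hypothetical: H is a GC-rich region and L is an AT-rich region. All of its probabilities are non-zero, because `math.log(0)` would raise an error.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for obs; returns (log_probability, path).

    start_p[s], trans_p[r][s], and emit_p[s][symbol] are (non-zero) probabilities.
    """
    # V[t][s] = best log-probability of any path that ends in state s at position t.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(((r, V[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                              key=lambda item: item[1])
            V[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])          # state at position t - 1
    path.reverse()
    return V[-1][last], path

states = ("H", "L")
start_p = {"H": 0.5, "L": 0.5}
trans_p = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
emit_p = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
          "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}
logp, path = viterbi("GGGGAAAA", states, start_p, trans_p, emit_p)
print("".join(path))  # prints HHHHLLLL
```

The decoded path switches state exactly once, where the sequence changes from G-rich to A-rich. Switching costs a 0.1 transition but lets every base be emitted with probability 0.4.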
Weight matrix: See Position-specific scoring matrix.