匹配关键字排名查询
关键字排名查询 时间:2021-04-30 阅读:(
)
基本的匹配计算主要内容关键词查询结构化查询字符串的匹配算法允许出错的字符串的匹配算法关键词查询目前关键词查询是最常用的信息查询方式.
又可分为:1.
单个词2.
多个词组成的上下文(高级检索)3.
多个词用and,or或not组成的句子4.
自然语言句子单个词查询用一个最贴切的词表示查询的意思考研原理:文档用词组成的向量或文本.
匹配变成是否文档中含有查询词.
上下文查询类型:短语Phrasee.
g,ShandongUniversity近似句子允许有拼写错误等ShadongUniversity高级查询中组成的上下文:书名作者,出版社,发表时间,价格DefinitionAsyntax(语法)composedofatomsthatretrievedocuments,andofBooleanoperatorswhichworkontheiroperandse.
g,translationAND(syntaxORsyntactic)(布尔表达式)FuzzyBooleanRetrievedocumentsappearinginsomeoperands(TheANDmayrequireittoappearinmoreoperandsthantheOR)RankedhigherwhichhasalargernumberofelementsBoolean查询语法句法自然语言Generalizationof"fuzzyBoolean"AqueryisanenumerationofwordsandcontextqueriesAllthedocumentsmatchingaportionoftheuserqueryareretrievedSetathresholdsothatthedocumentwithverylowweightarenotretrieved结构查询内容与结构混合查询-给出匹配模板进行匹配三种结构-固定结构-超链结构-层次结构Fixed(固定)StructureDocument:afixedsetoffieldsEX:amailhasasender,areceiver,adate,asubjectandabodyfieldSearchforthemailssenttoagivenpersonwith"football"intheSubjectfieldAhypertextisadirectedgraphwherenodesholdsometext(textcontents)thelinksrepresentconnectionsbetweennodesorbetweenpositionsinsidenodes(structuralconnectivity)HypertextHierarchicalStructureHierarchicalStructure层次查询的处理从根到叶逐层限制的多次查询基于树或图的匹配算法StringMatchingdetectingtheoccurrenceofaparticularsubstring(pattern)inanotherstring(text)AstraightforwardSolutionTheKnuth-Morris-PrattAlgorithmStraightforwardsolutionAlgorithm:SimplestringmatchingInput:PandT,thepatternandtextstrings;m,thelengthofP.
Thepatternisassumedtobenonempty.
Output:ThereturnvalueistheindexinTwhereacopyofPbegins,or-1ifnomatchforPisfound.
Definition11.
1NotationforpatternsandtextPthepatternbeingsearchfor;TthetextinwhichPissought;mthelengthofPnthelengthofT,notknowntothealgorithm;mm)/*misthelengthofP*/match=i;//matchfound.
successcase//break;/*exittheloophere*/if(tj==pk){j++;k++;}else//Backupovermatchedcharacters.
intbackup=k-1;//从本次查询点的下一个顶点开始/j=j-backup;k=k-backup;Slidepatternforward,startover.
j++;i=j;returnmatch;ikjPTAnalysisWorst-casecomplexityisin(mn)P=aaabT=aaaaaaaaaaaaaabNeedtobackup.
However,itworksquitewellonaveragefornaturallanguage.
TheKnuth-Morris-PrattAlgorithmPatternMatchingwithFiniteAutomata(自动机)e.
g.
P="AABC"startisthebeginningindexofTIdea:rememberingthematchedpartbyutilizingtheprefixofpatternPanddonotconsiderT.
However,itisnotscalableforthesizeoftermtable.
TheKnuth-Morris-PrattFlowchart(流程图)Characterlabelsareinsidethenodes,notonthearcs.
Eachnodehastwoarrowsouttoothernodes:successlink,orfaillinknextcharacterisreadonlyafterasuccesslinkAspecialnode,node0,called"getnextchar"whichreadinnexttextcharacter.
e.
g.
P="ABABCB"T=ABABABCBConstructionoftheKMPFlowchartDefinition:FaillinksWedefinefail[k]asthelargestr(withr=1)/p1,…,ps与pk-s+1,…,pk-1比较5.
if(ps==pk-1)/*就是它!
*/6.
break;7.
s=fail[s];}/*否则递归向下找*/8.
fail[k]=s+1;}fail[1]=0;fail[2-1]=fail[1]=s=0;fail[2]=1;k=3;s=fail[3-1]=1;p2p1,s=fail[1]=0;fail[3]=s+1=1.
k=4;s=fail[3]=1;p1=p3;fail[4]=s+1=2;Tocomputefail[8],s=fail[7]=5,butp7p5,recomputes=fail[5]=3,butp7p3either,sore-computes=fail[3]=1.
stillp7p1.
Finally,s=fail[1]=0,endthesearch,andfail[8]isassigneds+1=1;PABABABCBfail01123451index12345678TheKnuth-Morris-PrattScanAlgorithmintkmpScan(char[]P,char[]T,intm,int[]fail)intmatch,j,k;match=-1;j=1;k=1;while(endText(T,j)==false)if(k>m)//success//match=j-m;break;if(k==0)//thepointofTmovesahead,andrescan//j++;k=1;elseif(tj==pk)//successatpositionkofP//j++;k++;else//Followfailarrow.
k=fail[k];//failandgobacktothepointofPcontinueloop.
returnmatch;没有使用变量iAnalysisBasedonthesimilarmethodonanalyzingthetimecomplexityofalgorithmKMPsetup,Thescanalgorithmrequires2ncharactercomparisonsintheworstcaseOverall:worstcasecomplexityis(n+m)RK算法输入:TwonbitstringsA(a1,a2,…,an)andB(b1,b2,…,bn)输出:whetherA=B.
传统方法:传输n位依次比较.
指纹机制:定义n位整数根据指纹函数Fp(x)=xmodp,p是一个素数比较Fp(a)是否等于Fp(b),传输位数减小为O(logp)设代表字符集合,x,定义函数ord(x),d=||,ord(x):{0,1,2,…,d-1}对任意的模式P,|P|=m,利用多项式指纹Q(P)=ord(P1)dm-1+ord(P2)dm-2+…+ord(Pm-1)d+ord(Pm)代表P同样对文本T=T1,T2,….
,Tn从左到右计算长度为m的连续子串的指纹,如Q(i)=ord(Ti)dm-1+ord(Ti+1)dm-2+…+ord(Ti+m-2)d+ord(Ti+m-1)并和Q(P)相比较.
若相同,则找到匹配的子串.
起始位置为i,00,b->1aa0*2+0=0,bb->2*1+1=3,03ba->(3-2*1)*2+0=202ab->(2-1*2)*2+1=1ba->(1-0*2)*2+0=2aa->(2-1*2)*2+0=0findtheposition问题是得到的整数无法表示了,过于大取素数q,Q(i)(modq)=Q(p)(modq)Q(i+1)(modq)=(Q(i)–ord(Ti)dm-1)*d)(modq)+ord(Ti+m)但这样的话,当Q(i)(modq)=Q(p)(modq),不一定对应的字符串相同,这时可以逐位进行检查,有人证明该算法的期望时间复杂性为O(m+n),是较好的算法.
特点:可以推广到高维的字符串匹配,是否可以应用到对2维图像的匹配应用到对3维物体的匹配计算具有一定误差的匹配ElementsofDynamicProgrammingConstructingsolutiontoaproblembybuildingitupdynamicallyfromsolutionstosmaller(orsimpler)sub-problemssub-instancesarecombinedtoobtainsub-instancesofincreasingsize,untilfinallyarrivingatthesolutionoftheoriginalinstance.
makeachoiceateachstep,butthechoicemaydependonthesolutionstosub-problemsPrincipleofoptimalitytheoptimalsolutiontoanynontrivialinstanceofaproblemisacombinationofoptimalsolutionstosomeofitssub-instances.
Memorization(foroverlappingsub-problems)avoidcalculatingthesamethingtwice,usuallybykeepingatableofknowresultsthatfillsupassub-instancesaresolved.
Principleofoptimalitytheoptimalsolutiontoanynontrivialinstanceofaproblemisacombinationofoptimalsolutionstosomeofitssub-instances.
Memorization(foroverlappingsub-problems)avoidcalculatingthesamethingtwice,usuallybykeepingatableofknowresultsthatfillsupassub-instancesaresolved.
MemorizationforDynamicprogrammingversionofarecursivealgorithme.
g.
Tradespaceforspeedbystoringsolutionstosub-problemsratherthanre-computingthem.
Assolutionsarefoundforsuproblems,theyarerecordedinadictionary,Beforeanyrecursivecall,sayonsubproblemQ,checkthedictionarytoseeifasolutionforQhasbeenstored.
Ifnosolutionhasbeenstored,goaheadwithrecursivecall.
IfasolutionhasbeenstoredforQ,retrievethestoredsolution,anddonotmaketherecursivecall.
Justbeforereturningthesolution,storeitinthedictionary.
Dynamicprogrammingversionofthefib.
DevelopmentofadynamicprogrammingalgorithmCharacterizethestructureofanoptimalsolutionBreakingaproblemintosub-problemwhetherprincipleofoptimalityapplyRecursivelydefinethevalueofanoptimalsolutiondefinethevalueofanoptimalsolutionbasedonvalueofsolutionstosub-problemsComputethevalueofanoptimalsolutioninabottom-upfashioncomputeinabottom-upfashionandsavethevaluesalongthewaylaterstepsusethesavevaluesofperviousstepsConstructanoptimalsolutionfromcomputedinformation字符串的近似匹配(Approximatestringmatching)Inmanyapplicationswecan'texpectanexactcopy,wewanttofindaapproximatingstringmatchwithatmostkmistakes,e.
g.
,aspellingcorrector.
Wewilldevelopadynamicprogrammingalgorithmforthek-approximatematch.
Definition:Letkbeanonnegativeinteger.
Ak-approximatematchisamatchofPinTthathasatmostkdifferences.
Thedifferencescanbeanyofthefollowingthreetypes,thenameofthedifferenceistheoperationneededonTtobringitclosertoP.
Revise:ThecorrespondingcharactersinPandTaredifferent;Delete:TcontainsacharacterthatismissingfromP.
Insert:TismissingacharacterthatappearsinP.
如何修改T中的子串,使其能匹配上e.
g.
3-approximatematchP:unnecessarilyT:unescessaraly(madethreespellingerrors)Definition11.
6DifferencetableD[i][j]=theminimumnumberofdifferencebetweenP1,…,PiandasegmentofTendingattj.
1im,1jm.
定义:D[0][j]=0;D[i,0]=i;Therewillbeak-approximatematchendingattjforanyjsuchthatD[m][j]k,sowecanstopassoonaswefindanentrylessthanorequaltokinthelastrowofD,whichisthefirstk-approximatematch.
TherulesforthecomputationofDD[i][j]=D[i-1][j-1]ifpi=tj/*noerror*/D[i][j]=D[i-1][j-1]+1ifpitjandrevisetjtopiandbothiandjincrease;D[i][j]=D[i-1][j]+1ifinsertpiintoT,onlyiincrease.
D[i][j]=D[i][j-1]+1ifdeletetjfromTandonlyjincrease.
Eachentryrequiresonlyentriesaboveitandtoitsleftinthetable0000012m12mD[i-1][j-1]D[i-1][j]D[i][j-1]D[i][j]D[i][j]iscalculatedtogettheminimumvaluefromabove4formulaeHaveahsppyday000000000000h1a2p3P4y51111110111112221211222223322221233334333321244444444321D[5][12]=1,t[8.
.
12]hasonemisspellingwithP.
NonserialMonadicDPFormulations:Longest-Common-SubsequenceGivenasequenceA=,asubsequenceofAcanbeformedbydeletingsomeentriesfromA.
GiventwosequencesA=andB=,findthelongestsequencethatisasubsequenceinbothAandB.
IfA=andB=,thelongestcommonsubsequenceofAandBis.
Longest-Common-SubsequenceProblemLetF[i,j]denotethelengthofthelongestcommonsubsequenceofthefirstielementsofAandthefirstjelementsofB.
TheobjectiveoftheLCSproblemistofindF[n,m].
Wecanwrite:左下和右上的最大值ConsidertheLCSoftwoamino-acidsequencesHEAGAWGHEEandPAWHEAE.
TheFtableforcomputingtheLCSofthesequences.
TheLCSisAWHEE.
F[7,10]=5LCS=AWHEE,
beervm是一家国人商家,主要提供国内KVM VPS,有河南移动、广州移动等。现在预售湖南长沙联通vds,性价比高。湖南长沙vps(长沙vds),1GB内存/7GB SSD空间/10TB流量/1Gbps端口/独立IP/KVM,350元/月,有需要的可以关注一下。Beervm长沙联通vps套餐:长沙联通1G青春版(预售)长沙联通3G标准版(预售)长沙联通3G(预售)vCPU:1vCPU:2vCPU...
速云怎么样?速云是一家国人商家。速云商家主要提供广州移动、深圳移动、广州茂名联通、香港HKT等VDS和独立服务器。目前,速云推出深圳独服优惠活动,机房为深圳移动机房,购买深圳服务器可享受5折优惠,目前独立服务器还支持申请免费试用,需要提交工单开通免费体验试用,次月可享受永久8折优惠,也是需工单申请哦!点击进入:速云官方网站地址活动期限至 2021年7月22日速云云服务器优惠活动:活动1:新购首月可...
今天获得消息,vdsina上了AMD EPYC系列的VDS,性价比比较高,站长弄了一个,盲猜CPU是AMD EPYC 7B12(经过咨询,详细CPU型号是“EPYC 7742”)。vdsina,俄罗斯公司,2014年开始运作至今,在售卖多类型VPS和独立服务器,可供选择的有俄罗斯莫斯科datapro和荷兰Serverius数据中心。付款比较麻烦:信用卡、webmoney、比特币,不支持PayPal...
关键字排名查询为你推荐
objectflashphpadmin下载求张艺兴《莲》mp3下载2019支付宝五福支付宝五福是哪五福?internetexplorer无法打开电脑的Internet Explorer打不开?Aliasedinternal上海市浦东新区人民法院民事判决书(2009)浦民三(知)初字第206号我爱e书网侯龙涛小说那里有下载的tumblr上不去为什么,爱看软件打不开?页面一直在加载什么是seo学习SEO的好处是什么?艾泰科技艾泰路由器设置!!!
虚拟主机管理系统 广州服务器租用 西安服务器 tier mach5 wordpress技巧 光棍节日志 typecho 地址大全 hnyd 刀片服务器是什么 169邮箱 adroit cdn加速原理 什么是服务器托管 服务器监测 上海联通宽带测速 安徽双线服务器 视频服务器是什么 starry 更多