The MultiRank Bootstrap Algorithm: Semi-Supervised Political Blog Classification and Ranking Using Semi-Supervised Link Classification

Frank Lin and William W. Cohen
Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213
frank,wcohen@cs.cmu.edu

Abstract

We present a new semi-supervised learning algorithm for classifying political blogs in a blog network and ranking them within predicted classes. We test our algorithm on two datasets and achieve classification accuracy of 81.9% and 84.6% using only 2 seed blogs.

Introduction

We propose a novel algorithm that both classifies political blogs and ranks the blogs within the predicted class. We see a link to a blog of a certain political faction as a link that endorses that faction. In predicting the link label, we exploit a linking property found in the political blogosphere: blogs with similar political leaning tend to link to each other (Adamic & Glance 2005). We bootstrap the classification of the blogs and the links and the ranking of the blogs by propagating political leaning from an initial set of known seed nodes. We show that our algorithm achieves high classification accuracy when applied to networks of liberal and conservative political blogs using very few seeds.

Proposed Algorithm

PageRank (Page et al. 1998) is widely used to determine the importance or authority of a website. However, different communities of users might attach different degrees of authority to the same site. This suggests assessing authority with an extended version of PageRank, in which every website (and every inter-site link) is associated with a different community, and authority scores propagate only within a community. In the context of political blogs, each blog and each hyperlink would be assigned to a particular faction (e.g. liberal or conservative); below we will describe a method for assigning blogs to factions given a small set of seeds.

Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

To assess a faction-specific measure of authority, we define MultiRank as follows:

$$r_f = (1 - d)\,u + d\,W^f r_f \qquad (1)$$

where $W^f_{ij}$ is $W_{ij}$ if the edge from $i$ to $j$ is in $E_f$, and zero otherwise; $u$ is the uniform personalization vector with $u_i = 1/|V|$; and $d$ is a constant damping factor. In this equation, $r_f$ can be seen as the probability of a random walk on $G$ if we only follow edges that belong to faction $f$. In the context of a political blog network, we can see this as the probability of a liberal/conservative blog surfer randomly clicking on links pointing to liberal/conservative blogs.
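
As a rough illustration of Equation 1, the following Python sketch iterates the MultiRank update for a single faction. It fills in details the text leaves open, so these are assumptions rather than the authors' implementation: W is taken to be the usual PageRank transition weight (one over the source's out-degree in the full graph G), score flows from the source of an edge to its target, and a fixed number of iterations stands in for a convergence test. The name `multirank` and its parameters are illustrative.

```python
def multirank(nodes, edges, faction_edges, d=0.85, iters=100):
    """Iterate r = (1 - d) * u + d * W_f r for one faction.

    nodes: list of node ids; edges: all (src, dst) pairs in G;
    faction_edges: the subset of edges currently labeled with faction f.
    """
    # Assumption: W_ij = 1 / out-degree(i), measured on the full graph G.
    out_degree = {v: 0 for v in nodes}
    for src, _dst in edges:
        out_degree[src] += 1

    u = 1.0 / len(nodes)            # uniform personalization, u_i = 1/|V|
    r = {v: u for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - d) * u for v in nodes}
        # Only edges in E_f carry score; every other entry of W_f is zero.
        for src, dst in faction_edges:
            nxt[dst] += d * r[src] / out_degree[src]
        r = nxt
    return r
```

Because edges outside E_f are zeroed out, the scores of one faction need not sum to one; under these assumptions a node with no incoming edge of faction f keeps only the baseline (1 - d)/|V|.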

In order to calculate $r_f$, we need $E_f$. We propose an iterative bootstrapping algorithm, shown in Figure 1, to gradually expand the set of edges $E_f$ from a set of initial seed nodes $S$ until every edge in the entire graph has been labeled.

Input: A graph G = (V, E), a set of seed nodes S, and an edge expansion metric on the graph M(G, f) that returns a set of previously unlabeled edges and labels them f
Output: Ranking vectors r_f, f = 1...n, where each f corresponds to a faction
Algorithm:
  initialize E_f using S
  while |∪_{f=1...n} E_f| ≠ |E| do
    e ← infinity
    while e > 0
      r_f ← MultiRank(G, E_f) ∀f
      label(v) ← argmax_f r_f(v) ∀v ∈ V
      E'_f ← {e(x→v) ∈ E : label(v) = f} ∀f
      e ← |E_f − E'_f|
      E_f ← E'_f ∀f
    E_f ← E_f ∪ M(G, f) ∀f

Figure 1: The MultiRank bootstrap algorithm (Exploratory Phase)

We tried two expansion metrics: the first metric simply labels all currently unlabeled edges neighboring currently labeled edges with the same label as the common endpoint.
The second metric is the same, except we control the expansion by limiting it to $n$ unlabeled edges incident to the nodes with the highest combined ranking $\sum_f r_f(v)$, where $n$ is the number of nodes incident to labeled edges. We refer to the first metric as infinite expansion and the second as controlled expansion. After the algorithm converges, we can classify the edges according to $E_f$, rank the nodes within factions according to $r_f$, and classify the nodes according to $\arg\max_f r_f(v)$.
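
The exploratory loop with infinite expansion could be sketched in Python roughly as follows. This is a simplified reading under stated assumptions, not the authors' code: edges pointing at a seed start with that seed's faction, relabeling is applied only to edges that already carry a label, a node whose faction scores tie exactly is treated as undecided (`None`), and an unlabeled edge takes its target's label when available, otherwise its source's. All function names are illustrative.

```python
def multirank(nodes, edges, faction_edges, d=0.85, iters=50):
    """Iterate r = (1 - d) * u + d * W_f r, assuming W_ij = 1 / out-degree(i)."""
    out_deg = {v: 0 for v in nodes}
    for src, _dst in edges:
        out_deg[src] += 1
    u = 1.0 / len(nodes)
    r = {v: u for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - d) * u for v in nodes}
        for src, dst in faction_edges:
            nxt[dst] += d * r[src] / out_deg[src]
        r = nxt
    return r


def label_nodes(nodes, edges, edge_label, factions):
    """Rank every faction, then label each node by argmax; exact ties give None."""
    ranks = {}
    for f in factions:
        faction_edges = [e for e, g in edge_label.items() if g == f]
        ranks[f] = multirank(nodes, edges, faction_edges)
    labels = {}
    for v in nodes:
        best = sorted(factions, key=lambda f: ranks[f][v], reverse=True)
        tied = len(best) > 1 and ranks[best[0]][v] == ranks[best[1]][v]
        labels[v] = None if tied else best[0]
    return labels, ranks


def infinite_expansion(edges, edge_label, node_label):
    """Label unlabeled edges that touch a node whose label is already decided."""
    grown = {}
    for src, dst in edges:
        if (src, dst) in edge_label:
            continue
        # Assumption: prefer the target's label, else fall back to the source's.
        label = node_label[dst] or node_label[src]
        if label is not None:
            grown[(src, dst)] = label
    return grown


def bootstrap(nodes, edges, seeds, max_inner=100):
    """Exploratory phase: alternate re-ranking and relabeling with edge expansion."""
    factions = sorted(set(seeds.values()))
    # Assumption: an edge pointing at a seed starts with that seed's faction.
    edge_label = {(s, t): seeds[t] for s, t in edges if t in seeds}
    while len(edge_label) < len(edges):
        for _ in range(max_inner):
            node_label, _ranks = label_nodes(nodes, edges, edge_label, factions)
            # Relabel each labeled edge by its target's label; keep it on a tie.
            relabeled = {e: node_label[e[1]] or f for e, f in edge_label.items()}
            changed = sum(relabeled[e] != edge_label[e] for e in edge_label)
            edge_label = relabeled
            if changed == 0:
                break
        grown = infinite_expansion(edges, edge_label, node_label)
        if not grown:
            break  # nothing reachable is left to label
        edge_label.update(grown)
    node_label, ranks = label_nodes(nodes, edges, edge_label, factions)
    return edge_label, node_label, ranks
```

Calling `bootstrap(nodes, edges, {"L1": "lib", "C1": "con"})` on a small graph returns the edge labels, the node labels, and one ranking per faction. The controlled expansion metric would replace `infinite_expansion` with a version that caps how many edges are added per round.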

Kale dataset
        Infinite Expansion                Controlled Expansion
        Exploratory     Settling          Exploratory     Settling
Seeds   Vertex  Edge    Vertex  Edge      Vertex  Edge    Vertex  Edge
2       0.641   0.763   0.819   0.968     0.787   0.898   0.804   0.952
4       0.698   0.876   0.804   0.952     0.770   0.912   0.819   0.968
8       0.703   0.894   0.804   0.952     0.785   0.949   0.819   0.968
12      0.700   0.893   0.804   0.952     0.827   0.953   0.804   0.952
16      0.728   0.917   0.804   0.952     0.824   0.953   0.804   0.952
20      0.757   0.952   0.807   0.966     0.780   0.959   0.804   0.965

Adamic dataset
        Infinite Expansion                Controlled Expansion
        Exploratory     Settling          Exploratory     Settling
Seeds   Vertex  Edge    Vertex  Edge      Vertex  Edge    Vertex  Edge
2       0.700   0.835   0.846   0.978     0.593   0.776   0.845   0.977
4       0.744   0.888   0.849   0.978     0.614   0.770   0.848   0.978
6       0.745   0.892   0.849   0.978     0.797   0.887   0.854   0.978
10      0.736   0.880   0.849   0.978     0.727   0.872   0.849   0.978
20      0.731   0.889   0.847   0.977     0.743   0.916   0.849   0.978
40      0.708   0.909   0.846   0.977     0.760   0.945   0.849   0.978

Table 1: Blog (Vertex) and link (Edge) classification accuracy on the Kale and Adamic datasets

We also present a second, optional phase to the algorithm that may further improve the output of the first phase.
We will refer to the original algorithm shown in Figure 1 as the exploratory phase and the second, extension algorithm as the settling phase. The settling phase again exploits the link property found in political blog networks: blogs are more likely to link to blogs of the same political faction. First, we find all the nodes where the majority of the neighbors are of a different faction, change the labeling of their incoming edges to the majority neighbor faction, and run the MultiRank algorithm on the modified graph. This is repeated until the algorithm converges, which happens when a) there are no more changes in edge labeling or b) the algorithm revisits an old state due to cycling changes.
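
One possible reading of the settling phase is sketched below in Python. Several details are assumptions made for illustration: every edge is already labeled when the phase starts, a node's neighbors include both in- and out-neighbors, "majority" means a strict majority, and all nodes in a round are judged against the node labels computed at the start of that round. The helper names are hypothetical.

```python
from collections import Counter


def multirank(nodes, edges, faction_edges, d=0.85, iters=50):
    """Iterate r = (1 - d) * u + d * W_f r, assuming W_ij = 1 / out-degree(i)."""
    out_deg = {v: 0 for v in nodes}
    for src, _dst in edges:
        out_deg[src] += 1
    u = 1.0 / len(nodes)
    r = {v: u for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - d) * u for v in nodes}
        for src, dst in faction_edges:
            nxt[dst] += d * r[src] / out_deg[src]
        r = nxt
    return r


def classify(nodes, edges, edge_label, factions):
    """Label each node with the faction whose MultiRank score is highest."""
    ranks = {f: multirank(nodes, edges, [e for e, g in edge_label.items() if g == f])
             for f in factions}
    return {v: max(factions, key=lambda f: ranks[f][v]) for v in nodes}


def settle(nodes, edges, edge_label, factions):
    """Settling phase: relabel in-edges of nodes outvoted by their neighbors."""
    neighbors = {v: set() for v in nodes}
    for src, dst in edges:
        if src != dst:
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    edge_label = dict(edge_label)  # work on a copy of the caller's labels
    seen = {frozenset(edge_label.items())}
    while True:
        node_label = classify(nodes, edges, edge_label, factions)
        changed = False
        for v in nodes:
            if not neighbors[v]:
                continue
            votes = Counter(node_label[w] for w in neighbors[v])
            top, count = votes.most_common(1)[0]
            # Strict majority of neighbors in a faction other than v's own.
            if top != node_label[v] and 2 * count > len(neighbors[v]):
                for edge in edges:
                    if edge[1] == v and edge_label[edge] != top:
                        edge_label[edge] = top
                        changed = True
        state = frozenset(edge_label.items())
        if not changed or state in seen:
            break  # (a) no change, or (b) a previously visited state
        seen.add(state)
    return edge_label, classify(nodes, edges, edge_label, factions)
```

Storing each visited labeling as a `frozenset` gives a simple way to detect the cycling case in condition b).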

Experiments and Discussions

To assess the effectiveness of our algorithm, we tested it on two datasets. The first dataset is constructed in the same way as described in (Kale et al. 2007), where we ended up with a graph of 404 connected blogs. We will refer to this as the Kale dataset. The second dataset is constructed by simply creating a graph from (Adamic & Glance 2005) and taking the largest connected component. This dataset contains 1222 connected blogs and we refer to it as the Adamic dataset. It should be pointed out that the dataset labeling is not 100% accurate, as noted in (Adamic & Glance 2005).

We run our algorithm on the two datasets varying three parameters: the number of seed nodes, the expansion metric, and the inclusion or exclusion of the optional "settling phase." In all our experiments, we pick seeds according to the top n PageRanked blogs, n/2 per faction. In all instances of the MultiRank algorithm the damping factor d is set to 0.85, a popular choice of damping factor which we borrowed without further tuning.
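
The seed selection rule can be read in more than one way; the Python sketch below assumes it means taking the n/2 highest-PageRank blogs within each faction, using known faction labels for the candidate seeds. The function name and argument layout are illustrative, and the PageRank scores are assumed to be computed elsewhere.

```python
def pick_seeds(pagerank, faction_of, n):
    """Pick n seeds: the n/2 highest-PageRank blogs of each faction.

    pagerank: dict blog -> PageRank score (computed elsewhere);
    faction_of: dict blog -> known faction label, assuming two factions.
    """
    per_faction = n // 2
    seeds = {}
    for faction in sorted(set(faction_of.values())):
        members = [blog for blog in pagerank if faction_of[blog] == faction]
        members.sort(key=lambda blog: pagerank[blog], reverse=True)
        for blog in members[:per_faction]:
            seeds[blog] = faction
    return seeds
```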

We point out some observations on the effect of the three variables. First, inclusion of the optional settling phase tends to improve upon the results of the first, exploratory phase, up to an almost constant point regardless of the number of seeds; the exception is controlled expansion with 12 and 16 seeds on the Kale dataset, where the settling phase actually hurt performance. Second, increasing the number of seeds improves the performance of the exploratory phase, but not with the addition of the settling phase, which works surprisingly well with only two seeds. Third, in general, controlling the expansion seems to help classification accuracy.

Another interesting property of this algorithm is that most classification errors are made on blogs with lower PageRank. If blogs are ordered by PageRank, the error rate on the top quartile of blogs is 0.05, while the error rate on the bottom quartile is 0.45 (data not shown due to space limitations).
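
A per-quartile error rate of this kind could be computed along the following lines. The Python sketch is only an illustration of the bookkeeping: the function name is hypothetical, and the inputs (PageRank scores, predicted labels, reference labels) are assumed to be available as dictionaries keyed by blog.

```python
def error_rate_by_quartile(pagerank, predicted, actual):
    """Return error rates for the four PageRank quartiles, highest rank first."""
    ranked = sorted(pagerank, key=pagerank.get, reverse=True)
    n = len(ranked)
    rates = []
    for q in range(4):
        chunk = ranked[q * n // 4:(q + 1) * n // 4]
        errors = sum(predicted[blog] != actual[blog] for blog in chunk)
        # An empty quartile (fewer than four blogs) has no defined error rate.
        rates.append(errors / len(chunk) if chunk else float("nan"))
    return rates
```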

Conclusions

We have introduced a new semi-supervised algorithm for simultaneously classifying and ranking political blogs based on link structure. We showed that this algorithm requires very few initial seeds to achieve blog classification accuracy above 80% on two political blog datasets of different size and link structure. This algorithm tends to favor more authoritative blogs in terms of classification accuracy.

References

Adamic, L., and Glance, N. 2005. The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
Kale, A.; Karandikar, A.; Kolari, P.; Java, A.; Finin, T.; and Joshi, A. 2007. Modeling trust and influence in the blogosphere using link polarity. In ICWSM 2007.
Page, L.; Brin, S.; Motwani, R.; and Winograd, T. 1998. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project.
