ICWSM'2007 Boulder, Colorado, USA
Structural Link Analysis from User Profiles and Friends Networks: A Feature Construction Approach
William H. Hsu, Joseph Lancaster, Martin S. R. Paradesi, Tim Weninger
Department of Computing and Information Sciences, Kansas State University
234 Nichols Hall, Manhattan, KS 66506-2302
+1 785 532 6350
{bhsu|joseph|pmsr|weninger}@ksu.edu
Abstract
We consider the problems of predicting, classifying, and annotating friends relations in friends networks, based upon network structure and user profile data. First, we document a data model for the blog service LiveJournal, and define a set of machine learning problems such as predicting existing links and estimating inter-pair distance. Next, we explain how the problem of classifying a user pair in a social network, as directly connected or not, poses the problem of selecting and constructing relevant features. We document feature analyzers for attributes that depend only on graph attributes, those that depend on individual user demographics and set-valued attributes (e.g., interests, communities, and educational institutions), and those that depend on candidate user pairs. We then extend our data model using whole-network attributes and report machine learning experiments on learning the concept of a connected pair of friends from LiveJournal data. Finally, we develop a theory of dependent types for deriving causal explanations and discuss how this can be used to scale statistical relational learning up to our full corpus, a recent crawl of over a million records from LiveJournal.
General Terms
Algorithms, Experimentation
Keywords
data mining, link analysis, machine learning, social network analysis, user profiling
1. Introduction
Analysis of friends networks provides a basis for understanding the web of influence [Ko01] in social media. In particular, the problems of determining the existence of links and of classifying and annotating known links are first steps toward identifying potential relationships. This inferred information can in turn be used to introduce new potential friends to one another, make basic recommendations such as community recruits or moderator candidates, or identify whole cliques and communities.
In this paper, we consider the problem of discovering links in an incomplete graph. We present an approach to link prediction that is based on graph feature analysis and intrinsic attributes of entities (users and communities).
We report some promising preliminary results on radius-limited neighborhoods of the blogging service LiveJournal and discuss the results of exploratory experiments that point toward a need to differentiate the types of features in a friends network, namely:
1. those that depend on the demographics of the entire network
2. those that are computable for each user or each pair of users
3. those that depend on the existence of a reported, inferred, or suspected link
We derive some such features and discuss the costs of computing, selecting, and recombining them.
Of particular interest in the domain of commercial weblogs and social media are demographic features relevant to collaborative recommendation of goods and formation of branding communities. The structural dependence and context-specific dependence of features determine which new features are feasible to construct, in terms of both statistical sufficiency and computational complexity. Finally, we examine some new features that were derived by hand, discuss the algorithms used to compute them, and relate these specific algorithms to a broader class of relational database queries that form the basis of a more powerful feature construction system.
2. Background
2.1 Friends Networks from User Profiles
Social network services such as MySpace and Facebook allow users to list interests and link to friends, sometimes annotating these links by designating trust levels or qualitative ratings for selected friends.
Some of these services, such as Google's Orkut, are community-centric; others, such as the video blogging service YouTube and the photo service Flickr, emphasize social media; while some, such as Six Apart's LiveJournal and Vox, are organized around text-and-image weblogs. LiveJournal and its derivative services, such as GreatestJournal, DeadJournal, and JournalFen, are based on the same open-source server code.
At the time of this writing, there are over 11.7 million LiveJournal accounts, 1.8 million of them active.
The friends network of LiveJournal, our topic of study, has two varieties of accounts: users and communities (we omit RSS feeds). One advantageous property of its data model, stemming from a common schema for the two account types (an account could originally be converted from user to community), is that it provides a simple, flexible representation for entities and relations.
Start       End         Link denotes
User        User        Trust or friendship
User        Community   Readership or subscribership
Community   User        Membership, posting access, maintainer
Community   Community   Obsolete
Table 1. Types of links in the blog service LiveJournal.
Table 1 shows the types of links in LiveJournal and their constituent attributes. Friendship is an asymmetric relation between two accounts, each represented by a vertex in a directed graph. The types of the start and end points define the relationship set attributes of the link. For example, a user u who adds another user v to his or her friends list can place v in any of up to 30 groups. These serve the dual purpose of blog aggregation (posts from each group's members are filtered into its aggregator page, which u can read or make public) and group-based security (each group denotes a read/comment access control list). Access control lists for communities are associated with memberships (community-to-user links), while content is controlled by posters or subscribers.
A user can "watch" a community in order to add all accessible posts to a main aggregator page or to custom groups. The set of accessible posts consists of either public posts only, or public and restricted (members-only) posts. The access control list is defined by the membership relation and by individual posters' selections (whether to allow comments and, if so, whether comments from no readers, all readers, non-anonymous readers, or community members are displayed by default). Privileges are community properties; of these, only membership may be acquired solely by user action ("joining" a community), and only if the moderator has specified open membership.
Figure 1. LiveJournal access control list maintenance (community moderator interface).
Thus, a reciprocal link between a user and a community means that the user both subscribes to the community and is an approved member. Links from user u to v are listed in the "Friends" list of u and in an optionally displayed "Friends Of" list of v. This list can be partitioned into reciprocal and non-reciprocal sublists for a user u:
Mutual Friends: $\{v \mid (v,u) \in E \land (u,v) \in E\}$
Also Friend Of: $\{v \mid (v,u) \in E \land (u,v) \notin E\}$
The community analogue of the "Friends Of" list is the "Watched By" (subscriber) list, whose members have the community name listed in the "Friends: Communities" sections of their individual user profile pages. The community analogue of the "Friends" list is the "Members" list.
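The two set definitions above translate directly into set comprehensions. This is a minimal sketch that assumes the friends network is held as a Python set of directed (start, end) pairs. The function name `partition_friend_of` is hypothetical and is not part of LJCrawler:

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (start account, end account)


def partition_friend_of(u: str, edges: Set[Edge]) -> Tuple[Set[str], Set[str]]:
    """Split the "Friends Of" list of u into (Mutual Friends, Also Friend Of)."""
    # Friends Of: every v that lists u as a friend, i.e. (v, u) is in E.
    friend_of = {v for (v, w) in edges if w == u}
    # Mutual Friends: u also lists v, i.e. (u, v) is in E as well.
    mutual = {v for v in friend_of if (u, v) in edges}
    # Also Friend Of: the remaining, non-reciprocated links.
    also_friend_of = friend_of - mutual
    return mutual, also_friend_of
```

Take E = {(alice, bob), (bob, alice), (carol, alice), (alice, dave)}. For alice, the call yields Mutual Friends {bob} and Also Friend Of {carol}. Dave appears only in alice's "Friends" list, not in her "Friends Of" list.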
The friends network for LiveJournal consists of a very large central connected component and many small islands, most of which are singleton users. There are a few source vertices, corresponding to accounts that link to others but have no reciprocated friendships; these are usually RSS or blog aggregator accounts owned by individuals. Additionally, there are sink vertices corresponding to accounts watched by others, but which have named no friends. Some of these are channels for announcement or dissemination of creative work.
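The component structure and vertex categories just described can be identified with two linear-time passes over the edge set. This sketch makes two assumptions. First, it uses the standard graph-theoretic definitions: a source has out-links but no in-links, and a sink has in-links but no out-links. These are stricter than the informal description above. Second, it ignores link direction when finding components (weak connectivity). The function names are hypothetical:

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def weak_components(nodes: Set[str], edges: Set[Edge]) -> List[Set[str]]:
    """Weakly connected components (link direction ignored), largest first."""
    nbrs: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first flood fill from an unvisited vertex.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in nbrs[x]:
                if y not in seen:
                    seen.add(y)
                    component.add(y)
                    queue.append(y)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def sources_and_sinks(nodes: Set[str], edges: Set[Edge]) -> Tuple[Set[str], Set[str]]:
    """Sources: out-links but no in-links. Sinks: in-links but no out-links."""
    has_out = {u for u, _ in edges}
    has_in = {v for _, v in edges}
    sources = {x for x in nodes if x in has_out and x not in has_in}
    sinks = {x for x in nodes if x in has_in and x not in has_out}
    return sources, sinks
```

A singleton island (an account with no links at all) shows up as a component of size one and is neither a source nor a sink.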
2.2 Link Identification
In previous work [HKP+06], we introduced a link prediction problem for LiveJournal: given a graph in which the existence of a candidate link is hidden (elided if it exists), classify it as present or absent given all other attributes of the graph and of the endpoints. Our initial approach to link identification consisted of dividing friends network features into graph features and interest-based features.
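The hidden-link setup can be made concrete as follows. For each candidate pair, the label records whether the link exists, and every feature is then computed on a copy of the graph with that link elided. The function name `make_example` and the 1/0 label encoding are illustrative assumptions:

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def make_example(edges: Set[Edge], u: str, v: str) -> Tuple[Set[Edge], int]:
    """Return (graph with the candidate (u, v) hidden, class label)."""
    label = 1 if (u, v) in edges else 0  # 1 = link present, 0 = link absent
    visible = edges - {(u, v)}           # elide the link if it exists; E is not modified
    return visible, label
```

Copying the edge set costs time linear in |E| for every candidate. A practical pipeline would avoid materializing the copy; the sketch favors clarity.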
Graph features could be computed simply by scanning the graph or, in the case of pair-distance metrics, by performing an all-pairs shortest path (APSP) search:
1. Indegree of u: popularity of the user
2. Indegree of v: popularity of the candidate
3. Outdegree of u: number of other friends besides the candidate; saturation of friends list
4. Outdegree of v: number of existing friends of the candidate besides the user; correlates loosely with likelihood of a reciprocal link
5. Number of mutual friends w such that $u \to w \land w \to v$
6. "Forward deleted distance": minimum alternative distance from u to v in the graph without the edge (u,v)
7. Backward distance from v to u in the graph
These were supplemented by interest-based features:
8. Number of mutual interests between u and v
9. Number of interests listed by u
10. Number of interests listed by v
11. Ratio of the number of mutual interests to the number listed by u
12. Ratio of the number of mutual interests to the number listed by v
2.3 Efficient feature analysis
The degree attributes can be enumerated in time linear in the number of users, as can the mutual friends count for each pair of users.
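Features 1 through 5 need only degree tables and neighbor sets. This is a minimal sketch that assumes the candidate link has already been removed from `edges`, so the degrees of u and v count friends "besides" the candidate. The helper names are hypothetical:

```python
from collections import Counter
from typing import Set, Tuple

Edge = Tuple[str, str]


def degree_tables(edges: Set[Edge]) -> Tuple[Counter, Counter]:
    """In-degree and out-degree of every vertex, in one pass over E."""
    indeg: Counter = Counter()
    outdeg: Counter = Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg


def mutual_friend_count(edges: Set[Edge], u: str, v: str) -> int:
    """Feature 5: the number of w with u -> w and w -> v (one scan of E)."""
    out_u = {w for (x, w) in edges if x == u}
    in_v = {w for (w, y) in edges if y == v}
    return len(out_u & in_v)
```

Features 1 to 4 are then `indeg[u]`, `indeg[v]`, `outdeg[u]`, and `outdeg[v]`. Building the degree tables once and reusing them across candidates makes each degree lookup constant time. A larger pipeline would also precompute per-vertex neighbor sets instead of scanning E for every pair.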
Forward deleted distance measures the distance from u to v by alternate routes, after the edge (u,v) is elided. The prediction task is thus to reconstruct the incomplete graph resulting from this erasure, to determine whether a particular link (u,v) existed. Forward deleted distance can be precomputed exhaustively for the entire graph in $\Theta(|E|(|V|+|E|)) = \Theta(|E|^2)$ time by erasing each edge in E and re-running a breadth-first search (BFS) from the start vertex. If a candidate edge is not stored in the resulting cache, its deleted distance is that found by BFS on the original graph, in $\Theta(|V|+|E|)$ time. In a graph (V, E), backward distance requires $\Theta(|V|+|E|)$ time using BFS for a particular candidate edge.
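Both distance features reduce to breadth-first search on an adjacency map. In the sketch below (function names hypothetical), the forward deleted distance refuses to traverse the candidate edge itself. `precompute_fdd_cache` is the exhaustive one-BFS-per-edge precomputation described above. For a pair that is not an edge, the ordinary BFS distance is used, as in the text. Reporting unreachable targets as infinity is an assumption about how the ∞ case is encoded:

```python
from collections import defaultdict, deque
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[str, str]
INF = float("inf")


def build_adj(edges: Set[Edge]) -> Dict[str, Set[str]]:
    """Out-neighbor sets for each start vertex."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    return dict(adj)


def bfs_distance(adj: Dict[str, Set[str]], src: str, dst: str,
                 skip: Optional[Edge] = None) -> float:
    """Unweighted shortest-path length from src to dst, never using edge `skip`."""
    if src == dst:
        return 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if (x, y) == skip or y in dist:
                continue
            dist[y] = dist[x] + 1
            if y == dst:
                return dist[y]
            queue.append(y)
    return INF


def forward_deleted_distance(adj: Dict[str, Set[str]], u: str, v: str) -> float:
    """Feature 6: distance from u to v by alternative routes, edge (u, v) erased."""
    return bfs_distance(adj, u, v, skip=(u, v))


def backward_distance(adj: Dict[str, Set[str]], u: str, v: str) -> float:
    """Feature 7: distance from v back to u."""
    return bfs_distance(adj, v, u)


def precompute_fdd_cache(adj: Dict[str, Set[str]], edges: Set[Edge]) -> Dict[Edge, float]:
    """One BFS per existing edge: Theta(|E| (|V| + |E|)) time overall."""
    return {(u, v): forward_deleted_distance(adj, u, v) for (u, v) in edges}


def lookup_fdd(adj: Dict[str, Set[str]], cache: Dict[Edge, float], u: str, v: str) -> float:
    """Cached value for existing edges; plain BFS distance for non-edges."""
    if (u, v) in cache:
        return cache[(u, v)]
    return bfs_distance(adj, u, v)
```

Backward distance needs no deletion. A shortest path from v to u stops the first time it reaches u, so it can never use the edge (u, v).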
Since the expected size of the edge set is $\mathrm{E}[|E|] = k|V|$, with k about 20 on average across LiveJournal, the bottleneck computation is that of forward deleted distance: $\Theta(|E|^2) = \Theta(k^2|V|^2)$, or $\Theta(|V|^2)$ with a large constant.
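To give a sense of scale, the following back-of-the-envelope estimate ignores constant factors. It plugs in k = 20 and the 11.7 million accounts from Section 2.1; using the full account count rather than the 1.8 million active accounts is an assumption.

$$
\begin{aligned}
|E| &\approx k\,|V| \approx 20 \times 1.17\times 10^{7} \approx 2.34\times 10^{8},\\
|V| + |E| &\approx 1.17\times 10^{7} + 2.34\times 10^{8} \approx 2.46\times 10^{8},\\
|E|\,\bigl(|V| + |E|\bigr) &\approx 2.34\times 10^{8} \times 2.46\times 10^{8} \approx 5.8\times 10^{16}.
\end{aligned}
$$

That is on the order of $10^{16}$ edge visits for the exhaustive forward-deleted-distance cache, compared with roughly $2.5\times 10^{8}$ for a single BFS.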
Using a straightforward string pair enumeration and comparison algorithm, the mutual interest counts are stored in a matrix of $|V|^2$ elements, each requiring constant time to check (given a maximum of 150 interests).
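Features 8 through 12 follow directly from the two users' interest lists. This sketch swaps the string-pair enumeration described above for hash-set intersection, which gives the same counts. Returning 0.0 for a ratio with a zero denominator is an assumption; the text does not say how users with no listed interests are handled:

```python
from typing import Set, Tuple


def interest_features(interests_u: Set[str],
                      interests_v: Set[str]) -> Tuple[int, int, int, float, float]:
    """Features 8-12 for one candidate pair (u, v)."""
    mutual = len(interests_u & interests_v)        # feature 8
    n_u, n_v = len(interests_u), len(interests_v)  # features 9 and 10
    ratio_u = mutual / n_u if n_u else 0.0         # feature 11
    ratio_v = mutual / n_v if n_v else 0.0         # feature 12
    return mutual, n_u, n_v, ratio_u, ratio_v
```

With at most 150 interests per user, each intersection costs at most a few hundred hash lookups. The per-pair cost therefore stays constant, as in the text; what grows is the $|V|^2$ number of pairs.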
2.4 Methodologies for link mining
Getoor and Diehl [GD05] recently surveyed techniques for link mining, focusing on statistical relational learning approaches and emphasizing graphical model representations of link structure. Ketkar et al. [KHC05] compare data mining techniques over graph-based representations of links to first-order and relational representations and learning techniques that are based upon inductive logic programming (ILP). Sarkar and Moore [SM05] extend the analysis of social networks into the temporal dimension by modeling change in link structure across discrete time steps, using latent space models and multidimensional scaling. One of the challenges in collecting time series data from LiveJournal is the slow rate of data acquisition, just as spatial annotation data (such as that found in LJ maps and the "plot your friends on a map" meme) is relatively incomplete.
2.5 Other applications using graph mining
Popescul and Ungar [PU03] learn a kind of entity-relational model from data in order to predict links. Hill [Hi03] and Bhattacharya and Getoor [BG04] similarly use statistical relational learning from data in order to resolve identity uncertainty, particularly coreferences and other redundancies (also called deduplication).
Resig et al. [RDHT04] use a large (200,000-user) crawl of LiveJournal to annotate a social network of instant messaging users, and explore the approach of predicting online times as a function of friends graph degree. There have been numerous recent applications of social network mining based on the text and headers of e-mail. One notable research project by McCallum et al. [MCW05] uses the Enron e-mail corpus and infers roles and topic categories based on link analysis.
A primary goal of this work is to extend the graph mining approach beyond link prediction and recommendation toward link explanation and annotation.
It may be much more useful to explain why a group of friends in a blog service created accounts en masse or added one another as friends than to recommend relationship sets that are already extant or structured according to a preexisting social group. For example, high school classmates often create accounts and encourage their peers to join the same service. In a few cases, this is encouraged or facilitated by a teacher, for a class project. Solving the problem of link prediction is not particularly useful in this case, because the user decisions have already been made or strongly constrained. However, it may be very useful to link other classmates not working on the same project to the same relationship set (perhaps they were encouraged to join the blog service by students who continued to use it after the class project). Large groups, such as webcomic subscriberships and community co-members, are also somewhat identifiable. Relating members of a blog service to one another through relationship sets is a typical entity-relational data modeling operation that can be made more robust and efficient through graph feature extraction.
3. Experiment Design
3.1 LJCrawler v2
To acquire the graph structure and attributes described in the previous section, we developed an HTTP-based spider called LJCrawler to harvest user information from LiveJournal. A multithreaded version of this program, which retrieves BML data published by Danga (the owners of LiveJournal), collects up to 15 records per second on average, traversing the social network depth-first and archiving the results in a master index file.
Because LiveJournal's functionality for looking up users by user number is only available to administrators, we decided to compile a list of seeds for a disjoint-set representation of the disconnected social network. For purposes of this experiment, however, starting from just one seed (the first author's LiveJournal ID) and restricting the crawl to one connected component was sufficient.
Using LJCrawler, we compiled an adjacency list and the following ground features for each user:
Account type (user, community)
Interest list
School list
Communities watched list
Community membership list
Friends Of list
Friends list
3.2 Feature Analyzers
We define a single example to be a candidate edge (u,v) in the underlying directed graph of the social network, along with a set of descriptive features calculated from the annotated graph recorded by LJCrawler:
Other features: Additional planned features for continuing experiments include dates (update frequencies when taken differentially), user options such as maximum friends count, and content descriptors of LiveJournal entries and comments (average post length, word frequency, etc.).
3.3 Graph Search Algorithms for Computing Features
Computing the minimum forward and backward distances can be done more efficiently by using breadth-first search.
Currently, a Java implementation of this algorithm requires under one minute on a 2 GHz AMD Opteron system to process a 2000-node graph. However, enumerating all possible candidate pairs within a neighborhood of radius 2 (1.6 million pairs for 4000 nodes) requires several hours on the same system. We note that the amortized cost of running BFS to precompute all-pairs shortest paths (APSP) with the actual edge deleted (which is necessary to avoid revealing the prediction target in link prediction) is $\Theta(|E|(|V|+|E|))$. This is prohibitively large even for our "mid-sized" subgraphs of 10-50K nodes; when |V| is about 11 million, |E| is a little over 200 million, and enumerating APSP is completely infeasible. However, we do not typically consider all of E, so the bottleneck is typically the first step plus a constant number of calls to BFS, requiring running time in $\Theta(k(|V|+|E|))$.
3.4 Generating Candidates
We considered several alternative ways to generate candidate edges (u,v):
The first technique is likely to be unscalable, as the number of candidates is $|V|^2$. The second requires a representatively large sample of the full LiveJournal social network, in order to fit the distribution parameters accurately. The third was the most straightforward to implement. Two calls to the all-pairs shortest path algorithm provided the cost matrix, and one pass at each radius up to a maximum of 10 yielded the data shown in Table 2.
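A minimal sketch of the radius-by-radius tabulation follows. One BFS per vertex gives every distance d(u, v), which is then binned into frequency and cumulative columns like those of Tables 2 and 3. The sketch counts all ordered pairs of distinct vertices and folds distances beyond `max_d` into the ∞ row. Both choices are assumptions, so its totals need not match the candidate counts reported in the tables:

```python
from collections import Counter, deque
from typing import Dict, List, Set, Tuple, Union

Row = Tuple[Union[int, str], int, int]  # (distance, frequency, cumulative)


def distance_histogram(nodes: List[str], adj: Dict[str, Set[str]],
                       max_d: int = 10) -> List[Row]:
    """Frequency and cumulative counts of d(u, v) over ordered pairs u != v."""
    freq: Counter = Counter()
    for s in nodes:
        # BFS from s gives the distance to every reachable vertex.
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj.get(x, ()):
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        for t in nodes:
            if t == s:
                continue
            d = dist.get(t)
            # Unreachable (or farther than max_d) pairs go to the "inf" row.
            freq[d if d is not None and d <= max_d else "inf"] += 1
    rows: List[Row] = []
    running = 0
    for d in range(1, max_d + 1):
        running += freq[d]
        rows.append((d, freq[d], running))
    running += freq["inf"]
    rows.append(("inf", freq["inf"], running))
    return rows
```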
To simplify the initial experiments, we defined the classification problem to be classification of d(u,v) as 1 or 2. This task is actually useful for social network recommender systems because discriminating a direct friend from a "friend of a friend" (FOAF) is functionally similar to recommending FOAFs to link to directly. There are more detailed classification targets, such as placement, promotion, and demotion of linked friends within strata of trust (setting, increasing, and decreasing the security level), but choosing a user's friends to begin with is the more fundamental decision.
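Restricting candidates to d(u, v) ∈ {1, 2} means pairing each user with its friends (positive examples) and with its friends-of-friends who are not already friends (negative examples). This is a sketch over an out-neighbor adjacency map; the function name and the 1/0 labels are illustrative:

```python
from typing import Dict, List, Set, Tuple


def friend_vs_foaf(adj: Dict[str, Set[str]]) -> List[Tuple[str, str, int]]:
    """Candidates (u, v) with d(u, v) in {1, 2}: label 1 = friend, 0 = FOAF."""
    examples: List[Tuple[str, str, int]] = []
    for u, friends in adj.items():
        foaf: Set[str] = set()
        for w in friends:
            foaf |= adj.get(w, set())
        foaf -= friends | {u}  # keep only vertices at distance exactly 2
        examples += [(u, v, 1) for v in sorted(friends) if v != u]
        examples += [(u, v, 0) for v in sorted(foaf)]
    return examples
```

Features for each candidate would then be computed with the candidate link hidden, so that d(u, v) itself is never an input.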
Table 2 and Table 3 report the distribution of inter-vertex distances in the friends network for two subnetworks induced by limiting the maximum number of nodes.
Distance d   Frequency (= d)   Cumulative (≤ d)
1            6204              6204
2            107307            113511
3            69896             183407
4            59926             243333
5            3400              246733
6            255               246988
7            16                247004
8            1                 247005
9            0                 0
10           0                 0
∞            9731              256735
Table 2. Number of candidate edges for the 1000-node LiveJournal graph.
Distance d   Frequency (= d)   Cumulative (≤ d)
1            19410             19410
2            370568            389978
3            403075            793053
4            520373            1313426
5            123747            1437173
6            18453             1455626
7            2657              1458283
8            339               1458622
9            29                1458651
10           0                 1458651
∞            174534            1633185
Table 3. Number of candidate edges for the 4000-node LiveJournal graph.
4. Results
4.1 Preliminary experiment: 941-node version
In a preliminary experiment, we constructed a 941-node subgraph, defining the concept IsFriendOf, and trained three types of inducers with each of the following attribute sets:
1. all attributes
2. all graph attributes excluding the forward and backward distances
3. the backward distances alone
4. the backward and forward distances alone
5. interest-related attributes alone
Table 4 and Table 5 show the results for three inducers: the J48 decision tree inducer, Holte's 1R inducer (a single-rule classifier based on a single attribute) [Ho93], and the logistic regression inducer. All accuracy measures were collected over 10-fold cross-validated runs. The J48 output with all features achieves a significant boost over the next highest (distance only).
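Holte's 1R is simple enough to sketch in a few lines. The version below is a simplification that assumes attributes have already been discretized; Holte's rule for bucketing numeric attributes is omitted. It pairs 1R with a plain k-fold cross-validation loop that assigns rows to folds round-robin, without shuffling or stratification. These choices are illustrative assumptions, not the evaluation code behind Tables 4 and 5:

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Model = Tuple[int, Dict[Hashable, int], int]  # (attribute index, rule, default class)


def train_one_r(X: List[Sequence[Hashable]], y: List[int]) -> Model:
    """Choose the single attribute whose value -> majority-class rule errs least."""
    default = Counter(y).most_common(1)[0][0]
    best = None
    for j in range(len(X[0])):
        counts: Dict[Hashable, Counter] = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[row[j]][label] += 1
        rule = {val: c.most_common(1)[0][0] for val, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[val]] for val, c in counts.items())
        if best is None or errors < best[0]:
            best = (errors, j, rule)
    _, j, rule = best
    return j, rule, default


def predict_one_r(model: Model, row: Sequence[Hashable]) -> int:
    """Apply the rule; unseen attribute values fall back to the majority class."""
    j, rule, default = model
    return rule.get(row[j], default)


def cross_val_accuracy(X: List[Sequence[Hashable]], y: List[int], folds: int = 10) -> float:
    """k-fold cross-validated accuracy (assumes at least `folds` rows)."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        model = train_one_r([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict_one_r(model, X[i]) == y[i] for i in test)
    return correct / len(X)
```

Because 1R keeps only one attribute, its scores show how much a single feature (for example, a distance) carries on its own.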
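Inducer    All    NoDist   BkDist   Dist   Interest
J48        98.2   94.8     95.8     97.6   88.5
OneR       95.8   92.0     95.8     95.8   88.5
Logistic   91.6   90.9     88.3     88.9   88.4
Table 4. Percent accuracy for predicting all classes using the 941-node graph. (All, NoDist, BkDist, Dist, and Interest denote attribute sets 1 to 5 of Section 4.1, respectively.)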
Inducer    All    NoDist   BkDist   Dist   Interest
J48        89.5   65.7     67.7     83.0   5.4
OneR       67.7   41.1     67.7     67.7   4.5
Logistic   38.3   33.3     0        4.5    4.5
Table 5. Precision (ratio of true positives to all predicted positives) using the 941-node graph.
4.2 Experiments on restricted graphs
We developed an application, ljclipper, to restrict the overall friends graph to the subgraph induced by a fixed number of nodes, found using breadth-first search starting from a given seed. Using the 4000-node subgraph summarized in Table 3, we generated 1,633,185 candidate edges.
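The clipping step can be sketched as a BFS that stops once a fixed number of vertices has been reached, followed by restriction of the adjacency map to those vertices. Following out-links only, and visiting neighbors in sorted order for reproducibility, are assumptions; the text does not say how ljclipper directs or orders its search. `clip` is a hypothetical name:

```python
from collections import deque
from typing import Dict, List, Set


def clip(adj: Dict[str, Set[str]], seed: str, n: int) -> Dict[str, Set[str]]:
    """Subgraph induced by the first n vertices reached by BFS from `seed`."""
    keep: List[str] = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(keep) < n:
        x = queue.popleft()
        for y in sorted(adj.get(x, ())):
            if y not in seen:
                seen.add(y)
                keep.append(y)
                queue.append(y)
                if len(keep) == n:
                    break  # the while condition then ends the search
    kept = set(keep)
    # Induced subgraph: keep only links whose endpoints were both selected.
    return {x: adj.get(x, set()) & kept for x in keep}
```

If the component reachable from the seed has fewer than n vertices, the result is simply that whole component.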
Note that all forward distances are greater than 1: when u and v are actually connected, we erase (u,v). In preliminary experiments, we then computed the length of the shortest alternative path. This is, however, a less scalable approach, because the asymptotic running time is dominated by the superlinear time required to compute
The complete listing of all twelve features is given in Section 2.2. The numerical types of all the network features (both those describing the graph and those measuring interests and ratios) make the data set amenable to logistic regression.
Inducer   Accuracy   Precision   Recall
J48       99.9       97.5        96.1
OneR      99.6       91.7        91.8
Table 6. Percent accuracy, precision and recall using a 1000-node graph (10-fold CV).
Inducer   Accuracy   Precision   Recall
J48       99.8       95.8        92.0
OneR      99.7       91.1        89.9
Table 7. Percent accuracy, precision and recall using a 2000-node graph (10-fold CV).
Inducer   Accuracy   Precision   Recall
J48       99.8       94.5        88.3
OneR      99.7       88.2        84.3
Table 8. Percent accuracy, precision and recall using a 4000-node graph (10-fold CV).
Table 6 through Table 8 show the accuracy, precision, and recall for the 1000-, 2000-, and 4000-node friends graphs. Two trends are observed: precision is generally higher than recall, and both precision and recall diminish as the network grows larger. These trends are sustained for subsamples of size 10000 and 100000, though precision and recall also diminish slightly with sampling.
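The gap between near-perfect accuracy and lower precision and recall in Tables 6 through 8 reflects class imbalance. In the candidate set of Table 2, only 6204 of 256735 candidates are direct links, so a classifier that never predicts a link would already score about 97.6% accuracy. The three measures can be computed from true and predicted labels, with 1 marking a link, as sketched below. Reporting 0.0 when a denominator is zero is an assumption:

```python
from typing import Sequence, Tuple


def accuracy_precision_recall(y_true: Sequence[int],
                              y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Percent accuracy, precision, and recall for the positive (linked) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = 100.0 * correct / len(pairs)
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0  # TP / predicted positives
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0     # TP / actual positives
    return accuracy, precision, recall
```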
4.3 Data acquisition and larger experiments
The crawler has been improved with several service-specific optimizations for fetching user info pages. Presently these do not use LiveJournal's BML feed of user data, which is incomplete for our purposes (that is, not all ground attributes in our initial relations are provided). At press time, this crawler processes about 20000 user records per hour and thus would require over a week to crawl LiveJournal.
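As a rough check of that estimate, assume the crawl must cover all 11.7 million accounts reported in Section 2.1 at a constant rate:

$$\frac{1.17\times 10^{7}\ \text{records}}{2\times 10^{4}\ \text{records/hour}} \approx 585\ \text{hours} \approx 24\ \text{days}$$

This is comfortably "over a week": closer to three and a half weeks.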
The current bottleneck is the $\Theta(|V|(|V|+|E|))$ step described in Section 3.3. This is the dominant term, because the constant k denoting the number of candidate edges is usually much smaller than |V| (e.g., 100-1000), so that $\Theta(k(|V|+|E|))$ is not only in $\Theta(|V|+|E|)$, but actually just a few hundred times the cost of a single BFS.
4.4 Interpretation
Using mutual interests alone, even with normalization based on the number of interests of u and v, results in very poor prediction with all inducers we experimented with. Intermediate results are achieved using mutual friends count and degree (NoDist: 65.7% precision on predicting edges) and using forward deleted distance and backward distance (Dist: 67.7%). Using all 12 computed graph and annotation features resulted in the highest precision (All: 89.5%) and accuracy (All: 98.2%).
We note that LiveJournal once used a variant of normalized mutual interests to produce a list of potential friends, arranged in decreasing order of match quality. Although this was not the same type of recommender system as LJMiner supports, it suggests that state-of-the-art user matching systems have considerable room for improvement. The results indicate that features produced by LJMiner, used with a good inducer, can generate collaborative and structural recommendations.
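For comparison, a ranking of the kind LiveJournal offered can be sketched as follows. The text does not specify LiveJournal's normalization, so this sketch assumes Jaccard overlap: shared interests divided by the size of the union of both interest lists. The function name is hypothetical:

```python
from typing import Dict, List, Set, Tuple


def rank_by_shared_interests(u: str, interests: Dict[str, Set[str]],
                             top: int = 10) -> List[Tuple[str, float]]:
    """Rank other users by Jaccard overlap of interest sets (assumed normalization)."""
    mine = interests[u]
    scored: List[Tuple[str, float]] = []
    for v, theirs in interests.items():
        union = mine | theirs
        if v == u or not union:
            continue
        score = len(mine & theirs) / len(union)
        if score > 0:
            scored.append((v, score))
    # Decreasing match quality; ties broken by name for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]
```

A score of this kind uses only interest counts (features 8 to 10). In the experiments of Section 4, interest-only features were the weakest attribute set.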
5. Continuing Work
Scaling up: Our current research focuses on scaling up to tens of thousands and eventually millions of users. Crawling over 11-12 million records is at least technically feasible, but scaling up the graph analyzers is a challenge that may best be met with heuristic search.
Learning relational models: A promising area of research is the recovery of relational graphical models, including class-level (membership and reference slot) uncertainty [GFKT02]. LJMiner has yielded a ready source of semistructured data for both structure learning and distribution learning.
Another potentially useful approach is to organize users and communities into clusters using this relational model. We have developed schemas for blog posts (entries, threads, comments) and for users and dynamic groups of users. This is related to previous preliminary work on relational data mining for personalization of web portals, especially computational grid portals [HBJ03]. Much of the relational metadata in the bioinformatics domain comes from description languages for workflows and workflow components [Hs04]. The next step in our experimental plan is to use schemas such as our detailed ones for blog service users, bioinformatics information, and computational grid users [Hs05] to learn a richer predictive model. Finally, modeling relational data as it persists or changes across time is an important challenge.
Acknowledgements
We thank Todd Easton and Kirsten Hildrum for helpful discussions concerning algorithms and the LiveJournal data model. We also thank Andrew King and Tejaswi Pydimarri for contributions to the original LJMiner system and Vikas Bahirwani for contributions to the second version.
References
[BG04] I. Bhattacharya & L. Getoor. Deduplication and group detection using links. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) Workshop on Link Analysis and Group Detection (LinkKDD 2004), Seattle, WA, USA, August 22-25, 2004.
[CLRS02] T. H. Cormen, C. E. Leiserson, R. L. Rivest, & C. Stein. Introduction to Algorithms, Second Edition. Cambridge, MA: MIT Press, 2002.
[GD05] L. Getoor & C. P. Diehl. Link mining: a survey. SIGKDD Explorations, Special Issue on Link Mining, 7(2):3-12.
[GFKT02] L. Getoor, N. Friedman, D. Koller, & B. Taskar. Learning Probabilistic Models of Link Structure. Journal of Machine Learning Research, 2002.
[HBJ03] W. H. Hsu, P. Boddhireddy, & R. Joehanes. Using probabilistic relational models for collaborative filtering. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Statistical Learning of Relational Models (SRL), Acapulco, Mexico, August, 2003.
[Hi03] S. Hill. Social network relational vectors for anonymous identity matching. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Statistical Learning of Relational Models (SRL), Acapulco, Mexico, August, 2003.
[Ho93] R. C. Holte. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning, 11(1):63-90.
[Hs04] W. H. Hsu. Relational graphical models of computational workflows for data mining. In Proceedings of the International Conference on Semantics of a Networked World: Semantics for Grid Databases (ICSNW-2004), pp. 309-310, Paris, France, June, 2004.
[Hs05] W. H. Hsu. Relational graphical models for collaborative filtering and recommendation of computational workflow components. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Multi-Agent Information Retrieval and Recommender Systems, Edinburgh, UK, July 31, 2005.
[HKP+06] W. H. Hsu, A. King, M. S. R. Paradesi, T. Pydimarri, & T. Weninger. Collaborative and Structural Recommendation of Friends using Weblog-based Social Network Analysis. In Proceedings of the 2006 AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs (CAAW 2006).
[KHC05] N. S. Ketkar, L. B. Holder, & D. J. Cook. Comparison of graph-based and logic-based multi-relational data mining. SIGKDD Explorations, Special Issue on Link Mining, 7(2):64-71.
[Ko01] D. Koller. Representation, Reasoning and Learning. IJCAI Computers and Thought Award Lecture, 2001.
[MCW05] A. McCallum, A. Corrada-Emmanuel, & X. Wang. Topic and role discovery in social networks. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, UK, August, 2005.
[MH04] M. Mukherjee & L. B. Holder. Graph-based data mining on social networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) Workshop on Link Analysis and Group Detection (LinkKDD 2004), Seattle, WA, USA, August 22-25, 2004.
[PU03] A. Popescul & L. H. Ungar. Statistical relational learning for link prediction. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Statistical Learning of Relational Models (SRL), Acapulco, Mexico, August, 2003.
[RDHT04] J. Resig, S. Dawara, C. M. Homan, & A. Teredesai. Extracting social networks from instant messaging populations. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) Workshop on Link Analysis and Group Detection (LinkKDD 2004), Seattle, WA, USA, August 22-25, 2004.
[SM05] P. Sarkar & A. Moore. Dynamic social network analysis using latent space models. SIGKDD Explorations, Special Issue on Link Mining, 7(2):31-40.