ICWSM'2007 Boulder, Colorado, USA

Structural Link Analysis from User Profiles and Friends Networks: A Feature Construction Approach

William H. Hsu, Joseph Lancaster, Martin S. R. Paradesi, Tim Weninger
Department of Computing and Information Sciences, Kansas State University
234 Nichols Hall, Manhattan, KS 66506-2302
+1 785 532 6350
{bhsu|joseph|pmsr|weninger}@ksu.edu

Abstract
We consider the problems of predicting, classifying, and annotating friends relations in friends networks, based upon network structure and user profile data.
First, we document a data model for the blog service LiveJournal, and define a set of machine learning problems such as predicting existing links and estimating inter-pair distance. Next, we explain how the problem of classifying a user pair in a social network, as directly connected or not, poses the problem of selecting and constructing relevant features. We document feature analyzers for attributes that depend only on graph attributes, those that depend on individual user demographics and set-valued attributes (e.g., interests, communities, and educational institutions), and those that depend on candidate user pairs. We then extend our data model using whole-network attributes and report machine learning experiments on learning the concept of a connected pair of friends from LiveJournal data. Finally, we develop a theory of dependent types for deriving causal explanations and discuss how this can be used to scale statistical relational learning up to our full corpus, a recent crawl of over a million records from LiveJournal.

General Terms: Algorithms, Experimentation
Keywords: data mining, link analysis, machine learning, social network analysis, user profiling.
1. Introduction
Analysis of friends networks provides a basis for understanding the web of influence [Ko01] in social media. In particular, the problems of determining the existence of links and of classifying and annotating known links are first steps toward identifying potential relationships. This inferred information can in turn be used to introduce new potential friends to one another, make basic recommendations such as community recruits or moderator candidates, or identify whole cliques and communities.
In this paper, we consider the problem of discovering links in an incomplete graph. We present an approach to link prediction that is based on graph feature analysis and intrinsic attributes of entities (users and communities). We report some promising preliminary results on radius-limited neighborhoods of the blogging service LiveJournal and discuss the results of exploratory experiments that point toward a need to differentiate the types of features in a friends network, namely:
1. those that depend on the demographics of the entire network
2. those that are computable for each user or each pair of users
3. those that depend on the existence of a reported, inferred, or suspected link
We derive some such features and discuss the costs of computing, selecting, and recombining them.
Of particular interest in the domain of commercial weblogs and social media are demographic features relevant to collaborative recommendation of goods and formation of branding communities. The structural dependence and context-specific dependence of features determine what new features are feasible to construct, both in terms of statistical sufficiency and computational complexity. In conclusion, we examine some new features that were derived by hand, discuss the algorithms used to compute them, and relate these specific algorithms to a broader class of relational database queries that form the basis of a more powerful feature construction system.
2. Background
2.1 Friends Networks from User Profiles
Social network services such as MySpace and Facebook allow users to list interests and link to friends, sometimes annotating these links by designating trust levels or qualitative ratings for selected friends.
Some of these services, such as Google's Orkut, are community-centric; others, such as the video blogging service YouTube and the photo service Flickr, emphasize social media; while some, such as Six Apart's LiveJournal and Vox, are organized around text-and-image weblogs. LiveJournal and its derivative services, such as GreatestJournal, DeadJournal, and JournalFen, are based on the same open-source server code.
At the time of this writing, there are over 11.7 million LiveJournal accounts, 1.8 million of them active.
The friends network of LiveJournal, our topic of study, has two varieties of accounts: users and communities (we omit RSS feeds). One advantageous property of its data model, stemming from a common schema for the two account types (a user account could originally be converted into a community), is that it provides a simple, flexible representation for entities and relations.
Start        End          Link denotes
User         User         Trust or friendship
User         Community    Readership or subscribership
Community    User         Membership, posting access, maintainer
Community    Community    Obsolete
Table 1. Types of links in the blog service LiveJournal.
Table 1 shows the types of links in LiveJournal and their constituent attributes. Friendship is an asymmetric relation between two accounts, each represented by a vertex in a directed graph. The types of the start and end points define the relationship set attributes of the link. For example, a user u who adds another user v to his or her friends list can specify v's membership in any of up to 30 groups. These serve the dual purpose of blog aggregation (posts from each group's members are filtered into its aggregator page, which u can read or make public) and group-based security (each group denotes a read/comment access control list).
Access control lists for communities are associated with memberships (community-to-user links), while content is controlled by posters or subscribers. A user can "watch" a community in order to add all accessible posts to a main aggregator page or to custom groups. The set of accessible posts consists of either public posts only, or public and restricted (members-only) posts. The access control list is defined by the membership relation and by individual posters' selections (whether to allow comments, and whether to display them by default to no readers, all readers, non-anonymous readers, or community members). Privileges are community properties; of these, only membership can be acquired by user action alone ("joining" a community), and only if the moderator has specified open membership.
Figure 1. LiveJournal access control list maintenance (community moderator interface).
Thus, a reciprocal link between a user and a community means that the user both subscribes to the community and is an approved member.
Links from user u to v are listed in the "Friends" list of u and in an optionally displayed "Friends Of" list of v. This list can be partitioned into reciprocal and non-reciprocal sublists for a user u:

Mutual Friends: $\{\, v \mid (v,u) \in E \wedge (u,v) \in E \,\}$
Also Friend Of: $\{\, v \mid (v,u) \in E \wedge (u,v) \notin E \,\}$

The community analogue of the "Friends Of" list is the "Watched By" (subscriber) list, whose members have the community name listed in the "Friends: Communities" sections of their individual user profile pages. The community analogue of the "Friends" list is the "Members" list.
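The two set definitions translate directly into set comprehensions. A minimal sketch, assuming the edge set E is held in memory as a Python set of (a, b) pairs meaning "a lists b as a friend"; the function name `partition_friends_of` is illustrative, not part of LJMiner:

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (a, b): account a lists account b as a friend


def partition_friends_of(u: str, edges: Set[Edge]) -> Tuple[Set[str], Set[str]]:
    """Split u's "Friends Of" list into (Mutual Friends, Also Friend Of)."""
    friends_of = {a for (a, b) in edges if b == u}       # all v with (v, u) in E
    mutual = {v for v in friends_of if (u, v) in edges}  # ... and (u, v) in E
    also_friend_of = friends_of - mutual                 # ... and (u, v) not in E
    return mutual, also_friend_of
```

For example, with edges {("a", "u"), ("u", "a"), ("b", "u"), ("u", "c")}, account a is a mutual friend of u, b is only a "friend of" u, and c does not appear at all because c has not listed u.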
The friends network for LiveJournal consists of a very large central connected component and many small islands, most of which are singleton users. There are a few source vertices, corresponding to accounts that link to others but have no reciprocated friendships; these are usually RSS or blog aggregator accounts owned by individuals. Additionally, there are sink vertices corresponding to accounts watched by others, but which have named no friends. Some of these are channels for announcement or dissemination of creative work.
2.2 Link Identification
In previous work [HKP+06], we introduced a link prediction problem for LiveJournal: given a graph in which the existence of a candidate link is hidden (elided if it exists), classify it as present or absent given all other attributes of the graph and of the endpoints. Our initial approach to link identification consisted of dividing friends network features into graph features and interest-based features.
Graph features could be computed simply by scanning the graph or, in the case of pair-distance metrics, by performing an all-pairs shortest path (APSP) search:
1. Indegree of u: popularity of the user
2. Indegree of v: popularity of the candidate
3. Outdegree of u: number of other friends besides the candidate; saturation of friends list
4. Outdegree of v: number of existing friends of the candidate besides the user; correlates loosely with likelihood of a reciprocal link
5. Number of mutual friends w such that u → w ∧ w → v
6. "Forward deleted distance": minimum alternative distance from u to v in the graph without the edge (u,v)
7. Backward distance from v to u in the graph
These were supplemented by interest-based features:
8. Number of mutual interests between u and v
9. Number of interests listed by u
10. Number of interests listed by v
11. Ratio of the number of mutual interests to the number listed by u
12. Ratio of the number of mutual interests to the number listed by v

2.3 Efficient feature analysis
The degree attributes can be enumerated in time linear in the number of users, as can the mutual friends count for each pair of users. Forward deleted distance measures the distance from u to v by alternate routes, after the edge (u,v) is elided. The prediction task is thus to reconstruct the incomplete graph resulting from this erasure, to determine whether a particular link (u,v) existed.

Forward deleted distance can be precomputed exhaustively for the entire graph in $\Theta(|E|(|V|+|E|)) = \Theta(|E|^2)$ time by erasing each edge in E and re-running a breadth-first search (BFS) from the start vertex. If a candidate edge is not stored in the resulting cache, its deleted distance is that found by BFS on the original graph, in $\Theta(|V|+|E|)$ time. In a graph (V, E), backward distance requires $\Theta(|V|+|E|)$ time using BFS for a particular candidate edge.

Since the expected size of the edge set is $\mathrm{E}[|E|] = k|V|$, with k about 20 on average across LiveJournal, the bottleneck computation is that of forward deleted distance: $\Theta(|E|^2) = \Theta(k^2|V|^2)$, or $\Theta(|V|^2)$ with a large constant. Using a straightforward string-pair enumeration and comparison algorithm, the mutual interest counts are stored in a matrix of $|V|^2$ elements, each requiring constant time to check (given a maximum of 150 interests).
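The non-distance features in the list of twelve need only a scan of the adjacency and interest lists. The sketch below computes features 1-5 and 8-12 for one candidate pair, assuming an in-memory out-adjacency dict and per-user interest sets (both hypothetical stand-ins for LJCrawler output). Following the task definition, the candidate edge (u,v) is hidden before any feature is computed. The two distance features (6 and 7) need graph search and are omitted here.

```python
from collections import defaultdict
from typing import Dict, Set

# Out-adjacency: graph[u] is the set of accounts that u lists as friends.
Graph = Dict[str, Set[str]]


def hide_edge(graph: Graph, u: str, v: str) -> Graph:
    """Return a copy of the graph with the candidate edge (u, v) elided, if present."""
    g = {x: set(nbrs) for x, nbrs in graph.items()}
    if u in g:
        g[u].discard(v)
    return g


def in_degrees(graph: Graph) -> Dict[str, int]:
    """Count incoming links for every vertex in one linear pass over the edge lists."""
    counts: Dict[str, int] = defaultdict(int)
    for nbrs in graph.values():
        for w in nbrs:
            counts[w] += 1
    return counts


def pair_features(graph: Graph, interests: Dict[str, Set[str]],
                  u: str, v: str) -> Dict[str, float]:
    """Features 1-5 and 8-12 for the candidate pair (u, v)."""
    g = hide_edge(graph, u, v)  # the prediction target must not leak into features
    indeg = in_degrees(g)       # in a real run, compute once and reuse for all pairs
    out_u = g.get(u, set())
    out_v = g.get(v, set())
    iu = interests.get(u, set())
    iv = interests.get(v, set())
    mutual_interests = len(iu & iv)
    return {
        "indeg_u": indeg.get(u, 0),                                       # 1
        "indeg_v": indeg.get(v, 0),                                       # 2
        "outdeg_u": len(out_u),                                           # 3
        "outdeg_v": len(out_v - {u}),                                     # 4
        "mutual_friends": sum(1 for w in out_u if v in g.get(w, set())),  # 5
        "mutual_interests": mutual_interests,                             # 8
        "n_interests_u": len(iu),                                         # 9
        "n_interests_v": len(iv),                                         # 10
        "ratio_u": mutual_interests / len(iu) if iu else 0.0,             # 11
        "ratio_v": mutual_interests / len(iv) if iv else 0.0,             # 12
    }
```

Copying the graph per pair keeps the sketch simple; a batch implementation would instead subtract the hidden edge's contribution from precomputed degree and mutual-friend counts.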
2.4 Methodologies for link mining
Getoor and Diehl [GD05] recently surveyed techniques for link mining, focusing on statistical relational learning approaches and emphasizing graphical model representations of link structure. Ketkar et al. [KHC05] compare data mining techniques over graph-based representations of links to first-order and relational representations and learning techniques that are based upon inductive logic programming (ILP). Sarkar and Moore [SM05] extend the analysis of social networks into the temporal dimension by modeling change in link structure across discrete time steps, using latent space models and multidimensional scaling. One of the challenges in collecting time series data from LiveJournal is the slow rate of data acquisition, just as spatial annotation data (such as that found in LJ maps and the "plot your friends on a map" meme) is relatively incomplete.
2.5 Other applications using graph mining
Popescul and Ungar [PU03] learn a kind of entity-relational model from data in order to predict links. Hill [Hi03] and Bhattacharya and Getoor [BG04] similarly use statistical relational learning from data in order to resolve identity uncertainty, particularly coreferences and other redundancies (also called deduplication). Resig et al. [RDHT04] use a large (200,000-user) crawl of LiveJournal to annotate a social network of instant messaging users, and explore the approach of predicting online times as a function of friends graph degree. There have been numerous recent applications of social network mining based on the text and headers of e-mail. One notable research project by McCallum et al. [MCW05] uses the Enron e-mail corpus and infers roles and topic categories based on link analysis.

A primary goal of this work is to extend the graph mining approach beyond link prediction and recommendation towards link explanation and annotation.
It may be much more useful to explain why a group of friends in a blog service created accounts en masse or added one another as friends than to recommend relationship sets that are already extant or structured according to a preexisting social group. For example, high school classmates often create accounts and encourage their peers to join the same service. In a few cases, this is encouraged or facilitated by a teacher, for a class project. Solving the problem of link prediction is not particularly useful in this case, because the user decisions have already been made or strongly constrained; however, it may be very useful to link other classmates not working on the same project to the same relationship set (perhaps they were encouraged to join the blog service by students who continued to use it after the class project). Large groups such as webcomic subscriberships, community co-members, etc. are also somewhat identifiable, and relating members of a blog service to one another through relationship sets is a typical entity-relational data modeling operation that can be made more robust and efficient through graph feature extraction.
3. Experiment Design
3.1 LJCrawler v2
To acquire the graph structure and attributes described in the previous section, we developed an HTTP-based spider called LJCrawler to harvest user information from LiveJournal. A multithreaded version of this program, which retrieves BML data published by Danga (the owners of LiveJournal), collects an average of up to 15 records per second, traversing the social network depth-first and archiving the results in a master index file.
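The crawl order described above can be sketched as an iterative depth-first traversal that archives each user once. This is not LJCrawler's code: the HTTP fetch and BML parsing are replaced by a hypothetical `fetch_friends(user)` callback, and the "master index" is modeled as an in-memory dict whose insertion order records the visit order.

```python
from typing import Callable, Dict, Iterable, List


def crawl_depth_first(seed: str,
                      fetch_friends: Callable[[str], Iterable[str]],
                      max_records: int) -> Dict[str, List[str]]:
    """Depth-first crawl from seed, stopping after max_records users.

    Returns an index mapping each visited user to its friends list;
    dict insertion order records the visit order.
    """
    index: Dict[str, List[str]] = {}
    stack = [seed]
    while stack and len(index) < max_records:
        user = stack.pop()
        if user in index:          # already archived via another path
            continue
        friends = list(fetch_friends(user))
        index[user] = friends
        # Push in reverse so the first-listed friend is expanded next.
        for friend in reversed(friends):
            if friend not in index:
                stack.append(friend)
    return index
```

A multithreaded crawler would share the index and stack across workers; that coordination is left out here.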
Because LiveJournal's functionality for looking up users by user number is only available to administrators, we decided to compile a list of seeds for a disjoint-set representation of the disconnected social network. For purposes of this experiment, however, starting from just one seed (the first author's LiveJournal ID) and restricting the crawl to one connected component was sufficient.
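One way to realize such a disjoint-set representation is a union-find structure over already-known friend links, from which one seed per component can be read off. This is an illustrative sketch, not the authors' tool: it assumes links are treated as undirected for connectivity, and the names `DisjointSet` and `component_seeds` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


class DisjointSet:
    """Union-find with path compression and union by rank."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.rank: Dict[str, int] = {}

    def find(self, x: str) -> str:
        if x not in self.parent:        # lazily create a singleton set
            self.parent[x] = x
            self.rank[x] = 0
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:   # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def component_seeds(users: Iterable[str], links: Iterable[Tuple[str, str]]) -> List[str]:
    """Return one seed (the first user seen) per weakly connected component."""
    users = list(users)
    ds = DisjointSet()
    for u in users:
        ds.find(u)
    for a, b in links:                  # direction ignored for connectivity
        ds.union(a, b)
    seeds: Dict[str, str] = {}
    for u in users:
        seeds.setdefault(ds.find(u), u)
    return list(seeds.values())
```

Singleton users, which the paper notes make up most of the small islands, each come out as their own seed.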
Using LJCrawler, we compiled an adjacency list and the following ground features for each user:
Account type (user, community)
Interest list
School list
Communities watched list
Community membership list
Friends Of list
Friends list

3.2 Feature Analyzers
We define a single example to be a candidate edge (u,v) in the underlying directed graph of the social network, along with a set of descriptive features calculated from the annotated graph recorded by LJCrawler:

Other features: Additional planned features for continuing experiments include dates (update frequencies when taken differentially), user options such as maximum friends count, and content descriptors of LiveJournal entries and comments (average post length, word frequency, etc.).
3.3 Graph Search Algorithms for Computing Features
Computing the minimum forward and backward distances can be done efficiently using breadth-first search (BFS). Currently, a Java implementation of this algorithm requires under one minute on a 2 GHz AMD Opteron system to process a 2000-node graph. However, enumerating all possible candidate pairs within a neighborhood of 2 nodes (1.6 million pairs for 4000 nodes) requires several hours on the same system.

We note that the amortized cost of running BFS to precompute all-pairs shortest paths (APSP) with the actual edge deleted (which is necessary to avoid knowing the prediction target in link prediction) is $\Theta(|E|(|V|+|E|))$. This is prohibitively large even for our "mid-sized" subgraphs of 10-50K nodes; with |V| about 11 million and |E| a little over 200 million, enumerating APSP is completely infeasible. However, we do not typically consider all of E, so the bottleneck is typically the first step plus a constant number of calls to BFS, requiring running time in $\Theta(k(|V|+|E|))$.
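Features 6 and 7 each cost one BFS per candidate edge. The sketch below (Python rather than the authors' Java, over a hypothetical dict-of-sets graph) shows both: forward deleted distance runs BFS from u while refusing to traverse the edge (u,v), and backward distance runs an ordinary BFS from v to u. Each call is $O(|V|+|E|)$, so k candidate edges cost $\Theta(k(|V|+|E|))$.

```python
import math
from collections import deque
from typing import Dict, Optional, Set, Tuple

Graph = Dict[str, Set[str]]  # out-adjacency lists


def bfs_distance(graph: Graph, src: str, dst: str,
                 skip_edge: Optional[Tuple[str, str]] = None) -> float:
    """Length of a shortest directed path src -> dst that avoids skip_edge.

    Runs in O(|V| + |E|); returns math.inf if dst is unreachable.
    """
    if src == dst:
        return 0
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        x, d = frontier.popleft()
        for y in graph.get(x, ()):
            if (x, y) == skip_edge or y in seen:
                continue
            if y == dst:               # first discovery in BFS is a shortest path
                return d + 1
            seen.add(y)
            frontier.append((y, d + 1))
    return math.inf


def forward_deleted_distance(graph: Graph, u: str, v: str) -> float:
    """Feature 6: shortest alternative route u -> v once the edge (u, v) is erased."""
    return bfs_distance(graph, u, v, skip_edge=(u, v))


def backward_distance(graph: Graph, u: str, v: str) -> float:
    """Feature 7: shortest route back from v to u."""
    return bfs_distance(graph, v, u)
```

Skipping the edge during traversal avoids copying the graph, which matters when the same graph is queried for many candidates.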
3.4 Generating Candidates
We considered several alternative ways to generate candidate edges (u,v):
The first technique is likely to be unscalable, as the number of candidates is $|V|^2$. The second requires a sufficiently large, representative sample of the full LiveJournal social network, in order to fit the distribution parameters accurately. The third was the most straightforward to implement. Two calls to the all-pairs shortest path algorithm provided the cost matrix, and one pass at each radius up to a maximum of 10 yielded the data shown in Table 2.

To simplify the initial experiments, we defined the classification problem to be classification of d(u,v) as 1 or 2. This task is useful for social network recommender systems because discriminating a direct friend from a "friend of a friend" (FOAF) is functionally similar to recommending FOAFs to link to directly. There are more detailed classification targets, such as placement, promotion, and demotion of linked friends within strata of trust (setting, increasing, and decreasing the security level), but choosing a user's friends to begin with is the more fundamental decision.
Table 2 and Table 3 report the distribution of inter-vertex distances in the friends network for two subnetworks induced by limiting the maximum number of nodes.
Distance d   Frequency (= d)   Cumulative (≤ d)
1            6204              6204
2            107307            113511
3            69896             183407
4            59926             243333
5            3400              246733
6            255               246988
7            16                247004
8            1                 247005
9            0                 0
10           0                 0
∞            9731              256735
Table 2. Number of candidate edges for the 1000-node LiveJournal graph.
Distance d   Frequency (= d)   Cumulative (≤ d)
1            19410             19410
2            370568            389978
3            403075            793053
4            520373            1313426
5            123747            1437173
6            18453             1455626
7            2657              1458283
8            339               1458622
9            29                1458651
10           0                 1458651
∞            174534            1633185
Table 3. Number of candidate edges for the 4000-node LiveJournal graph.
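A distance histogram in the shape of Tables 2 and 3 can be produced as follows. This is a stand-in for the paper's procedure (two APSP calls plus one pass per radius): it runs one BFS per source, which is APSP for an unweighted graph, and counts every ordered pair u ≠ v of the given graph. Whether the paper's candidate set was exactly all ordered pairs is not stated, and bucketing pairs beyond `max_d` into the ∞ row is an assumption.

```python
import math
from collections import Counter, deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # out-adjacency lists


def distances_from(graph: Graph, src: str) -> Dict[str, int]:
    """Single-source BFS: hop count from src to every reachable vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in graph.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def distance_table(graph: Graph, max_d: int = 10) -> List[Tuple[float, int, int]]:
    """Rows (d, frequency of d, cumulative count <= d) over ordered pairs u != v.

    Unreachable pairs, and pairs farther apart than max_d, go into the d = inf row.
    """
    nodes = set(graph) | {y for nbrs in graph.values() for y in nbrs}
    freq: Counter = Counter()
    for u in nodes:
        dist = distances_from(graph, u)  # |V| BFS runs = APSP on an unweighted graph
        for v in nodes:
            if v == u:
                continue
            d = dist.get(v)
            freq[d if d is not None and d <= max_d else math.inf] += 1
    rows: List[Tuple[float, int, int]] = []
    cumulative = 0
    for d in [*range(1, max_d + 1), math.inf]:
        cumulative += freq[d]
        rows.append((d, freq[d], cumulative))
    return rows
```

On the path a → b → c, this yields two pairs at distance 1, one at distance 2, and three unreachable pairs, for a cumulative total of 6 ordered pairs.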
4. Results
4.1 Preliminary experiment: 941-node version
In a preliminary experiment, we constructed a 941-node subgraph, defined the concept IsFriendOf, and trained three types of inducers with:
1. all attributes
2. all graph attributes excluding the forward and backward distances
3. the backward distances alone
4. the backward and forward distances alone
5. interest-related attributes alone.
Table 4 and Table 5 show the results for three inducers: the J48 decision tree inducer, Holte's 1R inducer (a single-rule classifier based on a single attribute) [Ho93], and the logistic regression inducer. All accuracy measures were collected over 10-fold cross-validated runs. The J48 output with all features achieves a significant boost over the next highest (distance only): 98.2% versus 97.6% accuracy, and 89.5% versus 83.0% precision.
Inducer    All    NoDist   BkDist   Dist   Interest
J48        98.2   94.8     95.8     97.6   88.5
OneR       95.8   92.0     95.8     95.8   88.5
Logistic   91.6   90.9     88.3     88.9   88.4
Table 4. Percent accuracy for predicting all classes using the 941-node graph.
Inducer    All    NoDist   BkDist   Dist   Interest
J48        89.5   65.7     67.7     83.0   5.4
OneR       67.7   41.1     67.7     67.7   4.5
Logistic   38.3   33.3     0        4.5    4.5
Table 5. Precision (true positives to all predicted positives) using the 941-node graph.
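Of the three inducers, 1R is simple enough to sketch directly. The version below follows Holte's idea of choosing the single attribute whose value-to-majority-class rule makes the fewest training errors, and wraps it in unstratified k-fold cross-validation. Assumptions: attributes are already discrete (Holte's full method also discretizes numeric attributes, which this sketch does not do), folds are assigned round-robin by row index, and all names are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple

Row = Dict[str, Hashable]
Model = Tuple[str, Dict[Hashable, Hashable], Hashable]


def train_one_r(rows: Sequence[Row], labels: Sequence[Hashable]) -> Model:
    """1R over discrete attributes: keep the one attribute whose
    value -> majority-class rule makes the fewest training errors."""
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    best = None
    for attr in rows[0]:
        counts: Dict[Hashable, Counter] = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[attr]][y] += 1
        rule = {val: c.most_common(1)[0][0] for val, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[val]] for val, c in counts.items())
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    _, attr, rule = best
    return attr, rule, default


def predict_one_r(model: Model, row: Row) -> Hashable:
    attr, rule, default = model
    return rule.get(row[attr], default)


def cross_val_accuracy(rows: Sequence[Row], labels: Sequence[Hashable], k: int = 10) -> float:
    """Unstratified k-fold CV; fold i holds the rows whose index % k == i."""
    correct = 0
    for fold in range(k):
        test = [i for i in range(len(rows)) if i % k == fold]
        train = [i for i in range(len(rows)) if i % k != fold]
        if not test or not train:
            continue
        model = train_one_r([rows[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_one_r(model, rows[i]) == labels[i] for i in test)
    return correct / len(rows)
```

The chosen attribute doubles as an explanation, which fits the observation in Table 4 that a single distance attribute already yields most of OneR's accuracy.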
4.2 Experiments on restricted graphs
We developed an application, ljclipper, to restrict the overall friends graph to the subgraph induced by a fixed number of nodes, found using breadth-first search starting from a given seed. Using the 4000-node subgraph summarized in Table 3, we generated 1,633,185 candidate edges. Note that all forward distances are greater than 1: when u and v are actually connected, we erase (u,v). In preliminary experiments, we then computed the length of the shortest alternative path. This is, however, a less scalable approach, because the asymptotic running time is dominated by the superlinear time required to compute […]

The complete listing of all twelve features is given in Section 2.2. The numerical types of all of the network features (both those describing the graph and those measuring interests and ratios) make the data set amenable to logistic regression.
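The clipping step can be sketched as a BFS that stops after n vertices, followed by taking the induced subgraph. This is not ljclipper itself: it assumes BFS follows outgoing friend links only, and it visits neighbors in sorted order purely so the result is deterministic; `clip_graph` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # out-adjacency lists


def clip_graph(graph: Graph, seed: str, n: int) -> Graph:
    """Subgraph induced by the first n vertices reached by BFS from seed."""
    kept: List[str] = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(kept) < n:
        x = queue.popleft()
        for y in sorted(graph.get(x, ())):  # sorted only to make the clip deterministic
            if len(kept) >= n:
                break
            if y not in seen:
                seen.add(y)
                kept.append(y)
                queue.append(y)
    keep = set(kept)
    # Keep only the links whose endpoints both survive the clip.
    return {x: {y for y in graph.get(x, set()) if y in keep} for x in kept}
```

Because links leaving the kept set are dropped, vertices on the boundary of the clip appear to have fewer friends than they really do, which is one reason degree-based features may shift as the subgraph grows.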
Inducer   Accuracy   Precision   Recall
J48       99.9       97.5        96.1
OneR      99.6       91.7        91.8
Table 6. Percent accuracy, precision and recall using a 1000-node graph (10-fold CV).
Inducer   Accuracy   Precision   Recall
J48       99.8       95.8        92.0
OneR      99.7       91.1        89.9
Table 7. Percent accuracy, precision and recall using a 2000-node graph (10-fold CV).
Inducer   Accuracy   Precision   Recall
J48       99.8       94.5        88.3
OneR      99.7       88.2        84.3
Table 8. Percent accuracy, precision and recall using a 4000-node graph (10-fold CV).
Table 6 through Table 8 show the accuracy, precision, and recall for the 1000-, 2000-, and 4000-node friends graphs. Two trends are observed: precision is higher than recall, and both precision and recall diminish as the network grows larger. These trends are sustained for subsamples of size 10,000 and 100,000, though precision and recall also diminish slightly with sampling.
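Accuracy stays near 99.8% while precision and recall fall because direct links are a small minority of candidates: in Table 3 there are 19,410 pairs at d = 1 against 370,568 at d = 2, so positives are about 5% of the d ∈ {1, 2} task. A minimal sketch of the three metrics, taking precision as true positives over predicted positives and recall as true positives over actual positives (the function name is illustrative):

```python
from typing import Hashable, Sequence, Tuple


def accuracy_precision_recall(y_true: Sequence[Hashable],
                              y_pred: Sequence[Hashable],
                              positive: Hashable = True) -> Tuple[float, float, float]:
    """Accuracy, precision (TP / predicted positives), recall (TP / actual positives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

With 2 positives among 100 candidates, a classifier that always answers "not a friend" scores 98% accuracy but 0% precision and recall, which is why Tables 5 through 8 report precision and recall alongside accuracy.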
4.3 Data acquisition and larger experiments
The crawler has been improved with several service-specific optimizations for fetching user info pages. Presently these do not use LiveJournal's BML feed of user data, which is incomplete for our purposes (that is, not all ground attributes in our initial relations are provided). At press time, this crawler processes about 20,000 user records per hour and thus would require over a week to crawl LiveJournal.
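A quick check of the crawl-time estimate, assuming the 11.7 million accounts reported in Section 2.1 and a constant rate of 20,000 records per hour (both figures from this paper; a constant rate is an assumption):

$$
\frac{11.7 \times 10^{6}\ \text{records}}{2 \times 10^{4}\ \text{records/hour}} = 585\ \text{hours} \approx 24.4\ \text{days}
$$

That is roughly three and a half weeks of continuous crawling, consistent with the "over a week" figure.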
The current bottleneck is the $\Theta(|V|(|V|+|E|))$ step described in Section 3.3. This is the dominant term, because the constant k denoting the number of candidate edges is usually much smaller than n = |V| (e.g., k = 100-1000), so that $\Theta(k(|V|+|E|))$ is not only in $\Theta(|V|+|E|)$, but actually just a few hundred times the cost of a single BFS.
4.4 Interpretation
Using mutual interests alone, even with normalization based on the number of interests in u and v, results in very poor prediction accuracy with all inducers with which we experimented (88.4-88.5% accuracy and at most 5.4% precision; Tables 4 and 5). Intermediate results are achieved using mutual friends count and degree (NoDist: 65.7% precision on predicting edges) and using forward deleted distance and backward distance (Dist: 67.7%). Using all 12 computed graph and annotation features resulted in the highest precision (All: 89.5%) and accuracy (All: 98.2%).
We note that LiveJournal once used a variant of normalized mutual interests to produce a list of potential friends, arranged in decreasing order of match quality. Although this was not the same type of recommender system as LJMiner supports, it shows that state-of-the-art user matching systems have a lot of room for improvement. The results indicate that features produced by LJMiner, used with a good inducer, can generate collaborative and structural recommendations.
5. Continuing Work
Scaling up: Our current research focuses on scaling up to tens of thousands and eventually millions of users. Crawling over 11-12 million records is at least technically feasible, but scaling up the graph analyzers is a challenge that may best be met with heuristic search.
Learning relational models: A promising area of research is the recovery of relational graphical models, including class-level (membership and reference slot) uncertainty [GFKT02]. LJMiner has yielded a ready source of semistructured data for both structure learning and distribution learning. Another potentially useful approach is to organize users and communities into clusters using this relational model. We have developed schemas for blog posts (entries, threads, comments) and for users and dynamic groups of users. This is related to previous preliminary work on relational data mining for personalization of web portals, especially computational grid portals [HBJ03]. Much of the relational metadata in the bioinformatics domain comes from description languages for workflows and workflow components [Hs04]. The next step in our experimental plan is to use schemas such as our detailed ones for blog service users, bioinformatics information, and computational grid users [Hs05] to learn a richer predictive model.

Finally, modeling relational data as it persists or changes across time is an important challenge.
Acknowledgements
We thank Todd Easton and Kirsten Hildrum for helpful discussions concerning algorithms and the LiveJournal data model. We also thank Andrew King and Tejaswi Pydimarri for contributions to the original LJMiner system and Vikas Bahirwani for contributions to the second version.
References
[BG04] I. Bhattacharya & L. Getoor. Deduplication and group detection using links. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) Workshop on Link Analysis and Group Detection (LinkKDD 2004), Seattle, WA, USA, August 22-25, 2004.
[CLRS02] T. H. Cormen, C. E. Leiserson, R. L. Rivest, & C. Stein. Introduction to Algorithms, Second Edition. Cambridge, MA: MIT Press, 2002.
[GD05] L. Getoor & C. P. Diehl. Link mining: a survey. SIGKDD Explorations, Special Issue on Link Mining, 7(2):3-12.
[GFKT02] L. Getoor, N. Friedman, D. Koller, & B. Taskar. Learning Probabilistic Models of Link Structure. Journal of Machine Learning Research, 2002.
[HBJ03] W. H. Hsu, P. Boddhireddy, & R. Joehanes. Using probabilistic relational models for collaborative filtering. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Statistical Learning of Relational Models (SRL), Acapulco, Mexico, August, 2003.
[Hi03] S. Hill. Social network relational vectors for anonymous identity matching. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Statistical Learning of Relational Models (SRL), Acapulco, Mexico, August, 2003.
[Ho93] R. C. Holte. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning, 11(1):63-90.
[Hs04] W. H. Hsu. Relational graphical models of computational workflows for data mining. In Proceedings of the International Conference on Semantics of a Networked World: Semantics for Grid Databases (ICSNW-2004), pp. 309-310, Paris, France, June, 2004.
[Hs05] W. H. Hsu. Relational graphical models for collaborative filtering and recommendation of computational workflow components. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Multi-Agent Information Retrieval and Recommender Systems, Edinburgh, UK, July 31, 2005.
[HKP+06] W. H. Hsu, A. King, M. S. R. Paradesi, T. Pydimarri, & T. Weninger. Collaborative and Structural Recommendation of Friends using Weblog-based Social Network Analysis. In Proceedings of the 2006 AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs (CAAW 2006).
[KHC05] N. S. Ketkar, L. B. Holder, & D. J. Cook. Comparison of graph-based and logic-based multi-relational data mining. SIGKDD Explorations, Special Issue on Link Mining, 7(2):64-71.
[Ko01] D. Koller. Representation, Reasoning and Learning. IJCAI Computers and Thought Award Lecture, 2001.
[MCW05] A. McCallum, A. Corrada-Emmanuel, & X. Wang. Topic and role discovery in social networks. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, UK, August, 2005.
[MH04] M. Mukherjee & L. B. Holder. Graph-based data mining on social networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) Workshop on Link Analysis and Group Detection (LinkKDD 2004), Seattle, WA, USA, August 22-25, 2004.
[PU03] A. Popescul & L. H. Ungar. Statistical relational learning for link prediction. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Statistical Learning of Relational Models (SRL), Acapulco, Mexico, August, 2003.
[RDHT04] J. Resig, S. Dawara, C. M. Homan, & A. Teredesai. Extracting social networks from instant messaging populations. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) Workshop on Link Analysis and Group Detection (LinkKDD 2004), Seattle, WA, USA, August 22-25, 2004.
[SM05] P. Sarkar & A. Moore. Dynamic social network analysis using latent space models. SIGKDD Explorations, Special Issue on Link Mining, 7(2):31-40.