Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?

Hany M. SalahEldeen and Michael L. Nelson
Old Dominion University, Department of Computer Science, Norfolk VA, 23529, USA
{hany,mln}@cs.odu.edu

Abstract. Social media content has grown exponentially in the recent years and the role of social media has evolved from just narrating life events to actually shaping them. In this paper we explore how many resources shared in social media are still available on the live web or in public web archives. By analyzing six different event-centric datasets of resources shared in social media in the period from June 2009 to March 2012, we found about 11% lost and 20% archived after just a year and an average of 27% lost and 41% archived after two and a half years. Furthermore, we found a nearly linear relationship between time of sharing of the resource and the percentage lost, with a slightly less linear relationship between time of sharing and archiving coverage of the resource. From this model we conclude that after the first year of publishing, nearly 11% of shared resources will be lost and after that we will continue to lose 0.02% per day.
Keywords: Web Archiving, Social Media, Digital Preservation

1 Introduction

With more than 845 million Facebook users at the end of 2011 [5] and over 140 million tweets sent daily in 2011 [16], users can take photos, videos, post their opinions, and report incidents as they happen. Many of the posts and tweets are about quotidian events and their preservation is debatable. However, some of the posts and events are about culturally important events whose preservation is less controversial. In this paper we shed light on the importance of archiving social media content about these events and estimate how much of this content is archived, still available, or lost with no possibility of recovery.

To emphasize the culturally important commentary and sharing, we collected data about six events in the time period of June 2009 to March 2012: the H1N1 virus outbreak, Michael Jackson's death, the Iranian elections and protests, Barack Obama's Nobel Peace Prize, the Egyptian revolution, and the Syrian uprising.
arXiv:1209.3026v1 [cs.DL] 13 Sep 2012

2 Related Work

To our knowledge, no prior study has analyzed the amount of shared resources in social media lost through time. There have been many studies analyzing the behavior of users within a social network, how they interact, and what content they share [3,19,20,23]. As for Twitter, Kwak et al. [6] studied its nature and its topological characteristics and found a deviation from known characteristics of human social networks that were analyzed by Newman and Park [10]. Lee analyzed the reasons behind sharing news in social media and found that informativeness was the strongest motivation in predicting news sharing intention, followed by socializing and status seeking [4]. Shared content in social media like Twitter also moves and diffuses relatively fast, as stated by Yang et al. [22].

Furthermore, many concerns were raised about the persistence of shared resources and web content in general. Nelson and Allen studied the persistence of objects in a digital library and found that, within just over a year, 3% of the sample they collected appeared to no longer be available [9]. Sanderson et al. analyzed the persistence and availability of web resources referenced from papers in scholarly repositories using Memento and found that 28% of these resources have been lost [14]. Memento [17] is a collection of HTTP extensions that enables uniform, inter-archive access. Ainsworth et al. [1] examined how much of the web is archived and found it ranges from 16% to 79%, depending on the starting seed URIs. McCown et al. examined the factors affecting reconstructing websites (using caches and archives) and found that PageRank, age, and the number of hops from the top level of the site were most influential [8].
3 Data Gathering

We compiled a list of URIs that were shared in social media and correspond to specific culturally important events. In this section we describe the data acquisition and sampling process we performed to extract six different datasets which will be tested and analyzed in the following sections.

3.1 Stanford SNAP Project Dataset

The Stanford Large Network Dataset is a collection of about 50 large network datasets having millions of nodes, edges and tuples. It was collected as a part of the Stanford Network Analysis Platform (SNAP) project [15]. It includes social networks, web graphs, road networks, Internet networks, citation networks, collaboration networks, and communication networks. For the purpose of our investigation, we selected their Twitter posts dataset. This dataset was collected from June 1st, 2009 to December 31st, 2009 and contains nearly 476 million tweets posted by nearly 17 million users. The dataset is estimated to cover 20%-30% of all posts published on Twitter during that time frame [21]. To select which events will be covered in this study, we examined CNN's 2009 events timeline (footnote 1). We wanted to select a small number of events that were diverse, with limited overlap, and relatively important to a large number of people. Given that, we selected four events: the H1N1 virus outbreak, the Iranian protests and elections, Michael Jackson's death, and Barack Obama's Nobel Peace Prize award.

Preparation: A tweet is typically composed of text, hashtags, embedded resources or URIs and user tags all spanning a maximum of 140 characters. Here is an example of a tweet record in the SNAP dataset:

    T 2009-07-31 23:57:18
    U http://Twitter.com/nickgotch
    W RT @rockingjude: December 21, 2009 Depopulation by Food Will Begin http://is.gd/1WMZb WHOA.. BETTER WATCH RT plz #pwa #tcot

The line starting with the letter T indicates the date and time of the tweet creation, while the line starting with U shows a link to the user who authored this particular tweet. Finally, the line starting with W shows the entire tweet including all the user references "@rockingjude", the embedded URIs "http://is.gd/1WMZb", and hashtags "#pwa #tcot".
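As a minimal sketch of how such a record could be split into its parts, the following Python function reads the T, U and W lines and pulls the user references, embedded URIs and hashtags out of the tweet text. The function name, the regular expressions, and the assumption that the field letter is separated from its value by whitespace are illustrative choices, not details taken from the SNAP documentation.

```python
import re

# Illustrative patterns (assumptions, not part of the SNAP format definition).
URI_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def parse_record(lines):
    """Parse the T/U/W lines of one tweet record into a dictionary."""
    fields = {}
    for line in lines:
        parts = line.strip().split(None, 1)  # field letter, then the value
        if len(parts) == 2 and parts[0] in ("T", "U", "W"):
            fields[parts[0]] = parts[1]
    text = fields.get("W", "")
    return {
        "time": fields.get("T"),
        "user": fields.get("U"),
        "text": text,
        "uris": URI_RE.findall(text),
        "hashtags": [tag.lower() for tag in HASHTAG_RE.findall(text)],
        "mentions": MENTION_RE.findall(text),
    }


record = parse_record([
    "T 2009-07-31 23:57:18",
    "U http://Twitter.com/nickgotch",
    "W RT @rockingjude: December 21, 2009 Depopulation by Food Will Begin "
    "http://is.gd/1WMZb WHOA.. BETTER WATCH RT plz #pwa #tcot",
])
```

Only the W line is scanned for URIs, so the author link on the U line is not mistaken for a shared resource.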
Tag Expansion: We wanted to select tweets that we can say with high confidence are about a selected event. In this case, precision is more important than recall as collecting every single tweet published about a certain event is less important than making sure that the selected tweets are definitely about that event. Several studies focused on estimating the aboutness of a certain web page or a resource in general [12,18]. Fortunately in Twitter, hashtags incorporated within a tweet can help us estimate their "aboutness". Users normally add certain hashtags to their tweets to ease the search and discoverability in following a certain topic. These hashtags will be utilized in the event-centric filtration process.

For each event, we selected initial tags that describe it (Table 1). Those initial tags were derived empirically after examining some event-related tweets. Next we extracted all the hashtags that co-occurred with our initial set of hashtags. For example, in class H1N1 we extracted all the other hashtags that appeared along with #h1n1 within the same tweet and kept count of their frequency. Those extracted hashtags were sorted in descending order of the frequency of their appearance in tweets. We removed all the general scope tags like #cnn, #health, #death, #war and others. In regards to aboutness, removing general tags will indeed decrease recall but will increase precision. Finally we picked the top 8-10 hashtags to represent this event-class and be utilized in the filtration process. Table 1 shows the final set of tags selected for each class.
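The tag expansion step can be sketched in Python as a co-occurrence count followed by a top-k cut. This is only an illustration under stated assumptions: each tweet is represented as a list of lowercase hashtags, and the function names and the `general` stop list are hypothetical.

```python
from collections import Counter


def cooccurring_tags(tweets, seed, general=frozenset()):
    """Count hashtags that appear in the same tweet as the seed hashtag.

    `tweets` is an iterable of hashtag lists; `general` holds general-scope
    tags (for example "cnn" or "health") that are excluded from the count.
    """
    counts = Counter()
    for tags in tweets:
        tagset = set(tags)
        if seed in tagset:
            counts.update(tagset - {seed} - set(general))
    return counts


def expand_tags(tweets, seed, general=frozenset(), k=10):
    """Return the k most frequent co-occurring hashtags, most frequent first."""
    return [tag for tag, _ in cooccurring_tags(tweets, seed, general).most_common(k)]


sample = [
    ["h1n1", "swine", "cnn"],
    ["h1n1", "swine", "flu"],
    ["h1n1", "swine"],
    ["h1n1", "flu"],
    ["mj", "rip"],
]
top = expand_tags(sample, "h1n1", general={"cnn"}, k=2)
```

Dropping the general tags before ranking mirrors the precision-over-recall choice described above.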
Event | Initial Hashtags | Top Co-occurring Hashtags
H1N1 Outbreak | 'h1n1' = 61,351 | 'swine' = 61,829, 'swineflu' = 56,419, 'flu' = 8,436, 'pandemic' = 6,839, 'influenza' = 1,725, 'grippe' = 1,559, 'tamiflu' = 331
M. Jackson's Death | 'michaeljackson' = 22,934 | 'michael' = 27,075, 'mj' = 18,584, 'thisisit' = 8,770, 'rip' = 3,559, 'jacko' = 3,325, 'kingofpop' = 2,888, 'jackson' = 2,559, 'thriller' = 1,357, 'thankyoumichael' = 1,050
Iranian Elections | 'iranelection' = 911,808 | 'iran' = 949,641, 'gr88' = 197,113, 'tehran' = 109,006, 'freeiran' = 13,378, 'neda' = 191,067, 'mousavi' = 16,587, 'united4iran' = 9,198, 'iranrevolution' = 7,295
Obama's Nobel Prize | 'obama' = 48,161 & 'nobelprize' | 'nobel' = 2,261, 'obamanobel' = 14, 'nobelpeace' = 113, 'peace' = 3,721, 'barack' = 1,292, 'nobelpeaceprize' = 107

Table 1. Twitter hashtags generated for filtering and their frequency of occurring

Footnote 1: http://www.cnn.com/2009/US/12/16/year.timeline/index.html

Tweet Filtration: In the previous step we extracted the tags that will help us classify and filter tweets in the dataset according to each event. This filtration process aims to extract a reasonably sized dataset of tweets for each event and to minimize the inter-event overlap. Since the life and persistence of the tweet itself is not the focus of this study but rather the associated resource that appears in the tweet (image, video, shortened URI or other embedded resource), we will extract only the tweets that contain an embedded resource. This step resulted in 181 million tweets with embedded resources (http://is.gd/1WMZb in the prior example). These tweets were further filtered to keep only the tweets that have at least one of the expanded tags obtained from Table 1. The number of tweets after this phase reached 1.1 million.

Filtering the tweets based only on the occurrence of at least one of the hashtags is undesirable as it will cause two problems. First, it will introduce possible event overlap due to general tweets talking about two or more topics. Second, using only the single occurrence of these tags will yield a huge number of tweets, and we need to reduce this to a more manageable size. Intuitively speaking, strongly related hashtags will co-occur often. For example, a tweet that has #h1n1 along with #swineflu and #pandemic is more likely to be about the H1N1 outbreak than a tweet having just the tag #flu or just #sick. Filtering with this co-occurrence will in turn solve both problems: by increasing relevance to a particular event, general tweets that talk about several events will be filtered out, thus diminishing the overlap, and in turn it will reduce the size of the dataset.

Next, we increase the precision of the tweets associated with each event from the set of 1.1 million tweets. In the first iteration we selected the tag that had the highest frequency of co-occurrence in the dataset with the initial tag and added it to a set we will call the selection set. After that we check the co-occurrence of all the remaining extracted tags with the tag in the selection set and record the frequencies of co-occurrence. After sorting the frequencies of co-occurrence with the tag from the selection set, we pick the highest one and add it to the selection set. We repeat this step of counting co-occurrences but with all the previously extracted hashtags in the selection set from previous iterations.

To elaborate, for H1N1 assume that the hashtag '#h1n1' had the highest frequency of appearance in the dataset so we add it to the selection set. In the next iteration we record how many times each tag in the list appeared along with '#h1n1' in the same tweet. If we selected '#swine' as the one with the highest frequency of occurrence with the initial tag '#h1n1' we add it to the selection list and in the next iteration we record the frequency of occurrence of the remaining hashtags with both of the extracted tags '#h1n1' and '#swine'. We repeat this step, for each event, to the point where we have a manageably sized dataset whose 'aboutness' in relation to the event we are confident in.
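The iterative narrowing described above can be sketched in Python as a greedy loop. The sketch assumes each tweet is a set of hashtags and uses an explicit size threshold (`max_size`) as the stopping rule; the paper only says the process stops at a "manageable" size, so the threshold and the function name are assumptions.

```python
def greedy_filter(tweets, seed, candidates, max_size):
    """Grow a selection set of hashtags until few enough tweets match them all.

    `tweets` is a list of hashtag sets. At each iteration the candidate that
    co-occurs most often with every tag already selected is added.
    Returns (selection, matching_tweets).
    """
    selection = [seed]
    matching = [t for t in tweets if seed in t]
    remaining = [c for c in candidates if c != seed]
    while len(matching) > max_size and remaining:
        # Co-occurrence with the whole selection set = count within `matching`.
        best = max(remaining, key=lambda c: sum(1 for t in matching if c in t))
        narrowed = [t for t in matching if best in t]
        if not narrowed:
            break  # no candidate co-occurs with the current selection
        selection.append(best)
        remaining.remove(best)
        matching = narrowed
    return selection, matching


tweets = [
    {"h1n1", "swine", "swineflu"},
    {"h1n1", "swine"},
    {"h1n1", "swine", "swineflu", "pandemic"},
    {"h1n1", "flu"},
    {"h1n1"},
]
selection, kept = greedy_filter(
    tweets, "h1n1", ["swine", "swineflu", "pandemic", "flu"], max_size=2
)
```

Because every added tag must co-occur with all previously selected tags, the matching set can only shrink, which is what drives both the size reduction and the lower inter-event overlap.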
Event | Hashtags selected for filtration | Tweets Extracted | Operation Performed | Final Tweets
MJ | michael | 27,075 | |
MJ | michael & michaeljackson | 22,934 | Sample 10% | 2,293
Iran | iran | 949,641 | |
Iran | iran & iranelection | 911,808 | |
Iran | iran & iranelection & gr88 | 189,757 | |
Iran | iran & iranelection & gr88 & neda | 91,815 | |
Iran | iran & iranelection & gr88 & neda & tehran | 34,294 | Sample 10% | 3,429
H1N1 | h1n1 | 61,351 | |
H1N1 | h1n1 & swine | 44,972 | |
H1N1 | h1n1 & swine & swineflu | 42,574 | |
H1N1 | h1n1 & swine & swineflu & pandemic | 5,517 | Take All | 5,517
Obama | obama | 48,161 | |
Obama | obama & nobel | 1,118 | Take All | 1,118

Table 2. Tweet Filtration iterations and final tweet collections

Two problems appeared from this approach with the Iran and Michael Jackson datasets. In the Iran dataset the number of tweets was in the hundreds of thousands, and even with the co-occurrence of 5 tags there were still more than 34K tweets. To solve this we randomly sampled 10% of the resulting tweets, giving a smaller, manageable dataset. The second problem was with the Michael Jackson dataset: upon using 5 tags to decrease it to a manageable size, we realized there were few unique domains for the embedded resources. A closer look revealed that this combination of tags matched mostly borderline tweet spam (MJ ringtones). To solve this we used only the two top tags "#michael" and "#michaeljackson", and then we randomly sampled 10% of the resulting tweets to reach the desired dataset size (Table 2).
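A 10% random sample of this kind could be drawn as in the short Python sketch below. The helper name, the integer `one_in` parameter and the fixed seed (used here only to make the sample repeatable) are assumptions; the paper does not say how the sample was drawn.

```python
import random


def sample_one_in(items, one_in=10, seed=0):
    """Return a random sample containing len(items) // one_in elements."""
    k = len(items) // one_in
    return random.Random(seed).sample(items, k)


# Hypothetical stand-in for the 22,934 filtered Michael Jackson tweets.
mj_tweets = list(range(22934))
mj_sample = sample_one_in(mj_tweets)
```

With integer division, 22,934 and 34,294 tweets give samples of 2,293 and 3,429, the final sizes listed in Table 2.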
3.2 Egyptian Revolution Dataset

The one year anniversary of this event was the original motivation for this study [13]. In this case, we started with an event and then tried to get social media content describing it. Despite its ubiquity, gathering social media for a past event is surprisingly hard. We picked the Egyptian revolution due to the role of the social media in curating and driving the incidents that led to the resignation of the president. Several initiatives were commenced to collect and curate the social media content during the revolution like R-Shief.org (footnote 2) which specializes in social content analysis of the issues in the Arab world by using aggregate data from Twitter and the Web. We are currently in the process of obtaining the millions of records related to the Arab Spring of 2011. Meanwhile, we decided to build our own dataset manually.

Footnote 2: http://www.r-shief.org/

There are several sites that curate resources about the Egyptian Revolution and we want to investigate as many of them as possible. At the same time, we need to diversify our resources and the types of digital artifacts that are embedded in them. Tweets, videos, images, embedded links, entire web pages and books were included in our investigation. For the sake of consistency, we limited our analysis to resources created within the period from the 20th of January 2011 to the 1st of March 2011. In the next subsections we explain each of the resources we utilized in our data acquisition in detail.

Storify: Storify is a website that enables users to create stories by creating collections of URIs (e.g., Tweets, images, videos, links) and arrange them temporally. These entries are posted by reference to their host websites. Thus, adding content to Storify does not necessarily mean it is archived. If a user added a video from YouTube and after a while the publisher of that video decided to remove it from YouTube the user is left with a gap in their Storify entry. For this purpose we gathered all the Storify entries that were created between 20th of January 2011 and the 1st of March 2011, resulting in 219 unique resources.

IAmJan25: Some entire websites were dedicated as a collection hub of media to curate the revolution. Based on public contributions, those websites collect different types of media, classify them, order them chronologically and publish them to the public. We picked a website named IAmJan25.com, as an example of these websites, to analyze and investigate. The administrators of the website received selected videos and images for notable events and actions that happened during the revolution. Those images and videos were selected by users as they vouched for them to be of some importance and they send the resource's URI to the website administrators. The website itself is divided into two collections: a video collection and an image collection. The video collection had 2387 unique URIs while the image collection had 3525 unique URIs.

Tweets From Tahrir: Several books were published in 2011 documenting the revolution and the Arab Spring. To bridge the gap between books and digital media we analyzed a book entitled Tweets from Tahrir [11] which was published on April 21st, 2011. As the name states, this book tells a story formed by tweets of people during the revolution and the clashes with the past regime. We analyzed this book as a collection of tweets that had the luxury of a paperback preservation and focused on the tweeted media, in this case images. The book had a total of 1118 tweets having 23 unique images.
3.3 Syria Dataset

This dataset has been selected to represent a current (March 2012) event. Using the Twitter search API, we followed the same pattern of data acquisition as in section 3.1. We started with one hashtag, #Syria, and expanded it. Table 3 shows the tags produced from the tag expansion step. After that, each of those tags was input into a process utilizing the Twitter streaming API, which produced the first 1000 results matching each tag. From this set, we randomly sampled 10%. As a result, 1955 tweets were extracted, each having one or more embedded resources and tags from the expanded tags in Table 3.

Initial Hashtags | Extracted Hashtags
'Syria' | 'Bashar', 'RiseDamascus', 'GenocideInSyria', 'STOPASSAD2012', 'AssadCrimes', 'Assad'

Table 3. Twitter #Tags generated for filtering the Syrian uprising

Table 4 shows the resources collected along with the top level domains that those resources belong to for each event.

Event | Top Domains (number of resources found)
MJ | youtube (110), twitpic (45), latimes (43), cnn (30), amazon (30)
Iran | youtube (385), twitpic (36), blogspot (30), roozonline (29)
H1N1 | rhizalabs (676), reuters (17), google (16), flutrackers (16), calgaryherald (11)
Obama | blogspot (16), nytimes (15), wordpress (12), youtube (11), cnn (10)
Egypt | youtube (2414), cloudfront (2303), yfrog (1255), twitpic (114), imageshack.us (20)
Syria | youtube (130), twitter (61), hostpic.biz (9), telegraph.co.uk (5)

Table 4. The top level domains found for each event ordered descendingly by the number of resources.
4 Uniqueness and Existence

From the previous data gathering step we obtained six different datasets related to six different historic events. For each event we extracted a list of URIs that were shared in tweets or uploaded to sites like Storify or IAmJan25. To answer the question of how much of the social media content is missing, we first test those URIs for each dataset to eliminate URI aliases, in which several URIs identify the same resource. Upon obtaining those unique URIs we examine how many of them are still available on the live web and how many are available in public web archives.

4.1 Uniqueness

Some URIs, especially those that appear in Twitter, may be aliases for the same resource. For example "http://bit.ly/2EEjBl" and "http://goo.gl/2ViC" both resolve to "http://www.cnn.com". To solve this, we resolved all the URIs following redirects to the final URI. The HTTP response of the last redirect has a field called Location that contains the original long URI of the resource. This step reduced the total number of URIs in the six datasets from 21,625 to 11,051. Table 5 shows the number of unique resources in every dataset.
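The alias elimination can be illustrated with the Python sketch below. To keep it runnable without network access, the redirect lookup is passed in as a function (`next_hop`) that returns the Location target of a URI or None when there is no further redirect; in practice this role would be played by an HTTP request. The function names, the hop limit and the dictionary standing in for the shortener services are assumptions made for illustration.

```python
def resolve(uri, next_hop, max_hops=10):
    """Follow redirects to the final URI; return None on a loop or too many hops."""
    seen = {uri}
    for _ in range(max_hops):
        target = next_hop(uri)
        if target is None:
            return uri  # no further redirect: this is the final URI
        if target in seen:
            return None  # redirect loop
        seen.add(target)
        uri = target
    return None


def group_aliases(uris, next_hop):
    """Map each final URI to the list of shared URIs that resolve to it.

    URIs that cannot be resolved are kept under their own name so that they
    can still be tested for existence later.
    """
    groups = {}
    for uri in uris:
        final = resolve(uri, next_hop) or uri
        groups.setdefault(final, []).append(uri)
    return groups


# Stand-in redirect table using the two aliases mentioned in the text.
redirects = {
    "http://bit.ly/2EEjBl": "http://www.cnn.com",
    "http://goo.gl/2ViC": "http://www.cnn.com",
}
groups = group_aliases(
    ["http://bit.ly/2EEjBl", "http://goo.gl/2ViC", "http://www.cnn.com"],
    redirects.get,
)
```

The number of keys in the resulting dictionary is the number of unique resources; here three shared URIs collapse into one.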
4.2 Existence on the Live-Web

After obtaining the unique URIs from the previous step we resolve all of them and classify them as Success or Failure. The Success class includes all the resources that ultimately return a "200 OK" HTTP response. The Failure class includes all the resources that return a "4XX" family response like "404 Not Found", "403 Forbidden" and "410 Gone", responses in the "30X" redirect family that lead to infinite redirect loops, and server errors with response "50X". To avoid transient errors we repeated the requests, on all datasets, several times for a week to resolve those errors. We also test for "Soft 404s", which are pages that return a "200 OK" response code but are not a representation of the resource, using a technique based on a heuristic for automatically discovering soft 404s from Bar-Yossef et al. [2]. We also include no response from the server, as well as DNS timeouts, as failures. Note that failure means that this resource is missing on the live web. Table 5 summarizes, for each dataset, the total percentages of the resources missing from the live web and the number of missing resources divided by the total number of unique resources.

Event | All | Unique | Available, Archived | Available, Not Archived | Missing, Archived | Missing, Not Archived | Total Missing | Total Archived
MJ | 2,293 | 1,187 = 51.77% | 316 = 26.62% | 474 = 39.93% | 90 = 7.58% | 307 = 25.86% | 397 = 33.45% | 406 = 34.20%
Iran | 3,429 | 1,340 = 39.08% | 415 = 30.97% | 586 = 43.73% | 101 = 7.54% | 238 = 17.76% | 339 = 25.30% | 516 = 38.51%
H1N1 | 5,517 | 1,645 = 29.82% | 595 = 36.17% | 656 = 39.88% | 98 = 5.96% | 296 = 17.99% | 394 = 23.95% | 693 = 42.12%
Obama | 1,118 | 370 = 33.09% | 143 = 38.65% | 135 = 36.49% | 33 = 8.92% | 59 = 15.95% | 92 = 24.86% | 176 = 47.57%
Egypt | 7,313 | 6,154 = 84.15% | 1,069 = 17.37% | 4,440 = 72.15% | 173 = 2.81% | 472 = 7.67% | 645 = 10.48% | 1,242 = 20.18%
Syria | 1,955 | 355 = 18.16% | 19 = 5.35% | 311 = 87.61% | 0 = 0% | 25 = 7.04% | 25 = 7.04% | 19 = 5.35%

Table 5. Percentages of unique resources from all the extracted ones we obtained per event and the percentages of presence of those unique resources on live web and in archives. The Unique percentage is relative to All; every other percentage is relative to the number of unique resources of that event. All resources = 21,625, Unique resources = 11,051
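The Success/Failure rule can be written down as a small Python function, shown here as a sketch rather than the procedure actually used. It takes the final HTTP status code (or None when the server gave no response or the DNS lookup timed out) together with two flags for redirect loops and soft 404s; the soft-404 heuristic itself is not implemented, and treating every status other than 200 as a failure is an assumption that goes slightly beyond the status families listed above.

```python
def classify(status, redirect_loop=False, soft_404=False):
    """Classify the outcome of resolving a URI as "Success" or "Failure".

    status: final HTTP status code, or None for no response / DNS timeout.
    redirect_loop: True if following 30X responses never terminated.
    soft_404: True if a 200 response was judged not to represent the resource.
    """
    if status is None or redirect_loop or soft_404:
        return "Failure"
    if status == 200:
        return "Success"
    # 4XX, 50X and, by assumption, any other final status.
    return "Failure"


def percent_missing(outcomes):
    """Percentage of outcomes classified as Failure, rounded to 2 decimals."""
    missing = sum(1 for outcome in outcomes if outcome == "Failure")
    return round(100.0 * missing / len(outcomes), 2)


# Reproducing the MJ row of Table 5: 397 missing out of 1,187 unique URIs.
mj_outcomes = ["Failure"] * 397 + ["Success"] * (1187 - 397)
mj_missing = percent_missing(mj_outcomes)
```

For the MJ counts this gives 33.45, matching the Total Missing percentage reported in Table 5.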
4.3 Existence in the Archives

In the previous step we tested the existence of the unique list of URIs for each event on the live web. Next, we evaluate how many URIs have been archived in public web archives. To check those archives we utilize the Memento framework. If there is a memento for the URI, we download its memento timemap and analyze it. The timemap is a datestamp ordered list of all known archived versions (called "mementos") of a URI. Next, we parse this timemap and extract the number of mementos that point to versions of the resource in the public archives. We declare the resource to be archived if it has at least one memento. This step was also repeated several times to avoid the transient states of the archives before deeming a resource as unarchived. The results of this experiment along with the archive coverage percentage are presented in Table 5.
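Counting mementos in a timemap can be sketched as follows. The sketch assumes the timemap is serialized in the link format later standardized for Memento in RFC 7089, where every entry is a `<URI>` followed by parameters and archived versions carry a `rel` value containing the token `memento`. The function names, the regular expressions and the example archive host are illustrative assumptions.

```python
import re

LINK_RE = re.compile(r"<([^>]+)>\s*;([^<]*)")  # <uri> ; parameters
REL_RE = re.compile(r'rel="([^"]*)"')


def memento_uris(timemap_text):
    """Return the URIs of all entries whose rel value includes "memento"."""
    uris = []
    for uri, params in LINK_RE.findall(timemap_text):
        rel = REL_RE.search(params)
        if rel and "memento" in rel.group(1).split():
            uris.append(uri)
    return uris


def is_archived(timemap_text):
    """A resource counts as archived if it has at least one memento."""
    return len(memento_uris(timemap_text)) >= 1


# Hypothetical timemap for http://a.example.org held by one archive.
timemap = (
    '<http://a.example.org>;rel="original",\n'
    '<http://archive.example.net/timemap/http://a.example.org>'
    ';rel="self";type="application/link-format",\n'
    '<http://archive.example.net/web/20090625000000/http://a.example.org>'
    ';rel="first memento";datetime="Thu, 25 Jun 2009 00:00:00 GMT",\n'
    '<http://archive.example.net/web/20100101000000/http://a.example.org>'
    ';rel="last memento";datetime="Fri, 01 Jan 2010 00:00:00 GMT"'
)
count = len(memento_uris(timemap))
```

Splitting the `rel` value into tokens keeps entries such as `rel="self"` or `rel="timemap"` from being counted as archived versions.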
5 Existence as a Function of Time

Inspecting the results from the previous steps suggests that the number of missing shared resources in social media corresponding to an event is directly proportional to its age. To determine dates for each of the events, we extracted all the creation dates from all the tweet-based datasets and sorted them. For each event, we plotted a graph illustrating the number of tweets per day related to that event as shown in figure 1. Since the dataset is separated temporally into 3 partitions, and in order to display all the events on one graph, we reduced the size of the x-axis by removing the time periods not covered in our study.

Fig. 1. URIs shared per day corresponding to each event and showing the two peaks in the non-Syrian and non-Egyptian events

Upon examining the graph we found an interesting phenomenon in the non-Syrian and non-Egyptian events: each event has two peaks. Upon investigating history timelines we came to the conclusion that those peaks reflect a second wave of social media interaction as a result of a new incident within the same event after a period of time. For example, in the H1N1 dataset, the first peak illustrates the world-wide outbreak announcement while the second peak denotes the release of the vaccine. In the Iran dataset, the first peak shows the peak of the elections while the second peak pinpoints the Iranian trials. As for the MJ dataset the first peak corresponds to his death and the second peak describes the rumors that Michael Jackson died of unnatural causes and a possible homicide. For the Obama dataset, the first peak reveals the announcement of his winning the prize while the second peak presents the award-giving ceremony in Oslo. For the Egyptian revolution, the resources are all within a small time slot of 2 weeks around the date 11th of February. As for the Syrian event, since the collection was very recent there were no obvious peaks.

Those peaks we examined will become temporal centroids of the social content collections (the datasets): MJ (June 25th & July 10th 2009), Iran (June 13th & 1st August 2009), H1N1 (September 11th & 5th October 2009), and Obama (October 9th & December 10th 2009). Egypt had one centroid (February 11th 2011) and the Syria dataset also had one centroid on March 27th 2012. We split each event according to its two centroids. Figure 1 shows those peaks and Table 6 shows the missing content and the archived content percentages corresponding to each centroid.
 | MJ | Iran | H1N1 | Obama | Egypt | Syria
% Missing | 36.24%, 31.62% | 26.98%, 24.47% | 23.49%, 25.64% | 24.59%, 26.15% | 10.48% | 7.04%
% Archived | 39.45%, 30.78% | 43.08%, 36.26% | 41.65%, 43.87% | 47.87%, 46.15% | 20.18% | 5.35%

Table 6. The Split Dataset

Fig. 2. Percentage of content missing and archived for the events as a function of time.

Figure 2 shows the missing and archived values from Table 6 as a function of time since shared. Equation 1 shows the modeled estimate for the percentage of shared resources lost, where Age is in days. While there is a less linear relationship between time and being archived, equation 2 shows the modeled estimate for the percentage of shared resources archived in a public archive.

\[ \text{Content Lost Percentage} = 0.02\,(\text{Age in days}) + 4.20 \tag{1} \]

\[ \text{Content Archived Percentage} = 0.04\,(\text{Age in days}) + 6.74 \tag{2} \]

Given these observations and our curve fitting we estimate that after a year from publishing about 11% of content shared in social media will be gone. After this point, we are losing roughly 0.02% of this content per day.
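As a worked example, the two linear models can be evaluated directly. The Python sketch below simply encodes the coefficients of equations 1 and 2; the function names are illustrative.

```python
def lost_percentage(age_days):
    """Equation 1: estimated percentage of shared resources lost."""
    return 0.02 * age_days + 4.20


def archived_percentage(age_days):
    """Equation 2: estimated percentage of shared resources archived."""
    return 0.04 * age_days + 6.74


one_year_lost = lost_percentage(365)          # 0.02 * 365 + 4.20, about 11.5
one_year_archived = archived_percentage(365)  # 0.04 * 365 + 6.74, about 21.3
```

At 365 days the models give roughly 11.5% lost and 21.3% archived, in line with the figures of about 11% lost and 20% archived after one year quoted in the text; each additional day adds 0.02 percentage points to the lost estimate.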
6 Conclusions and Future work

We can conclude that there is a nearly linear relationship between time of sharing in the social media and the percentage lost. Although not as linear, there is a similar relationship between the time of sharing and the expected percentage of coverage in the archives. To reach this conclusion, we extracted collections of tweets and other social media content that was posted and shared in relation to six different events that occurred in the time period from June 2009 to March 2012. Next we extracted the embedded resources within this social media content and tested their existence on the live web and in the archives. After analyzing the percentages lost and archived in relation to time and plotting them we used a linear regression model to fit those points. Finally we presented two linear models that can estimate the existence of a resource, that was posted or shared at one point of time in the social media, on the live web and in the archives as a function of age in the social media.

In the next stage of our research we need to expand the datasets and import other similar datasets especially in the uncovered temporal areas (e.g., the year of 2010 and before 2009). Examining more datasets across extended points in time could enable us to better model these two functions of time. Also several other factors beside time would be analyzed to understand their effect on persistence on the live web and archiving coverage like: publishing venue, rate of sharing, popularity of authors and the nature of the related event.

7 Acknowledgments

This work was supported in part by the Library of Congress and NSF IIS-1009392.
References

1. Scott G. Ainsworth, Ahmed Alsum, Hany SalahEldeen, Michele C. Weigle and Michael L. Nelson: How Much of the Web Is Archived? In Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries, JCDL '11, pages 133-136, (2011).
2. Ziv Bar-Yossef, Andrei Z. Broder, Ravi Kumar and Andrew Tomkins: Sic Transit Gloria Telae: Towards an Understanding of the Web's Decay. In Proceedings of the 13th international conference on World Wide Web, WWW '04, pages 328-337, (2004).
3. F. Benevenuto, T. Rodrigues, M. Cha and V. Almeida: Characterizing User Behavior in Online Social Networks. In Proc. of ACM SIGCOMM Internet Measurement Conference, SIGCOMM '09, pages 49-62, (2009).
4. Chei Lee, Long Ma and Dion Goh: Why Do People Share News in Social Media? Active Media Technology, Springer Berlin/Heidelberg, pages 129-140, Volume 6890, (2011).
5. Facebook official fact sheet, http://newsroom.fb.com/content/default.aspx?NewsAreaId=22
6. Haewoon Kwak, Changhyun Lee, Hosung Park and Sue Moon: What is Twitter, a Social Network or a News Media? In Proceedings of the 19th international conference on World wide web, WWW '10, pages 591-600, (2010).
7. Gordon Mohr, Michele Kimpton, Michael Stack and Igor Ranitovic: Introduction to Heritrix, an Archival Quality Web Crawler. In 4th International Web Archiving Workshop, IWAW '04, (2004).
8. Frank McCown, Norou Diawara and Michael L. Nelson: Factors Affecting Website Reconstruction from the Web Infrastructure. In Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL '07, pages 39-48, (2007).
9. Michael L. Nelson and B. Danette Allen: Object Persistence and Availability in Digital Libraries. D-Lib Magazine, Volume 8, Number 1, January (2002).
10. M. E. J. Newman and J. Park: Why social networks are different from other types of networks. Phys. Rev. E, 68(3):036122, September, (2003).
11. Alex Nunns and Nadia Idle: Tweets From Tahrir. ISBN-10: 1935928457.
12. T. A. Phelps and R. Wilensky: Robust Hyperlinks Cost Just Five Words Each. Technical Report, UCB/CSD-00-1091, EECS Department, University of California, Berkeley, (2000).
13. Hany M. SalahEldeen and Michael L. Nelson: Losing My Revolution: A year after the Egyptian Revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
14. Robert Sanderson, Mark Phillips and Herbert Van de Sompel: Analyzing the Persistence of Referenced Web Resources with Memento. CoRR, arXiv:1105.3459, (2011).
15. Stanford SNAP Project Dataset, http://snap.stanford.edu/
16. Twitter numbers, http://blog.Twitter.com/2011/03/numbers.html
17. H. Van de Sompel, M. L. Nelson, R. Sanderson, L. L. Balakireva, S. Ainsworth and H. Shankar: Memento: Time Travel for the Web. Technical Report, arXiv:0911.1112, November, (2009).
18. X. Wan and J. Yang: Wordrank-based Lexical Signatures for Finding Lost or Related Web Pages. In Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development, APWeb '06, pages 843-849, (2006).
19. C. Wilson, B. Boe, A. Sala, K. P. Puttaswamy and B. Y. Zhao: User Interactions in Social Networks and their Implications. In Proceedings of the 4th ACM European conference on Computer systems, EuroSys '09, pages 205-218, (2009).
20. Shaomei Wu, Jake M. Hofman, Winter A. Mason and Duncan J. Watts: Who Says What to Whom on Twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 705-714, (2011).
21. Jaewon Yang and Jure Leskovec: Patterns of Temporal Variation in Online Media. In ACM International Conference on Web Search and Data Mining, WSDM '11, pages 177-186, (2011).
22. J. Yang and S. Counts: Predicting the Speed, Scale, and Range of Information Diffusion in Twitter. In 4th International AAAI Conference on Weblogs and Social Media, ICWSM '10, May, (2010).
23. D. Zhao and M. B. Rosson: How and Why People Twitter: The Role that Micro-blogging Plays in Informal Communication at Work. In Proceedings of the ACM 2009 international conference on Supporting group work, GROUP '09, pages 243-252, (2009).
