HAL Id: hal-01576291
https://hal.laas.fr/hal-01576291
Submitted on 22 Aug 2017
Experience Report: Log Mining using Natural Language Processing and Application to Anomaly Detection
Christophe Bertero, Matthieu Roy, Carla Sauvanaud, Gilles Trédan

To cite this version: Christophe Bertero, Matthieu Roy, Carla Sauvanaud, Gilles Trédan. Experience Report: Log Mining using Natural Language Processing and Application to Anomaly Detection. 28th International Symposium on Software Reliability Engineering (ISSRE 2017), Oct 2017, Toulouse, France. 10 p.
LAAS-CNRS, Universite de Toulouse, CNRS, INSA, Toulouse, France
Email: firstname.name@laas.fr

Abstract—Event logging is a key source of information on a system state.
Reading logs provides insights into its activity, helps assess whether it is in a correct state, and allows problems to be diagnosed. However, reading does not scale: with the number of machines increasingly rising and the growing complexity of systems, the task of auditing systems' health based on log files is becoming overwhelming for system administrators. This observation led to many proposals automating the processing of logs. However, most of these proposals still require some human intervention, for instance by tagging logs, parsing the source files generating the logs, etc. In this work, we target minimal human intervention for log file processing and propose a new approach that considers logs as regular text (as opposed to related works that seek to exploit at best the little structure imposed by log formatting). This approach allows us to leverage modern techniques from natural language processing. More specifically, we first apply a word embedding technique based on Google's word2vec algorithm: log files' words are mapped to a high-dimensional metric space, which we then exploit as a feature space using standard classifiers. The resulting pipeline is very generic, computationally efficient, and requires very little intervention. We validate our approach by seeking stress patterns on an experimental platform. Results show a strong predictive performance (≈90% accuracy) using three out-of-the-box classifiers.
Keywords—Anomaly detection, log file, NLP, word2vec, machine learning, VNF

I. INTRODUCTION

Gathering feedback about the state of computer systems is a daunting task.
To this aim, it is common practice to have programs report on their internal state, for instance through journals and log files, which can be analyzed by system administrators. However, as systems tend to grow in size, this traditional logging method does not scale well. Indeed, scattered software components and applications produce heterogeneous log files. For instance, logging methods such as the common syslog are extremely flexible in their syntax (see the RFC [7]). Also, different log files may gather distinct types of information. For instance, rule-based logging [4] traces the start and the termination of application functions, whereas syslog event logging collects system activity. Each of them tends to describe a partial view of the whole system. In particular, [3] shows that event logging, assertion checking, and rule-based logging are orthogonal sources for system monitoring. Moreover, each partial view of the system, even when using the same logging method (or protocol), may not use the same keywords to express normal or erroneous behaviors. This plethora of available log files burdens log summarization. As a result, source code analyses and communications with application developers are necessary for troubleshooting or auditing systems [17]. Notwithstanding, such non-automatic processes are not acceptable in large computing systems, because troubleshooting for reconfiguration must be handled online.

To address these challenges, a large number of studies proposed approaches to automate and scale up log analysis ([5], [8], [17], [23], [24]). Most approaches, however, require cumbersome log processing, for instance by manually tagging important events, or by parsing the source code functions to assess the fixed and variable parts of log events.
The contribution of this paper is to propose a new approach departing from this research line and considering log mining as a natural language processing task. This approach has two main consequences: i) we lose a part of the context by under-exploiting the specificities of each structured sentence according to a predefined pattern and, most importantly, ii) our approach is agnostic to the format of the log files. Thus, by considering sets of log files as languages, we gain the ability to use modern Natural Language Processing (NLP) methods. In other words, we trade accuracy for volume, preferring the ability to inaccurately process large volumes of log files over accurately processing a few tediously preprocessed logs.

As such, the question we explore in this work is: "What can off-the-shelf Natural Language Processing algorithms bring to log mining?" We more particularly focus on questions such as "is my system in state A or state B?". The proposed approach is rather simple and brutal. Instead of precisely tracking the events related to a transition from A to B, we collect large amounts of log events related to systems in states A and B. We then transform the logs into multidimensional vectors of features (using NLP algorithms) and train a classifier on the resulting data. The resulting pipeline is a relatively standard big data application, where we target the realization of classifiers providing accurate information about the target system state. We believe this approach is particularly interesting given the expensive expertise usually required to preprocess the logs. We show in this paper, through a series of experiments, that with minimum setup effort and standard tools, it is possible to automatically extract relevant information about a system state.
We more particularly use Google's word2vec algorithm [16] for log mining, an algorithm for learning high-quality vector representations of words. It has notably been used for NLP in some previous works, but not for the analysis of log files. Through experiments, we illustrate the potential benefits of our approach by providing answers to system administrators' questions when data is massively available. As an illustrative example, we focus on the detection of stress-related anomalies over a broad range of configurations. More specifically, we deployed, on a virtual cloud environment, a virtual network function running a panel of three applications, namely a proxy, a router, and a database, to which we applied a large variety of stress patterns by means of fault injection (high CPU and memory consumption, high number of disk accesses, increase of network latency, and network packet losses). We show that by simply analyzing the results of NLP-processed log files, it is possible to detect stressed behaviors with ≈90% accuracy.
In the following, Section II presents the rationale of our log mining approach and describes our use of fault injection for validation purposes. Then, in Section III, we define our case study, the experimental platform on which we deployed it, and the implementation of our approach on this platform. Section IV presents some promising experimental results. In Section V, we discuss our results and analyze their threats to validity. Section VI describes related works regarding NLP and log mining for detection purposes. Finally, we conclude this paper in Section VII.
II. APPROACH

A. General approach overview

The approach proposed as the contribution of this paper is presented in Figure 1.
Consider a set of log files related to a given system. Each of these log files contains a varying number of lines, each line consisting of one application of the system reporting an event. Each log event (line) is a list of words. As we consider log files as a natural language, we analyze these log files using Natural Language Processing tools. As such, we first remove all non-alphanumeric characters (as required by word2vec) and replace them by spaces, namely sed 's/[^a-zA-Z0-9]/ /g'.
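The filtering step above can be sketched in Python as follows. This is a minimal illustration that assumes, exactly as in the sed expression, that plain ASCII letters and digits are the only characters kept. The helper names `filter_line` and `tokenize` are hypothetical and are not part of the authors' tooling.

```python
import re
from typing import List

# Every character outside [a-zA-Z0-9] becomes a space,
# mirroring sed 's/[^a-zA-Z0-9]/ /g'.
NON_ALNUM = re.compile(r"[^a-zA-Z0-9]")


def filter_line(line: str) -> str:
    """Replace each non-alphanumeric character of a log line by a space."""
    return NON_ALNUM.sub(" ", line)


def tokenize(line: str) -> List[str]:
    """Split a filtered line into words (runs of spaces are collapsed)."""
    return filter_line(line).split()


print(tokenize("Apr 18 06:44:37 cw-011 restund[1368]: stun server ready"))
# ['Apr', '18', '06', '44', '37', 'cw', '011', 'restund', '1368', 'stun', 'server', 'ready']
```

Timestamps, host names and process ids are kept as ordinary "words". This is consistent with the choice of not exploiting the log format.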
Secondly, we use word2vec from [16], a popular embedding tool from Google for processing natural language. In a nutshell, word2vec produces a mapping from the set of words of a text corpus (a set of log files in our case) to a Euclidean space, say $T$; for a 20-dimensional space, $T = \mathbb{R}^{20}$. Thus, each word of an event gets assigned coordinates in a vector space. The appealing property of word2vec is its ability to produce meaningful embeddings, where similar words end up close to each other, whereas words that are not related to each other end up far away in the embedding space. Once each word has been mapped to the embedding space $T$, we define the position of a log event as the barycenter of its words.
Following a similar scheme, once all log events from a given log file have been mapped to points, we define the position of this log file as the barycenter of the positions of its log events.

[Fig. 1: General approach overview. Left: Training (normal system and stressed system obtained by injection; character filtering; word2vec; mapping p; binary classifier f trained on $\{p(x)\}_{x \in X_{|A}}$ and $\{p(x)\}_{x \in X_{|\bar{A}}}$). Right: Inference (unknown system; character filtering; mapping p; decision $f(p(x)) \ge 1/2$).]
Hence, at the end of the process, each log file is mapped to a single point in $T$. This drastic compression has one major benefit: it produces a compact and useful input for traditional classifiers. Assuming $\mathcal{X}$ represents the set of all possible log files, such a mapping can be represented as a function
$$p: \mathcal{X} \to T, \quad x \mapsto p(x).$$
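The two-level barycenter that defines $p(x)$ can be sketched in a few lines of Python. The sketch assumes the word embeddings are available as a plain dictionary from word to coordinates; in practice these would come from word2vec. The toy 2-dimensional vectors and the function names are hypothetical. Words absent from the dictionary are skipped.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Optional[Vector]:
    """Component-wise average (barycenter) of a list of vectors, None if empty."""
    if not vectors:
        return None
    return [sum(coords) / len(vectors) for coords in zip(*vectors)]


def event_position(words: Sequence[str], emb: Dict[str, Vector]) -> Optional[Vector]:
    """Position of a log event: barycenter of its known words."""
    return mean([emb[w] for w in words if w in emb])


def file_position(events: Sequence[Sequence[str]], emb: Dict[str, Vector]) -> Optional[Vector]:
    """p(x): barycenter of the positions of the events of log file x."""
    positions = [event_position(e, emb) for e in events]
    return mean([p for p in positions if p is not None])


# Toy 2-D embedding (hypothetical values, for illustration only).
emb = {"stun": [1.0, 0.0], "server": [0.0, 1.0], "CRON": [3.0, 3.0]}
log_file = [["stun", "server"], ["CRON"]]
print(file_position(log_file, emb))  # [1.75, 1.75]
```

Each event is averaged first, so a long line counts as much as a short one in $p(x)$.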
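Now, assume that one has access to a large set $X$ of observations (log files) on the system, corresponding to two states that we would like to characterize, say $A$ and $\bar{A}$. Let $X_{|A}$ and $X_{|\bar{A}}$ be the corresponding sets of log files. By the process described above, every observation $x \in X = X_{|A} \cup X_{|\bar{A}}$ can be assigned a coordinate $p(x) \in T$. In a third step, we train a classifier, named $f$ hereafter, on the labeled points $\{p(x) : x \in X_{|A}\}$ and $\{p(x) : x \in X_{|\bar{A}}\}$. A typical such classifier $f$ is an approximation of the ideal separation function
$$f: T \to [0,1], \quad p(y) \mapsto P(A \mid y).$$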
Training a classifier requires an available set of labeled data. These labels may be, for instance, normal and anomalous. In cases where labeled data is not available, one can generate it by monitoring a system while it experiences normal and anomalous behaviors. Since anomalous behaviors are undesired events and, as such, usually not frequent in recent systems, they need to be synthesized using techniques such as fault injection. In this paper, we generate sets of normal and anomalous behaviors in a controlled manner, using fault injection techniques for all anomalous behaviors, as represented in Figure 1.
Once the training is finished, the resulting classifier is used to provide, given any new production log file $x$, an inferred state (anomalous or not) $f(p(x))$ that we claim is a good approximation of the actual stress status of the system, i.e., $P(A \mid x) \approx f(p(x))$. This state is actually expressed as a probability, and we need to set a limit over which a system is categorized as stressed, say 1/2 as in Figure 1. If $x$ contains previously unencountered words, those are simply ignored.
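The inference step (map the file, then threshold at 1/2) can be sketched as follows. For illustration, the trained classifier is passed in as any function returning $f(p(x))$. `toy_f` is a hypothetical stand-in and is not one of the classifiers evaluated in Section IV. Words never seen during training are dropped before averaging, as stated above.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def infer_state(lines: Sequence[Sequence[str]],
                emb: Dict[str, Vector],
                f: Callable[[Vector], float],
                threshold: float = 0.5) -> str:
    """Return 'stressed' if f(p(x)) >= threshold, else 'normal'."""
    line_positions = []
    for words in lines:
        known = [emb[w] for w in words if w in emb]  # unencountered words are ignored
        if known:
            line_positions.append([sum(c) / len(known) for c in zip(*known)])
    if not line_positions:
        raise ValueError("log file contains no word seen during training")
    p_x = [sum(c) / len(line_positions) for c in zip(*line_positions)]
    return "stressed" if f(p_x) >= threshold else "normal"


def toy_f(y: Vector) -> float:
    """Hypothetical classifier: stress probability read off the first coordinate."""
    return min(1.0, max(0.0, y[0]))


emb = {"timeout": [0.9, 0.1], "ok": [0.1, 0.9]}
print(infer_state([["timeout", "neverseen"]], emb, toy_f))  # stressed
print(infer_state([["ok"]], emb, toy_f))                    # normal
```

Raising `threshold` above 1/2 trades false positives for false negatives. This is the lever discussed in Section IV-B1.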
III. CASE STUDY AND EXPERIMENTAL PLATFORM

A. Case study

We hereby present our case study, a virtual network function (VNF) called Clearwater¹, as well as the workload generator used during our experiments to simulate actual users of this target system.
This case study was used in our previous work [19] for anomaly detection based on monitoring data. It constitutes a meaningful case study in that it deploys several components with different roles (e.g., router, proxy and database). Since we apply our approach with no specific configuration and no a priori knowledge of the implementation of each component, we consider that our approach has good chances to generalize to various case studies.
1) Description: The service is an open source VNF named Clearwater. It provides voice and video calls based on the Session Initiation Protocol (SIP), as well as messaging applications. Clearwater encompasses several software components, and we particularly focus our work on Bono, Sprout and Homestead, shown in Figure 2.

Bono is the SIP proxy implementing the Proxy-Call/Session Control Functions. It handles users' requests and routes them to Sprout. It also performs Network Address Translation traversal mechanisms.

Sprout is the IMS SIP router, receiving requests from Bono and routing them to the adequate endpoints. It implements some Serving-CSCF and Interrogating-CSCF functions and gets the required user profiles and authentication data from Homestead. Sprout can also call application servers and actually contains itself a multimedia telephony (MMTel) application server, whose data is stored in another Clearwater component not presented in this work (when calls are configured to use its services).

Homestead is an HTTP RESTful server. It either stores Home Subscriber Server (HSS) data in a Cassandra database and masters data (i.e., information about subscribed services and locations), or pulls data from another IMS-compliant HSS.

Bono, Sprout, and Homestead work together to control the sessions initiated by users and handle the entire CSCF. Our case study encompasses these three components, each one being deployed on a dedicated virtual machine (VM) of our virtualized experimental platform (see Section III-B).
¹ http://www.projectclearwater.org/about-clearwater/

Fig. 2: Clearwater deployment.
2) Workload: IMS workloads can be emulated by means of the SIPp benchmark². The benchmark contains a workload that can be configured with a number of calls per second to be sent to the IMS, and a scenario. The execution of a scenario corresponds to a call. A scenario is described in terms of SIP transactions in XML. A SIP transaction corresponds to a SIP message to be sent and an expected SIP response message. A call fails when a transaction fails. A transaction may fail for two reasons: either a message is not received within a fixed time window (i.e., the timeout), or an unexpected message is received. Unexpected messages are identified by the HTTP error codes 500 (Internal Server Error), 503 (Service Unavailable) and 403 (Forbidden). The scenario run for our experiments simulates a standard call between two users and encompasses the standard SIP REGISTER, INVITE, UPDATE, and BYE messages. The scenario is available online³. Timeouts are set to 10 sec, as in similar experimental campaigns [2].
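The failure rule of a scenario can be made explicit with a small Python sketch. It only encodes the two conditions stated above: a timeout, or an unexpected 403/500/503 response. The `Transaction` record and the function names are hypothetical. They do not reflect how SIPp itself is implemented or configured.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

TIMEOUT_S = 10.0                    # timeout used in our campaigns
UNEXPECTED_CODES = {403, 500, 503}  # Forbidden, Internal Server Error, Service Unavailable


@dataclass
class Transaction:
    request: str                  # e.g. "REGISTER", "INVITE", "UPDATE", "BYE"
    response_code: Optional[int]  # None when no response arrived at all
    elapsed_s: float              # time waited for the response


def transaction_failed(t: Transaction, timeout_s: float = TIMEOUT_S) -> bool:
    """A transaction fails on timeout or on an unexpected response code."""
    if t.response_code is None or t.elapsed_s > timeout_s:
        return True
    return t.response_code in UNEXPECTED_CODES


def call_failed(transactions: Iterable[Transaction]) -> bool:
    """A call (one scenario execution) fails as soon as one transaction fails."""
    return any(transaction_failed(t) for t in transactions)


call = [Transaction("REGISTER", 200, 0.2), Transaction("INVITE", 503, 0.4)]
print(call_failed(call))  # True
```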
3) Fault injection for training and validation: In our study, fault injection is used to collect log files representing both normal and stressed behaviors of a target system. These log files are then provided as inputs for the training and validation of the classifiers. We emulate errors by means of injection tools that stress the system. These tools were used in our previous work [19]. We call the orchestration of several executions of the target system, with or without error emulation, an experimental campaign. In the following, we present the errors that our injection tools emulate and describe the execution of an experimental campaign.
Error emulation. We emulate the following five types of errors, which we will refer to as CPU, memory, disk, network packet loss, and network latency errors respectively: (1) high CPU consumption, (2) misuse of memory, i.e., increase of memory consumption, (3) abnormal number of disk accesses, i.e., large increase of disk I/O accesses and synchronizations, (4) network packet loss, (5) network latency increase.

CPU errors. Abnormal CPU consumption may arise from programs encountering impossible termination conditions, leading to infinite loops, busy waits or deadlocks of competing actions, which are common issues in multiprocessing and distributed systems.
² http://sipp.sourceforge.net/index.html
³ https://homepages.laas.fr/csauvana/sipp_scenario/issre2016_sipp_scenario.xml

Memory errors. Abnormal memory usage is common and happens when allocated chunks of memory are not freed after their use. Accumulations of unfreed memory may lead to memory shortage and system failures.
Disk errors. A high number of disk accesses, or an increase of disk accesses over a short period of time, emulates disks whose accesses often fail and lead to an increase in disk access retries. It may also result from a program stuck in an infinite loop of data writing.

Network packet loss and latency errors. Such errors may arise from network interfaces of the target system or from the network interconnection of the virtualized infrastructure hosting the system. We emulate packet losses and latency increases. Packet losses may arise from undersized buffers, wrong routing policies or even firewall misconfigurations. Latency errors may originate from queuing or processing delays of packets on gateways or at the target system level.
From the definition of these error types, an important experimental parameter is the injection intensity, i.e., the expected impact magnitude of the different injections from the users' point of view. In our study, we present results for the detection of errors with high intensities. In other terms, experimental campaigns perform injections that strongly affect the target system's capability to answer user requests. Table I presents the intensity levels that we calibrated for our Clearwater case study.
TABLE I: Injection intensity levels.

Error type            Unit         Intensity level
CPU                   %            90
Memory                %            97
Disk                  # process    50
Network packet loss   %            8.0
Network latency       ms           80
Regarding the memory, disk and CPU injections, the intensity values of errors are constrained by the capacity of the operating systems (OSs) on which the applications of our case study are deployed. In other words, the intensity levels correspond to the maximum resource consumption allowed by the OS before it kills the execution of the injection agent. For the remaining types of injections, the corresponding intensity levels are set so as to lead to around 99% of unsuccessfully answered requests when applied in at least one VM. The rate of unsuccessfully answered requests can be obtained from the workload log files.
Experimental campaigns. The experimental campaign is conducted using a customizable main script that launches either normal or stressed (anomalous) executions of the target system. An execution, be it normal or anomalous, produces one log file for each VM of our target system.
We define a campaign to run as many normal executions as stressed executions. The number of stressed executions is chosen to cover all combinations of injections (i.e., the injection of each error type in each VM). When running an anomalous execution, the configured injection starts t seconds after the target system boot time, where t is randomly selected in a preconfigured interval. This process adds randomization to the set of collected log files, a prerequisite for the generalization of our results. Additionally, consecutive executions of a campaign are separated by a reboot of all VMs of the target system and of the workload, in order to make sure to restart from a clean and unpolluted state.
As a result, the parameters of an experimental campaign are as follows: i) target VMs, listed in l_vm, ii) error types, listed in l_type, iii) an injection duration, set in inject_duration, iv) a clean run duration, set in clean_run_duration, v) an interval of values defining after which time an injection can start after a reboot, set in interval.

Moreover, a campaign is executed as follows. Each error type is injected in a first VM, then in a second VM, etc., with reboots of the target system and the workload before each new execution. The stressed executions are orchestrated as explained in Algorithm 1. Then the same number of normal executions is performed.
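The orchestration formalized in Algorithm 1 can be sketched in runnable Python. The primitives (`start_workload`, `inject_in_vm`, `reboot_vms`, ...) follow the names of Algorithm 1. Their bodies here are hypothetical dry-run stand-ins that only record actions, whereas the real campaign script drives the VMs and agents. The wait, stop and reboot after the clean run are our assumptions, since the listing only shows `start_workload()` for the clean run.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Injection:
    err: str
    duration_s: int


class DryRunCampaign:
    """Hypothetical stand-in for the campaign main script: it records, never acts."""

    def __init__(self) -> None:
        self.actions: List[tuple] = []

    def start_workload(self) -> None:
        self.actions.append(("start_workload",))

    def stop_workload(self) -> None:
        self.actions.append(("stop_workload",))

    def reboot_vms(self) -> None:
        self.actions.append(("reboot_vms",))

    def sleep(self, seconds: int) -> None:
        self.actions.append(("sleep", seconds))  # no real waiting in a dry run

    def inject_in_vm(self, vm: str, inject: Injection) -> None:
        # Assumed to block for inject.duration_s in a real campaign.
        self.actions.append(("inject", vm, inject.err))


def run_stressed_executions(c: DryRunCampaign, l_vm: Sequence[str], l_type: Sequence[str],
                            inject_duration_s: int, interval_s: Tuple[int, int],
                            clean_run_duration_s: int, rng: random.Random) -> None:
    # Clean run (wait, stop and reboot are assumptions, see above).
    c.start_workload()
    c.sleep(clean_run_duration_s)
    c.stop_workload()
    c.reboot_vms()
    # Runs with injections: every error type in every target VM.
    for vm in l_vm:
        for err in l_type:
            c.start_workload()
            c.sleep(rng.randint(*interval_s))  # random start within the interval
            c.inject_in_vm(vm, Injection(err, inject_duration_s))
            c.stop_workload()
            c.reboot_vms()


campaign = DryRunCampaign()
run_stressed_executions(campaign, ["Bono", "Sprout", "Homestead"],
                        ["CPU", "memory", "disk", "latency", "packet loss"],
                        inject_duration_s=600, interval_s=(60, 600),
                        clean_run_duration_s=600, rng=random.Random(0))
print(sum(a[0] == "inject" for a in campaign.actions))  # 15
```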
Algorithm 1: Orchestration of stressed executions of the target system in an experimental campaign
Input: l_vm, l_type, inject_duration, interval, clean_run_duration
  start_workload()                                  ▷ Clean run
  for vm in l_vm do                                 ▷ Runs with injections
    for err in l_type do
      start_workload()
      rand_time = random_int(interval)
      sleep(rand_time)
      inject = Injection(err, inject_duration)
      inject_in_vm(vm, inject)
      stop_workload()
      reboot_vms()
    end for
  end for

B. Experimental platform

In the following, we first present the platform on which we run experiments. Then we describe the implementation required to carry out our experiments, namely the injection agents, the experimental campaign parameters, and the collection of log files.
1) Platform: We deployed our target system on a virtualized platform. The platform is composed of a cluster including two hypervisors and several VMs. Four VMs are deployed for our target system: one VM runs the workload and the other three respectively host the Bono, Sprout and Homestead components of Clearwater. The workload VM also has the means to control the experimental campaign launch. Two other VMs are used, respectively, to store the log files collected from the target system and to analyze the stored log files. The deployment of the VMs is illustrated in Figure 3.
Fig. 3: Virtualized platform.
The platform is a VMware vSphere 5.1 private cloud composed of 2 Dell Inc. PowerEdge R620 servers with Intel Xeon E5-2660 CPUs (2.20 GHz) and 64 GB of memory. Each server has a VMFS storage. Each VM deployed for the target system implementation has 2 CPUs, 10 GB of memory and a 10 GB disk, and runs the Ubuntu OS. VMs are connected through a 100 Mbps network.
2) Fault injection: Injections in the target system are carried out by injection agents installed in these VMs. There is one injection agent for each error type in each VM of the target system. Agents are started and stopped through an SSH connection orchestrated by the campaign main script. They emulate the errors presented in Section III-A3 by means of a software implementation.

CPU and disk errors are emulated using the stress test tool stress-ng⁴. CPU injections run 2 processes (there are 2 cores in each VM) running all the stress methods listed in the tool documentation. The loading percentage is set according to the intensity level of the injection. Disk injections start several workers writing 50 MB and 50 workers continuously calling the sync command, with an ionice level of 0. The number of writing workers is set according to the intensity level of the injection.

Memory injections are run by means of a Python script that reserves memory space while continuously checking whether the amount of memory reserved by the script corresponds to the amount set by the intensity level of the injection.
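The memory injection agent described above can be approximated by the following Python sketch. It assumes the intensity level is a percentage of the VM's total memory (97% in Table I) and that pages are 4 KiB. The function names, the chunk size and the one-second check period are our assumptions. This is not the authors' script.

```python
import time
from typing import List


def target_bytes(total_bytes: int, intensity_percent: float) -> int:
    """Amount of memory the agent must hold for a given intensity level."""
    return int(total_bytes * intensity_percent / 100)


def reserve(amount: int, chunk_bytes: int) -> List[bytearray]:
    """Allocate `amount` bytes in chunks, touching each 4 KiB page so it is really backed."""
    chunks = []
    while amount > 0:
        size = min(chunk_bytes, amount)
        buf = bytearray(size)
        for i in range(0, size, 4096):
            buf[i] = 1
        chunks.append(buf)
        amount -= size
    return chunks


def memory_injection(goal: int, duration_s: float,
                     chunk_bytes: int = 64 * 1024 * 1024,
                     check_period_s: float = 1.0) -> int:
    """Hold `goal` bytes for `duration_s` seconds, periodically re-checking the reserved amount."""
    held: List[bytearray] = []
    deadline = time.monotonic() + duration_s
    while True:
        reserved = sum(len(b) for b in held)
        if reserved < goal:  # top up until the goal is reached
            held.extend(reserve(goal - reserved, chunk_bytes))
        if time.monotonic() >= deadline:
            return sum(len(b) for b in held)
        time.sleep(check_period_s)


# Tiny demonstration: 97% of a hypothetical 10 MB budget, released immediately.
print(memory_injection(target_bytes(10_000_000, 97), duration_s=0.0))  # 9700000
```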
Finally, we use the Linux tools iptables and tc for the injection of network latencies on the POSTROUTING chain, and iptables on the INPUT chain for the injection of packet losses. All network protocols are targeted.
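For illustration, the sketch below shows how such injections could be expressed as stress-ng, iptables and tc command lines using the Table I intensities. It is not the authors' exact set of invocations. The interface name `eth0` and the ionice class are our assumptions. So is the use of a plain netem delay on the root qdisc, which replaces the POSTROUTING-based setup described above. The functions only build argument lists, and nothing is executed.

```python
import shlex
from typing import List


def cpu_stress(load_percent: int, duration_s: int, workers: int = 2) -> List[str]:
    # One CPU worker per core (2 cores per VM), cycling through all stress methods.
    return ["stress-ng", "--cpu", str(workers), "--cpu-method", "all",
            "--cpu-load", str(load_percent), "--timeout", f"{duration_s}s"]


def disk_stress(writers: int, duration_s: int, sync_workers: int = 50) -> List[str]:
    # `writers` workers writing 50 MB each, plus workers repeatedly calling sync();
    # the ionice class is an assumption (the text only gives level 0).
    return ["stress-ng", "--hdd", str(writers), "--hdd-bytes", "50M",
            "--io", str(sync_workers),
            "--ionice-class", "besteffort", "--ionice-level", "0",
            "--timeout", f"{duration_s}s"]


def packet_loss_rule(loss_percent: float) -> List[str]:
    # Randomly drop incoming packets of all protocols with the given probability.
    return ["iptables", "-A", "INPUT", "-m", "statistic", "--mode", "random",
            "--probability", f"{loss_percent / 100:.3f}", "-j", "DROP"]


def latency_qdisc(delay_ms: int, dev: str = "eth0") -> List[str]:
    # Simplified alternative: delay every outgoing packet on `dev` with netem.
    return ["tc", "qdisc", "add", "dev", dev, "root", "netem", "delay", f"{delay_ms}ms"]


for cmd in (cpu_stress(90, 600), disk_stress(50, 600),
            packet_loss_rule(8.0), latency_qdisc(80)):
    print(shlex.join(cmd))
```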
3) Experimental campaign parameters: An experimental campaign corresponds to the execution of a customizable main script that starts the workload of our target system, and either makes clean runs of this target system or makes runs while performing injections in the target system VMs. The parameters of the experimental campaigns we ran are as follows. The injection duration is calibrated so as to affect several instances of workload executions (an execution lasts less than 1 sec). We calibrated the injection duration to be 10 min long, in order to collect around 5000 log file lines for each clean run and injection. Also, we calibrated the clean run duration to be 30 min. Finally, we calibrated the start of injections to be randomly selected in the interval from 1 to 10 min. This interval allows the VMs to stabilize after a reboot.
⁴ http://kernel.ubuntu.com/cking/stress-ng/

Apr 18 06:44:37 cw-011 restund[1368]: stun server ready
Apr 18 06:44:37 cw-011 bono[1284]: 2005 - Description: Application started. @@Cause: The application is starting. @@Effect: Normal. @@Action: None.
Apr 18 06:45:01 cw-011 CRON[1521]: (root) CMD (/usr/lib/sysstat/sadc 1 1 /var/log/sysstat/clearwater-sa'date +%d' > /dev/null 2>&1)

Fig. 4: Example of syslog events.
Our experimental campaign parameters are summarized in Table II.

TABLE II: Injection campaign parameters of the four experimentations.

l_vm = {Bono, Sprout, Homestead}
l_type = {CPU, memory, disk, latency, packet loss}
injection duration = 10 min
clean run duration = 10 min
interval = [1:10] min
4) Log files collection: The log files that we use in this study are generated by the Linux-based Ubuntu OS using syslog, the standard tool for message logging. Events are logged with a predefined pattern containing, in this order, the date of the event, the hostname of the equipment delivering the event, the process delivering the event, a priority level, the id of the process delivering the event and, finally, the message containing free-formatted information. Note that no performance metrics of the system are logged. An example of syslog events is provided in Figure 4.
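Our approach deliberately does not parse log lines. Still, the syslog layout described above can be made concrete with a small regular expression. The sketch below assumes the traditional layout shown in Figure 4, where the priority level is not printed, so it is not extracted. The pattern and the function name are ours.

```python
import re
from typing import Dict, Optional

SYSLOG_LINE = re.compile(
    r"^(?P<date>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "  # e.g. "Apr 18 06:44:37"
    r"(?P<host>\S+) "                                          # hostname
    r"(?P<process>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "           # process name and optional pid
    r"(?P<message>.*)$"                                        # free-formatted message
)


def parse_syslog_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split one syslog line into its fields, or return None if it does not match."""
    m = SYSLOG_LINE.match(line)
    return m.groupdict() if m else None


print(parse_syslog_line("Apr 18 06:44:37 cw-011 restund[1368]: stun server ready"))
```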
Results of previous studies [3] show that syslog event logging is the most suitable method to use in this context, although a combination of several methods increases the failure coverage. The syslog facility has the advantage of gathering the events of several applications. During experimental campaigns, log files are collected by means of agents (represented by orange squares in Figure 3) and stored in a database for later analysis.
IV. RESULTS

In this section, we quantitatively study the effectiveness of the presented approach by presenting the analysis results over 660 log files. After briefly introducing the considered metrics, we detail the obtained results. The main research question we seek to answer is: using only syslog files as input, how accurately can our algorithm distinguish stressed and non-stressed systems? The secondary questions are i) how sensitive are the results to the parameters used to calibrate the models of our approach, and ii) how able is our approach to issue quick decisions on a system state?

A. Materials and Metrics

Using the testbed presented in Section III-B, we generate a set of 660 log files that will constitute the basis of our models' training.
Exactly half of these (330) originate from normal, unstressed system executions. The other half captures systems with injected faults. More precisely, we ran 22 replications for each of the 5 injection campaigns over each of the 3 target VMs of our case study, for a total of (22 × 3 × 5) = 330 stressed log files.
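The composition of the data set follows from simple arithmetic on the campaign parameters. The variable names below are ours.

```python
replications = 22
error_types = ["CPU", "memory", "disk", "latency", "packet loss"]
target_vms = ["Bono", "Sprout", "Homestead"]

stressed_files = replications * len(error_types) * len(target_vms)  # 22 x 5 x 3
normal_files = stressed_files  # as many normal as stressed executions
total_files = stressed_files + normal_files
files_per_vm = total_files // len(target_vms)

print(stressed_files, total_files, files_per_vm)  # 330 660 220
```

The last figure matches the 220 entries per target machine reported in Table IV.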
Word2vec training: To establish the word2vec training set, we use the concatenation of all 660 log files, from which we removed all non-alphanumeric characters. word2vec, originally designed for NLP tasks, can be tuned with a number of different options. The most important parameter is the embedding space dimension dim(T); its impact is detailed in Section IV-B2. The other parameters mostly allow setting up filters in order to optimize the computation. We deactivated all of them to keep the maximum amount of information available to the classifier. Finally, of the two methods proposed in the implementation of word2vec, namely skip-gram and cbow (defining whether the source context words should be predicted from target words or the opposite⁵), we chose cbow because of its simplicity, in order to provide an "as-simple-as-possible" solution.

Given the relatively small size of our text corpus (compared to all the English texts available on the web, i.e., word2vec's original use case), and the well-known efficiency of the word2vec implementation, the overall computation is tractable on a standard computer (see Section IV-B3). Therefore, the philosophy behind our implementation choices is the following: keep it simple, and keep the maximum amount of information.
From word coordinates to log file coordinates: The output of word2vec is a file containing the coordinates in $T$ of the 233k distinct words of our training corpus. To transform log files into coordinates in $T$, we explored two standard strategies:

bary: In the barycenter approach, we first compute the position of each line of a log file, defined as the average position of all the words it contains. Then, the position of the file is defined as the average of all its lines:
$$p(f) \stackrel{\text{def}}{=} \frac{1}{|f|} \sum_{l \in f} \frac{1}{|l|} \sum_{w \in l} p(w).$$

tf-idf: Term frequency-inverse document frequency is a standard metric of information retrieval. Compared to the barycenter approach, words are weighted by their frequency in the document, discounted by the number of documents of the corpus in which they appear. That is, a word that is common across log files will proportionally have less weight than a rare word when computing the average position of a log file. We relied on the scikit-learn⁶ standard implementation of the function.
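As an illustration of the tf-idf variant, the sketch below weights each word of a log file by its term frequency times a smoothed inverse document frequency. It then averages the word positions with these weights. The sketch assumes scikit-learn's default smoothed idf, $\ln\frac{1+n}{1+\mathrm{df}(w)} + 1$. How the tf-idf weights were combined with the word2vec coordinates is not detailed here, so this is a plausible reading rather than the authors' code. The row normalization that scikit-learn applies by default would not change the result, because it cancels out in a weighted average. All names and toy values are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def smooth_idf(docs: Sequence[Sequence[str]]) -> Dict[str, float]:
    """idf(w) = ln((1 + n) / (1 + df(w))) + 1, with n the number of documents."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return {w: math.log((1 + n) / (1 + d)) + 1.0 for w, d in df.items()}


def tfidf_position(doc: Sequence[str], emb: Dict[str, Vector],
                   idf: Dict[str, float]) -> Vector:
    """tf-idf weighted average of the positions of the words of one document."""
    tf = Counter(w for w in doc if w in emb and w in idf)
    weights = {w: count * idf[w] for w, count in tf.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no known word in document")
    dim = len(next(iter(emb.values())))
    return [sum(wt * emb[w][k] for w, wt in weights.items()) / total
            for k in range(dim)]


docs = [["error", "disk", "disk"], ["ok", "disk"]]  # two toy "log files"
emb = {"error": [1.0, 0.0], "disk": [0.0, 1.0], "ok": [0.0, 0.0]}
idf = smooth_idf(docs)
print(tfidf_position(docs[0], emb, idf))
```

In this toy corpus, `disk` appears in every document, so its idf is 1 and, per occurrence, it weighs less than the rarer `error`. Compared to the plain average [1/3, 2/3], the position therefore moves towards `error`.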
The output of this step is a matrix of 660 × dim(T) entries, decorated with their corresponding target labels (stressed or unstressed system).

⁵ See one implementation explanation: https://www.tensorflow.org/tutorials/word2vec. Last read on 13/08/2017.
⁶ http://scikit-learn.org/

Classifiers: Binary classifiers are amongst the most common and best understood classifiers in machine learning. We restricted our study to three simple, state-of-the-art approaches: Naive Bayes, Random Forests and Neural Networks. We relied on the following scikit-learn library implementations: RandomForestClassifier, MLPClassifier, and GaussianNB. All these algorithms belong to the class of supervised algorithms. In other words, they require labeled training data, whereas we could have used unsupervised approaches such as the ones tested in [8], i.e., Principal Component Analysis and Invariant mining. Again, the philosophy of our approach is to refrain from fine-tuning those implementations and to assess the global strategy as a whole. We therefore used the default parameters for all these algorithms.
Classifier Assessment: To assess the classification accuracy, we used the standard 10-fold cross-validation approach. We first randomly divided the training set into 10 equal-sized chunks. Each possible group of 9 chunks was used to train our classifier, while the remaining chunk was used as a test set.

Let $\{X_i\}_{1 \le i \le 10}$ be a partitioning of $X$ into 10 chunks. Let $X_j$ be the tested chunk, and let $T_j$ (resp. $F_j$) be the subset of stressed (resp. unstressed) logs of $X_j$. The set of true positives $TP_j$ for $X_j$ is defined as:
$$TP_j = \{x \in X_j \text{ s.t. } f_j(x) \ge 1/2 \wedge x \in T_j\}.$$
Logs that belong to stressed machines and to which the classifier $f_j$ (trained using $\bigcup_{i \neq j} X_i$) assigned a probability of at least 1/2 of being stressed are true positives for $X_j$. Similarly, the set of false positives $FP_j$ for $X_j$ (logs belonging to unstressed machines but detected as more likely stressed) is defined as:
$$FP_j = \{x \in X_j \text{ s.t. } f_j(x) \ge 1/2 \wedge x \in F_j\}.$$
Notice that the true negative and false negative sets are symmetrically defined.
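The 10-fold protocol and the sets defined above can be sketched as follows. The sketch assumes the classifier outputs are already available as one stress probability per log file (`scores`), and that labels are True for stressed logs. The helper names are hypothetical. scikit-learn offers equivalent utilities, for example `KFold`.

```python
import random
from typing import Dict, Iterator, List, Sequence, Set, Tuple


def k_fold_chunks(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Randomly split indices 0..n-1 into k (nearly) equal-sized chunks X_1..X_k."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[j::k] for j in range(k)]


def splits(chunks: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, test chunk X_j), training on the union of X_i for i != j."""
    for j, test in enumerate(chunks):
        train = [i for c, chunk in enumerate(chunks) if c != j for i in chunk]
        yield train, test


def confusion_sets(test: Sequence[int], stressed: Sequence[bool],
                   scores: Sequence[float], threshold: float = 0.5) -> Dict[str, Set[int]]:
    """TP_j, FP_j and the symmetric TN_j, FN_j for one test chunk."""
    sets: Dict[str, Set[int]] = {"TP": set(), "FP": set(), "TN": set(), "FN": set()}
    for i in test:
        if scores[i] >= threshold:
            sets["TP" if stressed[i] else "FP"].add(i)
        else:
            sets["FN" if stressed[i] else "TN"].add(i)
    return sets


chunks = k_fold_chunks(660)
print([len(c) for c in chunks])  # ten chunks of 66 log files
```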
To get a closer look at $f_j$, one can use Receiver Operating Characteristics (ROC). That is, let $s \in [0,1]$ be a "safety level" one wants to apply to $f$-based decisions. Let $X_j^s = \{x \in X_j,\ f_j(x) \ge s\}$ be the subset of $X_j$ containing only the logs detected as stressed with probability at least $s$. For each value of $s$, it is thus possible to define a true positive rate $TPR_s = |X_j^s \cap T_j| / |T_j|$ and a false positive rate $FPR_s = |X_j^s \cap F_j| / |F_j|$. The graphical representation of the obtained $(FPR_s, TPR_s)$ couples provides a precise visual description of $f$'s performance, as in Figure 5, presented shortly hereafter.
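The ROC construction can be sketched by sweeping the safety level $s$ over the distinct scores of a test chunk. The sketch assumes both classes are present in the chunk, and the names are hypothetical. scikit-learn's `roc_curve` is a library equivalent.

```python
from typing import List, Sequence, Tuple


def roc_points(stressed: Sequence[bool], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR_s, TPR_s) couples for every distinct safety level s, strictest first."""
    n_pos = sum(1 for y in stressed if y)  # |T_j|
    n_neg = len(stressed) - n_pos          # |F_j|
    points = [(0.0, 0.0)]                  # s above every score: nothing detected
    for s in sorted(set(scores), reverse=True):
        detected = [y for y, sc in zip(stressed, scores) if sc >= s]  # X_j^s
        tp = sum(1 for y in detected if y)
        points.append(((len(detected) - tp) / n_neg, tp / n_pos))
    return points


print(roc_points([True, True, False, False], [0.9, 0.6, 0.7, 0.1]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```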
B. Results analysis

In the following, after exploring the detailed results obtained using a typical trained classifier, we study the impact of the dimension of the embedding host space. We then study the runtime overhead of our approach.

1) Accuracy: Figure 5 presents the ROCs obtained on a typical configuration. More precisely, in this setup, we used dim(T) = 20 and explored various aggregation/classifier configurations.
The results are very good, with Neural Network and Random Forest exhibiting a strong classification accuracy (>95% AUC).

Fig. 5: Receiver Operating Characteristic of 3 classifiers, for dim(T) = 20. This plot shows the True Positive Rate of every classifier as a function of the False Positive Rate of the same classifier.
The aggregation technique (i.e., based on tf-idf or barycenter) has little impact. Naive Bayes performs considerably better than random (77% and 81% AUC for tf-idf and barycenter, respectively), but is visibly less precise than the other two classifiers. These very good results confirm the soundness of the approach.
One can take a more detailed look at the origin of misclassifications. Table III exhibits the confusion matrix of Neural Network (using barycenter and dim(T) = 20). Although around 90% of the targets get correctly categorized, one can see that the errors are slightly leaning towards false positives (that is, an unstressed system is wrongfully categorized as stressed). Although this is not the purpose of this study, it is possible to exploit this imbalance for an overall better classification accuracy (for instance by raising the 1/2 limit over which a system is categorized as stressed). The stress patterns are not detected homogeneously: Latency stress is missed about 7 times less often than CPU stress (1.5% vs. 10.6% in Table III). However, because of the accuracy of the considered classifier, these results only concern a small number of events, and therefore have a low statistical power. Table IV presents the misclassified entries by application: all three applications (namely Bono, Sprout and Homestead) yield similar classification accuracy.
TABLE III: Confusion matrix for the Neural Network classifier, using dim(T) = 20 and barycenter, detailed by stress type.

Stress type     Detected as stressed (True)   Detected as unstressed (False)
No stress       0.115                         0.885
Packet loss     0.939                         0.061
Latency         0.985                         0.015
Memory          0.939                         0.061
Disk            0.970                         0.030
CPU             0.893                         0.106

Fig. 6: Area Under the ROC Curves (AUC) capturing the performance of our classifiers, as a function of the number of dimensions of the embedding space.

TABLE IV: Confusion matrix for the Neural Network classifier, using dim(T) = 20 and barycenter, detailed by application.

Target machine   Requests   Number of misclassifications   Success rate (%)
Bono             220        19                             91.4
Sprout           220        17                             92.3
Homestead        220        20                             90.9

2) Parameters sensitivity: We here focus on two important choices: the dimension of the embedding space dim(T), and the classifier algorithm.
To compare our classifiers, we use the Area Under Curve (AUC) measure. In a nutshell, it measures the area under the ROC curve of a classifier. That is, an AUC of 1 denotes a perfect classification, an AUC of 0.5 corresponds to a random prediction, and an AUC below 0.5 denotes a worse than random prediction. The AUC is also commonly presented as the probability, given a random positive (stressed) and a random negative (unstressed) example, that the classifier ranks the negative example below the positive example (that is, as less stressed). The ROC AUC is known to summarize ROC curves well [1].
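The probabilistic reading of the AUC given above translates directly into code. The sketch counts, over all (stressed, unstressed) pairs, how often the stressed log receives the higher score, with ties counted as one half. This quadratic version is for illustration only, and its names are hypothetical. scikit-learn's `roc_auc_score` computes the same quantity efficiently.

```python
from typing import Sequence


def auc_by_ranking(stressed: Sequence[bool], scores: Sequence[float]) -> float:
    """P(score of a random stressed log > score of a random unstressed log), ties = 1/2."""
    pos = [s for y, s in zip(stressed, scores) if y]
    neg = [s for y, s in zip(stressed, scores) if not y]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc_by_ranking([True, True, False, False], [0.9, 0.6, 0.7, 0.1]))  # 0.75
```

On this toy example, 3 of the 4 (stressed, unstressed) pairs are correctly ordered, hence 0.75. The trapezoidal area under the corresponding ROC points gives the same value.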
Figure 6 provides the AUC measures of our 3 considered classifiers for various embedding space dimensions. As expected, increasing the number of dimensions increases the classification accuracy: more information helps. This increase is, however, very limited. Apart from Neural Network, for which increasing the dimension from 5 to 20 has a visible impact, classifier accuracies all stay stable for dim(T) > 20. This is good news, as such a parameter can be hard to tune a priori. More generally, this figure confirms the previous observations: classification is very accurate, especially using Neural Network and Random Forest, with AUCs consistently scoring above 0.95.
3) Timing performance: When selecting a classifier, the expected classification accuracy is the most important criterion. However, in operational contexts, another crucial criterion is the computational complexity of both training and prediction.
Fig. 7: Training wall time of the classifiers on 660 instances, for varying embedding space dimensions. Notice the log-log scale.
To provide some insights, we recorded the wall clock times of the training of the machine learning models (Figure 7) and of individual predictions of these models (Figure 8). These measurements were performed on a standard MacBook Pro with 16 GB of RAM and a quad-core Intel i7. Interestingly, these figures provide a new perspective on our classifiers. Results confirm the reputation of each of those models. Naive Bayes is very simple: it is quickly trained and provides fast answers. Neural Network is a considerably more complex model whose training requires significantly more time; however, once trained, it is able to answer reasonably fast. Conversely, Random Forest is quickly trained but requires considerably more time to issue predictions. Issuing a prediction requires on average 66 ms (resp. 5 ms and 11 ms) for Random Forest (resp. Naive Bayes and Neural Network).
Not surprisingly, increasing dim(T) comes with a computational cost (as it increases the number of features on which each model is trained), but since Section IV-B1 shows that dim(T) = 20 is already sufficient to obtain accurate results, we conclude that this approach is computationally tractable.
The most important decision is the choice of the classifier: although the simplest possible classifier (Naive Bayes) provides cheap and reasonable answers, more effective classifiers like Random Forest or Neural Network cost somewhat more, either at training time or at prediction time.
To conclude, this results section explored the performance of three state-of-the-art classifiers exploiting the log positions.
These classifiers exhibit strong performance at a reasonable cost.
Performance is not very sensitive to the most important parameter, the dimension of the host space dim(T): values ranging from 20 to 200 deliver roughly the same performance.
Although many parameters could be precisely tuned to optimize the classifiers, we believe that these good results, obtained using mostly default values of COTS tools, already validate the soundness of our approach.
More precisely, these results show the powerful effect of the word2vec embedding applied to logs: it allows us to summarize each log file as a single point in T while keeping enough information to allow an efficient classification.
Fig. 8: Time taken for a trained model to issue one prediction. Notice the log-lin scale.
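How a whole log file becomes a single point in T is defined in an earlier section of the paper that is not reproduced here. The Python sketch below assumes one common choice, averaging the word vectors of all known tokens, with plain whitespace tokenization. The `vectors` dictionary (token to vector) and the function name `embed_log_file` are hypothetical; in practice such a table would be filled from a trained word2vec model.

```python
def embed_log_file(lines, vectors, dim):
    """Summarize a whole log file as one point in a dim-dimensional space by
    averaging the vectors of every known token (unknown tokens are skipped)."""
    total = [0.0] * dim
    count = 0
    for line in lines:
        for token in line.split():  # assumption: whitespace tokenization
            vec = vectors.get(token)
            if vec is None:
                continue
            for i, v in enumerate(vec):
                total[i] += v
            count += 1
    if count == 0:
        return total  # no known token: fall back to the origin
    return [v / count for v in total]


# Hypothetical 2-dimensional word vectors (a real word2vec model would
# learn them from the log corpus, with dim(T) of 20 or more).
vectors = {"disk": [0.0, 1.0], "error": [1.0, 0.0]}
log_file = ["disk error", "error on sda"]
print(embed_log_file(log_file, vectors, dim=2))
```

Each resulting point is a fixed-length feature vector, so it can be fed directly to standard classifiers regardless of how many lines the log file contains.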
V. DISCUSSION
Our approach leaves open one question common to all machine learning approaches: how general are the learned models? In other words, are the classifiers built in this context able to provide accurate answers in different contexts and application environments, or under different injection campaigns? Although this question is definitely of interest, we argue that its scope goes well beyond this paper.
Philosophically, this study shows that it is easy to train efficient classifiers.
But, informally, a classifier is only as good as its training data.
The availability of labelled training data can clearly limit the applicability of our approach.
The advantage of fault injection is to gather relevant labelled datasets in a short time period.
Although it enables evaluating our approach in a straightforward manner, its implementation can be cumbersome.
However, while we rely on fault injection to gather datasets, other sources exist: user-based feedback, crowdsourced datasets, and crash reports of large-scale deployments.
In our previous work [19], we analyzed monitoring counters such as CPU consumption or number of disk accesses for anomaly detection.
Results from counter-based detection showed a good predictive performance, which is nevertheless not fully aligned with the results of this study.
For instance, latency errors were significantly harder to detect.
In this study, we show that by solely mining syslog files we could detect all types of anomalies with high accuracy.
Consequently, we believe our approach is very promising.
As for future work, we plan to study a hybrid approach leveraging both logging and counter-based data in order to further evaluate their potential complementarity, as well as to study what types of logs enhance or weaken the efficiency of our approach.
Finally, the results presented in this paper show that our approach detects, with the same accuracy, the stresses injected in each type of application of our case study (i.e., proxy, router and database).
In other words, the analysis of system-related logs such as syslog is an efficient way to summarize application behaviors for stress detection, regardless of the type of application.
We believe, however, that syslog events are not enough to derive application data flows that would allow detecting other types of anomalies or, more importantly for administrators, diagnosing the origin of an anomaly.
Consequently, in future work we need to explore other types of logs, notably the ones generated by our case study applications.
VI. RELATED WORK
In this study, we use a word2vec-based method for log mining, which we validate on the application of detecting stressed behaviors in computing systems.
word2vec is a method for learning high-quality vector representations of words.
It has been used for NLP in some previous works, but not for the analysis of log files.
In comparison, our previous work [19] focuses on anomaly detection based on monitoring data collected by means of a specific software agent, deployed beforehand on target machines, which provides numerical metrics on the system behavior.
Here we exploit the default system-produced textual logs to predict stress.
Besides the deep technical differences, our approach allows different use cases, like post-mortem analysis of the behavior of the several processes being executed in the targeted systems.
Consequently, in the following we separately present several works related to NLP and other works related to log file analysis for detection purposes.
NLP applications. In the literature, most NLP algorithms are used for document processing [26], to isolate references to a given subject in a document and detect the sentiments of the writer, or to exploit tweets [11] to detect cyber-attacks such as distributed denial of service.
To the best of our knowledge, relatively few works exploit NLP for a purpose other than document analysis.
We provide here a quick summary of these non-traditional uses of NLP.
In [15], the authors use an NLP technique called Latent Semantic Indexing to identify source code documents that match a user query expressed in natural language.
They use the same technique in [14] to detect similar pieces of code (i.e., duplicated functions) in software systems.
In addition, Latent Dirichlet Allocation is used for a similar purpose in [20].
NLP is also applied to network packet payloads for network intrusion detection in [18].
In [10], customers' accesses to business URLs are analyzed using a word2vec-based method to propose better services to customers.
Finally, NLP is also used to detect design and requirement debts [13] from the comments of ten open source projects.
Log mining for detection purposes. Although some works propose new methods to generate relevant log events, as in [4], log files still gather a wide range of events, and evaluating their information in the execution context or weighting their gravity is still intricate.
For instance, the authors of [17] analyze a wide range of logs with engineers and compare events signaling failures to the engineers' feedback on actual failures.
It turns out that the number of actual failures is lower than the number of failures reported by logs.
They also point out that the syslog message severity level is of "dubious value", and that it is essential to take into account the operational context in which log events are collected.
Nevertheless, log file analysis for anomaly (e.g., crash, fault, OS stressing...) detection in computing systems has been widely studied, and it is still an active research field, in particular given the ever more complex recent computing systems.
Execution traces of streaming applications are analyzed in [9] in order to detect anomalies.
The authors analyze traces by means of the merging pattern mining method applied to patterns of events (i.e., lines of traces).
Then they build a graph representing the data flow between the different computing units of the application.
Likewise, in [21] the authors analyze the temporality of execution traces in order to derive system states from their estimated control flows.
The authors of [25] also work on the ordered nature of log files.
They exploit time series potentially hidden behind log events for failure symptom detection.
They use probabilistic modeling based on a mixture of Hidden Markov Models (HMM) to represent different time windows (i.e., sessions) of log events.
They propose a new method for learning the HMM mixture online.
Automatic techniques based on machine learning or statistical algorithms have been widely used for this purpose, as in [6], where the authors propose a new approach for disk failure prediction.
More precisely, they analyze, by means of a Support Vector Machine (SVM) model, sequences of syslog events based on sequences of syslog tag numbers or on key strings in events.
In [22], the author proposes a new algorithm for the clustering of log events and implements a tool based on it, named SLCT.
Log file parsing is exploited in [24].
The parsing uses log patterns identified from a static analysis of the source code.
Then, two types of features are computed from the entire available log files and fed to a PCA-based anomaly detection algorithm for offline detection.
A log extractor for anomaly detection is studied in [12].
The extractor uses log clustering based on the Levenshtein edit distance to evaluate the similarity among log event strings (i.e., two strings are close together if only a small number of edit operations is needed to change the first string into the other).
Templates are then extracted from the log clusters.
Finally, a sequence of log events matching the patterns is created and fed to a machine learning algorithm.
Naive Bayes and Recurrent Neural Networks are evaluated.
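The Levenshtein distance used in [12] can be made concrete with the standard dynamic-programming recurrence. Whether [12] compares log events character by character or token by token is not stated here, so the Python sketch below accepts any sequences (strings or token lists); the function name and the two log messages are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning
    sequence a into sequence b (works on strings or token lists)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete item_a
                current[j - 1] + 1,                    # insert item_b
                previous[j - 1] + (item_a != item_b),  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # prints 3

# Token-level comparison of two log events (hypothetical messages):
e1 = "Connection from 10.0.0.1 closed".split()
e2 = "Connection from 10.0.0.2 closed".split()
print(levenshtein(e1, e2))  # prints 1
```

At the token level, two events differing only in a variable field (here an IP address) are at distance 1, which is why such events naturally fall into the same cluster and share a template.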
VII. CONCLUSIONS AND FUTURE WORK
In this paper, we tackled the problem of anomaly detection by mining logs produced by running systems.
Unlike previous studies, we develop a linguistic approach by considering logs as regular plain-text documents.
This enables exploiting recent NLP techniques to extract information from the grammatical structure and context of log events.
Log files are represented as a set of features that can be processed by standard machine learning algorithms.
As such, this approach shifts the burden of log preprocessing toward the collection of representative datasets.
It is a good trade-off when data is massively available, as in recent distributed systems.
Our experimental campaigns on different components of a VNF rely on fault injection to synthesize anomalous behaviors and collect relevant datasets on demand.
We focus more particularly on the case of stress detection and show that strong predictors (≈90% accuracy) are easily trained with no human intervention in the loop.
Even though we focus on stress detection in this work, our approach is suited to computing systems administrators for the online detection of any type of anomaly.
As for future work, we plan to explore unsupervised classifiers that would not restrict the scope of our approach to labelled training data and mostly known anomalies.
Syslog files are used in this study; however, we plan to investigate what types of log files (e.g., dmesg, application logs...) enhance or weaken the efficiency of our approach.
We also plan to extend our study to more precise online event troubleshooting by combining this detection approach with our previous work on counter-based detection [19].
REFERENCES
[1] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.
[2] L. Cao, P. Sharma, S. Fahmy, and V. Saxena, "NFV-VITAL: A framework for characterizing the performance of virtual network functions," in 2015 IEEE Conference on Network Function Virtualization and Software Defined Network (NFV-SDN), Nov. 2015, pp. 93–99.
[3] M. Cinque, D. Cotroneo, R. D. Corte, and A. Pecchia, "Characterizing direct monitoring techniques in software systems," IEEE Transactions on Reliability, vol. 65, no. 4, pp. 1665–1681, Dec. 2016.
[4] M. Cinque, D. Cotroneo, and A. Pecchia, "Event logs for the analysis of software failures: A rule-based approach," IEEE Transactions on Software Engineering, vol. 39, no. 6, pp. 806–821, June 2013.
[5] M. Farshchi, J. G. Schneider, I. Weber, and J. Grundy, "Experience report: Anomaly detection of cloud application operations using log and cloud metric correlation analysis," in 2015 IEEE 26th International Symposium on Software Reliability Engineering (ISSRE), Nov. 2015, pp. 24–34.
[6] R. W. Featherstun and E. W. Fulp, "Using syslog message sequences for predicting disk failures," in Proceedings of the 24th International Conference on Large Installation System Administration, ser. LISA'10. Berkeley, CA, USA: USENIX Association, 2010, pp. 1–10.
[7] R. Gerhards, "The Syslog Protocol," RFC Editor, RFC 5424, March 2009.
[8] S. He, J. Zhu, P. He, and M. R. Lyu, "Experience report: System log analysis for anomaly detection," in 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), Oct. 2016, pp. 207–218.
[9] O. Iegorov, V. Leroy, A. Termier, J. F. Mehaut, and M. Santana, "Data mining approach to temporal debugging of embedded streaming applications," in 2015 International Conference on Embedded Software (EMSOFT), Oct. 2015, pp. 167–176.
[10] R. Kanagasabai, A. Veeramani, H. Shangfeng, K. Sangaralingam, and G. Manai, "Classification of massive mobile web log URLs for customer profiling analytics," in 2016 IEEE International Conference on Big Data (Big Data), Dec. 2016, pp. 1609–1614.
[11] R. P. Khandpur, T. Ji, S. Jan, G. Wang, C.-T. Lu, and N. Ramakrishnan, "Crowdsourcing cybersecurity: Cyber attack detection using social media," arXiv preprint arXiv:1702.07745, 2017.
[12] C. Liu, "Data analysis of minimally-structured heterogeneous logs: An experimental study of log template extraction and anomaly detection based on recurrent neural network and naive Bayes," Master's thesis, KTH, School of Computer Science and Communication (CSC), 2016.
[13] E. Maldonado, E. Shihab, and N. Tsantalis, "Using natural language processing to automatically detect self-admitted technical debt," IEEE Transactions on Software Engineering, vol. PP, no. 99, pp. 1–1, 2017.
[14] A. Marcus and J. I. Maletic, "Identification of high-level concept clones in source code," in Proceedings of the 16th Annual International Conference on Automated Software Engineering (ASE 2001), Nov. 2001, pp. 107–114.
[15] A. Marcus, A. Sergeyev, V. Rajlich, and J. I. Maletic, "An information retrieval approach to concept location in source code," in 11th Working Conference on Reverse Engineering, Nov. 2004, pp. 214–223.
[16] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
[17] A. Oliner, "What supercomputers say: A study of five system logs," in Proceedings of DSN 2007, 2007.
[18] K. Rieck and P. Laskov, "Detecting unknown network attacks using language models," in Proceedings of the Third International Conference on Detection of Intrusions and Malware & Vulnerability Assessment, ser. DIMVA'06. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 74–90.
[19] C. Sauvanaud, K. Lazri, M. Kaaniche, and K. Kanoun, "Anomaly detection and root cause localization in virtual network functions," in 27th IEEE International Symposium on Software Reliability Engineering (ISSRE 2016), Ottawa, ON, Canada, October 23–27, 2016, pp. 196–206.
[20] T. Savage, B. Dit, M. Gethers, and D. Poshyvanyk, "TopicXP: Exploring topics in source code using latent Dirichlet allocation," in 2010 IEEE International Conference on Software Maintenance, Sept. 2010, pp. 1–6.
[21] J. Tan, X. Pan, S. Kavulya, R. Gandhi, and P. Narasimhan, "SALSA: Analyzing logs as state machines," in Proceedings of the First USENIX Conference on Analysis of System Logs, ser. WASL'08. Berkeley, CA, USA: USENIX Association, 2008, pp. 6–6.
[22] R. Vaarandi, "A data clustering algorithm for mining patterns from event logs," in Proceedings of the 3rd IEEE Workshop on IP Operations Management (IPOM 2003) (IEEE Cat. No. 03EX764), Oct. 2003, pp. 119–126.
[23] Y. Watanabe, H. Otsuka, M. Sonoda, S. Kikuchi, and Y. Matsumoto, "Online failure prediction in cloud data centers by real-time message pattern learning," in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science (CloudCom), Dec. 2012, pp. 504–511.
[24] W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, "Detecting large-scale system problems by mining console logs," in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, ser. SOSP'09. New York, NY, USA: ACM, 2009, pp. 117–132.
[25] K. Yamanishi and Y. Maruyama, "Dynamic syslog mining for network failure monitoring," in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, ser. KDD'05. New York, NY, USA: ACM, 2005, pp. 499–508.
[26] J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack, "Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques," in Third IEEE International Conference on Data Mining, Nov. 2003, pp. 427–434.