
HAL Id: hal-01576291
https://hal.laas.fr/hal-01576291
Submitted on 22 Aug 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Experience Report: Log Mining using Natural Language Processing and Application to Anomaly Detection

Christophe Bertero, Matthieu Roy, Carla Sauvanaud and Gilles Trédan
LAAS-CNRS, Université de Toulouse, CNRS, INSA, Toulouse, France
Email: firstname.name@laas.fr

To cite this version: Christophe Bertero, Matthieu Roy, Carla Sauvanaud, Gilles Trédan. Experience Report: Log Mining using Natural Language Processing and Application to Anomaly Detection. 28th International Symposium on Software Reliability Engineering (ISSRE 2017), Oct 2017, Toulouse, France. 10 p. hal-01576291

Abstract: Event logging is a key source of information on a system state.
Reading logs provides insight into its activity, helps assess its correct state, and allows problems to be diagnosed. However, reading does not scale: with the number of machines steadily rising and systems growing more complex, the task of auditing systems' health based on log files is becoming overwhelming for system administrators. This observation led to many proposals automating the processing of logs. However, most of these proposals still require some human intervention, for instance by tagging logs, parsing the source files generating the logs, etc. In this work, we target minimal human intervention for log file processing and propose a new approach that considers logs as regular text (as opposed to related works that seek to exploit at best the little structure imposed by log formatting). This approach makes it possible to leverage modern techniques from natural language processing. More specifically, we first apply a word embedding technique based on Google's word2vec algorithm: the words of the log files are mapped to a high-dimensional metric space, which we then exploit as a feature space using standard classifiers. The resulting pipeline is very generic, computationally efficient, and requires very little intervention. We validate our approach by seeking stress patterns on an experimental platform. Results show a strong predictive performance (≈ 90% accuracy) using three out-of-the-box classifiers.
Keywords: Anomaly detection, log file, NLP, word2vec, machine learning, VNF

I. INTRODUCTION

Gathering feedback about the state of computer systems is a daunting task. To this aim, it is common practice to have programs report on their internal state, for instance through journals and log files, that can be analyzed by system administrators. However, as systems tend to grow in size, this traditional logging method does not scale well. Indeed, scattered software components and applications produce heterogeneous log files. For instance, logging methods such as the common syslog are extremely flexible in their syntax (see the RFC [7]). Also, different log files may gather distinct types of information. For instance, rule-based logging [4] traces the start and the termination of application functions, while syslog event logging collects system activity. Each of them tends to describe a partial view of the whole system. In particular, [3] shows that event logging, assertion checking, and rule-based logging are orthogonal sources for system monitoring. Moreover, each partial view of the system, even when using the same logging method (or protocol), may not use the same keywords to express normal or erroneous behaviors.

This plethora of available log files burdens log summarization. As a result, source code analyses and communication with application developers are necessary for troubleshooting or auditing systems [17]. Nevertheless, such non-automatic processes are not acceptable in large computing systems because troubleshooting for reconfiguration must be handled online.
To address these challenges, a large number of studies proposed approaches to automate and scale up log analysis ([5], [8], [17], [23], [24]). Most approaches, however, require cumbersome log processing, for instance manually tagging important events, or parsing the source code functions to assess the fixed and variable parts of log events.

The contribution of this paper is to propose a new approach departing from this research line and considering log mining as a natural language processing task. This approach has two main consequences: i) we lose part of the context by under-exploiting the specificities of each sentence structured according to a predefined pattern and, most importantly, ii) our approach is agnostic to the format of the log files. Thus, by considering sets of log files as languages, we gain the ability to use modern Natural Language Processing (NLP) methods. In other words, we trade accuracy for volume, preferring the ability to inaccurately process large volumes of log files over accurately processing a few tediously preprocessed logs.

As such, the question we explore in this work is: "What can off-the-shelf Natural Language Processing algorithms bring to log mining?" We more particularly focus on questions such as "is my system in state A or state B?".
The proposed approach is rather simple and brutal. Instead of precisely tracking the events related to a transition from A to B, we collect large amounts of log events related to systems in states A and B. We then transform the logs into multidimensional feature vectors (using NLP algorithms) and train a classifier on the resulting data. The resulting pipeline is a relatively standard big data application, in which we aim to build classifiers that provide accurate information about the target system state. We believe this approach is especially interesting given the expensive expertise usually required to preprocess the logs.

We show in this paper, through a series of experiments, that with minimum setup effort and standard tools, it is possible to automatically extract relevant information about a system state. We more particularly use Google's word2vec algorithm [16] for log mining, an algorithm for learning high-quality vector representations of words. It has notably been used for NLP in previous works, but not for the analysis of log files.

Through experiments, we illustrate the potential benefits of our approach by answering system administrators' questions when data is massively available. As an illustrative example, we focus on the detection of stress-related anomalies over a broad range of configurations. More specifically, we deployed on a virtual cloud environment a virtual network function running a panel of three applications, namely a proxy, a router, and a database, to which we applied a large variety of stress patterns by means of fault injection (high CPU and memory consumption, high number of disk accesses, increase of network latency, and network packet losses). We show that by simply analyzing the results of NLP-processed log files, it is possible to detect stressed behaviors with ≈ 90% accuracy.
In the following, we first present in Section II the rationale of our log mining approach and describe our use of fault injection for validation purposes. Then, in Section III we define our case study, the experimental platform on which we deployed it, and the implementation of our approach on this platform. Section IV presents some promising experimental results. In Section V we discuss our results and analyze their threats to validity. Section VI describes related works regarding NLP and log mining for detection purposes. Finally, we conclude this paper in Section VII.
II. APPROACH

A. General approach overview

The approach proposed as the contribution of this paper is presented in Figure 1. Consider a set of log files related to a given system. Each of these log files contains a varying number of lines, each line consisting of one application of the system reporting an event. Each log event (line) is a list of words. As we consider log files as a natural language, we analyze these log files using Natural Language Processing tools. As such, we first remove all non-alphanumeric characters (as required by word2vec) and replace them by spaces, namely sed 's/[^a-zA-Z0-9]/ /g'.
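The filtering step can be reproduced outside of sed. The following is a minimal Python sketch, assuming ASCII letters and digits are the only characters to keep; the function names `filter_line` and `tokenize` are illustrative and do not come from the paper.

```python
import re

# Anything that is not an ASCII letter or digit is replaced by a space,
# mirroring the sed expression s/[^a-zA-Z0-9]/ /g.
NON_ALNUM = re.compile(r"[^a-zA-Z0-9]")


def filter_line(line):
    """Replace every non-alphanumeric character of a log line by a space."""
    return NON_ALNUM.sub(" ", line)


def tokenize(line):
    """Split a filtered log line into its list of words."""
    return filter_line(line).split()
```

For instance, `tokenize("cw-011 bono[1284]: started.")` yields the words `cw`, `011`, `bono`, `1284` and `started`, which is the form in which a log event is handed to the embedding step.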
Secondly, we use word2vec from [16], a popular embedding tool employed by Google to process natural language. In a nutshell, word2vec produces a mapping from the set of words of a text corpus (a set of log files in our case) to a Euclidean space, say $T$. In the case of a 20-dimensional space, $T$ is $\mathbb{R}^{20}$. Thus, each word of an event gets assigned coordinates in a vector space. The appealing property of word2vec is its ability to produce meaningful embeddings, where similar words end up close, whereas words that are not related to each other end up far away in the embedding space.

Once each word has been mapped to the embedding space $T$, we define the position of a log event as the barycenter of its words. Following a similar scheme, once all log events from a given log file have been mapped to points, we define the position of this log file as the barycenter of the positions of its log events. Hence, at the end of the process, each log file is mapped to a single point in $T$. This drastic compression has one major interest: it produces a compact and useful input to traditional classifiers. Assuming $X$ represents the set of all possible log files, such a mapping can be represented as a function $p : X \to T$, $x \mapsto p(x)$.

[Figure 1 (diagram). Training: log files of a normal system and of a stressed system (obtained by injection) go through character filtering and word2vec, which yields the mapping p; the resulting points train a binary classifier f. Inference: the log file x of an unknown system goes through character filtering, is mapped to p(x), and the system is reported as stressed when f(p(x)) ≥ 1/2.]

Fig. 1: General approach overview. Left: Training. Right: Inference.
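The two-level barycenter can be written in a few lines. The sketch below is only an illustration under stated assumptions: the embedding is given as a plain dictionary from word to coordinate list (how word2vec output is loaded is not shown), words without a vector are skipped, and the function names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def line_position(words, embedding):
    """Barycenter of the known words of one log event (None if no word is known)."""
    known = [embedding[w] for w in words if w in embedding]
    return mean_vector(known) if known else None


def file_position(lines, embedding):
    """Barycenter of the positions of the log events of one log file."""
    positions = [line_position(line.split(), embedding) for line in lines]
    positions = [p for p in positions if p is not None]
    return mean_vector(positions) if positions else None
```

With a toy two-dimensional embedding `{"a": [0, 0], "b": [2, 0], "c": [0, 4]}`, the file made of the lines `"a b"` and `"c"` has line positions `[1, 0]` and `[0, 4]`, hence a file position of `[0.5, 2.0]`. Note that every line weighs the same in the file position, whatever its length.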
Now, assume that one has access to a large set $X$ of observations (log files) on the system, corresponding to two states that we would like to characterize, say $A$ and $\bar{A}$. Let $X|A$ and $X|\bar{A}$ be the corresponding sets of log files. By the above described process, every observation $x \in X = X|A \cup X|\bar{A}$ can be assigned a coordinate $p(x) \in T$.

In a third step, we train a classifier, named $f$ hereafter, on the mapped observations $\{p(x) : x \in X\}$, each labeled with its state ($A$ or $\bar{A}$). A typical such classifier $f$ is an approximation of the ideal separation function $f : T \to [0, 1]$, $p(y) \mapsto P(A \mid y)$.

The training of a classifier requires an available set of labeled data. These labels may be, for instance, normal and anomalous. In cases where labeled data is not available, one can generate it by monitoring a system while it experiences normal and anomalous behaviors. Since anomalous behaviors are undesired events and, as such, usually infrequent in recent systems, they need to be synthesized using techniques such as fault injection. In this paper, we generate sets of normal and anomalous behaviors in a controlled manner, using fault injection techniques for all anomalous behaviors, as represented in Figure 1.

Once the training is finished, the resulting classifier is used to provide, given any new production log file $x$, an inferred state (anomalous or not) $f(p(x))$ that we claim is a good approximation of the actual stress status of the system, i.e., $P(A \mid x) \approx f(p(x))$. It is actually expressed as a probability, and we need to set a threshold above which a system is categorized as stressed, say 1/2 as in Figure 1. In case $x$ contains words that were not encountered during training, those are simply ignored.
III. CASE STUDY AND EXPERIMENTAL PLATFORM

A. Case study

We hereby present our case study, a virtual network function (VNF) called Clearwater¹, as well as the workload generator used during our experiments to simulate actual users of this target system. This case study was used in our previous work [19] for anomaly detection based on monitoring data. It constitutes a meaningful case study in that it deploys several components with different roles (e.g., router, proxy and database). Since we apply our approach with no specific configuration nor a priori knowledge of the implementation of each component, we consider that our approach has good chances to generalize to various case studies.
1) Description: The service is an open source VNF named Clearwater. It provides voice and video calls based on the Session Initiation Protocol (SIP), and messaging applications. Clearwater encompasses several software components, and we particularly focus our work on Bono, Sprout, and Homestead, shown in Figure 2.

Bono is the SIP proxy implementing the Proxy-Call/Session Control Functions (P-CSCF). It handles users' requests and routes them to Sprout. It also performs Network Address Translation traversal mechanisms.

Sprout is the IMS SIP router, receiving requests from Bono and routing them to the adequate endpoints. It implements some Serving-CSCF and Interrogating-CSCF functions and gets the required user profiles and authentication data from Homestead. Sprout can also call application servers and actually contains a multimedia telephony (MMTel) application server itself, whose data is stored in another Clearwater component not presented in this work (used when calls are configured to use its services).

Homestead is an HTTP RESTful server. It either stores Home Subscriber Server (HSS) data in a Cassandra database and masters this data (i.e., information about subscribed services and locations), or pulls data from another IMS-compliant HSS.

Bono, Sprout, and Homestead work together to control the sessions initiated by users and handle the entire CSCF. Our case study encompasses these three components, each one being deployed on a dedicated virtual machine (VM) of our virtualized experimental platform (see Section III-B).

Fig. 2: Clearwater deployment.

¹ http://www.projectclearwater.org/about-clearwater/
2) Workload: IMS workloads can be emulated by means of the SIPp benchmark². The benchmark contains a workload that can be configured with a number of calls per second to be sent to the IMS, and a scenario. The execution of a scenario corresponds to a call. A scenario is described in terms of SIP transactions in XML. A SIP transaction corresponds to a SIP message to be sent and an expected SIP response message. A call fails when a transaction fails. A transaction may fail for two reasons: either a message is not received within a fixed time window (i.e., the timeout), or an unexpected message is received. Unexpected messages are identified by the HTTP error codes 500 (Internal Server Error), 503 (Service Unavailable) and 403 (Forbidden).

The scenario run for our experiments simulates a standard call between two users and encompasses the standard SIP REGISTER, INVITE, UPDATE, and BYE messages. The scenario is available online³. Timeouts are set to 10 s, as in similar experimental campaigns [2].
3) Fault injection for training and validation: Fault injection is used in our study to collect log files representing both normal and stressed behaviors of a target system, in order to provide them as inputs for the training and validation of the classifiers. We emulate errors by means of injection tools that stress the system. These tools were used in our previous work [19]. We call an experimental campaign the orchestration of several executions of the target system, with or without error emulation. In the following, we present the errors that our injection tools emulate and describe the execution of an experimental campaign.
Error emulation. We emulate the following five types of errors, which we will refer to as CPU, memory, disk, network packet loss, and network latency errors respectively: (1) high CPU consumption, (2) misuse of memory, i.e., increase of memory consumption, (3) abnormal number of disk accesses, i.e., large increase of disk I/O accesses and synchronizations, (4) network packet loss, (5) network latency increase.

CPU errors. Abnormal CPU consumption may arise from programs encountering impossible termination conditions leading to infinite loops, busy waits or deadlocks of competing actions, which are common issues in multiprocessing and distributed systems.

Memory errors. Abnormal memory usage is common and happens when allocated chunks of memory are not freed after their use. Accumulation of unfreed memory may lead to memory shortage and system failures.

Disk errors. A high number of disk accesses, or an increase of disk accesses over a short period of time, emulates disks whose accesses often fail and lead to an increase in disk access retries. It may also result from a program stuck in an infinite loop of data writing.

Network packet loss and latency errors. Such errors may arise from the network interfaces of the target system or from the network interconnection of the virtualized infrastructure hosting the system. We emulate packet losses and latency increases. Packet losses may arise from undersized buffers, wrong routing policies or even firewall misconfigurations. Latency errors may originate from queuing or processing delays of packets on gateways or at the target system level.

² http://sipp.sourceforge.net/index.html
³ https://homepages.laas.fr/csauvana/sipp\scenario/issre2016\sipp\scenario.xml
Given the definition of these error types, an important experimental parameter is the injection intensity, i.e., the expected impact magnitude of the different injections from the users' point of view. In our study, we present results for the detection of errors with high intensities. In other terms, experimental campaigns perform injections that strongly affect the capability of the target system to answer users' requests. Table I presents the intensity levels that we calibrated for our Clearwater case study.

Error type            Unit       Intensity level
CPU                   %          90
Memory                %          97
Disk                  #process   50
Network packet loss   %          8.0
Network latency       ms         80

TABLE I: Injection intensity levels.

Regarding the memory, disk and CPU injections, the intensity values of errors are constrained by the capacity of the operating systems (OSs) on which the applications of our case study are deployed. In other words, the intensity levels correspond to the maximum resource consumption allowed by the OS before killing the execution of the injection agent.

Considering the remaining types of injections, the corresponding intensity levels are set so as to lead to around 99% of unsuccessfully answered requests when applied in at least one VM. The rate of unsuccessfully answered requests can be obtained from the workload log files.
Experimental campaigns. The experimental campaign is conducted using a customizable main script that launches either normal or stressed executions of the target system. An execution, be it normal or anomalous, produces one log file for each VM of our target system. We define a campaign to run as many normal executions as stressed executions. The number of stressed executions is configured to cover all combinations of injections (i.e., the injection of each error type in each VM).

When running an anomalous execution, the configured injection starts t seconds after the target system boot time, where t is randomly selected in a preconfigured interval. This process adds randomization to the set of collected log files, a prerequisite for the generalization of our results. Additionally, consecutive executions of a campaign are separated by a reboot of all VMs of the target system and of the workload, in order to restart from a clean and unpolluted state.

As a result, the parameters of an experimental campaign are as follows: i) the target VMs, listed in l_vm, ii) the error types, listed in l_type, iii) an injection duration, set in inject_duration, iv) a clean run duration, set in clean_run_duration, v) the interval of values from which the delay between a reboot and the start of an injection is drawn, set in interval.

Moreover, a campaign is executed as follows. Each error type is injected in a first VM, then in a second VM, etc., with reboots of the target system and the workload before each new execution. The stressed executions are orchestrated as explained in Algorithm 1. Then the same number of normal executions are performed.

Algorithm 1 Orchestration of stressed executions of the target system in an experimental campaign
Input: l_vm, l_type, inject_duration, interval, clean_run_duration
  start_workload()                                  (clean run)
  for vm in l_vm do                                 (runs with injections)
    for err in l_type do
      start_workload()
      rand_time = random_int(interval)
      sleep(rand_time)
      inject = Injection(err, inject_duration)
      inject_in_vm(vm, inject)
      stop_workload()
      reboot_vms()
    end for
  end for

B. Experimental platform

In the following, we first present the platform on which we run experiments. Then we describe the implementation required to carry out our experiments, namely the injection agents, the experimental campaign parameters, and the collection of log files.
1) Platform: We deployed our target system on a virtualized platform. The platform is composed of a cluster including two hypervisors and several VMs. Four VMs are deployed for our target system: one VM runs the workload and the other three respectively host the Bono, Sprout and Homestead components of Clearwater. The workload VM also has the means to control the launch of the experimental campaign. Two other VMs are respectively used to store the log files collected from the target system and to analyze the stored log files. The deployment of the VMs is illustrated in Figure 3.

Fig. 3: Virtualized platform.

The platform is a VMware vSphere 5.1 private cloud composed of 2 Dell Inc. PowerEdge R620 servers with Intel Xeon CPU E5-2660 2.20 GHz and 64 GB of memory. Each server has a VMFS storage. Each VM deployed for the target system implementation has 2 CPUs, 10 GB of memory, a 10 GB disk, and runs the Ubuntu OS. VMs are connected through a 100 Mbps network.
2) Fault injection: Injections in the target system are carried out by injection agents installed in these VMs. There is one injection agent for each error type in each VM of a target system. Agents are run and stopped through an SSH connection orchestrated by the campaign main script. They emulate the errors presented in Section III-A3 by means of a software implementation.

CPU and disk errors are emulated using the stress test tool stress-ng⁴. CPU injections run 2 processes (there are 2 cores in each VM) running all the stress methods listed in the tool documentation. The load percentage is set according to the intensity level of the injection. Disk injections start several workers writing 50 MB and 50 workers continuously calling the sync command, with an ionice level of 0. The number of writing workers is set according to the intensity level of the injection.

Memory injections are run by means of a Python script that reserves memory space while continuously checking whether the amount of memory reserved by the script corresponds to the amount set by the intensity level of the injection.
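The injection script itself is not given in the paper. As a rough illustration of the reserve-and-check loop, the following sketch takes the target as an absolute number of bytes (an assumption: turning the 97% intensity level into bytes would require the total memory of the VM) and uses hypothetical names and parameters throughout.

```python
import time


def reserve_memory(target_bytes, chunk_bytes=1024 * 1024, checks=3, period_s=0.0):
    """Reserve about target_bytes and periodically check that the amount holds.

    Returns the list of reserved buffers; the memory stays reserved for as
    long as the caller keeps a reference to that list.
    """
    chunks = []
    reserved = 0
    for _ in range(checks):
        # Top up the reservation until it matches the target amount.
        while reserved < target_bytes:
            size = min(chunk_bytes, target_bytes - reserved)
            chunks.append(bytearray(size))
            reserved += size
        time.sleep(period_s)
    return chunks
```

A real agent would run until stopped rather than for a fixed number of checks, and would likely measure the memory actually used by the process instead of trusting its own counter; both are left out to keep the sketch short.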
Finally, we use the Linux kernel tools iptables and tc for the injection of network latencies on the POSTROUTING chain, and iptables on the INPUT chain for the injection of packet losses. All network protocols are targeted.
3) Experimental campaigns parameters: An experimental campaign corresponds to the execution of a customizable main script that starts the workload of our target system, and either makes a clean run of this target system or makes runs while performing injections in the target system VMs.

The parameters of the experimental campaigns we run are as follows. The injection duration is calibrated so as to affect several instances of workload executions (an execution lasts less than 1 s). We calibrated the injection duration to be 10 min long in order to collect around 5000 lines of log file for each clean run and injection. Also, we calibrated the clean run duration to be 30 min. Finally, we calibrated the start of injections to be randomly selected in the interval from 1 to 10 min. This interval allows the VMs to stabilize after a reboot.
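To make the size and shape of a campaign concrete, the following sketch enumerates the stressed executions implied by these parameters. It only builds a plan and performs no injection; the function name, the dictionary keys and the choice of whole minutes for the start delay are assumptions made for illustration.

```python
import random

VMS = ["Bono", "Sprout", "Homestead"]
ERROR_TYPES = ["CPU", "memory", "disk", "latency", "packet loss"]


def plan_stressed_executions(vms, error_types, interval_min=(1, 10),
                             inject_duration_min=10, seed=None):
    """List one stressed execution per (VM, error type) pair."""
    rng = random.Random(seed)
    plan = []
    for vm in vms:
        for err in error_types:
            plan.append({
                "vm": vm,
                "error": err,
                # Delay between the reboot and the start of the injection,
                # drawn uniformly from the interval (bounds included).
                "start_after_min": rng.randint(*interval_min),
                "inject_duration_min": inject_duration_min,
            })
    return plan
```

With the three VMs and five error types of the case study, `plan_stressed_executions(VMS, ERROR_TYPES)` contains 15 entries, one per combination, each of which is followed by a reboot in the actual campaign.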
⁴ http://kernel.ubuntu.com/cking/stress-ng/

Apr 18 06:44:37 cw-011 restund[1368]: stun server ready
Apr 18 06:44:37 cw-011 bono[1284]: 2005 - Description: Application started. @@Cause: The application is starting. @@Effect: Normal. @@Action: None.
Apr 18 06:45:01 cw-011 CRON[1521]: (root) CMD (/usr/lib/sysstat/sadc 1 1 /var/log/sysstat/clearwater-sa'date +%d' > /dev/null 2>&1)

Fig. 4: Example of syslog events.

Our experimental campaign parameters are summarized in Table II.

Campaign parameters
l_vm = {Bono, Sprout, Homestead}
l_type = {CPU, memory, disk, latency, packet loss}
injection_duration = 10 min
clean_run_duration = 10 min
interval = [1:10] min

TABLE II: Injection campaign parameters of the four experimentations.
4) Log files collection: The log files that we use in this study are generated by the Linux-based Ubuntu OS using syslog, the standard tool for message logging. Events are logged with a predefined pattern containing, in that order, the date of the event issue, the hostname of the equipment delivering the event, the process delivering the event, a priority level, the id of the process delivering the event and finally the message containing free-formatted information. For instance, no performance metrics of the system are logged. An example of syslog events is provided in Figure 4.

Results of previous studies [3] show that syslog event logging is the most suitable method to use in this context, although a combination of the several methods increases the failure coverage. The syslog facility has the advantage of gathering the events of several applications.

During experimental campaigns, log files are collected by means of agents (represented by orange squares in Figure 3) and stored in a database for later analysis.
IV. RESULTS

In this section, we quantitatively study the effectiveness of the presented approach by presenting the analysis results over 660 log files. After briefly introducing the considered metrics, we detail the obtained results.

The main research question we seek to answer is: using only syslog files as input, how accurately can our algorithm distinguish Stressed and non-Stressed systems? The secondary questions are i) how sensitive are the results to the parameters used to calibrate the models of our approach, and ii) what is the ability of our approach to issue a quick decision on a system state?

A. Materials and Metrics

Using the testbed presented in Section III-B, we generate a set of 660 log files that constitutes the basis of our model training. Exactly half of these (330) originate from normal, unstressed system executions. The other half captures systems with injected faults. More precisely, we ran 22 replications for each of the 5 injection campaigns over each of the 3 target VMs of our case study, for a total of (22 × 3 × 5) = 330 stressed log files.
Word2Vec training: To establish the word2vec training set, we use the concatenation of all 660 log files, from which we removed all non-alphanumeric characters.

word2vec, originally designed for NLP tasks, can be tuned with a number of different options. The most important parameter is the embedding space dimension dim(T); its impact is detailed in Section IV-B2. The other parameters mostly set up filters in order to optimize the computation. We deactivated all of them to keep the maximum amount of information available to the classifier. Finally, from the two methods proposed in the implementation of word2vec, namely skip-gram and cbow (defining whether the source context words should be predicted from target words or the opposite⁵), we chose cbow because of its simplicity, in order to provide an "as-simple-as-possible" solution.

Given the relatively small size of our text corpus (compared to all the English texts available on the web, namely word2vec's original use case), and the well-known efficiency of the word2vec implementation, the overall computation is tractable on a standard computer (see Section IV-B3). Therefore, the philosophy behind our implementation choices is the following: keep it simple, and keep the maximum amount of information.
From word coordinates to log file coordinates: The output of word2vec is a file containing the coordinates of the 233k distinct words of our training corpus in T. To transform log files into coordinates in T, we explored two standard strategies:

bary: In the barycenter approach, we first compute the position of each line of a log file, defined as the average position of all the words it contains. Then, the position of the file is defined as the average of all its lines:

$$p(f) \stackrel{\text{def}}{=} \frac{1}{|f|} \sum_{l \in f} \frac{1}{|l|} \sum_{w \in l} p(w).$$

tf-idf: Term frequency-inverse document frequency is a standard metric of information retrieval. Compared to the barycenter approach, words are weighted by their frequency in the document. That is, a frequent (common) word will proportionally have less weight than a rare word when computing the average position of a log file. We relied on the scikit-learn⁶ standard implementation of the function.

The output of this step is a matrix of 660 × dim(T) entries decorated with their corresponding target labels (stressed, unstressed system).
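The paper does not spell out how the tf-idf weights are combined with the word positions. One possible reading, sketched below with standard-library Python only, is a weighted average of word vectors in which each word weighs its term frequency times a smoothed inverse document frequency of the form ln((1 + n) / (1 + df)) + 1, the form documented for scikit-learn's default settings. All function names are hypothetical, and the embedding is again assumed to be a plain dictionary.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Smoothed inverse document frequency of every word of a corpus.

    documents: list of word lists, one per log file.
    """
    n = len(documents)
    df = Counter(word for doc in documents for word in set(doc))
    return {word: math.log((1 + n) / (1 + count)) + 1 for word, count in df.items()}


def tfidf_position(words, embedding, idf):
    """Average of the word vectors of one log file, weighted by tf * idf."""
    tf = Counter(w for w in words if w in embedding and w in idf)
    total = sum(count * idf[w] for w, count in tf.items())
    if total == 0:
        return None
    dim = len(next(iter(embedding.values())))
    position = [0.0] * dim
    for w, count in tf.items():
        weight = count * idf[w] / total
        for i, value in enumerate(embedding[w]):
            position[i] += weight * value
    return position
```

Compared with a plain average over word occurrences, a word present in every log file keeps the minimal idf of 1, so a word that appears in few files pulls the position of a file more strongly towards its own coordinates.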
⁵ See one explanation of the implementation at https://www.tensorflow.org/tutorials/word2vec. Last read on 13/08/2017.
⁶ http://scikit-learn.org/

Classifiers: Binary classifiers are amongst the most common and best understood classifiers in machine learning. We restricted our study to three simple and state-of-the-art approaches: Naive Bayes, Random Forests and Neural Networks. We relied on the following scikit-learn library implementations: RandomForestClassifier, MLPClassifier, and GaussianNB. All these algorithms belong to the class of supervised algorithms. In other words, they require labeled training data; we could also have used unsupervised approaches such as the ones tested in [8], i.e., Principal Components Analysis and Invariant mining. Again, the philosophy of our approach is to refrain from fine-tuning those implementations and to assess the global strategy as a whole. We therefore used the default parameters for all these algorithms.
Classifier Assessment: To assess the classification accuracy, we used the standard 10-fold validation approach. We first randomly divided the training set into 10 equal-sized chunks. Each possible group of 9 chunks was used to train our classifier, while the remaining chunk was used as a test.

Let $\{X_i\}_{1 \le i \le 10}$ be a partitioning of $X$ into 10 chunks. Let $X_j$ be the tested chunk, and let $T_j$ (resp. $F_j$) be the subset of stressed (resp. unstressed) logs of $X_j$. The set of true positives $TP_j$ for $X_j$ is defined as:

$$TP_j = \{x \in X_j \text{ s.t. } f_j(x) \ge 1/2 \wedge x \in T_j\}.$$

Logs that belong to stressed machines and to which the classifier $f_j$ (trained using $\cup_{i \ne j} X_i$) assigned a probability of at least 1/2 of being stressed are true positives for $X_j$. Similarly, the set of false positives $FP_j$ for $X_j$ (logs belonging to unstressed machines but detected as more likely stressed) is defined as:

$$FP_j = \{x \in X_j \text{ s.t. } f_j(x) \ge 1/2 \wedge x \in F_j\}.$$

Notice that the true negative and false negative sets are symmetrically defined.
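The partitioning and the two sets can be sketched as follows. This is an illustration only: the paper relies on existing tooling for the evaluation, whereas the code below uses standard-library Python, hypothetical function names, and a `score` callable standing in for the trained classifier f_j.

```python
import random


def k_fold_chunks(items, k=10, seed=None):
    """Shuffle items and split them into k chunks whose sizes differ by at most one."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def true_and_false_positives(test_chunk, labels, score, threshold=0.5):
    """Return (TP_j, FP_j) for one test chunk.

    labels[x] is True for a stressed log file; score(x) plays the role of f_j(x).
    """
    flagged = [x for x in test_chunk if score(x) >= threshold]
    tp = [x for x in flagged if labels[x]]
    fp = [x for x in flagged if not labels[x]]
    return tp, fp
```

Applied to 660 log file identifiers, `k_fold_chunks` returns 10 chunks of 66 files; for each chunk j, the classifier is trained on the 9 other chunks and `true_and_false_positives` is evaluated on chunk j.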
To get a closer look at $f_j$, one can use Receiver Operating Characteristics (ROC). That is, let $s \in [0, 1]$ be a "safety level" one wants to apply to $f$-based decisions. Let $X_j^s = \{x \in X_j, f_j(x) \ge s\}$ be the subset of $X_j$ containing only the logs detected as stressed with probability at least $s$. For each value of $s$, it is thus possible to define a true positive rate $TPR_s = |X_j^s \cap T_j| / |T_j|$ and a false positive rate $FPR_s = |X_j^s \cap F_j| / |F_j|$. The graphical representation of the obtained $\{FPR_s, TPR_s\}$ couples provides a precise visual description of the performance of $f$, as in Figure 5, presented shortly hereafter.
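The couples can be computed directly from these definitions by sweeping s over the observed scores. The sketch below is a plain-Python illustration with hypothetical function names; it also integrates the resulting curve with the trapezoidal rule to obtain the area under it.

```python
def roc_points(scores, labels):
    """Return the (FPR_s, TPR_s) couples obtained by sweeping the safety level s."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Each distinct score is used as a threshold, from the strictest to the loosest.
    for s in sorted(set(scores), reverse=True):
        flagged = [y for score, y in zip(scores, labels) if score >= s]
        tpr = sum(1 for y in flagged if y) / positives
        fpr = sum(1 for y in flagged if not y) / negatives
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

For example, with scores `[0.9, 0.8, 0.7, 0.2]` and labels stressed, unstressed, stressed, unstressed, the curve goes through (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), for an area of 0.75.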
B. Results analysis

In the following, after exploring the detailed results obtained using a typical trained classifier, we study the impact of the dimension of the embedding host space. We then study the runtime overhead of our approach.

1) Accuracy: Figure 5 presents the ROCs obtained on a typical configuration. More precisely, in this setup, we used dim(T) = 20 and explored various aggregation/classifier configurations. The results are very good, with Neural Network and Random Forest exhibiting a strong classification accuracy (> 95% AUC). The aggregation technique (i.e., based on tf-idf or barycenter) has little impact. Naive Bayes performs considerably better than random (77% and 81% AUC for tf-idf and barycenter respectively), but is visibly less precise than the other two classifiers. These very good results confirm the soundness of the approach.

Fig. 5: Receiver Operating Characteristic of 3 classifiers, for dim(T) = 20. This plot shows the True Positive Rate of every classifier as a function of the False Positive Rate of the same classifier.

One can have a more detailed look at the origin of misclassifications. Table III exhibits the confusion matrix of Neural Network (using barycenter and dim(T) = 20). Although around 90% of the targets get correctly categorized, one can see that the errors are slightly leaning towards false positives (that is, an unstressed system is wrongly categorized as stressed). Although this is not the purpose of this study, it is possible to exploit this imbalance for an overall better classification accuracy (for instance by raising the 1/2 threshold above which a system is categorized as stressed). The stress patterns are not detected homogeneously, with Latency stress being missed roughly 7 times less often than CPU stress (1.5% versus 10.6% of missed detections). However, because of the accuracy of the considered classifier, these results only concern a small number of events, and therefore have a low statistical power. Table IV presents the misclassified entries by application: all three applications (namely Bono, Sprout and Homestead) yield similar classification accuracy.
TABLE III: Confusion matrix for the Neural Network classifier, using dim(T) = 20 and barycenter: detailed by stress type

Stress Type    Detected as Stressed (True)    Detected as Unstressed (False)
No Stress      0.115                          0.885
Packet loss    0.939                          0.061
Latency        0.985                          0.015
Memory         0.939                          0.061
Disk           0.970                          0.030
CPU            0.893                          0.106

Fig. 6: Area Under the ROC Curves (AUC) capturing the performance of our classifiers, as a function of the number of dimensions of the embedding space

TABLE IV: Confusion matrix for the Neural Network classifier, using dim(T) = 20 and barycenter: detailed by application

Target Machine    Requests    Number of misclassifications    Success Rate (%)
Bono              220         19                              91.4
Sprout            220         17                              92.3
Homestead         220         20                              90.9

2) Parameters sensitivity: We here focus on two important choices: the dimension of the embedding space dim(T), and the classifier algorithm.

To compare our classifiers, we use the Area Under Curve (AUC) measure. In a nutshell, it measures the area under the ROC of a classifier. That is, an AUC of 1 denotes a perfect classification, an AUC of 0.5 corresponds to a random prediction, and lower values denote a worse-than-random prediction. It is also commonly presented, given a random positive (stressed) and a random negative (unstressed) example, as the probability for the classifier to rank the negative example below (that is, as less stressed than) the positive example. The ROC AUC is known to summarize ROC curves well [1].
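The probabilistic reading of the AUC can be checked on small examples by comparing every stressed example with every unstressed one. The following sketch is a direct, quadratic-time illustration with a hypothetical function name; counting ties as one half is a usual convention, not something stated in the paper.

```python
def rank_auc(scores, labels):
    """Probability that a random stressed example is scored above a random unstressed one.

    Ties count for one half.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A classifier that always ranks stressed files first scores 1, one that gives the same score to every file scores 0.5, and one that systematically inverts the ranking scores 0.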
Figure 6 provides the AUC measures of our 3 considered classifiers for various embedding space dimensions. As expected, increasing the number of dimensions increases the classification accuracy: more information helps. This increase is however very limited: apart from Neural Network, where increasing dimensions from 5 to 20 has a visible impact, classifier accuracies all stay stable for dim(T) > 20. This is good news, as such a parameter can be hard to tune a priori.

More generally, this figure confirms the previous observations: classification is very accurate, especially using Neural Network and Random Forest, with AUCs consistently scoring above 0.95.
3) Timing performance: When selecting a classifier, the expected classification accuracy is the most important criterion. However, in operational contexts, another crucial criterion is the computational complexity of both training and prediction.

To provide some insights, we recorded the wall clock times of the training of the machine learning models (Figure 7) and of individual predictions of these models (Figure 8). Those were measured on a classical MacBook Pro with 16 GB of RAM and a quad-core Intel i7.

Fig. 7: Training wall time of the classifiers on 660 instances, for varying embedding space dimensions. Notice the log-log scale.

Interestingly, these figures provide a new perspective on our classifiers. Results confirm the reputation of each of those models: Naive Bayes is very simple; it is quickly trained and provides fast answers. Neural Network is a considerably more complex model whose training requires significantly more time. However, once trained, it is able to answer reasonably fast. Conversely, Random Forest is quickly trained but requires considerably more time to issue predictions. Issuing a prediction requires on average 66 ms for Random Forest, 5 ms for Naive Bayes and 11 ms for Neural Network.

Not surprisingly, increasing dim(T) comes with a computational cost (as it increases the number of features on which each model is trained), but since Section IV-B1 shows that dim(T) = 20 is already sufficient to obtain accurate results, we conclude that this approach is computationally tractable. The most prominent decision is the choice of the classifier: although the simplest possible classifier (Naive Bayes) provides cheap and reasonable answers, more accurate classifiers like Random Forest or Neural Network cost a bit more, either at training time or at prediction time.
Toconclude,thisresultssectionexploredtheperformanceofthreestateoftheartclassiersexploitingthelogpositions.
Theseclassiersexhibitastrongperformanceforareasonablecost.
Themostimportantparameter,thedimensionofthehostspacedim(T),isnotverysensitive:valuesrangingfrom20to200willroughlydeliverthesameperformance.
Althoughmanyparameterscouldbepreciselytunedtooptimizetheclassiers,webelievethesegoodresultsobtainedusingmostlydefaultvaluesofCOTStoolsalreadyvalidatethesoundnessofourapproach.
More precisely, these show the extremely powerful effect of the word2vec embedding applied to logs: it allows summarizing each log file as a single point in T while keeping enough information to allow an efficient classification.
Fig. 8: Time taken for a trained model to issue one prediction. Notice the log-lin scale.
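One simple way to turn word vectors into a single point per log file is to average the vectors of the words it contains. The sketch below assumes this averaging scheme and uses a tiny hand-written embedding table; `toy_embedding`, `log_file_position` and the two-dimensional vectors are illustrative assumptions, not output of a trained word2vec model.

```python
def log_file_position(tokens, embedding, dim):
    """Mean of the vectors of the known tokens; the zero vector if none is known."""
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        vec = embedding.get(tok)
        if vec is None:
            continue  # tokens absent from the vocabulary are ignored
        for i, v in enumerate(vec):
            total[i] += v
        count += 1
    if count == 0:
        return total
    return [s / count for s in total]


# Hypothetical 2-dimensional embedding of three log words.
toy_embedding = {
    "kernel": [1.0, 0.0],
    "oom": [0.0, 1.0],
    "killed": [1.0, 1.0],
}

tokens = "kernel oom killed process 4242".split()
print(log_file_position(tokens, toy_embedding, dim=2))
```

The resulting vector has dim(T) coordinates regardless of the length of the log file, which is what lets a standard classifier consume it as a fixed-size feature vector.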
V. DISCUSSION
Our approach leaves one common question of all machine learning approaches intact: how general are the learned models? In other words, are the classifiers built in this context able to provide accurate answers in different contexts, in different application environments, or under different injection campaigns? Although this question is definitely of interest, we argue its scope goes well beyond this paper.
Philosophically, this study shows that it is easy to train efficient classifiers.
But informally, a classifier is only as good as its training data.
The availability of labelled training data can clearly limit the applicability of our approach.
The advantage of fault injection is to gather relevant labelled datasets in a short time period.
Although it enables us to evaluate our approach in a straightforward manner, this implementation can be cumbersome.
However, while we rely on fault injection to gather datasets, other sources exist: user-based feedback, crowdsourced datasets, and crash reports of large-scale deployments.
In our previous work [19] we analyzed monitoring counters such as CPU consumption or number of disk accesses for anomaly detection.
Results from counter-based detection showed a good predictive performance that is yet not fully aligned with the results of this study.
For instance, latency errors were significantly harder to detect.
In this study, we show that by solely mining syslog files we could detect anomalies with high accuracy for all types of anomalies.
Consequently, we believe our approach is largely promising.
As for future work, we plan to study a hybrid approach leveraging both logging and counter-based data in order to further evaluate their potential complementarity, and to investigate what type of logs enhance or weaken the efficiency of our approach.
Finally, results presented in this paper show that our approach detects with the same accuracy the stresses injected in each type of application of our case study (i.e., proxy, router and database).
In other words, the analysis of system-related logs such as syslog is an efficient way to summarize application behaviors for stress detection with no regard to the type of application.
We believe however that syslog events are not enough to derive application data flows that may allow detecting other types of anomalies or, more importantly for administrators, diagnosing the origin of an anomaly.
Consequently, we need to explore in future work other types of logs, notably the ones generated by our case study application.
VI. RELATED WORK
In this study, we use a word2vec-based method for log mining with a validation-purposed application of detecting stressed behaviors in computing systems.
word2vec is a method for learning high-quality vector representations of words.
It has been used for NLP in some previous works but not for the analysis of log files.
In comparison, our previous work [19] focuses on anomaly detection based on monitoring data collected by means of a specific software agent, deployed beforehand on target machines, and providing numerical metrics on the system behavior.
Here we exploit the default system-produced textual logs to predict stress.
Besides the deep technical differences, our approach allows different use cases, like post-mortem analysis of the behavior of the several processes being executed in the targeted systems.
Consequently, in the following we present separately several works related to NLP and other works related to log file analysis for detection purposes.
NLP applications.
In the literature, most NLP algorithms are used for document processing [26], to isolate references to a given subject in a document and detect the sentiments of the writer, or to exploit tweets [11] to detect cyber-attacks such as distributed denial of service.
To the best of our knowledge, relatively few works exploit NLP for a purpose other than document analysis.
We provide here a quick summary of these non-traditional uses of NLP.
In [15], the authors use an NLP technique called Latent Semantic Indexing to identify source code documents that match a user query expressed in natural language.
They use the same technique in [14] to detect similar pieces of code (i.e., duplicated functions) in the code of software systems.
In addition, Latent Dirichlet Allocation is used for a similar purpose in [20].
NLP is also applied to network packet payloads for network intrusion detection in [18].
In [10], customers' accesses to business URLs are analyzed using a word2vec-based method to propose better services to customers.
Finally, NLP is also used to detect design and requirement debts [13] from the comments of ten open source projects.
Log mining for detection purposes.
Although some works propose new methods to generate relevant log events as in [4], log files still gather a wide range of events, and evaluating their information in the execution context or weighting their gravity is still intricate.
For instance, the authors of [17] analyze a wide range of logs with engineers and compare events signaling failures to the engineers' feedback on actual failures.
It turns out that the number of actual failures is lower than the number of failures reported by logs.
They also point out that the syslog message severity level is of "dubious value", and that it is essential to take into account the operational context during which log events are collected.
Nevertheless, log file analysis for anomaly (e.g., crash, fault, OS stressing...) detection in computing systems has been widely studied and it is still an active research field, in particular when considering the ever more complex recent computing systems.
Execution traces of streaming applications are analyzed in [9] in order to detect anomalies.
The authors analyze traces by means of the merging pattern mining method applied on patterns of events (i.e., lines of traces).
Then they build a graph representing the data flow between the different computing units of the application.
Likewise, in [21] the authors analyze the temporality of execution traces in order to derive system states from their estimated control flows.
The authors of [25] also work on the ordered nature of log files.
They exploit time series potentially hidden behind log events for the detection of failure symptoms.
They use probabilistic modeling with a mixture of Hidden Markov Models (HMM) to represent different time windows (i.e., sessions) of log events.
They propose a new online method for learning the HMM mixture.
Automatic techniques based on machine learning or statistical algorithms have been widely used for this matter, as in [6] where the authors propose a new approach for disk failure prediction.
More precisely, they analyze, by means of a Support Vector Machine (SVM) model, sequences of syslog events based on sequences of syslog tag numbers or key strings in events.
In [22], the author proposes a new algorithm for the clustering of log events and implements a tool based on it named SLCT.
Log file parsing is exploited in [24].
The parsing uses log patterns identified from a static analysis of source code.
Then, two types of features are computed from the entire available log files, and they are fed to a PCA-based anomaly detection algorithm for offline detection.
A log extractor for anomaly detection is studied in [12].
The extractor uses log clustering based on the Levenshtein edit distance to evaluate the similarities among log event strings (i.e., two strings are close together if only a small number of edit actions is needed to change the first string into the other).
Templates are then extracted from log clusters.
Finally, a sequence of log events matching patterns is created and fed to a machine learning algorithm.
Naive Bayes and Recurrent Neural Networks are evaluated.
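For readers unfamiliar with the Levenshtein distance mentioned above, the following sketch shows the standard dynamic-programming computation. It is a generic textbook version, not the implementation of [12], and the two log lines are made-up examples.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop (distance is symmetric)
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca by cb (or keep it)
            ))
        previous = current
    return previous[-1]


# Two hypothetical log events that differ only in the user name.
line_a = "sshd: session opened for user alice"
line_b = "sshd: session opened for user bob"
print(levenshtein(line_a, line_b))
print(levenshtein("kitten", "sitting"))  # classic example: 3
```

Log events generated by the same print statement typically differ only in a few variable fields, so their distance is small relative to their length, which is what makes this metric usable for clustering events into templates.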
VII. CONCLUSIONS AND FUTURE WORK
In this paper, we tackled the problem of anomaly detection by mining logs produced by running systems.
Differently from previous studies, we develop a linguistic approach by considering logs as regular plain text documents.
This enables exploiting recent NLP techniques to extract information from the grammatical structure and context of log events.
Log files are represented as a set of features that can be processed by standard machine learning algorithms.
As such, this approach shifts the burden of log preprocessing toward the collection of representative datasets.
It is a good trade when data is massively available, as in recent distributed systems.
Our experimental campaigns on different components of a VNF rely on fault injection to synthesize anomalous behaviors and collect relevant datasets on demand.
We more particularly focus on the case of stress detection and show that strong predictors (≈ 90% accuracy) are easily trained with no human intervention in the loop.
Even though we focus on stress detection in this work, our approach is fitted for computing systems administrators for the online detection of any type of anomaly.
As for future work, we plan to explore unsupervised classifiers that would not restrict the scope of our approach to labelled training data and mostly known anomalies.
Syslog files are used in this study; however, we plan to investigate what type of log files (e.g., dmesg, application logs...) enhance or weaken the efficiency of our approach.
Also, we plan to extend our study to more precise online event troubleshooting while combining this detection approach with our previous work on counter-based detection [19].
REFERENCES
[1] A. P. Bradley, "The use of the area under the roc curve in the evaluation of machine learning algorithms," Pattern recognition, vol. 30, no. 7, pp. 1145–1159, 1997.
[2] L. Cao, P. Sharma, S. Fahmy, and V. Saxena, "Nfv-vital: A framework for characterizing the performance of virtual network functions," in Network Function Virtualization and Software Defined Network (NFV-SDN), 2015 IEEE Conference on, Nov 2015, pp. 93–99.
[3] M. Cinque, D. Cotroneo, R. D. Corte, and A. Pecchia, "Characterizing direct monitoring techniques in software systems," IEEE Transactions on Reliability, vol. 65, no. 4, pp. 1665–1681, Dec 2016.
[4] M. Cinque, D. Cotroneo, and A. Pecchia, "Event logs for the analysis of software failures: A rule-based approach," IEEE Transactions on Software Engineering, vol. 39, no. 6, pp. 806–821, June 2013.
[5] M. Farshchi, J. G. Schneider, I. Weber, and J. Grundy, "Experience report: Anomaly detection of cloud application operations using log and cloud metric correlation analysis," in Software Reliability Engineering (ISSRE), 2015 IEEE 26th International Symposium on, Nov 2015, pp. 24–34.
[6] R. W. Featherstun and E. W. Fulp, "Using syslog message sequences for predicting disk failures," in Proceedings of the 24th International Conference on Large Installation System Administration, ser. LISA'10. Berkeley, CA, USA: USENIX Association, 2010, pp. 1–10.
[7] R. Gerhards, "The Syslog Protocol," RFC Editor, RFC 5424, March 2009.
[8] S. He, J. Zhu, P. He, and M. R. Lyu, "Experience report: System log analysis for anomaly detection," in 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), Oct 2016, pp. 207–218.
[9] O. Iegorov, V. Leroy, A. Termier, J. F. Mehaut, and M. Santana, "Data mining approach to temporal debugging of embedded streaming applications," in 2015 International Conference on Embedded Software (EMSOFT), Oct 2015, pp. 167–176.
[10] R. Kanagasabai, A. Veeramani, H. Shangfeng, K. Sangaralingam, and G. Manai, "Classification of massive mobile web log urls for customer profiling analytics," in 2016 IEEE International Conference on Big Data (Big Data), Dec 2016, pp. 1609–1614.
[11] R. P. Khandpur, T. Ji, S. Jan, G. Wang, C.-T. Lu, and N. Ramakrishnan, "Crowdsourcing cybersecurity: Cyber attack detection using social media," arXiv preprint arXiv:1702.07745, 2017.
[12] C. Liu, "Data analysis of minimally-structured heterogeneous logs: An experimental study of log template extraction and anomaly detection based on recurrent neural network and naive bayes." Master's thesis, KTH, School of Computer Science and Communication (CSC), 2016.
[13] E. Maldonado, E. Shihab, and N. Tsantalis, "Using natural language processing to automatically detect self-admitted technical debt," IEEE Transactions on Software Engineering, vol. PP, no. 99, pp. 1–1, 2017.
[14] A. Marcus and J. I. Maletic, "Identification of high-level concept clones in source code," in Proceedings 16th Annual International Conference on Automated Software Engineering (ASE 2001), Nov 2001, pp. 107–114.
[15] A. Marcus, A. Sergeyev, V. Rajlich, and J. I. Maletic, "An information retrieval approach to concept location in source code," in 11th Working Conference on Reverse Engineering, Nov 2004, pp. 214–223.
[16] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in neural information processing systems, 2013, pp. 3111–3119.
[17] A. Oliner, "What supercomputers say: A study of five system logs," in Proceedings of DSN 2007, 2007.
[18] K. Rieck and P. Laskov, "Detecting unknown network attacks using language models," in Proceedings of the Third International Conference on Detection of Intrusions and Malware & Vulnerability Assessment, ser. DIMVA'06. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 74–90.
[19] C. Sauvanaud, K. Lazri, M. Kaaniche, and K. Kanoun, "Anomaly detection and root cause localization in virtual network functions," in 27th IEEE International Symposium on Software Reliability Engineering, ISSRE 2016, Ottawa, ON, Canada, October 23-27, 2016, 2016, pp. 196–206.
[20] T. Savage, B. Dit, M. Gethers, and D. Poshyvanyk, "Topicxp: Exploring topics in source code using latent dirichlet allocation," in 2010 IEEE International Conference on Software Maintenance, Sept 2010, pp. 1–6.
[21] J. Tan, X. Pan, S. Kavulya, R. Gandhi, and P. Narasimhan, "Salsa: Analyzing logs as state machines," in Proceedings of the First USENIX Conference on Analysis of System Logs, ser. WASL'08. Berkeley, CA, USA: USENIX Association, 2008, pp. 6–6.
[22] R. Vaarandi, "A data clustering algorithm for mining patterns from event logs," in Proceedings of the 3rd IEEE Workshop on IP Operations Management (IPOM 2003) (IEEE Cat. No.03EX764), Oct 2003, pp. 119–126.
[23] Y. Watanabe, H. Otsuka, M. Sonoda, S. Kikuchi, and Y. Matsumoto, "Online failure prediction in cloud datacenters by real-time message pattern learning," in Cloud Computing Technology and Science (CloudCom), 2012 IEEE 4th International Conference on, Dec 2012, pp. 504–511.
[24] W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, "Detecting large-scale system problems by mining console logs," in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, ser. SOSP'09. New York, NY, USA: ACM, 2009, pp. 117–132.
[25] K. Yamanishi and Y. Maruyama, "Dynamic syslog mining for network failure monitoring," in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, ser. KDD'05. New York, NY, USA: ACM, 2005, pp. 499–508.
[26] J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack, "Sentiment analyzer: extracting sentiments about a given topic using natural language processing techniques," in Third IEEE International Conference on Data Mining, Nov 2003, pp. 427–434.