factorwarez

warez 时间:2021-01-03 阅读:()

LearningUsefulSystemCallAttributesforAnomalyDetectionGauravTandonandPhilipK.
ChanDepartmentofComputerSciencesFloridaInstituteofTechnologyMelbourne,FL32901{gtandon,pkc}@cs.
fit.
eduAbstractTraditionalhost-basedanomalydetectionsystemsmodelnormalbehaviorofapplicationsbyanalyzingsystemcallsequences.
Currentsequenceisthenexamined(usingthemodel)foranomalousbehavior,whichcouldcorrespondtoattacks.
Thoughthesetechniqueshavebeenshowntobequiteeffective,akeyelementseemstobemissing–theinclusionandutilizationofthesystemcallarguments.
Recentresearchshowsthatsequence-basedsystemsarepronetoevasion.
Weproposeanideaoflearningdifferentrepresentationsforsystemcallarguments.
Resultsindicatethatthisinformationcanbeeffectivelyusedfordetectingmoreattackswithreasonablespaceandtimeoverhead.
IntroductionIntrusiondetectionsystems(IDSs)aregenerallycategorizedassignature-basedandanomaly-based.
Insignaturedetection,systemsaremodeleduponknownattackpatternsandthetestdataischeckedfortheoccurrenceofthesepatterns.
Suchsystemshaveahighdegreeofaccuracybutsufferfromtheinabilitytodetectnovelattacks.
Anomalydetectioncomplementssignaturedetectionbymodelingnormalbehaviorofapplications.
Significantdeviationsfromthisbehaviorareconsideredanomalous.
Suchsystemscandetectnovelattacks,butgeneratefalsealarmssincenotallanomaliesarenecessarilyhostile.
Intrusiondetectionsystemscanalsobecategorizedasnetwork-based,whichdealswithnetworktraffic;andhost-based,whereoperatingsystemeventsaremonitored.
Mostofthetraditionalhost-basedanomalydetectionsystemsfocusonsystemcallsequences,theassumptionbeingthatamaliciousactivityresultsinanabnormal(novel)sequenceofsystemcalls.
Recentresearchhasshownthatsequence-basedsystemscanbecompromisedbyconductingmimicryattacks.
Suchattacksarepossiblebyinsertingdummysystemcallswithinvalidargumentssuchthattheyformalegitimatesequenceofevents.
Adrawbackofsequence-basedapproachesliesintheirnon-utilizationofotherkeyattributes,namelythesystemcallarguments.
Theefficacyofsuchsystemsmightbeimproveduponifarichersetofattributes(returnvalue,errorstatusandotherarguments)associatedwithasystemCopyright2005,AmericanAssociationforArtificialIntelligence(www.
aaai.
org).
Allrightsreserved.
callisusedtocreatethemodel.
Inthispaperwepresentahost-basedanomalydetectionsystemthatisbaseduponsystemcallarguments.
WelearntheimportantattributesusingavariantofarulelearningalgorithmcalledLERAD.
Wealsopresentvariousargument-basedrepresentationsandcomparetheirperformancewithsomeofthewell-knownsequence-basedtechniques.
Ourmaincontributionsare:(1)weincorporatevarioussystemcallattributes(returnvalue,errorstatusandotherarguments)forbetterapplicationmodeling;(2)weproposeenrichedrepresentationsusingsystemcallsequencesandarguments;(3)weuseavariantofarulelearningalgorithmtolearntheimportantattributesfromthefeaturespace;(4)wedemonstratetheeffectivenessofourmodels(intermsofnumberofattackdetections,timeandspaceoverhead)byperformingexperimentsonthreedifferentdatasets;and(5)wepresentananalysisoftheanomaliesdetected.
Oursequence-basedmodeldetectsmoreattacksthantraditionaltechniques,indicatingthattherulelearningtechniqueisabletogeneralizewell.
Ourargument-basedsystemsareabletodetectmoreattacksthantheirsequence-basedcounterparts.
Thetimeandspacerequirementsforourmodelsarereasonableforonlinedetection.
RelatedWorkTime-delayembedding(tide)recordsexecutionsofnormalapplicationexecutionsusinglook-aheadpairs(Forrestetal.
1996).
UNIXcommandsequenceswerealsoexaminedtocaptureuserprofilesandcomputesequencesimilarityusingadjacenteventsinaslidingwindow(LaneandBrodley1997).
Sequencetime-delayembedding(stide)memorizesallcontiguoussequencesofpredetermined,fixedlengthsduringtraining(Warrender,Forrest,andPearlmutter1999).
Afurtherextension,calledsequencetime-delayembeddingwith(frequency)threshold(t-stide),wassimilartostidewiththeexceptionthatthefrequenciesofthesefixedlengthsequenceswerealsotakenintoaccount.
Raresequenceswereignoredfromthenormalsequencedatabaseinthisapproach.
Allthesetechniquesmodelednormalbehaviorbyusingfixedlengthpatternsoftrainingsequences.
AschemetogeneratevariablelengthpatternsbyusingTeiresias(RigoutsosandFloratos1998),apattern-discoveryalgorithminbiologicalsequences,wasproposedin(Wespi,Dacier,andDebar1999,2000).
Thesetechniquesimproveduponthefixedlengthmethods.
Thoughalltheaboveapproachesusesystemcallsequences,noneofthemmakeuseofthesystemcallarguments.
GivensomeknowledgeabouttheIDS,attackerscandevisesomemethodologiestoevadesuchintrusiondetectionsystems(Tan,Killourhy,andMaxion2002;WagnerandSoto2002).
Suchattacksmightbedetectedifthesystemcallargumentsarealsoevaluated(Kruegeletal.
2003),andthismotivatesourcurrentwork.
Ourtechniquemodelsonlytheimportantcharacteristicsandgeneralizesfromit;previousworkemphasizesonthestructureofallthearguments.
ApproachSinceourgoalistodetecthost-basedintrusions,systemcallsareinstrumentalinoursystem.
Weincorporatethesystemcallswithitsargumentstogeneratearichermodel.
ThenwepresentdifferentrepresentationsformodelingasystemusingLERAD,whichisdiscussednext.
LearningRulesforAnomalyDetection(LERAD)Algorithmsforfindingassociationrules,suchasApriori(Agrawal,Imielinski,andSwami1993),generatealargenumberofrules.
Thisincursalargeoverheadandmaynotbeappropriateforonlinedetection.
Wewouldliketohaveaminimalsetofrulesdescribingthenormaltrainingdata.
LERADisaconditionalrule-learningalgorithmthatformsasmallsetofrules.
Itisbrieflydescribedhere;moredetailscanbeobtainedfrom(MahoneyandChan2003).
LERADlearnsrulesoftheform:},,{,,21KKxxXbBaA∈==(1)whereA,B,andXareattributesanda,b,x1,x2arevaluesforthecorrespondingattributes.
Thelearnedrulesrepresentthepatternspresentinthenormaltrainingdata.
Theset{x1,x2,…}intheconsequentconstitutesalluniquevaluesofXwhentheantecedentoccursinthetrainingdata.
Duringthedetectionphase,records(ortuples)thatmatchtheantecedentbutnottheconsequentofaruleareconsideredanomalousandananomalyscoreisassociatedwitheveryruleviolation.
Thedegreeofanomalyisbasedonaprobabilisticmodel.
Foreachrule,fromthetrainingdata,theprobability,p,ofobservingavaluenotintheconsequentisestimatedby:nrp/=(2)whereristhecardinalityoftheset,{x1,x2,…},intheconsequentandnisthenumberofrecords(tuples)thatsatisfytheruleduringtraining.
Thisprobabilityestimationofnovel(zerofrequency)eventsisfrom(WittenandBell1991).
Sincepestimatestheprobabilityofanovelevent,thelargerpis,thelessanomalousanoveleventis.
Hence,duringdetection,whenanoveleventisobserved,thedegreeofanomaly(anomalyscore)isestimatedby:rnpScoreAnomaly//1==(3)Anon-stationarymodelisassumedforLERAD–onlythelastoccurrenceofaneventisassumedimportant.
Sincenoveleventsareburstyinconjunctionwithattacks,afactortisintroduced–itisthetimeintervalsincethelastnovel(anomalous)attributevalue.
Whenanoveleventoccurredrecently(smallvalueoft),anoveleventismorelikelytooccuratthepresentmoment.
Hence,theanomalyscoreismeasuredbyt/p.
Sincearecordcandeviatefromtheconsequentofmorethanonerule,thetotalanomalyscoreofarecordisaggregatedoveralltherulesviolatedbythetupletocombinetheeffectfromviolationofmultiplerules:∑∑==rntptScoreAnomalyTotal//(4)Themoretheviolations,moresignificanttheanomalyis,andthehighertheanomalyscoreshouldbe.
Analarmisraisedifthetotalanomalyscoreisaboveathreshold.
TherulegenerationphaseofLERADcomprisesof4mainsteps:(i)Generateinitialruleset:TrainingsamplesarepickedupatrandomfromarandomsubsetSoftrainingexamples.
Candidaterules(asdepictedinEquation1)aregeneratedfromthesesamples.
(ii)Coveragetest:Therulesetisfilteredbyremovingrulesthatdonotcover/describeallthetrainingexamplesinS.
Ruleswithlowerrateofanomalies(lowerr/n)arekept.
(iii)UpdaterulesetbeyondS:Extendtherulesovertheremainingtrainingdatabyaddingvaluesfortheattributeintheconsequentwhentheantecedentistrue.
(iv)Validatetheruleset:Rulesareremovediftheyareviolatedbyanytupleinthevalidationset.
Sincesystemcallisthekey(pivotal)attributeinahostbasedsystem,wemodifiedLERADsuchthattheruleswereforcedtohaveasystemcallasaconditionintheantecedent.
Theonlyexceptionwemadewasthegenerationofruleswithnoantecedent.
SystemcallandargumentbasedrepresentationsWenowpresentthedifferentrepresentationsforLERAD.
Sequenceofsystemcalls:S-LERAD.
Usingsequenceofsystemcallsisaverypopularapproachforanomalydetection.
Weusedawindowoffixedlength6(asthisisclaimedtogivebestresultsinstideandt-stide)andfedthesesequencesofsixsystemcalltokensasinputtuplestoLERAD.
ThisrepresentationisselectedtoexplorewhetherLERADwouldbeabletocapturethecorrelationsamongsystemcallsinasequence.
Also,thisexperimentwouldassistusincomparingresultsbyusingthesamealgorithmforsystemcallsequencesaswellastheirarguments.
AsamplerulelearnedinaparticularrunofS-LERADis:}{,,3621munmapSCopenSCmmapSCcloseSC∈===(1/pvalue=455/1)Thisruleisanalogoustoencounteringcloseasthefirstsystemcall(representedasSC1),followedbymmapandmunmap,andopenasthesixthsystemcall(SC6)inawindowofsize6slidingacrosstheaudittrail.
Eachruleisassociatedwithann/rvalue.
Thenumber455inthenumeratorreferstothenumberoftraininginstancesthatcomplywiththerule(ninEquation3).
Thenumber1inthedenominatorimpliesthatthereexistsjustonedistinctvalueoftheconsequent(munmapinthiscase)whenalltheconditionsinthepremiseholdtrue(rinEquation3).
Argument-basedmodel:A-LERAD.
Weproposethatargumentandotherkeyattributeinformationisintegraltomodelingagoodhost-basedanomalydetectionsystem.
Weextractedarguments,returnvalueanderrorstatusofsystemcallsfromtheauditlogsandexaminedtheeffectsoflearningrulesbaseduponsystemcallsalongwiththeseattributes.
Anyvaluefortheotherarguments(giventhesystemcall)thatwasneverencounteredinthetrainingperiodforalongtimewouldraiseanalarm.
Weperformedexperimentsonthetrainingdatatomeasurethemaximumnumberofattributes(MAX)foreveryuniquesystemcall.
Wedidnotusethetestdatafortheseexperimentssothatwedonotgetanyinformationaboutitbeforeourmodelisbuilt.
SinceLERADacceptsthesame(fixed)numberofattributesforeverytuple,wehadtoinsertaNULLvalueforanattributethatwasabsentinaparticularsystemcall.
Theorderoftheattributeswithinthetuplewasmadesystemcalldependent.
SincewemodifiedLERADtoformrulesbaseduponthesystemcalls,thereisconsistencyamongsttheattributesforanyspecificsystemacrossallmodels.
Byincludingallattributesweutilizedthemaximumamountofinformationpossible.
Mergingsystemcallsequenceandargumentinformationofthecurrentsystemcall:M-LERAD.
Thefirstrepresentationwediscussedisbaseduponsequenceofsystemcalls;thesecondonetakesintoconsiderationotherrelevantattributes,whoseefficacyweclaiminthispaper;sofusingthetwotostudytheeffectswasanobviouschoice.
MergingisaccomplishedbyaddingmoreattributesineachtuplebeforeinputtoLERAD.
Eachtuplenowcomprisesofthesystemcall,MAXnumberofattributesforthecurrentsystemcall,andthepreviousfivesystemcalls.
Then/rvaluesobtainedfromtheallrulesviolatedareaggregatedintoananomalyscore,whichisthenusedtogenerateanalarmbaseduponthethreshold.
Mergingsystemcallsequenceandargumentinformationforallsystemcallsinthesequence:M*-LERAD.
Alltheproposedvariants,namelyS-LERAD,A-LERADandM-LERAD,considerasequenceof6systemcallsand/ortakeintotheargumentsforthecurrentsystemcall.
WeproposeanothervariantcalledmultipleargumentLERAD(M*-LERAD)–inadditiontousingthesystemcallsequenceandtheargumentsforthecurrentsystemcall,thetuplesnowalsocomprisetheargumentsforallsystemcallswithinthefixedlengthsequenceofsize6.
Eachtuplenowcomprisesofthecurrentsystemcall,MAXattributesforthecurrentsystemcall,5previoussystemcallsandMAXattributesforeachofthosesystemcalls.
ExperimentalEvaluationOurgoalistostudyifLERADcanbemodifiedtodetectattack-basedanomalieswithfeaturespacescomprisingsystemcallsandtheirarguments.
DatasetsandexperimentalprocedureWeusedthefollowingdatasetsforourexperiments:(i)The1999DARPAintrusiondetectionevaluationdataset:DevelopedattheMITLincolnLab,weselectedtheBSMlogsfromSolarishosttracingsystemcallsthatcontains33attacks.
Attackclassificationisprovidedin(Kendell1999).
Thefollowingapplicationswerechosen:ftpd,telnetd,sendmail,tcsh,login,ps,eject,fdformat,sh,quotaandufsdump,duetotheirvariedsizes(1500–over1millionsystemcalls).
Weexpectedtofindagoodmixofbenignandmaliciousbehaviorintheseapplications.
Trainingwasperformedonweek3dataandtestingonweeks4and5.
Anattackisconsideredtobedetectedifanalarmisraisedwithin60secondsofitsoccurrence(sameastheDARPAevaluation).
(ii)lpr,loginandpsapplicationsfromtheUniversityofNewMexico(UNM):Thelprapplicationcomprisedof2703normaltracescollectedfrom77hostsrunningSUNOS4.
1.
4attheMITAILab.
Another1001tracesresultfromtheexecutionofthelprcpattackscript.
TracesfromtheloginandpsapplicationswereobtainedfromLinuxmachinesrunningkernel2.
0.
35.
HomegrownTrojanprogramswereusedfortheattacktraces.
(iii)Microsoftexcelmacrosexecutions(FIT-UTKdata):Normalexcelmacroexecutionsareloggedin36distincttraces.
2malicioustracesmodifyregistrysettingsandexecutesomeotherapplication.
SuchabehaviorisexhibitedbytheILOVEYOUwormwhichopensthewebbrowsertoaspecifiedwebsiteandexecutesaprogram,modifyingregistrykeysandcorruptinguserfiles,resultinginadistributeddenialofservice(DDoS)attack.
TheinputtuplesforS-LERADwere6contiguoussystemcalls;forA-LERADtheyweresystemcallswiththeirreturnvalue,errorstatusandarguments;TheinputsforM-LERADweresequencesofsystemcallswithargumentsofthecurrentsystemcall;whereasinM*-LERAD,theyweresystemcallsequenceswithargumentsforallthe6systemcalls.
Fortide,theinputswereallthepairsofsystemcallswithinawindowoffixedsize6;stideandt-stidecomprisedallcontiguoussequencesoflength6.
Forallthetechniques,alarmsweremergedindecreasingorderoftheanomalyscoresandevaluatedatvariedfalsealarmrates.
ResultsSincet-stideissupposedtogivebestresultsamongthesequence-basedtechniques,wecompareditsperformancewithS-LERADontheUNMandFIT-UTKdatasets.
Table1:t-stidevs.
S-LERAD(UNM,FIT-UTKdata).
Numberofattacksdetected(Numberoffalsealarms)ProgramnameNumberoftrainingsequencesNumberoftestsequencest-stideS-LERADlpr100027041(0)1(1)ps12272(58)2(2)login881(0)1(1)excel3262(92)2(0)04812162000.
250.
512.
5Falsealarms(x10-3%perday)Numberofdetectionstidestidet-stideS-LERADA-LERADM-LERADM*-LERAD01234567DoSU2RR2LAttacktypesNumberofdetectionstidestidet-stideS-LERADA-LERADM-LERADM*-LERADFigure1.
Numberofdetections(DARPA/LLdata).
Figure2.
Numberofdetectionsat10falsealarmsperdayfordifferentattackcategories(DARPA/LLdata).
ResultsfromTable1showthatboththetechniqueswereabletodetectalltheattacks.
However,t-stidegeneratedmorefalsealarmsforpsandexcel.
WealsoperformedexperimentsontheDARPA/LLdatasetstoevaluateallthetechniques.
Figure1illustratesthetotalattacksdetected(Y-axis)atvariedfalsealarmsrates(X-axis).
Atzerofalsealarms,tide,stideandt-stidedetectedthemostattacks,suggestingthatmaximumdeviationsintemporalsequencesaretruerepresentationsofactualattacks.
Butasthethresholdisrelaxed,S-LERADoutperformedallthe3sequence-basedtechniques.
ThiscanbeattributedtothefactthatS-LERADisabletogeneralizewellandlearnstheimportantcorrelations.
TheUNMandFIT-UTKdatasetsdonothavecompleteargumentinformationtoevaluateLERADvariantsthatinvolvearguments.
FortheDARPA/LLdataset,A-LERADfaredbetterthanS-LERADandtheothersequence-basedtechniques(Figure1),suggestingthatargumentinformationismoreusefulthansequenceinformation.
Usingargumentscouldalsomakeasystemrobustagainstmimicryattackswhichevadesequence-basedsystems.
ItcanalsobeseenthattheA-LERADcurvecloselyfollowsthecurveforM-LERAD.
Thisimpliesthatthesequenceinformationisredundant;itdoesnotaddsubstantialinformationtowhatisalreadygatheredfromarguments.
M*-LERADperformedtheworstamongallthetechniquesatfalsealarmsratelowerthan0.
5x10-3%perday.
ThereasonforsuchaperformanceisthatM*-LERADgeneratedalarmsforbothsequenceandargumentbasedanomalies.
Ananomalousargumentinonesystemcallraisedanalarminsixdifferenttuples,leadingtoahigherfalsealarmrate.
Asthealarmthresholdwasrelaxed,thedetectionrateimproved.
ThebetterperformanceofLERADvariantscanbeattributedtoitsanomalyscoringfunction.
Itassociatesaprobabilisticscorewitheveryrule.
Insteadofabinary(present/absent)value(asinthecaseofstideandt-stide),thisprobabilityvalueisusedtocomputethedegreeofanomalousness.
Italsoincorporatesaparameterforthetimeelapsedsinceanovelvaluewasseenforanattribute.
Theadvantageistwofold:(i)itassistsindetectinglongtermanomalies;(ii)suppressesthegenerationofmultiplealarmsfornovelattributevaluesinasuddenburstofdata.
Figure2plotstheresultat10falsealarmsperday,makingatotalof100falsealarmsforthe10daysoftesting(criterionusedinthe1999DARPAevaluation).
DifferentattacktypesarerepresentedalongtheX-axisandtheY-axisdenotedthetotalattacksdetectedineachattackcategory.
M-LERADwasabletodetectthelargestnumberofattacks–5DoS,3U2Rand6R2Lattacks.
Aninterestingobservationisthatthesequence-basedtechniquesgenerallydetectedtheU2RattackswhereastheR2LandDoSattackswerebetterdetectedbytheargument-basedtechniques.
Ourtechniqueswereabletodetectsomepoorlydetectedattacksquotedin(Lippmannetal.
1999),warezclientbeingoneofthem.
Ourmodelsalsodetected3stealthypsattacks.
Table2.
A-LERADvs.
AC-LERAD(DARPA/LL).
NumberofdetectionsFalsealarmsperdayA-LERADAC-LERAD5109101311201716ExperimentswereperformedtoseeifNULLattributeshelpindetectinganomaliesoriftheyformedmeaninglessrules.
WeaddedaconstraintthattheNULLvaluescouldnotbeaddedtotheattributevaluesintherules.
WecallthisvariantAC-LERAD(A-LERADwithconstraint).
Table2summarizestheresults.
A-LERADwasabletodetectmoreattacksthantheconstrainedcounterpart,suggestingthatruleswithNULLvaluedattributesarebeneficialtothedetectionofanomaliescorrespondingtoattacks.
AnalysisofanomaliesAnanomalyisadeviationfromnormalcyand,bydefinition,doesnotnecessarilyidentifythenatureofanattack.
Anomalydetectionservesasanearlywarningsystem;humansneedtoinvestigateifananomalyactuallycorrespondstoamaliciousactivity.
Theanomaliesthatledtotheattacksdetectedbyargument-basedvariantsofLERAD,inmanycases,donotrepresentthetruenatureoftheattacks.
Instead,itmayberepresentativeofbehavioralpatternsresultingfromtheexecutionofsomeotherprogramaftertheintrudersuccessfullygainedaccesstothehost.
Forexample,aninstanceofguestattackisdetectedbyA-LERADnotbyobservingattemptsbythehackertryingtogainaccess,butbyencounteringnovelargumentstotheioctlsystemcallwhichwasexecutedbythehackertryingtoperformacontrolfunctiononaparticulardevice.
Astealthypsattackwasdetectedbyoursystemwhentheintrudertriedtochangeownerusinganovelgroupid.
Eveniftheanomalyisrelatedtotheattackitself,itmayreflectverylittleinformationabouttheattack.
Oursystemisabletolearnonlyapartialsignatureoftheattack.
Guessftpisdetectedbyabadpasswordforanillegitimateusertryingtogainaccess.
However,theattackercouldhavemadeinterspersedattemptstoevadethesystem.
Attackswerealsodetectedbycapturingerrorscommittedbytheintruder,possiblytoevadetheIDS.
Ftpwriteisavulnerabilitythatexploitsaconfigurationerrorwhereinaremoteftpuserisabletosuccessfullycreateandaddfiles(suchas.
rhost)andgainaccesstothesystem.
Thisattackisdetectedbymonitoringthesubsequentactionsoftheintruder,whereinheattemptstosettheauditstateusinganinvalidpreselectionmask.
Thisanomalywouldgounnoticedinasystemmonitoringonlysystemcalls.
Table3.
TopanomalousattributesforA-LERAD.
AttributecausingfalsealarmWhethersomeattackwasdetectedbythesameattributeioctlargumentYesioctlreturnvalueYessetegidmaskYesopenreturnvalueNoopenerrorstatusNofcntlerrorstatusNosetpgrpreturnvalueNoWere-emphasizethatourgoalistodetectanomalies,theunderlyingassumptionbeingthatanomaliesgenerallycorrespondtoattacks.
Sincenotallanomalouseventsaremalicious,weexpectfalsealarmstobegenerated.
Table3liststheattributesresponsibleforthegenerationofalarmsandwhethertheseresultedinactualdetectionsornot.
Itisobservedthatsomeanomalieswerepartofbenignapplicationbehavior.
Atotherinstances,theanomalousvalueforthesameattributewasresponsiblefordetectingactualmaliciousexecutionofprocesses.
Asanexample,manyattacksweredetectedbyobservingnovelargumentsfortheioctlsystemcall,butmanyfalsealarmswerealsogeneratedbythisattribute.
Eventhoughnotallnovelvaluescorrespondtoanyillegitimateactivity,argument-basedanomalieswereinstrumentalindetectingtheattacks.
TimeandspacerequirementsComparedtosequence-basedmethods,ourtechniquesextractandutilizemoreinformation(systemcallargumentsandotherattributes),makingitimperativetostudythefeasibilityofourtechniquesforonlineusage.
Fort-stide,allcontiguoussystemcallsequencesoflength6arestoredduringtraining.
ForA-LERAD,systemcallsequencesandotherattributesarestored.
Inboththecases,spacecomplexityisoftheorderofO(n),wherenisthetotalnumberofsystemcalls,thoughtheA-LERADrequirementismorebyaconstantfactorksinceitstoresadditionalargumentinformation.
Duringdetection,A-LERADusesonlyasmallsetofrules(intherange14-25fortheapplicationsusedinourexperiments).
t-stide,ontheotherhand,stillrequirestheentiredatabaseoffixedlengthsequencesduringtesting,whichincurlargerspaceoverheadduringdetection.
Weconductedexperimentsonthetcshapplication,whichcomprisesofover2millionsystemcallsintrainingandhasover7millionsystemcallsintestdata.
TherulesformedbyA-LERADrequirearound1KBspace,apartfromamappingtabletomapstringsandintegers.
Thememoryrequirementsforstoringasystemcallsequencedatabasefort-stidewereover5KBplusamappingtablebetweenstringsandintegers.
TheresultssuggestthatA-LERADhasbettermemoryrequirementsduringthedetectionphase.
Wereiteratethatthetrainingcanbedoneoffline.
Oncetherulesaregenerated,A-LERADcanbeusedtodoonlinetestingwithlowermemoryrequirements.
ThetimeoverheadincurredbyA-LERADandt-stideinourexperimentsisgiveninTable4.
TheCPUtimeshavebeenobtainedonaSunUltra5workstationwith256MBRAMand400MHzprocessorspeed.
ItcanbeinferredfromtheresultsthatA-LERADisslowerthant-stide.
Duringtraining,t-stideisamuchsimpleralgorithmandprocesseslessdatathanA-LERADforbuildingamodelandhencet-stidehasamuchshortertrainingtime.
Duringdetection,t-stidejustneedstocheckifasequenceispresentinthedatabase,whichcanbeefficientlyimplementedwithahashtable.
Ontheotherhand,A-LERADneedstocheckifarecordmatchesanyofthelearnedrules.
Also,A-LERADhastoprocessadditionalargumentinformation.
Run-timeperformanceofA-LERADcanbeimprovedwithmoreefficientrulematchingalgorithm.
Also,t-stidewillincursignificantlylargertimeoverheadwhenthestoredsequencesexceedthememorycapacityanddiskaccessesbecomeunavoidable–A-LERADdoesnotencounterthisproblemaseasilyast-stidesinceitwillstilluseasmallsetofrules.
Moreover,therun-timeoverheadofA-LERADisabouttensofsecondsfordaysofdata,whichisreasonableforpracticalpurposes.
Table4.
Executiontimecomparison.
ApplicationTrainingTime(seconds)[on1weekofdata]TestingTime(seconds)[on2weeksofdata]t-stideA-LERADt-stideA-LERADftpd0.
190.
900.
190.
89telnetd0.
967.
121.
059.
79ufsdump6.
7630.
040.
421.
66tcsh6.
3229.
565.
9129.
38login2.
4115.
122.
4515.
97sendmail2.
7314.
793.
2319.
63quota0.
203.
040.
203.
01sh0.
212.
980.
403.
93ConclusionsInthispaper,weportrayedtheefficacyofincorporatingsystemcallargumentinformationandusedarule-learningalgorithmtomodelahost-basedanomalydetectionsystem.
Baseduponexperimentsonvariousdatasets,weclaimthatourargument-basedmodel,A-LERAD,detectedmoreattacksthanallthesequence-basedtechniques.
Oursequence-basedvariant(S-LERAD)wasalsoabletogeneralizebetterthantheprevalentsequencebasedtechniques,whichrelyonpurememorization.
Mergingargumentandsequenceinformationcreatesarichermodelforanomalydetection,asillustratedbytheempiricalresultsofM-LERAD.
M*-LERADdetectedlessernumberofattacksatlowerfalsealarmratessinceeveryanomalousattributeresultsinalarmsbeingraisedin6successivetuples,leadingtoeithermultipledetectionsofthesameattack(countedasasingledetection)ormultiplefalsealarms(allseparateentities).
Resultsalsoindicatedthatsequence-basedmethodshelpdetectU2RattackswhereasR2LandDoSattackswerebetterdetectedbyargument-basedmodels.
Ourargument-basedtechniquesdetecteddifferenttypesofanomalies.
Someanomaliesdidnotrepresentthetruenatureoftheattack.
Someattacksweredetectedbysubsequentanomaloususerbehavior,liketryingtochangegroupownership.
Someotheranomaliesweredetectedbylearningonlyaportionoftheattack,whilesomeweredetectedbycapturingintrudererrors.
Thoughourtechniquesincurhighertimeoverheadduetothecomplexityofourtechniques(sincemoreinformationisprocessed)ascomparedtot-stide,theybuildmoresuccinctmodelsthatincurmuchlessspaceoverhead–ourtechniquesaimtogeneralizefromthetrainingdata,ratherthanpurememorization.
Moreover,3secondsperday(themostanapplicationtookduringtestingphase)isreasonableforonlinesystems,eventhoughitissignificantlylongerthant-stide.
Thoughourtechniquesdiddetectmoreattackswithfewerfalsealarms,therearisesaneedformoresophisticatedattributes.
Insteadofhavingafixedsequence,wecouldextendourmodelstoincorporatevariablelengthsub-sequencesofsystemcalls.
Eventheargument-basedmodelsareoffixedwindowsize,creatinganeedforamodelacceptingvariedargumentinformation.
Ourtechniquescanbeeasilyextendedtomonitoraudittrailsincontinuum.
Sincewemodeleachapplicationseparately,somedegreeofparallelismcanalsobeachievedtotestprocesssequencesastheyarebeinglogged.
ReferencesAgrawal,R.
;Imielinski,T.
;andSwamiA.
1993.
Miningassociationrulesbetweensetsofitemsinlargedatabases.
ACMSIGMOD,207-216.
Forrest,S.
;Hofmeyr,S.
;Somayaji,A.
;andLongstaff,T.
1996.
ASenseofSelfforUNIXProcesses.
IEEESymposiumonSecurityandPrivacy,120-128.
Kendell,K.
1999.
ADatabaseofComputerAttacksfortheEvaluationofIntrusionDetectionSystems.
MastersThesis,MIT.
Kruegel,C.
;Mutz,D.
;Valeur,F.
;andVigna,G.
2003.
OntheDetectionofAnomalousSystemCallArguments,EuropeanSymposiumonResearchinComputerSecurity,326-343.
Lane,T.
,andBrodleyC.
E.
1997.
SequenceMatchingandLearninginAnomalyDetectionforComputerSecurity.
AAAIWorkshoponAIApproachestoFraudDetectionandRiskManagement,43-49.
Lippmann,R.
;Haines,J.
;Fried,D.
;Korba,J.
;andDas,K.
2000.
The1999DARPAOff-LineIntrusionDetectionEvaluation.
ComputerNetworks,34:579-595.
Mahoney,M.
,andChan,P.
2003.
LearningRulesforAnomalyDetectionofHostileNetworkTraffic,IEEEInternationalConferenceonDataMining,601-604.
Rigoutsos,I.
,andFloratos,A.
1998.
Combinatorialpatterndiscoveryinbiologicalsequences.
Bioinformatics,14(1):55-67.
Tan,K.
M.
C.
;Killourhy,K.
S.
;andMaxion,R.
A.
2002.
UndermininganAnomaly-basedIntrusionDetectionSystemUsingCommonExploits.
RAID,54-74.
Wagner,D.
,andSoto,P.
2002.
MimicryAttacksonHost-BasedIntrusionDetectionSystems.
ACMCCS,255-264.
Warrender,C.
;Forrest,S.
;andPearlmutter,B.
1999.
DetectingIntrusionsUsingSystemCalls:AlternativeDataModels.
IEEESymposiumonSecurityandPrivacy,133-145.
Wespi,A.
;Dacier,M.
;andDebar,H.
1999.
AnIntrusion-DetectionSystemBasedontheTeiresiasPattern-DiscoveryAlgorithm.
EICARConference,1-15.
Wespi,A.
;Dacier,M.
;andDebar,H.
2000.
Intrusiondetectionusingvariable-lengthaudittrailpatterns.
RAID,110-129.
Witten,I.
,andBell,T.
1991.
Thezero-frequencyproblem:estimatingtheprobabilitiesofnoveleventsinadaptivetextcompression.
IEEETrans.
onInformationTheory,37(4):1085-1094.

展开全文