Chapter 21

DATA CORPORA FOR DIGITAL FORENSICS EDUCATION AND RESEARCH

York Yannikos, Lukas Graner, Martin Steinebach, and Christian Winter

Abstract

Data corpora are very important for digital forensics education and research. Several corpora are available to academia; these range from small manually-created data sets of a few megabytes to many terabytes of real-world data. However, different corpora are suited to different forensic tasks. For example, real data corpora are often desirable for testing forensic tool properties such as effectiveness and efficiency, but these corpora typically lack the ground truth that is vital to performing proper evaluations. Synthetic data corpora can support tool development and testing, but only if the methodologies for generating the corpora guarantee data with realistic properties.

This paper presents an overview of the available digital forensic corpora and discusses the problems that may arise when working with specific corpora. The paper also describes a framework for generating synthetic corpora for education and research when suitable real-world data is not available.

Keywords: Forensic data corpora, synthetic disk images, model-based simulation

1. Introduction

A digital forensic investigator must have a broad knowledge of forensic methodologies and experience with a wide range of tools. This includes multi-purpose forensic suites with advanced functionality and good usability as well as small tools for special tasks that may have moderate to low usability. Gaining expert-level skills in the operation of forensic tools requires a substantial amount of time. Additionally, advances in analysis methods, tools and technologies require continuous learning to maintain currency.

G. Peterson and S. Shenoi (Eds.): Advances in Digital Forensics X, IFIP AICT 433, pp. 309–325, 2014.
© IFIP International Federation for Information Processing 2014

In digital forensics education, it is important to provide insights into specific technologies and into how forensic methods must be applied to perform thorough and sound analyses. It is also very important to provide a rich learning environment where students can use forensic tools to rigorously analyze suitable test data.

The same is true in digital forensics research. New methodologies and new tools have to be tested against well-known data corpora. This provides a basis for comparing methodologies and tools so that their advantages and shortcomings can be identified. Forensic investigators can use the results of such evaluations to make informed decisions about the methodologies and tools that should be used for specific tasks. This helps increase the efficiency and the quality of forensic examinations while allowing objective evaluations by third parties.

This paper provides an overview of several real-world and synthetic data corpora that are available for digital forensics education and research. It also highlights the potential risks and problems encountered when using data corpora, along with the capabilities of existing tools that allow synthetic data corpora to be generated when real-world data is not available. Additionally, the paper describes a custom framework for synthetic data generation and evaluates its performance.

2. Available Data Corpora

Several data corpora have been made available for public use. While some of the corpora are useful for digital forensics education and research, others are suited to very specific areas such as network forensics and forensic linguistics. This section presents an overview of the most relevant corpora.

2.1 Real Data Corpus

A few real-world data corpora are available to support digital forensics education and research. Garfinkel, et al. [7] have created the Real Data Corpus from used hard disks that were purchased from around the world. In a later work, Garfinkel [5] described the challenges and lessons learned while handling the Real Data Corpus, which by then had grown to more than 30 terabytes. As of September 2013, the Real Data Corpus incorporated 1,289 hard disk images, 643 flash memory images and 98 optical discs. However, because this corpus was partly funded by the U.S. Government, access to the corpus requires the approval of an institutional review board in accordance with U.S. legislation. Additional information about the corpus and its access requirements is available at [6].

A smaller corpus, which includes specific scenarios created for educational purposes [25], can be downloaded without any restrictions. This smaller corpus contains:

- Three test disk images created especially for educational and testing purposes (e.g., file system analysis, file carving and handling encodings).
- Four realistic disk image sets created from USB memory sticks, a digital camera and a Windows XP computer.
- A set of almost 1,000,000 files, including 109,282 JPEG files.
- Five phone images from four different cell phone models.
- Mixed data corresponding to three fictional scenarios for educational purposes, including multiple network packet dumps and disk images.

Due to the variety of data it contains, the Real Data Corpus is a valuable resource for educators and researchers in the areas of multimedia forensics, mobile phone forensics and network forensics. To our knowledge, it is the largest publicly-available corpus in the area of digital forensics.

2.2 DARPA Intrusion Detection Data Sets

In 1998 and 1999, researchers at MIT Lincoln Laboratory [12, 13] created a simulation network in order to produce network traffic and audit logs for evaluating intrusion detection systems. The simulated infrastructure was attacked using well-known techniques as well as new techniques that were specially developed for the evaluation. In 2000, additional experiments were performed involving specific scenarios, including two DDoS attacks and an attack on a Windows NT system. The data sets for all three experiments are available at [11]; they include network traffic data in tcpdump format, audit logs and file system snapshots.

The methodologies employed in the 1998 and 1999 evaluations were criticized by McHugh [16]. McHugh states that the evaluation results miss important details and that portions of the evaluation procedures are unclear or inappropriate. Additionally, Garfinkel [4] points out that the data sets do not represent real-world traffic because they lack complexity and heterogeneity. Therefore, this corpus has limited use in network forensics research.

2.3 MemCorp Corpus

The MemCorp Corpus [22] contains memory images created from several virtual and physical machines. In particular, the corpus contains images extracted from 87 computer systems running various versions of Microsoft Windows; the images were extracted using common memory imaging tools. The corpus includes the following images:

- 53 system memory images created from virtual machines.
- 23 system memory images created from physical machines with factory default configurations (i.e., with no additional software installed).
- 11 system memory images created from machines under specific scenarios (e.g., after malware was installed).

This corpus supports education and training efforts focused on memory analysis using tools such as the Volatility Framework [23]. However, as noted by the corpus creator [22], the corpus does not contain images created from real-world systems or images from operating systems other than Microsoft Windows, which reduces its applicability. The creator of the MemCorp Corpus provides access to the images upon request.

2.4 MORPH Corpus

Several corpora have been created in the area of face recognition [8]. Since a large corpus of facial images tagged with age information would be very useful for multimedia forensics, we have picked a sample corpus that could be a valuable resource for research (e.g., for detecting illegal multimedia content such as child pornography).

The MORPH Corpus [20] comprises 55,000 unique facial images of more than 13,000 individuals. The ages of the individuals range from 16 to 77, with a median age of 33. On average, four images were taken of each individual, with an average of 164 days between consecutive images.

Facial images annotated with age information are useful for developing automated age detection systems. Currently, no reliable methods (i.e., with low error rates) exist for age identification. Steinebach, et al. [21] have employed face recognition techniques to identify known illegal multimedia content, but they did not consider age classification.

2.5 Enron Corpus

The Enron Corpus, introduced in 2004, is a well-known corpus in the area of forensic linguistics [9]. In its raw form, the corpus contains 619,446 email messages from 158 executives of Enron Corporation; the email messages were seized during the investigation of the 2001 Enron scandal. After data cleansing, the corpus contains 200,399 messages. The Enron Corpus is one of the most referenced mass collections of real-world email data that are publicly available.

The corpus provides a valuable basis for research on email classification, an important area in forensic linguistics. Klimt and Yang [10] suggest using thread membership detection for email classification and provide the results of baseline experiments conducted with the Enron Corpus. Data sets from the Enron Corpus are available at [3].

2.6 Global Intelligence Files

In February 2012, WikiLeaks started publishing the Global Intelligence Files, a large corpus of email messages gathered from the intelligence company Stratfor. WikiLeaks claims to possess more than 5,000,000 email messages dated between July 2004 and December 2011. As of September 2013, almost 3,000,000 of these messages were available for download by the public [24]. WikiLeaks continues to release new email messages from the corpus on an almost daily basis.

Like the Enron Corpus, the Global Intelligence Files could provide a valuable basis for research in forensic linguistics. However, we are not aware of any significant research conducted using the Global Intelligence Files.

2.7 Computer Forensic Reference Data Sets

The Computer Forensic Reference Data Sets maintained by NIST [19] constitute a small data corpus created for training and testing purposes. The data sets include test cases for file carving, system memory analysis and string search using different encodings. The corpus contains the following data:

- One hacking case scenario.
- Two images for Unicode string searches.
- Four images for file system analysis.
- One image for mobile device analysis.
- One image for system memory analysis.
- Two images for verifying the results of forensic imaging tools.

This corpus provides a small but valuable reference set for tool developers. It is also suitable for training in forensic analysis methods.

3. Pitfalls of Data Corpora

Forensic corpora are very useful for education and research, but they have certain pitfalls.

Solution Specificity: While a corpus is very valuable when developing methodologies and tools that solve research problems in digital forensics, it is difficult to find general solutions that are not somehow tailored to the corpus. Even when a solution is intended to work in general (with different corpora and in the real world), research and development efforts often adapt the solution to the corpus slowly over time, probably without the researchers even noticing. For example, the Enron Corpus is widely used by the forensic linguistics community as a single basis for research on email classification. It would be difficult to show that research results based on this corpus apply to general email classification problems.

This could also become an issue if, for instance, a general methodology or tool that solves a specific problem already exists and another research group is working to enhance the solution. Using only one corpus during development increases the risk of crafting a solution that is more effective and efficient than previous solutions, but only when used with that specific corpus.

Legal Issues: The data in corpora such as Garfinkel's Real Data Corpus, which was created from used hard disks bought on the secondary market, may be subject to intellectual property and personal privacy laws. Even if the country that hosts a real-world corpus allows its use for research, legal restrictions could be imposed by a second country in which the research that uses the corpus is being conducted. The worst case is when local laws completely prohibit the use of the corpus.

Relevance: Data corpora are often created as snapshots of specific scenarios or environments. The data contained in corpora often loses its relevance as it ages. For example, network traffic from the 1990s is quite different from current network traffic, a fact that was pointed out for the DARPA Intrusion Detection Data Sets [4, 16]. Another example is a data corpus containing data extracted from mobile phones. Such a corpus must be updated very frequently with data from the latest devices if it is to be useful for mobile phone forensics.

Figure 1. Generating synthetic data based on a real-world scenario (diagram elements: Purpose, Scenario, Model, Simulation, Synthetic Data).

Transferability: Many data corpora are created in or taken from specific local environments. The email messages in the Enron Corpus are in English. While this corpus is valuable to forensic linguists in English-speaking countries, its value to researchers focused on other languages is debatable. Indeed, many important properties that are relevant to English and used for email classification may not be applicable to Arabic or Mandarin Chinese. Likewise, corpora developed for testing forensic tools that analyze specific applications (e.g., instant messaging software and chat clients) may not be useful in other countries because of differences in jargon and communication patterns. Also, a corpus that mostly includes Facebook posts and IRC logs may not be of much value in a country where these services are not popular.

4. Synthetic Data Corpus Generation

Aside from methodologies for creating synthetic data corpora by manually reproducing real-world actions, little research has been done on tool-supported synthetic data corpus generation. Moch and Freiling [17] have developed Forensig2, a tool that generates synthetic disk images using virtual machines. While the process for generating disk images has to be programmed in advance, the tool allows randomness to be introduced in order to create similar, but not identical, disk images. In a more recent work, Moch and Freiling [18] present the results of an evaluation of Forensig2 applied to student education scenarios.

A methodology for generating a synthetic data corpus for forensic accounting is proposed in [14] and evaluated in [15]. The authors demonstrate how to generate synthetic data containing fraudulent activities from smaller collections of real-world data. The data is then used for training and testing a fraud detection system.

5. Corpus Generation Process

This section describes the process for generating a synthetic data corpus using the model-based framework presented in [27]. Figure 1 presents the synthetic data generation process. The first step in generating a synthetic data corpus is to define the data use cases. For example, in a digital forensics class where students will be tested on their knowledge of hard disk analysis, one or more suitable disk images would be required for each student. The students would have to search the disk images for traces of malware or recover multimedia data fragments using tools such as Foremost [1] and The Sleuth Kit [2].

The disk images could be created in a reasonable amount of time, either manually or via scripting. However, if every student should receive different disk images for analysis, then significant effort may have to be expended to insert variations in the images. Also, if different tasks are assigned to different students (e.g., one student should recover JPEG files and another student should search for traces of a rootkit), even more significant variations would have to be incorporated in the disk images.

The second step in the corpus generation process is to specify a real-world scenario in which the required kind of data is typically created. One example is a computer that is used by multiple individuals, who typically install and remove software, and download, copy, delete and overwrite files. The third step is to create a model that matches this scenario and serves as the basis of a simulation, which is the last step. A Markov chain consisting of states and state transitions can be created to model user behavior. The states correspond to the actions performed by the users and the transitions specify which actions can be performed after a given action.
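
To make the state/transition idea concrete, the following Python sketch encodes a Markov chain for the shared-computer scenario as a mapping from each action (state) to the actions that may follow it. It is a minimal illustration, not the framework's own representation: the state names and all probabilities are hypothetical values chosen for the example. The helper `find_problems` (also hypothetical) checks that every state's outgoing probabilities form a distribution over known states.

```python
# Hypothetical model of the shared-computer scenario. The state names and
# probabilities are illustrative assumptions, not values from the paper.
TRANSITIONS = {
    "install_software": {"download_file": 0.6, "copy_file": 0.3, "remove_software": 0.1},
    "remove_software": {"install_software": 0.5, "download_file": 0.5},
    "download_file": {"copy_file": 0.4, "delete_file": 0.3, "download_file": 0.3},
    "copy_file": {"delete_file": 0.5, "download_file": 0.5},
    "delete_file": {"install_software": 0.2, "download_file": 0.8},
}


def find_problems(transitions, tol=1e-9):
    """Return a list of human-readable problems; an empty list means every
    state has a valid probability distribution over known successor states."""
    problems = []
    for state, successors in transitions.items():
        for successor, p in successors.items():
            if successor not in transitions:
                problems.append(f"{state} -> {successor}: unknown state")
            if p < 0:
                problems.append(f"{state} -> {successor}: negative probability")
        total = sum(successors.values())
        if abs(total - 1.0) > tol:
            problems.append(f"{state}: outgoing probabilities sum to {total}")
    return problems


print(find_problems(TRANSITIONS))  # [] for the model above
```

An action that is absent from a state's mapping simply cannot follow that state (probability 0), which is how the transitions restrict the order in which actions can occur.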

5.1 Scenario Modeling using Markov Chains

Finite discrete-time Markov chains as described in [26] are used for synthetic data generation. One Markov chain is created for each type of subject whose actions are to be simulated. A subject corresponds to a user who performs actions on a hard disk, such as software installations and file deletions. The states in the Markov chain correspond to the actions performed by the subject in the scenario.

In order to construct a suitable model, it is necessary to first define all the actions (states) that cause data to be created and deleted. The transitions between actions are then defined. Following this, the probability of each action is specified (state probability) along with the probability of each transition between two actions (transition probability); the probabilities are used during the Markov chain simulation to generate realistic data. The computation of feasible transition probabilities given state probabilities can involve some effort, but the process has been simplified in [28].
Next, the number of subjects who perform the actions is specified (e.g., the number of individuals who share the computer). Finally, the details of each possible action are specified (e.g., what exactly happens during a download file action or a delete file action).
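
The text does not define what makes transition probabilities "feasible" for given state probabilities. One plausible reading, assumed here, is that the state probabilities π must be the stationary distribution of the transition matrix P, i.e., πP = π, so that long simulations perform each action with its specified frequency. The sketch below checks that condition and estimates the stationary distribution by power iteration for a hypothetical three-state chain; it is not the procedure of [28], and both function names are illustrative.

```python
def is_stationary(P, pi, tol=1e-9):
    """Return True if pi is a probability vector with pi P == pi (within tol)."""
    n = len(pi)
    if any(p < 0 for p in pi) or abs(sum(pi) - 1.0) > tol:
        return False
    for j in range(n):
        # Probability mass flowing into state j after one step.
        flow_into_j = sum(pi[i] * P[i][j] for i in range(n))
        if abs(flow_into_j - pi[j]) > tol:
            return False
    return True


def stationary_by_power_iteration(P, iterations=1000):
    """Repeatedly apply v <- v P starting from the uniform distribution.
    Converges for irreducible, aperiodic chains such as the example below."""
    n = len(P)
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
    return v


# Hypothetical three-state chain (values chosen for illustration only).
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

print(is_stationary(P, [0.25, 0.50, 0.25]))     # True
print(is_stationary(P, [1 / 3, 1 / 3, 1 / 3]))  # False
print(stationary_by_power_iteration(P))
```

Checking a given P is easy; finding a P for a prescribed π is the harder inverse problem. The sample model in Section 5.3 uses the simplest solution, in which every row of P equals π.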

5.2 Model-Based Simulation

Having constructed a model of the desired real-world scenario, it is necessary to conduct a simulation based on the model. The number of actions to be performed by each user is specified and the simulation is then started. At the end of the simulation, the disk image contains synthetic data corresponding to the modeled real-world scenario.
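
The simulation step amounts to a random walk on the chain. In the minimal Python sketch below, each subject starts in a state drawn from the state probabilities and then performs a fixed number of actions, each chosen according to the row of P for the current action. The choice of initial state and the sequential (non-interleaved) handling of subjects are simplifying assumptions, the action names and `simulate` are hypothetical, and applying each logged action to an actual disk image is left out. The probabilities are those of the sample model in Section 5.3.

```python
import random
from collections import Counter

ACTIONS = ["add_document_file", "add_image_file", "write_fragmented_data", "delete_file"]
PI = [0.2, 0.2, 0.4, 0.2]
# Sample model from Section 5.3: every row of P equals PI.
P = [list(PI) for _ in ACTIONS]


def simulate(P, actions, start_dist, n_subjects, steps_per_subject, seed=0):
    """Return a log of (subject, action) pairs produced by walking the chain."""
    rng = random.Random(seed)
    states = range(len(actions))
    log = []
    for subject in range(n_subjects):
        state = rng.choices(states, weights=start_dist)[0]
        for _ in range(steps_per_subject):
            log.append((subject, actions[state]))
            # The next action depends only on the current one (Markov property).
            state = rng.choices(states, weights=P[state])[0]
    return log


log = simulate(P, ACTIONS, PI, n_subjects=2, steps_per_subject=2000, seed=42)
counts = Counter(action for _, action in log)
for action in ACTIONS:
    print(f"{action:22s} {counts[action] / len(log):.3f}")
```

With a fixed seed the run is reproducible, and different seeds yield similar but not identical action sequences, which is how such an approach could give every student a different disk image.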

5.3 Sample Scenario and Model

To demonstrate the synthetic data generation process, we consider a sample scenario. The purpose of generating the synthetic data is to test how different file carvers deal with fragmented data. The real-world scenario involves an individual who uses a USB memory stick to transfer large numbers of files, mainly photographs, between computers.

In the following, we define all the components of a model that would facilitate the creation of a synthetic disk image of a USB memory stick containing a large number of files, deleted files and file fragments. The resulting disk image would be used to test the ability of file carvers to reconstruct fragmented data.

States: In the sample model, the following four actions are defined as Markov chain states:

1. Add Document File: This action adds a document file (e.g., PDF or DOC) to the file system of the synthetic disk image. It is equivalent to copying a file from one hard disk to another using the Linux cp command.
2. Add Image File: This action adds an image file (e.g., JPEG, PNG or GIF) to the file system. Again, it is equivalent to using the Linux cp command.
3. Write Fragmented Data: This action takes a random image file, cuts it into multiple fragments and writes the fragments to the disk image, ignoring the file system. It is equivalent to using the Linux dd command for each file fragment.
4. Delete File: This action removes a random file from the file system. It is equivalent to using the Linux rm command.

Figure 2. Markov chain used to generate a synthetic disk image (states 1–4).

Transitions: Next, the transitions between the actions are defined. Since the transitions are not really important in this scenario, the Markov chain is simply constructed as a complete digraph (Figure 2). The state numbers in the Markov chain correspond to the state numbers specified above.

State Probabilities: Next, the probability $\pi_i$ that action (state) $i$ is performed during a Markov chain simulation is specified. We chose the following probabilities for the actions to ensure that a large number of files and file fragments are added to the synthetic disk image and that at most about half of the added files are deleted:

$$\pi = (\pi_1, \ldots, \pi_4) = (0.2, 0.2, 0.4, 0.2).$$

State Transition Probabilities: Finally, feasible probabilities for the transitions between the actions are computed. The framework is designed to compute the transition probabilities automatically. One possible result is the simple set of transition probabilities specified in the matrix:

$$P = \begin{pmatrix} 0.2 & 0.2 & 0.4 & 0.2 \\ 0.2 & 0.2 & 0.4 & 0.2 \\ 0.2 & 0.2 & 0.4 & 0.2 \\ 0.2 & 0.2 & 0.4 & 0.2 \end{pmatrix}$$

where $p_{ij}$ denotes the probability of a transition from action $i$ to action $j$.
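
Because every row of this P equals π, a short hand derivation confirms that P is a valid transition matrix and that π is its stationary distribution, so the long-run action frequencies match the chosen state probabilities. The derivation covers only this particular matrix, not how the framework computes P in general:

```latex
\begin{aligned}
p_{ij} &= \pi_j, \qquad i, j = 1, \ldots, 4,\\
\sum_{j=1}^{4} p_{ij} &= \sum_{j=1}^{4} \pi_j = 0.2 + 0.2 + 0.4 + 0.2 = 1,\\
(\pi P)_j &= \sum_{i=1}^{4} \pi_i \, p_{ij} = \pi_j \sum_{i=1}^{4} \pi_i = \pi_j .
\end{aligned}
```

With identical rows, the next action is drawn independently of the current one, which is consistent with the transitions not mattering in this scenario.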

6. Corpora Generation Framework

The framework developed for generating synthetic disk images is implemented in Java 1.7. It uses a modular design with a small set of core components, a graphical user interface (GUI) and modules that provide specific functionality.

Figure 3. Screenshot of the model builder.

The GUI provides a model building interface that allows a model to be created quickly for a specific scenario using the actions available in the framework. Additionally, an image viewer is implemented to provide detailed views of the generated synthetic disk images.

New actions can be added to the framework by implementing a small number of interfaces, which requires minimal programming effort. Since the framework supports the specification and execution of an abstract synthetic data generation process, new actions can be implemented independently of the scenario for which a synthetic disk image is being created. For example, it is possible to work on a completely different scenario where financial data is to be created in an enterprise relationship management system. The actions that are relevant to creating the financial data can then be implemented in a straightforward manner.
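
The paper does not list the framework's interfaces, and the framework itself is written in Java. The Python sketch below only illustrates the general plug-in pattern under that caveat: a hypothetical `Action` base class with an `apply` method, two actions mirroring the cp-like and rm-like states of Section 5.3, a registry keyed by action name, and a toy in-memory file system (`ToyFileSystem`, also hypothetical) standing in for the disk image. None of these names come from the framework.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ToyFileSystem:
    """Hypothetical stand-in for the file system of a synthetic disk image."""
    files: dict = field(default_factory=dict)    # allocated: name -> content
    deleted: dict = field(default_factory=dict)  # deleted but not overwritten


class Action(ABC):
    """One Markov chain state; subclasses define what the action does."""

    name = "abstract"

    @abstractmethod
    def apply(self, fs: ToyFileSystem, rng: random.Random) -> None:
        """Perform the action on the file system."""


class AddFile(Action):
    """Copy a randomly chosen file from a local source (like cp)."""

    name = "add_file"

    def __init__(self, source: dict):
        self.source = source

    def apply(self, fs, rng):
        src_name = rng.choice(sorted(self.source))
        # Prefix with a running count so repeated copies never collide.
        fs.files[f"{len(fs.files) + len(fs.deleted)}_{src_name}"] = self.source[src_name]


class DeleteFile(Action):
    """Remove a randomly chosen allocated file without overwriting it (like rm)."""

    name = "delete_file"

    def apply(self, fs, rng):
        if fs.files:  # nothing to do on an empty file system
            victim = rng.choice(sorted(fs.files))
            fs.deleted[victim] = fs.files.pop(victim)


source = {"report.pdf": b"%PDF-1.4 ...", "photo.jpg": b"\xff\xd8\xff ..."}
registry = {a.name: a for a in (AddFile(source), DeleteFile())}

fs, rng = ToyFileSystem(), random.Random(1)
for step in ["add_file", "add_file", "add_file", "delete_file"]:
    registry[step].apply(fs, rng)
print(len(fs.files), len(fs.deleted))  # 2 1
```

Because the simulation loop only looks actions up by name, a different scenario would need new `Action` subclasses but no change to the loop itself.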

The screenshot in Figure 3 shows the model builder component of the framework. The Markov chain used for generating data corresponding to the sample scenario is shown in the center of the figure (green box).

7. Framework Evaluation

This section evaluates the performance of the framework. The sample model described above is executed to simulate a computer user who performs write and delete actions on a USB memory stick. The evaluation setup is as follows:

- Model: Described in Section 5.3.
- Discrete Simulation Steps: 4,000 actions.
- Synthetic Disk Image Size: 2,048 MiB (USB memory stick).
- File System: FAT32 with 4,096-byte cluster size.
- Add Document File Action: A document file (e.g., DOC, PDF or TXT) is randomly copied from a local file source containing 139 document files.
- Add Image File Action: An image file (e.g., PNG, JPEG or GIF) is randomly copied from a local file source containing 752 image files.
- Delete File Action: A file is randomly chosen and deleted from the file system of the synthetic disk image without overwriting.
- Write Fragmented Data Action: An image file is randomly chosen from the local file source containing 752 image files. The file is written directly to the synthetic disk image, bypassing the file system, using a random number of fragments between 2 and 20, a random fragment size corresponding to a multiple of the file system cluster size and randomly-selected cluster-aligned locations for fragment insertion.
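
The Write Fragmented Data action can be sketched as follows. This is a minimal Python illustration, not the framework's code: it writes into an in-memory bytearray instead of a 2,048 MiB image, assumes that fragments must not overlap one another (the setup does not say), and lets the last fragment end wherever the file ends, so only the earlier fragments are exact multiples of the cluster size. The function `write_fragmented` and its returned ground-truth layout are hypothetical.

```python
import random

CLUSTER = 4096  # FAT32 cluster size used in the evaluation setup


def write_fragmented(image: bytearray, data: bytes, rng: random.Random,
                     cluster: int = CLUSTER, min_frags: int = 2, max_frags: int = 20):
    """Cut `data` at cluster boundaries and write the pieces to random,
    non-overlapping, cluster-aligned offsets of `image`, ignoring any file
    system. Returns ground truth as (fragment_index, offset, length) tuples."""
    n_clusters = -(-len(data) // cluster)  # ceiling division
    if n_clusters < min_frags:
        raise ValueError("file too small to split into the minimum number of fragments")
    n_frags = rng.randint(min_frags, min(max_frags, n_clusters))
    # n_frags - 1 distinct cut points, each on a cluster boundary inside the file.
    cuts = sorted(rng.sample(range(1, n_clusters), n_frags - 1))
    bounds = [0] + [c * cluster for c in cuts] + [len(data)]

    image_clusters = len(image) // cluster
    occupied = set()
    layout = []
    for idx in range(n_frags):
        start, end = bounds[idx], bounds[idx + 1]
        needed = -(-(end - start) // cluster)
        for _ in range(1000):  # retry until a free, aligned location is found
            first = rng.randrange(0, image_clusters - needed + 1)
            span = set(range(first, first + needed))
            if not span & occupied:
                break
        else:
            raise RuntimeError("no free cluster-aligned location found")
        occupied |= span
        offset = first * cluster
        image[offset:offset + (end - start)] = data[start:end]
        layout.append((idx, offset, end - start))
    return layout


rng = random.Random(7)
image = bytearray(512 * CLUSTER)   # 2 MiB toy image
photo = bytes(range(256)) * 400    # 102,400-byte stand-in for an image file
layout = write_fragmented(image, photo, rng)
for idx, offset, length in layout:
    print(f"fragment {idx:2d}: offset {offset:8d}, {length:6d} bytes")
```

The returned layout is the kind of ground truth against which a file carver's reconstruction can be scored: concatenating the recorded regions in fragment order yields the original file.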

Twenty simulations of the model were executed using this setup. After each run, the time needed to completely generate the synthetic disk image was measured, along with the amount of disk space used, the number of files deleted, the number of files still available in the file system and the number of different file fragments written to the image.

Figure 4(a) shows the time required by the framework to run each simulation. On average, a simulation run was completed in 2 minutes and 21 seconds.
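
As a quick consistency check, the per-run times plotted in Figure 4(a), as recovered from the figure's data labels, reproduce the stated average. Reading the values as seconds is an inference from the figure's 0–200 axis, not something the text states.

```python
# Per-run generation times for the 20 runs, as recovered from Figure 4(a)'s
# data labels; the unit (seconds) is inferred, not stated in the text.
run_times_s = [128, 137, 131, 135, 154, 165, 143, 151, 134, 131,
               136, 135, 156, 123, 150, 147, 161, 118, 142, 151]

mean_s = sum(run_times_s) / len(run_times_s)
minutes, seconds = divmod(round(mean_s), 60)
print(f"mean = {mean_s:.1f} s = {minutes} min {seconds} s")  # mean = 141.4 s = 2 min 21 s
print(f"range = {min(run_times_s)}-{max(run_times_s)} s")   # range = 118-165 s
```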

Figure 4(b) presents an overview of the numbers of files that were allocated in and deleted from the synthetic disk images. Note that the allocated (created) files are shown in light gray while the deleted files are shown in dark gray; the average value is shown as a gray line. On average, a disk image contained 792 allocated files and 803 deleted files, which is expected given the probabilities chosen for the actions in the model.
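
Why these averages are expected can be made explicit with the sample model's state probabilities. The arithmetic below assumes that every Delete File action finds a file to delete and that only the two Add actions create file-system entries (fragmented data bypasses the file system); both are simplifications.

```python
steps = 4000
pi = {"add_document_file": 0.2, "add_image_file": 0.2,
      "write_fragmented_data": 0.4, "delete_file": 0.2}

# Expected number of times each action is performed over the whole run.
expected = {action: round(steps * p) for action, p in pi.items()}
files_added = expected["add_document_file"] + expected["add_image_file"]
files_deleted = expected["delete_file"]
still_allocated = files_added - files_deleted

print(expected)
print(files_added, files_deleted, still_allocated)  # 1600 800 800
```

The observed averages of 792 allocated and 803 deleted files sum to 1,595, close to the 1,600 expected additions, and deletions account for about half of the added files, as intended when the probabilities were chosen.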

Figure 4. Evaluation results for 20 simulation runs. (a) Time required for each simulation run. (b) Numbers of allocated files and deleted files.

Figure 5(a) shows the used disk space in the synthetic image corresponding to allocated files (light gray), deleted files (gray) and file fragments (dark gray). The used space differs considerably over the simulation runs because only the numbers of files to be written to and deleted from the disk image were defined (individual file sizes were not specified). Since the files were chosen randomly during the simulation runs, the file sizes and, therefore, the disk space usage differ. On average, 57% of the available disk space was used.

Figure 5(b) shows the average number of file fragments per file type over all 20 simulation runs. The writing of fragmented data used a dedicated file source containing only pictures; this explains the large numbers of JPEG and PNG fragments.

Figure 6 shows a screenshot of the image viewer provided by the framework. Information such as the data type, fragment size and file system status (allocated or deleted) is provided for each block.
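
Per-block ground truth of the kind shown in the image viewer can be kept in a simple map from cluster number to block information. The record layout below is a hypothetical sketch, since the paper does not describe the viewer's data model. It assumes 4,096-byte clusters as in the evaluation and three block states: allocated, deleted, and fragment for data written outside the file system.

```python
from collections import Counter
from dataclasses import dataclass

CLUSTER = 4096  # cluster size from the evaluation setup


@dataclass(frozen=True)
class BlockInfo:
    data_type: str      # e.g. "jpg" or "pdf"
    status: str         # "allocated", "deleted" or "fragment"
    fragment_size: int  # size in bytes of the run this block belongs to


def build_block_map(records):
    """records: (offset, length, data_type, status) tuples with cluster-aligned
    offsets. Returns {cluster_number: BlockInfo} for every cluster touched."""
    block_map = {}
    for offset, length, data_type, status in records:
        first = offset // CLUSTER
        last = (offset + length - 1) // CLUSTER
        for cluster in range(first, last + 1):
            block_map[cluster] = BlockInfo(data_type, status, length)
    return block_map


records = [
    (0, 10_000, "pdf", "allocated"),     # clusters 0-2
    (16_384, 4_096, "jpg", "deleted"),   # cluster 4
    (40_960, 8_192, "jpg", "fragment"),  # clusters 10-11
]
block_map = build_block_map(records)
print(block_map[4])
print(Counter(info.status for info in block_map.values()))
```

Comparing such a map with the blocks a file carver claims to have recovered gives an exact, per-block score, which is the ground-truth advantage of synthetic images.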

8. Conclusions

The framework presented in this paper is well-suited to scenario-based model building and synthetic data generation. In particular, it provides a flexible and efficient approach for generating synthetic data corpora. The experimental evaluation of creating a synthetic disk image for testing the fragment recovery performance of file carvers demonstrates the utility of the framework.

Figure 5. Evaluation results for 20 simulation runs. (a) Used disk space corresponding to allocated files, deleted files and file fragments. (b) Average number of fragments per file type.

Unlike real-world corpora, synthetic corpora provide ground truth data that is very important in digital forensics education and research. This enables students as well as developers and testers to acquire a detailed understanding of the capabilities and performance of digital forensic tools. The ability of the framework to generate synthetic corpora based on realistic scenarios can satisfy the need for test data in applications for which suitable real-world data corpora are not available. Moreover, the framework is generic enough to produce synthetic corpora for a variety of domains, including forensic accounting and network forensics.

Figure 6. Screenshot of the image viewer.

Acknowledgement

This research was supported by the Center for Advanced Security Research Darmstadt (CASED).

References

[1] Air Force Office of Special Investigations, Foremost (foremost.sourceforge.net), 2001.
[2] B. Carrier, The Sleuth Kit (www.sleuthkit.org/sleuthkit), 2013.
[3] W. Cohen, Enron Email Dataset, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania (www.cs.cmu.edu/~enron), 2009.
[4] S. Garfinkel, Forensic corpora, a challenge for forensic research, unpublished manuscript, 2007.
[5] S. Garfinkel, Lessons learned writing digital forensics tools and managing a 30TB digital evidence corpus, Digital Investigation, vol. 9(S), pp. S80–S89, 2012.
[6] S. Garfinkel, Digital Corpora (digitalcorpora.org), 2013.
[7] S. Garfinkel, P. Farrell, V. Roussev and G. Dinolt, Bringing science to digital forensics with standardized forensic corpora, Digital Investigation, vol. 6(S), pp. S2–S11, 2009.
[8] M. Grgic and K. Delac, Face Recognition Homepage, Zagreb, Croatia (www.face-rec.org/databases), 2013.
[9] B. Klimt and Y. Yang, Introducing the Enron Corpus, presented at the First Conference on Email and Anti-Spam, 2004.
[10] B. Klimt and Y. Yang, The Enron Corpus: A new dataset for email classification research, Proceedings of the Fifteenth European Conference on Machine Learning, pp. 217–226, 2004.
[11] Lincoln Laboratory, Massachusetts Institute of Technology, DARPA Intrusion Detection Data Sets, Lexington, Massachusetts (www.ll.mit.edu/mission/communications/cyber/CSTcorpora/ideval/data), 2013.
[12] R. Lippmann, D. Fried, I. Graf, J. Haines, K. Kendall, D. McClung, D. Weber, S. Webster, D. Wyschogrod, R. Cunningham and M. Zissman, Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion detection evaluation, Proceedings of the DARPA Information Survivability Conference and Exposition, vol. 2, pp. 12–26, 2000.
[13] R. Lippmann, J. Haines, D. Fried, J. Korba and K. Das, The 1999 DARPA off-line intrusion detection evaluation, Computer Networks, vol. 34(4), pp. 579–595, 2000.
[14] E. Lundin, H. Kvarnstrom and E. Jonsson, A synthetic fraud data generation methodology, Proceedings of the Fourth International Conference on Information and Communications Security, pp. 265–277, 2002.
[15] E. Lundin Barse, H. Kvarnstrom and E. Jonsson, Synthesizing test data for fraud detection systems, Proceedings of the Nineteenth Annual Computer Security Applications Conference, pp. 384–394, 2003.
[16] J. McHugh, Testing intrusion detection systems: A critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory, ACM Transactions on Information and System Security, vol. 3(4), pp. 262–294, 2000.
[17] C. Moch and F. Freiling, The Forensic Image Generator Generator (Forensig2), Proceedings of the Fifth International Conference on IT Security Incident Management and IT Forensics, pp. 78–93, 2009.
[18] C. Moch and F. Freiling, Evaluating the Forensic Image Generator Generator, Proceedings of the Third International Conference on Digital Forensics and Cyber Crime, pp. 238–252, 2011.
[19] National Institute of Standards and Technology, The CFReDS Project, Gaithersburg, Maryland (www.cfreds.nist.gov), 2013.
[20] K. Ricanek and T. Tesafaye, Morph: A longitudinal image database of normal adult age-progression, Proceedings of the Seventh International Conference on Automatic Face and Gesture Recognition, pp. 341–345, 2006.
[21] M. Steinebach, H. Liu and Y. Yannikos, FaceHash: Face detection and robust hashing, presented at the Fifth International Conference on Digital Forensics and Cyber Crime, 2013.
[22] T. Vidas, MemCorp: An open data corpus for memory analysis, Proceedings of the Forty-Fourth Hawaii International Conference on System Sciences, 2011.
[23] Volatility, The Volatility Framework (code.google.com/p/volatility), 2014.
[24] WikiLeaks, The Global Intelligence Files (wikileaks.org/the-gifiles.html), 2013.
[25] K. Woods, C. Lee, S. Garfinkel, D. Dittrich, A. Russell and K. Kearton, Creating realistic corpora for security and forensic education, Proceedings of the ADFSL Conference on Digital Forensics, Security and Law, 2011.
[26] Y. Yannikos, F. Franke, C. Winter and M. Schneider, 3LSPG: Forensic tool evaluation by three layer stochastic process-based generation of data, Proceedings of the Fourth International Conference on Computational Forensics, pp. 200–211, 2010.
[27] Y. Yannikos and C. Winter, Model-based generation of synthetic disk images for digital forensic tool testing, Proceedings of the Eighth International Conference on Availability, Reliability and Security, pp. 498–505, 2013.
[28] Y. Yannikos, C. Winter and M. Schneider, Synthetic data creation for forensic tool testing: Improving performance of the 3LSPG Framework, Proceedings of the Seventh International Conference on Availability, Reliability and Security, pp. 613–619, 2012.
