Finding Clues for Your Secrets: Semantics-Driven, Learning-Based Privacy Discovery in Mobile Apps

Yuhong Nan, Zhemin Yang, Xiaofeng Wang, Yuan Zhang, Donglai Zhu and Min Yang
School of Computer Science, Fudan University
Shanghai Institute of Intelligent Electronics & Systems
Shanghai Institute for Advanced Communication and Data Science
Shanghai Key Laboratory of Data Science
Indiana University Bloomington
{nanyuhong, yangzhemin, yuanxzhang, zhudl, myang}@fudan.edu.cn, xw7@indiana.edu

Abstract: A long-standing challenge in analyzing information leaks within mobile apps is to automatically identify the code operating on sensitive data.
With all existing solutions relying on System APIs (e.g., IMEI, GPS location) or features of user interfaces (UI), content from app servers, like a user's Facebook profile or payment history, falls through the cracks. Finding such content is important given that most apps today are web applications whose critical data are often on the server side. Meanwhile, operations on such data within mobile apps are often hard to capture, since all server-side information is delivered to the app in the same way, sensitive or not. A unique observation of our research is that in modern apps, a program is essentially semantics-rich documentation carrying meaningful program elements, such as method names, variables and constants, that reveal the sensitive data involved, even when the program is under moderate obfuscation. Leveraging this observation, we develop a novel semantics-driven solution for automatic discovery of sensitive user data, including data from the server side. Our approach utilizes natural language processing (NLP) to automatically locate the program elements (variables, methods, etc.) of interest, and then performs a learning-based program structure analysis to accurately identify those indeed carrying sensitive content. Using this new technique, we analyzed 445,668 popular apps, an unprecedented scale for this type of research. Our work brings to light the pervasiveness of information leaks and the channels through which the leaks happen, including unintentional over-sharing across libraries and aggressive data acquisition behaviors. Further, we found that many high-profile apps and libraries are involved in such leaks. Our findings contribute to a better understanding of the privacy risk in mobile apps and also highlight the importance of data protection in today's software composition.
I. INTRODUCTION

Mobile apps today are more composed than written, often built on top of existing web services (e.g., analytics or single-sign-on SDKs). Such functionality composition, however, comes with significant privacy implications: private user information given to an app could be further shared with other parties through their components integrated within the app (e.g., libraries), in the absence of the user's consent. Indeed, prior research reveals that third-party services like ad libraries and analytics aggressively collect sensitive device information (e.g., IMEI, phone number, and GPS location data) [22], [37], [40]. Less noticeable here is the disclosure of the private user data an app downloads from its cloud or uploads from its local files, which can happen without the user ever being aware of it.
As an example, Figure 1 illustrates how The-Paper [14], one of the most popular Chinese news apps, works. The app integrates a third-party library, ShareSDK [12], for sharing news posts to Weibo, a popular Chinese social-media platform, through its APIs. A problem we found is that the library actually acquires the user's access token from Weibo without proper authorization and further utilizes it to gather the user's personal information (like her detailed profile, her social activities, etc.) from the Weibo cloud. Unlike access to on-device data, which requires permissions from the user, or manually entering secrets (e.g., a password) into the app's UI, the collection of such server-side information is completely invisible to the user, since there is no user involvement (e.g., permission granting) at all before the information is exposed to ShareSDK and delivered to the untrusted party. Such information disclosure is serious and can also be pervasive, given that most mobile apps are essentially web applications, keeping most of their sensitive user data on the server side. An in-depth study to understand the scope and magnitude of the problem at a large scale, however, has never been done before, due to the technical challenge of automatically identifying such data sources inside the app code.
Fig. 1. User's sensitive data in the Weibo server leaks to another service without her consent.

Network and Distributed Systems Security (NDSS) Symposium 2018, 18-21 February 2018, San Diego, CA, USA. ISBN 1-1891562-49-5. http://dx.doi.org/10.14722/ndss.2018.23092 www.ndss-symposium.org

Leakage analysis: challenges. More specifically, to find information leaks in an app, one first needs to locate the sources of sensitive data within the app code. Typically, these sources are discovered from the program based upon a set of System APIs that handle private on-device data, such as IMEI, phone number, GPS locations, etc. However, as mentioned earlier, private information comes from various sources, which can hardly be covered by these manually labeled System APIs. An example is user interfaces (UIs), whose inputs can be sensitive (e.g., password, home address) or public (e.g., comments) while coming from the same API (e.g., editText.getText()). Prior research [25], [32] classifies them using the semantics of their context, particularly the tags of GUI items (such as the string "Password" right in front of a password entry). More complicated here is the user information managed by the app, which can be stored in local files or in the app's server-side database. Loading such information into the app goes through generic APIs without any tags (file access, network communication), thereby giving little clue about the importance of the data transferred. As a result, disclosure of such information to unauthorized parties cannot be easily discovered.
1   // Getting location data somewhere
2   Location location = LocationManager.getLastKnownLocation();
3   this.locationStr =
4       "latitude" + location.getLatitude() + "\n"
5       + "longitude" + location.getLongitude();
6   ...
7   // Gathering user profile somewhere else and sending it to the server
8   // Method getUserBasicInfo()
9   Json fBUserJson = getDataFromFacebook();
10  ...
11  HashMap basicInfo = new HashMap();
12  basicInfo.put("first_name", fBUserJson.get("First_name"));
13  basicInfo.put("last_name", fBUserJson.get("Last_name"));
14  basicInfo.put("last_location", this.locationStr);
15  ...
16  return basicInfo;
Fig. 2. Motivating example. Code snippets from the app SnapTee in Google Play.

A key observation in our research is that most apps today contain a large amount of semantic information for supporting their development and maintenance.
As an example, we can see from the code snippet of a real-world app, SnapTee [13], in Figure 2 that variables, functions, methods and other program elements are all given meaningful names, and plain-text content (strings in double quotation marks) is included in the code to explain other related content, such as the value of a specific key. Further, these program elements tend to be organized in distinctive ways within the app to support unique operations on sensitive user data: for example, formatting the information as key-value pairs and storing them in a HashMap (lines 12-16 in Figure 2). Essentially, the whole program here can be viewed as a semantics-rich dataset, from which sensitive user content can be discovered with proper data analysis techniques. Such semantic information could also help information-flow tracking (which often cannot be done both efficiently and accurately) by connecting program locations to related semantics (e.g., directly confirming the presence of location data at line 14 from the constant "last_location", instead of tracking the data flow from the geolocation API at line 4).
Semantic clue discovery. Based upon this observation, we developed a new technique that automatically mines app code to recover semantic "clues" indicating the presence of sensitive information, which enables an effective leakage analysis across a large number of popular apps (Section V). Our technique, called ClueFinder, first utilizes a set of keywords, prefixes and unique acronyms representing various types of sensitive user information to identify the program elements (methods, variables, constants, etc.) that might involve sensitive content (e.g., getUserPwd, home_addr, "Last name"). These elements are then inspected through Natural Language Processing (NLP) to remove those not representing any sensitive content. Oftentimes, variables, constants and method names carrying privacy-related terms turn out to be unrelated to sensitive information. For example, the method getStreetViewActivity includes the address-related keyword "street" but clearly does not involve private data. Another example is the constant "invalid input for home directory", which has nothing to do with the user's home. To identify these false-positive instances, ClueFinder performs a grammatical analysis, finding matched terms, prefixes or acronyms that do not serve as the "theme" of their semantic context: for example, the word "street" here only plays the role of describing "activity", which is the true subject of the whole term (the activity name). On the other hand, when a keyword acts as a noun in its element and also as the subject of a verb (e.g., "getEmail"), it looks more like a clue for the presence of operations on sensitive user data.
Learning-based identification. Such semantic analysis alone, however, can still be insufficient to avoid false positives, that is, mistakenly reporting a non-sensitive program element as involving sensitive content: e.g., sending a message with the constant string setMessage("are you sure to delete account") or throwing an exception like formatInvalidExp("username", Exception e). To address this issue, ClueFinder further evaluates the program structures related to the identified elements, looking for the operations most likely to happen on sensitive user data. More specifically, it runs a machine-learning approach to classify the program statements containing such elements, based upon a set of key program structural features (Section III-C). For example, in Figure 2, line 14, we expect that within the method invocation statement basicInfo.put(), an identified constant text string involving a sensitive keyword ("location") appears together with a variable parameter of a data type (String, for the variable "locationStr"), which likely indicates the presence of a key-value pair. Note that this feature helps exclude operations that simply display text with keywords (e.g., "account"), as in the aforementioned example "are you sure to delete account". Altogether, we identified 5 features and trained an SVM model on them to discover sensitive-data-related operations in Android code, and thereby identify the actual private content in mobile apps.
The design of ClueFinder enables efficient discovery of sensitive data sources, covering not only those labeled by System APIs, but also server-side private data (e.g., user profiles) and other content controlled by individual apps. Even in the presence of moderate obfuscation (e.g., produced by ProGuard [9]), our semantics-based approach still works, thanks to the program features that need to be preserved during obfuscation to avoid disrupting an app's normal execution (e.g., API names, parameters, constants, even some data operations; see Section IV-B). Although ClueFinder is primarily designed to find hidden data sources, we show that the semantic knowledge recovered by our approach also supports more efficient data-flow tracking (see Section III-C), which enables a large-scale leakage analysis.
We implemented ClueFinder and evaluated its effectiveness in our research (Section IV). The experimental results show that ClueFinder accurately discovers sensitive data sources in app code (with a precision of 91.5%), significantly outperforming all prior approaches [35], [25], [32], [26] in terms of both coverage and precision.
Measurement and findings. Armed with the additional sensitive data sources discovered by ClueFinder, we were able to evaluate information leaks in 445,668 apps downloaded from 2 different app markets, gaining new insights into the way private user information (especially app-specific sensitive data) is accessed by third-party libraries. Across all these apps, our study shows that at least 118,296 (26.5%) disclose their customers' information to 3,502 libraries, which constitutes a privacy risk much more significant than reported by all prior studies. More specifically, we found that personal content has been extensively disseminated, including one's profile, installed app list, her social networking activities (e.g., profiles on Facebook and personal posts) and others. In particular, among the 13,500 most popular apps downloaded from Google Play in 2015, 39.9% were found to expose users' information to 709 distinct third-party libraries, with each app on average sharing more than 7.6 private data items (e.g., address, profile, etc.) with at least 2 third-party libraries. Many of the libraries were found to indeed send the collected user data out to the Internet, and only a few of them could be confirmed to use such information only on the device (see Section V-B). Also, such an information exposure risk (that is, using third-party libraries to process sensitive user data, which often leads to an unauthorized leak of the data to a third party, as further shown in our adversary model) occurs when the app developer over-shares data for functionality enrichment or the third-party library aggressively gathers data through its hosting app. Among the top 100 libraries with the risk, 65% are non-ad libraries, such as analytics and social-network utilities, with hundreds of millions of installs through popular apps. A prominent example is Tinder (case study in Section V-C), a popular dating app that exposes its users' profiles and account names on Instagram, together with their instant locations, to the library Appboy [6]. Also, high-profile libraries like ShareSDK are given, or actively acquire, private information (e.g., users' social network profiles) unrelated to their missions (Section V-C). Not only do these findings confirm the long-standing suspicion that user information has been inappropriately disseminated through apps, but they also underline the scale and breadth of such risks, which have never been fully understood before.
Contributions. The contributions of this paper are summarized as follows:

New technique for sensitive data source discovery. We designed and implemented an innovative, semantics-driven technique for automatically recovering sensitive user data from app code, a critical step for leakage analysis. Our approach leverages the semantic information of program elements, together with the unique program structures of their context, to accurately and efficiently identify the presence of sensitive operations, which takes a step towards solving this long-standing challenge in app leakage analysis.

Large-scale exposure risk analysis and new findings. Using our new technique, we investigated the potential information exposure to third-party libraries across 445,668 popular apps, a scale never achieved before in comparable studies. Our research brings to light the gravity of the problem, which has never been fully understood, and the channels through which such exposures happen, including over-sharing by app developers and aggressive data acquisition by third-party libraries. Further, many high-profile apps and libraries were found to be involved in the information leaks. These findings help better understand this privacy risk and highlight the importance of data protection in today's software composition.
Roadmap. The rest of the paper is organized as follows: Section II presents the background of our research and the assumptions we made; Section III elaborates the design of ClueFinder; Section IV presents the implementation and evaluation of ClueFinder and the support it provides for a scalable leakage analysis; Section V describes our large-scale leakage study over 445,668 apps and our findings; Section VI discusses the limitations of our research and potential future research; Section VII surveys the related prior work; and Section VIII concludes the paper.
II. BACKGROUND

In this section, we lay out the background for our study, including privacy leakage analysis, the NLP preliminaries used in our research, and the assumptions we made.
App leakage analysis. Mobile users' privacy has long been known to be threatened by the apps running on their devices. Information can be leaked either intentionally (often by malicious or gray app components) [42] or inadvertently (e.g., through vulnerabilities in apps or mobile frameworks) [28]. Particularly when it comes to third-party libraries, prior work has found that many advertising (ad) libraries aggressively collect user data [30], [38] through different channels (unprotected APIs, privilege escalation, etc.), disclosing sensitive attributes like age, marital status and work information to ad networks or advertisers. These findings, however, were made on a small set of apps, due to the difficulty of manually labeling and analyzing privacy data sources and the data involved.
As mentioned earlier, automatic leakage analysis techniques have been widely studied, mainly through tracking "tainted" data flows across app code, from sources (e.g., the APIs for collecting GPS locations) to sinks (typically the APIs for network communication) [16], [23]. A well-known challenge for such analysis is the identification of sensitive data sources, which mainly relies on the Android APIs with known sensitive returns, such as getLastKnownLocation() for locations, getLine1Number() for phone numbers, AccountManager.getAccounts() for account information, and others. Other sources often need to be labeled manually.
To facilitate data-source identification, tools like SUSI [35] can automatically recover from app code a large number of System APIs likely to import data. Less clear, however, is whether these APIs return sensitive information and therefore should be labeled as data sources. To capture such sensitive inputs, the semantics of the imported content and the context of the related operations need to be studied. This idea has been used to find sources on user interfaces, based upon the text content associated with sensitive user inputs, such as "enter username" and "password" [32], [25]. Even more challenging here is the labeling of private data downloaded from the app's server or uploaded from its local repository. For example, when the user logs into her account, little context information is given during the import of account data.
Natural language processing. ClueFinder leverages a set of NLP techniques to discover sensitive program elements and control false positives. In the following, we describe the key techniques used in our approach.

Stemming. Stemming is a process that reduces inflected (or sometimes derived) words to their stem, base or root form: for example, converting "changes" and "changing" to the single common root "change". In our case, stemming helps us find more semantic clues, particularly program elements with prefixes and acronyms in their names: for example, the stem "addr" derived from "address" can match variable names like "user_addr".
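The paper does not say which stemmer ClueFinder uses. As a minimal sketch, assuming nothing beyond the description above, the hypothetical class `StemMatcher` below strips a few common English suffixes and then treats a token as an abbreviation of a keyword if the token is a sufficiently long prefix of the keyword's stem. This rule is far cruder than the Porter stemmer: it maps both "changes" and "changing" to "chang" rather than to the dictionary word "change", which would need a lemmatizer.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper, not ClueFinder's code: a crude suffix-stripping stemmer
 * (much simpler than the Porter stemmer) plus abbreviation matching by prefix.
 */
class StemMatcher {
    // Inflectional suffixes, tried longest first so "ing" wins over "s".
    private static final List<String> SUFFIXES = List.of("ing", "es", "ed", "s");

    /** Strips at most one suffix, keeping a stem of at least three letters. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (suffix.equals("s") && w.endsWith("ss")) {
                continue; // keep "address" intact instead of producing "addres"
            }
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /**
     * True if the token abbreviates the keyword: it is at least minLen characters
     * long and is a prefix of the keyword's stem (so "addr" matches "address").
     */
    static boolean abbreviates(String token, String keyword, int minLen) {
        String t = token.toLowerCase(Locale.ROOT);
        return t.length() >= minLen && stem(keyword).startsWith(t);
    }
}
```

Splitting "user_addr" on the underscore yields the token "addr", which `abbreviates("addr", "address", 3)` accepts; the minimum length keeps very short prefixes such as "ad" from matching almost anything.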
Part-of-Speech (POS) tagging. POS tagging is a procedure that marks each word as a particular part of speech (such as a noun or a verb), based upon its meaning and context (its relations with other words in the sentence). State-of-the-art POS tagging techniques already achieve over 90% accuracy [29]. Here we use POS tagging to determine whether a privacy-related keyword is actually a noun in the term or the sentence. For example, "address" in "address this problem" is a verb that describes the action applied to another word, so it is unlikely to represent a physical home address.
Dependency relation parsing. Dependency relation parsing analyzes a sentence, identifies the grammatical relations between different words, and represents the structure of the sentence, based upon such pairwise relations, as a dependency tree. For example, in the sentence "Bell, based in Los Angeles, distributes electronics", the relation between "Bell" and "distributes" is described as a nominal subject, while "Los" and "Angeles" hold a compound relation. In our research, such dependency relations help us determine whether a specific privacy-related keyword dominates its sentence, in which case it is most likely the theme of the sentence.
Assumptions. The purpose of ClueFinder is to detect sensitive data sources in legitimate app code, covering those missed by all prior studies, particularly program elements related to private user data imported from app servers. We do not consider deeply obfuscated programs that remove all semantic information from their program elements. In fact, our study (see Section IV-B) shows that app developers tend not to obfuscate data-related code within their apps and the third-party libraries they integrate, to avoid disrupting the apps' normal execution (e.g., causing a crash). As a result, we found that even moderately obfuscated code (e.g., produced by ProGuard) preserves a lot of semantic information: e.g., among all such apps discovered in our research, over 50% of the method names are not obfuscated (Section IV-B) and over 98% of the apps still contain readable constant strings. Note that what we are interested in is unauthorized disclosure of sensitive data within an app to a third-party library; therefore, malicious apps covertly sending user data to the adversary are outside the scope of our study. Further, in our measurement study (Section V), we consider acquisition of sensitive user data by an untrusted third-party library to be an exposure risk. Even though this exposure does not necessarily mean that the data will be leaked to an unauthorized party, the possibility of a leak often cannot be ruled out due to the complexity of data-flow analysis on these libraries.
III. CLUEFINDER DESIGN

As mentioned earlier, although app code is semantics-rich, recovering truly sensitive data sources from the code is by no means trivial. In particular, a direct search for keywords does not work well, as it misses many potentially sensitive tokens (e.g., prefixes, acronyms) in the identifiers of various program elements (variables, methods, functions, etc.). Also importantly, the presence of a single specific token does not necessarily indicate operations on sensitive user data. In some cases, even when the token looks semantically relevant, as in the method getPhoneNumberPrefix, the corresponding program element may not touch sensitive data at all: in this example, the function does nothing but a format check on phone numbers. Thus, a precise semantic analysis is required to determine whether a token refers to a privacy-related activity. Further, the program elements carrying sensitive tokens (e.g., a constant string) may not carry private content themselves. They need to be linked to the true sources of private data through program analysis. In this section, we show how the design of ClueFinder addresses these challenges and how the technique fares on real app code.
A. Design Overview

The idea behind ClueFinder is to quickly screen a program to locate privacy-related tokens within program elements and then semantically evaluate the elements to drop false positives. Finally, a program structure analysis is performed to determine whether each of these elements is indeed involved in a sensitive data operation through a method invocation. In the following, we describe the architecture of this design and use an example to explain how it works.
Architecture. Figure 3 illustrates the individual components of ClueFinder and their relations. Our design includes a Semantics Locator, a Semantic Checker, a Structure Analyzer and a Leakage Tracker. Given an Android app, the Semantics Locator first disassembles its code and identifies the program elements carrying sensitive tokens. These putatively sensitive elements (string constants, variables, and method names) are then inspected by the Checker, which uses NLP techniques to determine whether the identified tokens are indeed the main subject of each element's identifier or content (Section III-B).

Fig. 3. Design of ClueFinder

Those meeting the standard further go through a program structural analysis, in which the Analyzer classifies each function invocation statement involving an element as either sensitive or not. The reported sensitive ones are handed over to the Tracker, which incorporates the semantic information inside the app code into a data-flow and reachability analysis to trace the propagation of sensitive information.
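To make the division of labor concrete, the following sketch composes the first three stages as successive filters over program elements. It is a simplification under stated assumptions: `CodeElement` and `ClueFinderPipeline` are hypothetical names, each stage is reduced to a yes/no predicate (the real Structure Analyzer classifies invocation statements rather than elements), and the Leakage Tracker is left out.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** A program element recovered from disassembled code (hypothetical representation). */
record CodeElement(String name, String kind) {}

/**
 * Hypothetical composition of the first three ClueFinder stages as successive filters.
 * The surviving elements would be handed to the Leakage Tracker as sources.
 */
class ClueFinderPipeline {
    private final Predicate<CodeElement> locator;  // carries a sensitive token?
    private final Predicate<CodeElement> checker;  // is that token the semantic theme?
    private final Predicate<CodeElement> analyzer; // does its statement operate on data?

    ClueFinderPipeline(Predicate<CodeElement> locator,
                       Predicate<CodeElement> checker,
                       Predicate<CodeElement> analyzer) {
        this.locator = locator;
        this.checker = checker;
        this.analyzer = analyzer;
    }

    /** Applies the stages in order; each stage only sees what the previous one kept. */
    List<CodeElement> sources(List<CodeElement> elements) {
        return elements.stream()
                .filter(locator)
                .filter(checker)
                .filter(analyzer)
                .collect(Collectors.toList());
    }
}
```

Keeping each stage as an independent predicate mirrors the cheap-to-expensive ordering of the design: the keyword screen discards most elements before the costlier NLP and structural checks run.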
Example. The example in Figure 4 shows a code snippet from SnapTee, an app through which one can design and purchase personalized T-shirts. Note that here we re-organize part of the app code for ease of illustration, while maintaining all semantics of the original (decompiled) app code. As we can see from lines 25 to 29, the app first acquires a user's Facebook profile (line 26) and then sends a sharing post (line 28) to the user's Facebook account. After that, however, both the profile and the Facebook post are handed over to a third-party library function, trackShareEvent (line 30).

To analyze the snippet, the Locator leverages a set of keywords representing 4 categories of sensitive data, as identified by Google Privacy Policies [3] and prior research [39], [22], [32], together with their prefixes and acronyms derived through stemming, to capture elements like "home_addr" (line 6), getUserFbProfile (line 26) and "I'm designing my own tees on my phone!" (line 18), which carry the sensitive tokens "home address", "profile" and "phone" respectively. The Checker then looks into these elements, picking out ones like getUserFbProfile, given the observation that the token it involves ("Profile") plays the role of a subject described by the verb "get". Also, for the long sentence in line 18, the Checker finds through dependency relation parsing that "phone" does not serve as the theme ("design tees"), and thus filters out this element, as there is no indication that it carries private data. To further determine whether the remaining elements are indeed sensitive, their corresponding function invocation statements (e.g., lines 5, 6, 8, 26) are inspected by the Structure Analyzer based upon their features: for example, the statement in line 26 is a true positive since it returns a Json-typed object, which could contain sensitive data, while the statement in line 5 is a false positive, since it only returns a boolean value indicating whether the Json object contains a key named "home_addr". All identified statements are used as sources for a data-flow and reachability analysis performed by the Tracker. Such a semantics-driven approach helps reduce the complexity of tracking the propagation of sensitive data.
For example, the Tracker can confirm a user profile leakage (a coarse-grained type of private data) from line 26 without analyzing the actual code in this method (lines 2-13). The process ends when the related data is found to be accessed by a desired sink (e.g., the third-party library in line 30).

1   // In co.snaptee.android.utils.FacebookFunctions
2   Json getUserFbProfile(HashMap userBasicInfo) {
3       JsonObject userJson = userBasicInfo.toJson();
4       // Gather other user information
5       if (userJson.contains("home_addr")) {
6           jsonObject.put("home_addr", this.homeAddr);
7       }
8       this.uri = jsonObject.get("userProfile_uri");
9       if (this.uri == null) {
10          throw NullPointerException("Profile URI is null", exception);
11      }
12      return jsonObject;
13  }
14
15  Builder shareToFacebook(String shareContent)
16  {
17      Builder builder = new Builder();
18      builder.setContentTitle("I'm designing my own tees on my phone!");
19      builder.setContentUrl(Uri.parse("https://snaptee.co/getapp"));
20      builder.setShareContent(shareContent);
21      Log.d("FacebookFunctions", "Try to invite FB");
22      return builder;
23  }
24
25  // Getting user profile on Facebook
26  currentUser = getUserFbProfile();
27  // Trigger sharing activity
28  shareToFacebook(shareContent);
29  // Tracking user activity by invoking API from third-party library
30  trackShareEvent(currentUser, builder.shareContent);
Fig. 4. Overview code example
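The Tracker's shortcut can be illustrated with a toy forward taint propagation over lines 26 to 30 of Figure 4. This is a minimal sketch, not ClueFinder's data-flow engine: it assumes straight-line code with no branches, aliasing or field sensitivity, and the `Stmt` record and `TaintTracker` class are hypothetical. A call to a method labeled as a source taints its result without its body being analyzed, and a sink call is reported when any of its arguments is tainted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** One straight-line statement: assigned variable (or null), callee, argument variables. */
record Stmt(int line, String target, String callee, List<String> args) {}

/** Hypothetical forward taint propagation; no branches, aliasing or method bodies. */
class TaintTracker {
    /** Returns the line numbers of sink calls that receive tainted data. */
    static List<Integer> leakingSinks(List<Stmt> stmts, Set<String> sources, Set<String> sinks) {
        Set<String> tainted = new HashSet<>();
        List<Integer> leaks = new ArrayList<>();
        for (Stmt s : stmts) {
            boolean argTainted = s.args().stream().anyMatch(tainted::contains);
            if (sinks.contains(s.callee()) && argTainted) {
                leaks.add(s.line());
            }
            if (s.target() != null) {
                // A source call taints its result without its body being analyzed;
                // otherwise taint flows from the arguments to the result.
                if (sources.contains(s.callee()) || argTainted) {
                    tainted.add(s.target());
                } else {
                    tainted.remove(s.target()); // overwritten with untainted data
                }
            }
        }
        return leaks;
    }
}
```

Encoding line 26 as a call to the source getUserFbProfile and line 30 as a call to the sink trackShareEvent, the analysis reports line 30 without ever inspecting lines 2 to 13.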
B. Semantic Clue Locating

Semantics Locator. To identify privacy-related data through the semantics of program elements, we first need to determine the set of data considered to be sensitive and the keywords associated with them. Such information was gathered from multiple sources in our research. In particular, we utilized 35 data items identified by Google Privacy Policies as private content [3], together with 17 additional items reported by 5 prior privacy-related studies [39], [22], [32]. For example, the Financial Times (FT) [39] provides a calculator for evaluating the price of one's private data. These items are organized by ClueFinder into 4 categories: user identifiers, user attributes, location data and account information. In total, 121 keywords or keyword pairs are identified (see examples in Table I). Further, we use Word2Vec [31] to find more synonyms of these sensitive items. The keyword set is also extended using stemming (to find their prefixes), with more similar texts (e.g., "addr") extracted from 10,000 popular Google Play apps. This allows ClueFinder to capture as much sensitive semantics as possible from app code.
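Word2Vec-based expansion amounts to keeping vocabulary words whose embedding lies close to a seed keyword's embedding. The sketch below assumes the embeddings are already loaded into a map and uses cosine similarity with a fixed threshold; the class name `SynonymExpander` and the threshold are invented for illustration, and the paper does not state the similarity cut-off it used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical synonym expansion over precomputed word embeddings. */
class SynonymExpander {
    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Returns the other words whose similarity to the seed reaches the threshold. */
    static List<String> expand(String seed, Map<String, double[]> vectors, double threshold) {
        double[] s = vectors.get(seed);
        List<String> result = new ArrayList<>();
        // Iterate in sorted key order so the output is deterministic.
        for (Map.Entry<String, double[]> e : new TreeMap<>(vectors).entrySet()) {
            if (!e.getKey().equals(seed) && cosine(s, e.getValue()) >= threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

With made-up three-dimensional vectors password = (1, 0, 0), passwd = (0.9, 0.1, 0), pwd = (0.8, 0.2, 0) and street = (0, 1, 0), a threshold of 0.9 keeps passwd and pwd as synonyms of password and drops street. Real Word2Vec vectors typically have hundreds of dimensions, and a zero vector would make the cosine undefined, so a real implementation would skip such entries.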
TABLE I. KNOWLEDGE BASE FOR PRIVACY-RELATED SEMANTICS

Category         | Sample Keywords
User Attributes  | first name, last name, gender, birth date, nickname, education, app list, device os, credit card, etc.
User Identifiers | user id, account number, access token, sina id, facebook id, twitter id, etc.
Location         | latitude, longitude, lat, lng, user address, zip code, city, street, etc.
Account          | account name, user name, phone number, mobile no, password, passwd, pwd, etc.
For all elements in decompiled app code, ClueFinder uses word splitting to break their names into tokens, using common delimiters (e.g., user_addr) and capitalized letters (e.g., getUserFbProfile). Then, it performs best-effort matching using its knowledge base (4 data categories and their representative tokens), searching for the tokens (keywords, prefixes and abbreviations) inside the identifiers of program elements. As a result, the elements involving privacy-related tokens are labeled for a more in-depth semantic analysis.
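The splitting and matching step can be sketched as follows. The regular expressions, the class name `IdentifierMatcher` and the exact-token lookup are assumptions for illustration: the paper describes best-effort matching that also covers prefixes and abbreviations, which an exact set lookup does not. The knowledge base is passed in as a map from category name to tokens, shaped like Table I.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical identifier splitter and knowledge-base matcher. */
class IdentifierMatcher {
    /** Splits on non-alphanumeric delimiters and on lower-to-upper case transitions. */
    static List<String> split(String identifier) {
        List<String> tokens = new ArrayList<>();
        for (String part : identifier.split("[^A-Za-z0-9]+")) {
            // Zero-width split before an upper-case letter that follows a lower-case letter or digit.
            for (String t : part.split("(?<=[a-z0-9])(?=[A-Z])")) {
                if (!t.isEmpty()) {
                    tokens.add(t.toLowerCase(Locale.ROOT));
                }
            }
        }
        return tokens;
    }

    /** Returns the categories whose tokens occur among the identifier's tokens. */
    static Set<String> categories(String identifier, Map<String, Set<String>> knowledgeBase) {
        List<String> tokens = split(identifier);
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : knowledgeBase.entrySet()) {
            for (String t : tokens) {
                if (e.getValue().contains(t)) {
                    hits.add(e.getKey());
                }
            }
        }
        return hits;
    }
}
```

For example, "getUserFbProfile" splits into get, user, fb and profile, and "home_addr" into home and addr. One gap of this camel-case rule is runs of capitals: "getURLPath" yields "get" and "urlpath", because no lower-case letter precedes the "P".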
The Semantics Locator finds the elements with sensitive tokens in their names. This, however, does not necessarily mean that these elements are indeed privacy-related. As an example, in Table II, Index 1, the method getStreetViewActivity contains the sensitive keyword "street" but is actually unrelated to the user's location data. To remove such false positives, our approach runs the Semantic Checker to further analyze the semantics of these elements.
Semantic Checker. The Semantic Checker runs POS tagging and dependency relation parsing to obtain more in-depth semantic information from labeled elements. What we want to understand is whether a sensitive token actually serves as the "theme" of the labeled content (an element name or the content of a constant), which is more likely to indicate the presence of sensitive information than when the token is only used to describe other, nonsensitive terms (e.g., "street" in getStreetViewActivity). For this purpose, the Checker tries to determine whether the token is a noun and is also characterized by the following dependency relations with its context terms in a phrase or a sentence:

Direct-object relation (Dobj): The direct object of a verb phrase is the noun phrase that is the (accusative) object of the verb: e.g., getAddressFromServer in Table II, Index 4. Here, the identified sensitive token with a noun POS tag ("Address") has the Dobj relation with "get", indicating that this term is related to an access to location information. Other examples (Indexes 1, 2, 3) are also presented in Table II.
Nominal subject (Nsubj): A nominal subject is a noun phrase that is the syntactic subject of a clause. This is a relation the Checker looks for in the absence of Dobj between an identified sensitive token and its context. For example, in "business phone number selected", the sensitive token "phone number" is the topic of the sentence, indicating the presence of the information at its related program location. An example of this case is also presented in Table II, Index 5.
Negation modifier (Neg): The negation modifier is the relation between a negation word and the word it modifies. In our case, if the sensitive token found in the element appears with a Neg modifier, the element likely does not relate to sensitive content, e.g., "Do not input your password here".
Other relations: When the labeled sensitive token actually has a dependent (Dep), compound (Compound), open clausal complement (Xcomp) or other relation with its context (other words in the same element name or constant string), we found that the token becomes less of an indicator of the presence of private content, since in this case the token is no longer the theme of its context (the target of an access or the topic of a sentence). Examples of such relations are presented in Table II, Indexes 1, 2, 3 and 6.

Using the relations above, the Semantic Checker filters out the program elements that involve sensitive tokens but are less likely to be actually related to private content. The remaining elements are then inspected by the Structure Analyzer to further reduce false positives. In our implementation, the Checker was built upon the Stanford Parser [29], a standard NLP tool for POS tagging and dependency relation parsing.
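The Checker's rules can be written down as a small decision procedure over dependency triples. In this sketch the triples are assumed to come from a parser such as the Stanford Parser but are supplied by hand; the `Dep` record and `ThemeChecker` class are hypothetical, and the rule set (a noun tag is required; Dobj or Nsubj keeps the token unless its head word is negated; any other relation drops it) is our reading of the description above rather than the authors' code. Relation names follow the Stanford Dependencies style used in Table II; Universal Dependencies v2, for instance, renames dobj to obj.

```java
import java.util.List;

/** A typed dependency relation(governor, dependent), e.g. dobj(get, address). */
record Dep(String relation, String governor, String dependent) {}

/** Hypothetical encoding of the Semantic Checker rules over parser output. */
class ThemeChecker {
    /**
     * True if the token is tagged as a noun and is the direct object or nominal
     * subject of some head word that is not itself negated.
     */
    static boolean isTheme(String token, String posTag, List<Dep> deps) {
        if (!posTag.startsWith("NN")) {
            return false; // e.g. "address" used as a verb
        }
        for (Dep d : deps) {
            boolean central = d.relation().equals("dobj") || d.relation().equals("nsubj");
            if (central && d.dependent().equalsIgnoreCase(token)) {
                return !isNegated(d.governor(), deps);
            }
        }
        // Only compound, dep, xcomp, nmod, ... relations: the token is not the theme.
        return false;
    }

    /** True if some neg relation attaches to the given head word. */
    private static boolean isNegated(String head, List<Dep> deps) {
        for (Dep d : deps) {
            if (d.relation().equals("neg") && d.governor().equalsIgnoreCase(head)) {
                return true;
            }
        }
        return false;
    }
}
```

For getStreetViewActivity, with dobj(get, activity) and compound(activity, street), the token "street" is rejected; for "Do not input your password here", dobj(input, password) is found, but neg(input, not) negates its head, so "password" is rejected as well.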
C. Sensitive Data Discovery and Tracking

Structure Analyzer. Even when a sensitive token plays a central role in the name of a variable or a method, or in the content of a constant string, such a program element may not necessarily relate to private content. For example, the statement in line 5 of Figure 4 talks about a home address; however, the operation here is just checking whether the data object contains the key "home_addr". Another example is line 10, "Profile URI is null", which is actually an output explaining an exception. So, to identify truly sensitive operations, not only do we need to check the semantics of the program elements' identifiers and constant content, but it is also important to look into the semantics of the actual program operations specified by the statements involving these elements. Serving this purpose is the Structure Analyzer, which utilizes a set of program structural features to determine whether a sensitive-token-related statement indeed touches private user content.

In our research, we focus on method invocation statements, since sensitive user data are typically accessed by third-party libraries through method calls. To find such statements, the Analyzer first locates all method invocations (e.g., lines 5, 6, 8, 10 in Figure 4) directly or indirectly related to a labeled program element (which involves sensitive tokens), and then extracts features from these statements to capture those that access sensitive data.
Specifically, when a method name is labeled, all statements that trigger the method are considered to be potential sources of private information.

TABLE II. EXAMPLES FOR SEMANTIC CHECKER

Index | Element | Description
1 | getStreetViewActivity | As a negative example, "street" only holds a Compound relation with "Activity"; the Dobj relation here is between "get" and "activity".
2 | getLocationUpdateTimeIntervalInMillis | As a negative example, "Location" only holds a Compound relation with "Interval"; the Dobj relation here is between "get" and "Interval".
3 | "I'm designing my own tees on my phone" | As a negative example, "Phone" only holds an Nmod:poss relation with "my" and an Nmod:on relation with "design". The Dobj relation here is between "design" and "tee".
4 | getAddressFromServer | As a positive example, "address" here has the POS tag "NN" and holds a Dobj relation with the verb "get".
5 | "Username must be in valid format" | As a positive example, although there is no Nsubj relation in the sentence, "Username" holds a Dobj relation with "Format".
6 | newfriendnum | As a negative example, "Friend" only holds a Compound relation with "num".
For labeled variables and constant strings, the Analyzer performs a data-flow analysis on them to identify all the invocations that take the elements or their derivatives as parameters. All such statements are then inspected for their program structural features. Our key observation is that when labeled elements are involved in data read or write operations, the operations are almost always related to sensitive information, with the source of the information being the element itself when it is a variable, another variable in the same method call when the element is a constant, or the return value of the call when the element is a method. Leveraging this observation, our approach analyzes how these elements are used in an invocation statement to seek evidence that such sensitive data operations indeed take place.
Suchevidencecouldbeassimpleasthepresenceofkeywordssuchas"get","put"inamethodname(e.
g.
,getUserFbProleinFigure4).
Itcanalsobethereturnofadata-typedobjectfromamethodcall:e.
g.
,getUserFbProlereturnsaJsonobject(line12inFigure4).
Anotherexampleisthepatternofusingdifferenttypesofdatatogether:e.
g.
,aconstantstring(key)oftenappearsinfrontofastringvariable(value)inamethodinvocation;anexampleisline6ofFigure4.
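The key observation above ties each kind of labeled element to a different carrier of the sensitive value. The minimal Python sketch below illustrates that mapping. The Invocation record, the double-quote convention for constants and the function names are hypothetical simplifications, not ClueFinder's actual data structures (which operate on Jimple statements).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Invocation:
    """Hypothetical, simplified model of one method-invocation statement."""
    method_name: str
    args: List[str] = field(default_factory=list)  # '"literal"' or '$rN'
    result: Optional[str] = None  # variable receiving the return value


def is_constant(arg: str) -> bool:
    # Simplification: a double-quoted argument is a string constant.
    return len(arg) >= 2 and arg.startswith('"') and arg.endswith('"')


def candidate_sources(stmt: Invocation, element: str, kind: str) -> List[str]:
    """Return the values that likely carry the sensitive content.

    kind is the kind of the labeled element: 'variable', 'constant' or 'method'.
    """
    if kind == "variable":
        # The labeled variable itself carries the data.
        return [element] if element in stmt.args else []
    if kind == "constant":
        # Another (non-constant) argument of the same call carries the data,
        # e.g. $u in put("user", $u).
        if f'"{element}"' not in stmt.args:
            return []
        return [a for a in stmt.args if not is_constant(a)]
    if kind == "method":
        # The return value of the labeled method carries the data.
        if stmt.method_name == element and stmt.result is not None:
            return [stmt.result]
        return []
    raise ValueError(f"unknown element kind: {kind}")


put = Invocation("put", ['"user"', "$u"])
print(candidate_sources(put, "user", "constant"))  # ['$u']
```

A call such as put("user", "default") yields no candidate here, because both arguments are constants and nothing but the key is present.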
These features are summarized as follows:

Method name. As mentioned earlier, one feature used in our research is whether the method name in a labeled statement contains a specific token representing a data operation, such as get/set/put/add/insert/delete/remove/read/write/save.

Parameter type. We also look at the basic data types of the parameters in a method invocation, which indicate the presence of data operations. Examples include String, HashMap, Json, etc.

Return type. The return type of a method call also provides evidence for the presence of data operations: e.g., a data read returns a result of type String, HashMap, Json, etc.

Base value type. Many data operations happen through specific Java class libraries. Therefore, for statements that contain a base value (e.g., hashMap in the call hashMap.put(key, value)), the class type of the base value can also help distinguish data access from other operations. For example, in line 6 of Figure 4, the Json class of jsonObject is used to process data, while in line 21, the android.util.Log class of the base value Log is unrelated to data use.

Constant-variable pattern. Also useful for identifying sensitive data operations are the patterns of constant-variable parameter combinations in method calls. For example, the first parameter of hashMap.put("user", $u) is a constant and the second is a variable, which is a standard key-value combination for a data-processing method call. In contrast, both parameters of hashMap.put("user", "default") are String constants, so that call does not indicate any data access.
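To make the five features concrete, the sketch below turns a simplified statement record into a binary feature vector. The data-operation tokens are the ones listed above. The Stmt record, the DATA_TYPES set and the encoding as one 0/1 flag per feature are assumptions made for illustration, not ClueFinder's actual feature encoding.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Data-operation tokens listed in the text.
DATA_OP_TOKENS = {"get", "set", "put", "add", "insert", "delete",
                  "remove", "read", "write", "save"}
# Assumed set of "data" classes (the text names String, HashMap, Json, etc.).
DATA_TYPES = {"java.lang.String", "java.util.HashMap", "org.json.JSONObject"}


@dataclass
class Stmt:
    """Hypothetical flattened view of one invocation statement."""
    method_name: str
    arg_types: List[str]         # declared or inferred argument types
    return_type: Optional[str]   # None for void calls
    base_type: Optional[str]     # class of the receiver, None for static calls
    args: List[str]              # '"literal"' for constants, '$rN' for variables


def camel_tokens(name: str) -> List[str]:
    # "getUserFbProfile" -> ["get", "user", "fb", "profile"]
    return [t.lower() for t in re.findall(r"[a-z]+|[A-Z][a-z]*", name)]


def is_constant(arg: str) -> bool:
    return len(arg) >= 2 and arg.startswith('"') and arg.endswith('"')


def extract_features(s: Stmt) -> List[int]:
    """Five 0/1 flags: method name, parameter type, return type,
    base value type, constant-variable pattern."""
    method_op = any(t in DATA_OP_TOKENS for t in camel_tokens(s.method_name))
    param_data = any(t in DATA_TYPES for t in s.arg_types)
    return_data = s.return_type in DATA_TYPES
    base_data = s.base_type in DATA_TYPES
    # A constant immediately followed by a variable looks like key -> value.
    const_var = any(is_constant(a) and not is_constant(b)
                    for a, b in zip(s.args, s.args[1:]))
    return [int(f) for f in (method_op, param_data, return_data,
                             base_data, const_var)]


stmt = Stmt("put", ["java.lang.String", "java.lang.String"], "java.lang.Object",
            "java.util.HashMap", ['"user"', "$u"])
print(extract_features(stmt))  # [1, 1, 0, 1, 1]
```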
On top of these features, Structure Analyzer runs a Support Vector Machine (SVM) classifier to determine whether a given statement indeed involves private data. The classifier was trained using 4,326 statements randomly selected and manually labeled from 100 apps, as elaborated in Section IV.
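The paper trains its classifier with scikit-learn's SVM implementation (Section IV). To keep the example dependency-free, the sketch below swaps in a small Pegasos-style linear SVM (stochastic sub-gradient descent on the regularized hinge loss) written in plain Python. The regularization constant, epoch count and toy feature vectors are assumptions. In practice one would use a library class such as sklearn.svm.SVC instead.

```python
import random
from typing import List, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    return [float(v) for v in x] + [1.0]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 500,
                     seed: int = 0) -> List[float]:
    """Pegasos-style linear SVM; labels must be +1 (sensitive) or -1."""
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            # Sub-gradient step on lam/2 * ||w||^2 + hinge loss.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0 else -1


# Toy 5-flag feature vectors (see the feature list above); labels are made up.
X = [[1, 1, 1, 1, 1], [1, 1, 1, 0, 1],   # sensitive statements
     [0, 0, 0, 0, 0], [0, 0, 0, 1, 0]]   # non-sensitive statements
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

A linear kernel keeps the sketch short. The text does not say which kernel ClueFinder uses.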
Leakage Tracker. The statements, together with the privacy-related semantics recovered by ClueFinder, are treated as the actual "sensitive" sources for detecting information leaks. Specifically, Leakage Tracker extracts data-typed objects from the parameters or return values of these statements, and then performs a data-flow based taint analysis on these objects. The purpose of this analysis is to find out whether sensitive data flow into sinks that indicate leaks of the information to unauthorized third parties.

Ideally, such a sink would be an API used by an untrusted library to send tainted data out to the Internet, as in prior research [37]. In practice, however, tracking tainted flows across library code is often too heavyweight and imprecise, particularly for a static analysis that must scale to a large number of apps. Therefore, in our research, we instead looked for the presence of an exposure risk, i.e., tainted data flowing into an untrusted library. In this case, the data is no longer safe, and its content could be disclosed to third parties through various channels that existing techniques find hard to capture (i.e., covert channels).

What is unique about ClueFinder is its use of semantics to enhance the taint analysis, which enables more efficient detection of exposure risks. For example, in Figure 2, even without analyzing the code from Line 2 to 5, the semantics of the method invocation at Line 14 (e.g., the constant "lastlocation" involved) immediately reveal that the function's return value (basicInfo) carries sensitive content. In this way, we can quickly determine whether private data are under exposure risk and avoid a more expensive data-flow analysis.
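The semantic shortcut can be illustrated with a toy forward taint pass over a straight-line method body. The Call record, the SENSITIVE_KEYS and WRITE_METHODS sets and the method names in the example are hypothetical. When a write such as basicInfo.put("lastlocation", loc) stores a value under a sensitive key, the container is tainted directly, so the statements that produced loc never need to be traced. This is only an illustration of the idea, not ClueFinder's Leakage Tracker, which is built on FlowDroid.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical knowledge base of sensitive keys and data-writing methods.
SENSITIVE_KEYS = {"lastlocation", "password", "email"}
WRITE_METHODS = {"put", "set", "add", "insert"}


@dataclass
class Call:
    """Toy three-address invocation: target = base.method(args)."""
    base: Optional[str]
    method: str
    args: List[str] = field(default_factory=list)  # '"const"' or variable
    target: Optional[str] = None


def propagate(body: List[Call], seeds: Set[str]) -> Set[str]:
    """Single forward pass over a straight-line method body."""
    tainted = set(seeds)
    for c in body:
        keys = {a.strip('"') for a in c.args if a.startswith('"')}
        args_tainted = any(a in tainted for a in c.args)
        if c.method in WRITE_METHODS and c.base is not None:
            # Semantic shortcut: a write under a sensitive key taints the
            # container even if the stored value was never traced.
            if keys & SENSITIVE_KEYS or args_tainted:
                tainted.add(c.base)
        if c.target is not None and (args_tainted or c.base in tainted):
            tainted.add(c.target)
    return tainted


body = [
    Call(None, "computeLocation", [], target="loc"),   # not analyzed further
    Call(None, "newHashMap", [], target="basicInfo"),
    Call("basicInfo", "put", ['"lastlocation"', "loc"]),
    Call("basicInfo", "toString", [], target="payload"),
]
print(sorted(propagate(body, set())))  # ['basicInfo', 'payload']
```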
IV. EVALUATION OF CLUEFINDER

In this section, we first describe our experimental settings for evaluating ClueFinder, and then report its effectiveness and performance. We also compare our approach with prior work, showing that ClueFinder outperforms prior approaches in discovering sensitive data.
A. Experiment Setting

We implemented ClueFinder in Java (1,604 LOCs) and Python (609 LOCs). Our implementation extends the FlowDroid framework to analyze decompiled packages in the Jimple format (an intermediate representation for analyzing DEX code). Since FlowDroid renames all local variables (e.g., "$r1", "$r2") when decompiling app code, the current implementation of ClueFinder only utilizes global variables such as static fields. ClueFinder uses the Java implementation of the Stanford Parser [29] for its NLP analysis. Its Structure Analyzer component extracts features from the Jimple statements and uses the Python SVM implementation from Scikit-Learn [11] to train the classifier. All our experiments were conducted on a 32-core server with a Linux 2.6.32 kernel and 64 GB of memory.
Training data. The classifier was trained on a labeled set of 4,326 statements (half positive and half negative) from 100 popular apps, manually labeled by two Android experts. Specifically, we first randomly selected 100 apps from the top-popular list of Google Play at the time (crawled in August 2016). We then automatically extracted all statements involving privacy tokens from these apps with Semantics Locater and Checker (see Section III-B), and asked each expert to decide whether each statement contains private data. To make the training set more precise, a statement was labeled as positive or negative only when both experts agreed. In total, we collected 7,354 labeled statements, including 5,191 positive and 2,163 negative samples. Since SVM classifiers usually perform better on a balanced training set [21], we used all negative statements together with the same number of randomly selected positive statements as our training set. As a result, the training set contains 4,326 statements in total.
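The balancing step is plain random undersampling of the majority class. Below is a minimal sketch, assuming the labeled statements are held in two Python lists. The tuples are placeholders standing in for real statements. With 2,163 negatives, the balanced set has 2 x 2,163 = 4,326 statements.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def balance(positives: Sequence[T], negatives: Sequence[T],
            seed: int = 0) -> List[T]:
    """Randomly undersample the larger class down to the size of the smaller."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    return rng.sample(list(positives), n) + rng.sample(list(negatives), n)


pos = [("pos", i) for i in range(5191)]   # placeholders for labeled statements
neg = [("neg", i) for i in range(2163)]
train = balance(pos, neg)
print(len(train))  # 4326
```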
B. Effectiveness

In our experiment, we first ran a ten-fold cross validation on our labeled set (4,326 labeled statements from 100 apps). ClueFinder achieved a precision of 92.7%, a recall of 97.2% and an F1-score of 94.8%. Since our training set was randomly picked, the effectiveness of the classifier is expected to carry over to the entire app code with high probability.

We also performed a second manual validation by running ClueFinder over another 100 randomly selected apps, crawled at the same time, as an unknown set. The validation showed that 320 out of 3,775 flagged statements are false positives, which gives a precision of 91.5%. We could not measure recall in this validation due to the lack of ground truth: it is rarely possible to manually go through all the code in the dataset to identify which statements are indeed sensitive sources. The total analysis time for the 100 apps in the unknown set was 97 minutes (less than 1 minute per app). This level of performance enables ClueFinder to process a large number of apps, as we did in our research (Section V).
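The reported figures can be cross-checked with the standard definitions precision = TP/(TP+FP) and F1 = 2PR/(P+R). Only the numbers come from the text. The helper functions are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r)


# Unknown set: 3,775 flagged statements, 320 of them false positives.
p_unknown = precision(3775 - 320, 320)
print(f"{p_unknown:.1%}")                 # 91.5%

# Cross-validation: F1 from the (rounded) precision and recall.
print(f"{f1(0.927, 0.972):.1%}")          # 94.9%

# Analysis time for the unknown set, in seconds per app.
print(97 * 60 / 100)                      # 58.2
```

Computed from the rounded precision and recall, F1 comes out at about 94.9%. This agrees with the reported 94.8% once the rounding of P and R is taken into account.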
False positives and false negatives. Most of the reported false positives were caused by rare cases not covered by our labeled set. For example, in Figure 5, the constant parameter of the method saveEvent includes the sensitive term "access token", which turns out to have nothing to do with the variable $r1, an object of type "Event". In some cases, even the program structure does not offer sufficient information to determine whether a statement involves sensitive information. For example, in line 2 of Figure 5, the method name saveAppKeyAndAppSecret contains sensitive tokens such as "Key" and "Secret"; however, such private data only appear within the method, and the invocation statement itself is actually non-sensitive.

Many false negatives were likewise caused by outliers. For example, the statement at line 4 of Figure 5 returns an integer value that encodes the gender information (1 for male and -1 for female), which violates the expectation that gender data have a string type. Another source of false negatives is the incomplete knowledge base: some sensitive terms, such as "lon", "father's name" and "mother's name", are not treated as keywords for sensitive content. As a result, ClueFinder discovers only a subset of the truly sensitive data items.
1 void saveEvent("init", "put access token to extras", $r1);
2 Umeng.UMTencentSsoHandler: void saveAppKeyAndAppSecret();
3 java.util.HashMap.put("username", $r1);
4 Integer gender = getUserGender(user);
Fig. 5. False positive and false negative samples

Code obfuscation. As mentioned earlier, ClueFinder is not designed to analyze deeply obfuscated code from which all semantic information has been removed. This, however, does not mean that our approach cannot tolerate any obfuscation or can be easily defeated by tools like ProGuard [9]. In fact, we found that a significant amount of semantics is preserved in moderately obfuscated code, e.g., code protected by ProGuard, which therefore can still be analyzed by ClueFinder. For example, Figure 6 shows a code snippet obfuscated by ProGuard. As we can see, strings (e.g., "ReportLocation" in line 1), parameter types (e.g., "String" and "Object" appearing together in line 2) and API calls (e.g., JSONObject.put() in line 4) all carry meaningful content, which ClueFinder's classifier can leverage as features to determine the presence of sensitive data sources¹. In the aforementioned unknown set of 100 randomly selected apps, we found that 11.3% (426/3,775) of the flagged statements were obfuscated. Nevertheless, our approach identified all sensitive data sources and exposure risks within these apps, since our classifier leverages a whole set of features that cannot be easily obfuscated, such as system-level parameter objects and return values like String, Json, etc.

¹ Note that even though BidText can also utilize constant strings, it does not work on other code features and is therefore less effective in analyzing such code, as we elaborate further in the comparison with ClueFinder (Section IV-C).
1 $r0.<com.*.sdk.ei: void a(String,Object)>("ReportLocation", $r3)
2 $r1 = staticinvoke($r0, "Gender", $r1)
3 $r4 = virtualinvoke $r3.("user_password")
4 $r7.("cust_gender", $r2)
Fig. 6. Samples of partially obfuscated statements (in Jimple format) identified by ClueFinder

Such semantic information is preserved because of a few practical constraints on code obfuscation. Specifically, system-level methods, as discovered by SUSI, cannot be easily obfuscated, so tools like ProGuard retain their meaningful names and parameters. Similarly, 98% of the constant strings in our unknown set are human-readable. We also found that app developers tend to avoid obfuscating data-related modules (e.g., those containing GSON objects) and third-party SDKs, since improper changes to these program elements can easily introduce errors into program execution or even cause a crash. For example, GSON uses reflection at runtime to dynamically map JSON objects to classes, constructing properties by matching strings found in the objects against keywords. This no longer works once a string such as "model.name" is replaced with "a.b" by ProGuard. Further, third-party frameworks (e.g., Inmobi [4]) and SDK interfaces are rarely obfuscated, so that developers can easily incorporate them into their app code.
C. Comparing with Prior Approaches

API-based labeling. As mentioned earlier (Section II), the prior work SUSI [35] can automatically discover hundreds of sources among Android system APIs. However, it often cannot determine whether a source is actually sensitive. For example, json.get("password") becomes related to sensitive content only because its parameter reveals that the method returns a password. ClueFinder is designed to identify such sensitive sources from their context. In our research, we randomly selected 15 popular APIs and extracted 10,116 statements involving them from 100 randomly chosen apps. Among all these statements, ClueFinder detected 2,266 sensitive data sources. In other words, about 77.6% (7,850) of the statements that SUSI would report turn out to be false positives (not sensitive). Given the effectiveness of ClueFinder discussed above (92.7% precision and 97.2% recall), this result indicates that our approach is much more effective than SUSI at finding truly sensitive data sources.
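The SUSI figure is simple arithmetic on the counts above. Following the text, it treats ClueFinder's verdicts as the reference, which is itself an approximation given ClueFinder's measured precision and recall.

```python
total = 10_116      # statements from the 15 APIs that SUSI treats as sources
sensitive = 2_266   # statements ClueFinder classified as sensitive
not_sensitive = total - sensitive
share = not_sensitive / total
print(not_sensitive, f"{share:.1%}")  # 7850 77.6%
```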
UI-based labeling. Prior approaches such as UIPicker [32] and SUPOR [25] can identify sensitive data from an app's UI elements (e.g., an input field). However, these elements are only a subset of the private data sources in an app. In contrast, ClueFinder can also find sensitive sources beyond UI elements, including imports of private data from servers. In our study, we manually identified 892 unique UI elements related to private inputs (e.g., the username and password fields in the UI) in the aforementioned unknown set of 100 randomly selected apps. These elements represent everything the prior approaches could find. We then ran ClueFinder over the 100 apps, and it reported 2,388 unique sensitive data sources. Further manual validation of these sources showed that, in most cases, ClueFinder identifies all the UI sources. Moreover, it identifies nearly twice as many non-UI sources in total, which approaches like UIPicker and SUPOR miss.
Semantics-based taint tracking. Like ClueFinder, BidText also searches constant strings inside programs for sensitive keywords. However, BidText focuses on its bi-directional taint analysis rather than on semantics-based sensitive source discovery. It does not handle variable names, method names, or prefixes and abbreviations of keywords, nor does it evaluate grammatical dependencies among semantic tokens other than the negative relation. In our research, we configured both approaches with essentially the same data-flow analysis settings (an intra-procedural, flow-sensitive analysis with HTTP network sinks), and compared our implementation of ClueFinder with the released version of BidText [2] in terms of precision, coverage and performance in discovering sensitive sources.

As Table III shows, among the 100 popular apps in the unknown set, ClueFinder reported 50 (44.6%) more sensitive sources than BidText (162 true positives vs. 112), resulting in much higher coverage. This is mainly due to ClueFinder's in-depth NLP analysis of code semantics, as well as its use of code structure to locate private data. For example, ClueFinder found 32 sensitive sources using semantic information from method names, which BidText cannot handle. ClueFinder also has a more detailed knowledge base (covering not only keywords but also meaningful prefixes and abbreviations), and its semantic locating mechanism (Section III-B) enables it to capture more privacy-related semantics, e.g., "addr" for "address". BidText uses only a fixed keyword set and matches these keywords in app code with human-defined regular expressions.
ClueFinder also has a lower false positive rate than BidText (8.5% vs. 14.5%), because our approach uses more grammatical relations and program structures to control false positives, whereas BidText only drops labeled strings involving negative descriptions. Other strings, such as examples 1, 2 and 3 in Table II, are therefore falsely reported by BidText. Because BidText only evaluates constant strings, and most of them do not contain complicated expressions, our approach lowers the false positive rate by only 6 percentage points compared with BidText. It is important to note that the major strength of ClueFinder is that it expands the types of sensitive sources that can be discovered. From this perspective, the advantage of our approach is significant: it detects 44.6% more true positives. Finally, ClueFinder outperforms BidText in performance (1.86 times faster), due to its lightweight semantics-based leakage analysis, which largely avoids expensive data-flow analysis. At the same time, we acknowledge that BidText's bi-directional data-flow technique can help it track some information leaks that ClueFinder misses more accurately, given that our approach focuses on sensitive source discovery.
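The rates quoted in this comparison follow from the raw counts in Table III. The small helper below (illustrative only) recomputes the true positives, the false positive rates and the relative gain in true positives.

```python
def summarize(detected: int, false_positives: int) -> dict:
    """True positives, precision and false-positive rate from raw counts."""
    tp = detected - false_positives
    return {"tp": tp,
            "precision": tp / detected,
            "fp_rate": false_positives / detected}


bidtext = summarize(131, 19)
cluefinder = summarize(177, 15)
extra = cluefinder["tp"] - bidtext["tp"]

print(bidtext["tp"], cluefinder["tp"])                                 # 112 162
print(extra, f"{extra / bidtext['tp']:.1%}")                           # 50 44.6%
print(f"{bidtext['fp_rate']:.1%} vs {cluefinder['fp_rate']:.1%}")      # 14.5% vs 8.5%
print(f"{bidtext['precision']:.1%} vs {cluefinder['precision']:.1%}")  # 85.5% vs 91.5%
```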
TABLE III. COMPARISON WITH BIDTEXT
Metric | BidText | ClueFinder
Detected sensitive data | 131 | 177
Num. of false positives | 19 | 15
Avg. analysis time (sec) | 97 | 55
Precision | 85.5% | 91.5%

V. LARGE-SCALE LEAKAGE STUDY

In this section, we report our measurement study over 445,668 real-world apps, which analyzes their privacy leakage to third-party libraries. Although ClueFinder is designed to detect many kinds of private data within a given app's code, here we focus on findings related to the sources missed by prior research, since more conventional sources, such as API-based imports of IMEI, IMSI and GPS locations, have already been studied [22], [37], [40]. Our research brings to light the pervasiveness of the exposure risk (disclosing sensitive user data to third-party libraries), as well as interesting cases never reported before.
A. Measurement Settings

Exposure risk. As mentioned earlier (Section I), in our measurement study we looked for the exposure risk, that is, leaks of sensitive user data to third-party libraries. We focus on this risk instead of a library's export of sensitive data to the Internet, because the latter is harder to detect, in terms of both performance and accuracy, with a static analysis (which is necessary for evaluating a large number of apps). Moreover, once an untrusted library obtains private data, it can often send the data out through covert channels without getting caught. Therefore, in our study, we conservatively consider that an information leak can happen whenever an untrusted library gets access to sensitive data.
App gathering. As Table IV shows, our datasets were crawled from 2 different Android markets: the official Google Play market and a third-party market (Tencent App Store). Each app in these datasets has a unique MD5 hash, which ensures that there is no overlap between datasets. Apps in the Play-15 dataset were selected according to the top app list provided by the Google Play website, while those in the other 3 datasets were randomly crawled from their markets. In this way, we can better understand how data leaks to third-party libraries happen in both popular and ordinary apps.
Implementation of Leakage Tracker. To detect privacy leakage to third-party libraries, Leakage Tracker in ClueFinder (Section III-C) went through all the invocation statements reported by its preceding module, Semantic Checker, and conducted an inter-procedural data-flow analysis over the identified data objects. It then picked out the statements that are either inside a third-party library or call the library's methods. For example, in Figure 2, if the HashMap object "basicInfo" returned by the method flows into an API of a third-party library, we immediately conclude that this statement exposes the user's location data to the library. To this end, we checked whether the package or class name of the identified statement differs from that of the app in its first two components, e.g., com.facebook for com.facebook.message. A difference indicates that the statement is either inside a third-party library's code or invokes the library's method. Although this treatment is somewhat coarse (e.g., it cannot distinguish the ad library com.facebook.ads from the analytics library com.facebook.analytic), it is still informative for determining whether private data have been accessed by a third-party library or by the app itself.
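The coarse two-component test, together with the library-filtering rules described in footnote 2, can be sketched as below. The SOCIAL_PREFIXES tuple, the looks_obfuscated heuristic (every component after the first is at most two characters long) and counting one "appearance" per app are all assumptions. The footnote itself only gives the example com.a.ab and the threshold of 10.

```python
from collections import Counter
from typing import Iterable, Set

# Assumed prefixes for the social-network SDKs excluded in footnote 2.
SOCIAL_PREFIXES = ("com.facebook", "com.twitter", "com.sina.weibo")


def prefix2(name: str) -> str:
    """First two components: 'com.facebook.message' -> 'com.facebook'."""
    return ".".join(name.split(".")[:2])


def is_third_party(stmt_class: str, app_package: str) -> bool:
    # Coarse test: cannot tell com.facebook.ads from com.facebook.analytic.
    return prefix2(stmt_class) != prefix2(app_package)


def looks_obfuscated(package: str) -> bool:
    # Assumed heuristic for "extremely short" names such as com.a.ab.
    return all(len(part) <= 2 for part in package.split(".")[1:])


def library_packages(per_app: Iterable[Set[str]], threshold: int = 10) -> Set[str]:
    """Packages seen in at least `threshold` apps, minus obvious noise."""
    counts = Counter(pkg for pkgs in per_app for pkg in pkgs)
    return {pkg for pkg, n in counts.items()
            if n >= threshold
            and not looks_obfuscated(pkg)
            and not pkg.startswith(SOCIAL_PREFIXES)}


print(is_third_party("com.facebook.message.Sender", "com.example.news"))  # True
apps = [{"com.inmobi.ads", "com.a.ab", "com.facebook.ads"}] * 12
print(library_packages(apps))  # {'com.inmobi.ads'}
```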
Further, we verified that such a statement is not dead code through a standard reachability analysis, that is, by building call graphs from the app's entry points to confirm that the target method invocation can indeed be reached. This treatment can miss some information leaks. However, it is accurate enough to detect most leaks to third-party libraries, because most such invocations are the interfaces between a library and its hosting app. It is also lightweight, which is important for a large-scale study.
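The dead-code check is an ordinary reachability query over the call graph. Below is a minimal breadth-first sketch using a hypothetical dictionary-based call graph and made-up method names. In the real system, the entry points would come from the app's Android components.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reachable(call_graph: Dict[str, List[str]],
              entry_points: Iterable[str]) -> Set[str]:
    """Breadth-first search from the entry points over caller -> callee edges."""
    seen = set(entry_points)
    queue = deque(seen)
    while queue:
        method = queue.popleft()
        for callee in call_graph.get(method, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical call graph; Dead.helper is never called from an entry point.
cg = {
    "MainActivity.onCreate": ["Net.upload"],
    "Net.upload": ["AdLib.send"],
    "Dead.helper": ["AdLib.track"],
}
live = reachable(cg, ["MainActivity.onCreate"])
print("AdLib.send" in live, "AdLib.track" in live)  # True False
```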
We used the experimental setting described in Section IV for the measurement study. During the experiments, each dataset was processed by 8 concurrently running processes, with a 20-minute timeout for each app. Overall, our 32-core server took 710 hours to go through all 445,668 apps, an average of 45.88 seconds per app. Among these apps, 32,533 (7.3%) could not be analyzed within the timeout window.
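The reported average agrees with the total wall-clock time if the 8 concurrent processes are counted as 8 parallel workers and the datasets are processed one after another. The text does not state this interpretation explicitly, so the sketch below treats it as an assumption.

```python
wall_hours = 710
apps = 445_668
workers = 8          # assumed: concurrent processes per dataset, datasets run in turn
timeouts = 32_533

per_app_seconds = wall_hours * 3600 * workers / apps
print(f"{per_app_seconds:.2f}")      # 45.88
print(f"{timeouts / apps:.1%}")      # 7.3%
```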
B. Measurement Results

Landscape. As Table IV shows, ClueFinder found that 118,296 (26.5%) of the 445,668 apps leak private user data to 3,502 third-party libraries². On average, each affected app exposes 8.07 data items (e.g., identifiers, full name, location, etc.) to 1.97 libraries. This indicates that such information exposure is indeed pervasive (over 26.5% of all the apps analyzed). For example, when a user logs into an app with her Facebook account, her Facebook profile could be sent to an ad library for marketing and to an analytics library that tracks her online activities. Across all 3,502 discovered libraries that access users' private data, each library collects 2.45 data items on average. These include not only different identifiers, such as the Facebook id, but also other information, such as the user's various attributes, for purposes like targeted advertising.
In particular, in the Play-15 dataset, which consists of the 13,500 most popular Google Play apps, 39.9% of the apps were found to leak user data. As Table V shows, these data are spread across several categories (user attributes, user identifiers, account information and location data), with each affected app exposing 7.6 data items to 2.83 third-party libraries on average. Compared with the randomly selected apps in Play-16, these top apps clearly expose more information. This indicates that popular apps extensively disclose all kinds of private user information to multiple libraries within a single app.

² To avoid counting outliers (e.g., an obfuscated package name) as third-party libraries, we first exclude extremely short package names (e.g., com.a.ab), which are obviously obfuscated. We then use a threshold of 10 to decide whether a package name reliably represents a third-party library; the threshold is the total number of appearances of the package name in our whole dataset. We also exclude common social network libraries (e.g., Facebook, Twitter, Weibo, etc.), since most private data in such libraries originate from the libraries themselves.

TABLE IV. OVERALL LEAKAGE STATISTICS
(Columns 4-7 describe the affected apps; columns 8-9 describe the affected libraries.)
Data Set | Collect Time | Total Apps | #Apps | %Apps | Avg. Items/App | Avg. Libs/App | #Libs | Avg. Items/Lib
Play-2015 | Nov. 15 - Dec. 15 | 13,500 | 5,385 | 39.9% | 7.6 | 2.83 | 709 | 2.45
Play-2016 | Jul. 16 - Aug. 16 | 71,686 | 16,310 | 22.8% | 5.26 | 1.32 | 1,011 | 2.36
Tencent-2015 | Feb. 15 - Apr. 15 | 169,051 | 44,392 | 26.3% | 7.55 | 1.64 | 2,315 | 2.43
Tencent-2016 | Jun. 16 - Jul. 16 | 191,431 | 52,209 | 27.3% | 9.53 | 2.1 | 3,097 | 2.33
Total | Nov. 15 - Aug. 16 | 445,668 | 118,296 | 26.5% | 8.07 | 1.97 | 3,502 | 2.39
Further, by manually inspecting the code of 100 randomly selected apps flagged by ClueFinder, we found that over half of the flagged method invocations (53.1%) are related to HTTP connections (e.g., an HTTP POST whose parameters contain privacy-related content). Our runtime verification, which intercepted the network traffic of these apps, confirmed that 59 of the 100 apps indeed leaked private data to the servers of different third-party libraries. The actual leakage scale is likely higher than what we observed. We did not see leaking traffic from the other 41 apps, mostly because they require further manual steps, e.g., logging in or even pre-registering an account. In addition, since some libraries encode or encrypt their traffic, the leakage cannot be directly confirmed even when an app was well explored.
TABLE V. LEAKAGE RESULTS BY PRIVACY CATEGORY IN PLAY-15 DATASET
Category | Apps (%) | Avg. Items | Libs | Avg. Libs/App
User Attributes | 4,928 (36.5%) | 4.19 | 401 | 2.38
Account | 2,444 (18.1%) | 2.47 | 210 | 1.81
User Identifiers | 5,157 (38.2%) | 3.43 | 659 | 1.69
Location Data | 4,307 (31.9%) | 2.77 | 379 | 1.84
Total | 5,385 (39.9%) | 7.60 | 709 | 2.83

Further, comparing Play-16 with Tencent-16 in Table IV, we observed that individual apps on the unofficial market (Tencent) tend to integrate more third-party libraries (2.1 vs. 1.32 libraries per affected app). Since the security vetting process of app markets like Tencent is usually not as strict as Google's, apps in such unofficial stores may be more aggressive in collecting private data. Another observation comes from comparing the same market crawled in different periods (Tencent-15 and Tencent-16, columns 7-9 in Table IV): although the number of third-party libraries grew considerably, apps in the market show an almost identical leakage scale (see column 9). This indicates that such privacy leaks to third-party libraries are a long-standing problem that has gone unnoticed, due to the lack of effective discovery tools like ClueFinder.
Library distribution and leakage patterns. ClueFinder discovered that 3,502 libraries access private user data. To understand what these libraries are and how they collect sensitive information, we took a close look at the 100 most popular libraries in our datasets.

TABLE VI. DISTRIBUTION OF TOP 100 THIRD-PARTY LIBRARIES FROM PLAY-15 DATASET
Library Category | % Libs | % Apps
Ads | 35% | 80.7%
Analytics | 27% | 68.9%
App Dev Framework | 26% | 36.9%
Utils | 21% | 16.4%
Social Network | 14% | 6.2%
Game Framework | 11% | 9.6%

Table VI summarizes our findings³. Column 2 shows the percentage of libraries in each category, and column 3 shows, among all apps involved in privacy leakage, the percentage of apps that contain a library of that category. As we can see, most of the libraries gathering user data turn out to be ad and analytics libraries (e.g., Inmobi, AppBrain, etc.). These libraries do not enrich their hosting apps' functionality, yet they constitute the major source of information leaks.

The ways these libraries interact with their hosting apps show that they either receive private information from the apps through API calls, or actively harvest information from the apps (such as transferred location data, the list of installed apps on the device and timestamps of specific events) without the app developer's awareness. ClueFinder distinguishes these two scenarios by looking at where the identified sensitive statements are located. If a statement is inside the hosting app's code, the app's developer clearly intends to pass the information to a library, often to enrich the app's functionality or to communicate with advertisers. Otherwise, when the statement is found in the library code, the library apparently collects user data without proper authorization; for example, the library starts a background service when the app invokes one of its public interfaces. We show the breakdown of these patterns in Figure 7, and present our findings about these cases in the case study (Section V-C).
Leaked content. Table VII presents prominent examples of the data items exposed to third parties, as discovered by ClueFinder. Unsurprisingly, several kinds of identifiers (e.g., facebook id) are disclosed, often together, since they are commonly combined to track a user even when she re-installs the app or changes her device. User profile data such as gender and nickname mostly come from social networks like Facebook. Due to the extensive use of mobile single sign-on, once the user authenticates to an app with her social network account, the app is also granted access to her profile on the social network. As a result, some of these data become available to other third-party libraries the app integrates. We present a case study of such a leak in Section V-C.

³ One library can have multiple functionalities in different categories, as the table shows.

Fig. 7. Distribution of leakage patterns in detected libraries
Exposure of user locations is another major source of leaks captured by ClueFinder. Prior findings report that location data are read directly through system APIs (e.g., getLastKnownLocation [42]). Interestingly, however, most location leaks reported by ClueFinder come from retrieving location-related data from other, less sensitive sources, such as an app's persistent storage (e.g., SharedPreferences, a local database, etc.). This new location acquisition strategy could be attributed to the enhanced privacy protection on today's Android devices. Increasingly, iOS-style runtime access control has been adopted, and even more fine-grained control has been proposed [10], such as asking for the user's consent for every location access. As a result, third-party libraries tend to avoid frequently invoking sensitive system APIs, even when the app does have the location permission. Instead, they reuse the location data collected when the app has a legitimate reason to do so, e.g., when the app has just been launched to the foreground. We also observed that some libraries even try to gather other location-related information that does not require a location-related permission. Examples include the BSSIDs of Wi-Fi hotspots, which can be used to infer locations, as reported by prior research [41].
It is also worth noting that 12.0% of the flagged third-party libraries gather information about the apps installed on a device. The list of installed apps can be used for different purposes. For example, Ironsec [5] claims on its website, "Using this platform, we're able to accurately predict what app a person will want to install next". As another example, the website of MoPub [8] states, "User targeting allows you to target users that have or don't have specific applications". Besides, we found that some libraries, such as co.inset.sdk [1] and ShareSDK [12], even persistently monitor app installations, collecting package names and other data such as locations and app-usage time. As reported by prior studies [37], [18], [22], linking installed apps to public auxiliary information can lead to privacy violations: for example, the presence of a gay dating app exposes the user's sexual orientation.
C. Case Study

Here we present some high-impact cases discovered in our research, verified at runtime by intercepting their network traffic. As shown in Table VIII, these cases involve high-profile apps as well as popular libraries.

TABLE VII. SELECTED PROMINENT LEAKAGE SAMPLES
Item | % in Detected Apps | % in Detected Libs
location | 86.6% | 90.5%
facebook id | 26.7% | 32.5%
gender | 22.5% | 32.6%
app list | 15.9% | 12.0%
nickname | 13.1% | 10.9%
oauth token | 10.4% | 12.3%
date-of-birth | 4.2% | 3.1%

TABLE VIII. APP & LIBRARY USED IN CASE STUDY
App Name | Num. of Installs | Library | Num. of Apps
The-Paper | 16 million | ShareSDK | 13,468
Tinder | 50-100 million | AppBoy | 419
SnapTee | 10-50 million | MixPanel | 7,284

Case 1: Deliberate harvesting. The-Paper is a popular app focusing on Chinese political news, with 16 million downloads [14]. Like many other news apps, it allows users to share the news they read on social networks (e.g., Weibo) or with friends on these networks. Inside the app, this function is actually provided by ShareSDK, which acts as a syndicator that integrates multiple social networks. The library is supposed to serve as a "proxy", accepting the user's sharing request and forwarding the content to the intended social network platform (e.g., Weibo). However, ClueFinder found that this library also accesses detailed user profile data, which clearly goes beyond the library's stated functionality. Our manual review of the app code further shows that ShareSDK deliberately collects much more private data than necessary. By using the permission granted for sharing, ShareSDK also gains the ability to read other data about the user on her social network. As a result, it collects all of the user's profile information, such as real name, gender, verification status and even education background. It also records such data and sends them to its own server. Table IX lists part of the sensitive information ShareSDK collects.
TABLE IX. PRIVATE DATA WHICH COULD BE COLLECTED BY SHARESDK
Category | Data items
App Info | top-task app list, app start timestamp, app end timestamp, newly installed apps, newly uninstalled app info, etc.
Social Network Info: Weibo | weibo id, nickname, true name, verified reason, gender, sns url, resume, friend list, shared posts, latitude, longitude, liked posts, etc.
Social Network Info: Facebook | facebook id, nickname, gender, birthday, sns-url, friend list (including accessible friend info), verify status, education (school name, type, year), work (company, employer, start & end date), etc.
Social Network Info: Others | tumblr, dropbox, pinterest, line, tencent qq, tencent qzone, wechat (friend list), twitter, netease microblog, evernote, google+, etc.
Our further investigation shows that ShareSDK is widely integrated into popular Chinese apps, each with millions or even billions of downloads. However, the library's privacy harvesting behaviors have never been reported, so app users are entirely unaware of them. Using such profile data, this SDK can track a user and infer her personal characteristics from different vectors (e.g., what she "liked" on Weibo, which posts she marked as favourites), as well as her social connections (e.g., friend list, followers, employer, etc.). Moreover, since the library records the user's operations on its hosting apps (what she shared to social networks), it can learn a lot about her, for example, her political stance.
Case 2: App data over-sharing. Tinder is a well-known dating app, with around 50-100 million downloads on the Google Play store [15]. The app integrates AppBoy [6] to collect statistics about its users. Each time a user takes certain actions within the app, Tinder synchronizes the action record to AppBoy, together with a lot of sensitive data about the user. For example, when a user refreshes the window that displays other nearby users, Tinder sends her precise location, bio information, dating targets and her Instagram name to AppBoy. All such disclosures happen without the user's knowledge. The captured traffic is shown below.
{"package_name": "com.tinder", "extras": {"device": {"push_token"
  "user": {"SeekingDistance": 50, "gender": "f",
    "AccountCreationDate": "2017-05-1*T16:56:32.163Z", "SeekingGender": 1,
    "HasWorkInfo": true, "HasEducationInfo": true, "Instagram": "Susan_***",
    "HasBio": true, "NumberofProfilePhotos": 15},
  "sessions": [{"guid": "...", "start_time": 1.479401816693E9,
    "events": [{"d": {"ll_accuracy": 19.80900, "longitude": -8*.4778, "latitude": 3*.1615},}]}

Case 3: Social network data over-sharing. SnapTee (co.snaptee.android) is a popular T-shirt design app that allows users to buy tees customized either by themselves or by other designers. Users can also share their designs on various social networks (e.g., Facebook, Twitter) through the app. We observed that when a user connects her SnapTee account with a social network, SnapTee updates her profile with her full name, email, account ID and other information collected from the social network. The app then passes all such profile data to the data analytics library MixPanel [7]. According to MixPanel's website, the library is designed to "understand who your users are, see what they do before or after they sign up". However, the user is kept in the dark when such data collection and sharing happen. The following is the information SnapTee shares with MixPanel.

{"$set": {"$username": "p***t", ..., "$email": "li**v@gmail.com", ...,
  "$first_name": "John", "$last_name": "Smith", "Twitter": "795**16"},
 "$token": "f81d***cdf96", "$time": "1479324910201", ...}}

VI. DISCUSSION

Android users have been suffering from privacy leakage for a long time.
Fortunately,withtheadventofClueFinder,theissuewillbemitigatedbecausedeveloperscantrackvarioustypesofsensitivedatainamoreefcientway.
Specically,thecombinationofsemantic-basedandcodestructurebasedanalysismakespreciselocalizationagainstprivatedatapossible,whereastraditionalmethodsusingxedAPIscannotlabel.
WiththehelpofmoresensitivesourcesfoundbyClueFinder,existingprivacyleakageanalysistoolscanbeimprovedbytakingadvantageofbetterprecisionandwidercoverageofusers'sensitivedata.
For example, ClueFinder can be combined with both static [16] and dynamic [23] taint analysis by marking the data objects within the identified statements as sensitive sources.
It can also be applied in various access control mechanisms [20], [19] for fine-grained control over sensitive data within the app.
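The integration with taint analysis described above can be sketched as follows. This is a minimal illustration, not the interface of ClueFinder, FlowDroid, or TaintDroid: it assumes a toy straight-line intermediate representation in which each statement defines at most one variable, and the names `Stmt`, `propagate_taint`, `User.getProfile`, and `HttpClient.post` are all hypothetical. Statements flagged by ClueFinder seed the taint set, and any sink call that reads tainted data is reported.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Stmt:
    """One statement in a hypothetical, straight-line toy IR."""
    sid: int                                            # statement identifier
    target: Optional[str]                               # variable defined, if any
    operands: List[str] = field(default_factory=list)   # variables read
    call: Optional[str] = None                          # callee name, if a call


def propagate_taint(stmts: List[Stmt], source_ids: Set[int],
                    sinks: Set[str]) -> List[int]:
    """Forward taint propagation over one straight-line block.

    Statements whose ids are in source_ids (e.g., flagged by ClueFinder)
    taint the variable they define; taint then flows to any variable
    defined from a tainted operand. Returns ids of sink calls that
    receive tainted data.
    """
    tainted: Set[str] = set()
    leaks: List[int] = []
    for s in stmts:
        reads_tainted = any(op in tainted for op in s.operands)
        if s.call in sinks and reads_tainted:
            leaks.append(s.sid)
        if s.target is not None:
            if s.sid in source_ids or reads_tainted:
                tainted.add(s.target)
            else:
                # Strong update: a clean redefinition clears earlier taint.
                tainted.discard(s.target)
    return leaks


program = [
    Stmt(1, "profile", [], call="User.getProfile"),   # flagged as a source
    Stmt(2, "email", ["profile"]),                    # derived from profile
    Stmt(3, "tmp", []),                               # unrelated constant
    Stmt(4, None, ["email"], call="HttpClient.post"),
    Stmt(5, None, ["tmp"], call="HttpClient.post"),
]
print(propagate_taint(program, source_ids={1}, sinks={"HttpClient.post"}))  # → [4]
```

A real analysis would work over control-flow and call graphs with aliasing; the point here is only that ClueFinder's output plugs in as the `source_ids` set without changing the propagation logic.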
Our measurement study of 445,668 apps gave us a better understanding of the privacy leakage issue in Android apps.
Although most legitimate apps provide information about how they manage users' private data, including what is shared with third-party libraries, their vague descriptions have proven weak and impractical for effectively protecting users' private data [43].
Our findings, including over-sharing and aggressive data collection by third parties, highlight the need for fine-grained access control over such private data.
For example, users could be alerted at runtime with detailed information about private data leaked to third-party libraries.
Admittedly, ClueFinder does have limitations.
For instance, since ClueFinder relies heavily on semantics in app code to discover possible private data, obfuscation may help adversaries evade our analysis, because semantic cues such as strings or method names are no longer useful in those cases.
However, as discussed in our evaluation (see Section IV), given that most apps do not obfuscate their entire codebase, ClueFinder remains a practical approach for discovering private data at a large scale.
Besides, the effectiveness of ClueFinder can be further improved in several respects.
(1) Find more semantic resources in the app to improve the coverage of ClueFinder. The current implementation of ClueFinder only considers semantics from method names, variable names, and string constants. Other information, such as package names (e.g., facebook.userInfo.facebookUserProfile), may also provide abundant semantics.
(2) Find more features in app code to improve the precision of the SVM classifier. For example, features at the callers/callees of a candidate statement may also help decide whether it carries sensitive data.
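One hypothetical way to realize improvement (2) is to concatenate a candidate's own structural features with features averaged over its callers and callees before training the classifier. The sketch assumes each program element is already summarized as a dictionary of binary features; the feature names (`is_getter`, `returns_string`, `in_json_parse`) and the `call_graph` layout are illustrative, not ClueFinder's actual feature set. The resulting fixed-length vectors could be fed to an off-the-shelf SVM such as the one in scikit-learn [11].

```python
from typing import Dict, List, Sequence

# Illustrative binary features (hypothetical names, fixed order).
FEATURES: Sequence[str] = ("is_getter", "returns_string", "in_json_parse")


def base_vector(feats: Dict[str, int]) -> List[float]:
    """Project an element's feature dict onto the fixed FEATURES order."""
    return [float(feats.get(f, 0)) for f in FEATURES]


def neighbor_vector(ids: List[str], elem_feats: Dict[str, Dict[str, int]]) -> List[float]:
    """Average the base vectors of neighboring elements (all zeros if none)."""
    if not ids:
        return [0.0] * len(FEATURES)
    vecs = [base_vector(elem_feats[i]) for i in ids]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def build_features(eid: str, elem_feats: Dict[str, Dict[str, int]],
                   call_graph: Dict[str, Dict[str, List[str]]]) -> List[float]:
    """Concatenate own, caller-averaged, and callee-averaged features."""
    edges = call_graph.get(eid, {"callers": [], "callees": []})
    return (base_vector(elem_feats[eid])
            + neighbor_vector(edges.get("callers", []), elem_feats)
            + neighbor_vector(edges.get("callees", []), elem_feats))


elem_feats = {
    "s1": {"is_getter": 1, "returns_string": 1},
    "s2": {"in_json_parse": 1},
    "s3": {"returns_string": 1},
}
call_graph = {"s1": {"callers": ["s2", "s3"], "callees": []}}
print(build_features("s1", elem_feats, call_graph))
# → [1.0, 1.0, 0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0]
```

Averaging keeps the vector length fixed regardless of how many callers or callees an element has, which is what a standard SVM expects.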
Moreover, our measurement results for privacy leakage (Section V-B) indeed tell whether specific private data has been accessed by third-party libraries, but the results need further refinement. First, the measurement did not confirm whether all such private data accessed by third-party libraries is indeed leaked at a large scale.
Instead, as mentioned in Section V-B, we manually validated a small set of apps and confirmed that over half of them were involved in privacy leakage, which serves as a lower bound on the actual leakage scale.
Although it is possible to perform further static taint analysis by designating network APIs as the final sinks, this may not be practical due to the fundamental limitations of static analysis (e.g., it is heavyweight and less precise).
Furthermore, our system ClueFinder was designed to find more sensitive data sources.
For this purpose, our approach achieves a precision of 91.5% (Section IV-B).
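For reference, precision follows the standard definition, where TP counts items reported by ClueFinder that truly carry sensitive data and FP counts reported items that do not; the underlying counts are not reproduced here.

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}
```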
The measurement of privacy leakage to third-party libraries is just a demonstration of how our technique can be used.
Second, our current approach cannot automatically determine whether a given access by a third-party library is reasonable, though our manual analysis shows that most such accesses to private data are suspicious.
Further analysis could utilize semantics from app UIs, app descriptions, and other possible sources to determine whether such access is benign or malicious.
An access would be regarded as malicious only if no matching UI or app description exists for the private data being accessed.
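The rule above can be sketched as a simple word match, assuming each accessed item is already labeled with a data type and that UI strings and the app description are available as plain text. The keyword table and the function name `lacks_ui_or_description_match` are hypothetical; a practical version would need proper NLP analysis rather than bare word matching. Note that the function returns a necessary condition, not proof of malice.

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical mapping from private-data types to words that would justify access.
JUSTIFYING_WORDS: Dict[str, Set[str]] = {
    "location": {"location", "nearby", "map", "gps"},
    "email": {"email", "e-mail", "newsletter"},
    "contacts": {"contacts", "friends", "invite"},
}


def _words(text: str) -> Set[str]:
    """Lower-case word tokens made of letters, digits, and hyphens."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def lacks_ui_or_description_match(data_type: str, ui_texts: Iterable[str],
                                  description: str) -> bool:
    """True if no UI text or description mentions the accessed data type.

    This is a necessary (not sufficient) condition for regarding the
    access as malicious.
    """
    keywords = JUSTIFYING_WORDS.get(data_type, {data_type})
    vocabulary: Set[str] = _words(description)
    for text in ui_texts:
        vocabulary |= _words(text)
    return not (keywords & vocabulary)


print(lacks_ui_or_description_match("location", ["Show people nearby"], "A dating app."))  # → False
print(lacks_ui_or_description_match("email", ["Show people nearby"], "A dating app."))     # → True
```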
VII. RELATED WORK
Privacy leakage detection.
Effective privacy leakage detection methods on the Android platform have been studied for a long time.
Both static [16], [24] and dynamic [23] taint analysis techniques have been developed and widely used to track private data.
However, all these approaches only treat fixed system APIs (e.g., those returning the IMEI or phone number) as sensitive data sources.
An exception is SUSI [35], which identifies more privacy sources in Android by using machine learning to analyze Android system libraries.
MudFlow [17] leverages the sources labeled by SUSI to mine apps for abnormal usage of sensitive data.
However, these sources are still tied to APIs and are mainly controlled by the system.
Further, UIPicker [32] and SUPOR [25] propose different approaches to identify sensitive user input from app UIs.
UIPicker uses an SVM classifier to judge whether a given UI element is privacy-critical by learning only semantic features (e.g., whether a set of privacy-related keywords appear simultaneously).
In contrast, ClueFinder feeds code structure as features to an SVM classifier to locate private data within app code.
The approaches mentioned above cannot fully cover the private data identified by ClueFinder.
BidText [26] introduces a bi-directional data propagation mechanism for detecting privacy leaks.
Unlike our work, BidText only detects whether specific private data is leaked to system logs or the network (e.g., HTTP requests), regardless of which component is responsible for the leak.
By comparison, our work focuses on measuring privacy leakage to third-party libraries, which is more helpful for understanding the real-world threats resulting from such leakage.
Similar to ClueFinder, Recon [36] detects the leakage of a wide range of users' private data, which Recon calls personally identifiable information (PII).
However, Recon differs from ClueFinder in both approach and purpose: Recon employs dynamic analysis of mobile apps to directly confirm leaks by monitoring network traffic, whereas ClueFinder focuses on discovering private data sources through static analysis of decompiled app code.
Also, Recon directly enables users to view PII leaks in network flows, while ClueFinder provides a basic tool that helps other existing approaches detect more privacy leaks statically.
NLP analysis of mobile apps.
Many works in mobile security utilize NLP techniques to conduct semantics-based analysis of mobile apps for different purposes.
Whyper [33] and AutoCog [34] inspect whether a permission request is reasonable by analyzing the app's description.
Similar to ClueFinder, they use dependency relation parsing to determine whether a given app's description mentions its permission usage.
BidText [26] introduces dependency relation parsing to decide whether a phrase or sentence is related to private data; however, it only excludes specific keywords with imperative negation (e.g., "you should not") when labeling sensitive data.
AsDroid [27] detects whether a sensitive operation (e.g., sending SMS) matches the corresponding text in the user interface in order to identify suspicious behaviors within apps.
UIPicker [32] also uses some basic NLP techniques (e.g., keyword stemming) as a pre-processing step when analyzing textual resources in app UIs to locate private information.
However, neither AsDroid nor UIPicker applies dependency parsing to sensitive keywords within a sentence, so both may introduce false positives when recognizing privacy-related entities.
All these approaches could further benefit from ClueFinder by employing a more comprehensive NLP analysis of app code or layout resources to improve their effectiveness.
VIII. CONCLUSION
In this paper, we presented our research on detecting privacy leakage in mobile apps at a large scale.
To address the main challenge that many new types of private data (e.g., sensitive data on the server side) cannot be effectively identified by traditional approaches, we proposed ClueFinder, a new technique for sensitive data source discovery.
ClueFinder leverages semantic information from app code, together with the unique program structures of its context, to accurately and efficiently find privacy-related data within a given app.
The evaluation results show that ClueFinder achieves very high precision and outperforms similar existing work by a large margin.
Also, using this technique, we investigated the potential information exposure to third-party libraries across 445,668 apps and obtained a series of findings.
These findings help better understand the privacy exposure risk and highlight the importance of data protection in today's software composition.
ACKNOWLEDGEMENTS
We would like to thank the anonymous reviewers and our shepherd Chris Kanich for their insightful comments that helped improve the quality of the paper.
We also thank Tongxin Li from Peking University, Nan Zhang from IU, and Li Tan for their assistance with our experiments.
This work is funded in part by the National Program on Key Basic Research (NO. 2015CB358800), the National Natural Science Foundation of China (61602121, U1636204, 61602123), and the Shanghai Sailing Program under Grant 16YF1400800.
The IU author is supported in part by NSF CNS-1527141, 1618493, ARO W911NF1610127, and a Samsung gift fund.
REFERENCES
[1] "Analytic sdk: co.inset.sdk," https://www.youtube.com/watch?v=sV0GwIl4oWs, accessed: 2017-08-10.
[2] "Bidtext - released version," https://bitbucket.org/hjjandy/toydroid.bidtext, accessed: 2017-08-10.
[3] "Google privacy policy," https://www.google.com/policies/privacy/, accessed: 2017-08-10.
[4] "Inmobi," http://inmobi.com, accessed: 2017-08-10.
[5] "Ironsec - user profiling function," http://www.ironsrc.com/atom/user-profiling/, accessed: 2017-08-10.
[6] "Meet appboy - mobile engagement marketing tech startup," https://www.appboy.com/about/, accessed: 2017-08-10.
[7] "Mixpanel," https://mixpanel.com, accessed: 2017-08-10.
[8] "mopub," http://www.mopub.com/resources/docs/mopub-ui-account-setup/creating-managing-orders-and-line-items/line-item-targeting/, accessed: 2017-08-10.
[9] "Proguard - the open source optimizer for java bytecode," https://www.guardsquare.com/en/proguard, accessed: 2017-08-10.
[10] "Requesting permissions," https://developer.android.com/training/permissions/requesting.html, accessed: 2017-08-10.
[11] "scikit-learn," http://scikit-learn.org/, accessed: 2017-08-10.
[12] "Sharesdk for android," http://www.mob.com/downloadDetail/ShareSDK/android, accessed: 2017-08-10.
[13] "Snaptee: T-shirt design," https://play.google.com/store/apps/details?id=co.snaptee.android, accessed: 2017-08-10.
[14] "The-paper-news," http://www.thepaper.cn/, accessed: 2017-08-10.
[15] "Tinder," https://play.google.com/store/apps/details?id=com.tinder, accessed: 2017-08-10.
[16] S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y. Le Traon, D. Octeau, and P. McDaniel, "Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps," in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014.
[17] V. Avdiienko, K. Kuznetsov, A. Gorla, A. Zeller, S. Arzt, S. Rasthofer, and E. Bodden, "Mining apps for abnormal usage of sensitive data," in Proceedings of the 37th International Conference on Software Engineering - Volume 1. IEEE Press, 2015, pp. 426–436.
[18] M. Backes, S. Bugiel, and E. Derr, "Reliable third-party library detection in android and its security applications," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 356–367.
[19] A. R. Beresford, A. Rice, N. Skehin, and R. Sohan, "Mockdroid: trading privacy for application functionality on smartphones," in Proceedings of the 12th Workshop on Mobile Computing Systems and Applications. ACM, 2011, pp. 49–54.
[20] S. Bugiel, S. Heuser, and A.-R. Sadeghi, "Flexible and fine-grained mandatory access control on android for diverse security and privacy policies," in USENIX Security Symposium, 2013, pp. 131–146.
[21] Y.-W. Chen and C.-J. Lin, "Combining svms with various feature selection strategies," Springer, 2006, pp. 315–324.
[22] S. Demetriou, W. Merrill, W. Yang, A. Zhang, and C. A. Gunter, "Free for all! assessing user data exposure to advertising libraries on android," in Proc. of NDSS'16, 2016.
[23] W. Enck, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth, "Taintdroid: an information flow tracking system for real-time privacy monitoring on smartphones," in Communications of the ACM, vol. 57, no. 3. ACM, 2014, pp. 99–106.
[24] M. I. Gordon, D. Kim, J. Perkins, L. Gilham, N. Nguyen, and M. Rinard, "Information-flow analysis of android applications in droidsafe," in Proc. of NDSS'15, 2015.
[25] J. Huang, Z. Li, X. Xiao, Z. Wu, K. Lu, X. Zhang, and G. Jiang, "Supor: precise and scalable sensitive user input detection for android apps," in 24th USENIX Security Symposium, 2015, pp. 977–992.
[26] J. Huang, X. Zhang, and L. Tan, "Detecting sensitive data disclosure via bi-directional text correlation analysis," in Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 2016, pp. 169–180.
[27] J. Huang, X. Zhang, L. Tan, P. Wang, and B. Liang, "Asdroid: detecting stealthy behaviors in android applications by user interface and program behavior contradiction," in Proc. of ICSE'14, 2014, pp. 1036–1046.
[28] Y. Z. X. Jiang and Z. Xuxian, "Detecting passive content leaks and pollution in android applications," in Proc. of NDSS'13, 2013.
[29] D. Klein and C. D. Manning, "Accurate unlexicalized parsing," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1. Association for Computational Linguistics, 2003, pp. 423–430.
[30] W. Meng, R. Ding, S. P. Chung, S. Han, and W. Lee, "The price of free: Privacy leakage in personalized mobile in-app ads," in Proc. of NDSS'16, 2016.
[31] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
[32] Y. Nan, M. Yang, Z. Yang, S. Zhou, G. Gu, and X. Wang, "Uipicker: User-input privacy identification in mobile applications," in 24th USENIX Security Symposium, 2015, pp. 993–1008.
[33] R. Pandita, X. Xiao, W. Yang, W. Enck, and T. Xie, "Whyper: Towards automating risk assessment of mobile applications," in USENIX Security Symposium, vol. 13, no. 20, 2013.
[34] Z. Qu, V. Rastogi, X. Zhang, Y. Chen, T. Zhu, and Z. Chen, "Autocog: Measuring the description-to-permission fidelity in android applications," in Proc. of ACM CCS'14, 2014, pp. 1354–1365.
[35] S. Rasthofer, S. Arzt, and E. Bodden, "A machine-learning approach for classifying and categorizing android sources and sinks," in Proc. of NDSS'14, 2014.
[36] J. Ren, A. Rao, M. Lindorfer, A. Legout, and D. Choffnes, "Recon: Revealing and controlling pii leaks in mobile network traffic," in Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 2016, pp. 361–374.
[37] J. Rubin, M. I. Gordon, N. Nguyen, and M. Rinard, "Covert communication in mobile applications (t)," in Automated Software Engineering (ASE), 30th IEEE/ACM International Conference. IEEE, 2015, pp. 647–657.
[38] S. Son, D. Kim, and V. Shmatikov, "What mobile ads know about mobile users," in Proc. of NDSS'16, 2016.
[39] E. Steel, C. Locke, E. Cadman, and B. Freese, "How much is your personal data worth," 2013.
[40] R. Stevens, C. Gibler, J. Crussell, J. Erickson, and H. Chen, "Investigating user privacy in android ad libraries," in Workshop on Mobile Security Technologies (MoST), 2012, p. 10.
[41] X. Zhou, S. Demetriou, D. He, M. Naveed, X. Pan, X. Wang, C. A. Gunter, and K. Nahrstedt, "Identity, location, disease and more: Inferring your secrets from android public resources," in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security. ACM, 2013, pp. 1017–1028.
[42] Y. Zhou, Z. Wang, W. Zhou, and X. Jiang, "Hey, you, get off of my market: detecting malicious apps in official and alternative android markets," in NDSS, vol. 25, no. 4, 2012, pp. 50–52.
[43] S. Zimmeck, Z. Wang, L. Zou, R. Iyengar, B. Liu, F. Schaub, S. Wilson, N. Sadeh, S. M. Bellovin, and J. Reidenberg, "Automated analysis of privacy requirements for mobile apps," in Proc. of NDSS'17, 2017.