Copyright IBM Corporation 2013

Data science and open source
Learn about open source tools for converting data into useful information

M. Tim Jones
August 09, 2013

Data science combines mathematics and computer science for the purpose of extracting value from data. This article introduces data science and surveys prominent open source tools in this rapidly growing field.

The goal of data science is the extraction of useful information from a data set. Companies have recognized the value of data as a business asset for a long time. But the huge data volumes that are now available necessitate new ways to make sense of data and manage it efficiently. A growing cadre of engineers and scientists are building systems to apply data science to massive data volumes. This article introduces you to the field of data science and to open source tools that are available for today's data scientist.

Data science and data scientists

Data science begins with the collection of data. Candidates for collection can be open data or data that comes from internal business processes (for example, website statistics). Next comes refinement: the inventive process that reduces the data to useful information that answers specific questions. Typically, the questions define the approach to the extraction of the information. Within the collection and refinement steps are other important aspects such as data cleansing (or preprocessing) and data visualization.

Open data

Open data is the concept of democratizing data by making it freely available to everyone to use as they want. The growing open data movement follows the ideas behind open source. A useful source of open data is Data.gov (see Related topics), a US government website that was created to increase public access to data generated by the executive branch of the federal government.

You can also view data science as a business process. Mike Loukides of O'Reilly makes a compelling case that data science is the conversion of data not only into information but also into products (see Related topics). From that perspective, the field is a modern-day gold rush: a competitive search for the valuable nuggets in mountains of information.

The prospectors in the data gold rush are called data scientists. As businesses recognize the value in their data, the need for talented multidisciplinary engineers and scientists is growing. Data scientists must have skills in computer science, math, and statistics. Ideally, they also have domain knowledge: an understanding of the source of the data (medical, financial, web, and other domains).
Figure 1 illustrates data science as the intersection of computer science, math and statistics, and domain knowledge:

Figure 1. Key disciplines of the data scientist

With this complete skill set, the data scientist can translate domain knowledge and math into an application (from the computer science domain) that mines data and refines it into information. The key is a multidisciplinary focus (which can also include domains such as machine learning and information retrieval).

Engineers and scientists with big data analytics experience are in high demand these days. McKinsey & Company predicts that by 2018 a shortage of people who can fit the data scientist role will occur (see Related topics). The ideas and approaches in data science are useful in many other disciplines too. Even if you don't aspire to become a data scientist, data science skills can be a great addition to your engineering toolbox.

Where data science is used

Like cloud computing, data science is rapidly gaining interest and adoption. Over the year before this article was written, interest in data science roughly doubled, according to Google Insights for Search (formerly Google Trends). Google Insights for Search is itself an example of data science in action. Figure 2 shows that the frequency of data science as a web search term increased dramatically between the summer of 2011 and the spring of 2012:

Figure 2. Google Insights for Search data on interest in data science

Data science is quickly becoming a staple within organizations that harvest data online (be it crawling-based collection or internal collection that is based on user behaviors such as clicks). Major websites such as Google, Amazon, Facebook, and LinkedIn all have their own data science teams to use their available data (see Related topics).

Google's development of the PageRank algorithm is an early example of data science. Google crawls the web and assigns a numerical weight to the hyperlinks on every page to measure the relative importance of those links. (Full details of PageRank are known only within Google.) The algorithm serves as the means of ranking web content as a function of search terms.
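
Because the production algorithm is not public, the following is only a minimal sketch of the textbook formulation of PageRank, computed by power iteration. The toy link graph, the damping factor of 0.85, and the function name `pagerank` are assumptions chosen for illustration; they do not describe Google's actual system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    links maps each page to the list of pages it links to (hypothetical data).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: most links point at page "c".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page "c" receives links from three other pages and ends up with the highest score, while page "d", which nothing links to, keeps only the baseline share.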

Large online retailers such as Amazon and Walmart use data science to try to increase sales. They generate recommendations for individual users that are based on the user's product searches and past purchases.

LinkedIn, a professional networking site, maintains a huge amount of data that is related to people and their careers, interests, and connections. This massive network of data resulted in various recommendation engines (for individuals, groups, and companies) and projects that use the data at a deeper level to produce new products at LinkedIn.

One novel example of data science at a web property is the company bitly. On the surface, bitly is a service that enables users to shorten any URL to a 19-character maximum URL (which is stored permanently in bitly's data center). References to the shortened URL are redirected from bitly to the original URL. bitly can then see which URLs people shorten and which URLs other users click. This tactic provides an enormous amount of data that bitly (and its chief scientist, Hilary Mason) can use to generate a wealth of statistics about browsing habits. Users who are registered with bitly can see when their shortened URLs were clicked, through which referrer (email client, Twitter, or another URL), and from which country. Businesses can also use bitly to track user behavior for a set of content.
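
How bitly implements its service is not described in this article. As a rough sketch of the general idea, a shortener can turn a counter into a short base-62 code, store the mapping to the original URL, and log the referrer and country of each click before redirecting. The `Shortener` class, the `sho.rt` domain, and the sample clicks are hypothetical.

```python
import string
from collections import Counter

ALPHABET = string.digits + string.ascii_letters  # 62 symbols


def encode_base62(number):
    """Turn a non-negative integer into a short base-62 string."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class Shortener:
    """Toy in-memory URL shortener that records click metadata."""

    def __init__(self, domain="http://sho.rt/"):  # hypothetical domain
        self.domain = domain
        self.urls = {}      # short code -> original URL
        self.clicks = []    # one (short code, referrer, country) tuple per click
        self.next_id = 1000

    def shorten(self, url):
        code = encode_base62(self.next_id)
        self.next_id += 1
        self.urls[code] = url
        return self.domain + code

    def resolve(self, short_url, referrer, country):
        """Log the click, then return the original URL for the redirect."""
        code = short_url[len(self.domain):]
        self.clicks.append((code, referrer, country))
        return self.urls[code]

    def clicks_by_referrer(self):
        return Counter(referrer for _, referrer, _ in self.clicks)


service = Shortener()
long_url = "http://www.example.com/a/very/long/path?with=query"
short = service.shorten(long_url)
service.resolve(short, referrer="twitter", country="US")
service.resolve(short, referrer="email", country="DE")
service.resolve(short, referrer="twitter", country="FR")
print(short, service.clicks_by_referrer().most_common())
```

The click log is the interesting part from a data science perspective: every redirect adds a record that can later be aggregated by referrer, country, or time.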

Open source tools for data science

Just as computer programming isn't constrained to a single language or development environment, data science isn't associated with a single tool or tool suite. A rich and broad array of tools in the open source domain advance data science. They include tools that process large data sets numerically, and visualization and prototyping tools that aid in the development of complex processing. Table 1 lists prominent open source tools for data scientists and defines their roles:

Table 1. Open source tools for data science

Tool                                       Description
Apache Hadoop                              Framework for processing big data
Apache Mahout                              Scalable machine-learning algorithms for Hadoop
Spark                                      Cluster-computing framework for data analytics
The R Project for Statistical Computing    Accessible data manipulation and graphing
Python, Ruby, Perl                         Prototyping and production scripting languages
SciPy                                      Python package for scientific computing
scikit-learn                               Python package for machine learning
Axiis                                      Interactive data visualization

The list in Table 1 isn't exhaustive but instead represents some of the core elements within the data scientist's toolbox. The open source domain is also filled with highly specialized and domain-specific libraries and tools (for example, utilities for interactive map visualization and for text analysis).

Hadoop, Mahout, and Spark

The Internet creates opportunities to collect masses of data about users' behavior and habits. Apache Hadoop is the premier framework for processing massive data sets. Hadoop is important for data science because it provides a scalable framework for distributed data processing. Not all data science problems require big data processing, but Hadoop is ideal when your problem involves Internet-scale data.
The Google MapReduce framework's implementation of the PageRank algorithm is an early example of data science on a big data framework. (Hadoop is an implementation of MapReduce.) Apache Pig can make Hadoop even more accessible, bringing a query language that automatically builds MapReduce applications (see Related topics).
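
To make the MapReduce model concrete, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to word counting. It does not use the Hadoop or Pig APIs; the function names and sample documents are assumptions for illustration, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


# Hypothetical input: each string stands in for one input split.
documents = ["big data needs big tools", "data science needs data"]
emitted = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
print(counts)
```

Because each map call sees only its own document and each reduce call sees only one key, the same program structure scales out by assigning documents and keys to different workers.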

Apache Mahout is an implementation of scalable machine-learning algorithms on the Hadoop platform (see Related topics). Mahout includes scalable implementations of clustering algorithms and batch-based collaborative filtering algorithms (for implementing recommendation systems).
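
The idea behind batch-based collaborative filtering can be sketched in a few lines of plain Python. The example below is not Mahout's API; it is a simplified item co-occurrence recommender, and the purchase histories, function names, and scoring rule are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same user's history."""
    cooccur = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        for a, b in combinations(sorted(set(items)), 2):
            cooccur[a][b] += 1
            cooccur[b][a] += 1
    return cooccur


def recommend(user, baskets, cooccur, top_n=2):
    """Score items the user lacks by co-occurrence with items the user has."""
    owned = set(baskets[user])
    scores = defaultdict(int)
    for item in owned:
        for other, count in cooccur[item].items():
            if other not in owned:
                scores[other] += count
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories.
purchases = {
    "ann": ["tent", "stove", "lantern"],
    "bob": ["tent", "stove"],
    "cat": ["tent", "lantern", "boots"],
    "dan": ["tent"],
}
matrix = build_cooccurrence(purchases)
print(recommend("dan", purchases, matrix))
```

The co-occurrence matrix is built once in a batch pass over all histories, which is the part that a framework such as Hadoop can distribute; the per-user scoring step is then a cheap lookup.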

Another noteworthy solution for large data sets is the Spark framework (see Related topics). Spark includes optimizations such as in-memory cluster computing with fault-tolerant abstractions.

The R project

A tool that's often found in the data miner's toolkit is a programming language and development environment called R. R focuses on statistical computing and graphics. R is relatively simple to learn and is widely used in the domain of data analysis. Being open source and free, R is a popular language with a large user base.

R is a multiparadigm language that supports object-oriented, functional, procedural, and imperative programming styles. The language is interpreted through a command-line interface and also includes extensive production-level graphical capabilities. Static graphics are available out of the box. With additional packages, both dynamic and interactive graphs are possible.
Figure 3 shows an example plot that was generated with R:

Figure 3. Sample 3D sinc plot that uses R

The R programming language was developed in C and Fortran. Many of the internal standard functions in R were written in R itself. R supports mixed-language programming, enabling access to R objects from languages such as C and Java. You can easily extend the capabilities of R by using packages, which can be developed in the R, C, Java, and Fortran programming languages.

Scripting languages

Multiparadigm scripting languages such as Python, Ruby, and Perl provide a professional platform for application development and deployment. And they are ideal for prototyping and testing new ideas. These languages also support various data storage and communication formats, such as XML and JavaScript Object Notation (JSON), and a large variety of open source libraries for scientific computing and machine learning. Python is the clear leader in this space, probably because it is the easiest to learn for users who come from backgrounds other than computer science. Knowledge of Python is often a requirement for data scientist jobs.
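
As a small illustration of that format support, the following sketch reads the same kind of record from a JSON string and an XML string by using only Python's standard `json` and `xml.etree.ElementTree` modules, then merges the results. The record layout, page names, and click counts are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical website statistics in two different formats.
JSON_TEXT = '[{"page": "/home", "clicks": 120}, {"page": "/docs", "clicks": 45}]'
XML_TEXT = """
<stats>
  <entry page="/home" clicks="30"/>
  <entry page="/blog" clicks="12"/>
</stats>
"""


def load_json(text):
    """Return (page, clicks) pairs from a JSON array of objects."""
    return [(row["page"], int(row["clicks"])) for row in json.loads(text)]


def load_xml(text):
    """Return (page, clicks) pairs from <entry> elements."""
    root = ET.fromstring(text.strip())
    return [(e.get("page"), int(e.get("clicks"))) for e in root.findall("entry")]


# Merge both sources into one total per page.
totals = {}
for page, clicks in load_json(JSON_TEXT) + load_xml(XML_TEXT):
    totals[page] = totals.get(page, 0) + clicks
print(totals)
```

Normalizing both sources into the same list of tuples keeps the aggregation step independent of the input format, which is a common pattern when prototyping a data collection pipeline.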

SciPy and scikit-learn

The SciPy package extends Python into the domain of scientific programming. It supports various functions, including parallel programming tools, integration, ordinary differential equation solvers, and even an extension (called Weave) for including C/C++ code within Python code.

Related to SciPy is scikit-learn, which is a package for Python-based machine learning. Scikit-learn includes many algorithms under the machine-learning umbrella for supervised learning (support vector machines, naive Bayes), unsupervised learning (clustering algorithms), and other algorithms for data-set manipulation.

Both of these packages extend the capabilities of Python for use as a data science platform.

Axiis interactive data visualization

Many open source solutions focus solely on visualization. One especially interesting example is the Axiis framework, which provides a concise markup language for rich and colorful visualizations. Figure 4 shows an example:

Figure 4. Wedge stack graph visualization using the Axiis framework

Figure 4 is a static version of an interactive example from Tom Gonzalez, Managing Director at BrightPoint Consulting. See Related topics for a link to the interactive version.

Going further

The role of data scientist builds on a solid platform of knowledge and experience. But tools are also an important aspect of the data science field. In emerging disciplines, the open source community is often at the vanguard in establishing software where none existed before. The field of data science is no exception. Data science is relatively new, so more new tools, data protocols, and data formats are almost certainly in the works. But in data science, as in many other disciplines, open source solutions already lead in breadth and depth.

Related topics

Google Insights for Search: This Google site enables anyone to view search trends for a topic across regions of the world, including comparative trends of two or more topics.
Open data: Read about open data on Wikipedia.
"What is data science" (Mike Loukides, O'Reilly Radar, June 2010): Read a great introduction to data science and the idea behind transforming data into products.
"Growing Your Own Data Scientists" (Dan Woods, Forbes, March 2012): The article series surveys definitions of data scientist from leading experts in the field.
Hadoop on developerWorks: Explore a wealth of articles and other resources on Apache Hadoop and its related technologies.
"Apache Mahout: Scalable machine learning for everyone" (Grant Ingersoll, developerWorks, November 2011): Mahout committer Ingersoll describes Mahout's features and walks through an example of how to deploy and scale some of Mahout's more popular algorithms.
"Data visualization tools for Linux" (M. Tim Jones, developerWorks, November 2006): This article presents several useful data visualization tools that bear some similarity to the R Project.
Big data: The next frontier for competition: Read about research from McKinsey & Co. on the role of big data and data scientists.
Data.gov: Browse the Data.gov data sets available through the online catalog and use multiple criteria to filter your search.
Science.gov: This portal provides access to more than 55 databases and 2,100 websites from 13 federal agencies for US government science information. As on Data.gov, you can restrict your searches by search criteria or by specific agencies.
"Process your data with Apache Pig" (M. Tim Jones, developerWorks, February 2012): Learn more about Pig and how to put it to work in your applications.
"Spark, an alternative for fast data analytics" (M. Tim Jones, developerWorks, November 2011): Get to know the Spark approach to cluster computing and its differences from Hadoop.
Apache Hadoop: Download Hadoop.
Apache Mahout: Download Mahout from an Apache mirror.
Spark: Get the latest Spark release.
R programming language: Get R, a multiparadigm language and development environment with broad use in statistics and visualization.
Python, Ruby, and Perl: Simplify the development and prototyping of algorithms for data refinement with these multiparadigm scripting languages.
SciPy and scikit-learn: Use Python's data science capabilities with the SciPy package for scientific computing and the scikit-learn package for machine learning.
Axiis: The Axiis data visualization framework is a useful solution for both beginners and experts. Check out the examples page to see what's possible with the framework, including the interactive version of Figure 4.

Copyright IBM Corporation 2013 (www.ibm.com/legal/copytrade.shtml)
Trademarks (www.ibm.com/developerworks/ibm/trademarks/)
