The New Algorithm of the Item-based on MapReduce

ZHAO Wei 1,a
1 College Software Technology School, Zhengzhou University, Zhengzhou 450002, China
a iezhaowei@163.com

Keywords: Recommendation system, parallel computing, clustering

Abstract.
The traditional item-based collaborative filtering algorithm and the K-means clustering algorithm are studied, and a parallel item-based collaborative filtering algorithm is proposed using the MapReduce programming model.
The algorithm is divided into two main steps: the first clusters users with the K-means algorithm, and the second runs the parallel item-based algorithm to generate recommendations for each user cluster.
Experimental results show that the algorithm improves running speed and execution efficiency, and that the improved algorithm is well suited to processing big data.

Introduction

Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time.
Big data is high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.
Volume means big data doesn't sample; it just observes and tracks what happens. Velocity means big data is often available in real time. Variety means big data draws from text, images, audio, and video, and it completes missing pieces through data fusion [1].
Objective results can therefore be obtained from big data only through computer-based statistics, comparison, and analysis.
In today's electronic commerce systems, every transaction, every input, and every search can be treated as data. The computer system screens, sorts, and analyzes this data, so that the analysis yields not only an objective conclusion but also support for business decision-making. The useful data collected can also inform planning, actively guide the growth of consumption, and make marketing and promotion more effective.
As the amount of data in electronic commerce systems grows, the need for in-depth analysis of large volumes of data is increasingly urgent.
A simple and highly scalable program for product recommendation analysis is therefore particularly important.
At present many e-commerce sites, such as Amazon and Dangdang, use collaborative filtering algorithms, which are mainly divided into item-based collaborative filtering and user-based collaborative filtering.
Item-based collaborative filtering measures the similarity between items according to users' preferences and does not need to consider the content features of the items. It is therefore mainly used in e-commerce recommendation and movie recommendation, and it has achieved a certain degree of success in e-commerce recommendation.
However, its recommendation performance on massive data is not high, its data is difficult to share and extend, and its hardware requirements are comparatively high. These inherent shortcomings have kept it from being promoted and supported in enterprise electronic commerce [2].
Using MapReduce to implement distributed parallel computing can greatly improve the efficiency and performance of the algorithm and promote its further development [3-4].
The item-based collaborative filtering algorithm generates a list of recommended items for a user according to item similarity and the user's access history, but it has some problems. One is data sparsity. Another is that when the numbers of users and items are massive, the volume of user behavior records grows greatly, the cost of computing the item similarity matrix becomes high, and the efficiency and performance of the algorithm drop sharply.
To address these problems, clustering has also been applied to item-based collaborative filtering: the massive user base is clustered so that the recommendation operation need not be carried out separately for each user.
First, shopping users with similar interests are grouped into one user class, and the users in a cluster receive the same recommended goods.
Second, the massive user dimension is reduced to a limited number of clusters, on the order of dozens. The clustering step itself then meets a time-complexity bottleneck, and a parallel clustering algorithm based on MapReduce is an effective way to resolve it [5].
MapReduce is a distributed programming model and framework on the Hadoop platform that allows programs to be implemented without familiarity with the underlying details of the distributed implementation [6].
Using MapReduce as the parallel programming model, this paper first clusters users in parallel and then, according to the clustering results, runs MapReduce-based parallel collaborative filtering within each user class, eventually giving users a reasonable personalized commodity recommendation list.
The running time of the new algorithm on a fixed amount of data is compared across different numbers of nodes.
The results show that the data processing performance of the proposed algorithm is greatly improved.

The principle of MapReduce programming model

MapReduce is a parallel computing programming model used on the Hadoop platform. The technique was proposed by Google as a typical distributed parallel programming model: the user develops the map and reduce functions within the MapReduce model to realize parallel processing.
Map is responsible for dispersing the data, and Reduce is responsible for aggregating it.
Users only need to implement the two interfaces, Map and Reduce, to complete computations on TB-level data.
Because the MapReduce model encapsulates the details of parallel and fault-tolerant processing, programs are very easy to implement.
A MapReduce parallel computation is divided into two parts. The first step initializes the original input data file and divides the data set into data blocks of a certain size to facilitate parallel computing. The second step starts the map and reduce functions to compute in parallel and produce the final result.
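
The two parts described above can be illustrated with a small single-process sketch in Python. It is not Hadoop code: the function names (`split_into_blocks`, `map_purchase`, `reduce_count`, `run_mapreduce`) and the purchase records are hypothetical, and the blocks are processed one after another where a real cluster would hand them to different nodes.

```python
from collections import defaultdict


def split_into_blocks(records, block_size):
    """Step 1: divide the input data set into blocks of a fixed size."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_purchase(record):
    """Map: disperse one record into (item, 1) pairs."""
    _user, items = record
    return [(item, 1) for item in items]


def reduce_count(item, counts):
    """Reduce: aggregate all values emitted for one key."""
    return item, sum(counts)


def run_mapreduce(records, mapper, reducer, block_size=2):
    """Step 2: run map over every block, group by key, then reduce."""
    grouped = defaultdict(list)
    # Each block could be mapped on a different node; here they run in turn.
    for block in split_into_blocks(records, block_size):
        for record in block:
            for key, value in mapper(record):
                grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical purchase log: (user, items bought).
purchases = [
    ("u1", ["book", "pen"]),
    ("u2", ["book"]),
    ("u3", ["pen", "lamp"]),
]
item_counts = run_mapreduce(purchases, map_purchase, reduce_count)
print(item_counts)
```

Only the mapper and the reducer are specific to the problem; the splitting and grouping are generic, which is the sense in which the user only needs to implement the two interfaces.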
Figure 1 Parallel flow chart of MapReduce

Key technology research and Implementation

1. The basic idea of the traditional collaborative filtering algorithm based on Item-based

The basic idea of the traditional item-based collaborative filtering algorithm is divided into three parts. The first part computes the similarity between items. Common similarity measures include cosine similarity, the Pearson correlation coefficient, and the Tanimoto coefficient.
This paper selects the Euclidean similarity algorithm, as follows. Assume there are a vector X and a vector Y, with $X = (x_1, x_2, x_3)$ and $Y = (y_1, y_2, y_3)$. The Euclidean similarity $S(x, y)$ between X and Y is calculated as follows [7]:

$$S(x, y) = \frac{1}{1 + d(x, y)} \tag{1}$$

where $d(x, y)$ is the distance between the vectors X and Y, calculated as follows:

$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2} \tag{2}$$

The second part computes the users' rating matrix over the items. The third part multiplies the item similarity matrix W by the users' item rating matrix to obtain the recommendation results.
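
As a worked illustration of formulas (1) and (2), the following Python sketch computes the distance and the similarity for two rating vectors. The function names and the sample vectors are assumptions made for the example; the sketch accepts vectors of any equal length, while the paper states the formula for three components.

```python
import math


def euclidean_distance(x, y):
    """Formula (2): square root of the summed squared component differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def euclidean_similarity(x, y):
    """Formula (1): S(x, y) = 1 / (1 + d(x, y)), which lies in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(x, y))


# Hypothetical rating vectors for two items.
X = (5.0, 3.0, 4.0)
Y = (2.0, 3.0, 0.0)

d = euclidean_distance(X, Y)    # sqrt(9 + 0 + 16) = 5.0
s = euclidean_similarity(X, Y)  # 1 / (1 + 5) = 1/6
print(d, s)
```

Identical vectors have distance 0 and therefore similarity 1, and the similarity falls toward 0 as the vectors move apart.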
In the traditional item-based collaborative filtering recommendation algorithm, computing the item similarity is the stage that determines the performance of the algorithm.
If the number of users is n and the number of commodity items is m, the time complexity of traversing all pairs of items is $O(m^2)$, and the total search space is n users, so the time complexity of computing the similarity is $O(nm^2)$.
When the item similarity matrix is calculated, the similarity of one pair of items is independent of the similarity of any other pair, so the similarity matrix can be computed in parallel.
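
The independence of the item pairs can be made concrete with a short Python sketch in which every pair is a separate task. This is only an illustration under stated assumptions: a thread pool from the standard library stands in for the cluster nodes, the helper names and the item vectors are hypothetical, and the similarity used is the Euclidean similarity chosen in this paper.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def similarity(x, y):
    """Euclidean similarity 1 / (1 + d) between two item vectors."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + d)


def pair_similarity(task):
    """One independent task: the similarity of a single pair of items."""
    (i, vi), (j, vj) = task
    return (i, j), similarity(vi, vj)


def item_similarity_matrix(item_vectors, workers=4):
    """Compute the upper triangle of the similarity matrix, one task per pair."""
    tasks = list(combinations(sorted(item_vectors.items()), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(pool.map(pair_similarity, tasks))
    return results, len(tasks)


# Hypothetical data: m = 4 items, each rated by n = 3 users.
item_vectors = {
    "A": [5.0, 3.0, 4.0],
    "B": [2.0, 3.0, 0.0],
    "C": [5.0, 3.0, 4.0],
    "D": [1.0, 1.0, 1.0],
}
sims, n_pairs = item_similarity_matrix(item_vectors)
print(n_pairs, sims[("A", "B")])
```

With m items there are m(m - 1)/2 such tasks, each costing O(n), which is where the overall O(nm^2) cost comes from. No task reads the result of another, so the work can be divided among nodes in any way.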

2. A new algorithm of Item-based based on MapReduce

The new algorithm is divided into two main steps. The first step is a MapReduce implementation of the K-means algorithm that clusters the users.
The second step implements the parallel item-based recommendation algorithm on MapReduce and produces recommendations for each user cluster.

2.1 The new algorithm K-Means based on MapReduce

The basic idea of the traditional K-means clustering algorithm is as follows. From M data objects, K objects are chosen arbitrarily as the initial cluster centers. Each remaining object is assigned to the most similar cluster according to its distance from the cluster centers. The center of each newly formed cluster is then recalculated. The process is repeated until the cluster centers no longer change.
In the K-means algorithm, calculating the distance between data objects and cluster centers is the most time-consuming operation.
While one data object is compared with the K cluster centers, other data objects can be compared with the K cluster centers at the same time, so the operation can be parallelized [8].

The MapReduce-based parallel implementation of the K-means algorithm improves the speed of clustering and is divided into three steps.
The first step is the Map function, which calculates for every point the distance to each cluster center and assigns the point to the nearest center.
The second step is the Combine function, which sums, on the machine that has just completed the Map task, the points assigned to the same cluster.
Merging the points of the same cluster locally is the key to this step, because it reduces both the amount of data transferred to the Reduce function and the amount of computation Reduce performs.
The third step is the Reduce function, which aggregates the intermediate data of each cluster to obtain the new cluster center.
Each iteration repeats these three steps.
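
The three steps can be sketched in plain Python as follows. This is a single-process illustration, not the paper's Hadoop implementation: the function names, the sample points, and the choice to leave the center of an empty cluster unchanged are assumptions. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math


def nearest_center(point, centers):
    """Index of the cluster center closest to the point."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))


def map_step(block, centers):
    """Map: emit (nearest cluster index, point) for every point in a block."""
    return [(nearest_center(p, centers), p) for p in block]


def combine_step(mapped):
    """Combine: per block, sum the points of each cluster and count them."""
    partial = {}
    for k, p in mapped:
        sums, count = partial.get(k, ([0.0] * len(p), 0))
        partial[k] = ([s + v for s, v in zip(sums, p)], count + 1)
    return partial


def reduce_step(partials, centers):
    """Reduce: merge the partial sums and compute the new cluster centers."""
    totals = {}
    for partial in partials:
        for k, (sums, count) in partial.items():
            acc, n = totals.get(k, ([0.0] * len(sums), 0))
            totals[k] = ([a + s for a, s in zip(acc, sums)], n + count)
    new_centers = list(centers)  # assumption: an empty cluster keeps its center
    for k, (sums, n) in totals.items():
        new_centers[k] = tuple(s / n for s in sums)
    return new_centers


def kmeans_mapreduce(blocks, centers, max_iter=20):
    """Repeat map, combine, and reduce until the centers stop changing."""
    for _ in range(max_iter):
        partials = [combine_step(map_step(b, centers)) for b in blocks]
        new_centers = reduce_step(partials, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Hypothetical user feature vectors, already split into two data blocks.
blocks = [[(1.0, 1.0), (1.0, 2.0)], [(8.0, 8.0), (9.0, 8.0)]]
final_centers = kmeans_mapreduce(blocks, [(1.0, 1.0), (8.0, 8.0)])
print(final_centers)
```

The combiner sends one (sum, count) pair per cluster and block to the reducer in place of every individual point, which is the reduction in communication that the second step aims at.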
Figure 2 Parallel Flow Chart of K-means Algorithm based on MapReduce

2.2 The collaborative filtering algorithm based on MapReduce for parallel implementation of Item-based

Based on the similarity calculation formula (1) mentioned above, this paper presents a collaborative filtering recommendation algorithm based on MapReduce.

Algorithm 1 The collaborative filtering recommendation algorithm based on MapReduce
INPUT: User information file, Item information file, Intended user
OUTPUT: Intended user recommended list
The process is as follows:
Step 1: Transform the user vectors into item vectors;
Step 2: Compute the similarity between items in parallel, according to formulas (1) and (2);
Step 3: Compute the item similarity matrix in parallel;
Step 4: Compute the user rating matrix in parallel; when the user's rating matrix is computed, if the user has not given an item an explicit score, the default score is 1;
Step 5: Multiply the item similarity matrix by the user's rating matrix to obtain the recommendations.
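
The data flow of Algorithm 1 can be followed in a small sequential Python sketch. It is an illustration under assumptions, not the parallel implementation: the function names and the ratings are hypothetical, a missing rating is stored as 0 in the item vectors, the default-score rule of Step 4 is left out, and items the intended user has already rated are excluded from the list.

```python
import math
from collections import defaultdict


def to_item_vectors(user_ratings, users):
    """Step 1: turn user -> {item: score} into item -> [score per user]."""
    items = sorted({i for ratings in user_ratings.values() for i in ratings})
    # Assumption: an item the user has not rated contributes 0.0.
    return {i: [user_ratings[u].get(i, 0.0) for u in users] for i in items}


def similarity(x, y):
    """Step 2: Euclidean similarity, formulas (1) and (2)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + d)


def similarity_matrix(item_vectors):
    """Step 3: item similarity matrix W as a nested dictionary."""
    return {
        i: {j: similarity(vi, vj) for j, vj in item_vectors.items() if j != i}
        for i, vi in item_vectors.items()
    }


def recommend(user_ratings, target, top_n=2):
    """Steps 4 and 5: multiply W by the target user's ratings and rank."""
    users = sorted(user_ratings)
    W = similarity_matrix(to_item_vectors(user_ratings, users))
    rated = user_ratings[target]  # the target user's row of the rating matrix
    scores = defaultdict(float)
    for j, score in rated.items():
        for i, w in W[j].items():
            if i not in rated:  # recommend only items not yet rated
                scores[i] += w * score
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical ratings for the users of one cluster.
user_ratings = {
    "u1": {"A": 5.0, "B": 4.0},
    "u2": {"A": 4.0, "B": 5.0, "C": 4.0},
    "u3": {"C": 1.0, "D": 5.0},
}
print(recommend(user_ratings, "u1"))
```

In this data, item C is rated highly by u2, whose ratings of A and B resemble those of u1, so C ranks ahead of D for u1.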

Experimental result analysis

1. Experimental environment

The simulation experiment uses the VMware_Workstation_10.0.3 virtualization software to virtualize a Hadoop cloud platform.
Eight virtual machines are installed on the virtual Hadoop cloud platform, and a Hadoop cluster environment is built on these eight virtual machines.
One virtual machine serves as the JobTracker and NameNode node, and the other seven virtual machines are deployed as TaskTracker and DataNode nodes.
These machines are in the same local area network.
The hardware and software configuration of the eight virtual machines used in the experiment is shown in Table 1.

Table 1 Hadoop Cluster Configuration
OS             CentOS 6.4
JDK Version    1.6.0
Hadoop         1.1.2
Hardware       2G RAM, 100G Hard Disk

2. Experiment and analysis

The scalability of the MapReduce-based parallel implementation of the item-based collaborative filtering algorithm was tested in parallel mode: with the selected data set sizes, the running efficiency was measured on 1 to 8 nodes.
The experimental results are shown below:

Figure 3 Performance Test Chart

Figure 3 is the performance test chart of the MapReduce-based parallel implementation of the item-based collaborative filtering algorithm. The x-axis is the number of clients, and the y-axis is the response time of the system.
The experimental results show that the performance of the MapReduce-based parallel implementation of the item-based collaborative filtering algorithm is significantly improved compared with the traditional recommendation algorithm.

Conclusion

In this paper, a new collaborative filtering algorithm based on MapReduce is proposed.
The experimental results show that the new algorithm is highly efficient and can achieve high performance at a low cost.
However, the user clustering in this paper is carried out on users with a small number of attributes; user groups with high-dimensional attributes require further research.
In addition to the new algorithm put forward in this paper, we will continue to improve the experimental method and the accuracy of the recommendation algorithm.

References

[1] Chen Ruming. Challenges, values and coping strategies in the era of big data [J]. Mobile Communications, 2012(17): 14-15.
[2] Sun Lingfang, Zhang Jing. Electronic recommendation mechanism based on RFM model and collaborative filtering [J]. Journal of Jiangsu University of Science and Technology (Natural Science Edition), 2010, 24(3): 285-289.
[3] LI Gai, PAN Rong, et al. Collaborative filtering algorithm parallelize research based on large datasets [J]. Computer Engineering and Design, 2012, 33(6): 2437-2441.
[4] LI Wenhai, XU Shuren. Design and implementation of recommendation system for E-commerce on Hadoop [J]. Computer Engineering and Design, 2014(35): 131-136.
[5] SUN Tianhao, LI Anneng, et al. Research on Distributed Collaborative Filtering Recommendation Algorithm Based on Hadoop [J]. Computer Engineering and Applications, 2014, 51(15): 124-128.
[6] Xie Xuelian, Li Lanyou. Research on Parallel K-means Algorithm Based on Cloud Computing Platform [J]. Computer Measurement & Control, 2014, 22(5): 1510-1512.
[7] Yan Cun, Ji Genlin. Design and Implementation of Item-Based Parallel Collaborative Filtering Algorithm [J]. Journal of Nanjing Normal University (Natural Science Edition), 2014, 37(1): 71-75.
[8] WANG Fei, Qin Xiaolin. Algorithm for k-means Based on Data Stream in Cloud Computing [J]. Computer Science, 2015, 42(11): 235-239.