Wikidata: A platform for data integration and dissemination for the life sciences and beyond

Elvira Mitraka (1), Andra Waagmeester (2), Sebastian Burgstaller-Muehlbacher (3), Lynn M. Schriml (1), Andrew I. Su (3), Benjamin M. Good (3)

(1) University of Maryland School of Medicine, Baltimore, USA
{emitraka,lschriml}@som.umaryland.edu
(2) Micelio, Antwerp, Belgium
andra@micelio.be
(3) Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, USA
{sburgs,asu,bgood}@scripps.edu

Abstract.
Wikidata is an open, Semantic Web-compatible database that anyone can edit. This 'data commons' provides structured data for Wikipedia articles and other applications. Every article on Wikipedia has a hyperlink to an editable item in this database. This unique connection to the world's largest community of volunteer knowledge editors could help make Wikidata a key hub within the greater Semantic Web. The life sciences, as ever, face crucial challenges in disseminating and integrating knowledge. Our group is addressing these issues by populating Wikidata with the seeds of a foundational semantic network linking genes, drugs and diseases. Using this content, we are enhancing Wikipedia articles to both increase their quality and recruit human editors to expand and improve the underlying data. We encourage the community to join us as we collaboratively create what can become the most used and most central semantic data resource for the life sciences and beyond.

Keywords: Wikidata, Wikipedia, Linked Data, Semantic Web, Crowdsourcing, Knowledge Management

1 Stone Data Soup

In the Stone Soup folktale [1], a group of hungry travelers arrive in a village with its inhabitants unwilling to share their food. With a kettle of water and a stone the travelers manage to touch the curiosity of the villagers. The curiosity finally spawns a collaborative effort to make a great soup. This story is nowadays used to express the power of crowdsourcing and collaborative projects [2], such as Wikipedia, where many individuals each make small contributions but collectively produce something larger than the sum of its parts. Wikidata extends this collaborative model to the Web of data [3]. In this article we will describe Wikidata and the ways that this open public platform can take a central role in data sharing and management for the life science community.

2 Wikidata and Wikipedia

Wikipedia is among the most visited sites on the Internet. Articles about medical topics were viewed more than 4.88 billion times in 2013, a number on par with http://nih.gov and significantly greater than WebMD [4]. This incredibly important resource, created through volunteer labor, is now tightly coupled to Wikidata, an open, Semantic Web-compatible database that anyone can edit [3]. Wikipedia infoboxes (the tables of data often appearing on the right side of articles) can now render content stored in Wikidata, and each Wikipedia article now has a direct link to the corresponding Wikidata item, thus encouraging the collaborative editing of the data (Fig. 1).

Fig. 1. Wikidata provides a centralized resource for structured data. Applications including, but not limited to, Wikipedia can now read and write to Wikidata.

Infoboxes provide the bridge between machine-readable structured data and the unstructured text that forms the main body of each article. Since 2008, the Gene Wiki project has automatically created and maintained the infoboxes for around 10000 articles about human genes [5]. Now, this initiative is focused on generating a foundation of biomedical knowledge in Wikidata that will be used to improve infobox content on Wikipedia and help drive new applications. To date, we have loaded Wikidata with items about: 56451 human and 73086 mouse genes from NCBI Gene [6], 6562 concepts in the Disease Ontology [7], and 1830 FDA-approved drugs. This initial data load generated Wikidata items for these key biomedical concepts, mapped them to Wikipedia articles and linked them to the corresponding identifiers in authoritative public databases. The identifier-level connections to the source databases ensure that Wikidata content can be easily integrated into the existing Web of biomedical data. Moreover, the provenance of all Wikidata claims can be assessed through inspection of the supporting references. The data is kept up to date by periodically running 'bots' that propagate changes from authoritative sources to Wikidata. When conflicts arise from human edits to Wikidata items, these are flagged for manual review. The next phase of the project will stitch these concepts into a richly interconnected semantic network.
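
The update-and-flag behaviour described above can be illustrated with a minimal sketch. It is not the project's actual bot code: the `SyncState` record, the `Action` names, and the rule of comparing against the value the bot wrote on its previous run are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    NO_CHANGE = "no_change"
    UPDATE = "update"
    FLAG_FOR_REVIEW = "flag_for_review"


@dataclass(frozen=True)
class SyncState:
    """Hypothetical snapshot of one property value at sync time."""
    source_value: str               # value in the authoritative source database
    wikidata_value: Optional[str]   # value currently on the Wikidata item, if any
    last_bot_value: Optional[str]   # value the bot wrote on its previous run, if any


def decide(state: SyncState) -> Action:
    """Decide what a periodic bot run should do for a single value."""
    # Wikidata already agrees with the source: nothing to do.
    if state.wikidata_value == state.source_value:
        return Action.NO_CHANGE
    # Wikidata still holds exactly what the bot last wrote (or nothing was
    # ever written), so the difference comes from the source: propagate it.
    if state.wikidata_value == state.last_bot_value:
        return Action.UPDATE
    # Otherwise a human changed the item since the last run: do not
    # overwrite, hand the case to a curator instead.
    return Action.FLAG_FOR_REVIEW
```

Under these assumptions, a source change is propagated automatically, while any value that a human has touched since the last run is left alone and queued for manual review.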

3 Taking a sip of the data soup – Wikidata and the Semantic Web

The first application to use Wikidata extensively is Wikipedia but this could be the tip of the iceberg. To give a preview of what Wikidata could become, it's useful to briefly examine its closest ancestor, DBpedia. The DBpedia project mines content from Wikipedia by parsing infoboxes, maps this content to their own ontology, and provides access to this data in the form of a large RDF database available both for bulk download and SPARQL query. While enabling interesting queries on its own, its most important function is as a global linking hub for the Semantic Web [8].

In comparison to DBpedia, Wikidata has a number of advantages. First, it can be edited directly and changes are reflected in real time. Second, it does not require any parsing because all data is managed in a database from the outset. Third, it contains large amounts of content that is not present in Wikipedia, such as items for every mouse gene. Finally, its query API supports not only queries along its asserted knowledge graph, but also along references, qualifiers and even edit histories. These additional capabilities, viewed in light of the success of the DBpedia project, portend a vital future for Wikidata in the context of the Semantic Web.

Within the biomedical domain, useful queries are already possible as a result of the 'single-pot' nature of Wikidata. For example, it is possible to use Wikidata's SPARQL endpoint (https://query.wikidata.org/) to answer questions such as "what clinically relevant drug-drug interactions are known for the drug methadone (CHEMBL651)" [9]. Importantly, the data used to answer this query came from two groups working completely independently. Our 'drug_bot' bot added the CHEMBL identifiers (as well as many other identifiers) while another bot developed by a team at the Medical University of Vienna added the drug-drug interactions [10]. This happened without any direct coordination between our groups. This kind of serendipitous, automatic, cross-continental data integration is the primary goal of the Semantic Web, but is not yet commonplace.
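
A query of the kind described above might be assembled as in the following sketch. This is not the query published in [9]. The property identifiers `P592` (ChEMBL ID) and `P769` (significant drug interaction), the `/sparql` path on the query service, and the helper names are assumptions that should be checked against the live Wikidata property pages and service documentation before use.

```python
import re
from urllib.parse import urlencode

# Assumed machine-readable path of the query service named in the text.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_interaction_query(chembl_id: str) -> str:
    """Return a SPARQL query listing interaction partners of one drug."""
    # Reject anything that is not a plain ChEMBL identifier, so that the
    # value can be embedded in the query text safely.
    if not re.fullmatch(r"CHEMBL\d+", chembl_id):
        raise ValueError(f"not a ChEMBL identifier: {chembl_id!r}")
    return f"""
SELECT ?drug ?drugLabel ?partner ?partnerLabel WHERE {{
  ?drug wdt:P592 "{chembl_id}" .   # assumed property: ChEMBL ID
  ?drug wdt:P769 ?partner .        # assumed property: significant drug interaction
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


def build_request_url(chembl_id: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {"query": build_interaction_query(chembl_id), "format": "json"}
    return f"{SPARQL_ENDPOINT}?{urlencode(params)}"
```

Calling `build_request_url("CHEMBL651")` yields a URL that can be fetched with any HTTP client. The two triple patterns correspond to the two independently contributed data sets: the identifier added by one bot and the interaction statements added by the other.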

The key beauty and main challenge of the Semantic Web is its distributed nature. In order for this kind of integration to happen in the absence of a centralized resource like Wikidata, several major hurdles would need to be leaped. First, both teams would need to know enough about the fairly complex stack of semantic technologies to provide their data as RDF through a stable, public SPARQL endpoint. Second, they would have to work with overlapping identifier systems. Third, the would-be consumer of their data would need to discover both of their endpoints and be sophisticated enough with SPARQL to identify and issue the appropriate distributed query. All of this is possible and can work, but it is not easy.

By integrating data in a centralized, single community pot, Wikidata provides a platform that addresses each of these problems. Data providers do not have to set up and maintain their own SPARQL endpoint, a challenge that very few teams have succeeded at doing for any length of time [11]. By virtue of working in the same database, it is far less likely, though not impossible, for independent teams to generate and publish different identifiers, as the first step in working with Wikidata is to query it to see what is already there. Finally, the challenge of finding a relevant endpoint is negated when there is only one. Note that Wikidata can be queried using SPARQL or the Wikidata Query Language [12].

4 Many Cooks...

The fact that Wikidata is one centralized, community resource immediately surfaces the challenges incurred in any collaborative ontology development process. In Wikidata, the 'ontology' corresponds to its collection of linking properties used to describe items. A new property in Wikidata has to be proposed for community discussion and is only created after a consensus regarding the value of the property and its relation to existing properties has been established. For those used to controlling their own data and data models, this process can feel tedious. But this same fundamental process must be undertaken in any attempt at data integration. The fact that it happens up front, when data is first being loaded, should help to keep the data consistent and reduce the downstream identifier and ontological mapping problems that continue to plague bioinformatics.

Imagine the power of combining the structured data in Wikidata, the high accessibility and dedicated community of Wikipedia and the knowledge of the scientific community. Contemplate further that all of this data is freely available and accessible through a stable query interface and robust, read/write API. This makes important, high-quality information easily accessible by anyone and opens up scientific knowledge for public scrutiny. Further, the built-in provenance tracking can provide detailed chains of evidence to support or refute each claim and all of this can be discussed using the many social tools, such as 'talk pages' for every data item, baked into the MediaWiki infrastructure.

Aside from creating useful ways to disseminate data, this sociotechnical structure provides a framework for the broad community to broadcast feedback back to the original data owners. Even at this early stage of this project, this process has already led to improvements in source data. For example, in the Disease Ontology the term 'Ollier disease' had the synonym 'Maffucci syndrome'. Upon importing the Disease Ontology into Wikidata, members of the Wikidata community pointed out that the two terms, though putative synonyms, linked to two different extant Wikidata items. Upon closer review it was determined that these two terms represent two different, albeit closely related, diseases, leading to the creation of a new term in the Disease Ontology. As Wikidata expands it is to be expected that additional differences in representation between it and other knowledge resources will surface. These will first be triaged by the Wikidata community to check for errors and, if consensus is achieved that there is an error in the original source, this will be relayed for consideration. In this way, the Wikidata community can become the 'many eyes' that make all ontology bugs shallow.
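
In the example above the discrepancy was spotted by community members. As a purely illustrative sketch, a similar check could be automated by looking for ontology terms whose synonyms resolve to a different existing item. The function name, the input shapes, and the item identifiers in the usage note are hypothetical placeholders, not real Wikidata IDs or project code.

```python
from typing import Dict, List, Tuple


def find_split_synonyms(
    terms: Dict[str, List[str]],
    label_to_item: Dict[str, str],
) -> List[Tuple[str, str, str, str]]:
    """Report (term, term_item, synonym, synonym_item) where the items differ.

    terms          maps an ontology term label to its listed synonyms
    label_to_item  maps a label to the identifier of an existing item
    """
    conflicts = []
    for label, synonyms in terms.items():
        item = label_to_item.get(label)
        if item is None:
            continue  # the term has no existing item, so nothing to compare
        for synonym in synonyms:
            synonym_item = label_to_item.get(synonym)
            # A synonym that already has its own, different item suggests
            # the two labels may name distinct concepts.
            if synonym_item is not None and synonym_item != item:
                conflicts.append((label, item, synonym, synonym_item))
    return conflicts
```

For instance, with `terms = {"Ollier disease": ["Maffucci syndrome"]}` and a label index that maps the two names to the placeholder items `"ITEM_A"` and `"ITEM_B"`, the function returns one conflict tuple for a curator to review. Whether the labels are true synonyms remains a human judgment.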

5 ...Can Make a Delicious Soup

We can create a powerful commons of biomedical knowledge by building on established resources and the dedicated community to connect genes, proteins, drugs, diseases, phenotypes and symptoms. Wikipedia will be the first application to use the content in Wikidata, but certainly not the last. The fire is ready and the pot is starting to heat up. Some villagers are already peeking out of their windows ready to join us around the pot, but it will take the effort of the whole community to make a delicious biomedical data soup. We invite you to join us in this effort.

References

1. History of the Stone Soup Story from 1720 to now. Available from: http://www.stonesoup.com/history-of-the-stone-soup-story-from-1720-to-now/.
2. Taylor, J. The Stone Soup of Data. 2007 8 May; Available from: https://km.aifb.kit.edu/ws/ckc2007/StoneSoup-www2007.pdf.
3. Vrandečić, D. and M. Krötzsch, Wikidata: A Free Collaborative Knowledgebase, in Communications of the ACM. 2014, ACM. p. 78-85.
4. Heilman, J.M. and A.G. West, Wikipedia and medicine: quantifying readership, editors, and the significance of natural language. J Med Internet Res, 2015. 17(3): p. e62.
5. Huss, J.W., 3rd, et al., A gene wiki for community annotation of gene function. PLoS Biol, 2008. 6(7): p. e175.
6. Brown, G.R., et al., Gene: a gene-centered information resource at NCBI. Nucleic Acids Res, 2015. 43(Database issue): p. D36-42.
7. Kibbe, W.A., et al., Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucleic Acids Res, 2015. 43(Database issue): p. D1071-8.
8. Bizer, C., et al., DBpedia - A crystallization point for the Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web, 2009. 7(3): p. 154-165.
9. Get all the drug-drug interactions for Methadone based on its CHEMBL id CHEMBL651. 2015 [cited 2015 Sep. 14]; Available from: https://bitbucket.org/sulab/wikidatasparqlexamples/overview#markdown-header-get-all-the-drug-drug-interactions-for-methadone-based-on-its-chembl-id-chembl651.
10. Pfundner, A., et al., Utilizing the Wikidata system to improve the quality of medical content in Wikipedia in diverse languages: a pilot study. J Med Internet Res, 2015. 17(5): p. e110.
11. Buil-Aranda, C., et al. SPARQL Web-Querying Infrastructure: Ready for Action? in 12th International Semantic Web Conference. 2013. Sydney, Australia.
12. Wikidata Query Editor. [cited 2015]; Available from: https://wdq.wmflabs.org/wdq/.