Searching for Common Sense: Populating Cyc from the Web

Cynthia Matuszek, Michael Witbrock, Robert C. Kahlert, John Cabral, Dave Schneider, Purvesh Shah, Doug Lenat
Cycorp, Inc.
3721 Executive Center Drive, Suite 100, Austin, TX 78731
{cynthia, witbrock, rck, jcabral, daves, shah, lenat}@cyc.com

Abstract

The Cyc project is predicated on the idea that effective machine learning depends on having a core of knowledge that provides a context for novel learned information: what is known informally as "common sense." Over the last twenty years, a sufficient core of common sense knowledge has been entered into Cyc to allow it to begin effectively and flexibly supporting its most important task: increasing its own store of world knowledge. In this paper, we present initial work on a method of using a combination of Cyc and the World Wide Web, accessed via Google, to assist in entering knowledge into Cyc. The long-term goal is automating the process of building a consistent, formalized representation of the world in the Cyc knowledge base via machine learning. We present preliminary results of this work and describe how we expect the knowledge acquisition process to become more accurate, faster, and more automated in the future.

1 Introduction

The idea of building a very large-scale knowledge base that can be used as a foundation for automated knowledge acquisition has been present in artificial intelligence research for more than twenty years [Lenat et al., 1983]. In that time, an enormous amount of progress has been made [Thrun et al., 1998]; techniques developed under the umbrella of machine learning have been successfully applied to work ranging from robotics, to voice recognition, to bioinformatics. In all of these fields, the use of preexisting knowledge is widespread. Much of this work relies on either programming an inductive bias into a learning system (e.g., in systems like AM [Lenat, 1976]); or on providing an inductive bias in the form of training examples [Brown, 1996].

Also in that time, the Web has emerged as a huge repository of electronically available knowledge, and indexing systems such as Google have made that knowledge progressively more accessible [Brin and Page, 1998]. Work that relies on the web in general, and Google in particular, for information extraction is proving to be a fertile research area [Ghani, 2000; Kwok et al., 2001; Etzioni et al., 2004].

The purpose of the Cyc project is to provide computers with a store of formally represented "common sense": real world knowledge that can provide a basis for additional knowledge to be gathered and interpreted automatically [Lenat, 1995]. In the last twenty years, over three million facts and rules have been formally represented in the Cyc knowledge base by ontologists skilled in CycL, Cyc's formal representation language. Tools have been developed which allow subject matter experts to contribute directly [Panton et al., 2002; Witbrock et al., 2003; Belasco et al., 2004]. In addition, natural language generation and parsing capabilities have been developed to provide support for learning from English corpora [Witbrock et al., 2004]. As a result, the Cyc knowledge base now contains enough knowledge to support experimentation with the acquisition of additional knowledge via machine learning.

In this paper, we describe a method for gathering and verifying facts from the World Wide Web. The knowledge acquisition procedure is described at both an overview level and in detail. The work focuses on three novel approaches: using knowledge already in the Cyc KB to focus the acquisition of further knowledge; representing acquired knowledge in the knowledge base; and using Google in two distinct ways, to find facts and, separately, to verify them.

While this research is at an early stage, the initial results are promising in terms of both the acquisition speed and quality of results. Even in its preliminary form, the mechanism described is a useful tool for reducing the cost of manually entering knowledge into Cyc; the level of expertise required to enable a person to contribute to the KB is reduced, and many of the necessary steps are handled automatically, reducing the total time required. The number of sentences that can be acquired in this way and reviewed for accuracy by an untrained reviewer outstrips the rate at which sentences can be hand-authored by a trained ontologist, and the verification steps reduce the amount of work required of a human reviewer by approximately 90%.

2 Cyc and CycL

The Cyc system is made up of three distinct components, all of which are crucial to the machine learning process: the knowledge base (KB), the inference engine, and the natural language system. The Cyc KB contains more than 3.2 million assertions (facts and rules) describing more than 280,000 concepts, including more than 12,000 concept-interrelating predicates. Facts stored in the Cyc KB may be atomic (making them Ground Atomic Formulae, or GAFs), or they may be complex sentences. Depending on the predicate used, a GAF can describe instance-level or type-level knowledge. All information in the KB is asserted into a hierarchical graph of microtheories, or reasoning contexts [Guha, 1991; Lenat, 1998].

AAAI-05 / 1430

CycL queries are syntactically legal CycL sentences, which may be partially bound (that is, contain one or more variables). The Cyc inference engine is responsible for using information in the KB to determine the truth of a sentence and, if necessary, find provably correct variable bindings.

Sample instance-level GAF:
(foundingDate Cyc (YearFn 1985))

Sample entity-to-type and type-to-type GAFs:
(sellsProductType SaudiAramco PetroleumProduct)
(conditionAffectsPartType CutaneousAnthrax Skin)

Sample non-atomic sentence:
(or (foundingDate AlQaida (YearFn 1987))
    (foundingDate AlQaida (YearFn 1988)))

Sample query:
(foundingAgent PalestineIslamicJihad WHO)

Figure 1: Learning is a process of selecting interesting questions, searching for that information on the web, parsing the results, performing verification and consistency checks with the document corpus and the KB, reviewing, and asserting that knowledge into the KB.

The natural language component of the system consists of a lexicon, and parsing and generation subsystems. The lexicon is a component of the knowledge base that maps words and phrases to Cyc concepts, while various parsers provide methods for translating English text into CycL. The system also has a relatively complete ability to render CycL sentences into English, although the phrasing can be somewhat stilted when longer sentences are generated.

The work described in this paper targets the automatic acquisition of GAFs. Simple facts are more likely to be readily found on the web, and this approach minimizes difficulties in generating and parsing complex natural language constructs.

2.1 Overview of the Learning Cycle

Gathering information from the web proceeds in six stages, as illustrated in Figure 1:

1. Choosing a query: Because the number of concepts in the KB is so large, the number of possible CycL queries is enormous; choosing interesting, productive queries automatically is a necessary step in automating the knowledge acquisition process. An example of such a query might be:
(foundingAgent PalestineIslamicJihad WHO)

2. Searching: Once a query is selected, it is translated into one or more English search strings. The query above might be rendered into strings such as:
"PIJ, founded by"
"Palestine Islamic Jihad founder"
These strings are passed on to the Google API. The appropriate sections of any resulting documents are downloaded, and the relevant section is extracted (e.g., "PIJ founder Bashir Musa Mohammed Nafi is still at large…").

3. Parsing results: The relevant components of sentences are identified by their location relative to the search string. The terms are then parsed into CycL via the natural language parsing process described in section 3.3, resulting in one or more GAFs such as:
(foundingAgent PalestineIslamicJihad Terrorist-Nafi)

4. KB consistency checking: Some of the results retrieved during the search process are disprovable, because they are inconsistent with knowledge already present in the knowledge base; others are already known or trivially provable, and therefore redundant. Any GAF found via inference to be inconsistent or redundant is discarded.

5. Google verification: During search, the largest possible set of candidate GAFs is created. Those GAFs that are not discarded during KB consistency checking are re-rendered into English search strings, such as:
"Bashir Nafi is a founder of Palestine Islamic Jihad"
and a second Google search over those strings is performed. Any GAF that results in no retrieved documents during this phase is discarded.

6. Reviewing and asserting: The remaining GAFs are asserted into special hypothetical contexts in the knowledge base. An ontologist or human volunteer reviews them for accuracy, using a tool specific to that task [Witbrock et al., 2005], and the ones found to be correct are asserted into the knowledge base.

3 Implementation of the Learning Cycle

3.1 Selecting Queries

While it is often useful in an application context to look for the answer to a specific question, or automatically populate a class of information (such as founders of groups, or prime ministers of countries), satisfying the ultimate goal of populating the Cyc KB via machine learning relies in part on automatically selecting suitable sentences. There are a number of challenges that must be met in this regard. Queries should have reasonable probability of having interesting bindings and of being findable in the corpus (in this case, on the web). Some searches are unlikely to be productive for semantic reasons: some argument positions are of an infinite or continuous type, such as Time-Quantity, and could therefore yield an infinite number of mostly-uninteresting searches, such as (age OBJECT (YearsDuration 300)). Other queries are guaranteed to be redundant.

We initially limited searches to a set of 134 binary predicates which, when used to generate search strings, tended to maximize useful results from web searches.¹ The algorithm for proceeding through those predicates was as follows:

For a given search run, a depth of $D$ is selected. $D$ is the maximum number of different values that can be used for each argument of a predicate. For each binary predicate $p_i$ in our test set $P$ (where $|P| = 134$), we retrieve from the KB the type constraints on each of its two arguments. Unless the type generalizes to an infinite class, we retrieve the $D$ most fully represented values from the knowledge base, that is, those that appear in the most assertions, and therefore about which the most is known. These are assumed to be the most interesting terms of that type, and therefore the ones most likely to be found by a web search.

For $p_i$ we then have types $T_{i1}$ and $T_{i2}$. The $D$ best represented values would be $(t_{i11} \ldots t_{i1D})$ and $(t_{i21} \ldots t_{i2D})$. If neither of a predicate's arguments took values of a continuous type, there would be $2D \cdot |P|$ queries generated:

$(p_1\ t_{111}\ \mathit{VAR}) \ldots (p_1\ t_{11D}\ \mathit{VAR})$
$(p_1\ \mathit{VAR}\ t_{121}) \ldots (p_1\ \mathit{VAR}\ t_{12D})$
$\ldots$
$(p_{|P|}\ t_{|P|11}\ \mathit{VAR}) \ldots (p_{|P|}\ t_{|P|1D}\ \mathit{VAR})$
$(p_{|P|}\ \mathit{VAR}\ t_{|P|21}) \ldots (p_{|P|}\ \mathit{VAR}\ t_{|P|2D})$

For example, a set of the predicates foundingAgent and foundingDate, given a depth of 1, would produce three queries:
(foundingAgent AlQaida WHO)
(foundingAgent WHAT Terrorist-Salamat)
(foundingDate AlQaida WHEN)
The fourth permutation is not produced, because the argument constraint, Date, is of a continuous type.
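
The enumeration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cyc's implementation: the `Predicate` class, the `best_terms` lookup table, the `CONTINUOUS_TYPES` set, and the type names `Organization` and `Agent` are all hypothetical stand-ins for real KB lookups, and every open argument is written as `VAR` rather than WHO, WHAT, or WHEN.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical set of type constraints treated as infinite or continuous.
CONTINUOUS_TYPES = {"Date", "Time-Quantity"}


@dataclass(frozen=True)
class Predicate:
    name: str
    arg1_type: str  # type constraint on the first argument
    arg2_type: str  # type constraint on the second argument


Query = Tuple[str, str, str]


def generate_queries(
    predicates: List[Predicate],
    best_terms: Dict[str, List[str]],
    depth: int,
) -> List[Query]:
    """Build partially bound queries; "VAR" marks the open argument.

    best_terms maps a type to its terms, ordered from most to least
    represented in the KB (a stand-in for a real KB lookup).
    """
    queries: List[Query] = []
    for p in predicates:
        # Bind arg1 to each of its top terms, unless its type is continuous.
        if p.arg1_type not in CONTINUOUS_TYPES:
            for term in best_terms.get(p.arg1_type, [])[:depth]:
                queries.append((p.name, term, "VAR"))
        # Bind arg2 likewise, leaving arg1 open.
        if p.arg2_type not in CONTINUOUS_TYPES:
            for term in best_terms.get(p.arg2_type, [])[:depth]:
                queries.append((p.name, "VAR", term))
    return queries


predicates = [
    Predicate("foundingAgent", "Organization", "Agent"),
    Predicate("foundingDate", "Organization", "Date"),
]
best_terms = {"Organization": ["AlQaida"], "Agent": ["Terrorist-Salamat"]}
queries = generate_queries(predicates, best_terms, depth=1)
for q in queries:
    print(q)
```

With a depth of 1 the sketch yields three queries, matching the example in the text: the query that would enumerate values of the continuous type Date is never generated.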

This approach is not without problems. It relies heavily on the nature of the type constraints placed on predicates; for some predicates, such as foundingDate, this works well, while for others the argument constraints are too broad. For example, the predicate sellsProductType takes a constant of type somethingExisting in its second argument position, because almost anything can be sold. The proposed way to address this problem is with a predicate typicalArgIsa, which would connect predicates such as sellsProductType with the collections to which they typically refer (in this case, CommodityProduct).²

¹ Examples: foundingAgent, foundingDate, sellsProductType, primeMinister, lifeExpectancy, awardWinners. Productive predicates were found via manual trial and error, from a set of domains selected to span a broad portion of the KB (terrorism, medical technology, conceptual works, global politics, family relationships, and sales).
² In most cases, these constraints will be learned from analysis of common usage in the existing corpus.

Another problem lies with the assumption that the members of a class about which Cyc knows the most are the most interesting ones, which is only sometimes correct. The best-described instances of the class Person, for example, tend to be the ontologists who work at Cycorp, rather than, for example, heads of state.

Future work will involve using Google in the selection of appropriate queries. Rather than using the top D most supported terms in the KB of type T, it should be possible to retrieve the hit count for up to several thousand members of T, and seek information about the D terms of T for which the most information is available.

3.2 Search

Generating Search Strings

Once the system has selected a query, it generates a series of search strings. The existing NL generation machinery is applied to 233 manually created special generation templates for the 134 predicates.³ Several factors motivated the construction of specialized search generation templates within the NL system. In addition to the fact that productive search strings are often unlike standard English, CycL generations tend to be somewhat stilted, since ontologists have preferred unambiguous expression of CycL meanings over naturalness. In addition, the KB generally contains one or two generation templates for any given predicate, while there may be many common ways of expressing the information that may be useful for searching.

³ Slightly fewer than half of the predicates have only one associated search template. Many obvious templates exist but have not been represented; in future work, an analysis of the search strings will be performed to determine what types of strings produce good results. That information will be used for automatically generating search strings for other predicates.

For example, for the query (foundingAgent PalestineIslamicJihad X), the system generates the following set of search strings:
Palestine Islamic Jihad founder ____
Palestinian Islamic Jihad founder ____
PIJ founder ____
Palestine Islamic Jihad, founded by ____
Palestinian Islamic Jihad, founded by ____
PIJ, founded by ____

All possible searches were generated from the Cartesian product of the generation templates with all English renderings of the arguments. In this example, Cyc knows three names for Palestine Islamic Jihad, and has two templates for foundingAgent, resulting in six search strings. To simplify match detection, argument positions were only allowed at the beginning or end of the template string. In order to carry out the actual search, the "___" placeholders were stripped off, and the remaining string was submitted as a quoted string to Google.

Searching via Google

The search string is used with an interface to the Google API to retrieve a tuple consisting of the URL, the Google ranking, the match position and the web page text, which is then handed off to a parser that attempts to convert it into a meaningful CycL entity.
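
As a rough illustration of the search-string step, the Cartesian product of templates and names can be written as below. The template syntax (`{name}` for the bound argument, `____` for the open one) and the helper names `search_strings` and `to_quoted_query` are assumptions made for this sketch; the paper does not specify how Cyc's generation templates are stored, and no actual Google API call is shown.

```python
from itertools import product
from typing import List

PLACEHOLDER = "____"  # marks the open argument position in a search string


def search_strings(templates: List[str], names: List[str]) -> List[str]:
    """Cartesian product of generation templates and English renderings."""
    return [t.format(name=n) for t, n in product(templates, names)]


def to_quoted_query(search_string: str) -> str:
    """Strip the placeholder and quote the rest for an exact-phrase search."""
    return '"' + search_string.replace(PLACEHOLDER, "").strip() + '"'


templates = ["{name} founder ____", "{name}, founded by ____"]
names = ["Palestine Islamic Jihad", "Palestinian Islamic Jihad", "PIJ"]

strings = search_strings(templates, names)
quoted = [to_quoted_query(s) for s in strings]
for s, q in zip(strings, quoted):
    print(s, "->", q)
```

Two templates and three names give the six strings listed in the text; keeping the placeholder only at the start or end of a template is what makes the stripping step a simple string operation.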

3.3 Parsing into CycL

Once a document is retrieved, the answer must be found and interpreted. First, the system searches for the exact text of the query string, and returns the search string plus either the beginning or the end of the sentence, depending on the position of the "___" in the generated string. The resulting string is searched for phrases that can be interpreted as a CycL concept that meets the type constraints imposed by the predicate. For example, in the string "PIJ founder Bashir Musa Mohammed Nafi is still…", "Bashir Musa Mohammed Nafi" is recognized as a person, and therefore as a candidate for the arg2 position of foundingAgent.

For predicates that require strings (such as nameString, which relates an entity to its name), a named entity recognizer [Prager et al., 2000] is used to find a suitable candidate. In other cases, standard parsing techniques are used to try to find a useful interpretation, including looking up strings in the Cyc lexicon, interpreting them as noun compounds or dates, and compositional interpretation. For speed, compositional interpretation is only attempted for terms judged to be constituents by a probabilistic CFG parser [Charniak, 2001]. This has the effect of eliminating a few correct answers (mostly where the parser produces an incorrect syntactic parse), but also decreases the total time spent on analysis by at least 50%.

Creating candidate CycL Sentences

The result of parsing the matched section in the web page is a set of candidate CycL terms, usually constants such as Terrorist-Nafi. Substituting these terms into the original incomplete CycL query produces a set of candidate GAFs.
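
The extraction and substitution steps can be approximated with plain string operations. In the sketch below, the dictionary `lexicon` is a toy substitute for Cyc's lexicon and parsers (which are far richer than a phrase lookup), and the function names and the type label `Person` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Query = Tuple[str, str, str]


def extract_fragment(
    sentence: str, search_text: str, placeholder_at_end: bool
) -> Optional[str]:
    """Return the search text plus the rest (or the start) of the sentence."""
    start = sentence.find(search_text)
    if start < 0:
        return None
    if placeholder_at_end:
        return sentence[start:]
    return sentence[: start + len(search_text)]


def candidate_terms(
    fragment: Optional[str],
    lexicon: Dict[str, Tuple[str, str]],
    required_type: str,
) -> List[str]:
    """Find lexicon phrases in the fragment whose term has the required type."""
    text = fragment or ""
    return [
        term
        for phrase, (term, term_type) in lexicon.items()
        if phrase in text and term_type == required_type
    ]


def substitute(query: Query, variable: str, term: str) -> Query:
    """Replace the open variable in a query with a candidate term."""
    return tuple(term if arg == variable else arg for arg in query)


# Toy lexicon entry: English phrase -> (CycL term, type).
lexicon = {"Bashir Musa Mohammed Nafi": ("Terrorist-Nafi", "Person")}
sentence = "PIJ founder Bashir Musa Mohammed Nafi is still at large"
query = ("foundingAgent", "PalestineIslamicJihad", "WHO")

fragment = extract_fragment(sentence, "PIJ founder", placeholder_at_end=True)
gafs = [substitute(query, "WHO", t)
        for t in candidate_terms(fragment, lexicon, "Person")]
print(gafs)
```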

3.4 Checking Cyc KB Consistency

In principle, any useful fact added to the system should be neither disprovable nor trivially provable. For example, the sentence:
(foundingAgent PalestineIslamicJihad Terrorist-AlShikaki)
is not novel, because Cyc already knows this. Meanwhile,
(foundingAgent PalestineIslamicJihad AugusteRodin)
is novel, but trivially disprovable, as Cyc knows he died in 1917 (72 years before the PIJ was founded).

If the Cyc KB already contains knowledge that renders a new fact redundant or contradictory, that fact will be discarded. This is checked via inference; each new fact is treated as a query, and inference is performed to determine whether it can be proven false (inconsistent) or true (redundant). Cyc provides justifications for facts used to produce query results, as in Figure 2, and it is helpful for redundant facts to be marked as additionally confirmed via search-based learning. Previous work suggests that one-step inference is sufficient to identify duplicate information or disprove contradictions during typical knowledge acquisition tasks [Panton et al., 2002].

Figure 2: Justifications for the claim that a 2001 attack on Ankara meets the criteria for the query being run.
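
The filtering logic of this step amounts to a three-way classification. The sketch below assumes two caller-supplied predicates, `is_provable` and `is_disprovable`, standing in for calls to the Cyc inference engine; the toy versions here use a set of known facts and a single hand-written rule (a founder cannot have died before the founding year), which is an illustration rather than how Cyc reasons. The founding year 1989 is simply 1917 + 72, taken from the example in the text.

```python
from typing import Callable, List, Tuple

GAF = Tuple[str, str, str]


def classify(
    gaf: GAF,
    is_provable: Callable[[GAF], bool],
    is_disprovable: Callable[[GAF], bool],
) -> str:
    """Label a candidate GAF as inconsistent, redundant, or novel."""
    if is_disprovable(gaf):
        return "inconsistent"
    if is_provable(gaf):
        return "redundant"
    return "novel"


def keep_novel(
    gafs: List[GAF],
    is_provable: Callable[[GAF], bool],
    is_disprovable: Callable[[GAF], bool],
) -> List[GAF]:
    """Discard every candidate that is inconsistent or redundant."""
    return [g for g in gafs
            if classify(g, is_provable, is_disprovable) == "novel"]


# Toy stand-ins for KB content (hypothetical, for illustration only).
KNOWN = {("foundingAgent", "PalestineIslamicJihad", "Terrorist-AlShikaki")}
DEATH_YEAR = {"AugusteRodin": 1917}
FOUNDING_YEAR = {"PalestineIslamicJihad": 1989}  # 1917 + 72, per the text


def toy_provable(gaf: GAF) -> bool:
    return gaf in KNOWN


def toy_disprovable(gaf: GAF) -> bool:
    predicate, group, agent = gaf
    if predicate != "foundingAgent":
        return False
    died = DEATH_YEAR.get(agent)
    founded = FOUNDING_YEAR.get(group)
    return died is not None and founded is not None and died < founded


candidates = [
    ("foundingAgent", "PalestineIslamicJihad", "Terrorist-AlShikaki"),
    ("foundingAgent", "PalestineIslamicJihad", "AugusteRodin"),
    ("foundingAgent", "PalestineIslamicJihad", "Terrorist-Nafi"),
]
labels = [classify(g, toy_provable, toy_disprovable) for g in candidates]
survivors = keep_novel(candidates, toy_provable, toy_disprovable)
print(labels, survivors)
```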

3.5 Google Verification

In order to guard against parser errors and excessively general search terms (such as ambiguous acronyms), a second Google search is performed, in order to determine whether search strings generated from the new GAF will produce results. Search strings are generated from the candidate GAF that was learned, but any string containing an acronym or abbreviation (e.g., "PIJ" for "Palestinian Islamic Jihad") is supplemented with the disambiguation term: the least common word (based on Google hit counts) of the expanded acronym. In this case, the string "Palestine" is added as a term, since it is the least common word in the set 'Palestine,' 'Palestinian,' 'Islamic,' and 'Jihad.' The resulting verification search string is:
"PIJ founder Bashir Nafi" + "Palestine"
Any fact for which this verification step returns no results is considered unverified, and will not be presented to a reviewer.
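
Choosing the disambiguation term is a minimum over hit counts. In the sketch below, `hit_count` is any caller-supplied function from a word to a count; the numbers in `toy_hits` are invented purely so the example runs and are not real Google statistics. The function names and the exact spacing of the composed query are likewise assumptions.

```python
from typing import Callable, Iterable


def disambiguation_term(
    expansions: Iterable[str], hit_count: Callable[[str], int]
) -> str:
    """Pick the least common word among all expansions of an acronym."""
    words = sorted({w for phrase in expansions for w in phrase.split()})
    return min(words, key=hit_count)


def verification_query(search_string: str, term: str) -> str:
    """Compose the quoted search string with the extra required term."""
    return f'"{search_string}" + "{term}"'


# Invented counts, for illustration only.
toy_hits = {"Palestine": 10, "Palestinian": 20, "Islamic": 40, "Jihad": 30}

term = disambiguation_term(
    ["Palestine Islamic Jihad", "Palestinian Islamic Jihad"],
    toy_hits.__getitem__,
)
query = verification_query("PIJ founder Bashir Nafi", term)
print(query)
```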

3.6 Review and Assertion

In the final step, learned sentences are reviewed by a human curator, and, if correct, asserted into the Cyc KB. Currently, suggested sentences are presented to the reviewer in no particular order; in future, sorting methods will be implemented and tested. The most straightforward approaches involve making greater use of information already retrieved from Google: since information about the searches underlying a candidate sentence is stored in the KB, it should be possible and productive to give priority to sentences that are supported by a larger total number of documents. In effect, the numerical values returned during the verification step could be used to sort the most widely supported sentences upwards in the review process. Another possibility, if several contradictory facts are found, is giving review priority to those found in documents with the highest Google ranking.
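
One way the proposed ordering might look, as a sketch only: candidates are sorted by the number of supporting documents, with the best Google rank as a tie-breaker. The `Candidate` fields and the sample values are hypothetical; the paper describes this sorting as future work and does not define a data model for it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    gaf: str
    supporting_docs: int  # documents returned during verification
    best_rank: int        # best Google rank among them (1 = top)


def review_order(candidates: List[Candidate]) -> List[Candidate]:
    """Most widely supported first; ties broken by better rank."""
    return sorted(candidates, key=lambda c: (-c.supporting_docs, c.best_rank))


# Invented values, for illustration only.
pending = [
    Candidate("(foundingDate A (YearFn 1912))", supporting_docs=3, best_rank=5),
    Candidate("(foundingDate A (YearFn 1913))", supporting_docs=3, best_rank=2),
    Candidate("(foundingDate A (YearFn 1923))", supporting_docs=9, best_rank=7),
]
ordered = review_order(pending)
for c in ordered:
    print(c.gaf, c.supporting_docs, c.best_rank)
```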

4 Results

Statistics were gathered for a case in which 134 predicates in P were used, and D was set to 20.⁴ The majority of the searches expended, about 80%, were performed in the verification phase rather than the initial search phase. The results were as follows:

Queries: 348
Searches expended: 4290 (817 initial, 3477 verification)
GAFs found: 1016
… and rejected due to KB inconsistency: 4
… and already known to the KB: 384
… and rejected by Google verification: 566
Novel GAFs found and verified: 61

⁴ It takes between four and five hours to exhaust an allotment of 3,000 searches per day through the Google API.

A human reviewer then went through the verified GAFs, and a sample of 53 of the unverified GAFs, and determined their actual correctness rate. The results were as follows:

                     Verified   Unverified
True (correct)          32          8**
False (incorrect)       29*        45

Total novel facts: 114
Novel, correct facts discovered: 77
Incorrect facts discovered: 37
Facts categorized correctly: 68%
Facts categorized incorrectly: 32%
… * false positives (false but verified): 25%
… ** false negatives (true but unverified): 7%

Examples of these result types:

Query: (#$hasBeliefSystems #$Iran X)
Search string: "Iran adheres to"
Candidate GAF: (#$hasBeliefSystems #$Iran #$Islam)
Verification search strings:
"Islamic Republic of Iran adheres to Islam"
"Iran adheres to Islam"
"Iran believes in Islam" (found)
"Islamic Republic of Iran believes in Islam" (found)

Example GAFs already known to the KB:
(#$vestedInterest #$Iran #$Iraq)
(#$inhabitantTypes #$Lebanon #$EthnicGroupOfKurds)

Example GAFs rejected due to KB inconsistency:
(#$northOf #$Iran #$Iran)
(#$geopoliticalSubdivision #$Iraq #$Iran)

Correct, verified GAF:
(foundingDate AfricanNationalCongress (YearFn 1912))
* Incorrect but verified GAF:
(foundingDate JewishDefenseLeague (DecadeFn 198))
Incorrect, rejected GAF:
(objectFoundInLocation KuKluxKlan GillianAnderson)
** Correct but rejected GAF:
(foundingDate KarenNationalUnion (MonthFn April (YearFn 1947)))

The verification step produces comparatively few false negatives (in which a true fact is incorrectly classified as false); in this run, 80% of the novel, correct facts retrieved were correctly identified as such. Given this, it is reasonable to reject all unverifiable sentences, especially given the wealth of possible queries and the size and breadth of the corpora available. Only 61% of the incorrect facts retrieved were identified, suggesting that substantial work in decreasing the occurrence of false positives will be necessary before the need for human review is eliminated; this is unsurprising, as the Internet contains large amounts of unstructured, unchecked information.
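
The percentages quoted above follow from the four reviewed counts. The short calculation below, with variable names chosen for this sketch, reproduces them; it also suggests that the reported figures of 77 and 37 correspond to facts that verification categorized correctly (32 + 45) and incorrectly (29 + 8).

```python
# Reviewed counts as reported in the results table.
verified_true, unverified_true = 32, 8
verified_false, unverified_false = 29, 45

total = verified_true + unverified_true + verified_false + unverified_false


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


categorized_correctly = verified_true + unverified_false    # 77
categorized_incorrectly = verified_false + unverified_true  # 37

print("total reviewed:", total)
print("categorized correctly:", pct(categorized_correctly, total), "%")
print("categorized incorrectly:", pct(categorized_incorrectly, total), "%")
print("false positives:", pct(verified_false, total), "%")
print("false negatives:", pct(unverified_true, total), "%")
# Share of true facts that were verified, and of false facts that were rejected.
print("true facts identified:",
      pct(verified_true, verified_true + unverified_true), "%")
print("false facts identified:",
      pct(unverified_false, verified_false + unverified_false), "%")
```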

Slightly over a third of the GAFs discovered were facts that were already known to the KB, and presumably correct to a baseline level (i.e., the correctness level achieved by human ontologists); the total number of correct facts discovered was therefore 425, 42% of the total. Verification reduces the number of novel sentences that must undergo human review from 1016 to 61, and the human review process, which takes place entirely in English, is quick and straightforward. An intermediate step towards full automation would be to identify classes of sentences that can be asserted without human review.

5 Conclusions

While great strides have been made in machine learning in the last few decades, automatically gathering useful, consistent knowledge in a machine-usable form is still a relatively unexplored research area. The original promise of the Cyc project (to provide a basis of real-world knowledge sufficient to support the sort of learning from language of which humans are capable) has not yet been fulfilled. In that time, information has become enormously more accessible, due in no small part to the widespread popularity of the Web and to effective indexing systems such as Google. Making use of that repository requires a store of real-world knowledge and some facility for natural language parsing and generation.

These results, while extremely preliminary, are encouraging. In particular, using Cyc as a basis for learning is effective, both in guiding the learning process and in representing and using the results. Pre-existing knowledge in the KB supports the construction of meaningful queries and provides a framework into which learned knowledge can be asserted and reasoned over. Comparatively shallow natural language parsing combined with the type constraint and relation knowledge in the Cyc system allows the retrieval, verification, and review of unconstrained facts at a higher rate than that achieved by human knowledge representation experts working unassisted. Perhaps more importantly, the kind of knowledge retrieved is exactly the instance-level knowledge that should not require human experts; it should instead be obtained, maintained, and reasoned over by tools that need and use that knowledge. Involving Google in every stage of the learning process allows us to exploit both Cyc's knowledge and the knowledge on the web in an extremely natural way.

The work being done here is immediately useful as a tool that makes human knowledge entry faster, easier and more effective, but it also provides a basis for analysis of what information can be learned effectively without human interaction. Thus, over time, we hope to provide Cyc with a mechanism to truly acquire knowledge by learning.

Acknowledgments

This research was partially sponsored by ARDA's AQUAINT program. Additionally, we thank Google for allowing access to their API for research such as this.

References

[Belasco et al., 2004] A. Belasco, J. Curtis, R. C. Kahlert, C. Klein, C. Mayans, R. Reagan. Representing Knowledge Gaps Effectively. In Proc. of the 5th International Conference on Practical Aspects of Knowledge Management, Vienna, Austria, pp 159-164. Dec 2004.

[Brin and Page, 1998] Sergey Brin and Larry Page. Anatomy of a Large-scale Hypertextual Search Engine. In Proc. of the 7th International World Wide Web Conference, pp 107-117, Brisbane, Australia, Apr 1998.

[Brown, 1996] R. D. Brown. Example-Based Machine Translation in the Pangloss System. In Proc. of the 16th International Conference on Computational Linguistics, pp 169-174. Copenhagen, Denmark, August 5-9, 1996.

[Charniak, 2001] E. Charniak. A Maximum-Entropy-Inspired Parser. In Proc. of the 1st conference on North American chapter of the Association for Computational Linguistics, pp 132-139. Seattle, WA, 2000. Morgan Kaufmann Publishers.

[Etzioni et al., 2004] O. Etzioni, M. Cafarella, D. Downey, A. Popescu, T. Shaked, S. Soderland, D. Weld, A. Yates. Web-scale Information Extraction in KnowItAll. In Proc. of the 13th international conference on World Wide Web, pp 100-110, New York, NY, 2004.

[Ghani, 2000] R. Ghani, R. Jones, D. Mladenic, K. Nigam, S. Slattery. Data Mining on Symbolic Knowledge Extracted from the Web. In Proc. of the 6th International Conference on Knowledge Discovery and Data Mining Workshop on Text Mining, pp 29-36, Boston, MA, Aug 2000.

[Guha, 1991] R. V. Guha. Contexts: A Formalization and Some Applications. PhD thesis, Stanford University, STAN-CS-91-1399-Thesis, 1991.

[Kwok et al., 2001] C. Kwok, O. Etzioni, D. Weld. Scaling Question Answering to the Web. In ACM Transactions on Information Systems, Vol 19, Issue 3, pp 242-262. 2001.

[Lenat, 1976] D. B. Lenat. AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search. Ph.D. Dissertation, Stanford University, STAN-CS-76-570, 1976.

[Lenat et al., 1983] D. B. Lenat, A. Borning, D. McDonald, C. Taylor, S. Weyer. Knoesphere: Building Expert Systems with Encyclopedic Knowledge. In Proc. of the 8th International Joint Conference on Artificial Intelligence, Vol 1, pp 167-169, Karlsruhe, Germany, August 1983.

[Lenat, 1995] D. B. Lenat. Cyc: a Large-Scale Investment in Knowledge Infrastructure. In Communications of the ACM, Vol 38, Issue 11, pp 33-38. Nov 1995.

[Lenat, 1998] D. B. Lenat. The Dimensions of Context-Space, from http://www.cyc.com/doc/context-space.pdf.

[Panton et al., 2002] K. Panton, P. Miraglia, N. Salay, R. C. Kahlert, D. Baxter, R. Reagan. Knowledge Formation and Dialogue Using the KRAKEN Toolset. In Proc. of the 18th National Conference on Artificial Intelligence, pp 900-905, Edmonton, Canada, 2002.

[Prager et al., 2000] J. Prager, E. Brown, A. Coden, D. Radev. Question Answering by Predictive Annotation. In Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 184-191. Athens, Greece, 2000.

[Thrun et al., 1998] S. Thrun, C. Faloutsos, T. Mitchell, L. Wasserman. Automated Learning and Discovery: State-Of-The-Art and Research Topics in a Rapidly Growing Field, tech. report CMU-CALD-98-100, Computer Science Department, Carnegie Mellon University, 1998.

[Witbrock et al., 2003] M. Witbrock, D. Baxter, J. Curtis, D. Schneider, R. C. Kahlert, P. Miraglia, P. Wagner, K. Panton, G. Matthews, A. Vizedom. An Interactive Dialogue System for Knowledge Acquisition in Cyc. In Proc. of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico, 2003.

[Witbrock et al., 2004] M. Witbrock, K. Panton, S. Reed, D. Schneider, B. Aldag, M. Reimers and S. Bertolo. Automated OWL Annotation Assisted by a Large Knowledge Base. In Workshop Notes of the 2004 Workshop on Knowledge Markup and Semantic Annotation at the 3rd International Semantic Web Conference, Hiroshima, Japan, pp 71-80. Nov 2004.

[Witbrock et al., 2005] M. Witbrock, C. Matuszek, A. Brusseau, R. C. Kahlert, C. B. Fraser, D. Lenat. Knowledge Begets Knowledge: Steps towards Assisted Knowledge Acquisition in Cyc. In Proc. of the AAAI 2005 Spring Symposium on Knowledge Collection from Volunteer Contributors, Stanford, CA, March 2005.