Searching for Common Sense: Populating Cyc from the Web

Cynthia Matuszek, Michael Witbrock, Robert C. Kahlert, John Cabral, Dave Schneider, Purvesh Shah, Doug Lenat
Cycorp, Inc.
3721 Executive Center Drive, Suite 100, Austin, TX 78731
{cynthia, witbrock, rck, jcabral, daves, shah, lenat}@cyc.com

Abstract
The Cyc project is predicated on the idea that effective machine learning depends on having a core of knowledge that provides a context for novel learned information: what is known informally as "common sense." Over the last twenty years, a sufficient core of common sense knowledge has been entered into Cyc to allow it to begin effectively and flexibly supporting its most important task: increasing its own store of world knowledge. In this paper, we present initial work on a method of using a combination of Cyc and the World Wide Web, accessed via Google, to assist in entering knowledge into Cyc. The long-term goal is automating the process of building a consistent, formalized representation of the world in the Cyc knowledge base via machine learning. We present preliminary results of this work and describe how we expect the knowledge acquisition process to become more accurate, faster, and more automated in the future.

1 Introduction
The idea of building a very large-scale knowledge base that can be used as a foundation for automated knowledge acquisition has been present in artificial intelligence research for more than twenty years [Lenat et al., 1983]. In that time, an enormous amount of progress has been made [Thrun et al., 1998]; techniques developed under the umbrella of machine learning have been successfully applied to work ranging from robotics, to voice recognition, to bioinformatics. In all of these fields, the use of preexisting knowledge is widespread. Much of this work relies either on programming an inductive bias into a learning system (e.g., in systems like AM [Lenat, 1976]) or on providing an inductive bias in the form of training examples [Brown, 1996].

Also in that time, the Web has emerged as a huge repository of electronically available knowledge, and indexing systems such as Google have made that knowledge progressively more accessible [Brin and Page, 1998]. Work that relies on the web in general, and Google in particular, for information extraction is proving to be a fertile research area [Ghani, 2000; Kwok et al., 2001; Etzioni et al., 2004].

The purpose of the Cyc project is to provide computers with a store of formally represented "common sense": real world knowledge that can provide a basis for additional knowledge to be gathered and interpreted automatically [Lenat, 1995]. In the last twenty years, over three million facts and rules have been formally represented in the Cyc knowledge base by ontologists skilled in CycL, Cyc's formal representation language. Tools have been developed which allow subject matter experts to contribute directly [Panton et al., 2002; Witbrock et al., 2003; Belasco et al., 2004]. In addition, natural language generation and parsing capabilities have been developed to provide support for learning from English corpora [Witbrock et al., 2004]. As a result, the Cyc knowledge base now contains enough knowledge to support experimentation with the acquisition of additional knowledge via machine learning.

In this paper, we describe a method for gathering and verifying facts from the World Wide Web. The knowledge acquisition procedure is described at both an overview level and in detail. The work focuses on three novel approaches: using knowledge already in the Cyc KB to focus the acquisition of further knowledge; representing acquired knowledge in the knowledge base; and using Google in two distinct ways, to find facts and, separately, to verify them.

While this research is at an early stage, the initial results are promising in terms of both the acquisition speed and quality of results. Even in its preliminary form, the mechanism described is a useful tool for reducing the cost of manually entering knowledge into Cyc; the level of expertise required to enable a person to contribute to the KB is reduced, and many of the necessary steps are handled automatically, reducing the total time required. The number of sentences that can be acquired in this way and reviewed for accuracy by an untrained reviewer outstrips the rate at which sentences can be hand-authored by a trained ontologist, and the verification steps reduce the amount of work required of a human reviewer by approximately 90%.

2 Cyc and CycL
The Cyc system is made up of three distinct components, all of which are crucial to the machine learning process: the knowledge base (KB), the inference engine, and the natural language system. The Cyc KB contains more than 3.2 million assertions (facts and rules) describing more than 280,000 concepts, including more than 12,000 concept-interrelating predicates. Facts stored in the Cyc KB may be atomic (making them Ground Atomic Formulae, or GAFs), or they may be complex sentences. Depending on the predicate used, a GAF can describe instance-level or type-level knowledge. All information in the KB is asserted into a hierarchical graph of microtheories, or reasoning contexts [Guha, 1991; Lenat, 1998].

AAAI-05 / 1430

CycL queries are syntactically legal CycL sentences, which may be partially bound (that is, contain one or more variables). The Cyc inference engine is responsible for using information in the KB to determine the truth of a sentence and, if necessary, find provably correct variable bindings.

Sample instance-level GAF:
(foundingDate Cyc (YearFn 1985))
Sample entity-to-type and type-to-type GAFs:
(sellsProductType SaudiAramco PetroleumProduct)
(conditionAffectsPartType CutaneousAnthrax Skin)
Sample non-atomic sentence:
(or (foundingDate AlQaida (YearFn 1987)) (foundingDate AlQaida (YearFn 1988)))
Sample query:
(foundingAgent PalestineIslamicJihad WHO)

Figure 1: Learning is a process of selecting interesting questions, searching for that information on the web, parsing the results, performing verification and consistency checks with the document corpus and the KB, reviewing, and asserting that knowledge into the KB.

The natural language component of the system consists of a lexicon, and parsing and generation subsystems. The lexicon is a component of the knowledge base that maps words and phrases to Cyc concepts, while various parsers provide methods for translating English text into CycL. The system also has a relatively complete ability to render CycL sentences into English, although the phrasing can be somewhat stilted when longer sentences are generated.

The work described in this paper targets the automatic acquisition of GAFs. Simple facts are more likely to be readily found on the web, and this approach minimizes difficulties in generating and parsing complex natural language constructs.

2.1 Overview of the Learning Cycle
Gathering information from the web proceeds in six stages, as illustrated in Figure 1:

1. Choosing a query: Because the number of concepts in the KB is so large, the number of possible CycL queries is enormous; choosing interesting, productive queries automatically is a necessary step in automating the knowledge acquisition process. An example of such a query might be:
(foundingAgent PalestineIslamicJihad WHO)

2. Searching: Once a query is selected, it is translated into one or more English search strings. The query above might be rendered into strings such as:
"PIJ, founded by"
"Palestine Islamic Jihad founder"
These strings are passed on to the Google API. The appropriate sections of any resulting documents are downloaded, and the relevant section is extracted (e.g., "PIJ founder Bashir Musa Mohammed Nafi is still at large…").

3. Parsing results: The relevant components of sentences are identified by their location relative to the search string. The terms are then parsed into CycL via the natural language parsing process described in section 3.3, resulting in one or more GAFs such as:
(foundingAgent PalestineIslamicJihad Terrorist-Nafi)

4. KB consistency checking: Some of the results retrieved during the search process are disprovable, because they are inconsistent with knowledge already present in the knowledge base; others are already known or trivially provable, and therefore redundant. Any GAF found via inference to be inconsistent or redundant is discarded.

5. Google verification: During search, the largest possible set of candidate GAFs is created. Those GAFs that are not discarded during KB consistency checking are re-rendered into English search strings, such as:
"Bashir Nafi is a founder of Palestine Islamic Jihad"
and a second Google search over those strings is performed. Any GAF that results in no retrieved documents during this phase is discarded.

6. Reviewing and asserting: The remaining GAFs are asserted into special hypothetical contexts in the knowledge base. An ontologist or human volunteer reviews them for accuracy, using a tool specific to that task [Witbrock et al., 2005], and the ones found to be correct are asserted into the knowledge base.

3 Implementation of the Learning Cycle

3.1 Selecting Queries
While it is often useful in an application context to look for the answer to a specific question, or automatically populate a class of information (such as founders of groups, or prime ministers of countries), satisfying the ultimate goal of populating the Cyc KB via machine learning relies in part on automatically selecting suitable sentences. There are a number of challenges that must be met in this regard. Queries should have a reasonable probability of having interesting bindings and of being findable in the corpus (in this case, on the web). Some searches are unlikely to be productive for semantic reasons: some argument positions are of an infinite or continuous type, such as Time-Quantity, and could therefore yield an infinite number of mostly-uninteresting searches, such as (age OBJECT (YearsDuration 300)). Other queries are guaranteed to be redundant.

We initially limited searches to a set of 134 binary predicates which, when used to generate search strings, tended to maximize useful results from web searches.¹ The algorithm for proceeding through those predicates was as follows:

For a given search run, a depth of D is selected. D is the maximum number of different values that can be used for each argument of a predicate. For each binary predicate p_i in our test set P (where |P| = 134), we retrieve from the KB the type constraints on each of its two arguments. Unless the type generalizes to an infinite class, we retrieve the D most fully represented values from the knowledge base: that is, those that appear in the most assertions, and therefore about which the most is known. These are assumed to be the most interesting terms of that type, and therefore the ones most likely to be found by a web search.

For p_i we then have types T_{i,1} and T_{i,2}. The D best represented values would be (t_{i,1,1} … t_{i,1,D}) and (t_{i,2,1} … t_{i,2,D}). If neither of a predicate's arguments took values of a continuous type, there would be 2D*|P| queries generated:

(p_1 t_{1,1,1} VAR) … (p_1 t_{1,1,D} VAR)
(p_1 VAR t_{1,2,1}) … (p_1 VAR t_{1,2,D})
…
(p_|P| t_{|P|,1,1} VAR) … (p_|P| t_{|P|,1,D} VAR)
(p_|P| VAR t_{|P|,2,1}) … (p_|P| VAR t_{|P|,2,D})

For example, a set of the predicates foundingAgent and foundingDate, given a depth of 1, would produce three queries:

(foundingAgent AlQaida WHO)
(foundingAgent WHAT Terrorist-Salamat)
(foundingDate AlQaida WHEN)

The fourth permutation is not produced, because the argument constraint, Date, is of a continuous type.
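
The enumeration just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `RANKED_TERMS` table is a hypothetical stand-in for the KB lookup that returns each argument type's terms ordered by assertion count, `None` marks an argument of a continuous or infinite type, and the generic marker `VAR` replaces the named variables used in the example.

```python
from typing import Dict, List, Optional, Tuple

Query = Tuple[str, str, str]

# Hypothetical stand-in for KB lookups: per predicate, the terms of each
# argument type ranked by assertion count. None marks an argument whose
# type is continuous or infinite (for example, Date) and is never bound.
RANKED_TERMS: Dict[str, Tuple[Optional[List[str]], Optional[List[str]]]] = {
    "foundingAgent": (["AlQaida"], ["Terrorist-Salamat"]),
    "foundingDate": (["AlQaida"], None),
}


def generate_queries(ranked_terms, depth: int) -> List[Query]:
    """Bind each argument in turn to its top-`depth` terms, leaving the other open."""
    queries: List[Query] = []
    for predicate, (arg1_terms, arg2_terms) in ranked_terms.items():
        for term in (arg1_terms or [])[:depth]:
            queries.append((predicate, term, "VAR"))
        for term in (arg2_terms or [])[:depth]:
            queries.append((predicate, "VAR", term))
    return queries


queries = generate_queries(RANKED_TERMS, depth=1)
for query in queries:
    print("(%s %s %s)" % query)
```

With a depth of 1 this produces three queries, matching the example: the second argument of foundingDate is never bound because its type is continuous.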

This approach is not without problems. It relies heavily on the nature of the type constraints placed on predicates; for some predicates, such as foundingDate, this works well, while for others the argument constraints are too broad. For example, the predicate sellsProductType takes a constant of type somethingExisting in its second argument position, because almost anything can be sold. The proposed way to address this problem is with a predicate typicalArgIsa, which would connect predicates such as sellsProductType with the collections to which they typically refer (in this case, CommodityProduct).²

Another problem lies with the assumption that the members of a class about which Cyc knows the most are the most interesting ones, which is only sometimes correct. The best-described instances of the class Person, for example, tend to be the ontologists who work at Cycorp, rather than, for example, heads of state.

Future work will involve using Google in the selection of appropriate queries. Rather than using the top D most supported terms in the KB of type T, it should be possible to retrieve the hit count for up to several thousand members of T, and seek information about the D terms of T for which the most information is available.

¹ Examples: foundingAgent, foundingDate, sellsProductType, primeMinister, lifeExpectancy, awardWinners. Productive predicates were found via manual trial and error, from a set of domains selected to span a broad portion of the KB (terrorism, medical technology, conceptual works, global politics, family relationships, and sales).

3.2 Search

Generating Search Strings
Once the system has selected a query, it generates a series of search strings. The existing NL generation machinery is applied to 233 manually created special generation templates for the 134 predicates.³ Several factors motivated the construction of specialized search generation templates within the NL system. In addition to the fact that productive search strings are often unlike standard English, CycL generations tend to be somewhat stilted, since ontologists have preferred unambiguous expression of CycL meanings over naturalness. In addition, the KB generally contains one or two generation templates for any given predicate, while there may be many common ways of expressing the information that may be useful for searching.

For example, for the query (foundingAgent PalestineIslamicJihad X), the system generates the following set of search strings:

Palestine Islamic Jihad founder ____
Palestinian Islamic Jihad founder ____
PIJ founder ____
Palestine Islamic Jihad, founded by ____
Palestinian Islamic Jihad, founded by ____
PIJ, founded by ____

All possible searches were generated from the Cartesian product of the generation templates with all English renderings of the arguments. In this example, Cyc knows three names for Palestine Islamic Jihad, and has two templates for foundingAgent, resulting in six search strings. To simplify match detection, argument positions were only allowed at the beginning or end of the template string. In order to carry out the actual search, the "___" placeholders were stripped off, and the remaining string was submitted as a quoted string to Google.

² In most cases, these constraints will be learned from analysis of common usage in the existing corpus.
³ Slightly fewer than half of the predicates have only one associated search template. Many obvious templates exist but have not been represented; in future work, an analysis of the search strings will be performed to determine what types of strings produce good results. That information will be used for automatically generating search strings for other predicates.

Searching via Google
The search string is used with an interface to the Google API to retrieve a tuple consisting of the URL, the Google ranking, the match position and the web page text, which is then handed off to a parser that attempts to convert it into a meaningful CycL entity.
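
The Cartesian-product generation of search strings, and the stripping and quoting step before submission, might look roughly like the following. The `{arg1}` slot syntax and both function names are assumptions made for this sketch; the paper does not show the format of Cyc's generation templates, and no real search API is called.

```python
from itertools import product
from typing import List

PLACEHOLDER = "____"  # marks where the answer is expected to appear


def search_strings(templates: List[str], names: List[str]) -> List[str]:
    """Combine every template with every English name of the bound argument."""
    return [t.replace("{arg1}", n) for t, n in product(templates, names)]


def quoted_query(search_string: str) -> str:
    """Strip the answer placeholder and quote the rest for exact-phrase search."""
    return '"' + search_string.replace(PLACEHOLDER, "").strip() + '"'


# Hypothetical templates for foundingAgent; "{arg1}" is an illustrative slot.
templates = ["{arg1} founder ____", "{arg1}, founded by ____"]
names = ["Palestine Islamic Jihad", "Palestinian Islamic Jihad", "PIJ"]

strings = search_strings(templates, names)
for s in strings:
    print(s, "->", quoted_query(s))
```

Two templates and three names give the six search strings listed in the example above.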

3.3 Parsing into CycL
Once a document is retrieved, the answer must be found and interpreted. First, the system searches for the exact text of the query string, and returns the search string plus either the beginning or the end of the sentence, depending on the position of the "___" in the generated string. The resulting string is searched for phrases that can be interpreted as a CycL concept that meets the type constraints imposed by the predicate. For example, in the string "PIJ founder Bashir Musa Mohammed Nafi is still…", "Bashir Musa Mohammed Nafi" is recognized as a person, and therefore as a candidate for the arg2 position of foundingAgent.

For predicates that require strings (such as nameString, which relates an entity to its name), a named entity recognizer [Prager et al., 2000] is used to find a suitable candidate. In other cases, standard parsing techniques are used to try to find a useful interpretation, including looking up strings in the Cyc lexicon, interpreting them as noun compounds or dates, and compositional interpretation. For speed, compositional interpretation is only attempted for terms judged to be constituents by a probabilistic CFG parser [Charniak, 2001]. This has the effect of eliminating a few correct answers (mostly where the parser produces an incorrect syntactic parse), but also decreases the total time spent on analysis by at least 50%.
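
The first step above, locating the search string and keeping the adjoining part of its sentence, can be sketched as follows. The regular-expression sentence splitter is a deliberately naive assumption standing in for whatever segmentation the real system uses, and `extract_context` is a hypothetical name.

```python
import re
from typing import Optional

PLACEHOLDER = "____"


def extract_context(page_text: str, search_string: str) -> Optional[str]:
    """Return the matched phrase plus the rest (or the start) of its sentence."""
    answer_follows = search_string.rstrip().endswith(PLACEHOLDER)
    phrase = search_string.replace(PLACEHOLDER, "").strip()
    # Naive splitting on ., ! or ? followed by whitespace (an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", page_text):
        index = sentence.find(phrase)
        if index == -1:
            continue
        if answer_follows:
            return sentence[index:]  # phrase, then the end of the sentence
        return sentence[: index + len(phrase)]  # start of the sentence, then phrase
    return None


page = ("The group remains active. PIJ founder Bashir Musa Mohammed Nafi "
        "is still at large. More text follows.")
print(extract_context(page, "PIJ founder ____"))
```

The returned string is what would then be handed to the type-constrained phrase interpretation described above.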

Creating Candidate CycL Sentences
The result of parsing the matched section in the web page is a set of candidate CycL terms, usually constants such as Terrorist-Nafi. Substituting these terms into the original incomplete CycL query produces a set of candidate GAFs.

3.4 Checking Cyc KB Consistency
In principle, any useful fact added to the system should be neither disprovable nor trivially provable. For example, the sentence:
(foundingAgent PalestineIslamicJihad Terrorist-AlShikaki)
is not novel, because Cyc already knows this. Meanwhile,
(foundingAgent PalestineIslamicJihad AugusteRodin)
is novel, but trivially disprovable, as Cyc knows he died in 1917 (72 years before the PIJ was founded).

If the Cyc KB already contains knowledge that renders a new fact redundant or contradictory, that fact will be discarded. This is checked via inference; each new fact is treated as a query, and inference is performed to determine whether it can be proven false (inconsistent) or true (redundant). Cyc provides justifications for facts used to produce query results, as in Figure 2, and it is helpful for redundant facts to be marked as additionally confirmed via search-based learning. Previous work suggests that one-step inference is sufficient to identify duplicate information or disprove contradictions during typical knowledge acquisition tasks [Panton et al., 2002].

Figure 2: Justifications for the claim that a 2001 attack on Ankara meets the criteria for the query being run.
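
As a rough illustration of the filtering in section 3.4, the toy sketch below classifies candidate GAFs as redundant, inconsistent, or novel. It is not Cyc inference: set membership stands in for "provably true", and a single hand-written rule (a founder cannot have died before the founding year) stands in for "provably false". All table and function names are hypothetical; the founding year 1989 is simply 1917 plus the 72 years mentioned in the example.

```python
from typing import List, Tuple

Gaf = Tuple[str, str, str]

# Toy knowledge base (illustrative only).
KNOWN_GAFS = {("foundingAgent", "PalestineIslamicJihad", "Terrorist-AlShikaki")}
DEATH_YEAR = {"AugusteRodin": 1917}
FOUNDING_YEAR = {"PalestineIslamicJihad": 1989}  # 1917 + 72, per the example


def classify(gaf: Gaf) -> str:
    """Return 'redundant', 'inconsistent', or 'novel' for a candidate GAF."""
    if gaf in KNOWN_GAFS:
        return "redundant"
    predicate, group, agent = gaf
    if predicate == "foundingAgent":
        died = DEATH_YEAR.get(agent)
        founded = FOUNDING_YEAR.get(group)
        # One-step check: the agent died before the group was founded.
        if died is not None and founded is not None and died < founded:
            return "inconsistent"
    return "novel"


def filter_candidates(gafs: List[Gaf]) -> List[Gaf]:
    """Keep only candidates that are neither redundant nor inconsistent."""
    return [g for g in gafs if classify(g) == "novel"]


candidates = [
    ("foundingAgent", "PalestineIslamicJihad", "Terrorist-AlShikaki"),
    ("foundingAgent", "PalestineIslamicJihad", "AugusteRodin"),
    ("foundingAgent", "PalestineIslamicJihad", "Terrorist-Nafi"),
]
print(filter_candidates(candidates))
```

Only the Terrorist-Nafi candidate survives and would go on to Google verification.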

3.5 Google Verification
In order to guard against parser errors and excessively general search terms (such as ambiguous acronyms), a second Google search is performed, in order to determine whether search strings generated from the new GAF will produce results. Search strings are generated from the candidate GAF that was learned, but any string containing an acronym or abbreviation (e.g., "PIJ" for "Palestinian Islamic Jihad") is supplemented with the disambiguation term: the least common word (based on Google hit counts) of the expanded acronym. In this case, the string "Palestine" is added as a term, since it is the least common word in the set 'Palestine,' 'Palestinian,' 'Islamic,' and 'Jihad.' The resulting verification search string is:
"PIJ founder Bashir Nafi" + "Palestine"
Any fact for which this verification step returns no results is considered unverified, and will not be presented to a reviewer.
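
The choice of disambiguation term can be sketched as below. The hit counts in `HITS` are invented placeholders chosen only so that "Palestine" comes out least common, as in the example; a real run would obtain them from the search engine. The function names are hypothetical.

```python
import re
from typing import Callable, List

# Invented relative hit counts (placeholders, not real search data).
HITS = {"Palestine": 1, "Palestinian": 2, "Islamic": 3, "Jihad": 4}


def disambiguation_term(expansions: List[str], hit_count: Callable[[str], int]) -> str:
    """Least common word, by hit count, among the words of the expanded names."""
    words = {word for name in expansions for word in name.split()}
    return min(sorted(words), key=hit_count)  # sorted() makes ties deterministic


def verification_query(search_string: str, acronym: str,
                       expansions: List[str],
                       hit_count: Callable[[str], int]) -> str:
    """Quote the string; add a disambiguation term if the acronym occurs in it."""
    quoted = '"%s"' % search_string
    if not re.search(r"\b%s\b" % re.escape(acronym), search_string):
        return quoted
    return '%s + "%s"' % (quoted, disambiguation_term(expansions, hit_count))


expansions = ["Palestine Islamic Jihad", "Palestinian Islamic Jihad"]
query = verification_query("PIJ founder Bashir Nafi", "PIJ", expansions, HITS.__getitem__)
print(query)
```

Strings that do not contain the acronym are quoted and submitted unchanged.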

3.6 Review and Assertion
In the final step, learned sentences are reviewed by a human curator, and, if correct, asserted into the Cyc KB. Currently, suggested sentences are presented to the reviewer in no particular order; in future, sorting methods will be implemented and tested. The most straightforward approaches involve making greater use of information already retrieved from Google: since information about the searches underlying a candidate sentence is stored in the KB, it should be possible and productive to give priority to sentences that are supported by a larger total number of documents. In effect, the numerical values returned during the verification step could be used to sort the most widely supported sentences upwards in the review process. Another possibility, if several contradictory facts are found, is giving review priority to those found in documents with the highest Google ranking.
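
One way the two proposed orderings could be combined is sketched below: sort by number of supporting documents, breaking ties by the best Google ranking of a source page. This combination is an assumption of the sketch, since the paper describes the sorting only as future work, and the candidate data are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    gaf: str
    supporting_documents: int  # documents returned by the verification searches
    best_rank: int             # best Google ranking among source pages (1 = top)


def review_order(candidates: List[Candidate]) -> List[Candidate]:
    """Most widely supported first; ties go to the better-ranked source."""
    return sorted(candidates, key=lambda c: (-c.supporting_documents, c.best_rank))


# Invented example values.
pending = [
    Candidate("(foundingDate GroupA (YearFn 1912))", 12, 3),
    Candidate("(foundingDate GroupA (YearFn 1913))", 2, 1),
    Candidate("(foundingAgent GroupB PersonC)", 12, 1),
]
ordered = review_order(pending)
for candidate in ordered:
    print(candidate.supporting_documents, candidate.best_rank, candidate.gaf)
```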

4 Results
Statistics were gathered for a case in which 134 predicates in P were used, and D was set to 20.⁴ The majority of the searches expended, about 80%, were performed in the verification phase rather than the initial search phase. The results were as follows:

Queries: 348
Searches expended: 4290 (817 initial, 3477 verification)
GAFs found: 1016
  … and rejected due to KB inconsistency: 4
  … and already known to the KB: 384
  … and rejected by Google verification: 566
Novel GAFs found and verified: 61

⁴ It takes between four and five hours to exhaust an allotment of 3,000 searches per day through the Google API.

A human reviewer then went through the verified GAFs, and a sample of 53 of the unverified GAFs, and determined their actual correctness rate. The results were as follows:

                     Verified    Unverified
True (correct)       32          8**
False (incorrect)    29*         45

Total novel facts: 114
Novel, correct facts discovered: 77
Incorrect facts discovered: 37
Facts categorized correctly: 68%
Facts categorized incorrectly: 32%
  … * false positives (false but verified): 25%
  … ** false negatives (true but unverified): 7%

Examples of these result types:

Query: (#$hasBeliefSystems #$Iran X)
Search string: "Iran adheres to"
Candidate GAF: (#$hasBeliefSystems #$Iran #$Islam)
Verification search strings:
"Islamic Republic of Iran adheres to Islam"
"Iran adheres to Islam"
"Iran believes in Islam" (found)
"Islamic Republic of Iran believes in Islam" (found)

Example GAFs already known to the KB:
(#$vestedInterest #$Iran #$Iraq)
(#$inhabitantTypes #$Lebanon #$EthnicGroupOfKurds)

Example GAFs rejected due to KB inconsistency:
(#$northOf #$Iran #$Iran)
(#$geopoliticalSubdivision #$Iraq #$Iran)

Correct, verified GAF:
(foundingDate AfricanNationalCongress (YearFn 1912))
* Incorrect but verified GAF:
(foundingDate JewishDefenseLeague (DecadeFn 198))
Incorrect, rejected GAF:
(objectFoundInLocation KuKluxKlan GillianAnderson)
** Correct but rejected GAF:
(foundingDate KarenNationalUnion (MonthFn April (YearFn 1947)))

The verification step produces comparatively few false negatives (in which a true fact is incorrectly classified as false); in this run, 80% of the novel, correct facts retrieved were correctly identified as such. Given this, it is reasonable to reject all unverifiable sentences, especially given the wealth of possible queries and the size and breadth of the corpora available. Only 61% of the incorrect facts retrieved were identified, suggesting that substantial work in decreasing the occurrence of false positives will be necessary before the need for human review is eliminated; this is unsurprising, as the Internet contains large amounts of unstructured, unchecked information.

Slightly over a third of the GAFs discovered were facts that were already known to the KB, and presumably correct to a baseline level (i.e., the correctness level achieved by human ontologists); the total number of correct facts discovered was therefore 425, 42% of the total. Verification reduces the number of novel sentences that must undergo human review from 1016 to 61, and the human review process, which takes place entirely in English, is quick and straightforward. An intermediate step towards full automation would be to identify classes of sentences that can be asserted without human review.
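
The percentages quoted in this section can be recomputed from the four cells of the review table. The sketch below assumes, as the figures suggest, that the categorization rates are taken over all 114 reviewed facts, while the 80% and 61% figures are taken over the true and the false facts respectively; the variable names are illustrative.

```python
# Cells of the review table: verification outcome versus actual correctness.
verified_true, verified_false = 32, 29
unverified_true, unverified_false = 8, 45

total = verified_true + verified_false + unverified_true + unverified_false


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


categorized_correctly = pct(verified_true + unverified_false, total)
false_positives = pct(verified_false, total)
false_negatives = pct(unverified_true, total)
true_facts_identified = pct(verified_true, verified_true + unverified_true)
false_facts_identified = pct(unverified_false, verified_false + unverified_false)

print(total, categorized_correctly, false_positives, false_negatives,
      true_facts_identified, false_facts_identified)
```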

5 Conclusions
While great strides have been made in machine learning in the last few decades, automatically gathering useful, consistent knowledge in a machine-usable form is still a relatively unexplored research area. The original promise of the Cyc project, to provide a basis of real-world knowledge sufficient to support the sort of learning from language of which humans are capable, has not yet been fulfilled. In that time, information has become enormously more accessible, due in no small part to the widespread popularity of the Web and to effective indexing systems such as Google. Making use of that repository requires a store of real-world knowledge and some facility for natural language parsing and generation.

These results, while extremely preliminary, are encouraging. In particular, using Cyc as a basis for learning is effective, both in guiding the learning process and in representing and using the results. Pre-existing knowledge in the KB supports the construction of meaningful queries and provides a framework into which learned knowledge can be asserted and reasoned over. Comparatively shallow natural language parsing combined with the type constraint and relation knowledge in the Cyc system allows the retrieval, verification, and review of unconstrained facts at a higher rate than that achieved by human knowledge representation experts working unassisted. Perhaps more importantly, the kind of knowledge retrieved is exactly the instance-level knowledge that should not require human experts; it should instead be obtained, maintained, and reasoned over by tools that need and use that knowledge. Involving Google in every stage of the learning process allows us to exploit both Cyc's knowledge and the knowledge on the web in an extremely natural way.

The work being done here is immediately useful as a tool that makes human knowledge entry faster, easier and more effective, but it also provides a basis for analysis of what information can be learned effectively without human interaction. Thus, over time, we hope to provide Cyc with a mechanism to truly acquire knowledge by learning.

Acknowledgments
This research was partially sponsored by ARDA's AQUAINT program. Additionally, we thank Google for allowing access to their API for research such as this.

References
[Belasco et al., 2004] A. Belasco, J. Curtis, R. C. Kahlert, C. Klein, C. Mayans, R. Reagan. Representing Knowledge Gaps Effectively. In Proc. of the 5th International Conference on Practical Aspects of Knowledge Management, Vienna, Austria, pp 159-164. Dec 2004.
[Brin and Page, 1998] Sergey Brin and Larry Page. Anatomy of a Large-scale Hypertextual Search Engine. In Proc. of the 7th International World Wide Web Conference, pp 107-117, Brisbane, Australia, Apr 1998.
[Brown, 1996] R. D. Brown. Example-Based Machine Translation in the Pangloss System. In Proc. of the 16th International Conference on Computational Linguistics, pp 169-174. Copenhagen, Denmark, August 5-9, 1996.
[Charniak, 2001] E. Charniak. A Maximum-Entropy-Inspired Parser. In Proc. of the 1st conference on North American chapter of the Association for Computational Linguistics, pp 132-139. Seattle, WA, 2000. Morgan Kaufmann Publishers.
[Etzioni et al., 2004] O. Etzioni, M. Cafarella, D. Downey, A. Popescu, T. Shaked, S. Soderland, D. Weld, A. Yates. Web-scale Information Extraction in KnowItAll. In Proc. of the 13th international conference on World Wide Web, pp 100-110, New York, NY, 2004.
[Ghani, 2000] R. Ghani, R. Jones, D. Mladenic, K. Nigam, S. Slattery. Data Mining on Symbolic Knowledge Extracted from the Web. In Proc. of the 6th International Conference on Knowledge Discovery and Data Mining Workshop on Text Mining, pp 29-36, Boston, MA, Aug 2000.
[Guha, 1991] R. V. Guha. Contexts: A Formalization and Some Applications. PhD thesis, Stanford University, STAN-CS-91-1399-Thesis, 1991.
[Kwok et al., 2001] C. Kwok, O. Etzioni, D. Weld. Scaling Question Answering to the Web. In ACM Transactions on Information Systems, Vol 19, Issue 3, pp 242-262. 2001.
[Lenat, 1976] D. B. Lenat. AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search. Ph.D. Dissertation, Stanford University, STAN-CS-76-570, 1976.
[Lenat et al., 1983] D. B. Lenat, A. Borning, D. McDonald, C. Taylor, S. Weyer. Knoesphere: Building Expert Systems with Encyclopedic Knowledge. In Proc. of the 8th International Joint Conference on Artificial Intelligence, Vol 1, pp 167-169, Karlsruhe, Germany, August 1983.
[Lenat, 1995] D. B. Lenat. Cyc: a Large-Scale Investment in Knowledge Infrastructure. In Communications of the ACM, Vol 38, Issue 11, pp 33-38. Nov 1995.
[Lenat, 1998] D. B. Lenat. The Dimensions of Context-Space, from http://www.cyc.com/doc/context-space.pdf.
[Panton et al., 2002] K. Panton, P. Miraglia, N. Salay, R. C. Kahlert, D. Baxter, R. Reagan. Knowledge Formation and Dialogue Using the KRAKEN Toolset. In Proc. of the 18th National Conference on Artificial Intelligence, pp 900-905, Edmonton, Canada, 2002.
[Prager et al., 2000] J. Prager, E. Brown, A. Coden, D. Radev. Question Answering by Predictive Annotation. In Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 184-191. Athens, Greece, 2000.
[Thrun et al., 1998] S. Thrun, C. Faloutsos, T. Mitchell, L. Wasserman. Automated Learning and Discovery: State-Of-The-Art and Research Topics in a Rapidly Growing Field, tech. report CMU-CALD-98-100, Computer Science Department, Carnegie Mellon University, 1998.
[Witbrock et al., 2003] M. Witbrock, D. Baxter, J. Curtis, D. Schneider, R. C. Kahlert, P. Miraglia, P. Wagner, K. Panton, G. Matthews, A. Vizedom. An Interactive Dialogue System for Knowledge Acquisition in Cyc. In Proc. of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico, 2003.
[Witbrock et al., 2004] M. Witbrock, K. Panton, S. Reed, D. Schneider, B. Aldag, M. Reimers and S. Bertolo. Automated OWL Annotation Assisted by a Large Knowledge Base. In Workshop Notes of the 2004 Workshop on Knowledge Markup and Semantic Annotation at the 3rd International Semantic Web Conference, Hiroshima, Japan, pp 71-80. Nov 2004.
[Witbrock et al., 2005] M. Witbrock, C. Matuszek, A. Brusseau, R. C. Kahlert, C. B. Fraser, D. Lenat. Knowledge Begets Knowledge: Steps towards Assisted Knowledge Acquisition in Cyc. In Proc. of the AAAI 2005 Spring Symposium on Knowledge Collection from Volunteer Contributors, Stanford, CA, March 2005.