Vijay Raj on 6 May 2010
Transcript of Cyc-Wikipedia Mapping
Knowledge base

Individuals
Some "thing".. Any "thing"
Can have parts
Cannot have instances
Mountain Standard Time, International Space Station, United States Army, Alice in Wonderland, X-ray

Collections
Always, the first question? ... What IS that?
X-ray, Gilgamesh, Time Zone, Book
Collections give perspective
Enables educated guess
Share features or properties

(#$isa ?IND ?COL) - 1.2 million
(#$genls ?COL1 ?COL2) - 120 thousand
#$genls is transitive and reflexive

Microtheory
Time/Space Context Construct
Any assertion is VALID ONLY IN a Microtheory
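A minimal sketch of the ideas so far, not Cyc's actual implementation: assertions are stored per microtheory, #$genls is treated as reflexive and transitive, and #$isa is answered through the #$genls closure. The class `TinyKB`, the microtheory names `ExampleMt` and `OtherMt`, and the collection `Agent` are hypothetical names chosen for illustration; inheritance between microtheories is not modeled.

```python
from collections import defaultdict


class TinyKB:
    """Toy store of isa/genls assertions, grouped by microtheory (illustrative only)."""

    def __init__(self):
        # microtheory name -> set of (predicate, arg1, arg2) triples
        self.assertions = defaultdict(set)

    def assert_in(self, mt, pred, arg1, arg2):
        """Record an assertion that is valid only in microtheory `mt`."""
        self.assertions[mt].add((pred, arg1, arg2))

    def genls_closure(self, mt, col):
        """All collections that `col` specializes within `mt`."""
        seen = {col}  # reflexive: every collection is a genls of itself
        stack = [col]
        while stack:  # transitive: follow genls links until nothing new appears
            current = stack.pop()
            for pred, a, b in self.assertions[mt]:
                if pred == "genls" and a == current and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    def isa(self, mt, ind, col):
        """True if `ind` is an instance of `col` in `mt`, directly or via genls."""
        for pred, a, b in self.assertions[mt]:
            if pred == "isa" and a == ind and col in self.genls_closure(mt, b):
                return True
        return False


kb = TinyKB()
kb.assert_in("ExampleMt", "isa", "Cyc", "SoftwareAgent")
kb.assert_in("ExampleMt", "genls", "SoftwareAgent", "Agent")  # hypothetical genls link

print(kb.isa("ExampleMt", "Cyc", "Agent"))  # found through the genls closure
print(kb.isa("OtherMt", "Cyc", "Agent"))    # the assertion is not visible in this microtheory
```

The same query gives different answers in the two microtheories, which is the point of scoping every assertion to a context.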
Why is it necessary?
Assertion: (#$isa #$Cyc #$SoftwareAgent)
Indexing efficiency by knowledge isolation
Context assumption - BoulderColorado Context
Consistency enforcement

Predicates and Functions
#$isa, #$genls are predicates
Predicates relate two or more things
Predicates specify property of things
Predicates arguments are constrained
Functions create new individuals

Encyclopedia
Knowledge base?

Categories
The Good
A lot of "is a": Rivers by country, Basketball players, Phone Companies
A lot of "genls": Geography -> Water bodies -> Streams -> Rivers
A lot of predicates: "Yale Alumni", "Grammy Award winners", "1947 Births"
Not always "is a": the Rivers category has "Drainage", "River surfing", "Dams". Mostly "is about"..
Categories used as Microtheories: "US Elections in 2010", "History of Spain"
"Genls" not consistent, so Wiki -> Ontology can't be automated
A category's main article and the category itself have the same name
Global ID lacking; article titles keep changing
"The Explosive, The BOB, The Chef": overly general synonyms
Disambiguation in title, (band), (politician), vs depth (/artist/band) in Freebase types
Search space always global, only separated by language
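The title-level disambiguation mentioned above, such as (band) or (politician), can be peeled off with a small helper. This is a sketch under the assumption that the qualifier is always a single trailing parenthesized phrase; `split_title` is a hypothetical name, not part of any Wikipedia tooling.

```python
import re

# A trailing parenthesized qualifier, e.g. "Orion (constellation)".
_QUALIFIER = re.compile(r"^(?P<base>.+?)\s*\((?P<qualifier>[^()]+)\)$")


def split_title(title):
    """Split an article title into (base name, qualifier or None)."""
    cleaned = title.strip()
    match = _QUALIFIER.match(cleaned)
    if match:
        return match.group("base"), match.group("qualifier")
    return cleaned, None


print(split_title("Orion (constellation)"))
print(split_title("Orion"))
```

The qualifier is a flat label, so unlike a Freebase-style path such as /artist/band it carries no depth; any hierarchy has to come from elsewhere.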
Article Title
The Good
Mostly equivalent to Individuals, sometimes Collections
Redirects, "article interlinks" are mostly equivalent to synonyms
Mostly about a single topic, event, abstract concept (Bird_Flight)
Each article is an "instance of" a Category

Infoboxes/Tables
The Good
Huge amount of domain-specific info: tournaments won, scientific classification
New predicate harvesting - collective consensus on what's important
The Bad
No re-use/hierarchy (genlPred): Snooker and Tennis players could share a basic bio
No consistency: Snooker uses "Nationality", Tennis uses "Country"
Not easy to parse: too many formats for the same field, no array (tributaries)
Tables are even less structured, but hold even more information
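One way to cope with the inconsistencies listed above is to normalize field names through an alias table and to split multi-valued fields by hand. The sketch below assumes a hand-built, hypothetical `FIELD_ALIASES` table and a few common separators; the field values are made-up sample data, and real infoboxes use many more formats than this handles.

```python
import re

# Hypothetical alias table: raw infobox field name -> canonical field name.
FIELD_ALIASES = {
    "nationality": "nationality",
    "country": "nationality",
}

# Infoboxes have no array type, so lists arrive as one string with ad hoc separators.
_SEPARATORS = re.compile(r",|;|<br\s*/?>")


def normalize_infobox(fields):
    """Map raw field names to canonical ones and split list-like values."""
    normalized = {}
    for raw_name, raw_value in fields.items():
        key = raw_name.strip().lower()
        name = FIELD_ALIASES.get(key, key)
        values = [v.strip() for v in _SEPARATORS.split(raw_value) if v.strip()]
        normalized.setdefault(name, []).extend(values)
    return normalized


snooker_player = {"Nationality": "English"}
tennis_player = {"Country": "Switzerland"}
river = {"tributaries": "Left Fork, Right Fork<br />Mill Creek"}

print(normalize_infobox(snooker_player))
print(normalize_infobox(tennis_player))
print(normalize_infobox(river))
```

Both player records end up under the same canonical field, which is the kind of re-use a genlPred-style hierarchy would otherwise provide.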
Article Text
The Good
Can't say enough about it. Holds millions of man-hours of work. Active community
Tagging [[ | ]] makes disambiguation easy
Millions of assertions, breadth/depth, an ontologist's dream
References make getting redundancy easier
The Bad
It's still geared towards humans
Predicates are not available, not much extra effort to tag

#$Orion-Constellation
Basic Info
What is it?
(#$isa #$O-C #$Constellation)
English name for it?
(#$nameString #$O-C "Orion")
(#$nameString #$O-C "the Hunter")
Advanced Info
(#$celestialSubRegion #$O-C #$OrionsBelt)
(#$inRegion #$Rigel-Star #$O-C)
(#$inRegion #$Betelgeuse-Star #$O-C)

Term Cloud
Orion
Constellation
Celestial Sub Region
Orion's Belt
Three good terms are enough for a match (Google: only Orion vs Orion + Constellation/Hunter; Bank vs Bank + River)

Mapping
Create search set
Proximity metric

Orion (constellation)
Basic Info
Redirects: The Hunter
Categories: Orion Constellation
Article Info
"Orion, often referred to as The Hunter, is a prominent constellation located"
"A line from Rigel through Betelgeuse points to Castor and Pollux"
"the three stars in Orion's Belt"

Article Title
Wikipedia Redirects
Cyc #$nameString
Article Interlinks [[ Art | Syn ]]

163K Cyc Individuals/Collections
1.7M Wikipedia Articles
80K Useful Cyc Constants
45K Article/Cyc name match (sometimes not accurate)
15K Non-trivial matches
Rest: No match

Examples
RescuingSomeone<cycSep>Rescue<cycSep>999
Reference/Research
My Links (Expect inaccuracies)
http://www.cs.waikato.ac.nz/~olena/publications/Medelyan_Legg_Wikiai08.pdf

Term Cloud weight - number of times any term in the term cloud is found in the article
Term weight vector match - vector of the number of times each term is found in the article
Wikipedia synonym weight - in the article set, the article with the most interlinks whose synonym equals the Cyc #$nameString
Wiki synonyms article set, Wiki "Cyc context" article set, and "Hyperlink interconnection metric".
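The first two metrics above can be sketched with plain substring counting. The term list is assembled from the Orion assertions shown earlier and the article text is the three quoted snippets; the function names are hypothetical, and case-insensitive substring counting is an assumption about how "found in article" is measured.

```python
def term_weight_vector(term_cloud, article_text):
    """Number of times each term occurs in the article (case-insensitive)."""
    text = article_text.lower()
    return [text.count(term.lower()) for term in term_cloud]


def term_cloud_weight(term_cloud, article_text):
    """Number of times any term of the cloud occurs in the article."""
    return sum(term_weight_vector(term_cloud, article_text))


def matched_terms(term_cloud, article_text):
    """How many distinct terms of the cloud occur at least once."""
    return sum(1 for count in term_weight_vector(term_cloud, article_text) if count > 0)


term_cloud = ["Orion", "the Hunter", "Constellation", "Orion's Belt", "Rigel", "Betelgeuse"]
article_text = " ".join([
    "Orion, often referred to as The Hunter, is a prominent constellation located",
    "A line from Rigel through Betelgeuse points to Castor and Pollux",
    "the three stars in Orion's Belt",
])

print(term_weight_vector(term_cloud, article_text))  # [2, 1, 1, 1, 1, 1]
print(term_cloud_weight(term_cloud, article_text))   # 7
print(matched_terms(term_cloud, article_text) >= 3)  # True: three good terms suffice
```

"Orion" is counted twice because it also occurs inside "Orion's Belt"; a stricter version would tokenize the text before counting.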
[Diagram labels: Exact Match, Partial Match (Article Title, Wikipedia Redirects); Exact Match, Exact Match (Cyc Genls/Siblings, Article Title/Redirects)]
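The "Create search set" step can be sketched as collecting every article whose title, redirect, or interlink label exactly matches a Cyc #$nameString. This is one possible reading of the slides, not the presenter's code: `build_search_set` and `interlink_synonyms` are hypothetical names, the sample titles and wikitext are illustrative, and only exact matching is shown.

```python
import re

# [[Article|Synonym]] or [[Article]] links in wikitext.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def interlink_synonyms(wikitext):
    """Yield (article, synonym) pairs; a bare [[Article]] is its own synonym."""
    for target, label in WIKILINK.findall(wikitext):
        yield target.strip(), (label or target).strip()


def build_search_set(name_strings, titles, redirects, wikitext):
    """Candidate articles whose title, redirect, or link label equals a name string."""
    wanted = {name.lower() for name in name_strings}
    candidates = set()
    for title in titles:
        base = re.sub(r"\s*\([^()]*\)$", "", title)  # drop "(constellation)" etc.
        if title.lower() in wanted or base.lower() in wanted:
            candidates.add(title)
    for redirect, target in redirects.items():
        if redirect.lower() in wanted:
            candidates.add(target)
    for article, synonym in interlink_synonyms(wikitext):
        if synonym.lower() in wanted:
            candidates.add(article)
    return candidates


name_strings = ["Orion", "the Hunter"]  # from the #$nameString assertions
titles = ["Orion (constellation)", "Orion (mythology)", "Rigel"]
redirects = {"The Hunter": "Orion (constellation)"}
wikitext = "the stars of [[Orion (constellation)|Orion]] and [[Rigel]]"

print(sorted(build_search_set(name_strings, titles, redirects, wikitext)))
```

The search set still holds more than one candidate for "Orion", which is why a proximity metric is needed to pick among them.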