
Semantic Web

Making it easier for machines
by

Nikhil Jadhav

on 10 November 2012


Transcript of Semantic Web

Making the Web more readable for machines

Semantic Web
Web 2.0
Semantic Web
Structures and Ontologies
NLP & Semantic Web
Resource Description Framework (RDF)

Type something into a search engine: it just matches the keywords in your query.
A lot of work! ~1 million hits on an average query.

Limitation of HTML: it is used to represent information, NOT DATA.

Semantic Web Road Map

Our Roadmap

OWL (WOL): Web Ontology Language
"Why not be inconsistent in at least one aspect of a language which is all about consistency?"
(Guus Schreiber, Why OWL and not WOL?)

DESCRIPTION LOGIC

Representation and Languages

RDF defines data in the form of <Subject, Predicate, Object> triples

Graph Database

Syntax
<rdf:Description rdf:about="subject">
<predicate rdf:resource="object" />
<predicate>literal value</predicate>
</rdf:Description>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:feature="http://www.linkeddatatools.com/clothing-features#">
<rdf:Description rdf:about="http://www.linkeddatatools.com/clothes#t-shirt">
<feature:size>12</feature:size>
<feature:color rdf:resource="http://www.linkeddatatools.com/colors#white"/>
</rdf:Description>
</rdf:RDF>

Example

LEMON
POWLA

Drawback
RDF, whilst the foundation of defining data structures for the Semantic Web, does not in itself describe the semantics, or meaning, behind the data. For that we need a schema or an ontology.

Lexicon model for ontologies (lemon)
- Keeps lexicon models and ontologies separate
- Linking them can aid many NLP applications, such as question answering (Q/A) and machine translation

Interoperability of Linguistic Corpora

Reification
- Involves the representation of factual assertions that are referred to by other assertions
- Used to compare logical assertions from different witnesses in order to determine their credibility

Motivation
- Same text, different annotations: a processing stage may require different annotations of the same text
- A common representation provides access to all the linguistic information conveyed in the annotations (POS tags, parse trees, etc.)

Conceptual Interoperability
- Heterogeneous annotation schemes
- Terminological reference repository: provides an interlingua that allows mapping from scheme A to scheme B
- GOLD

Structural Interoperability
- Use RDF to represent all the annotations of the corpus in an interoperable way, integrate their information without restrictions, and query the information
- Use OWL/DL to specify and verify formal constraints on the correct representation of linguistic corpora in RDF

Crux of the Problem
The machine didn't understand what the user actually wants; even if it did, it didn't understand how to get it.

Structuring Required

BABELNET
A multilingual lexicalized semantic network, automatically created by linking the largest multilingual Web encyclopedia (Wikipedia) to the most popular computational lexicon of the English language (WordNet).

MOTIVATION
We have:
- Multilingual Natural Language Processing
- Semantic relatedness
- Multilingual word sense disambiguation

INDO WORDNET
A linked lexical knowledge base of wordnets of 18 scheduled languages of India: Assamese, Bangla, Bodo, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Tamil, Telugu and Urdu.
The IndoWordNet project started with the creation of the Hindi WordNet by the Natural Language Processing group at the Center for Indian Language Technology (CFILT) in the Computer Science and Engineering Department at IIT Bombay.
Publicly browsable at http://www.cfilt.iitb.ac.in/indowordnet
Available under the GNU license
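Structural interoperability means representing every annotation layer of a corpus as <Subject, Predicate, Object> triples. Once that is done, the layers can be merged and queried together. The following is only a minimal sketch. It uses plain Python tuples instead of a real RDF library. The two annotation layers are hypothetical (a POS tagger and a named-entity annotator), and the `ex:` identifiers are invented for illustration.

```python
from typing import Iterable, Optional

# An RDF-style statement: (subject, predicate, object).
Triple = tuple[str, str, str]

# Layer 1: output of a hypothetical POS tagger, one triple per fact.
# The ex: identifiers are invented for illustration, not a real vocabulary.
pos_layer: set[Triple] = {
    ("ex:tok1", "ex:form", "Sachin"),
    ("ex:tok1", "ex:pos", "NNP"),
    ("ex:tok2", "ex:form", "plays"),
    ("ex:tok2", "ex:pos", "VBZ"),
}

# Layer 2: output of a hypothetical named-entity annotator over the same tokens.
ner_layer: set[Triple] = {
    ("ex:tok1", "ex:entityType", "Person"),
}

# Both layers share one data model, so integrating them is a set union.
graph: set[Triple] = pos_layer | ner_layer


def match(g: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in g
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# A query spanning both layers: surface forms of proper-noun tokens
# that the other annotator labelled as Person.
persons = {s for (s, _, _) in match(graph, p="ex:entityType", o="Person")}
proper_nouns = {s for (s, _, _) in match(graph, p="ex:pos", o="NNP")}
both = persons & proper_nouns
forms = sorted(o for (s, _, o) in match(graph, p="ex:form") if s in both)
print(forms)  # ['Sachin']
```

The union step is where the shared triple model pays off. Neither layer has to know the other's annotation scheme; only the token identifiers must agree.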
Cyc (pronounced like "syke")

To codify, in machine-usable form, millions of pieces of knowledge that compose human common sense.

To enable AI applications to perform human-like reasoning.

MOTIVATION

The Knowledge Base (KB) contains over one million human-defined assertions, rules or common-sense ideas. The original knowledge base is proprietary, but a smaller version of the knowledge base, intended to establish a common vocabulary for automatic reasoning, was released as OpenCyc under an open source (Apache) license. More recently, Cyc has been made available to AI researchers under a research-purposes license as ResearchCyc.

Example Scenario: Online Shopping

Why is NLP needed for the Semantic Web?
1. Ontology Query
2. Ontology Learning
3. Multilingual Ontology Mapping

KNOWLEDGE BASE
These are formulated in the language CycL, which is based on predicate calculus and has a syntax similar to that of the Lisp programming language.

CycL

At this point in the evolution of the Web, your best bet would be to look at different retailers' web pages, comparing prices and shipping times and rates. You could also look for a site that will compare prices and shipping options from several retailers all at once. Either way, you have to do most of the virtual legwork, then make your buying decision and place your order yourself.

Ontology Query
- The Semantic Web has a quite complex structure and a strict logical form
- Users cannot be expected to write their queries in logical form
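An ontology query is answered by finding bindings for the query's variables that satisfy every conjunct. For example, the question "Who is the wife of Rama?" corresponds to MarriedTo(X, Rama) and typeOf(X, Women). The sketch below is a minimal illustration, not a real query engine. It assumes a tiny hypothetical fact base: `ABCD` stands in for the slide's placeholder answer <ABCD>, and `WXYZ` is an invented decoy. The formal query is written by hand here, whereas producing it automatically from the English question is exactly the NLP problem described above.

```python
from typing import Optional

Fact = tuple[str, str, str]      # (predicate, arg1, arg2)
Pattern = tuple[str, str, str]   # terms starting with '?' are variables

# Hypothetical fact base: ABCD is the slide's placeholder answer, WXYZ a decoy.
FACTS: list[Fact] = [
    ("MarriedTo", "ABCD", "Rama"),
    ("typeOf", "ABCD", "Women"),
    ("typeOf", "Rama", "Men"),
    ("typeOf", "WXYZ", "Women"),
]


def unify(pattern: Pattern, fact: Fact,
          binding: dict[str, str]) -> Optional[dict[str, str]]:
    """Extend `binding` so that `pattern` matches `fact`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            # A variable must agree with any value it is already bound to.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def answer(query: list[Pattern]) -> list[dict[str, str]]:
    """Evaluate a conjunction of patterns against FACTS (naive nested-loop join)."""
    bindings: list[dict[str, str]] = [{}]
    for pattern in query:
        bindings = [b2 for b in bindings for f in FACTS
                    if (b2 := unify(pattern, f, b)) is not None]
    return bindings


# Hand-written formal query for "Who is the wife of Rama?"
query = [("MarriedTo", "?X", "Rama"), ("typeOf", "?X", "Women")]
for b in answer(query):
    print(f"Wife of Rama is <{b['?X']}>.")  # Wife of Rama is <ABCD>.
```

Each conjunct filters the bindings produced by the previous one. The decoy `WXYZ` is a Woman but is not married to Rama, so it is dropped at the join.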

NL Query: Who is the wife of Rama?
Logical Query:
MarriedTo(?, Rama) and typeOf(?, Women)

With the Semantic Web, you'd have another option. You could enter your preferences into a computerized agent, which would search the Web, find the best option for you, and place your order.

Visual Structured Information

Who is the wife of Rama?

MarriedTo(X, Rama) and typeOf(X, Women)

X = <ABCD>

The wife of Rama is <ABCD>.

NL to Formal Query
Semantic Web Agent
Presentation
Semantic Web Agent

Conclusion
- Without any doubt, the Semantic Web needs natural language technology
- To acquire knowledge from massive unstructured/semi-structured Web 2.0 content
- To understand users' natural-language queries
- Finally, to present answers to users
- The Semantic Web should improve NLP's performance in IE and WSD
- Almost all problems are open to research; there is a long journey ahead.

Ontology Learning
Unstructured sources
Involves NLP techniques, morphological and syntactic analysis, etc.

Semi-structured sources
Elicit an ontology from sources that have some predefined structure, such as XML Schema.

Structured data
Extract concepts and relations from knowledge contained in structured data, such as databases.

APPLICATIONS

First, manually maintaining and updating lexical knowledge resources is expensive and time-consuming.
Second, such resources are typically lexicographic, and thus contain mainly concepts and only a few named entities.
Third, resources for non-English languages often have much poorer coverage, since the construction effort must be repeated for every language of interest.
How to tackle these?

Terms
- Linguistic realizations of domain-specific concepts
- The basis of the ontology learning process

Term Extraction
- Run a Part-Of-Speech (POS) tagger over the domain corpus
- Identify possible terms by constructing patterns such as Adj-Noun, Noun-Noun, Adj-Noun-Noun, …
- Ignore names
- Keep only the terms relevant to the text, by applying statistical metrics

Taxonomy Extraction

1. With the use of WordNet
2. Co-occurrence analysis
3. Lexico-syntactic patterns

- Given two terms t1 and t2, check whether they stand in a hypernym relation with regard to WordNet

- Normalize the number of hypernym paths by dividing it by the number of senses of t1
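The two steps above can be sketched as a path count over a hypernym graph. This is a minimal illustration on a toy hierarchy invented for the purpose. It is not WordNet data; a real system would query WordNet itself. The hierarchy is shaped so that, as in the country/region example, 'country' has 5 senses and 4 hypernym paths reach 'region'. The sense labels such as `country#1` and the intermediate nodes are hypothetical.

```python
from functools import lru_cache

# Toy hypernym graph (hypothetical, NOT real WordNet data): node -> direct hypernyms.
HYPERNYMS: dict[str, list[str]] = {
    "country#1": ["state", "administrative_district"],
    "country#2": ["geographic_area"],
    "country#3": ["rural_area"],
    "country#4": ["people"],
    "country#5": ["field"],
    "state": ["region"],
    "administrative_district": ["region"],
    "geographic_area": ["region"],
    "rural_area": ["region"],
    "region": ["location"],
    "people": ["group"],
    "field": ["location"],
    "location": [],
    "group": [],
}

# Senses per term (again hypothetical).
SENSES: dict[str, list[str]] = {
    "country": [f"country#{i}" for i in range(1, 6)],
    "region": ["region"],
}


def count_paths(node: str, target: str) -> int:
    """Count distinct upward (hypernym) paths from node to target in the DAG."""
    @lru_cache(maxsize=None)
    def walk(n: str) -> int:
        if n == target:
            return 1
        return sum(walk(parent) for parent in HYPERNYMS.get(n, []))
    return walk(node)


def isa_score(t1: str, t2: str) -> float:
    """Hypernym paths from any sense of t1 to any sense of t2,
    normalized by the number of senses of t1 (0 means no hypernym relation)."""
    senses1 = SENSES[t1]
    paths = sum(count_paths(s1, s2) for s1 in senses1 for s2 in SENSES[t2])
    return paths / len(senses1)


print(isa_score("country", "region"))  # 4 paths / 5 senses -> 0.8
```

Dividing by the number of senses keeps highly polysemous terms from scoring high just because they have many senses.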

Example: there are 4 different hypernym paths between the synsets 'country' and 'region'
- 'country' has 5 senses

value of isa(country, region) = 4 / 5 = 0.8

Taxonomy Extraction with WordNet

Why is the Semantic Web needed for NLP?

Word Sense Disambiguation
- Locate the least-weighted path from one ontology concept to another

Machine Translation

The BIG PICTURE
It collects (a) from WordNet, all available word senses (as concepts) and all the semantic pointers between synsets (as relations); (b) from Wikipedia, all encyclopedic entries (i.e. pages, as concepts) and semantically unspecified relations from hyperlinked text.

In order to provide a unified resource, we merge the intersection of these two knowledge sources (i.e. their concepts in common) by establishing a mapping between Wikipedia pages and WordNet senses.

Duplicate concepts?
Multilinguality?

The lexical realizations of the available concepts in different languages are collected by using (a) the human-generated translations provided in Wikipedia (the so-called inter-language links), as well as (b) a machine translation system to translate occurrences of the concepts within sense-tagged corpora, namely SemCor.

Example
1. Sachin's play is awesome.
2. Shakespeare's play is awesome.

Sachin -> Cricketer -> Play_1
Shakespeare -> Playwright -> Play_2

Motivation
Conceptual models are required in:
- Artificial Intelligence
- Database Design
- Software Engineering
- Information Integration

Fundamental Ontology

A conceptual model is populated by:
- Terms: disease, illness, hospital
- Synonyms: {disease, illness}
- Concepts: disease
- Taxonomy: is_a(doctor, person)
- Relations: cure(doctor, disease)
- Axioms & Rules: ∀x,y Suffering(x,y) → ill(x)

Individuals
- related by binary relationships (called roles & features)
- grouped into classes (concepts)

So we need the ability to describe concepts, relationships and individuals.
FOL cannot be used, as it is not decidable.
Description logic is characterized by a set of constructors that allow complex concepts and roles to be built from atomic ones:
- concept: a class / set of objects
- role: a binary relation on objects

Lexico-syntactic patterns
- NP0 such as {NP1, NP2, …, (and | or)} NPn
  Vehicles such as cars, trucks and bikes…
- such NP as {NP,}* {(or | and)} NP
  Such fruits as oranges, nectarines or apples…
- NP {, NP}* {,} {or | and} other NP
  Swimming, running, or/and other activities…
- NP {,} including {NP,}* {or | and} NP
  Injuries, including broken bones, wounds and bruises…

TBox and ABox
TBox (Terminological Box): contains sentences describing concept hierarchies, i.e. relations between concepts
ABox (Assertional Box): contains ground sentences stating where in the hierarchy individuals belong (relations between concepts and individuals)

Relations
Example: Every TA is a student (TBox). Nikhil is a TA (ABox).

<p pnum=3>
<s snum=3>
<wf cmd=ignore pos=DT>The</wf>
<wf cmd=done pos=NN lemma=september wnsn=1 lexsn=1:28:00::>September</wf>
<wf cmd=done pos=NN lemma=october wnsn=1 lexsn=1:28:00::>October</wf>
<wf cmd=done pos=NN lemma=term wnsn=2 lexsn=1:28:00::>term</wf>
<wf cmd=done pos=NN lemma=jury wnsn=1 lexsn=1:14:00::>jury</wf>
<wf cmd=done pos=VBD ot=notag>had</wf>
<wf cmd=done pos=VBN ot=notag>been</wf>
<wf cmd=done pos=VB lemma=charge wnsn=5 lexsn=2:41:00::>charged</wf>
<wf cmd=ignore pos=IN>by</wf>
<wf cmd=done rdf=location pos=NNP lemma=location wnsn=1 lexsn=1:03:00:: pn=location>Fulton</wf>
<wf cmd=done rdf=person pos=NNP lemma=person wnsn=1 lexsn=1:03:00:: pn=person>Superior_Court_Judge_Durwood_Pye</wf>
<wf cmd=ignore pos=TO>to</wf>
<wf cmd=done pos=VB lemma=investigate wnsn=2 lexsn=2:32:01::>investigate</wf>
<wf cmd=done pos=NN lemma=report wnsn=2 lexsn=1:10:00::>reports</wf>
<wf cmd=ignore pos=IN>of</wf>
<wf cmd=done pos=JJ lemma=possible wnsn=2 lexsn=3:00:04::>possible</wf>
<punc>``</punc>
<wf cmd=done pos=NN lemma=irregularity wnsn=1 lexsn=1:04:00::>irregularities</wf>
<punc>''</punc>
<wf cmd=ignore pos=IN>in</wf>
<wf cmd=ignore pos=DT>the</wf>
<wf cmd=done pos=JJ lemma=hard-fought wnsn=1 lexsn=5:00:00:difficult:00>hard-fought</wf>
<wf cmd=done pos=NN lemma=primary wnsn=1 lexsn=1:04:00::>primary</wf>
<wf cmd=ignore pos=WDT>which</wf>
<wf cmd=done pos=VBD ot=notag>was</wf>
<wf cmd=done pos=VB lemma=win wnsn=1 lexsn=2:33:00::>won</wf>
<wf cmd=ignore pos=IN>by</wf>
<wf cmd=done rdf=person pos=NNP lemma=person wnsn=1 lexsn=1:03:00:: pn=person>Mayor-nominate_Ivan_Allen_Jr.</wf>
<punc>.</punc>
</s>
</p>

Why two separate boxes?
Various inference methods require different boxes: classification requires the TBox, and instance checking requires the ABox.

General Relations
- Exploiting linguistic structure
- Ex: Author wrote a book.
- Relation: write(Author, Book)

AVAILABILITY

Other Significant Properties
No UNA (Unique Name Assumption): two concepts/roles having different names may be shown to be equivalent by inference, e.g. married_to(X,Y) is equivalent to spouse_of(X,Y)
OWA (Open World Assumption): lack of knowledge of a fact does not imply the negation of the fact

Layer - Learning

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:dc="http://purl.org/dc/elements/1.1/">

<!-- OWL Header Example -->
<owl:Ontology rdf:about="http://www.linkeddatatools.com/plants">
<dc:title>The LinkedDataTools.com Example Plant Ontology</dc:title>
<dc:description>An example ontology written for the LinkedDataTools.com RDFS &amp; OWL introduction tutorial</dc:description>
</owl:Ontology>

<!-- Remainder Of Document Omitted For Brevity... -->

</rdf:RDF>

REPRESENTATION
The concept names in Cyc are known as constants. Constants start with an optional "#$" and are case-sensitive. There are constants for:

Individual items, known as individuals, such as #$BillClinton or #$France.

Collections, such as #$Tree-ThePlant (containing all trees) or #$EquivalenceRelation (containing all equivalence relations). A member of a collection is called an instance of that collection.

Truth functions, which can be applied to one or more other concepts and return either true or false. For example, #$siblings is the sibling relationship, true if the two arguments are siblings.

Functions, which produce new terms from given ones. For example, #$FruitFn, when provided with an argument describing a type (or collection) of plants, will return the collection of its fruits. By convention, function constants start with an upper-case letter and end with the string "Fn".

<rdf:RDF
...
<!-- OWL Subclass Definition - Flower -->
<owl:Class rdf:about="http://www.linkeddatatools.com/plants#flowers">
<!-- Flowers is a subclassification of planttype -->
<rdfs:subClassOf rdf:resource="http://www.linkeddatatools.com/plants#planttype"/>
<rdfs:label>Flowering plants</rdfs:label>
<rdfs:comment>Flowering plants, also known as angiosperms.</rdfs:comment>
</owl:Class>
<!-- OWL Subclass Definition - Shrub -->
<owl:Class rdf:about="http://www.linkeddatatools.com/plants#shrubs">
<!-- Shrubs is a subclassification of planttype -->
<rdfs:subClassOf rdf:resource="http://www.linkeddatatools.com/plants#planttype"/>
<rdfs:label>Shrubbery</rdfs:label>
<rdfs:comment>Shrubs, a type of plant which branches from the base.</rdfs:comment>
</owl:Class>
<!-- Individual (Instance) Example RDF Statement -->
<rdf:Description rdf:about="http://www.linkeddatatools.com/plants#magnolia">
<!-- Magnolia is a type (instance) of the flowers classification -->
<rdf:type rdf:resource="http://www.linkeddatatools.com/plants#flowers"/>
</rdf:Description>
</rdf:RDF>

Properties / roles
Individuals in OWL are related by properties.

Object properties (owl:ObjectProperty) relate individuals (instances) of two OWL classes.

Datatype properties (owl:DatatypeProperty) relate individuals (instances) of OWL classes to literal values.

<rdf:RDF
...
<!-- Define the family property -->
<owl:DatatypeProperty rdf:about="http://www.linkeddatatools.com/plants#family"/>

<rdf:Description rdf:about="http://www.linkeddatatools.com/plants#magnolia">
<!-- Magnolia is a type (instance) of the flowers class -->
<rdf:type rdf:resource="http://www.linkeddatatools.com/plants#flowers"/>
<!-- The magnolia is part of the 'Magnoliaceae' family -->
<plants:family>Magnoliaceae</plants:family>
</rdf:Description>

</rdf:RDF>

<rdf:RDF
...
<!-- Define the family property -->
<owl:DatatypeProperty rdf:about="http://www.linkeddatatools.com/plants#family"/>
<!-- Define the similarlyPopularTo property -->
<owl:ObjectProperty rdf:about="http://www.linkeddatatools.com/plants#similarlyPopularTo"/>
<!-- Define the Orchid class instance -->
<rdf:Description rdf:about="http://www.linkeddatatools.com/plants#orchid">
...
</rdf:Description>

<!-- Define the Magnolia class instance -->
<rdf:Description rdf:about="http://www.linkeddatatools.com/plants#magnolia">
<!-- Magnolia is an individual (instance) of the flowers class -->
<rdf:type rdf:resource="http://www.linkeddatatools.com/plants#flowers"/>
<!-- The magnolia is part of the 'Magnoliaceae' family -->
<plants:family>Magnoliaceae</plants:family>
<!-- The magnolia is similarly popular to the orchid -->
<plants:similarlyPopularTo rdf:resource="http://www.linkeddatatools.com/plants#orchid"/>
</rdf:Description>

</rdf:RDF>

Examples in CycL:

(#$isa #$BillClinton #$UnitedStatesPresident)

"Bill Clinton belongs to the collection of U.S. presidents"

(#$genls #$Tree-ThePlant #$Plant)

"All trees are plants"

INFERENCE ENGINE
An inference engine is a computer program that tries to derive answers from a knowledge base. The Cyc inference engine performs general logical deduction (including modus ponens, modus tollens, universal quantification and existential quantification).

MOTIVATION
Indian languages form a very significant component of the language landscape of the world.

Many Indian languages rank near the top in the world in terms of the population speaking them, e.g., Hindi-Urdu 5th, Bangla 7th, Marathi 12th and so on, as per the List of languages by number of native speakers.

AVAILABILITY

APPLICATIONS

References
1. Philipp Cimiano and Bernardo Magnini, Ontology Learning from Text: An Overview, IOS Press, 2003
2. Ruiqiang Guo and Fuji Ren, Towards the Relationship Between Semantic Web and NLP, IEEE, 2009
3. Cássia Trojahn, Paulo Quaresma and Renata Vieira, A Framework for Multilingual Ontology Mapping, The International Conference on Language Resources and Evaluation (LREC), 2008
4. Christian Chiarcos, Interoperability of Corpora and Annotations, DFG, 2011
5. John McCrae, Dennis Spohr and Philipp Cimiano, Linking Lexical Resources and Ontologies on the Semantic Web with lemon, AG Semantic Computing, CITEC, University of Bielefeld, 2012
6. Roberto Navigli and Simone Paolo Ponzetto, BabelNet: Building a Very Large Multilingual Semantic Network, ACL, 2012
Terrorism Knowledge Base

The comprehensive Terrorism Knowledge Base is an application of Cyc, still in development, that will ultimately try to contain all relevant knowledge about "terrorist" groups, their members, leaders, ideology, founders, sponsors, affiliations, facilities, locations, finances, capabilities, intentions, behaviors, tactics, and full descriptions of specific terrorist events. The knowledge is stored as statements in mathematical logic, suitable for computer understanding and reasoning.

- Relationship between NLP & Semantic Web
- Mutual Benefit

Cyclopedia

Cyclopedia is being developed; it superimposes Cyc keywords on pages taken from Wikipedia.
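The #$isa and #$genls assertions and the "trees die" example describe a simple chain of inference. Below is a minimal Python sketch of that chain, assuming a hand-built fact table. It is not CycL, and it is not how the Cyc inference engine is implemented. In the sketch, #$genls is treated as a transitive relation between collections, #$isa facts are inherited along it, and a rule stated for a collection applies to all of its specializations. The `#$Person` link and the `diesEventually` label are hypothetical additions for illustration.

```python
from collections import deque

# (#$genls Spec General): every instance of Spec is also an instance of General.
GENLS: dict[str, set[str]] = {
    "#$Tree-ThePlant": {"#$Plant"},
    "#$UnitedStatesPresident": {"#$Person"},  # hypothetical link, for illustration
}

# (#$isa Individual Collection)
ISA: dict[str, set[str]] = {
    "#$BillClinton": {"#$UnitedStatesPresident"},
}

# Rules attached to collections, e.g. "Plants die eventually".
# 'diesEventually' is a hypothetical label, not a real Cyc constant.
HOLDS_FOR_ALL: dict[str, set[str]] = {
    "#$Plant": {"diesEventually"},
}


def generalizations(collection: str) -> set[str]:
    """All collections reachable via #$genls (reflexive, transitive closure, BFS)."""
    seen = {collection}
    queue = deque([collection])
    while queue:
        for parent in GENLS.get(queue.popleft(), set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_a(individual: str, collection: str) -> bool:
    """Does (#$isa individual collection) follow from ISA plus GENLS?"""
    return any(collection in generalizations(c) for c in ISA.get(individual, set()))


def holds_for(collection: str, prop: str) -> bool:
    """Does a rule stated for some generalization apply to this collection?"""
    return any(prop in HOLDS_FOR_ALL.get(g, set()) for g in generalizations(collection))


print(holds_for("#$Tree-ThePlant", "diesEventually"))  # True: trees are plants
print(is_a("#$BillClinton", "#$Person"))               # True, via #$UnitedStatesPresident
print(is_a("#$BillClinton", "#$Plant"))                # False: not derivable
```

Note that `False` here only means "not derivable from these facts". Under the Open World Assumption, that is not the same as the fact being false.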
Cleveland Clinic Foundation

The Cleveland Clinic has used Cyc to develop a natural-language query interface to biomedical information.

Typical pieces of knowledge represented in the database are "Every tree is a plant" and "Plants die eventually". When asked whether trees die, the inference engine can draw the obvious conclusion and answer the question correctly.

POTENTIAL

Experiments show that this fully automated approach produces a large-scale lexical resource with high accuracy.

The resource includes millions of semantic relations, mainly from Wikipedia (however, WordNet relations are labeled), and contains almost 3 million concepts (6.7 labels per concept on average).

Such coverage is much wider than that of existing wordnets in non-English languages.
While BabelNet currently includes 6 languages, links to freely-available wordnets can immediately be established by utilizing the English WordNet as an interlanguage index.
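Using English WordNet as an interlanguage index means each non-English wordnet only needs links to English synsets. Two non-English words are then connected through the English synset they share. The following is a minimal sketch, assuming toy lexical entries that are each linked to a single synset. The synset IDs `EN-0001` and `EN-0002` are invented placeholders, not real WordNet offsets or BabelNet identifiers.

```python
from collections import defaultdict

# Each (hypothetical) non-English wordnet links its lemmas to an English synset ID.
# For simplicity every lemma here is monosemous; the IDs are invented placeholders.
LINKS_TO_ENGLISH: dict[str, dict[str, str]] = {
    "it": {"acqua": "EN-0001", "cane": "EN-0002"},
    "fr": {"eau": "EN-0001", "chien": "EN-0002"},
    "de": {"Wasser": "EN-0001", "Hund": "EN-0002"},
}


def build_index(links: dict[str, dict[str, str]]) -> dict[str, set[tuple[str, str]]]:
    """Invert the links: English synset ID -> {(language, lemma), ...}."""
    index: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for lang, entries in links.items():
        for lemma, en_id in entries.items():
            index[en_id].add((lang, lemma))
    return index


INDEX = build_index(LINKS_TO_ENGLISH)


def translate(lang: str, lemma: str, target_lang: str) -> list[str]:
    """Map a lemma to another language by pivoting through its English synset."""
    en_id = LINKS_TO_ENGLISH.get(lang, {}).get(lemma)
    if en_id is None:
        return []
    return sorted(l for (tl, l) in INDEX[en_id] if tl == target_lang)


print(translate("it", "cane", "de"))  # ['Hund']
print(translate("fr", "eau", "it"))   # ['acqua']
```

The design point is that adding a new language costs one set of links to English. It does not need pairwise links to every other language.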

BabelNet can be extended to virtually any language of interest: the translation method allows it to cope with any resource-poor language.

Cyc is an artificial intelligence project that attempts to assemble a comprehensive ontology and knowledge base of everyday common-sense knowledge.

Lexicon
- The lexicon is the vocabulary of the language
- Uses RDF for defining links
- The meaning of a word is given by reference

Reference (Ontology)
- Capable of representing more complex semantic information

The lemon model provides a principled chain between the semantic representation and its linguistic realization.

Web 2.0
Semantic Web
Representation and Languages
Structures and Ontologies
NLP and Semantic Web
Conclusion