LOD Tutorial @dev/summer/2014

by Marco Brandizi, on 8 June 2015


Developing LOD Applications: an Introduction
A World of Openness
Open Data?
The 'Geek' Point of View
Why new formats/standards?
A World of Open Data

Natural language / text: very powerful and flexible
but hardly machine-readable, ambiguous, imprecise
Tables/CSV/etc: simple and good enough in many cases (much used by the OD community)
but too simple in many others
Relational models/SQL: a significant, standardised improvement
XML/objects
allow for more complex data structures
standardised for sharing

Many people are already happy with the above anyway
(RSS, hCalendar, vCard and other microformats)
We need good models/schemas to:
describe data structure
describe meaning (i.e., semantics)
integrate/link data
including identifying things world-wide
including integrating/extending models
validate data

We need standard solutions
We strongly want to re-use existing information sharing/exchange technology
i.e., the WWW
Familiar models
Data Publishing needs
The idea (Berners-Lee T, et al 2001):
Let's use the
web of documents
and its protocols (eg, http)
and its formats (eg, xml)
and its basic concepts (eg, hyper-link)

To build, on top of it, a
web of data
LOD Principles
Implemented by the
Semantic Web
From the web of documents...
But why?
...to the web of data
Resources and not only pages
URIs: Universal and resolvable identifiers
Typed links,
i.e., the RDF (Resource Description Framework) building block:
a predicate that relates a subject to an object,
like in a statement,
a.k.a. a triple,
which is also an instance of a mathematical binary relation

But why?
Multiple statements/relations/properties can be stated by just re-using resources/URIs
...to the web of data
Schemas are just more statements

Schemas are where you put semantics
...to the web of data
Seamless integration from different (web) sources. Schema/Semantic Integration, well...
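A minimal Java sketch (plain JDK, no RDF library) of why re-using URIs makes integration seamless: statements from two sources merge by simple set union, and the shared URI of Keanu Reeves joins them. The Triple record, the method names and the second "source" are hypothetical, and literals are simplified to plain strings.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

/** Sketch: RDF statements as plain records; URIs are plain strings (no RDF library). */
public class GraphMerge {

    /** One statement: subject --predicate--> object. */
    public record Triple(String s, String p, String o) {}

    /** URIs are global identifiers, so merging two graphs is just a set union. */
    public static Set<Triple> merge(Set<Triple> g1, Set<Triple> g2) {
        Set<Triple> merged = new LinkedHashSet<>(g1);
        merged.addAll(g2); // identical statements from both sources collapse into one
        return merged;
    }

    /** Objects related to a subject through a predicate (walking the binary relation). */
    public static Set<String> objects(Set<Triple> g, String s, String p) {
        return g.stream()
                .filter(t -> t.s().equals(s) && t.p().equals(p))
                .map(Triple::o)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        String matrix = "http://dbpedia.org/resource/The_Matrix";
        String keanu = "http://dbpedia.org/resource/Keanu_Reeves";
        String starring = "http://dbpedia.org/ontology/starring";
        String label = "http://www.w3.org/2000/01/rdf-schema#label";

        // Source A knows who stars in the film, (hypothetical) source B knows the actor's label
        Set<Triple> sourceA = Set.of(new Triple(matrix, starring, keanu));
        Set<Triple> sourceB = Set.of(new Triple(keanu, label, "Keanu Reeves"));

        Set<Triple> web = merge(sourceA, sourceB);
        for (String actor : objects(web, matrix, starring)) {
            System.out.println(actor + " -> " + objects(web, actor, label));
        }
    }
}
```

Running it prints `http://dbpedia.org/resource/Keanu_Reeves -> [Keanu Reeves]`: no mapping step was needed, because both sources used the same URI.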
What's the point?
The Semantic Web Approach

URIs and URI Best Practices
A generalisation of URLs (IRIs are in turn even more general, supporting internationalisation)
URIs should be resolvable, to provide useful and discoverable information
In the SW/LOD world, ideally they should return RDF (about the identified resource)
Even better, they should return different documents based on content negotiation:
curl --location-trusted -H 'Accept: application/rdf+xml' \
  'http://dbpedia.org/resource/The_Matrix'
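The same content negotiation, sketched with the JDK 11+ java.net.http client instead of curl. Assumptions: DBpedia still honours the Accept header and answers with a redirect (possibly to https), so redirects are followed; class and method names are illustrative.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch of the curl call above: ask a Linked Data URI for a given RDF serialisation. */
public class ContentNegotiation {

    /** Builds a GET request with the desired media type in the Accept header (nothing is sent). */
    public static HttpRequest rdfRequest(String resourceUri, String mediaType) {
        return HttpRequest.newBuilder(URI.create(resourceUri))
                .header("Accept", mediaType)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                // like curl --location: follow redirects (NORMAL never downgrades https to http)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest req = rdfRequest("http://dbpedia.org/resource/The_Matrix", "application/rdf+xml");
        HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());

        // Final URI after redirects, and the serialisation the server chose
        System.out.println(resp.statusCode() + " " + resp.uri());
        System.out.println(resp.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```

Swapping "application/rdf+xml" for "text/turtle" or "text/html" should yield a different document about the same resource, if the server supports those types.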

URIs should be stable and (reasonably) resolve to stable semantics
projects like purl.org help to cope with this
Tricky details behind this (http://tinyurl.com/pzwn4mx)

Predicates in RDF statements are URIs
http://dbpedia.org/resource/The_Matrix
http://dbpedia.org/ontology/starring
http://dbpedia.org/resource/Keanu_Reeves

==> properties too are universally identified
==> their description can be given in RDF itself and URI-discovered
Encoding RDF
A URI returns a document, containing statements about the thing behind that URI
So, what does such a document look like? How do we create/serve it?
Is it XML?
Source: http://tinyurl.com/qdueje8
So, let's go for the simpler one

Encoding RDF in Turtle
@prefix dbp: <http://dbpedia.org/resource/>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix dbp-owl: <http://dbpedia.org/ontology/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

dbp:The_Matrix dbp-owl:starring dbp:Keanu_Reeves.
dbp:The_Matrix dbp-owl:runtime "8160"^^xsd:float.
dbp:The_Matrix rdfs:label "The Matrix".

dbp:Keanu_Reeves rdf:type
<http://dbpedia.org/class/yago/Actor109765278>.

dbp:The_Matrix a <http://dbpedia.org/ontology/Film>.
Namespaces, as in XML

Explicit URIs
shorthand for rdf:type
XML datatypes (custom types supported)

Encoding RDF in Turtle
# This is a comment
@prefix yago: <http://dbpedia.org/class/yago/>.

# Statements about the same subject
dbp:The_Matrix
dbp-owl:starring dbp:Keanu_Reeves;
rdfs:label "The Matrix";
dbp-owl:runtime 8160.

# Same subject and same predicate
dbp:Keanu_Reeves
a yago:Actor109765278;
rdfs:label "Ривз, Киану"@ru, "Keanu Reeves".
@lang: literals have a value, a type and, in the case of strings, a language
@prefix dbpedia: <http://dbpedia.org/resource/>.
@prefix ex: <http://www.example.com/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

dbpedia:Sandro_Veronesi rdf:type ex:Oncologist. # or just 'a'

ex:Oncologist rdfs:subClassOf ex:Doctor;
rdfs:comment "The class of doctors who are specialized in oncology".

ex:Doctor rdfs:subClassOf ex:Person.
ex:Person a rdfs:Class.
It's still RDF, i.e., it's reflexive: schemas are described with the same language as data
(like XML Schema, which is itself written in XML)
dbpedia:Sandro_Veronesi
ex:president-of dbpedia:Istituto_Europeo_di_Oncologia.

ex:president-of rdfs:subPropertyOf ex:involved-in;
rdfs:label 'is president of'.

ex:involved-in a rdf:Property;
rdfs:domain ex:Person;
rdfs:range ex:Organization.
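# Inferred by an RDF-S reasoner:
dbpedia:Sandro_Veronesi
a ex:Doctor, ex:Person. # via rdfs:subClassOf
dbpedia:Sandro_Veronesi ex:involved-in dbpedia:Istituto_Europeo_di_Oncologia. # via rdfs:subPropertyOf
dbpedia:Sandro_Veronesi a ex:Person. # via rdfs:domain
dbpedia:Istituto_Europeo_di_Oncologia a ex:Organization. # via rdfs:range
ex:Oncologist rdfs:subClassOf ex:Person; # rdfs:subClassOf is transitive
a rdfs:Class.

# rdfs:subClassOf has domain and range rdfs:Class
ex:Doctor a rdfs:Class.
ex:Person a rdfs:Class.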
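A naive forward-chaining sketch, in plain Java, of the RDF-S rules behind the inferred statements above (subclass transitivity and type propagation, sub-properties, domain, range). It is not how a real reasoner such as Jena's works internally: terms are plain prefixed-name strings, and the two "axiomatic" triples stand in for the RDF-S vocabulary describing itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Naive forward chaining over a few RDF-S entailment rules; terms are prefixed-name strings. */
public class RdfsSketch {

    public record Triple(String s, String p, String o) {}

    static final String TYPE = "rdf:type";
    static final String SUB_CLASS = "rdfs:subClassOf";
    static final String SUB_PROP = "rdfs:subPropertyOf";
    static final String DOMAIN = "rdfs:domain";
    static final String RANGE = "rdfs:range";

    /** Applies the rules to every pair of triples until nothing new is derived (a fixpoint). */
    public static Set<Triple> infer(Set<Triple> input) {
        Set<Triple> g = new LinkedHashSet<>(input);
        boolean changed = true;
        while (changed) {
            List<Triple> fresh = new ArrayList<>();
            for (Triple a : g) {
                for (Triple b : g) {
                    // rdfs11: A subClassOf B, B subClassOf C => A subClassOf C
                    if (a.p().equals(SUB_CLASS) && b.p().equals(SUB_CLASS) && a.o().equals(b.s()))
                        fresh.add(new Triple(a.s(), SUB_CLASS, b.o()));
                    // rdfs9: x type A, A subClassOf B => x type B
                    if (a.p().equals(TYPE) && b.p().equals(SUB_CLASS) && a.o().equals(b.s()))
                        fresh.add(new Triple(a.s(), TYPE, b.o()));
                    // rdfs7: x p y, p subPropertyOf q => x q y
                    if (b.p().equals(SUB_PROP) && a.p().equals(b.s()))
                        fresh.add(new Triple(a.s(), b.o(), a.o()));
                    // rdfs2: x p y, p domain C => x type C
                    if (b.p().equals(DOMAIN) && a.p().equals(b.s()))
                        fresh.add(new Triple(a.s(), TYPE, b.o()));
                    // rdfs3: x p y, p range C => y type C
                    if (b.p().equals(RANGE) && a.p().equals(b.s()))
                        fresh.add(new Triple(a.o(), TYPE, b.o()));
                }
            }
            changed = g.addAll(fresh); // stop when no rule adds anything new
        }
        return g;
    }

    public static void main(String[] args) {
        Set<Triple> asserted = new LinkedHashSet<>(List.of(
            // RDF-S describes itself: subClassOf links two classes
            new Triple(SUB_CLASS, DOMAIN, "rdfs:Class"),
            new Triple(SUB_CLASS, RANGE, "rdfs:Class"),
            new Triple("dbpedia:Sandro_Veronesi", TYPE, "ex:Oncologist"),
            new Triple("ex:Oncologist", SUB_CLASS, "ex:Doctor"),
            new Triple("ex:Doctor", SUB_CLASS, "ex:Person"),
            new Triple("dbpedia:Sandro_Veronesi", "ex:president-of", "dbpedia:Istituto_Europeo_di_Oncologia"),
            new Triple("ex:president-of", SUB_PROP, "ex:involved-in"),
            new Triple("ex:involved-in", DOMAIN, "ex:Person"),
            new Triple("ex:involved-in", RANGE, "ex:Organization")));

        Set<Triple> inferred = infer(asserted);
        inferred.removeAll(asserted); // keep only what the rules added
        inferred.forEach(t -> System.out.println(t.s() + " " + t.p() + " " + t.o() + " ."));
    }
}
```

The pairwise loop is quadratic per round, which is fine for a slide-sized example; real triple stores use indexes and smarter rule engines.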
Transitive
ex:part-of a owl:TransitiveProperty.

Symmetric
foaf:knows a owl:SymmetricProperty.

Inverse
ex:supervises owl:inverseOf ex:is-supervised-by.

and more (functional, equivalent, asymmetric, reflexive...)
Existential
ex:Oncologist owl:equivalentClass [ owl:intersectionOf ( ex:Doctor
[ a owl:Restriction;
owl:onProperty ex:has-specialization;
owl:someValuesFrom ex:OncologySpeciality ] ) ].

Universals (in Manchester Syntax)
ex:Cow SubClassOf ( ex:eats only ex:Vegetable )

Cardinality (min, max also supported)
ex:Person SubClassOf ( ex:has-parent exactly 2 )
What's the point?
Property types
ex:ieo ex:located-in ex:milano.
ex:milano ex:part-of ex:italy.
ex:italy ex:part-of ex:europe.

ex:john foaf:knows ex:anne.

Property Restrictions
dbp:Umberto_Veronesi
ex:has-specialization ex:MammalCarcinoma.
select ?institute where {
?institute ex:located-in ?loc.
?loc ex:part-of ex:europe.
}
select ?doc where {
?doc a ex:Oncologist.
}
select ?anneFriend where {
ex:anne foaf:knows ?anneFriend.
}
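A sketch of what a reasoner does with the property types above: materialise transitive, symmetric and inverse properties until a fixpoint, then answer the located-in/part-of query with a simple join. Plain Java with hypothetical names; a triple store with OWL support would do this for you, and far more efficiently.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Sketch: materialising transitive, symmetric and inverse properties by hand. */
public class OwlPropertiesSketch {

    public record Triple(String s, String p, String o) {}

    /**
     * Adds derived triples until a fixpoint. inverseOf maps p to its inverse q;
     * list both directions if q-statements should also produce p-statements.
     */
    public static Set<Triple> closure(Set<Triple> input, Set<String> transitive,
                                      Set<String> symmetric, Map<String, String> inverseOf) {
        Set<Triple> g = new LinkedHashSet<>(input);
        boolean changed = true;
        while (changed) {
            List<Triple> fresh = new ArrayList<>();
            for (Triple a : g) {
                // owl:SymmetricProperty: x p y => y p x
                if (symmetric.contains(a.p())) fresh.add(new Triple(a.o(), a.p(), a.s()));
                // owl:inverseOf: x p y, p inverseOf q => y q x
                String inv = inverseOf.get(a.p());
                if (inv != null) fresh.add(new Triple(a.o(), inv, a.s()));
                // owl:TransitiveProperty: x p y, y p z => x p z
                if (transitive.contains(a.p())) {
                    for (Triple b : g) {
                        if (b.p().equals(a.p()) && b.s().equals(a.o()))
                            fresh.add(new Triple(a.s(), a.p(), b.o()));
                    }
                }
            }
            changed = g.addAll(fresh);
        }
        return g;
    }

    /** Answers the pattern (?s p o), sorted for stable output. */
    public static Set<String> subjects(Set<Triple> g, String p, String o) {
        return g.stream().filter(t -> t.p().equals(p) && t.o().equals(o))
                .map(Triple::s).collect(Collectors.toCollection(TreeSet::new));
    }

    /** Answers the pattern (s p ?o), sorted for stable output. */
    public static Set<String> objects(Set<Triple> g, String s, String p) {
        return g.stream().filter(t -> t.s().equals(s) && t.p().equals(p))
                .map(Triple::o).collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        Set<Triple> data = Set.of(
            new Triple("ex:ieo", "ex:located-in", "ex:milano"),
            new Triple("ex:milano", "ex:part-of", "ex:italy"),
            new Triple("ex:italy", "ex:part-of", "ex:europe"),
            new Triple("ex:john", "foaf:knows", "ex:anne"));
        Set<Triple> g = closure(data, Set.of("ex:part-of"), Set.of("foaf:knows"), Map.of());

        // ?institute ex:located-in ?loc . ?loc ex:part-of ex:europe .
        Set<String> inEurope = subjects(g, "ex:part-of", "ex:europe");
        Set<String> institutes = g.stream()
                .filter(t -> t.p().equals("ex:located-in") && inEurope.contains(t.o()))
                .map(Triple::s).collect(Collectors.toCollection(TreeSet::new));
        System.out.println(institutes);                           // [ex:ieo]
        System.out.println(objects(g, "ex:anne", "foaf:knows")); // [ex:john]
    }
}
```

Without the closure step, neither query would find anything: ex:milano is never stated to be part of ex:europe, and ex:anne is never stated to know anyone.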
# Person subclass of ( ex:has-parent only Person)
ex:beth ex:has-parent ex:nicole.
does not
imply that ex:beth (or ex:nicole) is an ex:Person, because SubClassOf states only a necessary condition, not a sufficient one
=> OWL is axiomatic
==> See http://prezi.com/hbwaivbln9yz/4-ontologies/ for tricks about OWL
OWL Flavours
The more expressivity/constructs/inference you want, the more performance issues you get:
you need more memory, more CPU
Very expressive logics are also undecidable (e.g., OWL Full)
Many triple stores offer inference levels that don't exactly match these predefined categories, e.g.,
Virtuoso, not much more than RDF-S
Jena, a bit less than OWL-DL
RDF-S
Standard schemas and Ontologies: examples
schema.org: a very lightweight and general 'ontology', for most common things
Google (and other search engines) supports it
RDFa: allows you to annotate your web pages with RDF statements
RDFa + schema.org + other ontologies(*): allows you to be more visible on Google
Potentially lets Google know more than it can "understand" via text mining
(*) Examples:
Dublin Core (general document metadata)
FOAF (people's relationships)
SIOC (Blogs, web sites, social networks)
Standard schemas and Ontologies: examples
GoodRelations: a rather rich ontology to describe commercial products, businesses and the like
Best Buy is known to use it
Google is probably detecting it

Standard schemas and Ontologies: examples
LOD fits with Life Science
Very heterogeneous
In strong need to integrate, collaborate etc
Often can benefit from advanced OWL logics features
Let's go for a little demo
A Demo
Let's do some hands-on
data1.csv
data2.xml
Let's do some hands-on: XML->RDF
Have a look at the sources in data2_to_rdf/
including Xml2Rdf.java and EFOResolver.java
Run it and see the results in data2.ttl

Exercise:
given the Java variables sampleId, uniProtId, efoId (coming from data1.csv),
generate statements like these (the same as in data1.ttl):

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/> .

<http://rdf.ebi.ac.uk/demo/sample/3CB6B2564EA5C81F4EC5069DB5A75877>
rdf:type obo:OBI_0000747 ;
rdfs:label
"Human Sample from Experimental Data Set 1, ID #3CB6B2564EA5C81F4EC5069DB5A75877" ;
atlasterms:dbXref <http://purl.uniprot.org/uniprot/Q9UKT5> ;
atlasterms:dbXref <http://purl.uniprot.org/uniprot/Q9BV07> ;
atlasterms:hasFactorValue
<http://www.ebi.ac.uk/efo/EFO_0001071> .

It's not much more than copying and pasting the previous code

Solution is in test/java, Exercise1.java
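One possible shape for the exercise, using plain string building rather than an RDF API; the reference solution remains Exercise1.java. The class name is hypothetical, the IDs are assumed to be URI-safe already, and a list of UniProt IDs is taken because a sample may have several dbXrefs.

```java
import java.util.List;

/** Sketch: writing the sample statements as Turtle text, without an RDF library. */
public class SampleToTurtle {

    static final String SAMPLE_NS = "http://rdf.ebi.ac.uk/demo/sample/";
    static final String UNIPROT_NS = "http://purl.uniprot.org/uniprot/";
    static final String EFO_NS = "http://www.ebi.ac.uk/efo/";

    /** Escapes backslashes and quotes for a Turtle "..." literal. */
    static String literal(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static String toTurtle(String sampleId, List<String> uniProtIds, String efoId) {
        StringBuilder ttl = new StringBuilder()
            .append("@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n")
            .append("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n")
            .append("@prefix obo: <http://purl.obolibrary.org/obo/> .\n")
            .append("@prefix atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/> .\n\n")
            // Subject URI: namespace + ID, as in data1.ttl
            .append('<').append(SAMPLE_NS).append(sampleId).append(">\n")
            .append("  rdf:type obo:OBI_0000747 ;\n")
            .append("  rdfs:label ")
            .append(literal("Human Sample from Experimental Data Set 1, ID #" + sampleId))
            .append(" ;\n");
        // One dbXref statement per protein
        for (String up : uniProtIds)
            ttl.append("  atlasterms:dbXref <").append(UNIPROT_NS).append(up).append("> ;\n");
        ttl.append("  atlasterms:hasFactorValue <").append(EFO_NS).append(efoId).append("> .\n");
        return ttl.toString();
    }

    public static void main(String[] args) {
        System.out.print(toTurtle("3CB6B2564EA5C81F4EC5069DB5A75877",
            List.of("Q9UKT5", "Q9BV07"), "EFO_0001071"));
    }
}
```

String building is fine for a one-off, but an RDF API (as in the demo sources) takes care of escaping, prefixes and alternative serialisations for you.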

Making LODs available
Files
Applications
(eg, data exporters, text mining, RDFa data crawlers)
SQL DBs
Client Applications
Web Interfaces
(eg, dbpedia.org/sparql)
eg, Virtuoso, Fuseki/Jena, BigOWL
Frameworks
(eg, Jena, Sesame)
Putting LOD
on LINE

Making LODs available
Look at demo/fuseki/fuseki_config.ttl
Start Fuseki with
./fuseki-server --config=/path/to/fuseki_config.ttl
Open fuseki/sparql.html

Have a try with SPARQL:
select the top 10 specimens (obo:OBI_0100051, where obo: is http://purl.obolibrary.org/obo/)
and their labels,
ordering results by label
Click on the reported links to see
URI resolution

The webapp seen before can be started from demo/webapp with:
mvn jetty:run
It runs against the Fuseki instance
# From http://it.dbpedia.org/sparql
PREFIX dbp-onto: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?composerName ?birthplace
WHERE
{
?movie a dbp-onto:Film;
dbp-onto:country <http://it.dbpedia.org/resource/Italia>;
dbp-onto:musicComposer ?composer.

?composer
dbp-onto:birthPlace ?birthplace;
rdfs:label ?composerName.

?birthplace
dbp-onto:populationTotal ?population;
rdfs:label ?birthplaceName.

FILTER ( ?population > 100000 ).
}
ORDER BY ?composerName
LIMIT 100
Querying with SPARQL
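@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix dbpedia-owl: <http://dbpedia.org/ontology/>.
@prefix dbpedia-it: <http://it.dbpedia.org/resource/>.

<http://it.dbpedia.org/resource/La_bella_addormentata_(film_1942)>
rdf:type dbpedia-owl:Film;
dbpedia-owl:musicComposer dbpedia-it:Achille_Longo;
dbpedia-owl:country dbpedia-it:Italia .

dbpedia-it:Achille_Longo
rdfs:label "Achille Longo"@it;
dbpedia-owl:birthPlace dbpedia-it:Napoli.

dbpedia-it:Napoli rdfs:label "Napoli"@it.
dbpedia-it:Napoli dbpedia-owl:populationTotal 957430.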
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT DISTINCT *
WHERE
{
?specimen
a obo:OBI_0100051;
rdfs:label ?specimenLabel.
}
ORDER BY ?specimenLabel
LIMIT 10
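For reference, this is roughly what a SPARQL client does under the hood with the query above: a SPARQL 1.1 Protocol GET request with a URL-encoded query parameter, asking for JSON results. The endpoint URL and dataset name (ds) are assumptions: the real ones depend on fuseki_config.ttl.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Sketch: sending a SPARQL query over HTTP GET with the JDK only (Jena would hide all this). */
public class SparqlGet {

    /** Builds the protocol request: endpoint?query=<url-encoded query>, JSON results wanted. */
    public static HttpRequest queryRequest(String endpoint, String sparql) {
        String url = endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint: pass the real one as the first argument
        String endpoint = args.length > 0 ? args[0] : "http://localhost:3030/ds/query";
        String query = String.join("\n",
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
            "PREFIX obo: <http://purl.obolibrary.org/obo/>",
            "SELECT DISTINCT * WHERE {",
            "  ?specimen a obo:OBI_0100051; rdfs:label ?specimenLabel.",
            "}",
            "ORDER BY ?specimenLabel",
            "LIMIT 10");

        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(queryRequest(endpoint, query), HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode());
        System.out.println(resp.body()); // JSON with "head" (variables) and "results" (bindings)
    }
}
```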
Let's see an application

Let's delve into the web app
Open demo/webapp (eg, in Eclipse)
SPARQL invocation from Jena, look at:
ConditionSearch.java
SemWebUtils.java
Look at the queries in main/resources/sparql
in particular, the federated queries
Look at the JSPs. Note it's a simple MVC application backed by a triple store,
not much different from client+server+DBMS
Back to our conversion task
TARQL (github.com/cygri/tarql)

CSV->RDF based on SPARQL
<http://rdf.ebi.ac.uk/demo/sample/3CB6B>
rdf:type obo:OBI_0000747 ;
rdfs:label
"Human Sample from Experimental Data Set 1, ID #3CB6B" ;
atlasterms:dbXref <http://purl.uniprot.org/uniprot/Q9UKT5> ;
atlasterms:dbXref <http://purl.uniprot.org/uniprot/Q9BV07> ;
atlasterms:hasFactorValue
<http://www.ebi.ac.uk/efo/EFO_0001071>.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>

CONSTRUCT {
?sample rdf:type obo:OBI_0000747. # Material Sample
?sample atlasterms:dbXref ?uniprotUri.
?sample atlasterms:hasFactorValue ?diseaseTerm.
?sample rdfs:label ?sampleLabel.

?uniprotUri rdfs:label ?UniProt.
}
WHERE {
BIND ( URI ( CONCAT ( 'http://purl.uniprot.org/uniprot/', ?UniProt ) ) AS ?uniprotUri )
BIND ( URI ( CONCAT ( 'http://www.ebi.ac.uk/efo/', ?EFO ) ) AS ?diseaseTerm )
BIND ( URI ( CONCAT ( 'http://rdf.ebi.ac.uk/demo/sample/', ?Sample ) ) AS ?sample )
BIND ( CONCAT ( 'Human Sample from Experimental Data Set 1, ID #', ?Sample ) AS ?sampleLabel )
}
OFFSET 1
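A rough plain-Java imitation of the TARQL mapping above, meant to show the idea rather than TARQL's internals: each CSV row becomes bindings named after the header columns, the BIND expressions build URIs, and the CONSTRUCT template is instantiated per row, with duplicate triples collapsing. Output is N-Triples; CSV parsing is naive (no quoted commas) and the sample rows are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Rough imitation of a TARQL CSV-to-RDF mapping, emitting N-Triples lines. */
public class CsvToNTriples {

    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";
    static final String RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>";
    static final String DB_XREF = "<http://rdf.ebi.ac.uk/terms/atlas/dbXref>";
    static final String HAS_FACTOR = "<http://rdf.ebi.ac.uk/terms/atlas/hasFactorValue>";
    static final String MATERIAL_SAMPLE = "<http://purl.obolibrary.org/obo/OBI_0000747>";

    static String uri(String s) { return "<" + s + ">"; }

    static String literal(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** First line = header (column names become variable names); naive split, no quoted commas. */
    public static Set<String> convert(List<String> csvLines) {
        String[] header = csvLines.get(0).split(",", -1);
        Set<String> triples = new LinkedHashSet<>(); // as with CONSTRUCT, duplicates collapse
        for (String line : csvLines.subList(1, csvLines.size())) {
            String[] cells = line.split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++)
                row.put(header[i].trim(), i < cells.length ? cells[i].trim() : "");

            // The BIND ( URI ( CONCAT ( ... ) ) ) part
            String sample = uri("http://rdf.ebi.ac.uk/demo/sample/" + row.get("Sample"));
            String uniprot = uri("http://purl.uniprot.org/uniprot/" + row.get("UniProt"));
            String disease = uri("http://www.ebi.ac.uk/efo/" + row.get("EFO"));
            String label = "Human Sample from Experimental Data Set 1, ID #" + row.get("Sample");

            // The CONSTRUCT template, instantiated for this row
            triples.add(sample + " " + RDF_TYPE + " " + MATERIAL_SAMPLE + " .");
            triples.add(sample + " " + DB_XREF + " " + uniprot + " .");
            triples.add(sample + " " + HAS_FACTOR + " " + disease + " .");
            triples.add(sample + " " + RDFS_LABEL + " " + literal(label) + " .");
            triples.add(uniprot + " " + RDFS_LABEL + " " + literal(row.get("UniProt")) + " .");
        }
        return triples;
    }

    public static void main(String[] args) {
        // Hypothetical rows; the column names are those the TARQL query expects
        List<String> csv = List.of(
            "Sample,UniProt,EFO",
            "3CB6B,Q9UKT5,EFO_0001071",
            "3CB6B,Q9BV07,EFO_0001071");
        convert(csv).forEach(System.out::println);
    }
}
```

With the two rows above, the sample's type, label and factor value are produced twice but kept once, which is how two CSV lines end up as the single Turtle resource shown earlier.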
data1.csv
We do it with www.ebi.ac.uk/fgpt/sw/lodestar
A few other tools
Jena Code
RDFization
http://refine.deri.ie/, for CSV, based on Open Refine
Various Java-mapping libs: http://simile.mit.edu/wiki/RDFizers
http://d2rq.org/, for mapping SQL tables/tuples
More listed here: http://tinyurl.com/qx2vfv3

UI building
Non RDF-specific: http://www.simile-widgets.org/exhibit3/
Semantic mashups: http://tinyurl.com/kpwelzg
Browse/edit knowledge: http://aksw.org/Projects/OntoWiki.html
Graph-based approach: http://code.google.com/p/relfinder/

Complete framework to publish and query RDF
http://bioinformatics.ua.pt/coeus/
Wrap-up
RDF and the SW, pros and cons
It's a very flexible and standard means to share and integrate knowledge
that's why Open Data are often published as LOD
It's no magic,
SW doesn't solve the interoperability problem, it just puts it on the table (F. van Harmelen)
It isn't a one-size-fits-all approach
n-ary relationships and context-referring statements are awkward to represent
similarly, XML schemas might be just enough sometimes (eg, micro-formats)
The ease of data integration is also a weak point
provenance is lost once you've merged two graphs. You need to manage this issue (eg, named graphs)
The Open World Assumption may be a problem, i.e., a missing property/link doesn't make data invalid
very hard to keep consistency (well, even in the old web you find 'chemical trails')
Performance is bad, you cannot have big data sets
used to be true, now try Virtuoso or Jena TDB
it is still true with advanced reasoning => OWL is complicated, also because of OWA
RDF and the SW, pros and cons
In summary
It's good for certain purposes, but
other approaches that are considered better might emerge in the future (e.g., MongoDB with JSON/JSON-LD documents)
Yet, the linked data principle is likely here to stay
Google Knowledge Graph
http://www.google.co.uk/insidesearch/features/search/knowledge.html
Facebook Social Graph
https://developers.facebook.com/docs/graph-api
Take-home message
Open Data are cool
Linked Open Data are even better
You might benefit from LOD
World might benefit from your LOD
Think about it
Let's talk about it
Marco Brandizi
www.marcobrandizi.info/mysite/about
and all of you!
From the web of documents...
The Semantic Web Languages
Schemas, the RDF-S vocabulary
OWL: more expressivity and grounding into (description) logics
Data RDF-ization
Exercise
Say this in RDF:
'Sample #3CB6'
is an instance of 'material sample' (obo:OBI_0000747)
has a label (rdfs:label)
is associated to proteins identified by Q9UKT5 and Q9BV07
is associated with the condition 'lung carcinoma' (EFO_0001071)

This presentation: http://tinyurl.com/lodtut14

A longer version: http://tiny.cc/lodman
source: Wikipedia
source: opendefinition.org
source: http://tinyurl.com/pcrdaj8
source: 5stardata.info
Background source: 5stardata.info
source: http://tinyurl.com/pmfnvo6
source: Wikipedia
Background source: www.fotopedia.com/items/flickr-71138081
(http://github.com/marco-brandizi/lod_tutorial_demo)