Transcript

The essentials about RDF

Editing RDF, Advanced

Encoding RDF

RDF, Limits

Literal attributes

Collections

RDF and the SW, pros and cons

  • A URI returns a document, containing statements about the thing behind that URI
  • So, what does such a document look like? How do we create/serve it?
  • Is it XML?

dbpedia:Milan

rdfs:label "Milan", "Milano"@it;

dbpedia-owl:elevation "120"^^xsd:float;

ex:location "45.465454,9.186516"^^ex:latlong;

dbpedia-owl:leaderName dbpedia:Giuliano_Pisapia.

dbpedia:Giuliano_Pisapia

dbpedia-owl:birthDate "1949-05-20"^^xsd:date.

  • Uses XML-schema data types (http://www.w3.org/TR/xmlschema-2/)
  • (advanced) You can have custom data types (ex:latlong), details here: http://tinyurl.com/o2yfe5v
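
As a rough illustration of what a datatype adds to a literal, the sketch below maps datatype IRIs to Python converters. The `to_python` helper, the `CONVERTERS` table and the `http://example.org/` namespace used for `ex:latlong` are assumptions made up for this example; they are not part of any RDF library.

```python
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical namespace for the custom ex:latlong type


def parse_latlong(lexical):
    """Parse a 'lat,long' lexical form into a pair of floats."""
    lat, lon = lexical.split(",")
    return float(lat), float(lon)


# Map datatype IRIs to functions that turn a lexical form into a Python value
CONVERTERS = {
    XSD + "float": float,
    XSD + "date": date.fromisoformat,
    EX + "latlong": parse_latlong,
}


def to_python(lexical, datatype=None):
    """Convert a literal's lexical form; unknown or missing types stay strings."""
    return CONVERTERS.get(datatype, str)(lexical)


elevation = to_python("120", XSD + "float")
birth = to_python("1949-05-20", XSD + "date")
location = to_python("45.465454,9.186516", EX + "latlong")
label = to_python("Milan")
```

A custom type such as `ex:latlong` is only useful to clients that know how to interpret it, which is why the fallback here leaves the value as a plain string.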

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.

dbpedia:The_Wachowskis a rdf:Bag;

rdf:_1 "Andy Wachowski"; rdf:_2 "Lana Wachowski".

ex:The_Matrix_Trilogy a rdf:Seq ;

rdf:_1 dbpedia:The_Matrix ;

rdf:_2 dbpedia:The_Matrix_Reloaded;

rdf:_3 dbpedia:The_Matrix_Revolutions.

  • Unfortunately, they cannot be 'closed'. We might be able to live with it

@prefix s: <http://example.org/students/vocab#> .

@prefix sl: <http://example.org/students/> .

@prefix c: <http://example.org/courses/> .

c:6.001 s:students ( sl:Amy sl:Mohamed sl:Johann ) .
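
The parenthesised list above is shorthand for a chain of `rdf:first` / `rdf:rest` statements ending in `rdf:nil`. The sketch below spells that chain out with plain Python tuples; the blank node labels (`_:l0`, `_:l1`, ...) and the helper names are invented for this example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_to_triples(items):
    """Return (head, triples) for an RDF collection holding the given items."""
    if not items:
        return RDF + "nil", []
    nodes = [f"_:l{i}" for i in range(len(items))]  # hypothetical blank node labels
    triples = []
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", item))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


def members(head, triples):
    """Walk the chain from the head to rdf:nil, one hop per member."""
    index = {(s, p): o for s, p, o in triples}
    found = []
    while head != RDF + "nil":
        found.append(index[(head, RDF + "first")])
        head = index[(head, RDF + "rest")]
    return found


head, triples = list_to_triples(["sl:Amy", "sl:Mohamed", "sl:Johann"])
```

The final `rdf:nil` is what makes the list closed, and the hop-by-hop walk in `members` is why looking up one item costs a traversal of the chain.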

  • It's closed, but slow when searching for an item (engines can do the conversion)
  • OWL offers the 'oneOf' alternative (more later)
  • Generally speaking, they're not used so often (more specific membership preferred)
  • It's a very flexible and standard means to share and integrate knowledge
  • That's why they make OD available as LOD
  • Might not be good in other contexts
  • e.g., Feeds and Atom/RSS (without text mining), are fine with a simpler XML schema
  • It's no magic, SW doesn't solve the interoperability problem, it just puts it on the table (F. van Harmelen)
  • Supports only binary relationships (n-ary ones need extra modelling)
  • time-dependent and other context-referring data are problematic
  • Easy data integration is also a weak point
  • provenance is lost once you've merged two graphs
  • Open World Assumption may be a problem, see the collection examples above, more later, about OWL
  • other models might emerge in future that are better on this (e.g., MongoDB with JSON/JSON-LD documents)
  • Performance is bad, you cannot have big data sets
  • used to be true, now try Virtuoso or Jena TDB
  • Tools are immature, SW smells of academia, data sets and projects come and go
  • the state of the art is better than a few years ago but, alas, this is still true
  • Difficult to learn, syntax is hard
  • I don't think so, but JSON-LD might be preferred in future
  • More to come about OWL...

So, let's go for the simpler one

URIs, or: Things IDs, Valid in the whole Universe (almost)

Sources: Tommaso di Noia,

http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#containers

URIs

Source: http://tinyurl.com/qdueje8

Blank Nodes

Reification

  • URLs allow you to identify and locate web resources:

http://www.example.com/path/to/resource

  • anywhere
  • in a standardised way
  • URIs are a generalisation, which embeds URNs too
  • e.g., an ISBN is both a URN and a URI (details on wikipedia:Uniform_resource_locator)
  • Most times they're just web-based URLs (safe assumption here)
  • IRIs are even more general: they add support for internationalisation
  • Just to let you know

  • Predicates in RDF statements are URIs

http://dbpedia.org/page/Luigi_Galvani

http://dbpedia.org/ontology/birthPlace

http://dbpedia.org/page/Bologna

  • Allows you to universally identify properties too
  • in a standard way
  • in a resolvable way

(i.e., to know about the property, see next slide)

Encoding RDF in Turtle

  • Not a lot of support available
  • Data set size problems
  • Named graphs and quads are similar,
  • but not the same (no nesting, aimed at whole data sets)
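
A rough sketch of the idea behind quads: every triple carries the name of the graph it belongs to, so merging data sets need not discard where a statement came from. The data set contents and the graph names (`ex:graph1`, `ex:graph2`) are invented for this example.

```python
# Two hypothetical data sets, each a set of (subject, predicate, object) triples
ds1 = {("ex:sample1", "ex:detected-gene", "P53")}
ds2 = {("ex:sample1", "ex:detected-gene", "BRCA1")}

# Plain merge: a set union, nothing records where a triple came from
merged = ds1 | ds2

# Quads: attach the graph name as a fourth element before merging
quads = {(s, p, o, "ex:graph1") for s, p, o in ds1} | {
    (s, p, o, "ex:graph2") for s, p, o in ds2
}


def sources(triple):
    """Return the names of the graphs that contain the given triple."""
    return {g for s, p, o, g in quads if (s, p, o) == triple}
```

Note that the graph name applies to a whole set of statements; it does not nest inside individual statements.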

<http://www.w3.org/People/Berners-Lee> contact:office _:bn-off .

_:bn-off

contact:address _:bn-add ;

contact:phone <tel:+1-617-253-5702> .

_:bn-add

contact:city "Cambridge" ;

contact:country "USA" ;

contact:postalCode "02139" ;

contact:street "32 Vassar Street" .

  • Why should I learn it?
  • Good to explore data manually, prepare tests
  • SPARQL, the RDF query language, has a similar syntax (more later on)

Out of curiosity, N3, Turtle, N-Triples are similar (http://tinyurl.com/q35qdd6)

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.

@prefix ex: <http://example.com/>.

@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

ex:result-1898_1

a rdf:Statement;

rdf:subject ex:bio-sample-1898;

rdf:predicate ex:detected-gene;

rdf:object "P53".

# Now we can talk about the statement itself

ex:result-1898_1

ex:probability "0.98"^^xsd:double;

ex:part-of ex:experiment-2900.
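
To see what the reified form does and does not contain, the sketch below builds the same statements as plain Python tuples. The `reify` helper is made up for this example; the identifiers mirror the Turtle above.

```python
EX = "http://example.com/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(stmt_id, s, p, o):
    """Return the four triples that describe the statement (s, p, o)."""
    return {
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
    }


original = (EX + "bio-sample-1898", EX + "detected-gene", "P53")
graph = reify(EX + "result-1898_1", *original)

# A statement about the statement itself
graph.add((EX + "result-1898_1", EX + "probability", "0.98"))

# The described triple is not part of the graph unless it is added separately
asserted = original in graph
```

Here `asserted` is `False`: the graph talks about the statement without stating it.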

source: Tommaso Di Noia

When are they useful?

  • We don't care about identifying certain nodes
  • Collections, like lists and bags (more later)
  • To specify graph templates in the SPARQL query language (more later)
  • Schema/ontology languages, to define expressions (more later)

<http://dbpedia.org/resource/The_Matrix>

<http://dbpedia.org/ontology/starring>

<http://dbpedia.org/resource/Keanu_Reeves> .

<http://dbpedia.org/resource/The_Matrix>

<http://www.w3.org/2000/01/rdf-schema#label> "The Matrix".

<http://dbpedia.org/resource/Keanu_Reeves>

<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>

<http://dbpedia.org/class/yago/Actor109765278>.

Yes, it sucks!

DOES NOT assert the statement

It's not so different from Turtle (TTL)

Blank Nodes, Compact Turtle Syntax

Might attract web geeks and the like, boost "LODization"

Prefixes are arbitrary and scoped within the "document", though there are common ones

Encoding RDF in Turtle: Namespaces

MongoDB and the like might be useful for LOD

URIs should be resolvable

<http://www.w3.org/People/Berners-Lee>

contact:office [

contact:phone <tel:+1-617-253-5702>;

contact:address [

contact:city "Cambridge" ;

contact:country "USA" ;

contact:postalCode "02139" ;

contact:street "32 Vassar Street"

]

].

dbp: is a shorthand for

http://dbpedia.org/resource/
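
Prefix expansion is plain string concatenation, as this small sketch suggests. The `expand` function and the `PREFIXES` table are illustrative only; a real Turtle parser also handles escapes, the empty prefix and relative IRIs.

```python
# Prefix table mirroring the @prefix declarations used in these slides
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "dbp-owl": "http://dbpedia.org/ontology/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(name):
    """Expand a prefixed name such as dbp:The_Matrix into a full IRI."""
    if name == "a":  # Turtle keyword, shorthand for rdf:type
        return PREFIXES["rdf"] + "type"
    prefix, _, local = name.partition(":")
    return PREFIXES[prefix] + local


matrix = expand("dbp:The_Matrix")
type_iri = expand("a")
```

Since the table is local to the document, the same prefix may stand for a different namespace elsewhere; only the expanded IRI identifies the resource.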

_:1

You can still use explicit forms

@prefix dbp: <http://dbpedia.org/resource/>.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.

@prefix dbp-owl: <http://dbpedia.org/ontology/>.

dbp:The_Matrix dbp-owl:starring dbp:Keanu_Reeves.

dbp:The_Matrix dbp-owl:runtime 8160.

dbp:The_Matrix rdfs:label "The Matrix".

dbp:Keanu_Reeves rdf:type

<http://dbpedia.org/class/yago/Actor109765278>.

dbp:The_Matrix a http://dbpedia.org/ontology/Film

  • So that useful stuff can be provided and discovered
  • In the LOD/SW context, the ideal is: RDF statements about the entity the URI is about, ex:
  • http://dbpedia.org/page/Bologna
  • Even better, should resolve differently, depending on the client requirements
  • i.e., Content negotiation (wikipedia:Content_negotiation), ex:

_:2

Most software defines internal identifiers

No <>, no "", no number or other scalar => it's invalid

Conventions, desiderata, issues about URIs

source: Tommaso di Noia

http://rdf.ebi.ac.uk/resource/biosamples/sample/SAMEA1904958

Browser -> HTML

curl --location-trusted -H "Accept: application/rdf+xml" \

'http://rdf.ebi.ac.uk/resource/biosamples/sample/SAMEA1904958'

-> RDF in XML format
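
The same request can be prepared from Python's standard library. This is only a sketch: it builds the request without sending it, and whether the server still answers with RDF/XML at this address is not checked here. The `rdf_request` helper name is made up for this example.

```python
from urllib.request import Request

URL = "http://rdf.ebi.ac.uk/resource/biosamples/sample/SAMEA1904958"


def rdf_request(url, media_type="application/rdf+xml"):
    """Build (but do not send) a GET request asking for an RDF representation."""
    return Request(url, headers={"Accept": media_type})


req = rdf_request(URL)
# To actually fetch the document: urllib.request.urlopen(req).read()
```

Changing `media_type` (for instance to `text/html`) is all it takes to ask the same URI for a different representation.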

x a Y, a is equivalent to rdf:type, which is equivalent to...

Blank Nodes, issues

  • URIs should be stable, well, as much as possible
  • Strict: as a single bit changes, make a new URI (e.g., with a new version)
  • Use the Provenance Ontology to link versions
  • More relaxed: new URI only if something substantial changes
  • Variant: prevent migration issues by means of neutral URIs (e.g., purl.org)
  • Yet, we still live in the real world: map with owl:sameAs
  • (and enjoy sameas.org or identifiers.org)
  • Variant: should it be http://dbpedia.org/data/Gene or /concept64635421?
  • :-) human-readable :-( hard to change, if/when needed

  • What should a URI resolve to, really? e.g.,
  • http://dbpedia.org/page/Bologna

http://dbpedia.org/ontology/wikiPageID

2106933

  • What?! Is the URI about the town, or the data?
  • not relevant in many cases
  • different solutions available, including "don't resolve", HTTP-forward
  • In my opinion, an example of SW madness (http://tinyurl.com/pzwn4mx)

In the LOD context, Open+Linked+Blank Nodes don't play very well together

what if I want to refer to the address of TBL in data set 1, from DS2?

many LOD projects don't use them (e.g., Bio2RDF, DBpedia)

Ambiguity and performance issues in RDF data querying (http://goo.gl/hZwldX):

Every good computer language has comments, which are often used too little.

Encoding RDF in Turtle, Compact forms

Returns {}

Makes sense, but RDF doesn't specify that different blank nodes should have different identifiers

SELECT DISTINCT ?x ?y

WHERE { 

?x :has-child ?xc.

?y :has-child ?yc.

FILTER ( ?xc != ?yc )

}

:John :has-child [ :name "Lucy" ].

:Beth :has-child [ :name "Lucy" ].

:John :married-to :Beth.

Typically translated to (for practical/performance-related reasons):

:John :has-child _:b1.

:Beth :has-child _:b2.

...
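
Assuming the two anonymous children were given the distinct labels `_:b1` and `_:b2` as in the translation above, a naive evaluation of the query can be mimicked with plain Python tuples. This is not a SPARQL engine, only a sketch of the join and the filter; the comparison looks at node labels, not at the children's names.

```python
# The data after blank nodes have been given engine-internal labels
triples = [
    (":John", ":has-child", "_:b1"),
    (":Beth", ":has-child", "_:b2"),
    ("_:b1", ":name", "Lucy"),
    ("_:b2", ":name", "Lucy"),
    (":John", ":married-to", ":Beth"),
]

# Bindings for the pattern  ?parent :has-child ?child
children = [(s, o) for s, p, o in triples if p == ":has-child"]

# Join the two :has-child patterns and apply FILTER(?xc != ?yc);
# the set comprehension plays the role of DISTINCT
results = sorted(
    {(x, y) for x, xc in children for y, yc in children if xc != yc}
)
```

With distinct labels the filter holds, so `results` pairs John with Beth (and Beth with John) even though both children are named "Lucy".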

# Namespaces omitted, imagine they're here

# Statements about the same subject

dbp:The_Matrix

dbp-owl:starring dbp:Keanu_Reeves;

rdfs:label "The Matrix";

dbp-owl:runtime 8160.

# Same subject and same predicate

dbp:Keanu_Reeves

a yago:Actor109765278;

rdfs:label "Ривз, Киану"@ru, "Keanu Reeves".

Conclusion: Use them for collections, schemas/OWL, SPARQL, avoid when defining data

String values can have a language attached

The "," works for any kind of value