Linking Data For Real: How and Why The Current

by Robert Guralnick on 30 April 2015

Linking Data For Real: How and Why The Current
Situation Stinks Big Time

Walled Gardens Abound
identifiers
linked open data
The Darwin Core Triplet:

Institution Code: Collection Code: Catalog Number
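As a minimal sketch of the triplet described above, the three parts can be joined with colons. The names `Triplet` and `make_triplet` and the sample values are illustrative assumptions, not taken from the presentation or from any particular library.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """Hypothetical container for the three parts of a Darwin Core Triplet."""
    institution_code: str
    collection_code: str
    catalog_number: str

    def __str__(self) -> str:
        # Join the parts in the order the slide gives them.
        return ":".join(self)


def make_triplet(institution_code: str, collection_code: str,
                 catalog_number: str) -> Triplet:
    """Build a triplet, trimming whitespace and rejecting empty parts."""
    parts = [p.strip() for p in (institution_code, collection_code, catalog_number)]
    if not all(parts):
        raise ValueError("all three parts of the triplet must be non-empty")
    return Triplet(*parts)


# Made-up example values, for illustration only.
example = make_triplet(" MVZ", "Mamm", "12345 ")
print(example)
```

Rejecting empty parts is a design choice of this sketch; real records often lack a collection code, which is exactly the case the relaxed matching later in the talk addresses.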
BOLD - Museum ID

specimen_voucher=
Definition: identifier for the specimen from which the nucleic acid sequenced was obtained.
Value format: /specimen_voucher="[<institution-code>:[<collection-code>:]]<specimen_id>"
GenBank INSDC
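The value format above makes the institution code and collection code optional, so a voucher string may have one, two, or three colon-separated parts. The following is a rough sketch of a parser for that format; `Voucher` and `parse_specimen_voucher` are hypothetical names, and the sketch assumes that any extra colons belong to the specimen ID.

```python
from typing import NamedTuple, Optional


class Voucher(NamedTuple):
    """Hypothetical parsed form of a /specimen_voucher value."""
    institution_code: Optional[str]
    collection_code: Optional[str]
    specimen_id: str


def parse_specimen_voucher(value: str) -> Voucher:
    """Parse "[<institution-code>:[<collection-code>:]]<specimen_id>"."""
    # Split at most twice; assumption: further colons are part of the specimen ID.
    parts = [p.strip() for p in value.split(":", 2)]
    if len(parts) == 3:
        return Voucher(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        # Institution code present, collection code omitted.
        return Voucher(parts[0], None, parts[1])
    # Bare specimen ID with no institution or collection code.
    return Voucher(None, None, parts[0])


# Made-up example values, for illustration only.
for raw in ("MVZ:Mamm:12345", "MVZ:12345", "12345"):
    print(raw, "->", parse_specimen_voucher(raw))
```

Only the three-part form carries a full Darwin Core Triplet; the other two forms are valid vouchers but cannot be matched on all three parts.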
Darwin Core Occurrence ID
What is happening in the wild?
Let's Find Out!
Triplify VertNet (VN) data
Assemble GenBank, BOLD, and Morphbank records for institutions in VN
Query for shared identifiers
Repeat the query under more and less stringent conditions (e.g., allowing matches for records missing a collection code)
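The steps above can be sketched as a join on identifier parts, run once strictly and once with a relaxed rule. This is only an illustration under stated assumptions: the record layout (dicts with `institution`, `collection`, and `catalog` keys), the function `find_matches`, and the sample records are all hypothetical and are not the pipeline used in the presentation.

```python
from collections import defaultdict


def find_matches(left, right, allow_missing_collection=False):
    """Pair records from two repositories that share an identifier.

    Strict mode requires all three triplet parts to agree. The relaxed mode
    also accepts pairs where either record lacks a collection code.
    """
    # Index the right-hand records by the two parts every match must share.
    index = defaultdict(list)
    for rec in right:
        index[(rec["institution"], rec["catalog"])].append(rec)

    matches = []
    for l in left:
        for r in index.get((l["institution"], l["catalog"]), []):
            same = l["collection"] is not None and l["collection"] == r["collection"]
            missing = l["collection"] is None or r["collection"] is None
            if same or (allow_missing_collection and missing):
                matches.append((l, r))
    return matches


# Made-up records, for illustration only.
vertnet = [
    {"institution": "MVZ", "collection": "Mamm", "catalog": "1"},
    {"institution": "MVZ", "collection": "Bird", "catalog": "2"},
]
genbank = [
    {"institution": "MVZ", "collection": "Mamm", "catalog": "1"},
    {"institution": "MVZ", "collection": None, "catalog": "2"},
]

print(len(find_matches(vertnet, genbank)))
print(len(find_matches(vertnet, genbank, allow_missing_collection=True)))
```

Relaxing the rule finds more pairs but also raises the risk of false matches, since two collections at one institution can reuse the same catalog number.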


The Bad and Ugly


So What Does This Have To Say About Semantics?

Lack of curation and validation when entering data is a HUGE problem (e.g., problems propagate).
CONCLUSIONS:

1. This whole thing was a lot bloody harder to do than it should be.
2. This exercise was set up to help find matches, and the rate was still abysmal.
3. And when we did, most of those matches were neither canonical nor unique. Bummer.
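One way to put numbers on the third conclusion is to count how many identifiers in a repository are well formed and how many occur exactly once. The sketch below assumes that "canonical" means exactly three non-empty colon-separated parts; that definition, the function names, and the sample identifiers are assumptions made here for illustration, not the presentation's own criteria.

```python
from collections import Counter


def is_canonical(identifier: str) -> bool:
    """Assumed rule: a canonical triplet has exactly three non-empty parts."""
    parts = identifier.split(":")
    return len(parts) == 3 and all(p.strip() for p in parts)


def summarize(identifiers):
    """Count distinct, canonical, and unique identifiers in one repository."""
    counts = Counter(identifiers)
    return {
        "distinct": len(counts),
        # Distinct identifiers that satisfy the assumed canonical form.
        "canonical": sum(1 for ident in counts if is_canonical(ident)),
        # Distinct identifiers attached to exactly one record.
        "unique": sum(1 for n in counts.values() if n == 1),
    }


# Made-up identifiers, for illustration only.
sample = ["MVZ:Mamm:1", "MVZ:Mamm:1", "MVZ:2", "KU:Herp:3"]
print(summarize(sample))
```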
A schematic representation showing proportional numbers of Darwin Core Triplets, represented as differently sized ellipses, across repositories and the overlap between them.