Linking Data For Real: How and Why The Current Situation Stinks Big Time
Walled Gardens Abound
Linked Open Data
The Darwin Core Triplet:
Institution Code: Collection Code: Catalog Number
BOLD: Museum ID
GenBank: /specimen_voucher qualifier
Definition: identifier for the specimen from which the nucleic acid sequenced was obtained.
Value format: /specimen_voucher="[<institution-code>:[<collection-code>:]]<specimen_id>"
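Because the institution and collection codes are optional in this format, the same specimen can surface as a full triplet, as institution:ID, or as a bare catalog number. Below is a minimal Python sketch of splitting such a value back into Darwin Core Triplet parts. The function and type names are hypothetical, and the sketch assumes the first two colons are separators and keeps everything after them as the specimen ID, so it does not resolve the ambiguity of a specimen ID that itself contains a colon.

```python
from typing import NamedTuple, Optional


class VoucherParts(NamedTuple):
    """Darwin Core Triplet parts recovered from a specimen_voucher value."""
    institution_code: Optional[str]  # None when the value omits it
    collection_code: Optional[str]   # None when the value omits it
    specimen_id: str


def parse_specimen_voucher(value: str) -> VoucherParts:
    """Split '[<institution-code>:[<collection-code>:]]<specimen_id>' into parts.

    Hypothetical helper. Assumes the first two colons are separators and
    keeps everything after them as the specimen ID.
    """
    value = value.strip()
    prefix = "/specimen_voucher="
    if value.startswith(prefix):  # accept the full qualifier line too
        value = value[len(prefix):]
    value = value.strip('"')

    parts = value.split(":", 2)
    if len(parts) == 3:  # institution:collection:id
        return VoucherParts(parts[0], parts[1], parts[2])
    if len(parts) == 2:  # institution:id
        return VoucherParts(parts[0], None, parts[1])
    return VoucherParts(None, None, parts[0])  # bare specimen ID


# Hypothetical example values, not real records.
print(parse_specimen_voucher('/specimen_voucher="ABC:Herp:12345"'))
print(parse_specimen_voucher("ABC:12345"))
print(parse_specimen_voucher("12345"))
```

A missing collection code comes back as None rather than an empty string, so any downstream matching has to decide explicitly how to treat it.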
Darwin Core Occurrence ID
What is happening in the wild?
Let's Find Out!
Triplify VertNet (VN) data
Assemble GenBank, BOLD, and Morphbank records for institutions in VN
Query for shared identifiers
Do so under more and less stringent conditions (e.g., allowing matches for records missing a collection code)
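The steps above can be sketched roughly as follows. This is a minimal Python illustration, not the pipeline actually used for this analysis: records are plain dicts keyed by the Darwin Core terms institutionCode, collectionCode, and catalogNumber; the record IDs and sample values are made up; and upper-casing codes before comparison is an assumed normalization. Strict mode requires all three parts of the triplet; relaxed mode ignores the collection code, which is what lets records missing one still match.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

DWC = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace
Record = Dict[str, Optional[str]]
Triple = Tuple[str, str, str]


def triplify(record_id: str, record: Record) -> List[Triple]:
    """Turn one flat record into (subject, predicate, object) triples, skipping empty fields."""
    return [(record_id, DWC + term, value) for term, value in record.items() if value]


def match_key(record: Record, strict: bool) -> Optional[Tuple[str, ...]]:
    """Build the comparison key, or None if the record cannot take part in a match."""
    inst = (record.get("institutionCode") or "").strip().upper()
    coll = (record.get("collectionCode") or "").strip().upper()
    cat = (record.get("catalogNumber") or "").strip()
    if not inst or not cat:
        return None
    if strict:
        # Full triplet required: institution, collection, and catalog number.
        return (inst, coll, cat) if coll else None
    # Relaxed: ignore the collection code entirely.
    return (inst, cat)


def shared_identifiers(vertnet: Dict[str, Record], external: Dict[str, Record],
                       strict: bool) -> List[Tuple[str, str]]:
    """Return (vertnet_id, external_id) pairs whose match keys are equal."""
    index: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for vn_id, rec in vertnet.items():
        key = match_key(rec, strict)
        if key is not None:
            index[key].append(vn_id)
    pairs = []
    for ext_id, rec in external.items():
        key = match_key(rec, strict)
        if key is not None and key in index:
            pairs.extend((vn_id, ext_id) for vn_id in index[key])
    return pairs


# Hypothetical sample records, not real data.
vertnet_records = {
    "vn:1": {"institutionCode": "ABC", "collectionCode": "Herp", "catalogNumber": "12345"},
    "vn:2": {"institutionCode": "ABC", "collectionCode": "Mamm", "catalogNumber": "12345"},
}
genbank_records = {
    "gb:1": {"institutionCode": "ABC", "collectionCode": "Herp", "catalogNumber": "12345"},
    "gb:2": {"institutionCode": "abc", "collectionCode": None, "catalogNumber": "12345"},
}

print(triplify("vn:1", vertnet_records["vn:1"]))
print(shared_identifiers(vertnet_records, genbank_records, strict=True))
print(shared_identifiers(vertnet_records, genbank_records, strict=False))
```

On this toy data, relaxed matching turns one strict match into four candidate pairs, because catalog number 12345 exists in two collections at the same institution. Loosening the conditions finds more matches at the cost of ambiguity.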
The Bad and Ugly
So What Does This Have To Say About Semantics?
Lack of curation and validation when entering data is a HUGE problem (errors propagate)
1. This whole thing was a bloody lot harder to do than it should have been.
2. This exercise was set up to help find matches, and the match rate was still abysmal.
3. And when we did find matches, most were neither canonical nor unique. Bummer.
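One way to quantify points 2 and 3 is to classify each matched record. In the hypothetical Python sketch below, a match counts as unique when an external record pairs with exactly one VertNet record, and as canonical when the two identifier strings agree exactly as entered, before any normalization. Both definitions are assumptions for illustration, as are the IDs and values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def classify_matches(matches: List[Tuple[str, str]],
                     raw_ids: Dict[str, str]) -> Dict[str, int]:
    """Summarize (vertnet_id, external_id) match pairs.

    unique: the external record pairs with exactly one VertNet record.
    unique_and_canonical: additionally, both identifier strings are identical
    as entered (an assumed definition of "canonical").
    """
    candidates: Dict[str, List[str]] = defaultdict(list)
    for vn_id, ext_id in matches:
        candidates[ext_id].append(vn_id)

    summary = {"matched": len(candidates), "unique": 0, "unique_and_canonical": 0}
    for ext_id, vn_ids in candidates.items():
        if len(vn_ids) != 1:
            continue  # ambiguous: several VertNet records share this key
        summary["unique"] += 1
        ext_raw = raw_ids.get(ext_id)
        vn_raw = raw_ids.get(vn_ids[0])
        if ext_raw is not None and ext_raw == vn_raw:
            summary["unique_and_canonical"] += 1
    return summary


# Hypothetical match pairs and identifier strings, not real results.
matches = [("vn:1", "gb:1"), ("vn:1", "gb:2"), ("vn:2", "gb:2"), ("vn:3", "gb:3")]
raw_ids = {
    "vn:1": "ABC:Herp:12345", "vn:2": "ABC:Mamm:12345", "vn:3": "ABC:Herp:999",
    "gb:1": "ABC:Herp:12345", "gb:2": "ABC:12345", "gb:3": "abc:herp:999",
}
print(classify_matches(matches, raw_ids))
```

Here gb:2 is ambiguous (two candidates) and gb:3 only matches after case normalization, so just one of the three matched records is both unique and canonical.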
A schematic representation showing the proportional numbers of Darwin Core Triplets (represented as different-sized ellipses) across repositories and the overlap between them.