Hunting for semantic clusters in Europeana

OCLC EMEA Regional Council 2013

Shenghui Wang

on 8 April 2013


Transcript of Hunting for semantic clusters in Europeana

Hunting for Semantic Clusters

How can we find interesting stuff in over 22 million Europeana objects?

Europeana Innovation Pilots
1. Connecting as many objects (books, films, paintings, etc.) as possible to the resources of the Virtual International Authority File (VIAF)

OCLC Research: Two-step approach
1. Cluster records into small clusters

Findings

On the clusters
Clusters are generally good but are limited to close relationships

On the data used for the research
Quality issues in the data
Standards are interpreted differently by providers despite the presence of guidelines
The creation of digital objects is not always in line with the creation of descriptive metadata

The logical structure of cultural heritage objects is not always reflected in the metadata.

Applying the types of relations available in EDM to the types of clusters found during the experiment.

Findings from the pilot could feed into best practice guides for content providers and thereby improve the quality of the whole Europeana dataset.

Shenghui Wang
OCLC Research
Leiden, The Netherlands

Duplicates
Same objects, different providers
Same page digitized 3 times: duplicates?

Aggregation of metadata from heterogeneous collections leads to data quality issues
Large-scale aggregation also brings opportunities for data enrichment and enhancement

2. Connect related Europeana records
Detect duplicates or near-duplicates
Identify and create semantic links between objects that are related
a painting and photographs of that painting
all digitized pages of the same book
a collection of letters that belong to the same person.
different editions of one book

The Europeana case is quite different from many library-focused ones
Persons are referred to in the simple ESE (Europeana Semantic Elements) metadata
There is no direct linking, for example, via a reference to an authority number used at a national library.

The pilot would allow an improvement of the enrichment process in Europeana.

Current situation in Europeana

A fast clustering method which clusters 23.6 million records in 4 minutes
Genetic algorithm to automatically select important metadata for more meaningful clusters, such as
all pages of the same book
all postcards sent by one person
Different similarity thresholds for a hierarchical way of exploring records

2. Categorize clusters and identify semantic links between records

Europeana
Digitized content of Europe's galleries, libraries, museums, archives and audiovisual collections.

Over 22 million books, films, paintings, museum objects and archival documents from some 2,200 content providers.

http://thoth.pica.nl/eu/results_en/level40/40_8251.html
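
The idea of using different similarity thresholds for a hierarchical way of exploring records, mentioned above, can be illustrated with a small sketch. This is not the fast clustering method used in the pilot, which the slides do not describe in detail. It is a naive pairwise comparison, assuming Jaccard similarity on the words of a record's metadata and single-link merging. The function names and the four sample records are made up for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of tokens."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(records, threshold):
    """Group records whose pairwise similarity reaches the threshold.

    Single-link merging via union-find: if A is similar to B and B is
    similar to C, all three end up in the same cluster.
    Returns a sorted list of clusters, each a list of record indices.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tokens = [set(r.lower().split()) for r in records]
    # Naive all-pairs comparison: fine for a demo, far too slow for millions.
    for i, j in combinations(range(len(records)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical metadata strings, invented for this example.
RECORDS = [
    "Nachtwacht Rembrandt painting",
    "Nachtwacht Rembrandt painting photograph",
    "Nachtwacht Rembrandt photograph detail",
    "Postcard Leiden 1910",
]

strict = cluster(RECORDS, 0.7)  # [[0, 1], [2], [3]]
loose = cluster(RECORDS, 0.5)   # [[0, 1, 2], [3]]
```

A high threshold keeps only near-duplicates together, while a lower one also pulls in more loosely related records, so stepping through thresholds gives a coarse-to-fine view of the same data.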
Clusters
(Near-)Duplicates
Parts of the same CHO
Views of the same object
Derivative works
Thematic clusters or collections

Mutual benefits
Clustering innovation
OCLC internal data (Digital Collection Gateway, etc.)
Data services for third parties
Europeana data model
New browsing