Transcript
  • Aggregation of metadata from heterogeneous collections leads to data quality issues
  • Large scale aggregation also brings opportunities for data enrichment and enhancement

The Europeana case is quite different from many library-focused ones

  • Persons are referred to in the simple ESE (Europeana Semantic Elements) metadata
  • There is no direct linking, for example, via a reference to an authority number used at a national library.

The pilot would allow an improvement of the enrichment process in Europeana.

2. Connect related Europeana records

  • Detect duplicates or near-duplicates
  • Identify and create semantic links between objects that are related, for example:
      • a painting and photographs of that painting
      • all digitized pages of the same book
      • a collection of letters that belong to the same person
      • different editions of one book
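
As a rough illustration of near-duplicate detection, the sketch below compares record titles by the Jaccard similarity of their word sets. It is not the method used in the pilot: the record ids, the titles, the helper names and the 0.8 threshold are all assumptions chosen for the example. Comparing every pair is quadratic, so this form would not scale to millions of records.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase a metadata string and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(records, threshold=0.8):
    """Return pairs of record ids whose titles are at least `threshold` similar."""
    token_sets = {rid: tokens(title) for rid, title in records.items()}
    return [
        (r1, r2)
        for r1, r2 in combinations(sorted(token_sets), 2)
        if jaccard(token_sets[r1], token_sets[r2]) >= threshold
    ]


# Hypothetical records (id -> title), invented for this example.
records = {
    "a": "The Night Watch, painting by Rembrandt",
    "b": "The Night Watch painting by Rembrandt",
    "c": "Letter to Theo",
}
pairs = near_duplicates(records)
```

Here records "a" and "b" differ only in punctuation, so they are reported as a pair, while "c" shares no words with either.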

http://thoth.pica.nl/eu/results_en/level40/40_8251.html

2. Categorize clusters and identify semantic links between records

Duplicates

Findings

Same page digitized 3 times: duplicates?

Mutual benefits

OCLC internal data

(Digital Collection Gateway, etc.)

Europeana data model

On the clusters

  • Clusters are generally good but are limited to close relationships

On the data use for the research

  • Quality issues in the data
  • Standards are interpreted differently by providers despite the presence of guidelines
  • Creation of digital objects is not always in line with the creation of descriptive metadata

Logical structure of cultural heritage objects is not always reflected in the metadata.

Applying the types of relations available in EDM to the types of clusters found during the experiment.
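
One way to picture this step is a lookup from cluster type to an EDM property. The mapping below is an assumption made for illustration, not the result of the pilot: the cluster-type keys paraphrase the cluster types named elsewhere in this talk, and the choice of property for each one is a guess. The helper name `link` and the record ids are hypothetical.

```python
# Assumed mapping from cluster type to an EDM / Dublin Core property.
# These pairings are illustrative guesses, not findings of the pilot.
EDM_RELATION_BY_CLUSTER_TYPE = {
    "near_duplicate": "edm:isSimilarTo",
    "view_of_same_object": "edm:isRepresentationOf",
    "part_of_same_cho": "dcterms:isPartOf",
    "derivative_work": "edm:isDerivativeOf",
    "thematic_cluster": "edm:isRelatedTo",
}


def link(subject, cluster_type, target):
    """Build a (subject, property, object) triple for two clustered records."""
    prop = EDM_RELATION_BY_CLUSTER_TYPE[cluster_type]
    return (subject, prop, target)


# Hypothetical record ids: a photograph of a painting.
triple = link("record:photo-1", "view_of_same_object", "record:painting-1")
```

Keeping the mapping in a single table makes it easy to revise once the appropriate property for each cluster type has been agreed.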

Findings from the pilot could feed into best practice guides for content providers and thereby improve the quality of the whole Europeana dataset

Same objects, different providers

Clustering and enrichment

innovation

New browsing experiences

Data services for third parties

Digitized content of Europe's galleries, libraries, museums, archives and audiovisual collections.

Over 22 million books, films, paintings, museum objects and archival documents from some 2,200 content providers.

Hunting for Semantic Clusters

Europeana Innovation Pilots

How can we find interesting stuff in over 22 million Europeana objects?

1. Connect as many objects (books, films, paintings, etc.) as possible to the resources of the Virtual International Authority File (VIAF)

Shenghui Wang

OCLC Research

Leiden, The Netherlands

OCLC Research: Two-step approach

1. Cluster records into small clusters

  • A fast clustering method which clusters 23.6 million records in 4 minutes
  • Genetic algorithm to automatically select important metadata for more meaningful clusters, such as
      • all pages of the same book
      • all postcards sent by one person
  • Different similarity thresholds for a hierarchical way of exploring records
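
The idea of exploring records hierarchically through different similarity thresholds can be sketched as follows. This is a minimal single-link clustering with union-find, used here as a stand-in: the talk does not describe the fast clustering method itself, and the item names, similarity scores and thresholds are invented for the example.

```python
def cluster(similarities, items, threshold):
    """Single-link clustering: join items whose similarity is >= threshold."""
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarities.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical items and pairwise similarity scores.
items = ["page1", "page2", "page3", "postcard"]
similarities = {
    ("page1", "page2"): 0.95,
    ("page2", "page3"): 0.70,
    ("page3", "postcard"): 0.30,
}

# Lowering the threshold merges clusters, giving coarser levels of a hierarchy.
levels = {t: cluster(similarities, items, t) for t in (0.9, 0.6, 0.2)}
```

At the strictest threshold only the two closest pages are grouped; each lower threshold merges more records, so a user could start from tight clusters and widen the view step by step.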

Current situation in Europeana

Europeana

Clusters

(Near-)Duplicates

Thematic clusters or collections

Views of the same object

Parts of the same CHO

Derivative works
