OpenAIRE workflows

by

Andrea Mannocci

on 27 January 2017

Transcript of OpenAIRE workflows

Data sources
Institutional and thematic repositories
Open Access publishers and journals
Data archives (DataCite, BASE)
3rd party aggregator services
Entity registries (Corda, DOAJ, OpenDOAR, re3data)
CRIS (end of 2017)
The OpenAIRE workflows for data management
Claudio Atzori, Alessia Bardi, Paolo Manghi, Andrea Mannocci
Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” - CNR, Pisa, Italy.
13th Italian Research Conference on Digital Libraries
27 January 2017
Andrea Mannocci, PhD
andrea.mannocci@isti.cnr.it
The OpenAIRE project
Objectives
Create a pan-European network for defining guidelines on how to do research and science
Foster Open Access (OA) and Open Science across research communities
Assess the impact of OA and the return on investment (RoI), and provide statistics to the EU Commission
Offer the general public a centralized entry point to research and science outcomes
The OpenAIRE data model
Data model entities
Results (publications or datasets)
Persons (authors, project coordinators, etc.)
Organizations (research centres, universities, companies, etc.)
Funders (organizations responsible for funding schemes)
Projects
Data sources (publication repositories, dataset repositories, journals, publishers, etc.)
Information space population
Collected Information Packages are split into different entities and loaded into the OpenAIRE Information Space Graph (ISG)
Connections among different entities are maintained
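As an illustration only, the population step can be sketched in a few lines of Python. The package fields, identifier scheme, relation names and the InformationSpaceGraph class below are all assumptions made for the example; they are not OpenAIRE's actual schema or implementation.

```python
from collections import defaultdict

# Hypothetical, already-parsed information package (field names are assumptions).
package = {
    "id": "pub-1",
    "title": "A sample publication",
    "authors": ["Ada Lovelace", "Alan Turing"],
    "project": "proj-42",
    "datasource": "repo-7",
}


def split_package(pkg):
    """Split one package into entity nodes and the relationships linking them."""
    nodes = {pkg["id"]: {"type": "result", "title": pkg["title"]}}
    edges = []
    for name in pkg["authors"]:
        person_id = "person-" + name.lower().replace(" ", "-")
        nodes[person_id] = {"type": "person", "name": name}
        edges.append((pkg["id"], "hasAuthor", person_id))
    nodes[pkg["project"]] = {"type": "project"}
    edges.append((pkg["id"], "isProducedBy", pkg["project"]))
    nodes[pkg["datasource"]] = {"type": "datasource"}
    edges.append((pkg["id"], "isCollectedFrom", pkg["datasource"]))
    return nodes, edges


class InformationSpaceGraph:
    """Toy in-memory graph: nodes by identifier plus typed outgoing edges."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(list)

    def load(self, nodes, edges):
        self.nodes.update(nodes)
        for source, relation, target in edges:
            self.edges[source].append((relation, target))


isg = InformationSpaceGraph()
isg.load(*split_package(package))
```

Keeping the edges next to the nodes is what lets the connections among entities survive the split: the publication can still be traversed to its authors, project and data source.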
PDFs aggregation
Crawl and download the PDFs pointed to by literature metadata records
~5 million PDFs
Information packages
XML Dublin Core (DC) literature records
Proprietary formats
JSON files
CSV
XML
Deduplication of the Information Space
Applies to publications, organizations and authors
Identifies similar records and merges their information into one representative record
Produces an ActionSet: a set of equivalence relationships that will be applied to the ISG
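A minimal sketch of the merge step and of the resulting ActionSet follows. It assumes that two records are "similar" when their normalized titles are identical, which is far cruder than a real similarity function; the record fields, the "merges" relation name and the "dedup-" identifier prefix are hypothetical.

```python
from collections import defaultdict

# Hypothetical records; r1 and r2 describe the same publication.
records = [
    {"id": "r1", "title": "Open Access in Europe", "doi": None, "year": 2016},
    {"id": "r2", "title": "open access in europe.", "doi": "10.1234/example", "year": None},
    {"id": "r3", "title": "A different paper", "doi": None, "year": 2015},
]


def normalize(title):
    """Lowercase the title and drop everything that is not a letter or digit."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def deduplicate(records):
    """Group records by normalized title, merge each group, emit equivalences."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec["title"])].append(rec)

    representatives, action_set = [], []
    for group in groups.values():
        if len(group) < 2:
            continue  # a record with no duplicates needs no representative
        rep = {"id": "dedup-" + group[0]["id"]}
        for rec in group:
            # Fill each field with the first non-empty value found in the group.
            for field, value in rec.items():
                if field != "id" and rep.get(field) is None:
                    rep[field] = value
            action_set.append((rep["id"], "merges", rec["id"]))
        representatives.append(rep)
    return representatives, action_set


representatives, action_set = deduplicate(records)
```

The original records are left untouched: the ActionSet only records which records the representative stands for, so it can be applied to the ISG as a separate step.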
Enrichment of the Information Space
Load the downloaded PDFs
Extract full text from the PDFs (CERMINE)
Infer new knowledge by combining the most recent ISG available and the full texts
Generate a new ActionSet for enriching the ISG: links from publications to projects and datasets, subjects, citations, metadata enrichment, etc.
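One kind of inference, linking publications to projects, can be sketched as follows. This is an assumed approach for illustration, not the mining algorithm actually used: it scans already-extracted full texts for six-digit grant agreement numbers and keeps only those matching a project known in the ISG. The project codes, identifiers, regular expression and relation name are all hypothetical.

```python
import re

# Hypothetical projects already present in the ISG, keyed by grant code.
known_projects = {"123456": "proj-alpha", "654321": "proj-beta"}

# Hypothetical full texts, as they might look after PDF text extraction.
fulltexts = {
    "pub-1": "This work was funded under grant agreement no. 123456.",
    "pub-2": "No funding statement here; see page 654 for details.",
}

# Assumed acknowledgement phrasing; real funding statements vary widely.
GRANT_PATTERN = re.compile(
    r"grant agreement (?:no\.?|number)?\s*(\d{6})", re.IGNORECASE
)


def infer_project_links(fulltexts, known_projects):
    """Return an ActionSet of publication-to-project relationships."""
    action_set = []
    for pub_id, text in sorted(fulltexts.items()):
        for code in GRANT_PATTERN.findall(text):
            if code in known_projects:
                action_set.append((pub_id, "isProducedBy", known_projects[code]))
    return action_set


action_set = infer_project_links(fulltexts, known_projects)
```

Checking each candidate code against the projects in the ISG is the part that combines the two inputs: the full text supplies the evidence, the graph supplies the entities that evidence may refer to.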
Publication of the Information Space
The ISG is projected into four different backends serving four different use cases
Index for portal queries
Key-value database for statistics
Triple store for LOD export
Document store for bulk Dublin Core export
Two-step procedure: pre-public + public

Monitoring exhibited quality
Data consumers demand guarantees and consistency
Quality metrics are automatically extracted from the backends in order to assess that trends are respected over time and that different ISG projections are aligned
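The two checks can be sketched as below. The sketch assumes that the metrics are per-entity counts, that "aligned" means equal counts across backends, and that a "respected trend" means a count never drops by more than a tolerance between consecutive snapshots. The backend names, figures and the 5% tolerance are made up for the example.

```python
def check_alignment(counts_by_backend):
    """Return the entity types whose counts differ between backends."""
    misaligned = {}
    entity_types = set().union(*(c.keys() for c in counts_by_backend.values()))
    for entity in sorted(entity_types):
        values = {
            backend: counts.get(entity, 0)
            for backend, counts in counts_by_backend.items()
        }
        if len(set(values.values())) > 1:
            misaligned[entity] = values
    return misaligned


def check_trend(history, max_drop=0.05):
    """Flag consecutive snapshots where a metric falls by more than max_drop."""
    violations = []
    for previous, current in zip(history, history[1:]):
        if previous > 0 and (previous - current) / previous > max_drop:
            violations.append((previous, current))
    return violations


# Hypothetical counts extracted from two of the backends.
counts = {
    "index": {"publications": 1000, "projects": 50},
    "stats": {"publications": 1000, "projects": 48},
}
misaligned = check_alignment(counts)

# Hypothetical publication counts over four successive ISG snapshots.
violations = check_trend([900, 950, 1000, 700])
```

With these inputs the projects count is reported as misaligned between the two backends, and the fall from 1000 to 700 publications is reported as a broken trend.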
Algorithm steps
Candidates identification
Candidates matching
Graph disambiguation
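The transcript does not say which algorithm these steps belong to. Assuming they describe deduplication, the three steps might look like the sketch below. Candidate identification is done by blocking on a title prefix, candidate matching by a string-similarity threshold from Python's difflib, and graph disambiguation is interpreted here as grouping matched records into connected components. All three choices, the sample titles and the thresholds are assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records: identifier -> title.
titles = {
    "a": "Open Access in Europe",
    "b": "Open Acess in Europe",
    "c": "Open Access in Europe.",
    "d": "Quantum chromodynamics",
    "e": "Open Science policies",
}


def identify_candidates(titles, prefix_len=4):
    """Step 1: only records sharing a title prefix are compared (blocking)."""
    blocks = defaultdict(list)
    for rid, title in titles.items():
        blocks[title.lower()[:prefix_len]].append(rid)
    for block in blocks.values():
        yield from combinations(block, 2)


def match_candidates(titles, pairs, threshold=0.8):
    """Step 2: keep the candidate pairs whose titles are similar enough."""
    return [
        (x, y)
        for x, y in pairs
        if SequenceMatcher(None, titles[x].lower(), titles[y].lower()).ratio()
        >= threshold
    ]


def group_matches(ids, matches):
    """Step 3: connected components of the match graph, via union-find."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for x, y in matches:
        parent[find(x)] = find(y)
    groups = defaultdict(set)
    for i in ids:
        groups[find(i)].add(i)
    return sorted(sorted(group) for group in groups.values())


candidates = list(identify_candidates(titles))
matches = match_candidates(titles, candidates)
groups = group_matches(titles, matches)
```

Blocking keeps the number of comparisons well below the quadratic all-pairs case: record d is never compared with anything, while e is compared with a, b and c but rejected by the similarity threshold.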