The ARCOMEM system

The ARCOMEM system made easy.
by Dominik Frey on 18 February 2014

Transcript of The ARCOMEM system

Only content that matches the archivist's setup is preserved and exported as WARC files, including the social and semantic RDF annotations.
Web Archive
ARCOMEM has two use cases that drive the project. We have made a short movie to introduce you to the broadcaster use case:
CRAWLER COCKPIT
Archivists set up and control the event crawling process. What content should be archived?
crawler
Twitter Dynamics
API Crawler
queries the APIs of Twitter, Facebook, Google+, YouTube and Flickr for publicly accessible content and user details
http://pierre.senellart.com/publications/gouriten2012api.pdf
https://github.com/netiru/arcomem-apicrawler
https://github.com/netiru/apiblender
Adaptive Heritrix
open source crawler with a prioritization module that can be updated in real time as a service
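To illustrate what a prioritization module that can be updated in real time might involve, here is a minimal sketch of a URL frontier whose scores can be changed while URLs are still queued. It is not the Adaptive Heritrix code or API; the class name `PrioritizedFrontier`, its methods, and the example URLs and scores are all hypothetical.

```python
import heapq
import itertools


class PrioritizedFrontier:
    """Toy URL frontier whose priorities can be changed while crawling."""

    def __init__(self):
        self._heap = []                    # entries: [negated score, tie-breaker, url]
        self._entries = {}                 # url -> live heap entry
        self._counter = itertools.count()  # keeps insertion order for equal scores

    def set_priority(self, url, score):
        """Add a URL, or re-score one that is already queued."""
        if url in self._entries:
            self._entries[url][2] = None   # mark the old entry as stale
        entry = [-score, next(self._counter), url]
        self._entries[url] = entry
        heapq.heappush(self._heap, entry)

    def pop(self):
        """Return the queued URL with the highest score."""
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            if url is not None:            # skip stale entries
                del self._entries[url]
                return url
        raise KeyError("frontier is empty")


frontier = PrioritizedFrontier()
frontier.set_priority("http://example.org/a", 0.2)
frontier.set_priority("http://example.org/b", 0.5)
frontier.set_priority("http://example.org/a", 0.9)  # update arriving mid-crawl
order = [frontier.pop(), frontier.pop()]
```

Stale entries are left in the heap and skipped on `pop`, which keeps a re-score cheap; a real crawler would also need persistence, politeness rules, and concurrency control.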
Application Aware Helper
This module extracts links by taking application-specific functionality into account, e.g. WordPress or Twitter.
IMF Crawler
ARCOMEM (ARchive COmmunity MEMories) is an EU-funded research project about memory institutions like archives, museums, and libraries in the age of the Social Web.
At this level, the system selects and fetches the relevant web objects, based on the specification the archivist defined initially. Besides the traditional crawler and its decision modules, the crawling level includes important data cleaning, annotation, and extraction steps. Next we show you the crawling modules that we use in ARCOMEM.

read more
http://de.slideshare.net/arcomem/arcomem-training-specifyingcrawls
data
analyzing
This module detects the evolution of entities over time, e.g. Joseph Ratzinger > Pope Benedict XVI > Pope Emeritus Benedict XVI.
Read more:
http://de.slideshare.net/arcomem/arcomem-training-neerbeginner
http://www.slideshare.net/arcomem/arcomem-training-neeradvanced
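
Once such a name evolution has been detected, it could be represented as a dated timeline and used to look up the name valid on a given day, or to expand a search to all known names. The sketch below only shows this representation, not the detection itself; the structure `TIMELINE` and the functions `name_at` and `all_names` are hypothetical, and the dates are the publicly known ones for this example.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical timeline for one entity: (valid_from, name), sorted by date.
TIMELINE = [
    (date(1927, 4, 16), "Joseph Ratzinger"),            # birth
    (date(2005, 4, 19), "Pope Benedict XVI"),           # papal election
    (date(2013, 2, 28), "Pope Emeritus Benedict XVI"),  # resignation takes effect
]


def name_at(day):
    """Return the name that was in use on the given day."""
    starts = [start for start, _ in TIMELINE]
    index = bisect_right(starts, day) - 1
    if index < 0:
        raise ValueError("date precedes the first known name")
    return TIMELINE[index][1]


def all_names():
    """Return every known name, e.g. to expand a search query."""
    return [name for _, name in TIMELINE]
```

A search for "Pope Benedict XVI" could then also retrieve archived pages that only mention "Joseph Ratzinger".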

Identification of cultural differences in the Social Web, detection of domain experts, and socially guided search.
Read more:

http://de.slideshare.net/arcomem/arcomem-training-culturalanalysisbeginner
http://de.slideshare.net/arcomem/arcomem-training-diversification


Read more:
http://de.slideshare.net/arcomem/arcomem-training-simpletextminingbeginner
GATE introduction
GATE also performs the more time-consuming extraction of entities, opinions, and events from text.
Read more:
http://de.slideshare.net/arcomem/arcomem-training-opinionsadvanced
http://de.slideshare.net/arcomem/arcomem-training-entitiesandeventsadvanced
This module extracts entities and locations from images and videos, and identifies duplicate items.


Read more:
http://de.slideshare.net/arcomem/arcomem-training-multimedia
Online Analysis
feeds the crawlers directly
How do Twitter #hashtags evolve over time? Which terms are linked together?

read more:
http://de.slideshare.net/arcomem/arcomem-training-twitterdynamicsbeginner
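
One simple way to approach the question of which terms are linked together is to count how often two hashtags appear in the same tweet. The sketch below assumes this co-occurrence approach, which is not necessarily what the Twitter Dynamics module does; the function `cooccurrences` and the sample tweets are made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")


def cooccurrences(tweets):
    """Count how often each pair of hashtags appears in the same tweet."""
    pairs = Counter()
    for text in tweets:
        # Lower-case and de-duplicate so "#Vote" and "#vote" count as one tag.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        pairs.update(combinations(tags, 2))
    return pairs


# Made-up sample tweets.
sample = [
    "Watching the results come in #election #vote",
    "Long queues this morning #Vote #election",
    "#election night coverage starts now",
]
counts = cooccurrences(sample)
```

Running the same count separately per day or per hour would give a rough view of how these links change over time.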


All content items are enriched with semantic information about topics, entities and events plus references to linked data, e.g. DBpedia.

Read more:
http://de.slideshare.net/arcomem/enrichment-trainingbeginnerupdate
http://de.slideshare.net/arcomem/arcomem-training-enrichment-advanced-update
http://www.slideshare.net/arcomem/arcomem-training-topicmodelsbeginners

Social signals such as information about persons, locations, or social structure are used to prioritize the crawlers.
Read more:
http://de.slideshare.net/arcomem/arcomem-training-twitterdomainexpertsbg-25809109

ARCOMEM database
All crawled and analyzed data is stored in the ARCOMEM database, which consists of an object store (HBase) and a knowledge base (RDF triples).
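For readers unfamiliar with RDF, a knowledge base of this kind holds facts as subject, predicate, object triples that can be queried by pattern. The following is a minimal in-memory sketch of that idea only; it does not reflect the ARCOMEM schema or storage layer, and every identifier in `TRIPLES` is made up.

```python
# Each fact is a (subject, predicate, object) triple. All identifiers are made up.
TRIPLES = {
    ("ex:page42", "ex:mentions", "ex:entity/Pope_Benedict_XVI"),
    ("ex:page42", "ex:crawledIn", "ex:crawl7"),
    ("ex:page43", "ex:mentions", "ex:entity/Pope_Benedict_XVI"),
}


def match(subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        triple
        for triple in TRIPLES
        if subject in (None, triple[0])
        and predicate in (None, triple[1])
        and obj in (None, triple[2])
    )


# Which pages mention the entity?
pages = [s for s, _, _ in match(predicate="ex:mentions")]
```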
Named Entity Evolution
Consolidation & Enrichment
Social Web Analysis
Image & Video Analysis
Simple content analysis (e.g. keyword detection) based on GATE allows an efficient relevance ranking of extracted links.
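
As a rough illustration of keyword-based relevance ranking, the sketch below scores the anchor text of each extracted link by the share of its words that are campaign keywords. It uses plain Python rather than GATE, so it is a stand-in for the idea and not the ARCOMEM implementation; the functions `relevance` and `rank_links`, the keywords, and the links are hypothetical.

```python
import re


def relevance(text, keywords):
    """Fraction of the words in text that are campaign keywords."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    wanted = {keyword.lower() for keyword in keywords}
    return sum(word in wanted for word in words) / len(words)


def rank_links(links, keywords):
    """Sort (url, anchor_text) pairs by descending relevance of the anchor text."""
    return sorted(links, key=lambda link: relevance(link[1], keywords), reverse=True)


# Made-up links and keywords.
links = [
    ("http://example.org/sports", "football scores today"),
    ("http://example.org/vote", "election results and election analysis"),
    ("http://example.org/mix", "weather and election news"),
]
ranked = rank_links(links, ["election", "results"])
```

Because the score is cheap to compute, it can be evaluated while the crawl is running, and the resulting order can decide which links are fetched first.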

GATE Online Analysis
Social Web Analysis
GATE Offline Analysis
Offline Analysis
thorough analysis of crawled data
database
Cross Crawl Analysis
analyzing several crawls

WARC
The standard WARC (Web ARChive) file format ensures long-term accessibility and facilitates sharing and exchange of content among archiving institutions.
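
To give an idea of what a WARC file contains, the sketch below serializes a single WARC/1.0 record of type `resource` using only the Python standard library. It is a simplified illustration of the record layout (a header block, a blank line, the payload, and two trailing line breaks), not the ARCOMEM export code; the function name `warc_resource_record` and the example URL are assumptions, and a maintained WARC library would be the safer choice in practice.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri, payload, content_type="text/html"):
    """Serialize one WARC/1.0 'resource' record as bytes."""
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {timestamp}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",   # length of the payload in bytes
    ]
    # Header lines end with CRLF; an empty line separates header and payload.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is terminated by two CRLF pairs.
    return head + payload + b"\r\n\r\n"


record = warc_resource_record("http://example.org/", b"<html></html>")
```

Several such records written one after another form a WARC file, which is what makes the format easy to exchange between institutions.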
read more
Twitter’s Visual Pulse
OpenIMAJ and ImageTerrier
The proprietary large-scale crawler by the Internet Memory Foundation (IMF) supports efficient and scalable crawling of millions of URLs.

http://pierre.senellart.com/publications/faheem2013demonstrating.pdf
http://internetmemory.org/en/
read more

Where is the story? Different modules analyze the semantic & social context of the crawled resources in order to filter relevant items.

We do not want to archive everything, only content that matches the archivist's setup in the Crawler Cockpit.


Analyze and filter
http://cockpitdemo2.internetmemory.org/cockpit
try out the demo
Feel free to click through this Prezi by clicking next or get adventurous and just click on any object!
SARA
End users like journalists, parliamentarians or students browse the semantically and socially enriched archive. What was the context of this event?
So it is easy to integrate ARCOMEM results into your existing web archive, for example by using the open source Wayback Machine. Alternatively, you can access the archive via the specialized ARCOMEM Search And Retrieval Application (SARA).
http://epart.atc.gr:8088/arcomem-sara-4.0/
try out the demo
http://de.slideshare.net/arcomem/arcomem-training-heritrixbeginner

read more about the ARCOMEM system architecture
http://de.slideshare.net/arcomem/arcomem-training-systemoverviewbeginner
http://de.slideshare.net/arcomem/arcomem-training-systemoverviewadvanced
read more
http://www.arcomem.eu/wp-content/uploads/2012/01/h2rdf_www2012_demo.pdf