The ARCOMEM system
Transcript of The ARCOMEM system
ARCOMEM has two use cases that drive the project. We have made a short movie to introduce you to the broadcaster use case:
Archivists set up and control the event crawling process. What content should be archived?
queries the APIs of Twitter, Facebook, Google+, YouTube and Flickr for publicly accessible content and user details
open source crawler with a prioritization module that can be updated in real time as a service
Application Aware Helper
This module extracts links by taking application-specific functionality into account, e.g. for WordPress or Twitter.
... is an EU-funded research project about memory institutions like archives, museums, and libraries in the age of the Social Web.
At this level, the system selects and fetches the relevant web objects, as initially defined by the archivist. Besides the traditional crawler and its decision modules, the crawling level includes some important data cleaning, annotation and extraction steps. Next we show you the crawling modules that we use in ARCOMEM.
This module detects the evolution of entities over time, e.g.
Joseph Ratzinger > Pope Benedict XVI
> Pope Emeritus Benedict XVI
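As a rough illustration of what a detected evolution could look like once it is stored, the sketch below keeps a dated list of names for one entity and looks up the name that applied on a given day. It does not show how ARCOMEM detects the evolution; the `TIMELINE` structure and the `name_on` function are hypothetical and not part of the ARCOMEM system.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical timeline for one entity: (date from which the name applies, name).
# Entries must be sorted by date.
TIMELINE = [
    (date(1927, 4, 16), "Joseph Ratzinger"),
    (date(2005, 4, 19), "Pope Benedict XVI"),
    (date(2013, 2, 28), "Pope Emeritus Benedict XVI"),
]

def name_on(day, timeline=TIMELINE):
    """Return the name that applied on `day`, or None if `day` precedes the timeline."""
    starts = [start for start, _ in timeline]
    idx = bisect_right(starts, day) - 1  # last entry starting on or before `day`
    if idx < 0:
        return None
    return timeline[idx][1]

# Example lookup for a document crawled in 2010.
name_in_2010 = name_on(date(2010, 1, 1))
```

A lookup like this would let a search for the current name also match older documents that use an earlier name.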
Identification of cultural differences in the social web, detection of domain experts and socially guided search.
GATE also performs the more time-consuming extraction of entities, opinions and events from text.
This module extracts entities and locations from images and videos and identifies duplicate items.
feeds the crawlers directly
How do Twitter #hashtags evolve over time? Which terms are linked together?
All content items are enriched with semantic information about topics, entities and events plus references to linked data, e.g. DBpedia.
Social signals such as information about persons, locations, or social structure are used to prioritize
All crawled and analyzed data is stored in the ARCOMEM database, which consists of an object store (HBase) and a knowledge base (RDF triples).
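The split between an object store and a knowledge base can be pictured with a minimal in-memory sketch. This is only an illustration under simplifying assumptions: a plain dictionary stands in for HBase, a set of tuples stands in for the RDF triples, and the predicate names and the `match` helper are invented for this example.

```python
# Stand-in for the object store: raw content keyed by URL.
object_store = {
    "http://example.org/article": b"<html>Pope Benedict XVI resigns</html>",
}

# Stand-in for the knowledge base: (subject, predicate, object) triples.
# The predicate names are invented for illustration.
knowledge_base = {
    ("http://example.org/article", "mentionsEntity", "dbpedia:Pope_Benedict_XVI"),
    ("http://example.org/article", "hasTopic", "resignation"),
    ("http://example.org/other", "mentionsEntity", "dbpedia:Rome"),
}

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

# Find every page that mentions a given entity, then load its stored content.
hits = match(knowledge_base, predicate="mentionsEntity", obj="dbpedia:Pope_Benedict_XVI")
pages = [object_store[s] for s, _, _ in hits if s in object_store]
```

Keeping the semantic annotations separate from the raw content means the annotations can be queried without touching the much larger stored objects.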
Social Web Analysis
Image & Video Analysis
Simple content analysis (e.g. keyword detection) based on GATE allows an efficient relevance ranking of extracted links.
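The idea of ranking extracted links by simple keyword detection can be sketched as follows. The sketch does not use GATE. It assumes a hypothetical list of archivist keywords and scores each link by how many of them occur in its URL or anchor text; `score_link` and `rank_links` are made-up names for illustration.

```python
import re

def score_link(url, anchor_text, keywords):
    """Count how many distinct keywords occur in the anchor text or URL."""
    tokens = set(re.findall(r"[a-z0-9]+", (anchor_text + " " + url).lower()))
    return sum(1 for kw in keywords if kw.lower() in tokens)

def rank_links(links, keywords):
    """Return (url, anchor_text) pairs sorted by descending keyword score."""
    return sorted(
        links,
        key=lambda link: score_link(link[0], link[1], keywords),
        reverse=True,
    )

# Hypothetical extracted links and archivist keywords.
links = [
    ("http://example.org/sports", "Football results"),
    ("http://example.org/election-2013", "Election coverage and debate"),
    ("http://example.org/news", "Election news"),
]
ranked = rank_links(links, ["election", "debate"])
```

A crawler could then fetch the highest-ranked links first and skip those with a score of zero.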
GATE Online Analysis
Social Web Analysis
GATE Offline Analysis
thorough analysis of crawled data
Cross Crawl Analysis
analyzing several crawls
The ARCOMEM System made easy.
The standard WebARChive (WARC) file format ensures long-term accessibility and facilitates sharing and exchange of content within archiving institutions.
Twitter’s Visual Pulse
OpenIMAJ and ImageTerrier
Proprietary Large Scale Crawler by Internet Memory Foundation (IMF) supports efficient and scalable crawling of millions of URLs.
Where is the story? Different modules analyze the semantic & social context of the crawled resources in order to filter relevant items.
We do not want to archive everything, only content that matches the archivist's setup in the Crawler Cockpit.
Analyze and filter
try out the demo
Feel free to click through this Prezi by clicking next or get adventurous and just click on any object!
End users like journalists, parliamentarians or students browse the semantically and socially enriched archive. What was the context of this event?
So it's easy to integrate ARCOMEM results into your existing web archive, for example by using the open source Wayback Machine
you can access the archive via the specialized ARCOMEM Search And Retrieval Application
try out the demo
about the ARCOMEM system architecture