Flux of MEME - Description of Work

results achieved during the 1st semester of research - keywords: semantic web, twitter, clustering, geo, topic extraction

Thomas Alisi

on 7 March 2011

Transcript of Flux of MEME - Description of Work

it is said that geo tagging is growing
still it represents around 1% of published content

from twitter blog, dec 2010:
"Twitter users now send more than 95 million Tweets a day, on just about every topic imaginable."

first big problem: fetch geo-localized data and create clusters of concepts in time/space axis
status: solved

step 1: fetch data from twitter
attempt n.2: access a continuous flow of data through twitter streaming API
the client does not need to perform a specific query
all tweets are fed through the client to our database
the stream can be tweaked to filter specific keywords/locations
keyword filtering can still be applied with "wikiminer" expansion
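
A minimal sketch of the keyword/location filtering described above, applied on the client side to tweets that have already been received. The tweet layout (a dict with `text` and an optional `coordinates` pair in longitude/latitude order), the bounding-box convention and the function names are assumptions made for illustration; they are not the project's actual code, and no Twitter API call is made.

```python
from typing import Iterable, Iterator, Optional, Sequence, Tuple

# (west, south, east, north) in degrees: a convention assumed for this sketch.
BoundingBox = Tuple[float, float, float, float]


def matches_keywords(text: str, keywords: Sequence[str]) -> bool:
    """True if any keyword occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def inside(box: BoundingBox, lon: float, lat: float) -> bool:
    """True if the point lies within the bounding box (edges included)."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def filter_stream(
    tweets: Iterable[dict],
    keywords: Sequence[str] = (),
    box: Optional[BoundingBox] = None,
) -> Iterator[dict]:
    """Yield only the tweets that pass the keyword and location filters.

    An empty keyword list or a missing box disables that filter.
    """
    for tweet in tweets:
        if keywords and not matches_keywords(tweet.get("text", ""), keywords):
            continue
        if box is not None:
            coords = tweet.get("coordinates")  # assumed (lon, lat) or None
            if coords is None or not inside(box, coords[0], coords[1]):
                continue
        yield tweet
```

Because the filter is a generator, it can sit between the stream client and the database without buffering the whole flow; tweets without coordinates are dropped as soon as a bounding box is given.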

main problems:
twitter gives limited access to content (account "spritzer" has access to approx. 1% of total tweets)

step 2: improve quality of results
problem:
too much heterogeneous data carrying too little information

filter geo-localized content
enrich data with related links
filter related links to meaningful content only

step 3: store content locally
the database needs to store all the structure needed for fetched data + cluster structure
a flexible architecture of DAOs allowed subsequent interventions on the database for refinements of the structure

step 4: clusters! (at last)
create time slices
read all the posts + links in the timeline
create geo-clusters using hierarchical agglomeration
create semantic-clusters using Latent Dirichlet Allocation

step 5: web prototype

what's next?

Flux of MEME, the idea behind:
analyze clusters of concepts and understand how they move in space and time

attempt n.1: access data through twitter search API
the client performs a specific query
a specific query implies the use of a limited amount of text/concepts
hence concepts must be expanded using clever algorithms
expansion of concepts was implemented using "wikiminer" library
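
To make the expansion step concrete, here is a rough sketch of how a short list of concepts might be widened into a single search query. The `RELATED` table is a hypothetical stand-in for whatever related terms the "wikiminer" library returns (its real API is not shown here), and joining terms with `OR` assumes the search endpoint accepts that operator.

```python
from typing import Dict, List, Sequence

# Hypothetical related-concept table; in the real system these terms would
# come from the "wikiminer" expansion, whose API is not reproduced here.
RELATED: Dict[str, List[str]] = {
    "earthquake": ["seismic", "tremor"],
    "florence": ["firenze"],
}


def expand_terms(
    terms: Sequence[str], related: Dict[str, List[str]], limit: int = 3
) -> List[str]:
    """Return the original terms followed by up to `limit` related terms each."""
    expanded: List[str] = []
    for term in terms:
        for candidate in [term] + related.get(term.lower(), [])[:limit]:
            if candidate not in expanded:  # keep order, drop duplicates
                expanded.append(candidate)
    return expanded


def build_query(terms: Sequence[str]) -> str:
    """Join terms with OR, quoting multi-word terms so they stay together."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return " OR ".join(quoted)
```

The `limit` parameter is there because a specific query can only carry a limited amount of text: each seed concept contributes a bounded number of extra terms.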

main problems:
the client needs to wait for results from twitter live search, wasting most of the time on hold
the twitter API gives access to a limited time frame, running 1 week in the past at most

step 1: fetch data from twitter

1. fetch data
2. create geo-clusters
3. extract topics
4. analyze stats
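
The time-slicing and geo-clustering stages could be sketched roughly as follows. The presentation only says "hierarchical agglomeration", so everything else is an assumption for illustration: single linkage, great-circle (haversine) distance, a distance threshold as the stopping rule, posts given as `(timestamp_seconds, (lat, lon))` pairs, and all function names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees, assumed for this sketch


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def time_slices(
    posts: Sequence[Tuple[float, Point]], slice_seconds: int
) -> Dict[int, List[Point]]:
    """Group post locations into consecutive time slices of fixed length."""
    slices: Dict[int, List[Point]] = defaultdict(list)
    for timestamp, point in posts:
        slices[int(timestamp // slice_seconds)].append(point)
    return dict(slices)


def linkage(c1: List[Point], c2: List[Point]) -> float:
    """Single linkage: distance between the two closest members."""
    return min(haversine_km(p, q) for p in c1 for q in c2)


def agglomerate(points: Sequence[Point], max_km: float) -> List[List[Point]]:
    """Merge the closest pair of clusters until none is within max_km."""
    clusters: List[List[Point]] = [[p] for p in points]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_km:
            break  # remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

Running `agglomerate` once per slice returned by `time_slices` gives one set of geo-clusters per time window, which is the shape needed to follow clusters through time. This naive loop recomputes all pairwise distances at every merge, so it only suits small slices.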