RepLab 2013 - Tasks and Participation

Talk for the RepLab 2013 workshop
by Julio Gonzalo, 29 September 2013


RepLab 2013:
Monitoring Online Reputation

The scenario
The Task
The Test Collection
Evaluation metrics
Results
In summary...
[Diagram: the monitoring pipeline. Input: the tweet stream for an entity name (e.g. "apple", Sep '13). Filtering separates related tweets from discarded (unrelated) ones, such as "Having a delicious piece of apple pie"; clustering (Topic Detection) groups the related tweets into topics; ranking orders the topics by priority: Alert, Mildly Important, Unimportant. Output: topics 1, 2, 3, ... in decreasing importance, each with a summary and polarity for reputation. Example alert: "Apple launches new iPhone 5s...".]

Filtering Approaches
Instance-based Learning + Heterogeneity Based Ranking (HBR)
Tweet-level Binary classification: Is the tweet RELATED/UNRELATED to the entity of interest?
Topic Detection Approaches
Wikified Tweet Clustering
Full Task Monitoring
Filtering + Topic Detection + Topic Priority
Tweet-level Multiclass Classification: Positive/Neutral/Negative for the reputation of the entity
Clustering of tweets by topics
Topic-level Multiclass classification:
Alert / Mildly Important / Unimportant topic
for the reputation of the entity
Polarity Approach
Similar to the RepLab 2013 official baseline
Each tweet in the test set receives the same label as the most similar tweet in the training set
Combination of the rankings given by multiple text similarity measures
Applicable to all the subtasks (Topic Detection, Polarity, Priority...)
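As a rough illustration, here is a minimal Python sketch of the instance-based idea, assuming a single TF-IDF cosine similarity in place of the combined similarity rankings the system actually uses:

```python
# Minimal sketch: each test tweet receives the label of its most similar
# training tweet. TF-IDF cosine similarity stands in for the combination
# of multiple text similarity measures used by the actual system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def nearest_neighbour_labels(train_texts, train_labels, test_texts):
    vectorizer = TfidfVectorizer()
    train_vecs = vectorizer.fit_transform(train_texts)
    test_vecs = vectorizer.transform(test_texts)
    sims = cosine_similarity(test_vecs, train_vecs)      # shape: (n_test, n_train)
    return [train_labels[row.argmax()] for row in sims]  # copy the closest label

# Toy example for polarity; the same scheme applies to topic and priority.
train = ["apple launches new iphone", "apple stock falls sharply"]
labels = ["positive", "negative"]
print(nearest_neighbour_labels(train, labels, ["new iphone announced by apple"]))
```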
Two-step classification algorithm
Step 1: Automatic Keyword Discovery: each term is classified as a positive/negative/none keyword.
Step 2: Automatic Tweet Classification: tweets containing keywords feed a binary BoW classifier that classifies the remaining tweets as related/unrelated.
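A minimal sketch of the two-step scheme, with hypothetical keyword lists standing in for the output of Step 1:

```python
# Sketch of the filter-keywords idea. The keyword sets below are made up;
# in the real system they come from Step 1 (automatic keyword discovery).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

related_kw = {"iphone", "ipad"}    # hypothetical keywords signalling RELATED
unrelated_kw = {"pie", "fruit"}    # hypothetical keywords signalling UNRELATED

tweets = ["apple launches new iphone", "apple pie recipe", "ipad sales grow",
          "fresh apple fruit salad", "apple event next week"]

# Step 2a: tweets containing a keyword provide seed labels...
seeds, seed_labels, rest = [], [], []
for t in tweets:
    words = set(t.split())
    if words & related_kw:
        seeds.append(t); seed_labels.append("related")
    elif words & unrelated_kw:
        seeds.append(t); seed_labels.append("unrelated")
    else:
        rest.append(t)

# Step 2b: ...which feed a bag-of-words classifier for the remaining tweets.
vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(seeds), seed_labels)
print(dict(zip(rest, clf.predict(vec.transform(rest)))))
```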
F-1 = Harmonic Mean({R,S} x {Filtering, Topic Detection, Topic Priority})
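That is, the full-task score is the harmonic mean of six values, R and S for each of the three subtasks. A sketch with made-up subtask scores:

```python
# Full-task F-1: harmonic mean of {R, S} x {Filtering, Topic Detection,
# Topic Priority}. The six scores below are illustrative, not real results.
from statistics import harmonic_mean

scores = {"filtering": (0.8, 0.6),        # (R, S) per subtask
          "topic_detection": (0.5, 0.4),
          "topic_priority": (0.4, 0.3)}
f1 = harmonic_mean([v for rs in scores.values() for v in rs])
print(round(f1, 3))
```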
All approaches rank similar to or lower than the baseline
Instance-based learning + HBR performs similarly to the baseline (x% lower in terms of F(R,S))
Filter Keywords:
Directly classifying tweets (second step) performs x% better than discovering filter keywords and y% better than the baseline
The generic setting (no entity-specific training data) needs further improvement in the keyword classification step
Step 1: Term clustering
Learned similarity function (content-based, meta-data, time-aware features)
Hierarchical Agglomerative Clustering
Step 2: Tweet clustering
Each tweet is assigned to the term cluster with maximal term overlap
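A toy sketch of the two-step procedure, substituting a simple term co-occurrence rate for the learned similarity function:

```python
# Toy sketch of the two-step topic detection. A plain co-occurrence rate
# stands in for the learned similarity (content, metadata, time-aware
# features); the threshold and the tweets are made up for illustration.
from itertools import combinations
from scipy.cluster.hierarchy import fcluster, linkage

tweets = ["iphone 5s launch event", "new iphone 5s on sale",
          "apple pie recipe", "grandmas apple pie"]
terms = sorted({w for t in tweets for w in t.split()})

def cooccurrence(a, b):
    # Fraction of tweets containing both terms among those containing either.
    both = sum(a in t.split() and b in t.split() for t in tweets)
    either = sum(a in t.split() or b in t.split() for t in tweets)
    return both / either if either else 0.0

# Step 1: hierarchical agglomerative clustering over a condensed
# distance matrix (distance = 1 - similarity for each term pair).
distances = [1 - cooccurrence(a, b) for a, b in combinations(terms, 2)]
labels = fcluster(linkage(distances, method="average"), t=0.7, criterion="distance")
clusters = {c: {w for w, l in zip(terms, labels) if l == c} for c in set(labels)}

# Step 2: assign each tweet to the term cluster with maximal term overlap.
for tweet in tweets:
    best = max(clusters, key=lambda c: len(set(tweet.split()) & clusters[c]))
    print(best, "<-", tweet)
```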
Wikified Tweet Clustering: Jaccard similarity over the Wikipedia pages/entities linked to the tweets
LDA-based Clustering: based on the Twitter-LDA and Topics over Time models
Transfer learning: target tweets + background tweets to establish the right number of clusters
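For the wikified variant, a minimal sketch of the Jaccard comparison; the entity annotations are hypothetical stand-ins for the output of a real entity linker:

```python
# Wikified tweet clustering compares tweets by the Jaccard similarity of
# their linked Wikipedia entities. The annotations below are hypothetical.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

linked = {  # tweet id -> Wikipedia entities attached by an entity linker
    "t1": {"IPhone_5S", "Apple_Inc."},
    "t2": {"IPhone_5S", "Tim_Cook"},
    "t3": {"Apple_pie"},
}
print(jaccard(linked["t1"], linked["t2"]))  # 0.33 -> likely the same topic
print(jaccard(linked["t1"], linked["t3"]))  # 0.0  -> a different topic
```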
Filter Keywords
(Spina et al., ESWA 2013)
SentiSense: an affective lexicon of 5,496 words and 2,190 synsets from WordNet, labeled with emotional categories.
Semantic Graphs for Domain-specific Lexicon Adaptation
The graph is generated from semantic relations between concepts in WordNet.
Tweets represented as a Vector of Emotional Intensities (VEI) feed a machine learning classifier (same as in RepLab 2012).
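A minimal sketch of the VEI representation, using a tiny hypothetical lexicon in place of SentiSense:

```python
# Each tweet becomes a vector of per-emotion intensities (here, keyword
# counts normalised by tweet length). The four-word lexicon is a made-up
# stand-in for SentiSense; the vectors would then feed an ML classifier.
LEXICON = {"love": "joy", "delicious": "joy", "broken": "anger", "fail": "anger"}
EMOTIONS = sorted(set(LEXICON.values()))

def vei(tweet):
    words = tweet.lower().split()
    counts = {e: 0 for e in EMOTIONS}
    for w in words:
        if w in LEXICON:
            counts[LEXICON[w]] += 1
    return [counts[e] / len(words) for e in EMOTIONS]

print(EMOTIONS, vei("love the new iphone but the charger is broken"))
```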
PERFECT SYSTEM
How important is each component?
Using entity-specific training data yields significantly higher F(R,S) scores (145%-164%)
All but instance-based learning + HBR get competitive results (relative to the RepLab 2013 systems)
- A worse filtering system tends to annotate everything as "unrelated"
- The input to the Topic Detection subsystem then contains only the tweets most likely to be related to the entity =>
Sensitivity scores are lower (low recall), but Reliability scores are much higher (clearly related tweets are easier to cluster)

=> F(R,S) for Topic Detection: higher with a bad filtering than with a good filtering

Priority: instance-based learning and the baseline obtain similar results
In general, filtering is crucial for the overall F-1
+95% improvement of the best run when assuming a perfect filtering subsystem
In this case, the run that uses instance-based learning + HBR in the priority subtask gets the best overall F-1
Best improvement is 69%, much less than with a perfect filtering (+217%)
Even if actual priority scores are low (0.27-0.30), a perfect priority system has less impact than the other subtasks (only +53% in the best case)
Test Collection {EN, ES}
Input: 61 entities, 1 query per entity
Output: ~2,200 tweets per entity, 4 annotations per tweet
Totals: 142,527 tweets, 570,108 annotations
Annotation effort: 19.5 person-months
Polarity for Reputation:
opinions vs polar facts
Participation
Registered: 45; submitted: 16 (36%)
Filtering
Polarity
Topic detection
Topic priority
Full Task
Evaluation of Online Reputation Monitoring
Filtering:
R is the product of the precisions on the positive and negative classes
S is the product of the recalls
Clustering (topic detection):
R is BCubed precision
S is BCubed recall
Applicable to polarity, assuming
negative > neutral > positive
Binary relationships:
d is topically related to d'
d has more priority than d'
Ranking (retrieval):
weights allow one to specify that the first n
documents account for x% of the weight
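A minimal sketch of Reliability (R) and Sensitivity (S) for the filtering subtask as defined above, with made-up gold and predicted labels:

```python
# Filtering: R = product of per-class precisions, S = product of per-class
# recalls over RELATED/UNRELATED; F(R,S) is their harmonic mean.
# The gold and predicted labels below are invented for illustration.
def precision(gold, pred, cls):
    selected = [g for g, p in zip(gold, pred) if p == cls]
    return sum(g == cls for g in selected) / len(selected) if selected else 0.0

def recall(gold, pred, cls):
    relevant = [p for g, p in zip(gold, pred) if g == cls]
    return sum(p == cls for p in relevant) / len(relevant) if relevant else 0.0

gold = ["related", "related", "unrelated", "related", "unrelated"]
pred = ["related", "unrelated", "unrelated", "related", "related"]

R = precision(gold, pred, "related") * precision(gold, pred, "unrelated")
S = recall(gold, pred, "related") * recall(gold, pred, "unrelated")
F = 2 * R * S / (R + S) if R + S else 0.0
print(round(R, 3), round(S, 3), round(F, 3))
```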
Perceived difficulty
Inter-annotator agreement
Entity-specific training leads to a boost in performance (close to inter-annotator agreement)
Topic detection and ranking are the most difficult steps
Best full-task score: 0.19, so there is room for improvement
Accurate filtering is key
Highlights:
Large, high-quality annotation effort
Knowledge engineering with reputation experts
A general evaluation metric for the task and its subtasks
Adolfo Corujo (Llorente & Cuenca)
Julio Gonzalo (UNED)
Edgar Meij (Yahoo! Research)
Maarten de Rijke (U. Amsterdam)
(Lab organizers)

Enrique Amigó
Jorge Carrillo
Irina Chugur
Tamara Martín
Damiano Spina
UNED

One task = four coupled subtasks
Baselines provided for all subtasks
+ formal constraints
Challenges:
amount/specificity of training material
expert consensus
Things we excluded!
Who is the worst-rated player of the Spanish national soccer team?

Reina
What is the most popular fast-food chain on Twitter?
Subway
Ana Pitart
Vanessa Álvarez
LL&C