A "small" earthquake in Chile
Twitter helped:
Majority of tweets were helpful
- False tsunami warnings
- False reports of looting
- ...
Of course not
Information credibility on Twitter
Carlos Castillo, Marcelo Mendoza, Bárbara Poblete
(1) Yahoo! Research Barcelona
(2) Yahoo! Research Latin America
(3) Univ. Federico Santa Maria
(4) Univ. de Chile
xkcd: Seismic waves
Chileans
Prominent role for communications
All public figures tweet
Well integrated with traditional media
since ~1yr before earthquake (elections)
- Sat Feb 27, 2010. 03:34 local time
8th largest recorded in history
Communications
- Haiti 2010: 7.0 Mw
- Chile 2010: 8.8 Mw
- Japan 2011: 9 Mw
Almost impossible for 2-3 hr
First video images 6-7 hr later
[Figure panels: Day 1, Day 2, Day 3, Day 4]
Some were not:
Number of tweets per event
n=747
5/7 labels must agree
n=383
5/7 labels must agree
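The "5/7 labels must agree" rule can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `aggregate_labels`, the parameter `min_agree`, and the label strings are assumptions made for this example.

```python
from collections import Counter

def aggregate_labels(labels, min_agree=5):
    """Return the most common label if at least `min_agree` judges chose it.

    Returns None when agreement is too low, i.e. the event gets no label.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None

# Hypothetical votes from 7 judges for one event.
votes = ["news", "news", "chat", "news", "news", "news", "chat"]
print(aggregate_labels(votes))
```

With 7 labels and a threshold of 5, at most one label can reach the threshold, so no tie-breaking is needed.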
Credible or not?
Users believed these are
"almost certainly true"
Sub-sets of features
Newsworthy or not?
Task 2: find credible events
Classification results
Supervised classification
Information credibility:
Task 1: find newsworthy events
Tweets that people found credible:
- US construction plunges 10%
- Tropical storm in the gulf
- Markets down on Portugal, Greece
- Yankees' Bob Sheppard dies
Supervised classification
Features
Labels
Learning
Almost certainly true
[Likely to be true]
Likely to be false
Almost certainly false
Perceived quality
Made of multiple dimensions
- Text: avg length, sentiments, ...
- Network: friends and followers of participants
- Propagations: graph-based features of propagation trees
- Top-elements: share of most frequent URL, author, @mention, #hashtag
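The top-element features can be sketched as below. This is an illustration under stated assumptions: "share" is taken here to be the fraction of all occurrences accounted for by the most frequent item, tweets are `(author, text)` pairs, and the regular expressions and names (`top_share`, `top_element_features`) are made up for this example.

```python
import re
from collections import Counter

# Simplified patterns (assumptions, not a full tweet tokenizer).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")

def top_share(items):
    """Fraction of occurrences taken by the most frequent item (0.0 if empty)."""
    if not items:
        return 0.0
    return Counter(items).most_common(1)[0][1] / len(items)

def top_element_features(tweets):
    """Compute the four top-element shares for one event.

    `tweets` is a list of (author, text) pairs.
    """
    authors = [author for author, _ in tweets]
    urls = [u for _, text in tweets for u in URL_RE.findall(text)]
    mentions = [m.lower() for _, text in tweets for m in MENTION_RE.findall(text)]
    hashtags = [h.lower() for _, text in tweets for h in HASHTAG_RE.findall(text)]
    return {
        "top_url": top_share(urls),
        "top_author": top_share(authors),
        "top_mention": top_share(mentions),
        "top_hashtag": top_share(hashtags),
    }

# Hypothetical tweets for one event.
tweets = [
    ("alice", "Quake http://a.example #chile"),
    ("bob", "RT @alice http://a.example #chile"),
    ("alice", "Update http://b.example #quake @bob"),
]
print(top_element_features(tweets))
```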
Newsworthy events:
92% precision at 92% recall
Credible events:
87% precision at 83% recall
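For reference, precision and recall are computed from true positives (tp), false positives (fp), and false negatives (fn). The counts in this sketch are invented for illustration and are not the counts behind the figures above.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion counts.

    precision = tp / (tp + fp): how many flagged events were correct.
    recall    = tp / (tp + fn): how many correct events were flagged.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts: 9 correct detections, 1 false alarm, 3 misses.
p, r = precision_recall(tp=9, fp=1, fn=3)
print(f"precision={p:.2f} recall={r:.2f}")
```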
Users believed these are
"likely to be false" or
"almost certainly false"
747 events deemed "newsworthy"
by automatic classifier
7 labels per newsworthy event
Events from Twitter Monitor
[Mathioudakis & Koudas 2010]
April to July 2010, English tweets.
e.g.: Earth Day, floods in Nashville, woke with a hangover, ...
Spreading a specific news event
-or-
Conversation or comments among friends
- Lennon lyrics auctioned for 1.2m [true]
- Suspicious vehicle causes NYC scare [false alarm]
- Free video calling app for Android [spam]
- Riots: BART murder trial verdict [partially true]
- Have a URL
- Don't have question or exclamation marks
- Express a negative sentiment
- Are re-posted by prolific, and well-connected users
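Two of these signals (URLs and question/exclamation marks) are easy to turn into per-event features. The sketch below is an assumption-laden illustration: the name `text_features` and the feature keys are hypothetical, and sentiment and re-poster features are left out because they need a lexicon and user data.

```python
import re

URL_RE = re.compile(r"https?://\S+")  # simplified URL pattern (assumption)

def text_features(texts):
    """Fractions of an event's tweets with a URL, a '?', or a '!'.

    URLs are removed before looking for punctuation, since a URL's
    query string can itself contain '?'.
    """
    n = len(texts)
    if n == 0:
        return {"frac_url": 0.0, "frac_question": 0.0, "frac_exclamation": 0.0}
    stripped = [URL_RE.sub("", t) for t in texts]
    return {
        "frac_url": sum(1 for t in texts if URL_RE.search(t)) / n,
        "frac_question": sum(1 for t in stripped if "?" in t) / n,
        "frac_exclamation": sum(1 for t in stripped if "!" in t) / n,
    }

# Hypothetical tweets for one event.
sample = [
    "Markets down http://x.example/a?b=1",
    "Is this true?",
    "Wow!!",
]
print(text_features(sample))
```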
Crowdsourced task, 383 events
7 labels per event
Follows [Alonso et al. 2010]
Can we automatically detect
false "events"?
Two classification tasks:
- Find newsworthy events
- Among them, find credible events
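The two tasks form a cascade: only events that pass the first classifier reach the second. A minimal sketch, assuming each classifier is a callable over an event's feature dict; the threshold rules below are placeholders invented for this example, not the learned models.

```python
def cascade(events, is_newsworthy, is_credible):
    """Apply the two classifiers in sequence.

    `events` maps event id -> feature dict. Credibility is only
    assessed for events judged newsworthy.
    """
    results = {}
    for event_id, features in events.items():
        if not is_newsworthy(features):
            results[event_id] = "chat"
        elif is_credible(features):
            results[event_id] = "credible"
        else:
            results[event_id] = "not credible"
    return results

# Placeholder rules standing in for trained classifiers (hypothetical).
def newsworthy_rule(f):
    return f["frac_url"] >= 0.3

def credible_rule(f):
    return f["frac_question"] < 0.2

events = {
    "e1": {"frac_url": 0.8, "frac_question": 0.0},
    "e2": {"frac_url": 0.6, "frac_question": 0.5},
    "e3": {"frac_url": 0.0, "frac_question": 0.1},
}
print(cascade(events, newsworthy_rule, credible_rule))
```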
Examples:
digital texts: author, source, ads, design, ...
physical world: many!
4 features: share of top URL, author, @mention, and #hashtag
Summary
Discussion & Future work
- Online/early classification
- Other types of misinformation
- Spam (commercial)
- Astroturfing (political)
- Detected past events on Twitter
- Took a sample of 10 tweets per event
- Supervised classification
- Found newsworthy events
92% precision at 92% recall
- Found credible newsworthy events
87% precision at 83% recall
@ChaToX @MMendozaRocha @BPoblete
@YahooLabs