@twase
15 min platform overview
crawlers
tracking approx. 4,000 non-spam
and 1,000 spam Screen Names
> 1,000,000 Tweets in corpus database
external
17,500 malware and
scammer sites
~10,000 profile images
API status
focus on engine, not on
externally facing service
API level with scan
module within a week
API example
XML right now, next-up most likely JSON
https://api.twase.com/<key>/<ver>/<cmd>/<p1>/<p2>/...
.../LATEST/scan/jchivers
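A minimal sketch of how a client might assemble these URLs, assuming the path segments are simply joined in the order shown. The helper name `build_url` and the key `MY_KEY` are hypothetical; the slides do not show the response schema, so no request is sent here.

```python
from urllib.parse import quote

BASE_URL = "https://api.twase.com"


def build_url(key, version, command, *params):
    """Join key, version, command and parameters in the order shown on the slide."""
    segments = [key, version, command, *params]
    # Percent-encode every segment so a stray '/' cannot shift parameter positions.
    return BASE_URL + "/" + "/".join(quote(str(s), safe="") for s in segments)


# MY_KEY is a placeholder, not a real API key.
print(build_url("MY_KEY", "LATEST", "scan", "jchivers"))
print(build_url("MY_KEY", "LATEST", "set", "account_age", "grade1", 86400))
```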
staying ahead
constant observation by humanoids
future -
spam classification
adult (see my hot pics!)
IT (cheap $oftwarez)
health (lose weight)
education (online degree)
finance (make $$$$'s!!!)
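One way the five categories could be prototyped is plain phrase matching. This is only an illustration: the phrase lists are seeded from the examples on the slide, and the function name `classify` is hypothetical. A real classifier would need far more than substring checks.

```python
# Phrases taken from the slide's own examples, lowercased for matching.
SPAM_CATEGORIES = {
    "adult": ["hot pics"],
    "IT": ["$oftwarez"],
    "health": ["lose weight"],
    "education": ["online degree"],
    "finance": ["make $$$"],
}


def classify(text):
    """Return every category whose phrases appear in the text."""
    lowered = text.lower()
    return [
        category
        for category, phrases in SPAM_CATEGORIES.items()
        if any(phrase in lowered for phrase in phrases)
    ]


print(classify("See my HOT pics!"))
print(classify("Lose weight and make $$$$'s!!!"))
```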
future -
predator
tracks new Screen Name creation
launches full scan and crawl
when account age reaches X days
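The predator idea can be sketched as a filter over known creation times, assuming each new Screen Name is recorded with a creation timestamp. The function name `due_for_scan` and the sample data are hypothetical, and X is passed in as `min_age_days`.

```python
from datetime import datetime, timedelta, timezone


def due_for_scan(created_at, now, min_age_days):
    """Return the screen names whose account age has reached min_age_days."""
    threshold = timedelta(days=min_age_days)
    return sorted(
        name for name, created in created_at.items() if now - created >= threshold
    )


# Hypothetical tracking data: screen name -> creation time.
now = datetime(2010, 1, 15, tzinfo=timezone.utc)
tracked = {
    "older_account": datetime(2010, 1, 1, tzinfo=timezone.utc),
    "newer_account": datetime(2010, 1, 14, tzinfo=timezone.utc),
}
print(due_for_scan(tracked, now, min_age_days=7))
```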
[diagram: sample module grades 2, 4, 1, 3, 4, 1, 5, 2, 3]
each module must
return a grade from 1 to 5
grades specific to
the scan module
account_age
GRADE 1 > 6 months
GRADE 2 > 1 month
GRADE 3 > 2 weeks
GRADE 4 > 1 week
GRADE 5 ~ 1 day
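The account_age bands can be sketched as a threshold lookup. Assumptions: the bands run in order from grade 1 to grade 5 (so the 2-week band is grade 3), ages are measured in seconds, and a month is taken as 30 days. The function name `grade_account_age` is hypothetical.

```python
DAY = 86400  # seconds

# (minimum age in seconds, grade), checked from oldest to youngest.
ACCOUNT_AGE_BANDS = [
    (180 * DAY, 1),  # > 6 months (assuming 30-day months)
    (30 * DAY, 2),   # > 1 month
    (14 * DAY, 3),   # > 2 weeks
    (7 * DAY, 4),    # > 1 week
]


def grade_account_age(age_seconds):
    """Map an account age to a grade of 1 (oldest) to 5 (newest)."""
    for threshold, grade in ACCOUNT_AGE_BANDS:
        if age_seconds > threshold:
            return grade
    return 5  # anything a week old or younger


print(grade_account_age(365 * DAY))
print(grade_account_age(1 * DAY))
```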
utilises weighting matrix
to determine overall grade
evaluator
module:value, module:value, module:value,...
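A rough sketch of the evaluator step, assuming the input is the comma-separated `module:value` list above and that the weighting reduces to one weight per module. The slides do not describe the matrix itself, so a weighted average stands in for it here; `parse_results`, `overall_grade` and the sample weights are hypothetical.

```python
def parse_results(line):
    """Turn 'module:value, module:value' into a {module: grade} dict."""
    results = {}
    for pair in line.split(","):
        pair = pair.strip()
        if not pair:
            continue
        module, _, value = pair.partition(":")
        results[module.strip()] = int(value)
    return results


def overall_grade(results, weights):
    """Weighted average of module grades; unlisted modules get weight 1."""
    if not results:
        raise ValueError("no module results to evaluate")
    total_weight = sum(weights.get(module, 1.0) for module in results)
    weighted_sum = sum(
        grade * weights.get(module, 1.0) for module, grade in results.items()
    )
    return weighted_sum / total_weight


results = parse_results("account_age:5, f_f_ratio:3, malware_link:1")
# Hypothetical weights: malware_link counts double.
print(overall_grade(results, {"malware_link": 2.0}))
```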
intro
a technology that, given a Twitter
Screen Name*, can determine whether
the thing behind it is a spammer,
marketeer, bot or disruptive member
of the Twitterverse
*
screen name vs. user(s)
screen name = a single Twitter account
user = the 'thing' behind the screen name
developer-focussed
API offering XML (and JSON later)
for developers building
Twitter clients and applications
e.g. desktop, data mining app, screen name ranking
and statistics or inclusion into existing platform
@devnest
scan modules
approx. 40 scan modules
all results will be made
available via the API
~ 400,000 links
!twase
TwitChuck
Twerp Scan
Clean Tweets
TwitBlock
Tweepi
2,000 commonly used
spam/scam profile images
bio
GRADE 1 = 0 spam words
GRADE 2 = 1 spam word
GRADE 3 = 2 spam words
GRADE 4 = 3 spam words
GRADE 5 >= 4 spam words
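The bio grading can be sketched as a count of distinct spam words, capped at four. The word list below is a hypothetical stand-in for the real spam-word list, and counting distinct words rather than every occurrence is an assumption.

```python
import re

# Hypothetical sample; the real list is not shown on the slides.
SAMPLE_SPAM_WORDS = {"free", "cheap", "followers", "winner", "cash"}


def grade_bio(bio, spam_words):
    """Grade a bio 1-5 by how many distinct spam words it contains."""
    words = set(re.findall(r"[a-z0-9$']+", bio.lower()))
    hits = len(words & spam_words)
    return min(hits, 4) + 1  # 0 hits -> grade 1, 4 or more -> grade 5


print(grade_bio("Coffee lover and cyclist", SAMPLE_SPAM_WORDS))
print(grade_bio("FREE followers! Cheap cash for every winner", SAMPLE_SPAM_WORDS))
```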
e.g. account_age vs. f_f_ratio vs. malware_link
600 spam words
400 spam phrases
honeypots
.../LATEST/set/account_age/grade1/86400
.../LATEST/get/custom_mod/average_execution_time
module feedback loop
into weighting matrix
Adams825 Beasley937 Blackburn678 DessieMolina17 Ewing924 Finley672 FlossieCox37 FriedaCote888 Fuentes380 GeriMelton137 Gilliam300 Hewitt37 JohnnieVega551 LauriGood690 LilianBoyd28 Little624 ManuelaRuiz655 Martinez314 Mccray760 Mcmillan246 MelvaFrye770 NadaCarter554 Potter343 SheilaMarks794 StacieMoses328 VelvaJohns290 Waters934 YongCrosby822 cinecritic2123 double_tre7732 kjpalladin4698
ChicagoDri1666 ChicagoDri1832 ChicagoDri3852 ElleBrigit9242 FreeMixDow2013 FreeMixDow8322 HousewareP9655 RichardRBa2732 YoungMogul6154 cherrylin84612 dbreakfast1393 dezentmusi806 dezentmusi8629 essenceand3868 essenceand6770 jordanrepe2050 kjpalladin5209 kjpalladin849 lisadarwen2879 lisadarwen2996 lisadarwen7565 maikeMonst2089 maikeMonst2498 maikeMonst410 maikeMonst4613 maikeMonst6903 mortadelaf3366 mortadelaf6152 mygodreign3020 mygodreign6933 princezkha3964 sorryitwas5977 tommyfaiza5406 vexintheci9751
example Screen Name composition
name + numbers
random word + numbers
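The "stem + numbers" composition can be checked with a regular expression, and names sharing a stem can then be grouped together. This is a sketch of one possible check, not the scan module itself; `shared_stems` is a hypothetical name and the sample names come from the lists above.

```python
import re
from collections import defaultdict

# A stem with no digits, followed by a trailing run of digits.
STEM_PLUS_NUMBERS = re.compile(r"^(?P<stem>\D+)(?P<digits>\d+)$")


def shared_stems(names):
    """Group names of the form stem+digits and keep stems used more than once."""
    groups = defaultdict(list)
    for name in names:
        match = STEM_PLUS_NUMBERS.match(name)
        if match:
            groups[match.group("stem")].append(name)
    return {stem: members for stem, members in groups.items() if len(members) > 1}


names = ["kjpalladin5209", "kjpalladin849", "ChicagoDri1666", "ChicagoDri1832", "jchivers"]
print(shared_stems(names))
```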
investigate GRQ (get-rich-quick) Twitter apps
future
'live' spam feed
SNX/www crawling
distributed scan engine
. . .
gathering data
scan engine
spam grading
honeypot analysis
external partnerships/data
history
by @jchivers for #devnest London 4