DGA Domain Classification

by Josiah Hagen on 7 December 2015

Transcript of DGA Domain Classification

Length of 2nd level domain
Results
Building a Better Botnet DGA Mousetrap
Separating Rats, Mice and Cheese in DNS Data

Josiah Hagen, Hewlett Packard Enterprise TippingPoint
Miranda Mowbray & Prasad Rao, Hewlett Packard Labs

Goals
  • Classify domain names as benign or malicious
  • Classify malicious domains according to DGA family
  • Minimize false positive classifications as malicious

Not a Goal

Solution
Training
  • Gather domains by provenance
  • Determine groups with matching syntactical features
  • Build classifiers for benign / malicious
  • Build classifiers for family or origin
Evaluation
  • Take unknown domains one at a time
  • Determine coarse and fine syntactical features
  • Classify as benign / malicious
  • Classify family or origin

Coarse
  • Top level domain
  • Number of levels in domain
Fine
  • Regular expressions
Intersections
  • Possible subsets = 2^n
  • Handful in practice
Leaf
Lobe

Feature Vector
Feature Selection
  • Quantify everything about a string
  • Coarse and fine syntax are the same for all elements within a lobe
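Grouping domains into lobes can be sketched as a dictionary keyed on coarse and fine syntax. This is a minimal Python sketch under stated assumptions, not the presenters' code. The naive label split ignores multi-label public suffixes such as "co.uk". The coarse key uses the top level domain, number of levels, length of the 2nd level domain and length of prefix, all of which the transcript names. The three regular expressions in the hypothetical `FINE_RULES` table stand in for the fine rules, which the talk does not publish. Using the set of matching rules as the fine key shows why n rules allow 2^n possible intersections.

```python
import re
from collections import defaultdict

# Hypothetical fine-grained rules; the talk does not publish its regexes.
FINE_RULES = {
    "all_hex": re.compile(r"^[0-9a-f]+$"),
    "has_digit": re.compile(r"[0-9]"),
    "letters_only": re.compile(r"^[a-z]+$"),
}


def split_domain(domain):
    """Naive split: last label = TLD, next = 2LD, anything left = prefix.

    Multi-label public suffixes such as "co.uk" are NOT handled.
    """
    labels = domain.lower().rstrip(".").split(".")
    tld = labels[-1]
    sld = labels[-2] if len(labels) > 1 else ""
    prefix = ".".join(labels[:-2])
    return labels, tld, sld, prefix


def coarse_key(domain):
    labels, tld, sld, prefix = split_domain(domain)
    # TLD, number of levels, length of 2LD, length of prefix
    return (tld, len(labels), len(sld), len(prefix))


def fine_key(domain):
    _, _, sld, _ = split_domain(domain)
    # Set of rules the 2LD matches; n rules give up to 2**n possible subsets.
    return frozenset(name for name, rx in FINE_RULES.items() if rx.search(sld))


def group_into_lobes(domains):
    """Map (coarse key, fine key) -> list of domains sharing that syntax."""
    lobes = defaultdict(list)
    for domain in domains:
        lobes[(coarse_key(domain), fine_key(domain))].append(domain)
    return dict(lobes)


if __name__ == "__main__":
    for key, members in group_into_lobes(
        ["abc123.com", "def456.com", "google.com", "mail.google.com"]
    ).items():
        print(key, members)
```

Each lobe is then a natural unit for training, because every member shares the same coarse and fine syntax.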
Aggregates
N-Grams
Characters by Position
Words
Aggregates (22)
2LD counts
  • A-F, G-Z, a-f, g-z
  • Consonants, vowels, digits
  • ISO-8859-1 characters: uppercase foreign, lowercase foreign, other printable, other non-printable
  • Doubles 'aa' - 'zz'
  • Non-linguistic bigrams
Overall counts
  • Dots, dashes, & underscores
RFC 1034 violations
  • Length > 254
  • Labels > 63 characters
  • Labels not matching [a-z].*
  • Labels not matching .*[a-z0-9]
  • Empty labels '..'
  • Invalid characters
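Most of the aggregate counts above are simple character-class tallies. The sketch below makes several assumptions that the transcript does not settle. It expects the 2LD to be extracted already, treats only "aeiou" as vowels, and reports RFC 1034 violations as counts rather than flags. It also leaves out the non-linguistic bigram count, because the talk does not say which bigrams it treats as non-linguistic. All function names are hypothetical.

```python
import string

VOWELS = set("aeiou")  # assumption: 'y' is counted as a consonant
LOWER = set(string.ascii_lowercase)
ALNUM = set(string.ascii_lowercase + string.digits)
VALID = ALNUM | {"-", "."}


def is_upper_foreign(c):
    # ISO-8859-1 uppercase letters U+00C0..U+00DE, excluding U+00D7 (multiplication sign)
    return "\u00c0" <= c <= "\u00de" and c != "\u00d7"


def is_lower_foreign(c):
    # ISO-8859-1 lowercase letters U+00DF..U+00FF, excluding U+00F7 (division sign)
    return "\u00df" <= c <= "\u00ff" and c != "\u00f7"


def sld_counts(sld):
    """Character-class counts for the 2nd level domain (bigram count omitted)."""
    keys = ["A-F", "G-Z", "a-f", "g-z", "consonants", "vowels", "digits",
            "upper_foreign", "lower_foreign", "other_printable",
            "other_nonprintable", "doubles"]
    counts = dict.fromkeys(keys, 0)
    for c in sld:
        if "A" <= c <= "Z" or "a" <= c <= "z":
            if "A" <= c <= "F":
                counts["A-F"] += 1
            elif "G" <= c <= "Z":
                counts["G-Z"] += 1
            elif "a" <= c <= "f":
                counts["a-f"] += 1
            else:
                counts["g-z"] += 1
            # ASCII letters are also tallied as vowel or consonant
            counts["vowels" if c.lower() in VOWELS else "consonants"] += 1
        elif "0" <= c <= "9":
            counts["digits"] += 1
        elif is_upper_foreign(c):
            counts["upper_foreign"] += 1
        elif is_lower_foreign(c):
            counts["lower_foreign"] += 1
        elif c.isprintable():
            counts["other_printable"] += 1
        else:
            counts["other_nonprintable"] += 1
    # Doubled lowercase letters 'aa' .. 'zz'
    counts["doubles"] = sum(1 for a, b in zip(sld, sld[1:]) if a == b and a in LOWER)
    return counts


def overall_counts(domain):
    return {"dots": domain.count("."), "dashes": domain.count("-"),
            "underscores": domain.count("_")}


def rfc1034_violations(domain):
    d = domain.lower()
    # A single trailing dot (the root label) is not treated as an empty label.
    labels = (d[:-1] if d.endswith(".") else d).split(".")
    nonempty = [label for label in labels if label]
    return {
        "length>254": int(len(d) > 254),
        "labels>63": sum(1 for label in labels if len(label) > 63),
        "bad_start": sum(1 for label in nonempty if not ("a" <= label[0] <= "z")),
        "bad_end": sum(1 for label in nonempty if label[-1] not in ALNUM),
        "empty_labels": sum(1 for label in labels if label == ""),
        "invalid_chars": sum(1 for c in d if c not in VALID),
    }
```

The RFC checks mirror the four label rules listed above: starts with a letter, ends with a letter or digit, is not empty, and contains only letters, digits and hyphens.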
N-Grams
  • 1-grams: counts of characters (256)
  • 2-grams: counts of character pairs 'aa', 'ab', ... 'zz' (1600)
  • 3-grams: counts of character triples 'aaa', 'aab', ... 'zzz' (64000)
  • Reduce to 40 characters
Pros
  • Separate linguistic / non-linguistic elements
  • Catch bias in DGA PRNG
Cons
  • Space
  • Overfitting
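The sizes 1600 and 64000 are exactly 40 × 40 and 40 × 40 × 40. That fits 2-grams and 3-grams counted over a reduced 40-symbol alphabet, while 1-grams use all 256 character values. The talk does not list its 40 symbols, so the sketch assumes one such alphabet: a-z, 0-9, '-', '_', '.', and a catch-all bucket. Counts are kept in a sparse `collections.Counter`, which is one way to soften the space cost listed above. Names are hypothetical.

```python
from collections import Counter

# Assumed 40-symbol reduced alphabet: a-z, 0-9, '-', '_', '.' (39 symbols)
# plus one catch-all bucket for any other character.
REDUCED = "abcdefghijklmnopqrstuvwxyz0123456789-_."
INDEX = {c: i for i, c in enumerate(REDUCED)}
OTHER = len(REDUCED)        # 39: catch-all bucket
SIZE = len(REDUCED) + 1     # 40 symbols in total


def reduce_char(c):
    """Case-fold, then map to 0..39 (anything unlisted goes to OTHER)."""
    return INDEX.get(c.lower(), OTHER)


def unigram_counts(s):
    """1-grams over the 256 ISO-8859-1 code points (others are skipped)."""
    return Counter(ord(c) for c in s if ord(c) < 256)


def ngram_counts(s, n):
    """Sparse n-gram counts over the reduced alphabet; indices lie in [0, SIZE**n)."""
    codes = [reduce_char(c) for c in s]
    counts = Counter()
    for i in range(len(codes) - n + 1):
        index = 0
        for code in codes[i:i + n]:
            index = index * SIZE + code  # base-40 positional encoding
        counts[index] += 1
    return counts
```

For example, `ngram_counts("abc", 2)` yields index 1 for 'ab' (0 × 40 + 1) and index 42 for 'bc' (1 × 40 + 2).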
Characters by Position
  • Boolean slots for whether a given character occurs at a given position, indexed from the beginning (forward) and from the end (backward) of the domain
  • Forward (10240), Backward (10240)
Pros
  • Classifying fixed substrings: Banjori, Bankpatch, Caphaw, Web Services
Cons
  • Space
  • Overfitting
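Because the slots are boolean, a domain can be stored as the set of indices that are set. The sketch below assumes each direction's 10240 slots are 40 positions × 256 ISO-8859-1 code points. The transcript gives only the total, so this factorisation, the constant names and the functions are all hypothetical. A DGA that reuses a fixed substring, such as a shared suffix, sets the same backward slots on every generated name. That is how this feature can pick out fixed substrings.

```python
MAX_POSITIONS = 40   # assumption: positions indexed from each end
CHAR_VALUES = 256    # ISO-8859-1 code points
SLOTS = MAX_POSITIONS * CHAR_VALUES  # 10240 per direction under this assumption


def position_slots(s, max_positions=MAX_POSITIONS):
    """Indices of the boolean slots that are set: slot = position * 256 + code point."""
    slots = set()
    for pos, c in enumerate(s[:max_positions]):
        code = ord(c)
        if code < CHAR_VALUES:  # characters outside ISO-8859-1 are ignored here
            slots.add(pos * CHAR_VALUES + code)
    return slots


def forward_backward_slots(domain):
    """Forward slots index from the start of the domain, backward from its end."""
    return position_slots(domain), position_slots(domain[::-1])


def to_dense(slots):
    """Expand a slot set into a 0/1 list of length SLOTS."""
    vector = [0] * SLOTS
    for index in slots:
        vector[index] = 1
    return vector
```

Keeping the sparse set and expanding with `to_dense` only when a classifier needs a fixed-width vector avoids storing 20480 mostly-zero values per domain.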
Words
Computed for the 2LD (4) and for the prefix (4):
  • Count of words
  • Max count of non-overlapping words
  • Max percentage of characters covered by words
  • Length of the longest word
Pros
  • Classifying benign vs. malicious: Matsnu, Rovnix, Suppobox
Cons
  • Time
  • Overfitting
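The four word features can be computed with a dictionary and a small dynamic program over prefixes of the label. This sketch rests on assumptions the talk does not define. The caller supplies the dictionary, words shorter than three characters are ignored, and "count of words" is read as the number of dictionary matches, overlaps allowed. The function name is hypothetical. It would be applied once to the 2LD and once to the prefix to give the 4 + 4 features.

```python
def word_features(label, dictionary, min_len=3):
    """Return (word count, max non-overlapping words, max % chars covered, longest word).

    `dictionary` is any collection of lowercase words; words shorter than
    `min_len` are ignored so that fragments like "a" do not match everywhere.
    """
    s = label.lower()
    n = len(s)
    words = {w for w in dictionary if len(w) >= min_len}
    max_len = max((len(w) for w in words), default=0)

    # All (start, end) spans of s that are dictionary words, overlaps allowed.
    spans = [(i, j) for i in range(n)
             for j in range(i + min_len, min(n, i + max_len) + 1)
             if s[i:j] in words]
    count_all = len(spans)
    longest = max((j - i for i, j in spans), default=0)

    starts_by_end = {}
    for i, j in spans:
        starts_by_end.setdefault(j, []).append(i)

    # Dynamic programming over prefixes s[:k]:
    # best_count[k] = most non-overlapping words, best_cover[k] = most covered chars.
    best_count = [0] * (n + 1)
    best_cover = [0] * (n + 1)
    for k in range(1, n + 1):
        best_count[k] = best_count[k - 1]   # option: character k-1 is not in a word
        best_cover[k] = best_cover[k - 1]
        for i in starts_by_end.get(k, []):  # option: word s[i:k] ends here
            best_count[k] = max(best_count[k], best_count[i] + 1)
            best_cover[k] = max(best_cover[k], best_cover[i] + (k - i))

    percent = 100.0 * best_cover[n] / n if n else 0.0
    return count_all, best_count[n], percent, longest
```

With the dictionary {"the", "cat", "cats", "sat"}, the label "thecatsat" has four matches. "the" + "cat" + "sat" gives three non-overlapping words that cover every character. The substring search grows with label length times the longest dictionary word, one plausible source of the time cost noted above.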
Conclusions
  • Syntactical rules help
  • Unbalanced data hurts
  • Not a standalone solution
  • Results worse on real data, especially false positives for word-based DGAs
  • Some features are good for classifiers: aggregates, linguistic or not (bigrams), word based
  • Hash words to compress the dictionary and reduce false positives
  • Build classifiers for infected hosts
  • Determine which hosts are infected with which malware
Length of prefix
Syntactical Rules