Loading presentation...

Present Remotely

Send the link below via email or IM

Copy

Present to your audience

Start remote presentation

  • Invited audience members will follow you as you navigate and present
  • People invited to a presentation do not need a Prezi account
  • This link expires 10 minutes after you close the presentation
  • A maximum of 30 users can follow your presentation
  • Learn more about this feature in our knowledge base article

Do you really want to delete this prezi?

Neither you, nor the coeditors you shared it with will be able to recover it again.

DeleteCancel

Make your likes visible on Facebook?

Connect your Facebook account to Prezi and let your likes appear on your timeline.
You can change this under Settings & Account at any time.

No, thanks

Efficient Automated Signatures Extraction Implementation

No description
by

g p

on 20 March 2016

Comments (0)

Please log in to add your comment.

Report abuse

Transcript of Efficient Automated Signatures Extraction Implementation

Efficient Automated Signature Extraction Implementation
M.Sc Final Project
Goal: Fast implementation of an automatic signature extraction system
Efficient Automated Signatures Extraction Implementation
Evaluation
Signature Extraction System
Environment
The Interdisciplinary Center, Herzlia
Efi Arazi School of Computer Science

Golan Parashi

March 2016
Command Line Interface
Web Application
“Zero-Day Signature Extraction for High Volume Attacks” [1]
Based On
Analyzing Traffic
Known Implementation
Improved
Overview
Better
k-gram
Merge
Complete Signatures
(Problematic) Example:
Packets contain host names: www.zzz.com, app.bbb.com ...

Common suffix is
“.com”

k-gram
with high count


www.aaa.com
appears couple of times -> signature candidate (only lower limit check)

Avoiding Division by Zero
Longer Signatures
Example:
k-gram
length is 3, input:
gol
ERE
gol

Longest signature that can be extracted:
gol
EREgo because of the second appearance of "
gol
"
New code is able to extract "
gol
ERE
gol
"
+ Quality
POC code works with strings - support text protocols

Extended Protocol Support
Streamlined Run-Time
POC code is user interactive and requires user input in several steps
POC code is monolithic, and difficult to read
Modularity & Maintainability
Development Methodology
Example
LCU_InsertIntoHashtable() -
28.6%
out of total execution time
SimpleBuffer
custom class instead of std::vector in HH module
Data Sets
1. Real peacetime + Real attack: path traversal to get information on pizzeria restaurants
Measurements
Bibliography
[1] Yehuda Afek, Anat Bremler-Barr, Shir Landau Feibish (2014). Zero-Day Signature Extraction for High Volume Attacks.

[2] Graham Cormode , Marios Hadjieleftheriou (2008). Finding Frequent Items in Data Stream.

[3] Peter Kankowski (2011). Benchmarking CRC32 and PopCnt instructions. http://www.strchr.com/crc32_popcnt

[4] gperftools. https://code.google.com/p/gperftools

[5] Valgrind. http://valgrind.org

[6] cppcheck. http://cppcheck.sourceforge.net

Validate algorithm in paper

Proof of concept (POC)
+Accurate
Faster
+Accurate
Improved
+ Quality
New code use
command line parameters
New code is modular, well structured, readable and maintainable
std::vector::assign() is to blame!
LCU_InsertIntoHashtable() -
5.3%

out of total execution time
2. Real peacetime capture no attack (1GB capture)
3. Real peacetime + synthetic attack: URL parameter tampering to force real-time image resizing by bypassing cache.
4. Synthetic peacetime + attack: 2 types of attack
strings_counter is initialized to 0
Division by Zero in algorithm:

In
TripleHeavyHitters
pseudo code
1. Write/Fix Code
2. Run profilers (CPU, Memory)/code-analyzer
3. Detect issue (slowness, leaks, etc)
Tools: gperftools[4], Valgrind[5], cppcheck[6]
Faster
Increased Throughput
POC code: 32Mbs.

New code: Observed 1.1Gbs on some inputs

C++
Speed, code reuse, proved standard library
Hardware Acceleration
Compute CRC32 codes using special CPU instructions
2-3 times faster than SW implementation [3]
CRC32 is used as a hash function
Improved Lookups
POC code iterate and perform sub-string search
White-list Signature Filtering
Updating Item Count in HH Component
Item handled for this packet?
updating
k-gram
to HH1
updating signatures into HH2
POC use std::map<> - O(log n)
New code adds 'last packet' field to HH data-structure - O(1)
New code use hash table lookup
Memory Allocation/Free/Copy
POC code:
LCU_Update(LCU_type *lcu,
std::string newitem
, ...)
New code:
unsigned char * newitem
(pointer/reference)

Most used function (~85% execution time)
Finding Frequent Items
“Finding Frequent Items in Data Stream” [2] implementation

Fixed memory space

Efficient: find, update, remove, insert - O(1)

Modified to support variable length buffers
New code works with buffers - support any protocol
POC code stops building longer signatures if a
k-gram
was already seen in the packet
Algorithm merge
k-grams
if:
C++ templates instead of polymorphism
Inline functions (avoiding function call overhead)
Built to run on Linux
Example: "attack
S
ite"

"
S
it"
k-gram
ends signature merging: "attackSi"
"
S
it"
k-gram
is skipped
Next signature starts with "ite"
k-gram

In POC code, when

a
k-gram
ends a signature merging, it is not used as a initial value for next signature
`
This research was supported by the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC Grant agreement no 259085.
Acknowledgment
Full transcript