On the Accurate Identification of Network Service Dependencies in Distributed Systems

Talk for LISA '12, San Diego, CA
by Barry Peddycord III on 13 December 2012

Prezi Transcript

Barry Peddycord III, NC State University, bwpeddyc@ncsu.edu, @isharacomix
On the Accurate Identification of Network Service Dependencies in Distributed Systems
USENIX LISA '12, #NSDMiner

Outline: Motivation, Example, NSDMiner, Evaluation, Deployment, Conclusions

Motivation

[Diagram labels: Client Machine, Web Portal, Shibboleth, MySQL DB; Local-Remote Dependency, Remote-Remote Dependency]

Problem: Network Service Dependencies

  • A network service is a software application that runs on a server and listens on a port for connections from other applications.
  • A dependency is a relationship between two services A and B such that A (the depending service) contacts B (the depended service) to complete a task.
  • Defined by configuration parameters and source code
  • Each service does it differently!
  • Often very intricate and subtle
  • Hard to keep track
  • How good are YOUR docs?
  • Want to identify them automatically

Prior Work: Two Paradigms

Patterns in the behavior of the network can model its structure. Previous approaches fall into two categories:

Host-Based

  • Approaches that involve installing an agent on the machines to be monitored, in the form of a kernel module or middleware
  • Accurate, but intrusive
  • Install an agent (e.g. a kernel module) to track socket/application behavior
  • Magpie [OSDI 2004], Pinpoint [NSDI 2004], Macroscope [CoNEXT 2009]
  • Intrusiveness makes them unattractive: security risks, resource contention

Network-Based

  • Treat hosts as black boxes
  • Data-mine on-the-wire network traffic to extract relationships
  • Sherlock [SIGCOMM 2007], eXpose [SIGCOMM 2008], Orion [OSDI 2008], NSDMiner [INFOCOM 2011]
  • High false-positive/false-negative rates

Why bother? Know thyself

  • If dependencies are discovered after a failure occurs, it's too late
  • Knowing in advance improves response time and allows proactive action to be taken on mission-critical services
  • Networks are dynamic

NSDMiner

  • Non-intrusive and fairly accurate
  • Open source Python module: http://sf.net/p/nsdminer

Future work includes

  • Making it work in real time
  • Identifying remote-remote dependencies

This work is supported by the U.S. Army Research Office (ARO) under MURI grant W911NF-09-1-0525.

Dr. Peng Ning, NC State University, pning@ncsu.edu
Dr. Sushil Jajodia, George Mason University, jajodia@gmu.edu

Ranking

Confidence = log(number of nested A->B flows) / log(number of times A is accessed)

Why logs? Two Important Properties

  • Not all nested flows are equal: give candidates with more evidence more weight. "Every other flow" means more when it's 10000 than 100.
  • Later flows are worth less: is 90% less convincing than 95%?

Post-processing

  • Given a communication graph
  • Less-used services are vulnerable to false positives and false negatives
  • Post-processing uses overall structure to fine-tune results
  • Inference and Clustering

The Output: List of Dependency Candidates

  • Returns each network service and all of its dependency candidates
  • Dependencies ordered from most likely to least likely
  • Should be verified by hand, so a few false positives are acceptable

Design: NSDMiner, Clustering, Inference

Open Source

  • Available on SourceForge
  • Written as a Python module ('import nsdminer')
  • Comes with a command-line interface for processing data

What's needed: Collect the Data

  • Collect all network traffic from network switches
  • Export netflows from switches
  • Use packet mirroring: forward and save all pcap headers of packets
  • Usually a week of packets is needed

Using NSDMiner: Just install and run!

  • Run 'nsdminer' to process your data
  • Command line options let you choose various parameters
  • Detailed in the paper and README
  • Output will be a list of services, dependencies, and confidence values

Going Beyond

  • Extend and improve NSDMiner using the features of the 'nsdminer' Python library.
  • Use it in your own networks and let us know how it works for you in the SourceForge forum!

Inference

Intuition: Consider a Web Host

  • Many servers are configured the same way (HTTPD) with the same dependencies (MySQL, SMTP, etc.)
  • Some are more popular than others, having more traffic
  • Identify dependencies of less-used servers by identifying 'similar' services

Example

  • Ground truth: all A's depend on D's
  • Observed traffic: [diagram; similarity labels Sim 2/3 and Sim 1/2]
  • Similarity(X, Y) = shared deps / total deps
  • [Diagram; agreement labels 75% Agreement, 75% Agreement, 100% Agreement]

Algorithm

  • Identify all pairs of similar services above a certain similarity threshold
  • Combine pairs into similarity groups
  • Calculate agreement on dependency candidates
  • Infer dependencies from members of the similarity group to the most agreed-upon candidates

Clustering

Intuition: Backups and Load-Balancing

  • In a load-balancing cluster, a depending service will eventually utilize all cluster nodes
  • In a backup cluster, a service will use the primary nodes until they fail, then move to backup nodes
  • In both cases, if a service uses one node in a cluster, it uses them all

Algorithm

  • Count the number of times that pairs of services are depended upon by the same service
  • Pairs of services whose support is above a certain threshold are considered to be in clusters
  • Re-interpret services that depend on services in clusters as depending on the entire cluster itself
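
The inference steps in the transcript (similarity, similarity groups, agreement, inferred dependencies) can be sketched roughly as follows. This is an illustration only, not the API of the 'nsdminer' module: the function names, the threshold values, the reading of "total deps" as the union of both dependency sets, and the choice to build groups by merging every similar pair are all assumptions.

```python
from itertools import combinations


def similarity(deps_x, deps_y):
    """Shared dependencies divided by total dependencies (union is an assumption)."""
    total = deps_x | deps_y
    return len(deps_x & deps_y) / len(total) if total else 0.0


def similarity_groups(deps, threshold):
    """Merge every pair at or above the threshold into groups (hypothetical grouping rule)."""
    parent = {service: service for service in deps}

    def find(service):
        # Follow parent links to the group representative, shortening the path.
        while parent[service] != service:
            parent[service] = parent[parent[service]]
            service = parent[service]
        return service

    for x, y in combinations(deps, 2):
        if similarity(deps[x], deps[y]) >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for service in deps:
        groups.setdefault(find(service), set()).add(service)
    return [group for group in groups.values() if len(group) > 1]


def infer(deps, sim_threshold, agree_threshold):
    """Give every group member the candidates that enough of the group agrees on."""
    inferred = {service: set(targets) for service, targets in deps.items()}
    for group in similarity_groups(deps, sim_threshold):
        candidates = set().union(*(deps[s] for s in group))
        for candidate in candidates:
            agreement = sum(candidate in deps[s] for s in group) / len(group)
            if agreement >= agree_threshold:
                for s in group:
                    inferred[s].add(candidate)
    return inferred


# Made-up observations: A3 is lightly used, so its dependency on D2 was never seen.
observed = {
    "A1": {"D1", "D2"},
    "A2": {"D1", "D2", "D3"},
    "A3": {"D1"},
    "B": {"X"},
}
result = infer(observed, sim_threshold=0.5, agree_threshold=0.6)
```

With these made-up inputs, A1, A2 and A3 end up in one group, two of the three members agree on D2, and so D2 is inferred for A3, while D3 (seen by only one member) is not propagated.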
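
The clustering steps can be sketched in the same spirit. Again this is a hedged illustration rather than the tool's implementation: `find_clusters`, `expand` and `min_support` are hypothetical names, and merging overlapping high-support pairs into one cluster is an assumption about how pairs become clusters.

```python
from collections import Counter
from itertools import combinations


def find_clusters(deps, min_support):
    """Group services that are often depended upon by the same depending service."""
    support = Counter()
    for targets in deps.values():
        # Count each unordered pair of dependencies once per depending service.
        for pair in combinations(sorted(targets), 2):
            support[pair] += 1

    clusters = []
    for (a, b), count in support.items():
        if count < min_support:
            continue
        # Merge the pair with any existing cluster it touches (assumed rule).
        touching = [c for c in clusters if a in c or b in c]
        merged = {a, b}.union(*touching)
        clusters = [c for c in clusters if c not in touching] + [merged]
    return clusters


def expand(deps, clusters):
    """Re-interpret a dependency on one cluster member as a dependency on the whole cluster."""
    expanded = {}
    for service, targets in deps.items():
        new_targets = set(targets)
        for cluster in clusters:
            if targets & cluster:
                new_targets |= cluster
        expanded[service] = new_targets
    return expanded


# Made-up observations: web3 has only been seen talking to db1 so far.
observed = {
    "web1": {"db1", "db2"},
    "web2": {"db1", "db2"},
    "web3": {"db1"},
    "mail": {"ldap"},
}
clusters = find_clusters(observed, min_support=2)
expanded = expand(observed, clusters)
```

Here db1 and db2 are used together by two services, so they form a cluster, and web3 is re-interpreted as depending on both nodes.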
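
The Ranking formula from the transcript, confidence as a ratio of logarithms, can also be tried out directly. The sketch below assumes natural logarithms (the base cancels in the ratio) and returns 0.0 when there are too few flows to take a meaningful logarithm; both the edge-case handling and the function names are assumptions, not taken from the talk.

```python
import math


def confidence(nested_flows, accesses):
    """log(nested A->B flows) / log(times A is accessed), with assumed edge-case handling."""
    if nested_flows < 1 or accesses < 2:
        return 0.0  # too little evidence for the logarithms to be meaningful
    return math.log(nested_flows) / math.log(accesses)


def rank(candidates, accesses):
    """Order dependency candidates from most likely to least likely."""
    scored = [(dep, confidence(count, accesses)) for dep, count in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# "Every other flow" counts for more when there is more evidence behind it.
half_of_100 = confidence(50, 100)
half_of_10000 = confidence(5000, 10000)

# Made-up nested-flow counts for a service A that was accessed 1000 times.
ranking = rank({"mysql:3306": 900, "smtp:25": 30, "dns:53": 4}, accesses=1000)
```

In this sketch `half_of_10000` comes out larger than `half_of_100`, which matches the first property listed under "Why logs?", and `ranking` lists the candidates from highest to lowest confidence.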