### Present Remotely

Send the link below via email or IM

CopyPresent to your audience

Start remote presentation- Invited audience members
**will follow you**as you navigate and present - People invited to a presentation
**do not need a Prezi account** - This link expires
**10 minutes**after you close the presentation - A maximum of
**30 users**can follow your presentation - Learn more about this feature in our knowledge base article

Do you really want to delete this prezi?

Neither you, nor the coeditors you shared it with will be able to recover it again.

### Make your likes visible on Facebook?

Connect your Facebook account to Prezi and let your likes appear on your timeline.

You can change this under Settings & Account at any time.

# Automata

No description

Criticize on related journal article related to

automata theory and formal language Prepared By:

Fartash Haghanikhameneh

Seyed Ahmad Mousavi

Yazan Ahmad Al Sariera

Prepared for:

DR. NOR FAZLIDA MOHD SANI

Article Detail Sun, Y., & Kim, M. S. (2011). DFA-Based Regular Expression Matching on Compressed Traffic. 2011 IEEE Computer Engineering and Technology (ICCET) (pp. 1 –5). doi:10.1109 deep packet inspection used by many network security applications. This method checks header and payload portion of package to identify class of applications, virus, attack signature, etc. In order to identify and categorise the packets they hardly depended on accurate analysis of packets content which is done by regular expression technique. Regular expressions are implemented using finite automata, which take payload of a packet as an input string. However, existing approaches, including non-deterministic finite automata (NFA) and deterministic finite automata (DFA) do not deal with compressed traffic, which become more and more popular in HTTP applications. Research Problem Hypothesis Propose an efficient algorithm for regular expression matching to implement deep packet inspection on Related work Nowadays, regular expressions are the language of choice in NIDS (network intrusion detection systems) from commercial vendors such as 3Com’s TippingPoint X505 [10] and Cisco

IPS [5], as well as open source NIDS such as Bro [4] and Snort [3]. Those regular expressions are typically implemented using finite automata, either NFA or DFA. ASIC-based application-specific integrated circuits Several commercial network equipment vendors, including 3Com and Cisco have supplied their own NIDS and a number of smaller players have intro-duced pattern matching ASICs for these NIDS. ReCPU is a fast ASIC-based regular expression matching architecture.

In fact, many have argued that deep packet inspection should be done on ASICs. Developing ASICs for NIDS, however, has several disadvantages; it requires a large investment and a long development cycle, and it is hard to upgrade. FPGA-based Field-programmable gate array There is a body of literature advocating FPGA-based pattern matching. It can provide not only fast matching cycle but also parallel matching operations. NFAs are well-suited for FPGA-based matching because of its wide bandwidth and low memory consumption. Pre-decoded

CAMs and Bitwise optimized CAM are FPGA-based architectures that use character pre-decoding coupled with CAM-based patterns to accelerate the matching speed.

This type of approaches, however, is not flexible enough for general-purpose regular expression matching. Besides, it is still expensive and power-consuming. Software-based The software-based approaches are also called general-purpose approaches, and they are based on general-purpose processors or network processors,. DFAs are more popular in software-based approaches because they only need one state transition per input character, which causes at most one memory access for each character input. However, As we mentioned earlier, the practical use of DFAs is limited because of their excessive memory usage. In order to mitigate this issue, many methods

have been proposed. They develop several memory

compression techniques for DFAs, focusing on reducing the number of transitions between states, and in some cases, 99% transitions can be eliminated. Proposed Approach At the beginning, the author presents a number of prerequisites like definition of “HTTP compression”, “NFA”, “DFA”, their characteristics and a number of terminologies used inside the article, and then the story began.

As we know, the main attempt is to reduce the number of transitions and consequently,

Reduce the process time. It’s good to take a look at the first example provided in the paper. Figure.1. illustrates a DFA and NFA accepting regular expressions (a+bc), (bcd+), and (cde) on the alphabet

Σ = {a, b, c, d, e}. All transactions are started from state “0”

Evaluation Authors choose five sets of traffic with different compression rates between 10% and 90%, and build DFA from three different groups of regular expressions, R1, R2 and R3, which contains 100, 200, and 400 regular expressions respectively, collected from Snort rule set. They simulate the number of DFA state access in the original approach and the numbers of DFA state access in the proposed approach for 100 regular expressions, the result illustrated in Table below. Conclusion Criticize And Recommendation As a brief summary for the reviewed paper, the anthers proposed an efficient regular expression matching in deeply packet evaluation for compressed traffic in the network. They primary reviewed the components of Nondeterministic finite automaton and deterministic finite automaton then they decided to choose deterministic finite automaton because it sustains at most one active condition. They work Based on the observation that for the similar entry character will be a high possibility that the following state will be the same state no matter what is the current state, they built an efficient deterministic finite automaton generator that consists of some states and transitions to enhance this property at the most. Their algorithm details the earlier active state pattern and matching pattern. This information allows them to skip some compressed parts of the traffic when the active state is repeated. The simulation results show that their technique effectively skips most of the compressed sections in the traffic and decreases the regularity of deterministic finite automaton state accesses in proportion to the traffic’s data compression ratio. Despite the fact that deterministic finite automaton is not appropriate for a large number of complicated regular expressions , their technique can be very easily extended to be used in single, multiple small DFAs, which will make their approach more scalable. It is time to focus on HTTP compression structure and figure out the relationships.

Compression algorithm is that when a series of connected letters or characters (a substring) has already seemed in the near past, a pair of numbers may be used to characterize this repeated substring.

The pair of numbers is also called a length-distance pair, which represents the distance in bytes of the two substrings and the length in bytes of the repeated substring.

An example would be abcdefghijdefzfg..., for the repeated part we put a pointer to determine the place of repetition and also the number of characters to be repeated. For instance, regarding to mentioned example, it is become to a new form like: abcdefghij (7, 3) zfg…

The author claims that since the repeated part is already in a form of other remained strings, we can skip considering them and in the time of intrusion detection, we would not check them in order to avoid redundant process.

Thank You Security Evaluation shortage Although Authors evaluate the amount of eliminated states and show how their algorithm makes process fast in compare with previous algorithm, but they don’t evaluate how many threat can be detected in real system. In order to evaluate security they need to calculate false positive and false negative to find out how many threat can be detected and how many of them are accurate and how many of them are wrong alarm.

Security evaluation helps to prove, elimination of decoded parts of packets (which is cause to eliminate some state in model) is not effect accuracy (precision and recall) of intrusion detection systems.

Memory constraint According to the statistics, one of the popular and common compression algorithms for internet traffic is called LZ77. Moreover, LZ77 is being utilized in this article’s implementation. However, it also has been stated that the algorithm has some issues such as CPU time and memory resources consumption. Consequently, authors focused on how to solve these problems in the new algorithm. Thus, they proposed an efficient DFA-based approach to perform regular expression matching beside LZ77. Though the enhancement of the new algorithm had some considerable significance like faster process time, but the memory issue still remained unsolved. Running time Another view point can be considering running time in a form of computer science accepted format. In this paper, we had the comparison between previous and current state of pattern matching using regular expressions, however, talking about process speed in terms of Big O notation is missed.Full transcript

by

Tweet## Fartash Haghani

on 3 October 2012#### Transcript of Automata

Criticize on related journal article related to

automata theory and formal language Prepared By:

Fartash Haghanikhameneh

Seyed Ahmad Mousavi

Yazan Ahmad Al Sariera

Prepared for:

DR. NOR FAZLIDA MOHD SANI

Article Detail Sun, Y., & Kim, M. S. (2011). DFA-Based Regular Expression Matching on Compressed Traffic. 2011 IEEE Computer Engineering and Technology (ICCET) (pp. 1 –5). doi:10.1109 deep packet inspection used by many network security applications. This method checks header and payload portion of package to identify class of applications, virus, attack signature, etc. In order to identify and categorise the packets they hardly depended on accurate analysis of packets content which is done by regular expression technique. Regular expressions are implemented using finite automata, which take payload of a packet as an input string. However, existing approaches, including non-deterministic finite automata (NFA) and deterministic finite automata (DFA) do not deal with compressed traffic, which become more and more popular in HTTP applications. Research Problem Hypothesis Propose an efficient algorithm for regular expression matching to implement deep packet inspection on Related work Nowadays, regular expressions are the language of choice in NIDS (network intrusion detection systems) from commercial vendors such as 3Com’s TippingPoint X505 [10] and Cisco

IPS [5], as well as open source NIDS such as Bro [4] and Snort [3]. Those regular expressions are typically implemented using finite automata, either NFA or DFA. ASIC-based application-specific integrated circuits Several commercial network equipment vendors, including 3Com and Cisco have supplied their own NIDS and a number of smaller players have intro-duced pattern matching ASICs for these NIDS. ReCPU is a fast ASIC-based regular expression matching architecture.

In fact, many have argued that deep packet inspection should be done on ASICs. Developing ASICs for NIDS, however, has several disadvantages; it requires a large investment and a long development cycle, and it is hard to upgrade. FPGA-based Field-programmable gate array There is a body of literature advocating FPGA-based pattern matching. It can provide not only fast matching cycle but also parallel matching operations. NFAs are well-suited for FPGA-based matching because of its wide bandwidth and low memory consumption. Pre-decoded

CAMs and Bitwise optimized CAM are FPGA-based architectures that use character pre-decoding coupled with CAM-based patterns to accelerate the matching speed.

This type of approaches, however, is not flexible enough for general-purpose regular expression matching. Besides, it is still expensive and power-consuming. Software-based The software-based approaches are also called general-purpose approaches, and they are based on general-purpose processors or network processors,. DFAs are more popular in software-based approaches because they only need one state transition per input character, which causes at most one memory access for each character input. However, As we mentioned earlier, the practical use of DFAs is limited because of their excessive memory usage. In order to mitigate this issue, many methods

have been proposed. They develop several memory

compression techniques for DFAs, focusing on reducing the number of transitions between states, and in some cases, 99% transitions can be eliminated. Proposed Approach At the beginning, the author presents a number of prerequisites like definition of “HTTP compression”, “NFA”, “DFA”, their characteristics and a number of terminologies used inside the article, and then the story began.

As we know, the main attempt is to reduce the number of transitions and consequently,

Reduce the process time. It’s good to take a look at the first example provided in the paper. Figure.1. illustrates a DFA and NFA accepting regular expressions (a+bc), (bcd+), and (cde) on the alphabet

Σ = {a, b, c, d, e}. All transactions are started from state “0”

Evaluation Authors choose five sets of traffic with different compression rates between 10% and 90%, and build DFA from three different groups of regular expressions, R1, R2 and R3, which contains 100, 200, and 400 regular expressions respectively, collected from Snort rule set. They simulate the number of DFA state access in the original approach and the numbers of DFA state access in the proposed approach for 100 regular expressions, the result illustrated in Table below. Conclusion Criticize And Recommendation As a brief summary for the reviewed paper, the anthers proposed an efficient regular expression matching in deeply packet evaluation for compressed traffic in the network. They primary reviewed the components of Nondeterministic finite automaton and deterministic finite automaton then they decided to choose deterministic finite automaton because it sustains at most one active condition. They work Based on the observation that for the similar entry character will be a high possibility that the following state will be the same state no matter what is the current state, they built an efficient deterministic finite automaton generator that consists of some states and transitions to enhance this property at the most. Their algorithm details the earlier active state pattern and matching pattern. This information allows them to skip some compressed parts of the traffic when the active state is repeated. The simulation results show that their technique effectively skips most of the compressed sections in the traffic and decreases the regularity of deterministic finite automaton state accesses in proportion to the traffic’s data compression ratio. Despite the fact that deterministic finite automaton is not appropriate for a large number of complicated regular expressions , their technique can be very easily extended to be used in single, multiple small DFAs, which will make their approach more scalable. It is time to focus on HTTP compression structure and figure out the relationships.

Compression algorithm is that when a series of connected letters or characters (a substring) has already seemed in the near past, a pair of numbers may be used to characterize this repeated substring.

The pair of numbers is also called a length-distance pair, which represents the distance in bytes of the two substrings and the length in bytes of the repeated substring.

An example would be abcdefghijdefzfg..., for the repeated part we put a pointer to determine the place of repetition and also the number of characters to be repeated. For instance, regarding to mentioned example, it is become to a new form like: abcdefghij (7, 3) zfg…

The author claims that since the repeated part is already in a form of other remained strings, we can skip considering them and in the time of intrusion detection, we would not check them in order to avoid redundant process.

Thank You Security Evaluation shortage Although Authors evaluate the amount of eliminated states and show how their algorithm makes process fast in compare with previous algorithm, but they don’t evaluate how many threat can be detected in real system. In order to evaluate security they need to calculate false positive and false negative to find out how many threat can be detected and how many of them are accurate and how many of them are wrong alarm.

Security evaluation helps to prove, elimination of decoded parts of packets (which is cause to eliminate some state in model) is not effect accuracy (precision and recall) of intrusion detection systems.

Memory constraint According to the statistics, one of the popular and common compression algorithms for internet traffic is called LZ77. Moreover, LZ77 is being utilized in this article’s implementation. However, it also has been stated that the algorithm has some issues such as CPU time and memory resources consumption. Consequently, authors focused on how to solve these problems in the new algorithm. Thus, they proposed an efficient DFA-based approach to perform regular expression matching beside LZ77. Though the enhancement of the new algorithm had some considerable significance like faster process time, but the memory issue still remained unsolved. Running time Another view point can be considering running time in a form of computer science accepted format. In this paper, we had the comparison between previous and current state of pattern matching using regular expressions, however, talking about process speed in terms of Big O notation is missed.