State Machines in Data Compression

An exploration of state machines in Data Compression.

Alec Turnbull

on 4 May 2010

Transcript of State Machines in Data Compression

A finite-state machine is a set of states with transitions from state to state triggered by actions. Finite-state machines can be used to describe any process with a finite number of states.

Coding messages with input alphabet I = {a,b,c,d}:

Fixed-length code: represent 'a' as 00, 'b' as 01, 'c' as 10, 'd' as 11.
Variable-length code: represent 'a' as 0, 'b' as 10, 'c' as 110, 'd' as 111.

Given the input sequence 'abbaacaabcbacdb', we find that the frequency of an 'a' is .4, 'b' is .333, 'c' is .2, and 'd' is .067. It is customary to call these frequencies probabilities.

Encoding the input sequence with the straightforward fixed-length scheme gives us the encoded binary message

000101000010000001100100101101

with 30 bits. The variable-length code provides the encoded message

0101000110001011010011011110

which has 28 bits.
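These bit counts are easy to verify; here is a minimal Python sketch (the dictionaries simply transcribe the two code tables above):

```python
# Encode the example sequence with both code tables from the text.
fixed = {'a': '00', 'b': '01', 'c': '10', 'd': '11'}
variable = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

message = 'abbaacaabcbacdb'

fixed_bits = ''.join(fixed[s] for s in message)        # 15 symbols * 2 bits
variable_bits = ''.join(variable[s] for s in message)  # shorter codes for frequent symbols

print(len(fixed_bits))     # 30
print(len(variable_bits))  # 28
```

The variable-length code wins because the most frequent symbol, 'a', gets the shortest codeword.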
Represent 'a' as 0, 'b' as 1, 'c' as 11, 'd' as 10.

Kraft Inequality: ∑ 2^(-Li) ≤ 1

For this code the lengths are 1, 1, 2, 2, and 2^(-1) + 2^(-1) + 2^(-2) + 2^(-2) = 1.5, which violates the inequality, so no uniquely decodable code has these lengths.

For the variable-length code the lengths are 1, 2, 3, 3, and 2^(-1) + 2^(-2) + 2^(-3) + 2^(-3) = 1, which satisfies it.
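A quick check of both sums, as a sketch (the helper name kraft_sum is mine):

```python
# Kraft inequality: a uniquely decodable code needs sum(2**-Li) <= 1.
def kraft_sum(lengths):
    return sum(2 ** -L for L in lengths)

# Lengths 1, 1, 2, 2 ('a'=0, 'b'=1, 'c'=11, 'd'=10): sum is 1.5 > 1.
print(kraft_sum([1, 1, 2, 2]))  # 1.5

# Lengths 1, 2, 3, 3 ('a'=0, 'b'=10, 'c'=110, 'd'=111): sum is exactly 1.
print(kraft_sum([1, 2, 3, 3]))  # 1.0
```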

The optimum length is

Li = -log2 pi.

So, the optimal average code length is the entropy:

H = ∑ pi (-log2 pi)
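Applied to the frequencies of 'abbaacaabcbacdb', this formula gives the best average length any code for that source could achieve; a short sketch:

```python
import math

# Entropy H = sum(p_i * -log2(p_i)) for the frequencies of the example sequence.
message = 'abbaacaabcbacdb'
probs = [message.count(s) / len(message) for s in 'abcd']  # .4, .333, .2, .067

H = sum(p * -math.log2(p) for p in probs)
print(round(H, 4))                 # 1.7819 bits per symbol
print(round(H * len(message), 1))  # 26.7 bits for the whole 15-symbol message
```

Note that 26.7 bits is below the 28 bits our variable-length code achieved, so that code is good but not optimal.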

Example: P(a) = .8, P(b) = .2.

With the single-symbol code a = 0, b = 1, the ten-symbol message 'aa ab ba aa aa' takes 10 bits.

Coding pairs of symbols instead, with

aa = 0
ab = 10
ba = 110
bb = 111

the same message 'aa ab ba aa aa' encodes as 0 10 110 0 0, which takes only 8 bits.

The entropy here is H = -(.8 log2 .8 + .2 log2 .2) = .7219 bits per symbol, so the optimum length for our ten-symbol message is 7.219 bits.
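A sketch verifying these numbers (the pair table is transcribed from above):

```python
import math

# Entropy of the binary source P(a)=.8, P(b)=.2.
H = -(0.8 * math.log2(0.8) + 0.2 * math.log2(0.2))
print(round(H, 4))       # 0.7219 bits per symbol
print(round(10 * H, 3))  # 7.219 bits for the ten-symbol message

# Pair code from the text: aa=0, ab=10, ba=110, bb=111.
pair_code = {'aa': '0', 'ab': '10', 'ba': '110', 'bb': '111'}
pairs = ['aa', 'ab', 'ba', 'aa', 'aa']
encoded = ''.join(pair_code[p] for p in pairs)
print(len(encoded))  # 8 bits
```

Coding longer and longer blocks of symbols approaches the 7.219-bit optimum; arithmetic coding, below, gets there without enumerating blocks.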
Arithmetic Coding

1. Start with a current interval, [L,H), initialized to [0,1).

2. For each symbol in the file,
   a. Subdivide the current interval into two subintervals proportional to the estimated probability that the symbol will be the next symbol to occur.
   b. Select the subinterval corresponding to the symbol that occurs and establish it as the current interval.

3. Output enough bits to distinguish this subinterval from all other possible final subintervals.
Applying these steps to the message 'aa ab ba aa aa' with P(a) = .8 and P(b) = .2, the current interval narrows at each symbol:

[0,1) → [0,.8) → [0,.64) → [0,.512) → [.4096,.512) → [.49152,.512) → ... → [0.49152, 0.4982308864)

The 8-bit binary fraction .01111111 lies inside this final interval; .01111101 falls just below it.
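The interval-narrowing loop of step 2 can be sketched as follows (the function name is mine; it tracks only the interval, not the output bits):

```python
# Narrow the current interval [low, high) for each symbol, with 'a' taking
# the lower P(a) fraction of the interval and 'b' the rest.
def arithmetic_intervals(message, p_a=0.8):
    low, high = 0.0, 1.0
    for sym in message:
        split = low + (high - low) * p_a
        if sym == 'a':
            high = split
        else:
            low = split
    return low, high

# 'aa ab ba aa aa' written without spaces is 'aaabbaaaaa'.
low, high = arithmetic_intervals('aaabbaaaaa')
print(low, high)  # roughly 0.49152 and 0.4982308864
```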
Arithmetic coding comes arbitrarily
close to the optimum code length.

In our case, we must output 8 bits, which is close to our optimum of 7.219 bits.

Quasi-Arithmetic Coding

Quasi-arithmetic coding differs from
arithmetic coding in a couple of ways:

1. Restrict the allowable subintervals
to a limited few.

2. Output bits as soon as we know them
and scale up the interval.
The allowable subintervals for our quasi-arithmetic coder are:

[0,1), [0,.75), [.25,1)

Output .01111110 to identify our subinterval from all other subintervals. This has 8 bits.

We begin with the interval [0,1).

The input of an 'a' leads us to the interval [0,.8),
which we round to the nearest subinterval, [0,.75).

We output nothing and move on to the next symbol.

The next 'a' gives us the subinterval [0,.6).

This subinterval is rounded to [0,.5) and scaled up to [0,1).

We output '0' and move on to the next symbol.

The third 'a' from the interval [0,1) gives the subinterval [0,.8) which is rounded to [0,.75).

The 'b' input causes us to select the subinterval [.6,.75). Since this is not close to any acceptable subinterval, we expand the subinterval twice to [.35,1). We then round this to [.25, 1) and output '11', indicating that we scaled up twice with respect to a 'b'.
Continuing this process for the entire data stream 'aa ab ba aa aa', we arrive at the coded message .01111110.
Rather than performing the subinterval calculations and expansions every time we wish to encode a data stream, we can perform these calculations once and substitute a state machine to perform these processes.

Probability estimation is key to effective data compression. One can employ static probabilities, as we have, to model the data. On the other hand, an adaptive method may be used to provide probabilities based on the context of the data we are estimating.
The adaptive estimate is updated after each symbol:

Pt+1(a) = (1-s)Pt(a) + s   for an 'a' input
Pt+1(a) = (1-s)Pt(a)       for a 'b' input

To model our data we will use an s value of .7. In this case, employing a high speed of adaptation was a great choice, as we arrived at a total code length of 7.222 bits, which is very close to our optimum of 7.219.

Rather than performing these calculations every time,
we can create a state machine with these probabilities as
states, and round the value to the nearest state.
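The update rule and the rounding to a nearest state can be sketched as follows (the particular state list here is hypothetical, chosen only for illustration):

```python
# Adaptive estimate: P(a) moves toward 1 after an 'a', toward 0 after a 'b'.
def update(p_a, symbol, s):
    return (1 - s) * p_a + (s if symbol == 'a' else 0)

p = 0.5            # initial estimate
for sym in 'aab':  # a short illustrative input
    p = update(p, sym, s=0.7)
print(p)           # about 0.2865

# A state machine would keep only a few probability states and round
# the updated value to the nearest one (these states are hypothetical).
states = [0.1, 0.3, 0.5, 0.7, 0.9]
nearest = min(states, key=lambda q: abs(q - p))
print(nearest)  # 0.3
```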

Here is an example with a speed of adaptation, s, of .5. Using our state machine with s = .5, we arrive at a total code length of 5.49 bits.

The reason for this low code length is that the speed of adaptation does not allow the 'b' inputs to reach low probabilities.

One method of probability estimation is to provide a global estimate of the probability of that particular symbol:

pi = (number of occurrences of symbol i) / (total number of symbols)

The next 'b' gives us the subinterval [.85,1). This is scaled up to [.7,1), again to [.4,1), and once more to [0,1). We then output '111' and move to the next symbol.
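This global estimate can be computed directly; a sketch using the earlier example sequence:

```python
from collections import Counter

# Global estimate: p_i = (occurrences of symbol i) / (total symbols).
message = 'abbaacaabcbacdb'
counts = Counter(message)
probs = {sym: n / len(message) for sym, n in counts.items()}

print(probs['a'])  # 0.4
print(probs['d'])  # about 0.0667
```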