State Machines in Data Compression
By: Alec Turnbull

A finite-state machine is a set of states with transitions from state to state triggered by actions. Finite-state machines can be used to describe any process with a finite number of states.

Coding messages with input alphabet

I = {a,b,c,d}

Represent 'a' as 00, 'b' as 01, 'c' as 10, 'd' as 11.

Variable-length Code

Represent 'a' as 0, 'b' as 10, 'c' as 110, 'd' as 111.

Given the input sequence 'abbaacaabcbacdb', we find that the frequency of 'a' is .4, 'b' is .333, 'c' is .2, and 'd' is .067.

It is customary to call these frequencies probabilities. Encoding the initial data with the straightforward scheme gives us the encoded binary message

000101000010000001100100101101

with 30 bits. The variable-length code provides the encoded message

0101000110001011010011011110,

which has 28 bits.
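As a quick check, here is a short Python sketch of the two codes (the code tables and the message come from the text; the function name is ours):

```python
# The two code tables from the text.
fixed = {'a': '00', 'b': '01', 'c': '10', 'd': '11'}
variable = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

message = 'abbaacaabcbacdb'

def encode(msg, code):
    """Concatenate the codeword for each symbol of the message."""
    return ''.join(code[s] for s in msg)

print(len(encode(message, fixed)))     # 30 bits
print(len(encode(message, variable)))  # 28 bits
```

The variable-length code wins because the shortest codewords go to the most frequent symbols.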

Not every set of codewords works: representing 'a' as 0, 'b' as 1, 'c' as 11, 'd' as 10 does not give a uniquely decodable code. The Kraft inequality,

∑ 2^(-Li) ≤ 1,

tells us which codeword lengths Li are feasible. For this code, 2^-1 + 2^-1 + 2^-2 + 2^-2 = 1.5 > 1, so it fails; for our variable-length code, 2^-1 + 2^-2 + 2^-3 + 2^-3 = 1, so it satisfies the inequality.
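The two Kraft sums above can be checked directly (a minimal sketch; the function name is ours):

```python
# Kraft sum for a set of codeword lengths: sum of 2^(-Li).
def kraft_sum(lengths):
    return sum(2 ** -l for l in lengths)

print(kraft_sum([1, 1, 2, 2]))  # 1.5 -> violates the inequality
print(kraft_sum([1, 2, 3, 3]))  # 1.0 -> satisfies it
```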

The optimum codeword length for a symbol with probability Pi is

Li = -log2 Pi.

So the optimal average code length is the entropy,

H = ∑ Pi (-log2 Pi).

As an example, take P(a) = .8, P(b) = .2, and the message

aaabbaaaaa.

The code a = 0, b = 1 gives

0001100000,

which has 10 bits. Coding pairs of symbols with

aa = 0, ab = 10, ba = 110, bb = 111,

the message becomes aa ab ba aa aa, encoded as

01011000,

which has 8 bits. The entropy is

H = -(.8 log2 .8 + .2 log2 .2) = .7219 bits per symbol,

so the optimum length for our ten-symbol message is 7.219 bits.
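The entropy figure above can be reproduced with a few lines of Python (a sketch; the function name is ours):

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol: -sum of p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs)

h = entropy([0.8, 0.2])
print(round(h, 4))       # 0.7219 bits per symbol
print(round(10 * h, 3))  # 7.219 bits for the ten-symbol message
```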

Arithmetic Coding

1. Start with a current interval, [L,H), initialized to [0,1).
2. For each symbol in the file:
   a. Subdivide the current interval into two subintervals proportional to the estimated probability that the symbol will be the next symbol to occur.
   b. Select the subinterval corresponding to the symbol that occurs and establish it as the current interval.
3. Output enough bits to distinguish this subinterval from all other possible final subintervals.

Encoding 'aaabbaaaaa' with P(a) = .8, starting from the interval [0,1), the current interval narrows step by step:

[0,1) → [0,.8) → [0,.64) → [0,.512) → [.4096,.512) → [.49152,.512) → [.49152,.507904) → [.49152,.5046272) → [.49152,.50200576) → [.49152,.499908608) → [.49152,.4982308864)

The 8-bit binary expansions of the final endpoints begin 0.01111101 (low, .49152) and 0.01111111 (high, .4982308864).
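Steps 1 and 2 of the algorithm can be sketched directly (the function name is ours; we assume 'a' takes the lower portion of the interval, which the text does not state explicitly but which matches the interval trace):

```python
# Interval narrowing for a two-symbol arithmetic coder.
def narrow(message, p_a):
    low, high = 0.0, 1.0                  # step 1: current interval [0,1)
    for sym in message:
        split = low + p_a * (high - low)  # step 2a: subdivide
        if sym == 'a':                    # step 2b: select the subinterval
            high = split                  #          'a' -> lower part
        else:
            low = split                   #          'b' -> upper part
    return low, high

low, high = narrow('aaabbaaaaa', 0.8)
print(low, high)  # ~0.49152, ~0.4982308864
```

Any number inside the final interval, written in binary, identifies the message.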

Arithmetic coding comes arbitrarily close to the optimum code length. In our case, we must output 8 bits, which is close to our optimum of 7.219 bits. (In full binary, the endpoint .4982308864 is 0.01111111100011000000111100110010.)

Quasi-Arithmetic Coding

Quasi-arithmetic coding differs from arithmetic coding in a couple of ways:

1. Restrict the allowable subintervals to a limited few.
2. Output bits as soon as we know them and scale up the interval.

The allowable subintervals for our quasi-arithmetic coder are [0,1), [0,.75), and [.25,1). (For the arithmetic coder above, the final interval's lower endpoint in full binary is 0.01111101110101000100000100110101; outputting .01111110, which has 8 bits, identifies our subinterval from all other subintervals.)

We begin with the interval [0,1).

The input of an 'a' leads us to the interval [0,.8), which we round to the nearest subinterval, [0,.75). We output nothing and move on to the next symbol. The next 'a' gives us the subinterval [0,.6). This subinterval is rounded to [0,.5) and scaled up to [0,1). We output '0' and move on to the next symbol. The third 'a' from the interval [0,1) gives the subinterval [0,.8), which is rounded to [0,.75).

The 'b' input causes us to select the subinterval [.6,.75). Since this is not close to any acceptable subinterval, we expand the subinterval twice to [.35,1). We then round this to [.25, 1) and output '11', indicating that we scaled up twice with respect to a 'b'.

Continuing this process for the entire data stream aaabbaaaaa, we arrive at the coded message 01111100.
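The "output bits as soon as we know them" idea has a standard generic form: while the current interval lies entirely in [0,.5) or [.5,1), the next output bit is already determined, so we emit it and double the interval. A minimal sketch (note this is the generic renormalization rule; the coder in the text uses its own expansion rules for 'b' inputs, so its outputs differ):

```python
# Emit every bit that is already determined and scale the interval up.
def renormalize(low, high):
    bits = ''
    while True:
        if high <= 0.5:                  # interval in [0,.5): next bit is 0
            bits += '0'
            low, high = 2 * low, 2 * high
        elif low >= 0.5:                 # interval in [.5,1): next bit is 1
            bits += '1'
            low, high = 2 * low - 1, 2 * high - 1
        else:
            return bits, low, high       # next bit not yet determined

bits, low, high = renormalize(0.6, 0.9)
print(bits)  # '1', and the interval scales up to roughly [0.2, 0.8)
```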

Rather than performing the subinterval calculations and expansions every time we wish to encode a data stream, we can perform these calculations once and substitute a state machine that carries them out.

Probability estimation is key to effective data compression.
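The walkthrough above visits only three interval states, so for this model the whole coder collapses into a small transition table. Here is a sketch in which each (state, symbol) pair maps to the bits to output and the next state; the four entries are read directly off the walkthrough, and transitions it never exercises are omitted:

```python
# States are the allowable subintervals of the quasi-arithmetic coder.
A, B, C = '[0,1)', '[0,.75)', '[.25,1)'
transitions = {
    (A, 'a'): ('',    B),  # [0,1)   -a-> [0,.8),   round to [0,.75)
    (B, 'a'): ('0',   A),  # [0,.75) -a-> [0,.6),   round, scale, emit '0'
    (B, 'b'): ('11',  C),  # [0,.75) -b-> [.6,.75), expand twice, emit '11'
    (C, 'b'): ('111', A),  # [.25,1) -b-> [.85,1),  scale thrice, emit '111'
}

def encode(message):
    state, out = A, ''
    for sym in message:
        bits, state = transitions[(state, sym)]
        out += bits
    return out

print(encode('aaabbaaaaa'))  # 01111100
```

Running the machine on the example stream reproduces the coded message from the walkthrough with no interval arithmetic at all.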

One can employ static probabilities, as we have, to model the data. On the other hand, an adaptive method may be used to provide probabilities based on the context of the data being estimated. A simple adaptive rule updates the estimate P = P(a) after each symbol:

P(t+1) = (1 - s) P(t) + s    for an 'a' input
P(t+1) = (1 - s) P(t)        for a 'b' input

where s is the speed of adaptation.
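The update rule is one line of Python (a sketch; the function name is ours). For instance, starting from P = .5 with s = .7, two 'a' inputs raise the estimate to .85 and then .955, and a 'b' would drop it to .2865:

```python
# Exponential update of the estimate P(a), with speed of adaptation s.
def update(p_a, symbol, s):
    """Return the new estimate of P(a) after seeing one symbol."""
    return (1 - s) * p_a + (s if symbol == 'a' else 0)

p = 0.5                      # start with no preference
p = update(p, 'a', 0.7)      # -> 0.85
p = update(p, 'a', 0.7)      # -> 0.955
print(update(p, 'b', 0.7))   # a 'b' would drop the estimate to about .2865
```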

To model our data we will use an s value of .7:

symbol   P(a)      code length
a        .5        1
a        .85       .234465
a        .955      .066427
b        .2865     1.80339
b        .08595    3.48197
a        .725785   .462386
a        .917736   .123849
a        .975321   .036051
a        .992596   .010721
a        .997779   .003208

In this case, employing a high speed of adaptation was a great choice, as we arrived at a total code length of 7.222 bits, which is very close to our optimum of 7.219.

Rather than performing these calculations every time, we can create a state machine with these probabilities as states, rounding each value to the nearest state. Here is an example with a speed of adaptation, s, of .5: using our state machine, we arrive at a total code length of 5.49 bits.
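The quantized estimator can be sketched as "update, then round to the nearest state." The state set below is an assumption (the probability values that appear in the s = .5 table, plus .125); the text does not spell out the full set or its tie-breaking rule:

```python
# Assumed state set for the s = .5 quantized estimator.
STATES = [0.125, 0.25, 0.5, 0.75, 0.875, 0.9375, 0.96875]

def step(p_a, symbol, s=0.5):
    """Apply the adaptive update, then snap to the nearest state."""
    raw = (1 - s) * p_a + (s if symbol == 'a' else 0)
    return min(STATES, key=lambda q: abs(q - raw))

p = 0.5
for sym in 'aabb':
    p = step(p, sym)
    print(p)  # 0.75, 0.875, 0.5, 0.25
```

These first four states match the table's first transitions; with a finite state set, the whole estimator is again just a state machine.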

The reason for this low code length is that the speed of adaptation does not allow the 'b' inputs to reach low probabilities.

(Continuing the quasi-arithmetic walkthrough above: the next 'b' gives us the subinterval [.85,1). This is scaled up to [.7,1), again to [.4,1), and once more to [0,1). We then output '111' and move on to the next symbol.)

One method of probability estimation is to provide a global estimate of the probability of a particular symbol:

p_i = (number of occurrences of symbol i) / (total number of symbols)
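Applied to the first example's message, the global estimate recovers the frequencies quoted earlier (a sketch; the function name is ours):

```python
from collections import Counter

def global_estimates(message):
    """p_i = count of symbol i divided by the total number of symbols."""
    counts = Counter(message)
    total = len(message)
    return {sym: counts[sym] / total for sym in counts}

print(global_estimates('abbaacaabcbacdb'))
# a: .4, b: .333..., c: .2, d: .066...
```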

With s = .5, rounding to the nearest state, the symbols, probabilities, and code lengths are:

symbol   P(a)      code length
a        .5        1
a        .75       .415037
a        .875      .192645
b        .5        1
b        .25       2
a        .75       .415037
a        .875      .192645
a        .9375     .093109
a        .96875    .045804
a        .96875    .045804

