**Chordalysis-MML**

**It works!**

**Motivation**

We need to know why!

An information-theoretic scoring of decomposable models

Experiments: it works!

**Take-home message**

We have very effective approaches for making predictions from data

Spam detection

Credit card fraud detection

Weather prediction

Cancer prediction

How good are we at understanding the why of our prediction systems?

"We predicted you that you are going to get

cancer

in 4 years.

What you can do is...

"

Temperature on Earth is likely to increase by 3°C,

because...

Data will be even more valuable when we know

how to take actions

that will make a difference.

**A statistically efficient and scalable method**

for log-linear analysis of high-dimensional data

for log-linear analysis of high-dimensional data

Log-linear analysis for more than 2 variables

Closed form iff the graph is chordal (triangulated)

Chordalysis-MML

Chordalysis-MML

2. Efficient computation of sub-scores -

4. Details about chordal graphs:

Controlled environment

Real-world examples

We need a controlled environment...

... because in reality, we never have the "ground truth".

Chordalysis

Good?

Bad?

... hard to tell for sure...

Experimental methodology

1. choose a structure

2. parametrize the model

3. sample data

Chordalysis

4. learn structure

5. compare structure

6. output stats

Results

Study of the elderly

25 variables

15,000 patients

Customer management

85 variables

6,000 clients

Finance

500 variables

20 years of trading

Computation: 4s

...

31,000 images

300+ attributes

1,000 proteins

700+ attributes

10,000 users

70 attributes

**Chordalysis-MML:**

**1. can analyse high-dimensional data**

2. is parameter free

2. is parameter free

**3. is released on**

http://sourceforge.net/p/chordalysis/

**http://www.francois-petitjean.com**

**francois.petitjean@monash.edu**

**@LeDataMiner**

**Monash University - Melbourne**

...

#mistakes

**IEEE ICDM 2014**

...a very simple example

"Is

voting

for the presidential election independent of the being a

member of one or more associations

?"

Stats

Why log-linear analysis?

The null model

Vote

Member

How to compute E?

Expectation

The hypothesis

Vote

Member

Expectation

Let's do the maths

For multinomial sampling, the null model will be rejected when gets large.

... i.e. when

We have

we reject the null hypothesis

1. formulate hypotheses

Collect data

Vote

Member

?

2. Get data

3. Evaluation

Information theory

Vote

Member

Vote

Member

Models

Length

Model

Length

Data

How it

works

2 things have to be sent

1. What model it is?

2. The parameters of the model

There are 2 possible models

=>

1 bit

**François Petitjean, Lloyd Allison and Geoff Webb**

e.g.

Send denominator 1473 (all models will have it)

Send 987

Model => 1 bit

2 parameters

=> 18.9 bits

Total = 19.9 bits

Model => 1 bit

3 parameters

=> 29.8 bits

Total=30.9 bits

If you know that 987 people voted, how many bits does it take to reconstitute the data re

Voted

?

This is an ordering problem:

Positions {

1

,

2

,

4

,

5

,

8

,...}

1

2

4

5

8

There are

orderings

Sending any takes

Total = 2,742 bits

Total = 2,690 bits

Vote

Member

Total

2762

bits

2720

bits

more probable than

Vote

Member

Vote

Member

Disclaimer: it's actually a bit more complicated than that, see our paper - Section III.A

No

General framework: selecting a statistically significant log-linear model = superset of Markov Networks

generalizing

Decomposable models

If the data has more than

10

variables, the statistical approach to log-linear analysis

only works for the class of

decomposable models

See our

ICDM 2013 paper

Question addressed by this paper:

Can we design a Bayesian framework that keeps a very

low-rate of false discovery

and scale to datasets with

100+ variables

?

general form

Scoring

Scoring a decomposable model

All we need is...

The Graph

The parameters

The model

The data

given the model

1. sending : the number of edges

2. sending where the edges are

Encoding the structure of the graph

We only need to send the edges:

Note

: the number of nodes/variables is common to all models => no impact on the score => no need to take it into account

max number of edges

Encoding the parameters of the model

Score

decomposable

by cliques

multinomial distributions, e.g.

... and every parameter is a ratio

A clique C has

parameters.

Encoding the data given the model

Height

Weight

Gender

Height

Weight

Gender

tall

tall

small

med

...

normal

heavy

normal

light

...

F

M

M

F

...

Sending positions for:

- combinations of

height

and

weight

- combinations of

height

and

gender

Height

Weight

small

med

tall

light

norm.

heavy

a

b

c

d

e

f

g

h

i

Sending the positions for the clique

weight-height

costs:

N is the number of transactions/examples in the dataset

The positions for

height

would be sent twice!

minimal separators of the graph

Proved

What are the elements that are sent twice?

how?

Scoring models that have only 1 edge difference

Scoring can be done efficiently

function of only

4

cliques

1

2

3

4

parameters

data

Proved

Simple graph:

- 500 maximal cliques

- 500 min. separators

We only need to consider

4

of them

Efficient

Not in the talk...

What else will you find in the paper?

...but in the paper!

3. Memoization of partial results

- How to elect the edges that will keep the graph triangulated?

- How to compute maximal cliques and minimal separators?

1. Theory for scoring edges' addition

very

efficiently

42

bits difference

http://prezi.com/s94i5nsrdwdb/

Learning graphical models from data

1. Learn structure from data

2. Use the structure

Belief Propagation

Alice has difficulty walking

She had a

heart attack

10 years ago

She takes medication for her

high blood pressure

The doctor suspects Type 2 diabetes

p(Diabetes|...)=0.01

Female

60+

No

Yes

60-65kg

No

170+

Yes

Yes

Yes

Yes

Yes

Yes

No

No

Yes

No

Yes

?

?

Yes

No

No

?

Diabetes is not likely, the doctor decides to change her medication for high blood pressure

Studying correlations & independencies

In data:

The systolic blood pressure is independent of the weight, given

gender

diastolic blood pressure

medication for high blood pressure

A weight change is not likely to change the systolic blood pressure only

Information theory

F-measure

number of samples