Loading presentation...
Prezi is an interactive zooming presentation

Present Remotely

Send the link below via email or IM

Copy

Present to your audience

Start remote presentation

  • Invited audience members will follow you as you navigate and present
  • People invited to a presentation do not need a Prezi account
  • This link expires 10 minutes after you close the presentation
  • A maximum of 30 users can follow your presentation
  • Learn more about this feature in our knowledge base article

Do you really want to delete this prezi?

Neither you, nor the coeditors you shared it with will be able to recover it again.

DeleteCancel

Make your likes visible on Facebook?

Connect your Facebook account to Prezi and let your likes appear on your timeline.
You can change this under Settings & Account at any time.

No, thanks

ICDM 2014 - Chordalysis MML

Talk to the IEEE International Conference on Data Mining
by

Francois Petitjean

on 22 June 2015

Comments (0)

Please log in to add your comment.

Report abuse

Transcript of ICDM 2014 - Chordalysis MML

Chordalysis-MML
It works!
Motivation
We need to know why!
An information-theoretic scoring of decomposable models
Experiments: it works!
Take-home message
We have very effective approaches for making predictions from data
Spam detection
Credit card fraud detection
Weather prediction
Cancer prediction
How good are we at understanding the why of our prediction systems?
"We predicted you that you are going to get
cancer
in 4 years.
What you can do is...
"
Temperature on Earth is likely to increase by 3°C,
because...

Data will be even more valuable when we know
how to take actions
that will make a difference.
A statistically efficient and scalable method
for log-linear analysis of high-dimensional data

Log-linear analysis for more than 2 variables
Closed form iff the graph is chordal (triangulated)
Chordalysis-MML
Chordalysis-MML

2. Efficient computation of sub-scores -
4. Details about chordal graphs:
Controlled environment
Real-world examples
We need a controlled environment...
... because in reality, we never have the "ground truth".
Chordalysis
Good?
Bad?
... hard to tell for sure...
Experimental methodology
1. choose a structure
2. parametrize the model
3. sample data
Chordalysis
4. learn structure
5. compare structure
6. output stats
Results
Study of the elderly
25 variables
15,000 patients
Customer management
85 variables
6,000 clients
Finance
500 variables
20 years of trading
Computation: 4s
...
31,000 images
300+ attributes
1,000 proteins
700+ attributes
10,000 users
70 attributes
Chordalysis-MML:
1. can analyse high-dimensional data
2. is parameter free

3. is released on
http://sourceforge.net/p/chordalysis/
http://www.francois-petitjean.com
francois.petitjean@monash.edu
@LeDataMiner
Monash University - Melbourne
...
#mistakes
IEEE ICDM 2014
...a very simple example
"Is
voting
for the presidential election independent of the being a
member of one or more associations
?"
Stats
Why log-linear analysis?
The null model
Vote
Member
How to compute E?
Expectation
The hypothesis
Vote
Member
Expectation
Let's do the maths
For multinomial sampling, the null model will be rejected when gets large.
... i.e. when
We have
we reject the null hypothesis
1. formulate hypotheses
Collect data
Vote
Member
?
2. Get data
3. Evaluation
Information theory
Vote
Member
Vote
Member
Models
Length
Model
Length
Data
How it
works
2 things have to be sent
1. What model it is?
2. The parameters of the model
There are 2 possible models
=>
1 bit
François Petitjean, Lloyd Allison and Geoff Webb
e.g.
Send denominator 1473 (all models will have it)
Send 987
Model => 1 bit
2 parameters
=> 18.9 bits
Total = 19.9 bits
Model => 1 bit
3 parameters
=> 29.8 bits
Total=30.9 bits
If you know that 987 people voted, how many bits does it take to reconstitute the data re
Voted
?
This is an ordering problem:
Positions {
1
,
2
,
4
,
5
,
8
,...}
1
2
4
5
8
There are
orderings
Sending any takes
Total = 2,742 bits
Total = 2,690 bits
Vote
Member
Total
2762
bits
2720
bits
more probable than
Vote
Member
Vote
Member
Disclaimer: it's actually a bit more complicated than that, see our paper - Section III.A
No
General framework: selecting a statistically significant log-linear model = superset of Markov Networks
generalizing
Decomposable models
If the data has more than
10
variables, the statistical approach to log-linear analysis
only works for the class of
decomposable models
See our
ICDM 2013 paper
Question addressed by this paper:
Can we design a Bayesian framework that keeps a very
low-rate of false discovery
and scale to datasets with
100+ variables
?

general form
Scoring
Scoring a decomposable model
All we need is...
The Graph
The parameters
The model
The data
given the model
1. sending : the number of edges
2. sending where the edges are
Encoding the structure of the graph
We only need to send the edges:
Note
: the number of nodes/variables is common to all models => no impact on the score => no need to take it into account
max number of edges
Encoding the parameters of the model
Score
decomposable
by cliques
multinomial distributions, e.g.
... and every parameter is a ratio
A clique C has
parameters.
Encoding the data given the model
Height
Weight
Gender
Height
Weight
Gender
tall
tall
small
med
...
normal
heavy
normal
light
...
F
M
M
F
...
Sending positions for:
- combinations of
height
and
weight

- combinations of
height
and
gender

Height
Weight
small
med
tall
light
norm.
heavy
a
b
c
d
e
f
g
h
i
Sending the positions for the clique
weight-height
costs:
N is the number of transactions/examples in the dataset
The positions for
height
would be sent twice!
minimal separators of the graph
Proved
What are the elements that are sent twice?
how?
Scoring models that have only 1 edge difference
Scoring can be done efficiently
function of only
4
cliques
1
2
3
4
parameters
data
Proved
Simple graph:
- 500 maximal cliques
- 500 min. separators
We only need to consider
4
of them
Efficient
Not in the talk...
What else will you find in the paper?
...but in the paper!
3. Memoization of partial results
- How to elect the edges that will keep the graph triangulated?
- How to compute maximal cliques and minimal separators?
1. Theory for scoring edges' addition
very
efficiently
42
bits difference
http://prezi.com/s94i5nsrdwdb/
Learning graphical models from data
1. Learn structure from data
2. Use the structure
Belief Propagation
Alice has difficulty walking
She had a
heart attack
10 years ago
She takes medication for her
high blood pressure
The doctor suspects Type 2 diabetes
p(Diabetes|...)=0.01
Female
60+
No
Yes
60-65kg
No
170+
Yes
Yes
Yes
Yes
Yes
Yes
No
No
Yes
No
Yes
?
?
Yes
No
No
?
Diabetes is not likely, the doctor decides to change her medication for high blood pressure
Studying correlations & independencies
In data:
The systolic blood pressure is independent of the weight, given
gender
diastolic blood pressure
medication for high blood pressure
A weight change is not likely to change the systolic blood pressure only
Information theory
F-measure
number of samples
Full transcript