
Machine Learning Jun-30

by Marco Chierici on 30 June 2014


Transcript of Machine Learning Jun-30

Machine Learning Playground
Marco Chierici & Alessandro Zandonà
Advanced Machine Learning and case studies in metagenomics

FBK-Trento
WebValley Intl 2014

Metagenomics Pipeline @ MPBA/FBK
  • Samples: control, case
  • NGS platforms: Illumina, Roche 454, AB Solid
  • Microbial reads, phenotype
  • Preprocessing
  • Alignment to reference databases: NCBI, SILVA, HMP, GreenGenes
  • Quantification
  • Predictive profiling: differential analysis of abundance, predictive classification, network analysis (minepy, minerva), pathways enrichment
Recap: statistical machine learning
The art of developing algorithms and techniques that allow computers to learn, i.e. to gain understanding by constructing models of observed data through inference from samples, with the intention of using those models for prediction.
Performance evaluation
How can I assess the goodness of my prediction?

                       Prediction
                       Class -1          Class +1
Reality    Class -1    True Negative     False Positive
           Class +1    False Negative    True Positive

Class -1: healthy subjects
Class +1: patients
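A minimal Python sketch of how the four cells of the confusion matrix above can be counted, assuming labels are encoded as -1 (healthy) and +1 (patient) as on the slide. The function name `confusion_counts` is hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TN, FP, FN, TP) for labels -1 (healthy) and +1 (patient)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tn = fp = fn = tp = 0
    for real, pred in zip(y_true, y_pred):
        if real == -1 and pred == -1:
            tn += 1          # healthy, predicted healthy
        elif real == -1 and pred == +1:
            fp += 1          # healthy, predicted patient
        elif real == +1 and pred == -1:
            fn += 1          # patient, predicted healthy
        elif real == +1 and pred == +1:
            tp += 1          # patient, predicted patient
        else:
            raise ValueError("labels must be -1 or +1")
    return tn, fp, fn, tp


reality    = [-1, -1, -1, +1, +1, +1]
prediction = [-1, +1, -1, +1, -1, +1]
print(confusion_counts(reality, prediction))  # (2, 1, 1, 2)
```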
Performance metrics
How can I quantify the goodness of my prediction?

                       Prediction
                       Class -1    Class +1
Reality    Class -1    TN          FP
           Class +1    FN          TP
Sensitivity
Specificity
Accuracy
Matthews Correlation Coefficient (MCC)
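The four metrics listed above have standard definitions in terms of the confusion-matrix counts: Sensitivity = TP/(TP+FN), Specificity = TN/(TN+FP), Accuracy = (TP+TN)/(TP+TN+FP+FN), and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A sketch in Python, assuming both classes occur in reality and returning MCC = 0 when its denominator is zero (a common convention, not something the slides state). The function name is hypothetical.

```python
import math


def classification_metrics(tn, fp, fn, tp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts.

    Assumes TP + FN > 0 and TN + FP > 0 (both classes present in reality).
    MCC is set to 0.0 when its denominator is zero (convention).
    """
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # fraction of correct predictions
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts: TN=2, FP=1, FN=1, TP=2
for name, value in classification_metrics(2, 1, 1, 2).items():
    print(f"{name}: {value:.3f}")  # 0.667, 0.667, 0.667, 0.333
```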
MCC properties
Range: [-1, +1]
  • MCC = +1: perfect prediction
  • MCC = 0: average random prediction
  • MCC = -1: inverse prediction
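The three reference values of MCC can be checked numerically. This sketch assumes labels in {-1, +1}, sets MCC to 0 when it is undefined (a convention), and models "random prediction" as independent fair coin flips for both reality and prediction. All names are illustrative.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for labels -1/+1 (0.0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for r, p in pairs if r == +1 and p == +1)
    tn = sum(1 for r, p in pairs if r == -1 and p == -1)
    fp = sum(1 for r, p in pairs if r == -1 and p == +1)
    fn = sum(1 for r, p in pairs if r == +1 and p == -1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


reality = [-1, -1, -1, +1, +1, +1]
print(mcc(reality, reality))                 # 1.0  (perfect prediction)
print(mcc(reality, [-r for r in reality]))   # -1.0 (inverse prediction)

# Random guessing: the MCC averaged over many trials is close to 0
rng = random.Random(0)
scores = []
for _ in range(1000):
    y = [rng.choice((-1, +1)) for _ in range(100)]
    guess = [rng.choice((-1, +1)) for _ in range(100)]
    scores.append(mcc(y, guess))
print(round(sum(scores) / len(scores), 2))
```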
Predictive classification
k-fold cross-validation
Split the dataset into k smaller partitions (folds), then use:
  • k-1 folds as training set, to learn classification rules
  • 1 fold as test set, to evaluate classification performance
Repeat k times, so that each fold is used exactly once as the test set.
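The k-fold procedure above can be sketched in plain Python. The classifier is a hypothetical stand-in (nearest centroid: one mean vector per class), chosen only to keep the example short. It is not the model used in the case studies. Folds are built by shuffling the sample indices with a fixed seed and dealing them round-robin into k partitions.

```python
import random


def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k roughly equal partitions
    for i in range(k):
        test_idx = folds[i]                         # 1 fold to evaluate
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]  # k-1 folds to learn
        yield train_idx, test_idx


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in classifier: one mean vector per class (-1, +1)."""
    centroids = {}
    for label in (-1, +1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def cross_validate(X, y, k=5, seed=0):
    """Return the test-fold accuracy of each of the k rounds."""
    accuracies = []
    for train_idx, test_idx in kfold_indices(len(X), k, seed):
        model = fit_nearest_centroid([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Toy data: two Gaussian clouds in 2D (class -1 around 0, class +1 around 3)
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)])
y = [-1] * 20 + [+1] * 20
accs = cross_validate(X, y, k=5)
print([round(a, 2) for a in accs], sum(accs) / len(accs))
```

Reporting the mean over the k test folds gives one performance estimate in which no sample is ever scored by a model that saw it during training.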
Case studies: [Nature, 2013], [Nature, 2012], [Nature, 2012]
Gut metagenome in European women with normal, impaired and diabetic glucose control [Karlsson et al., Nature, 2013]
DATASET
Gut microbiota from 145 European women:
  • 43 with normal glucose tolerance (NGT)
  • 49 with impaired glucose tolerance (IGT)
  • 53 with type 2 diabetes (T2D)
DNA from faecal samples (WGS), Illumina HiSeq 2000, paired-end reads, 300 bp
Purposes
  • Predictive biomarkers for T2D
  • Compositional and functional alterations in the metagenomes of women with T2D
Analysis: data quality control, alignment, gene calling, metagenomic clusters (MGCs), functional analysis, statistical analysis
Results
Figure 2 | Associations of MGCs with clinical biomarkers.
Figure 3 | Classification of T2D status by abundance of species and MGCs.
b, The 30 most discriminant species in the model using 915 species and discriminating between NGT and T2D women.
c, The 30 most discriminant MGCs in the model using all 800 MGCs and discriminating between NGT and T2D women. The bar lengths in b and c indicate the importance of the variable, and colours represent enrichment in T2D (red shades) or NGT (blue shades).
Figure 4 | Stratification of IGT women based on gut microbiota profiles.
A metagenome-wide association study of gut microbiota in type 2 diabetes
DATASET
Gut microbiota of 368 Chinese individuals
DNA from faecal samples (WGS), Illumina GAIIx and Illumina HiSeq 2000, paired-end (PE) reads, 350 bp
Purposes
  • Characterize gut microbial community composition in T2D subjects
  • Definition of a T2D index
Analysis: alignment, gene calling, metagenomic linkage groups (MLGs), biodiversity, functional analysis, statistical analysis
Results
Figure 2 | Taxonomic and functional characterization of gut microbiota in T2D.
A co-occurrence network was deduced from 47 MLGs. The size of the nodes indicates gene number within the MLG. The colour of the nodes indicates their taxonomic assignment.
Figure 3 | Gut microbiota of T2D patients show a moderate degree of dysbiosis.
Figure 4 | A trial classification of T2D using gut microbial gene markers.
Human gut microbiome viewed across age and geography
DATASET
Gut microbiota from 531 individuals:
  • 316 from the US
  • 100 from Venezuela
  • 115 from Malawi
DNA from faecal samples, 16S and WGS
Purposes
Evaluate taxonomic and functional variability in the gut microbiome with respect to different factors:
  • age
  • geographic location
  • kinship
Analysis: reads filtering (WGS), alignment, biodiversity, functional analysis, statistical analysis
Results
Figure 1 | Differences in the fecal microbial communities of Malawians, Amerindians and US children and adults.
Figure 4 | Differences in the fecal microbiota between family members across the three populations studied.
Testing on the same data used for training: OVERFITTING
Using the test data twice, both to select features and to evaluate the model: SELECTION BIAS
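Selection bias can be shown with a small simulation. This is a sketch under stated assumptions: all 2000 features are pure Gaussian noise unrelated to the labels. Features are ranked by the absolute difference of class means, and a hypothetical nearest-centroid classifier is evaluated with 5-fold cross-validation. In the biased variant, features are ranked once on all samples, so the test folds leak into feature selection. In the honest variant, features are re-ranked on the k-1 training folds of each round.

```python
import random


def top_features(X, y, n_keep):
    """Return indices of the n_keep features with the largest |mean(+1) - mean(-1)|."""
    pos = [x for x, lab in zip(X, y) if lab == +1]
    neg = [x for x, lab in zip(X, y) if lab == -1]

    def score(j):
        return abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))

    return sorted(range(len(X[0])), key=score, reverse=True)[:n_keep]


def centroid_accuracy(X_tr, y_tr, X_te, y_te, feats):
    """Hypothetical nearest-centroid classifier restricted to the selected features."""
    cents = {}
    for label in (-1, +1):
        rows = [x for x, lab in zip(X_tr, y_tr) if lab == label]
        cents[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]

    def predict(x):
        return min(cents, key=lambda lab: sum((x[j] - c) ** 2 for j, c in zip(feats, cents[lab])))

    return sum(predict(x) == lab for x, lab in zip(X_te, y_te)) / len(y_te)


def cv_accuracy(X, y, k, n_keep, select_inside, seed=0):
    """Mean k-fold accuracy, with feature selection inside or outside the folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    # WRONG when select_inside is False: features ranked once on all samples,
    # including those later used as test data (selection bias)
    global_feats = None if select_inside else top_features(X, y, n_keep)
    accs = []
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        X_tr, y_tr = [X[j] for j in train], [y[j] for j in train]
        # RIGHT: rank features on the k-1 training folds only
        feats = top_features(X_tr, y_tr, n_keep) if select_inside else global_feats
        accs.append(centroid_accuracy(X_tr, y_tr, [X[j] for j in test],
                                      [y[j] for j in test], feats))
    return sum(accs) / k


rng = random.Random(42)
n_samples, n_features = 40, 2000
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [-1] * 20 + [+1] * 20   # labels carry no information about X

biased = cv_accuracy(X, y, k=5, n_keep=10, select_inside=False)
honest = cv_accuracy(X, y, k=5, n_keep=10, select_inside=True)
print(f"selection on all samples: {biased:.2f}  selection inside folds: {honest:.2f}")
```

Because the data contain no signal, the honest estimate should stay near chance (about 0.5). Ranking features on all samples typically reports a much higher accuracy, even though both variants use the same k-fold split.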