Learning Latent Factor Models & Data Fusion

An invited talk at the Workshop on Matrix Computations in Biomedical Informatics, Pavia, June 20, 2015
by Blaz Zupan on 7 July 2015

Transcript of Learning Latent Factor Models & Data Fusion

"meta" movies
"meta" users
Matrix tri-factorization

Predictions
Recommender Systems
Data
Factorization
Predictions
All together now: data fusion by collective matrix tri-factorization
Thanks!
Waleed Nasser
Edward Nam
Chris Dinh
Adam Kuspa
Gad Shaulsky
Rafael Rosengarten
Mariko Katoh-Kurasawa
Balaji Santhanam
Computer Science @ University of Ljubljana (MŽ, BZ)
Genetics @ Baylor College of Medicine (BZ)
Adam Kuspa
Gad Shaulsky
Dictyostelium discoideum
Dicty is a bacterial predator!
genetic screens
Dicty genes for bacterial response?
Gram+ defective: swp1, gpi, nagB1
Gram- defective: clkB, spc3, alyL, nip7
genome: 12,000 genes
found: 7 genes
workload: 5 years
estimated: ~200 genes
Now what?!
Nasser et al., Current Biology 23(10), 2013.
More genetic screens?
50% coverage (100 genes) -> 20 screens required!
80% coverage (160 genes) -> 65 screens required!
Data-driven approach
genes
mutant phenotypes
timepoints
phenotype data
expression data
phenotype ontology
publications
PubMed data
MeSH terms
MeSH annotation
MeSH ontology
Data Integration
Movies
Users
"meta" -> original space recipes
Dicty bacterial response
14 data sources
4 Gram- seed genes
ranked 12,000 genes
9 candidates
P(X>=8) = 10^-13
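The slide does not say how this probability was computed. One plausible reading is a hypergeometric tail: draw 9 candidates at random from the 12,000-gene genome and ask how likely it is that at least 8 of them are true bacterial-response genes. The sketch below assumes that null model and assumes the talk's estimate of roughly 200 such genes as the number of "hits" in the genome; both are assumptions, not details given in the talk.

```python
from math import comb

def hypergeom_tail(N, K, n, k_min):
    """P(X >= k_min) when n items are drawn without replacement
    from a population of N items that contains K hits."""
    total = comb(N, n)
    favourable = sum(
        comb(K, k) * comb(N - K, n - k)
        for k in range(k_min, min(n, K) + 1)
    )
    return favourable / total

# Assumed parameters: 12,000 genes, ~200 true response genes,
# 9 candidates tested, at least 8 confirmed.
p = hypergeom_tail(N=12_000, K=200, n=9, k_min=8)
print(f"P(X >= 8) = {p:.1e}")
```

Under these assumed parameters the result is a few times 10^-14, which is within an order of magnitude of the figure on the slide.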
Drug-induced liver injury prediction
Fusion of 29 data sources substantially improved accuracy.
Zitnik & Zupan, CAMDA 2013, Berlin (best paper award)
Discovery of disease-disease associations
Fusion of 11 data sources, systems-level molecular data.

Proposed 14 new associations not present in Disease Ontology. Confirmed in literature.

Large-scale data fusion for reclassification of diseases.
Zitnik et al. (2013) Scientific Reports.
Learning Latent Factor Models by
Funding: NIH, ARRS, EU FP7, Fulbright
50,000 clonal mutants
Chisholm & Firtel (2004) Nat Rev Mol Cell Biol
8 predictions correct
Collective Matrix Factorization
Marinka Žitnik, Blaž Zupan
Winners of $1M Netflix Prize, 2009
previous attempts + standard ML
data fusion
Survival regression by data fusion
Fusion of 11 data sets yields substantial improvements in accuracy.
Zitnik & Zupan, CAMDA 2014, Boston (best paper award)
data fusion
Data fusion by collective matrix factorization
increases accuracy of predictions
data is left in its original space
latent spaces and profiling
What next?
data set scoring and selection
the role of data structure
network inference by data fusion
Toolboxes
Python-based library
Data fusion in Orange
An alternative view
Two data sets
Factorized System
A larger data fusion graph ...
... and its factorization
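To make the idea of factorizing a fusion graph concrete, here is a minimal sketch with three object types and two relation matrices that share the first type. Each relation is tri-factorized, and the factor of the shared type is reused in both, which is how information passes between data sets. The update rules are plain non-negative multiplicative updates chosen for brevity; they are not the algorithm from the talk or from Zitnik & Zupan (IEEE Trans PAMI 2015), and all names (`R12`, `R13`, `collective_tri_factorize`) are hypothetical.

```python
import numpy as np

def collective_tri_factorize(R12, R13, k1, k2, k3, n_iter=300, seed=0, eps=1e-9):
    """Jointly fit R12 ~ G1 @ S12 @ G2.T and R13 ~ G1 @ S13 @ G3.T,
    with the factor G1 of the shared object type used in both."""
    assert R12.shape[0] == R13.shape[0], "relations must share their row objects"
    rng = np.random.default_rng(seed)
    G1 = rng.random((R12.shape[0], k1))
    G2 = rng.random((R12.shape[1], k2))
    G3 = rng.random((R13.shape[1], k3))
    S12 = rng.random((k1, k2))
    S13 = rng.random((k1, k3))
    losses = []
    for _ in range(n_iter):
        # The shared factor collects evidence from both relations.
        num = R12 @ G2 @ S12.T + R13 @ G3 @ S13.T
        den = G1 @ (S12 @ G2.T @ G2 @ S12.T + S13 @ G3.T @ G3 @ S13.T) + eps
        G1 *= num / den
        # The remaining factors each depend on a single relation.
        G2 *= (R12.T @ G1 @ S12) / (G2 @ S12.T @ G1.T @ G1 @ S12 + eps)
        G3 *= (R13.T @ G1 @ S13) / (G3 @ S13.T @ G1.T @ G1 @ S13 + eps)
        S12 *= (G1.T @ R12 @ G2) / (G1.T @ G1 @ S12 @ G2.T @ G2 + eps)
        S13 *= (G1.T @ R13 @ G3) / (G1.T @ G1 @ S13 @ G3.T @ G3 + eps)
        losses.append(
            np.linalg.norm(R12 - G1 @ S12 @ G2.T) ** 2
            + np.linalg.norm(R13 - G1 @ S13 @ G3.T) ** 2
        )
    return (G1, G2, G3), (S12, S13), losses

# Toy data: 30 "genes" related to 12 "phenotypes" and to 8 "MeSH terms".
rng = np.random.default_rng(1)
genes = rng.random((30, 3))
R12 = genes @ rng.random((3, 12))
R13 = genes @ rng.random((3, 8))
(G1, G2, G3), (S12, S13), losses = collective_tri_factorize(R12, R13, 3, 3, 3)
print(losses[0], losses[-1])
```

The summed reconstruction error over both relations should fall as the iterations proceed. A real fusion graph would have many more object types and relations, and possibly constraint matrices, none of which are modeled here.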
Latent-space profiling by chaining
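One way to read chaining is as multiplication of latent matrices along a path in the fusion graph, so that genes receive a profile in the latent space of a more distant object type. The sketch below assumes that reading, and further assumes that candidates are ranked by cosine similarity to the average seed-gene profile. The matrices are made-up toy values and the function names are hypothetical.

```python
import numpy as np

def chain_profiles(G_start, backbone_chain):
    """Multiply a recipe matrix by the backbone matrices along a path."""
    profiles = G_start
    for S in backbone_chain:
        profiles = profiles @ S
    return profiles

def rank_by_seed_similarity(profiles, seed_idx):
    """Rank rows by cosine similarity to the mean normalized seed profile."""
    norms = np.linalg.norm(profiles, axis=1, keepdims=True) + 1e-12
    unit = profiles / norms
    seed_profile = unit[seed_idx].mean(axis=0)
    scores = unit @ seed_profile
    return np.argsort(-scores), scores

# Toy latent matrices: 5 genes, a path genes -> publications -> MeSH terms.
G_genes = np.array([[1.0, 0.0],
                    [1.0, 0.0],
                    [0.9, 0.1],
                    [0.0, 1.0],
                    [0.1, 0.9]])
S_gene_pub = np.array([[1.0, 0.0, 0.5],
                       [0.0, 1.0, 0.5]])
S_pub_mesh = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.5, 0.5]])

seeds = [0, 1]
profiles = chain_profiles(G_genes, [S_gene_pub, S_pub_mesh])
order, scores = rank_by_seed_similarity(profiles, seeds)
candidates = [int(g) for g in order if int(g) not in seeds]
print(candidates)
```

Gene 2, whose latent profile is closest to the seeds, comes first among the non-seed candidates, followed by gene 4 and then gene 3.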
GO term / pharma action prediction
GeneMANIA
gene-based networks for each data set
Kernel-based approaches
gene-centric kernels for each data source
Zitnik & Zupan. IEEE Trans PAMI 2015
Mostafavi & Morris. Genome Biol 2008
Yu et al. BMC Bioinform 2010
Genes and movies?
It's all the same! "Gene-recommender system."
Mostafavi & Morris, Proteomics 2012.
Ok. We are ranking. "Gene prioritization."
Moreau & Tranchevent, Nat Rev Genet 2012.
Matrix tri-factorization of a single data set
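As a rough illustration of tri-factorizing a single data set, the sketch below approximates a non-negative matrix R as G1 S G2^T. G1 maps row objects (say, users) to a few "meta" users, G2 maps column objects (say, movies) to a few "meta" movies, and S relates the two groups. It uses generic multiplicative updates and treats R as fully observed. Both are simplifying assumptions, not the method presented in the talk, and `tri_factorize` is a hypothetical name.

```python
import numpy as np

def tri_factorize(R, k1, k2, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix R (n x m) as G1 @ S @ G2.T."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    G1 = rng.random((n, k1))   # row objects -> k1 latent groups
    G2 = rng.random((m, k2))   # column objects -> k2 latent groups
    S = rng.random((k1, k2))   # interactions between latent groups
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every factor non-negative.
        G1 *= (R @ G2 @ S.T) / (G1 @ S @ G2.T @ G2 @ S.T + eps)
        G2 *= (R.T @ G1 @ S) / (G2 @ S.T @ G1.T @ G1 @ S + eps)
        S *= (G1.T @ R @ G2) / (G1.T @ G1 @ S @ G2.T @ G2 + eps)
        errors.append(np.linalg.norm(R - G1 @ S @ G2.T))
    return G1, S, G2, errors

# Toy "ratings" matrix with planted low-rank structure.
rng = np.random.default_rng(1)
R = rng.random((20, 3)) @ rng.random((3, 2)) @ rng.random((2, 15))
G1, S, G2, errors = tri_factorize(R, k1=3, k2=2)
R_hat = G1 @ S @ G2.T   # reconstructed matrix, back in the original space
print(errors[0], errors[-1])
```

Entries of `R_hat` play the role of predictions. A recommender would additionally have to mask unobserved ratings, which this sketch does not do.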
CAMDA = Annual International Conference on Critical Assessment of Massive Data Analysis
Collaboration with Nataša Pržulj, Imperial College London.
Wang et al. Res Comput Molecular Biol 2012.
Data Fusion
Case Studies

What's new?

Data
Model
Algorithm
Collective latent factor model
Computational guarantees
Transfer of knowledge
achieves data fusion
Zitnik & Zupan. IEEE Trans PAMI 2015
Latent chaining in the "Twitter space"
Zitnik & Zupan, Bioinformatics, ISMB 2015 (to appear)
Collectively inferred networks are functionally richer than separately inferred networks
Reuse of latent factors
Gene neighborhood
Gene-wise parameters
Gene interactions