Topological data analysis on breast cancer data set
Transcript of Topological data analysis on breast cancer data set
Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival
Monica Nicolau, Arnold J. Levine, and Gunnar Carlsson
Presentation by Martin Perez-Guevara

Methods:
- Progression Analysis of Disease (PAD)
- Disease-specific genomic analysis (DSGA)
- Topological method for data set analysis (Mapper)

Outline: motivation, why employ topology, subgroup identified, comparison with other methods, limitations, conclusions

Motivation:
- High-throughput biological data: sequencing, transcriptional microarrays, proteomics (microarray data in our case)
- High-dimensional aspects
- Mathematical difficulties

Subgroup identified: c-MYB+ breast cancer

Comparison with other methods (PAD vs. clustering analysis):
- Data patterns invisible to cluster methods
- Mathematically less visible groups (HER2+ tumours)

Limitations:
- Selection of the Mapper filter function
- Selection of the clustering algorithm (diminished by resolution analysis)

Conclusions: topological methods can give insight into data patterns that are meaningful both from the data and from a real (biological) point of view.

Why employ topology:
- Generalities of shape vs. a specific metric on the data
- Exploration of multiple aspects of the data
- Depth of information (not just clusters)
- Can lead to visual results

Mapper pipeline:
- Data, with a distance function
- Filter function (mapping) to a low dimension, d << D
- Low-dimensional mapping
- Intervals and overlap on the mapping
- Go back to the data: map intervals to data subsets
- Clustering algorithm: clusters on each data subset
- Intersections of clusters (from different intervals/data subsets)
- Build the simplicial complex
- Color points, connections

EXAMPLE:
- Data: Dc.T of DSGA, with Euclidean distance
- Low-dimensional mapping
- Intervals and overlap on the mapping
- Go back to the data: map intervals to data subsets
- Single-linkage clustering: clusters on each data subset
- Intersections of clusters (from different intervals/data subsets)
- Build the simplicial complex
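The pipeline steps above can be sketched in a few dozen lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`lp_filter`, `interval_cover`, `single_linkage_cut`, `mapper`) are hypothetical, single-linkage clustering is cut at a fixed distance `eps` instead of a data-driven cut, and only nodes and edges (the 1-skeleton of the simplicial complex) are built. The filter `lp_filter` assumes that the "k = 4, p = 2" listed in the slides refers to an L^p norm raised to the k-th power.

```python
import math
from itertools import combinations


def lp_filter(v, p=2, k=4):
    """Assumed filter: the L^p norm of v raised to the k-th power."""
    return sum(abs(x) ** p for x in v) ** (k / p)


def interval_cover(lo, hi, n_intervals, overlap):
    """Equal-length intervals covering [lo, hi]; neighbours share
    the fraction `overlap` (0 <= overlap < 1) of their length."""
    length = (hi - lo) / (1 + (n_intervals - 1) * (1 - overlap))
    step = length * (1 - overlap)
    cover = [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]
    cover[-1] = (cover[-1][0], hi)  # guard against floating-point shortfall
    return cover


def single_linkage_cut(points, idx, eps):
    """Single-linkage clusters at height eps: connected components of the
    graph joining points whose Euclidean distance is at most eps."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(idx, 2):
        if math.dist(points[a], points[b]) <= eps:
            parent[find(a)] = find(b)

    clusters = {}
    for i in idx:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


def mapper(points, filter_fn, n_intervals, overlap, eps):
    """Return (nodes, edges): nodes are sets of point indices, and an edge
    joins two nodes whenever their point sets intersect."""
    values = [filter_fn(p) for p in points]
    nodes = []
    for a, b in interval_cover(min(values), max(values), n_intervals, overlap):
        # Go back to the data: points whose filter value lies in this interval.
        idx = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage_cut(points, idx, eps))
    edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]]
    return nodes, edges


if __name__ == "__main__":
    # Toy data: ten points on a line, filtered by their x-coordinate.
    pts = [(float(i), 0.0) for i in range(10)]
    nodes, edges = mapper(pts, lambda q: q[0], n_intervals=3, overlap=0.5, eps=1.5)
    print(nodes)
    print(edges)
```

For the example in the slides one would pass the Dc.T vectors as `points`, `lp_filter` as `filter_fn`, and `n_intervals=15`, `overlap=0.8`; the value of `eps` is not given in the slides and would have to be chosen.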
DSGA:
- Normal tissue samples (R of them)
- Diseased tissue samples (S of them)
- Flat vectors of N
- PCA with L << R components
- SVD (Wold, 1978)
- Least-squares fit
- Residuals (the disease component, Dc.T)

100% survival and no metastasis
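The DSGA steps listed above (a low-dimensional model of normal tissue, a least-squares fit of each tumor onto it, and the residual as the disease component) can be sketched with NumPy. This is a simplified illustration, not the published procedure: the function name `dsga_decompose` is hypothetical, the flat-vector construction is skipped, and the number of components L is passed in by hand instead of being estimated.

```python
import numpy as np


def dsga_decompose(normal, tumors, n_components):
    """Split each tumor vector into a normal component and a disease component.

    normal : array of shape (genes, R), one column per normal tissue sample
    tumors : array of shape (genes, S), one column per diseased tissue sample
    n_components : L, the dimension of the normal-tissue model (L << R)
    """
    # Orthonormal basis of the normal-tissue model: top L left singular vectors.
    u, _, _ = np.linalg.svd(normal, full_matrices=False)
    basis = u[:, :n_components]
    # Because the basis is orthonormal, the least-squares fit of each tumor
    # onto its span is an orthogonal projection.
    normal_component = basis @ (basis.T @ tumors)
    # The residual of the fit is the disease component (Dc.T).
    disease_component = tumors - normal_component
    return normal_component, disease_component


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.normal(size=(50, 8))   # synthetic data, for illustration only
    tumors = rng.normal(size=(50, 5))
    nc, dc = dsga_decompose(normal, tumors, n_components=3)
    print(nc.shape, dc.shape)
```

The disease components returned here are what the Mapper example takes as input data.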
Significance Analysis of Microarrays (SAM)
ER: estrogen receptor

Parameters:
- Intervals: 15
- Overlap: 80%
- k = 4, p = 2

Data: 295 tumors, 262 genes
Gene thresholding: 12237 genes reduced to 262
Also consider a soft threshold: genes with high correlation (r > 0.6) to at least 3 stringent-threshold genes

Further References
- Mapper: Gurjeet Singh, Facundo Mémoli and Gunnar Carlsson, "Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition"
- DSGA: Monica Nicolau, Robert Tibshirani, Anne-Lise Børresen-Dale and Stefanie S. Jeffrey, "Disease-specific genomic analysis: identifying the signature of pathologic biology"
- Single-linkage clustering: Stephen C. Johnson, "Hierarchical Clustering Schemes"