Internship-Intelligent network layout
Transcript of Internship-Intelligent network layout
driven by the data
Computational Systems Biology for Cancer
layout - visualization of the network
Cytoscape 3.0 plug-in
network structure + high-throughput
data driven network layout
accessible and useful tool for biological network analysis
structure based layouts
Cytoscape 3.0 plugin
Data Driven Network Layout
Pure structure based layout
Pure Data Driven Layout
expression data (-omic):
Statistical methods for meaningful visualization and pattern identification
Principal Component Analysis
nonlinear Principal Manifolds*
*1. Gorban A, Kegl B, Wunsch D, Zinovyev A. (eds.) Principal Manifolds for Data Visualisation and Dimension Reduction. 2008. Lecture Notes in Computational Science and Engineering 58, p.340.
2. Gorban A.N., Zinovyev A. 2010. Principal manifolds and graphs in practice: from molecular biology to dynamical systems. Int J Neural Syst 20(3):219-32.
orthogonal linear transformation
data is projected into a new coordinate system
where axes are principal components (PC)
first PC has the largest possible variance, second: the second largest variance etc.
individual with the same profiles will be placed nearby in the 2D space
a) Configuration of nodes and 2D Principal Surface in the 3D PCA linear manifold.
The dataset is curved and cannot be mapped adequately on a 2D principal plane;
b) The distribution in the internal 2D non-linear principal surface coordinates, together with an estimation of the density of points;
c) The same as b), but for the linear 2D PCA manifold.
Linear PCA versus nonlinear Principal Manifolds for visualization of breast cancer microarray data
elastic maps algorithm
principal manifolds approximation
Data Driven network Layout
so far in Cytoscape we have...
no freely accessible data-driven layout!
now, there is DeDaL
Transformation between two layouts
network (structure based layout)
Data Analysis (PCA)
purely Data Driven network Layout
purely structure based
purely data driven
mixed data driven
minimization of Euclidean distance
it works with any two layouts!
it works with any layouts
you can align more than two layouts at the same time
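The transformation between two layouts can be sketched roughly as follows. This is an illustration only, not DeDaL's actual code: the function names `align_layout` and `morph_layouts` are hypothetical, and the slides do not say which transformations are allowed during alignment. The sketch assumes translation plus rotation/reflection (an orthogonal Procrustes fit) to minimize the summed squared Euclidean distance between matching nodes, followed by linear interpolation between the two layouts.

```python
import numpy as np

def align_layout(source, target):
    """Rotate/reflect and translate `source` (n x 2) onto `target` (n x 2).

    Assumption: alignment = orthogonal Procrustes, which minimizes the sum of
    squared Euclidean distances between matching nodes.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_center = source.mean(axis=0)
    tgt_center = target.mean(axis=0)
    a = source - src_center
    b = target - tgt_center
    # The best orthogonal matrix R for min ||a @ R - b|| is U @ Vt,
    # where U, S, Vt is the SVD of a.T @ b.
    u, _, vt = np.linalg.svd(a.T @ b)
    return a @ (u @ vt) + tgt_center

def morph_layouts(layout_a, layout_b, t):
    """Interpolate node positions: t=0 gives layout_a, t=1 gives layout_b."""
    layout_a = np.asarray(layout_a, dtype=float)
    layout_b = np.asarray(layout_b, dtype=float)
    return (1.0 - t) * layout_a + t * layout_b
```

With a structure-based layout as `layout_a` and an aligned data-driven layout as `layout_b`, intermediate values of `t` would give the "mixed" layouts; aligning several layouts at once could be done by aligning each of them to a common reference.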
Follow up of the project
Acknowledgments: Zinovyev Andrei, Calzone Laurence, Bonnet Eric, Viara Eric, Martignetti Loredana and all Inserm U900
Diseases, like cancer, are related to dysregulation of molecular interactions in large molecular networks
protein 1 ppi protein 2
Providing a meaningful representation of the knowledge of molecular interaction is not trivial
Large amounts of high-throughput data, whose analysis is also problematic
1. Multidimensional data
cell lines/ tumor type/time points
2. Center the matrix by gene
3. Compute a covariance matrix
4. Compute eigenvectors and eigenvalues of covariance matrix
The eigenvector with the highest eigenvalue becomes PC1, the eigenvector with the second highest eigenvalue becomes PC2, etc.
5. Deriving a new dataset (projection into a new coordinate system)
We decide to keep only 2 PC (for 2D representation)
L: number of PCs kept, 1 <= L <= m
Y: new dataset; E: eigenvector matrix; X: initial centered dataset
Xc: centered matrix
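The PCA steps listed above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the plug-in's implementation: the function name `pca_layout` is hypothetical, rows are taken to be genes (network nodes) and columns to be samples, and "center by gene" is read as subtracting each gene's mean across samples.

```python
import numpy as np

def pca_layout(X, n_components=2):
    """Project a genes x samples matrix onto its first principal components.

    Returns (Y, eigenvalues): Y holds one row of coordinates per gene,
    eigenvalues are those of the kept components, largest first.
    """
    X = np.asarray(X, dtype=float)
    # Step 2: center by gene (assumption: subtract each row's mean).
    Xc = X - X.mean(axis=1, keepdims=True)
    # Step 3: covariance matrix between the m sample dimensions (m x m).
    cov = np.cov(Xc, rowvar=False)
    # Step 4: eigen-decomposition; eigh suits symmetric matrices and
    # returns eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 5: keep L = n_components eigenvectors and project.
    E = eigvecs[:, :n_components]
    Y = Xc @ E
    return Y, eigvals[:n_components]
```

With `n_components=2`, each row of `Y` can be used directly as the (x, y) position of the corresponding node, so genes with similar profiles end up close together.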
or Simple Data-Driven Layout
Data and network
Data: A549 epithelial cells treated for up to 72 hours with TGF-beta to induce epithelial-mesenchymal transition (EMT).
Human Genome U133 Plus 2.0
Divided into two conditions
early time (0-16h)
late time (24-72h)
Network: 56 genes with highest variance retrieved from Human Protein Reference Database
are detected according to the formula
m: mean pairwise distance
std: standard deviation of the pairwise distance
p: parameter (= 1.5)
such a node is placed on the same line at the distance 2*std from the center
noise is introduced
if edge < 1
r=random( -3 : 3 )
Node (x,y) = Node (x+r, y+r)
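The noise rule above is pseudocode, and a runnable reading of it might look like the sketch below. Several points are assumptions: "edge < 1" is taken to mean an edge shorter than 1 in layout units, `random(-3 : 3)` is read as a uniform real number in [-3, 3], only the second endpoint of a short edge is moved, and the name `jitter_short_edges` is hypothetical.

```python
import math
import random

def jitter_short_edges(pos, edges, min_length=1.0, spread=3.0, rng=None):
    """Return a copy of `pos` where nodes on too-short edges are nudged apart.

    pos:   dict mapping node -> (x, y)
    edges: iterable of (node_u, node_v) pairs
    """
    rng = rng or random.Random()
    new_pos = dict(pos)
    for u, v in edges:
        (x1, y1), (x2, y2) = new_pos[u], new_pos[v]
        if math.hypot(x2 - x1, y2 - y1) < min_length:
            # Same offset r on both axes, as in the slide's pseudocode.
            r = rng.uniform(-spread, spread)
            new_pos[v] = (x2 + r, y2 + r)
    return new_pos
```

Passing a seeded `random.Random` as `rng` makes the resulting layout reproducible.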
Nodes without values
Ignored by the PCA/PM algorithms and placed at the mean of their neighbor nodes' coordinates
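Placing nodes without values could be sketched as below. The function name `place_missing_nodes` and the data structures are hypothetical, and one choice is an assumption not covered by the slides: a node with no already-positioned neighbor is simply left unplaced.

```python
def place_missing_nodes(pos, neighbors, missing):
    """Position nodes that have no data at the mean of their neighbors.

    pos:       dict node -> (x, y) for nodes laid out from the data
    neighbors: dict node -> list of adjacent nodes in the network
    missing:   iterable of nodes that had no values
    """
    placed = dict(pos)
    for node in missing:
        # Only neighbors that received coordinates from the data are used.
        pts = [pos[n] for n in neighbors.get(node, ()) if n in pos]
        if pts:
            mean_x = sum(x for x, _ in pts) / len(pts)
            mean_y = sum(y for _, y in pts) / len(pts)
            placed[node] = (mean_x, mean_y)
    return placed
```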