driven by the data

How?

Java programming

DeDaL

Interaction network

**DeDaL**

**Computational Systems Biology for Cancer**

**Urszula Czerwińska**

ulcia.liberte@gmail.com

layout - visualization of the network

Cytoscape 3.0 plug-in

**Conclusions**

network structure + high-throughput data

=

data driven network layout

accessible and useful tool for biological network analysis

**Thank You!**

grid

hierarchical

force directed

structure based layouts

**U900**

Cytoscape 3.0 plugin

Data Driven Network Layout

Pure structure based layout

Mixed Layout

Pure Data Driven Layout

High-throughput data

Protein/gene

expression data (-omic):

RNAseq

microArray

LC-MS/MS

....

Big data!

Multidimensional

Statistical methods for meaningful visualization and pattern identification

PCA

Principal Component Analysis

PMs

nonlinear Principal Manifolds*

*1. Gorban A, Kegl B, Wunsch D, Zinovyev A. (eds.) Principal Manifolds for Data Visualisation and Dimension Reduction. 2008. Lecture Notes in Computational Science and Engineering 58, p.340.

2. Gorban A.N., Zinovyev A. 2010. Principal manifolds and graphs in practice: from molecular biology to dynamical systems. Int J Neural Syst 20(3):219-32.

orthogonal linear transformation

data is projected into a new coordinate system

where axes are principal components (PC)

first PC has the largest possible variance, second: the second largest variance etc.

dimension reduction

individuals with the same profiles will be placed nearby in the 2D space

cluster identification

a) Configuration of nodes and 2D principal surface in the 3D PCA linear manifold. The dataset is curved and cannot be mapped adequately on a 2D principal plane;

b) The distribution in the internal 2D non-linear principal surface coordinates (ELMap2D) together with an estimation of the density of points;

c) The same as b), but for the linear 2D PCA manifold (PCA2D).

Linear PCA versus nonlinear Principal Manifolds for visualization of breast cancer microarray data

elastic maps algorithm

principal manifolds approximation

cytoscape-swing-api

VDAOengine

libraries

User-friendly interface

dialogue windows

Data Driven network Layout

so far in Cytoscape we have...

no freely available data driven layout!

low expression

high expression

now, there is DeDaL

low expression

high expression

Transformation between two layouts

network (structure based layout)

Data Analysis (PCA)

purely Data Driven network Layout

DeDaL

Networks' Alignment

0%

100%

50%

purely structure based

purely data driven

mixed data driven

low expression

high expression
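The transformation between the two layouts can be sketched as a blend of node coordinates. This is a minimal sketch, assuming the percentage is a linear interpolation weight where 0% is the purely structure based layout and 100% is the purely data driven layout; the function name `mixed_layout` and the array inputs are hypothetical and not taken from DeDaL.

```python
import numpy as np

def mixed_layout(structure_xy, data_xy, percent):
    """Blend two layouts of the same nodes (one row per node, columns x and y).

    percent = 0   -> purely structure based layout (assumed convention)
    percent = 100 -> purely data driven layout (assumed convention)
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    t = percent / 100.0
    structure_xy = np.asarray(structure_xy, dtype=float)
    data_xy = np.asarray(data_xy, dtype=float)
    # Linear interpolation of every node position between the two layouts.
    return (1.0 - t) * structure_xy + t * data_xy

structure = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
data_driven = np.array([[4.0, 2.0], [6.0, 8.0], [2.0, 2.0]])
half_way = mixed_layout(structure, data_driven, 50)
```

Both layouts are assumed to share the same node order and a comparable coordinate scale; rescaling one of them first may be needed in practice.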

rotation

mirroring

minimization of Euclidean distance

reference layout

before alignment

after alignment

it works with any two layouts!

condition 1

it works with any layouts

you can align more than two layouts at the same time
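The alignment idea (rotation, mirroring, minimization of Euclidean distance to a reference layout) can be illustrated with an orthogonal Procrustes fit. This is a sketch under stated assumptions, not DeDaL's own procedure: it swaps in the closed-form SVD solution, it also moves both layouts to a common centroid, and the name `align_layout` is hypothetical.

```python
import numpy as np

def align_layout(layout, reference):
    """Rotate and, if needed, mirror `layout` so it best matches `reference`.

    Both inputs hold one row per node (x, y), in the same node order.
    Returns the aligned coordinates and a flag telling whether mirroring was used.
    """
    A = np.asarray(layout, dtype=float)
    B = np.asarray(reference, dtype=float)
    a_mean, b_mean = A.mean(axis=0), B.mean(axis=0)
    # Orthogonal Procrustes: the orthogonal Q minimizing ||(A - a_mean) Q - (B - b_mean)||.
    U, _, Vt = np.linalg.svd((A - a_mean).T @ (B - b_mean))
    Q = U @ Vt
    mirrored = bool(np.linalg.det(Q) < 0)  # determinant -1 means a reflection
    aligned = (A - a_mean) @ Q + b_mean
    return aligned, mirrored

rng = np.random.default_rng(1)
reference = rng.normal(size=(10, 2))
theta = np.deg2rad(40)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
mirror = np.array([[1.0, 0.0], [0.0, -1.0]])
# A mirrored, rotated and shifted copy of the reference layout.
layout = reference @ mirror @ rotation + np.array([3.0, -2.0])
aligned, mirrored = align_layout(layout, reference)
```

To align more than two layouts, each one could be fitted to the same reference layout in turn.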

Website

Description

Files

Tutorial

Coming soon

Publication

DeDaL

Independent functions

http://bioinfo-out.curie.fr/projects/dedal/

Follow up of the project

Acknowledgments: Zinovyev Andrei, Calzone Laurence, Bonnet Eric, Viara Eric, Martignetti Loredana and all of Inserm U900

Motivation

Diseases, like cancer, are related to dysregulation of molecular interactions in large molecular networks

protein 1 ppi protein 2

source

inter. type

target

node

edge

node
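The source / interaction type / target triple maps directly onto nodes and edges. A minimal sketch, assuming whitespace-separated triples in the style of Cytoscape's SIF files; the function name `parse_interactions` and the identifiers `protein1` and `protein2` are hypothetical.

```python
def parse_interactions(lines):
    """Turn 'source interaction_type target' lines into a node set and an edge list."""
    nodes, edges = set(), []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        source, interaction_type, target = parts
        nodes.update((source, target))
        edges.append((source, interaction_type, target))
    return nodes, edges

nodes, edges = parse_interactions(["protein1 ppi protein2"])
```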

Providing a meaningful representation of the knowledge of molecular interaction is not trivial

Large amounts of high-throughput data, whose analysis is problematic as well

condition 2

1. Multidimensional data

Maths...

genes

cell lines/ tumor type/time points

2. Center the matrix by gene

3. Compute a covariance matrix

4. Compute eigenvectors and eigenvalues of covariance matrix

[Figure: covariance matrix S with its eigenvalues and eigenvectors]

The eigenvector with the highest eigenvalue will become PC1, the eigenvector with the second highest eigenvalue will become PC2, etc.

5. Deriving a new dataset (projection into a new coordinate system)

We keep only 2 PCs (for a 2D representation)

L: number of PCs kept, 1 <= L <= m

Y: new dataset, E: eigenvector matrix, X: initial centered dataset

Xc: centered matrix
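The five steps above can be sketched with NumPy. This is an illustration under assumptions, not DeDaL's code: the rows of the matrix are taken to be the items placed in 2D and the columns the m variables (the slides do not fix the orientation), each column is centered, and the projection is assumed to be Y = Xc E. The function name `pca_project` and the random test matrix are hypothetical.

```python
import numpy as np

def pca_project(X, n_components=2):
    """PCA by eigendecomposition of the covariance matrix.

    X: one row per item, one column per variable (assumed orientation).
    Returns (Y, eigenvalues, E), with eigenvalues sorted from largest to smallest.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # step 2: center each column
    S = Xc.T @ Xc / (X.shape[0] - 1)           # step 3: covariance matrix (m x m)
    eigvals, eigvecs = np.linalg.eigh(S)       # step 4: S is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first (PC1, PC2, ...)
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    E = eigvecs[:, :n_components]              # keep L = n_components eigenvectors
    Y = Xc @ E                                 # step 5: project into the new coordinates
    return Y, eigvals, E

# Random stand-in data: 56 items described by 6 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(56, 6)) * np.array([5.0, 3.0, 1.0, 1.0, 0.5, 0.5])
Y, eigvals, E = pca_project(X)
```

Each column of Y can then serve as a node coordinate, with the variance along each kept axis equal to the corresponding eigenvalue.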

or Simple Data-Driven Layout

Data and network

Data: A549 epithelial cells treated for up to 72 hours with TGF-beta to induce epithelial-mesenchymal transition (EMT). Affymetrix Human Genome U133 Plus 2.0 Array.

Divided into two conditions: early time (0-16h) and late time (24-72h)

Network: 56 genes with the highest variance, retrieved from the Human Protein Reference Database (HPRD)

Some improvements....

Outliers are detected according to the formula

m(node) > m(all) + p*std(all)

m: mean pairwise distance

std: standard deviation of the pairwise distances

p: parameter (= 1.5)

Such a node is placed on the same line, at a distance of 2*std from the center
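The outlier rule can be sketched as follows. Several points are assumptions rather than facts from the slides: "center" is taken to be the centroid of all nodes, "the same line" is read as the line from the centroid through the outlier, and the same std(all) is used for the 2*std distance. The name `adjust_outliers` is hypothetical.

```python
import numpy as np

def adjust_outliers(xy, p=1.5):
    """Flag nodes with m(node) > m(all) + p*std(all) and pull them toward the center."""
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # n x n matrix of pairwise distances
    pairwise = dist[np.triu_indices(n, k=1)]      # each pair counted once
    m_all, std_all = pairwise.mean(), pairwise.std()
    m_node = dist.sum(axis=1) / (n - 1)           # mean distance from each node to the others
    is_outlier = m_node > m_all + p * std_all
    center = xy.mean(axis=0)                      # assumed meaning of "the center"
    adjusted = xy.copy()
    for i in np.flatnonzero(is_outlier):
        direction = xy[i] - center
        length = np.linalg.norm(direction)
        if length > 0:
            # Same direction from the center, new distance 2*std.
            adjusted[i] = center + direction / length * 2.0 * std_all
    return adjusted, is_outlier

# Twenty nodes on a unit circle plus one distant node.
angles = np.linspace(0, 2 * np.pi, 20, endpoint=False)
points = np.vstack([np.column_stack([np.cos(angles), np.sin(angles)]), [[50.0, 0.0]]])
adjusted, flags = adjust_outliers(points)
```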

Overlapping

noise is introduced if edge length < 1

r=random( -3 : 3 )

Node (x,y) = Node (x+r, y+r)
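A possible reading of the overlap rule, as a sketch. The slides do not say which endpoint is moved or whether r is an integer, so moving the target node and drawing r uniformly from the real interval [-3, 3] are assumptions; `jitter_overlaps` is a hypothetical name.

```python
import math
import random

def jitter_overlaps(pos, edges, min_length=1.0, spread=3.0, rng=None):
    """Shift the target node of every edge shorter than min_length by (r, r)."""
    rng = rng or random.Random()
    new_pos = dict(pos)
    for source, target in edges:
        (x1, y1), (x2, y2) = new_pos[source], new_pos[target]
        if math.hypot(x2 - x1, y2 - y1) < min_length:
            r = rng.uniform(-spread, spread)   # the same r is added to x and y
            new_pos[target] = (x2 + r, y2 + r)
    return new_pos

positions = {"a": (0.0, 0.0), "b": (0.0, 0.0), "c": (10.0, 10.0)}
separated = jitter_overlaps(positions, [("a", "b"), ("a", "c")], rng=random.Random(0))
```

Because the same r is used for both coordinates, the node always moves along a diagonal; a single pass may not separate every pair, so the check could be repeated if needed.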

Nodes without values

Ignored in the PCA/PMs algorithm; each such node is placed at the mean of its neighbor nodes' coordinates
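Placing nodes that have no data values can be sketched as averaging the coordinates of their neighbors. This assumes that only neighbors which already have coordinates are averaged and that a node with no positioned neighbor is left unplaced; `place_missing` is a hypothetical name.

```python
def place_missing(pos, edges, missing):
    """Give each node in `missing` the mean position of its already placed neighbors."""
    neighbors = {node: set() for node in missing}
    for source, target in edges:
        if source in neighbors:
            neighbors[source].add(target)
        if target in neighbors:
            neighbors[target].add(source)
    placed = dict(pos)
    for node in missing:
        known = [pos[n] for n in neighbors[node] if n in pos]
        if known:  # nodes without any positioned neighbor stay unplaced
            placed[node] = (sum(x for x, _ in known) / len(known),
                            sum(y for _, y in known) / len(known))
    return placed

layout = place_missing({"A": (0.0, 0.0), "B": (2.0, 4.0)},
                       [("A", "C"), ("C", "B")],
                       ["C"])
```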

[Figure labels: layouts l_source, l_target, l', (l)1, (l')1, (l)ref; node degree legend from 0 to 6; outlier, adjustment, after adjustment]