
# Clustering_Biomed_Part2

by

## Susanne Gerber

on 17 August 2017


#### Transcript of Clustering_Biomed_Part2

Clustering Methods, Classification, and Dimensionality Reduction

Supervised learning vs. unsupervised learning
Learning by examples: an **enemy** is ..., a **friend** is ...

Learning by rules:

A **friend**
- looks exactly like us
- is smaller than us, or - if larger - has hooves
- is a big furry animal eating grass on a meadow
- is an animal with feathers AND either paddles or a very long neck/beak (ideally both)

An **enemy**
- has sharp teeth
- is larger than 20 cm and has paws
- barks, howls or growls
- is a bird of our size or larger with a compact body (short neck) and claws
- ...
The feature vector: (size in cm, sharp teeth (1 = yes, 0 = no), #legs, feathers (1 = yes, 0 = no), claws (1 = yes, 0 = no), furry (1 = yes, 0 = no), hooves (1 = yes, 0 = no), length of the neck in cm, ...)
**Supervised learning**

is the machine learning task of inferring a function from labeled training data in order to make predictions for the future. Each example consists of an input vector (feature vector) and a desired output (label):

| Feature vector | Label (enemy, friend) |
|---|---|
| (100, 1, 4, 0, 1, 1, 0, 8) | enemy |
| (150, 1, 4, 0, 1, 1, 0, 10) | enemy |
| (60, 1, 4, 0, 1, 1, 0, 7) | enemy |
| (30, 0, 2, 1, 1, 0, 0, 3) | enemy |
| (140, 0, 4, 0, 0, 1, 1, 5) | friend |
| (8, 0, 4, 0, 1, 1, 0, 2) | friend |
| (6, 0, 2, 1, 1, 0, 0, 1) | friend |
Find a function that maps between the input and the desired output, allowing to classify (determine the correct class label of) new/unseen instances:

- (100, 0, 2, 1, 0, 0, 60): enemy or friend?
- (10, 0, 4, 0, 0, 0, 0): enemy or friend?
- (200, 0, 4, 0, 1, 1, 100): enemy or friend?
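Such a mapping can be sketched with a nearest-neighbor rule: an unseen feature vector gets the label of its closest training example. The pairing of vectors and labels follows the table above; the choice of 1-NN with Euclidean distance (and the query vector) is illustrative, not something the slides prescribe.

```python
import math

# Labeled examples from the slides: (size, sharp teeth, #legs, feathers,
# claws, furry, hooves, neck length) -> label
train = [
    ((100, 1, 4, 0, 1, 1, 0, 8),  "enemy"),
    ((150, 1, 4, 0, 1, 1, 0, 10), "enemy"),
    ((60,  1, 4, 0, 1, 1, 0, 7),  "enemy"),
    ((30,  0, 2, 1, 1, 0, 0, 3),  "enemy"),
    ((140, 0, 4, 0, 0, 1, 1, 5),  "friend"),
    ((8,   0, 4, 0, 1, 1, 0, 2),  "friend"),
    ((6,   0, 2, 1, 1, 0, 0, 1),  "friend"),
]

def classify(x):
    """1-nearest neighbor: return the label of the closest training vector."""
    _, label = min(train, key=lambda ex: math.dist(x, ex[0]))
    return label

# A made-up 8-dimensional query vector (not from the slides):
print(classify((120, 1, 4, 0, 1, 1, 0, 9)))  # -> enemy
```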
**Unsupervised learning**

is the machine learning task of inferring a function that describes clusters/groups of objects from unlabeled data, grouping them according to their natural characteristics:
- clustering occurs without a priori knowledge
- the goal is to detect patterns or (hidden) structures in the (high-dimensional) data

Where do those rules come from? Reduction of dimensions!

(50, 0, 2, 1, 0, 0, 0, 40): friend

This task is much vaguer than supervised learning:

- In higher dimensions the categorization is neither unique nor canonical.
- There exists a multitude of different approaches and concepts.
Clustering algorithms, or "how to optimize the grouping of objects by minimizing distances".

Clustering approaches:
- Partitional (or centroid-based) clustering, e.g. k-means, model-based clustering
- Hierarchical clustering: agglomerative algorithms, divisive algorithms
- Density-based clustering
- Spectral clustering
- ... and others
**k-means**

The k-means algorithm partitions the data, for a given k, into k mutually exclusive clusters.

Goal: iteratively minimize the variance (inner-cluster distance) in each cluster.

How k-means works:
- Input: K and a set of data points (x1, ..., xn)
- Initialization: place K centroid seeds at random locations
- Repeat until convergence:
  - Assign each data point xi to the nearest centroid cj (this depends on the chosen distance measure).
  - Calculate for each cluster j = 1, ..., K the new centroid cj (the mean of all points xi assigned to cluster j in the previous step).
- Stop when the solution converges (none of the cluster assignments changes any more).

The chicken-and-egg problem behind the iteration: How to compute the centroid positions without knowing the cluster assignment of each point? How to assign the data points to a certain cluster without knowing the centroid positions?

Open questions: How to choose the correct number of clusters? Does the algorithm guarantee convergence to the global optimum?

Fisher's iris data consists of measurements of sepal length, sepal width, petal length, and petal width for 150 iris specimens; there are 50 specimens from each of three species.
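The alternating loop above can be sketched in plain Python. One assumption for illustration: the seeds are taken as the first K points (for reproducibility) rather than random locations, and the toy 2-D points are invented.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate the assignment and update steps."""
    # Seed with the first k points for reproducibility; standard k-means
    # places the K centroid seeds at random locations instead.
    centroids = list(points[:k])
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[j].append(p)
        # Update step: new centroid = mean of the points assigned to it
        new = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:  # converged: no cluster assignment changes any more
            break
        centroids = new
    return centroids, clusters

pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
cents, cls = kmeans(pts, 2)
print(sorted(len(c) for c in cls))  # -> [3, 3]
```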
Sir Ronald Aylmer Fisher (17 February 1890 – 29 July 1962), an English statistician and biologist who used mathematics to combine Mendelian genetics and natural selection.
**Silhouette coefficient**

The silhouette value of a point is a measure of how similar the point is to the points in its own cluster compared to the points in other clusters:

s(i) = (b(i) - a(i)) / max(a(i), b(i))

where
- a(i) is the average distance of the point i to the other points in its own cluster A
- d(i, C) is the average distance of the point i to the points in the cluster C
- b(i) is the minimal d(i, C) over all clusters C other than A

The silhouette coefficient is the average silhouette value over all points: a quantitative measure that can assess the quality of a clustering.

**Principal Component Analysis (PCA)**

PCA transforms the data to a new coordinate system such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.

Main goals:
- identification and extraction of (hidden) patterns (correlations between variables) in the data -> allows for dimension reduction
- finding the directions of maximum variance in high-dimensional data and projecting it onto a lower-dimensional subspace while retaining most of the information

[Example PCA plot: clustering of cell types. Pollen et al., Nature Biotechnology 32, 1053–1058 (2014), doi:10.1038/nbt.2967]
[Iris data plot: Iris setosa, Iris versicolor, Iris virginica]
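The silhouette value as defined above can be computed directly; a small pure-Python sketch (the four 2-D points and the two-cluster labeling are an invented toy example):

```python
import math

def silhouette_coefficient(points, labels):
    """Average of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    def avg_dist(p, members):
        return sum(math.dist(p, q) for q in members) / len(members)
    total = 0.0
    for p, l in zip(points, labels):
        own = [q for q in clusters[l] if q != p]  # other points of own cluster A
        a = avg_dist(p, own) if own else 0.0      # a(i)
        b = min(avg_dist(p, clusters[c])          # b(i): nearest other cluster
                for c in clusters if c != l)
        total += (b - a) / max(a, b)
    return total / len(points)

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(round(silhouette_coefficient(pts, [0, 0, 1, 1]), 3))  # -> 0.929
```

Values near 1 indicate tight, well-separated clusters; values near 0 or below indicate points sitting between clusters.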
**Connectivity-based (hierarchical) clustering**

Algorithm description:
- Step 1: Determine the distance between each pair of points: m * (m - 1) / 2 different pairs -> adjacency matrix.
- Iteratively group points into a binary hierarchical tree (linkage): connect the closest pair of points and re-compute the distance matrix.
- Stop: cut the hierarchical tree into clusters.

[Figure: Differential gene expression between the brachial and femoral artery based on microarray data: heat map showing the human orthologous genes of the 63 probe sets with a significant difference in gene expression (FDR < 0.05) as well as a 1.5-fold change between the brachial and the femoral artery.]
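The agglomerative steps can be sketched in a few lines. This sketch assumes single linkage (merge the two clusters whose closest pair of points is nearest), one of several possible linkage choices, and invented toy points; it stops at a requested cluster count instead of building the full tree.

```python
import math

def single_linkage(points, n_clusters):
    """Agglomerative clustering: start with one cluster per point and
    repeatedly merge the two clusters with the smallest inter-point distance."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # find the pair of clusters with the closest pair of points
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: min(math.dist(a, b)
                               for a in clusters[ij[0]] for b in clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)  # merge cluster j into cluster i
    return clusters

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
print([len(c) for c in single_linkage(pts, 3)])  # -> [2, 2, 1]
```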
Further representatives: k-median algorithm, k-means++, Gaussian mixture model (GMM)
**Density-based clustering**

Idea:
i) Given a set of points in some space, the goal is to group together those points that are closely packed together (points with many nearby neighbors).
ii) Points whose nearest neighbors are "too far away" (which lie alone in low-density regions) are marked as outliers.

e.g. DBSCAN: Density-based spatial clustering of applications with noise

Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press, pp. 226–231. ISBN 1-57735-004-9.
Definitions: points are classified as either **core points**, **(density-)reachable points**, or **outliers**.

- A point p is called a **core point** if at least minPts points are within distance epsilon of it (the maximum radius of the neighborhood).
- A point q is **reachable** from p if there is a path p1, ..., pn with p1 = p and pn = q, where each p(i+1) is directly reachable from p(i) (all the points on the path must be core points, with the possible exception of q).
- All points not reachable within distance epsilon from any other point are **outliers**.
- A core point forms a cluster together with all points (core or non-core) that are reachable from it.
- Each cluster contains at least one core point; non-core points can be part of a cluster, but they form its "edge", since they cannot be used to reach more points.

Algorithm:

Initialization:
i) Choose the two parameters: the distance (eps) and the minimum number of points required to form a dense region (minPts).
ii) Choose a random data point p.

Repeat until all points are labeled:
iii) Calculate its eps-neighborhood set N (the set of all points within distance eps). If the size of N < minPts, mark p as noise; else open a cluster and expand it via depth-first search, labeling points as core points, border points or noise, until all points in the eps-neighborhood are labeled.
iv) Assign each core point and each border point to the cluster.
v) Choose a new (unlabeled) data point at random and continue with iii).

Stop: when all points are labeled.
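These steps can be sketched compactly. The sketch uses a plain list scan for the eps-neighborhood (real implementations use spatial indexes) and invented toy points; noise is labeled -1.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or -1 for noise (DBSCAN sketch)."""
    labels = {}  # point index -> cluster id, or -1 for noise
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
    cluster = 0
    for i in range(len(points)):
        if i in labels:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:   # not a core point: mark as noise for now
            labels[i] = -1         # (it may later become a border point)
            continue
        labels[i] = cluster        # open a new cluster at core point i
        queue = [j for j in seeds if j != i]
        while queue:               # expand the cluster
            j = queue.pop()
            if labels.get(j) == -1:
                labels[j] = cluster    # noise becomes a border point
            if j in labels:
                continue
            labels[j] = cluster
            js = neighbors(j)
            if len(js) >= min_pts:     # j is itself a core point: keep expanding
                queue.extend(js)
        cluster += 1
    return [labels[i] for i in range(len(points))]

pts = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1),
       (5.0, 5.0), (5.2, 5.0), (5.1, 5.1), (9.0, 9.0)]
print(dbscan(pts, eps=0.5, min_pts=3))  # -> [0, 0, 0, 1, 1, 1, -1]
```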

Cluster those cells that have a similar expression profile.

Question: How can a cellular transcription profile of 10,000 genes per cell be compressed and projected into a single dot in a 2-dimensional plot?

Answer: PCA extracts and captures the essential information hidden in the high-dimensional data space.

Each dot in this graph represents a single cell and its transcription profile (single-cell RNA-sequencing data, 10,000 transcribed genes in each cell).
A simple yet popular and useful linear transformation technique used in numerous applications (e.g. gene expression data).
Extreme example: y = x
Single-cell RNA-seq data set (basic overview):

**1 cell (1D):**

| Gene | Cell 1 |
|---|---|
| A | 10 |
| B | 5 |
| C | 18 |
| D | 54 |
| E | 25 |
| F | 34 |

**2 cells (2D)** - correlation?

| Gene | Cell 1 | Cell 2 |
|---|---|---|
| A | 10 | 15 |
| B | 5 | 8 |
| C | 18 | 23 |
| D | 54 | 48 |
| E | 25 | 20 |
| F | 34 | 30 |

**3 cells (3D):**

| Gene | Cell 1 | Cell 2 | Cell 3 |
|---|---|---|---|
| A | 10 | 15 | 12 |
| B | 5 | 8 | 10 |
| C | 18 | 23 | 17 |
| D | 54 | 48 | 47 |
| E | 25 | 20 | 18 |
| F | 34 | 30 | 28 |

And 200 cells?

Question: are all dimensions (equally) important?
Demonstration of the performance of the aforementioned algorithms on the example of Fisher's iris dataset.
**Expectation Maximization algorithm**

A probabilistically grounded way of doing soft clustering: each "cluster" is here modeled by a Gaussian probability distribution.

Task: for a given set of data points, extract the parameters of the distributions.

Case one: we already know the source of the data.
Example: movies / a teapot in 3D

Teapot: even though the picture is in 2D, it is completely clear what the teapot looks like -> the 3rd dimension is not necessary for getting an impression of the shape.

Movies: story and landscape can be captured without significant loss of information compared to a 3D movie.

(by Victor Lavrenko)
Case two: we don't know the source, but we know the parameters of the Gaussians.

Problem: we have neither the source nor the parameters, ...

Solution: the Expectation Maximization algorithm.
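A minimal EM sketch for a 1-D mixture of two Gaussians (the data set and the initialization are illustrative assumptions; in practice a library implementation such as scikit-learn's GaussianMixture would be used):

```python
import math

def em_two_gaussians(xs, iters=50):
    """EM for a 1-D mixture of two Gaussians: alternate the E-step
    (compute responsibilities) with the M-step (re-estimate parameters)."""
    mu = [min(xs), max(xs)]        # crude but deterministic initialization
    var = [1.0, 1.0]
    weight = [0.5, 0.5]
    def pdf(x, m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    for _ in range(iters):
        # E-step: posterior probability that each point came from each component
        resp = []
        for x in xs:
            w = [weight[k] * pdf(x, mu[k], var[k]) for k in (0, 1)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: update weights, means and variances from the responsibilities
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk + 1e-9
    return mu, var, weight

xs = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
mu, var, weight = em_two_gaussians(xs)
print(mu)  # the two means converge near 1.0 and 5.0
```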
Music: uncompressed audio (.wav) is about 10 times larger than compressed .mp3.

PCA takes a dataset with a lot of dimensions (i.e. lots of cells) and flattens it to the few most important dimensions.
From 2D to 1D:

| Gene | Cell 1 | Cell 2 |
|---|---|---|
| A | 10 | 15 |
| B | 5 | 8 |
| C | 18 | 23 |
| D | 54 | 48 |
| E | 25 | 20 |
| : | : | : |
| F | 34 | 30 |
Orthogonal transformation (PC 1, PC 2):

Step 2: capture the second largest variance (the second principal component, PC2), which is orthogonal to PC1.

In case of further dimensions: each succeeding component has in turn the highest possible variance under the constraint that it is orthogonal to the preceding components.

1D: since almost all variation is captured by PC1, the data could be flattened to 1D without too much loss of information (rotating the graph onto the new x- and y-axes).

- PC1 is the linear combination of the original data that spans the direction of the most variation in the data.
- PC2 captures the direction of the 2nd most variation in the data.
- PC3 spans the direction of the 3rd largest variation.
- ... and so on
Things that are important to know concerning PCA: how to compute it?

By calculating the eigenvectors and eigenvalues of the **covariance matrix**:
- The first eigenvector defines a direction in space; the line along this direction is the direction of the greatest variance in the multidimensional dataset = PC1.
- The second eigenvector gives the direction of the second largest variance = PC2.
- These eigenvectors are used as the axes of the new coordinate system (the first eigenvector becomes the first axis, the second eigenvector the second axis, ...).
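The eigendecomposition above can be sketched with NumPy on the 6-gene / 2-cell toy table from the slides (variable names are illustrative):

```python
import numpy as np

# Toy gene-expression matrix from the slides: 6 genes x 2 cells
X = np.array([[10, 15],
              [5,  8],
              [18, 23],
              [54, 48],
              [25, 20],
              [34, 30]], dtype=float)

Xc = X - X.mean(axis=0)                # center each variable (cell)
C = np.cov(Xc, rowvar=False)           # 2x2 covariance matrix of the cells
eigvals, eigvecs = np.linalg.eigh(C)   # eigenpairs of the symmetric matrix
order = np.argsort(eigvals)[::-1]      # largest eigenvalue first -> PC1
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                  # coordinates of the genes in PC space
explained = eigvals / eigvals.sum()    # fraction of variance per component
print(explained)                       # PC1 carries nearly all the variance
```

Because the two cells are strongly correlated, PC1 explains almost all of the variance, which is exactly why the data can be flattened from 2D to 1D with little loss of information.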