The Human Cell

Maths-in-action project at the University of Warwick, applying mathematical techniques to DNA Sequencing.
by Anooj Dodhia on 14 March 2016

Transcript of The Human Cell

[Diagram: candidate syllables for a spoken phrase (day, dey, they, days, to, too, her, there, therm, ther, ter, wet, whe, wea, way, the, they, there, ?), labelled with percentage weights]
The Human Cell
A mathematical insight into DNA Sequencing
Anooj Dodhia • 27 Feb 2013
"the weather today"
the | wea | ther | to | day
Inference: what if there were no windows?
Rainy, Cold, Sunny
70%, 25%, 5%
45%, 45%, 10%
An introduction to DNA
vs
ATCG
AACC
GCAT
ATGC
AACG
The probability of a mutation between C and G is much higher than between any other pair
Uniform
prob.
info.
Reality
But what if...
ATCGAATCGGTCTGAAGTCGATCGATTTGAC
TTAGAATCAGTATGAAGACGATCAATATGAG
TTGGTAACCGACAGTACTGGTTGGTTATCAG
We need a more precise probability framework
Markov Chains
"A memoryless random process"
State Space
S = {A,T,C,G}
S = {codons}
Random Variables (the chain)
X = {X(1), X(2), X(3), ...}
Transition Probabilities
The probability of the future depends only on the present, and not on the past:
P[X(t+1) | X(1), X(2), ..., X(t)] = P[X(t+1) | X(t)]

Denote P(a,b) := P[X(t+1) = b | X(t) = a]
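The transition probabilities P(a,b) can be pictured as a small table over the state space S = {A,T,C,G}. The sketch below is only an illustration: every number in `P` is made up (with C and G given a higher chance of swapping, as an assumption), and the helper names `sample_chain` and `transition_log_prob` are hypothetical.

```python
import math
import random

STATES = ["A", "T", "C", "G"]

# Hypothetical transition probabilities P(a, b); each row sums to 1.
P = {
    "A": {"A": 0.40, "T": 0.20, "C": 0.20, "G": 0.20},
    "T": {"A": 0.20, "T": 0.40, "C": 0.20, "G": 0.20},
    "C": {"A": 0.10, "T": 0.10, "C": 0.40, "G": 0.40},
    "G": {"A": 0.10, "T": 0.10, "C": 0.40, "G": 0.40},
}


def sample_chain(start, length, rng):
    """Generate X(1), ..., X(length); each step looks only at the current state."""
    chain = [start]
    while len(chain) < length:
        row = P[chain[-1]]
        weights = [row[s] for s in STATES]
        chain.append(rng.choices(STATES, weights=weights)[0])
    return "".join(chain)


def transition_log_prob(sequence):
    """Log-probability of the transitions in `sequence`, given its first symbol."""
    return sum(math.log(P[a][b]) for a, b in zip(sequence, sequence[1:]))


if __name__ == "__main__":
    rng = random.Random(2013)
    print(sample_chain("A", 12, rng))
    print(math.exp(transition_log_prob("ACG")))  # P(A,C) * P(C,G)
```

Because the chain is memoryless, the probability of a whole sequence is just the product of one-step terms P(a,b), which is why a sum of logarithms is enough here.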
Hidden Markov Model
Observed Markov Chain
Underlying Markov Chain
Probabilistic Calculation
Genetic code of interest
The sequence best matching our "observed" input
Umbrellas, coats & hats
The weather outside
But what can we do with this model?
The Forward-Backward Algorithm
The Viterbi Algorithm
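As a rough sketch of what these two algorithms compute, take the weather example: the hidden states are the weather outside, and the observations are umbrellas, coats and hats. All probabilities below are hypothetical, and the names `START`, `TRANS`, `EMIT`, `forward_likelihood` and `viterbi` are illustrative only. The forward pass (the first half of Forward-Backward) gives the total probability of the observations; Viterbi gives the single most likely hidden sequence.

```python
STATES = ["Rainy", "Cold", "Sunny"]

# All numbers below are hypothetical, chosen only for illustration.
START = {"Rainy": 0.3, "Cold": 0.3, "Sunny": 0.4}
TRANS = {
    "Rainy": {"Rainy": 0.6, "Cold": 0.3, "Sunny": 0.1},
    "Cold": {"Rainy": 0.3, "Cold": 0.4, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.1, "Cold": 0.3, "Sunny": 0.6},
}
EMIT = {
    "Rainy": {"umbrella": 0.7, "coat": 0.2, "hat": 0.1},
    "Cold": {"umbrella": 0.1, "coat": 0.7, "hat": 0.2},
    "Sunny": {"umbrella": 0.05, "coat": 0.15, "hat": 0.8},
}


def forward_likelihood(observations):
    """Forward pass: probability of the observations, summed over all hidden paths."""
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {
            s: EMIT[s][obs] * sum(alpha[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    return sum(alpha.values())


def viterbi(observations):
    """Return (most likely hidden state path, joint probability of that path)."""
    best = {s: (START[s] * EMIT[s][observations[0]], [s]) for s in STATES}
    for obs in observations[1:]:
        new_best = {}
        for s in STATES:
            # Best way to arrive in state s from any previous state r.
            prob, path = max(
                ((best[r][0] * TRANS[r][s], best[r][1]) for r in STATES),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * EMIT[s][obs], path + [s])
        best = new_best
    prob, path = max(best.values(), key=lambda item: item[0])
    return path, prob


if __name__ == "__main__":
    seen = ["umbrella", "umbrella", "hat"]
    print(viterbi(seen))
    print(forward_likelihood(seen))
```

The two functions differ only in one operation: the forward pass sums over previous states, while Viterbi takes the maximum and remembers which path achieved it.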
25%
0%
75%
Big Ideas
Probability
Inference
Markov Chain
Hidden Markov Models
Deoxyribonucleic acid
3,000,000,000 base pairs - 8 x dist(earth, sun)
e.g. heart cell structure or digestive enzymes
We are interested in a sequence of speech, which we split into states, or syllables (~3 letters), and use probability to match to the closest sequence in our database, made up of {a-z, A-Z, 0-9, symbols}
We are interested in a sequence of observed DNA, which we split into states, or codons (3 letters), and use probability to match to the closest sequence in our database, made up of {A, T, C, G, '-'}
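The splitting and matching steps can be sketched in a few lines. This is a simplified stand-in rather than the full hidden Markov model: it assumes the observed and candidate sequences have equal length, ignores the gap symbol '-', and uses a made-up per-base error model in which C and G are confused more often than other pairs. The names `split_into_codons`, `base_prob`, `log_likelihood` and `closest_sequence` are hypothetical.

```python
import math


def split_into_codons(sequence):
    """Split a DNA string into consecutive 3-letter states (codons)."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


def base_prob(true_base, observed_base):
    """Hypothetical chance of observing `observed_base` when the truth is `true_base`.

    Assumption: C<->G swaps (0.06) are likelier than any other swap (0.02).
    For each true base the probabilities over A, T, C, G sum to 1.
    """
    if {true_base, observed_base} == {"C", "G"}:
        return 0.06
    if true_base != observed_base:
        return 0.02
    return 0.90 if true_base in "CG" else 0.94


def log_likelihood(candidate, observed):
    """Log-probability of `observed` given `candidate`, codon by codon."""
    pairs = zip(split_into_codons(candidate), split_into_codons(observed))
    return sum(
        math.log(base_prob(a, b))
        for codon_a, codon_b in pairs
        for a, b in zip(codon_a, codon_b)
    )


def closest_sequence(observed, database):
    """Pick the database entry under which the observation is most probable."""
    return max(database, key=lambda candidate: log_likelihood(candidate, observed))


if __name__ == "__main__":
    database = ["ATGGAA", "ATAGAA"]  # hypothetical database entries
    print(split_into_codons("ATCGAA"))
    print(closest_sequence("ATCGAA", database))
```

Both database entries differ from the observation in exactly one base, so a plain mismatch count could not separate them; the probability model prefers the entry whose difference is a C/G swap.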
Any Questions?