Genomic Time Machines
Dickinsonia (4 cm)
What did the first animal look like?
Can we resolve the
What was the fish that first walked on land?
Anomalocaris (60–80 cm)
Acanthostega (50–55 cm)
Computers as Genomic Time Machines for Meeting our Ancestors
Prof. Denis Baurain — ULg
Bioforum — April 18th, 2013
are not enough
phylogenomics can resolve the evolution of animals
13 tetrapods (Ensembl)
2 ray-finned fishes (Ensembl)
2 lobe-finned fishes: coelacanth (Broad Institute) and lungfish (RNA-seq contigs)
100x faster than BLAST
overnight on a
with both the lungfish and the coelacanth
1 ray-finned fish
3 cartilaginous fishes
from RNA-seq contigs
using HaMStR (1 week)
22 species × 251 genes (100,583 aligned amino acids)
Only orthologous genes can be used to reconstruct a species tree!
supermatrix made with SCaFoS (1 hour)
Phylogenomic supermatrices are often full of holes due to missing characters.
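A supermatrix is built by concatenating the gene alignments species by species, so any gene missing for a species leaves a hole. The Python sketch below illustrates this under simple assumptions: it is not SCaFoS, the function names are hypothetical, each gene is a dict mapping species names to already aligned sequences, '?' marks a gene absent from a species, and the toy sequences are invented.

```python
def build_supermatrix(gene_alignments, species):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: list of dicts {species: aligned sequence}; within a gene,
    all sequences have the same length. A species absent from a gene gets '?'
    (missing data) over the whole length of that gene.
    """
    rows = {sp: [] for sp in species}
    for gene in gene_alignments:
        length = len(next(iter(gene.values())))  # alignment length of this gene
        for sp in species:
            rows[sp].append(gene.get(sp, "?" * length))
    return {sp: "".join(parts) for sp, parts in rows.items()}


def missing_fraction(matrix):
    """Fraction of all cells of the supermatrix that are missing ('?')."""
    cells = "".join(matrix.values())
    return cells.count("?") / len(cells)


if __name__ == "__main__":
    # Invented toy data: the lungfish lacks gene 2.
    gene1 = {"human": "MKV", "coelacanth": "MKI", "lungfish": "MRV"}
    gene2 = {"human": "ALLG", "coelacanth": "ALIG"}
    matrix = build_supermatrix([gene1, gene2], ["human", "coelacanth", "lungfish"])
    for sp, seq in matrix.items():
        print(f"{sp:<12}{seq}")
    print(f"missing data: {missing_fraction(matrix):.0%}")
```

Real supermatrices also contain alignment gaps ('-'), which add further missing characters on top of whole missing genes.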
3 months on a
Tetrapods are more closely related to the lungfish than to the coelacanth. Easy, eh?
Similarly, phylogenomics is phylogenetics applied to many genes at once.
We now have robust statistical support and phylogenetic resolution!
The Basque language is our outgroup here.
Examining only one or two words exposes us to stochastic error, due to the lack of information.
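The effect of stochastic error can be sketched with a toy simulation in Python. Its settings are invented for illustration: each site independently supports the correct tree with probability 0.4 and each of two wrong trees with probability 0.3, and a tree "wins" when it gets strictly the most site support (a crude stand-in for real tree inference, not a phylogenetic method). With only a couple of sites the wrong answer often wins; with many sites the correct tree almost always does.

```python
import random


def fraction_correct(n_sites, p_correct=0.4, n_reps=1000, seed=1):
    """Share of replicates in which the correct tree (T1) receives strictly more
    site support than each of two wrong trees (T2, T3).

    Each site independently supports T1 with probability p_correct and each
    wrong tree with probability (1 - p_correct) / 2 (invented toy values).
    """
    rng = random.Random(seed)
    p_wrong = (1.0 - p_correct) / 2.0
    wins = 0
    for _ in range(n_reps):
        votes = rng.choices(["T1", "T2", "T3"],
                            weights=[p_correct, p_wrong, p_wrong], k=n_sites)
        c1, c2, c3 = (votes.count(t) for t in ("T1", "T2", "T3"))
        if c1 > c2 and c1 > c3:
            wins += 1
    return wins / n_reps


if __name__ == "__main__":
    # More sites (words, genes) means less stochastic error.
    for n in (2, 10, 100, 1000):
        print(f"{n:>5} sites: correct tree recovered in {fraction_correct(n):.0%} of replicates")
```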
Until its rediscovery, the coelacanth was thought to have been extinct since the Late Cretaceous period (70 MYA).
However, the coelacanth is not a living fossil, as this concept is fundamentally incorrect!
The coelacanth is a large marine fish with fleshy fins that resemble the limbs of terrestrial vertebrates.
Its genus, Latimeria, was named after its discoverer, Marjorie Courtenay-Latimer, the curator of a small museum in South Africa.
The lungfish has lungs and can breathe air. It is also a lobe-finned fish, but it lives in freshwater.
The two lobe-finned fishes were sequenced with NGS (Illumina).
Coelacanth:
blood DNA + muscle RNA library
assembly with ALLPATHS-LG and Trinity
annotation with the Ensembl pipeline
genome size: 2.86 Gbp (2.18 Gbp)
West African lungfish:
3 RNA libraries (brain, gonad/kidney, gut/liver)
assembly with Trinity
genome size: est. 50–100 Gbp
Why do we have such an unstable phylogeny?
Phylogenomics is useful but can suffer from artifacts!
Let's look at their causes!
Ediacaran (635–542 MYA)
Only odd soft-bodied animals live in the sea.
Cambrian (542–488 MYA)
Many animals now thrive in the sea!
Looking back at the past, it seems as if all bilaterian lineages appeared at once!
The phylogenetic signal lies in the substitutions inherited from the common ancestors of the sequences.
Multiple substitutions at a given site erase this signal and can even create spurious identities (homoplasy).
When homoplasy is widespread, sequences are said to be saturated. The historical signal becomes very weak.
Homoplasy can affect only some sequences, which leads to the long-branch-attraction (LBA) artifact.
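Saturation can be sketched by letting a random protein evolve and comparing the true number of substitutions with the observed differences. This Python toy assumes a Jukes-Cantor-like model (all sites evolve at the same rate and every amino acid is equally likely as a replacement), which real proteins violate; the function names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def evolve(seq, n_subs, rng):
    """Apply n_subs substitutions at uniformly chosen sites; each one replaces
    the current residue by a different amino acid drawn uniformly
    (a Jukes-Cantor-like model: equal rates, equal frequencies)."""
    residues = list(seq)
    for _ in range(n_subs):
        i = rng.randrange(len(residues))
        residues[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != residues[i]])
    return "".join(residues)


def p_distance(a, b):
    """Observed proportion of sites that differ between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(42)
    ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(1000))
    for n_subs in (100, 500, 1000, 2000, 5000):
        descendant = evolve(ancestor, n_subs, rng)  # each run starts from the ancestor
        print(f"{n_subs / len(ancestor):.1f} substitutions/site -> "
              f"observed p = {p_distance(ancestor, descendant):.2f}")
```

Under this model the observed proportion of differences levels off near 19/20 = 0.95, while the number of substitutions per site keeps growing: the historical signal is hidden behind multiple hits.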
Long branches (old or fast-evolving) contain many more substitutions than reflected in their length.
If undetected, these multiple substitutions generate a non-phylogenetic signal.
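One classical way to estimate the substitutions hidden in a long branch is a distance correction. The Python sketch below uses the Jukes-Cantor correction extended to 20 equally frequent amino acids, d = -(19/20) ln(1 - 20p/19), where p is the observed proportion of differing sites; it is a textbook model chosen for illustration, not the model used in the studies cited here, and the function name is hypothetical. For p = 0.1 the correction is small (about 0.11 substitutions per site), whereas p = 0.8 corresponds to about 1.75 substitutions per site.

```python
import math


def jc_distance(p, n_states=20):
    """Jukes-Cantor-type correction for n_states equally frequent states:
    d = -b * ln(1 - p / b), with b = (n_states - 1) / n_states.

    p is the observed proportion of differing sites; d is the estimated number
    of substitutions per site, multiple substitutions included.
    """
    b = (n_states - 1) / n_states  # expected p at saturation (0.95 for proteins)
    if not 0.0 <= p < b:
        raise ValueError(f"p must lie in [0, {b}); the sequences look saturated")
    return -b * math.log(1.0 - p / b)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.8, 0.9):
        print(f"observed p = {p:.1f} -> estimated d = {jc_distance(p):.2f} substitutions/site")
```

Raising an error at p >= 0.95 reflects saturation: beyond that point the observed differences no longer tell how many substitutions occurred.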
How to reduce this non-phylogenetic signal?
We should improve species sampling!
We should use slow-evolving species only!
We need better models!
When considering only 4 species, these 7 multiple substitutions suggest an incorrect phylogeny.
Breaking the long branches with 35 species helps detect all 25 substitutions.
The common ancestor of all animals is called the Urmetazoan. What did it look like?
It was already very complex! All lineages except bilaterians have thus been evolutionarily simplified.
It featured several complex characters (e.g., neurons)! Or these have evolved convergently...
It might have been quite simple. Hey! Which one of these 3 strongly supported trees is correct?
RP3 gene in Schierwater et al. (2009)
CDC gene in Schierwater et al. (2009)
Only orthologous genes can be used to reconstruct a species tree!
Purging the dataset of Schierwater et al. (2009) of its errors yields a very different tree.
For nodes with a scarce phylogenetic signal, even small amounts of non-phylogenetic signal may dominate and yield an incorrect tree.
Ironically, these nodes are those for which phylogenomics would be the most useful!
What are the causes of the non-phylogenetic signal in the studies of Dunn et al. (2008) and Schierwater et al. (2009)?
Amemiya CT, Alfoldi J et al. (2013)
The African coelacanth genome provides insights into tetrapod evolution.
Nature 496, 311–316.
Roure B, Baurain D, Philippe H (2013)
Impact of missing data on phylogenies inferred from empirical phylogenomic data sets.
Mol Biol Evol 30, 197–214.
Philippe H, Brinkmann H, Lavrov DV, Littlewood DT, Manuel M, Wörheide G, Baurain D (2011)
Resolving difficult phylogenetic questions: why more sequences are not enough.
PLoS Biol 9, e1000602.
Baurain D, Philippe H (2010)
Current approaches to phylogenomic reconstruction.
In Evolutionary Genomics and Systems Biology, ed Caetano-Anolles G (Wiley-Blackwell, Hoboken, N.J.), pp 17–41.
Baurain D, Brinkmann H, Philippe H (2007)
Lack of resolution in the animal phylogeny: closely spaced cladogeneses or undetected systematic errors?
Mol Biol Evol 24, 6–9.
All papers are available on http://orbi.ulg.ac.be/
This is trivial: we simply minimize the number of multiple substitutions.
With challenging phylogenetic problems, the difficulty is to make the needle bigger without enlarging the haystack!
Working hypothesis: the lack of resolution stems from an excess of non-phylogenetic signal.
Failure to reduce systematic error with the dataset of Philippe et al. (2009) also yields artifactual trees.
reduced taxon sampling
Due to a more stringent selection of sites, the dataset of Philippe et al. (2009) is the least saturated.
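Selecting slow-evolving sites can be sketched as follows. How the sites were selected for the dataset of Philippe et al. (2009) is not detailed here, so this Python toy uses an assumed, crude proxy for the evolutionary rate of each column: the number of distinct residues it contains (gaps '-' and missing data '?' ignored). The function names and the toy alignment are hypothetical; more elaborate approaches exist.

```python
def site_variability(alignment):
    """Number of distinct residues in each column, ignoring '-' and '?'.
    A crude, assumed proxy for the evolutionary rate of a site."""
    seqs = list(alignment.values())
    return [len({s[i] for s in seqs} - {"-", "?"}) for i in range(len(seqs[0]))]


def remove_fastest_sites(alignment, fraction):
    """Drop the given fraction of columns with the highest variability."""
    var = site_variability(alignment)
    n_drop = int(len(var) * fraction)
    # Most variable columns first; the sort is stable, so ties keep their order.
    ranked = sorted(range(len(var)), key=lambda i: var[i], reverse=True)
    dropped = set(ranked[:n_drop])
    keep = [i for i in range(len(var)) if i not in dropped]
    return {sp: "".join(seq[i] for i in keep) for sp, seq in alignment.items()}


if __name__ == "__main__":
    # Invented toy alignment of 5 columns.
    aln = {"a": "MKVLA", "b": "MRILA", "c": "MQTLG"}
    print(site_variability(aln))
    print(remove_fastest_sites(aln, 0.4))
```

Removing the most variable columns trades information for a lower level of saturation, in line with the idea of minimizing multiple substitutions.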
Completing the dataset of Dunn et al. (2008) with new sequences also yields a very different tree.
Actually, it is enough to complete the 4 close outgroup sequences (choanoflagellates) to change the tree.
Missing data exacerbate the systematic error by reducing the number of species effectively available.
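The number of species effectively available at each position can be counted directly from a supermatrix. In this Python sketch, a hypothetical helper treats both '?' and '-' as missing (an assumption; gap handling varies between programs), and the toy matrix is invented: the lungfish lacks the second gene.

```python
def species_per_site(matrix, missing="?-"):
    """For each column, count the species with an actual residue.
    Both '?' (missing data) and '-' (alignment gap) count as missing here."""
    seqs = list(matrix.values())
    return [sum(s[i] not in missing for s in seqs) for i in range(len(seqs[0]))]


if __name__ == "__main__":
    # Invented toy supermatrix: the lungfish lacks the second gene.
    matrix = {"human": "MKVALLG", "coelacanth": "MKIALIG", "lungfish": "MRV????"}
    counts = species_per_site(matrix)
    print(counts)
    print(f"mean species per site: {sum(counts) / len(counts):.2f} of {len(matrix)}")
```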
Let's look at 3 more sources
Starting grant SFRD-12/04 (FSR)