Genomic Time Machines

Scientific presentation about phylogenomics applied to the evolution of animals (delivered on April 18th, 2013 at the Bioforum – Liege, Belgium).
by

Denis BAURAIN

on 3 November 2016

Transcript of Genomic Time Machines

[Timeline figure, axis in MYA from 650 to 200: Ediacaran (635–542), Cambrian (542–488), Devonian (416–359)]
Dickinsonia (4 cm)
What did the first animal look like?
Can we resolve the Cambrian explosion?
What was the fish that first walked on land?
Anomalocaris (60–80 cm)
Acanthostega (50–55 cm)
Computers as Genomic Time Machines
for Meeting our Ancestors
Prof. Denis Baurain — ULg
Bioforum — April 18th, 2013
phylogenetic vs.
non-phylogenetic signal

more sequences
are not enough

phylogenomics can resolve the evolution of animals
phylogenetic
analysis
collection
of proteomes
dataset assembly
all-vs-all
comparison
13 tetrapods (Ensembl)
2 ray-finned fishes (Ensembl)
2 lobe-finned fishes
coelacanth (Broad Institute)
lungfish (RNA-seq contigs)
319,459 proteins
after filtration
51,026,866,611
pairwise similarities
100x faster than BLAST
using USEARCH
overnight on a
desktop workstation
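
The count of 51,026,866,611 pairwise similarities is what an all-vs-all comparison of the 319,459 filtered proteins gives when each unordered pair is counted once, that is n(n-1)/2. The short check below reproduces that arithmetic; the function name is only illustrative.

```python
def unordered_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n items: n(n-1)/2."""
    return n * (n - 1) // 2


n_proteins = 319_459  # proteins left after filtration (from the slide)
n_pairs = unordered_pairs(n_proteins)
print(f"{n_pairs:,}")  # 51,026,866,611
```

The quadratic growth of this number is why a fast search tool matters at this step.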
clustering
7,764 groups
OrthoMCL
(1 hour)
with both the lungfish
and the coelacanth
addition of
transcriptomes
1 tetrapod
1 ray-finned fish
3 cartilaginous fishes
from RNA-seq contigs
using HaMStR (1 week)
373 single-copy
alignments
22 species x 251 genes
(100,583 aligned AA)
Only orthologous genes can be used to reconstruct a species tree!
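
One common practical proxy for orthology is to keep only clusters in which no species contributes more than one sequence. The sketch below illustrates that filter on invented toy clusters. The data layout, names and the `min_species` threshold are assumptions for illustration, not the output format of OrthoMCL or HaMStR, and a single-copy cluster can still hide paralogy or contamination.

```python
from collections import Counter


def is_single_copy(members, min_species=1):
    """True if every species in the cluster contributes exactly one sequence.

    members: list of (species, sequence_id) pairs (hypothetical layout).
    min_species: minimum number of distinct species required.
    """
    counts = Counter(species for species, _ in members)
    return len(counts) >= min_species and all(c == 1 for c in counts.values())


# Toy clusters, invented for illustration only.
clusters = {
    "group1": [("coelacanth", "c1"), ("lungfish", "l1"), ("human", "h1")],
    "group2": [("coelacanth", "c2"), ("human", "h2"), ("human", "h3")],  # two human copies
    "group3": [("lungfish", "l2"), ("human", "h4")],
}

single_copy = [name for name, members in clusters.items()
               if is_single_copy(members, min_species=2)]
print(single_copy)  # ['group1', 'group3']
```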
supermatrix made with SCaFoS (1 hour)
Phylogenomic supermatrices are often full of holes due to missing characters.
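
To make the "holes" concrete, here is a minimal sketch of how a supermatrix can be assembled by concatenating per-gene alignments and padding absent species with a missing-data symbol. It is not the SCaFoS implementation: the function names, the `?` padding character, the choice to count gaps and `X` as missing, and the toy alignments are all assumptions.

```python
def build_supermatrix(alignments, species, missing="?"):
    """Concatenate per-gene alignments; pad species absent from a gene."""
    rows = {sp: [] for sp in species}
    for gene in alignments:
        length = len(next(iter(gene.values())))  # alignment width of this gene
        for sp in species:
            rows[sp].append(gene.get(sp, missing * length))
    return {sp: "".join(parts) for sp, parts in rows.items()}


def missing_fraction(matrix, missing_chars="?-X"):
    """Share of cells in the supermatrix that carry no residue."""
    total = sum(len(seq) for seq in matrix.values())
    holes = sum(seq.count(ch) for seq in matrix.values() for ch in missing_chars)
    return holes / total


# Toy data, invented for illustration only.
species = ["coelacanth", "lungfish", "human"]
alignments = [
    {"coelacanth": "MKV", "lungfish": "MRV", "human": "MKI"},
    {"coelacanth": "LLSA", "human": "LMSA"},  # lungfish sequence missing
]

matrix = build_supermatrix(alignments, species)
for sp in species:
    print(f"{sp:<12}{matrix[sp]}")
print(f"missing: {missing_fraction(matrix):.1%}")
```

In this toy case the lungfish row ends in four padding characters, so 4 of the 21 cells are missing.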
phylogenomic protocol
using PhyloBayes
(CAT-GTR model)
3 months on a
grid computer
Our genomic
time machine!
Tetrapods are more closely related to the lungfish than to the coelacanth. Easy, eh?
Similarly, phylogenomics is phylogenetics applied to many genes at once.
We now have robust statistical support and phylogenetic resolution!
The Basque language is our outgroup here.
Examining only one or two words exposes us to stochastic error because there is too little information.
Until its rediscovery, the coelacanth was thought to have been extinct since the Late Cretaceous period (70 MYA).
However, the coelacanth is not a living fossil, as this concept is fundamentally incorrect!
The coelacanth is a large marine fish with fleshy fins that resemble the limbs of terrestrial vertebrates.
Latimeria chalumnae
It was named after its discoverer, who was the curator of a small museum in South Africa.
Protopterus annectens
The lungfish has lungs and can breathe air. It is also a lobe-finned fish, but it lives in freshwater.
NGS (Illumina) sequencing was used to sequence the two lobe-finned fishes.
blood DNA + muscle RNA library
assembly with ALLPATHS-LG and Trinity
annotation with the Ensembl pipeline
3 RNA libraries (brain, gonad/kidney, gut/liver)
assembly with Trinity
no annotation
genome size: 2.86 Gbp (2.18 Gbp)
genome size: est. 50–100 Gbp
African coelacanth
West African lungfish
Why do we have such an unstable phylogeny?
Phylogenomics is useful but can suffer from artifacts!

Let's look at their causes!
Ediacaran (635–542 MYA)
Only odd, soft-bodied animals live in the sea.
Cambrian (542–488 MYA)
Many animals now thrive in the sea!
Looking back at the past, it seems like all bilaterian lineages have appeared at once!
When homoplasy is widespread, sequences are said to be saturated. The historical signal becomes very weak.
Multiple substitutions at a given site erase this signal and can even create spurious identities (homoplasy).
The phylogenetic signal lies in the substitutions inherited from the common ancestors of the sequences.
Homoplasy can affect only some sequences, which leads to the long-branch-attraction (LBA) artifact.
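
The loss of signal through multiple substitutions can be illustrated with a toy simulation: two copies of an ancestral protein each receive a known number of random substitutions, and the observed differences between them are then counted. This is a sketch under strong simplifying assumptions (uniform sites, uniform amino-acid replacement, arbitrary sequence length and seed), not a realistic model of protein evolution.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def evolve(seq, n_subs, rng):
    """Apply n_subs substitutions at random sites; each one changes the residue."""
    seq = list(seq)
    for _ in range(n_subs):
        i = rng.randrange(len(seq))
        seq[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[i]])
    return "".join(seq)


def observed_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(42)  # fixed seed, arbitrary
ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(200))

for subs_per_lineage in (10, 100, 1000):
    left = evolve(ancestor, subs_per_lineage, rng)
    right = evolve(ancestor, subs_per_lineage, rng)
    actual = 2 * subs_per_lineage
    seen = observed_differences(left, right)
    print(f"actual substitutions: {actual:5d}   observed differences: {seen:3d}")
```

The observed count can never exceed the sequence length, so once sites have been hit several times the gap between actual and observed substitutions grows: this is saturation.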
Long branches (old or fast-evolving) contain many more substitutions than reflected in their length.

If undetected, these multiple substitutions generate a non-phylogenetic signal that hinders reconstruction (systematic error).
We'd better improve species sampling!
We should use slow-evolving species only!
We need better models!
How to reduce systematic error?
When considering only 4 species, these 7 multiple substitutions suggest an incorrect phylogeny.
Breaking the long branches with 35 species helps detect all 25 substitutions.
The common ancestor of all animals is called the Urmetazoan. What did it look like?
It was already very complex! All lineages are thus evolutionarily simplified except bilaterians.
It featured several complex characters (e.g., neurons)! Or these have evolved convergently...
It might have been quite simple. Hey! Which one of these 3 strongly supported trees is correct?
non-orthology
RP3 gene in Schierwater et al. (2009)
CDC gene in Schierwater et al. (2009)
deep paralogy
contaminations (xenology)
Only orthologous genes can be used to reconstruct a species tree!
taxonomic
misidentification
Purging the dataset of Schierwater et al. (2009) of its errors yields a very different tree.
For nodes with a scarce phylogenetic signal, even small amounts of non-phylogenetic signal may dominate and yield an incorrect tree.

Ironically, these nodes are those for which phylogenomics would be the most useful!
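
A toy voting model can show why adding data does not cure systematic error. Assume each site independently supports the true tree with probability `p_true`, a wrong tree with probability `p_bias`, or is uninformative, and the tree with more supporting sites wins. All names, probabilities and site counts here are hypothetical; this is not how PhyloBayes or any real inference method works.

```python
import random


def wrong_tree_rate(n_sites, p_true, p_bias, n_replicates, rng):
    """Fraction of replicates in which the biased signal outvotes the true one."""
    wrong = 0
    for _ in range(n_replicates):
        true_votes = bias_votes = 0
        for _ in range(n_sites):
            u = rng.random()
            if u < p_true:
                true_votes += 1          # site carries phylogenetic signal
            elif u < p_true + p_bias:
                bias_votes += 1          # site carries non-phylogenetic signal
        if bias_votes > true_votes:
            wrong += 1
    return wrong / n_replicates


rng = random.Random(2013)  # fixed seed, arbitrary
for n_sites in (100, 1000, 20000):
    rate = wrong_tree_rate(n_sites, p_true=0.02, p_bias=0.03,
                           n_replicates=20, rng=rng)
    print(f"{n_sites:6d} sites -> wrong tree in {rate:.0%} of replicates")
```

With few sites the outcome is dominated by chance (stochastic error). With many sites the chance component shrinks, and whichever signal is slightly stronger wins almost every time, so a small excess of non-phylogenetic signal yields a confidently wrong tree.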
What are the causes of the non-phylogenetic signal in the studies of Dunn et al. (2008) and Schierwater et al. (2009)?
Amemiya CT, Alföldi J et al. (2013)
The African coelacanth genome provides insights into tetrapod evolution.
Nature 496, 311–316.

Roure B, Baurain D, Philippe H (2013)
Impact of missing data on phylogenies inferred from empirical phylogenomic data sets.
Mol Biol Evol 30, 197–214.

Philippe H, Brinkmann H, Lavrov DV, Littlewood DT, Manuel M, Wörheide G, Baurain D (2011)
Resolving difficult phylogenetic questions: why more sequences are not enough.
PLoS Biol 9, e1000602.

Baurain D, Philippe H (2010)
Current approaches to phylogenomic reconstruction.
In Evolutionary Genomics and Systems Biology, ed Caetano-Anolles G (Wiley-Blackwell, Hoboken, N.J.), pp 17–41.

Baurain D, Brinkmann H, Philippe H (2007)
Lack of resolution in the animal phylogeny: closely spaced cladogeneses or undetected systematic errors?
Mol Biol Evol 24, 6–9.
All papers are available on http://orbi.ulg.ac.be/
references
acknowledgment
This is trivial: we simply minimize the number of multiple substitutions.
With challenging phylogenetic problems, the difficulty is to make the needle bigger without enlarging the haystack!
Working hypothesis: the lack of resolution stems from an excess of non-phylogenetic signal.
systematic error
WAG model
(vs. CAT)
Failure to reduce systematic error with the dataset of Philippe et al. (2009) also yields artifactual trees.
reduced taxon sampling
Due to a more stringent selection of sites, the dataset of Philippe et al. (2009) is the least saturated.
missing data
Completing the dataset of Dunn et al. (2008) with new sequences also yields a very different tree.
Actually, it is enough to complete the 4 close outgroup sequences (choanoflagellates) to change the tree.
Missing data exacerbate the systematic error by reducing the number of species effectively available.
Let's look at 3 more sources of artifacts!
Hervé Philippe
Henner Brinkmann
Béatrice Roure
Starting grant SFRD-12/04 (FSR)