Plus data from 67 other
Botanical / Mycological
publications (>250 datasets)
e.g. Canadian J Botany (>26)
Int J Plant Sci (>21)
Phytopathology (>17)
Lichenologist (>11)
Sydowia
Bryologist
Mycotaxon
Am J Potato Res
...
Where are the animals?
Burying data in PDFs is very naive
Acknowledgements
What can WE do?
The (current) publishing process
The (better?) publishing process
Bringing Systematics into the Digital Age
Some journals require
mandatory data archiving
Other (bad) examples...
Funding:
Helpful discussants:
Can you 'look' at a dataset and see that it's correct?
2.) Polymorphic/uncertain states are formatted in non-standard ways
(ubiquitous!)
Centralised datasets are easy to find: they're in the same place. (TreeBase)
e.g. Try finding all published arachnid datasets. It requires expert knowledge.
Look at the success of GenBank for raw sequence data
Look at the rise of Digital Taxonomy
Look at the future advances offered by Ontologies
Maureen O'Leary (MorphoBank)
William Piel (TreeBase)
Rutger Vos (TreeBase)
Dennis Stevenson (Ed. Cladistics)
Recommended Reading:
Directly lobby your societies
and/or journal editors
Set a good example by publicly archiving your published data, even if your journal doesn't require it
Spread the word! Data is important!!!
1.) Author runs data file, gets results
2.) Author exports the data in a reformatted, unusable form,
often spread over many pages, into a .doc or .pdf file
It hampers:
- Discoverability
- Transparency
- Repeatability
- Future re-usage
- Education & Outreach
1.) Author runs data file, gets results
2.) Author submits the exact same data file, with results, in a usable format to a data repository or publisher
Smith, V., 2009. Data publication: towards a database of everything.
BMC Research Notes
1.) Not including all the data, referring reader to another paper
(very common in palaeontology) :(
e.g. Norell et al. 2008. A New Platynotan Lizard (Diapsida: Squamata) from the Late Cretaceous Gobi Desert (Ömnögov), Mongolia. American Museum Novitates
http://dx.doi.org/10.1186/1756-0500-2-113
Why!!!
Thanks to:
NEXUS format: {01} or (01)
Hennig (xread) format: [01]
published format: A or $ or z
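The bracket conventions listed above can be normalised mechanically once the data is in plain text. The following is a minimal sketch, assuming each matrix row is a plain string of single-character states plus bracketed groups; the function names `nexus_to_hennig` and `split_cells` are hypothetical and not part of any existing package. Note that NEXUS tools commonly treat the two bracket styles differently (polymorphism vs. uncertainty), and this sketch deliberately collapses that distinction.

```python
import re
from typing import List

# Matches a NEXUS-style polymorphic/uncertain cell such as {01} or (01).
_NEXUS_POLY = re.compile(r"[{(]([^{}()\[\]]+)[})]")


def nexus_to_hennig(row: str) -> str:
    """Rewrite NEXUS-style {01} / (01) cells as Hennig-style [01]."""
    return _NEXUS_POLY.sub(lambda m: "[" + m.group(1) + "]", row)


def split_cells(row: str) -> List[str]:
    """Split a row into cells, keeping each bracketed group as one cell."""
    return re.findall(r"\{[^}]*\}|\([^)]*\)|\[[^\]]*\]|.", row)


if __name__ == "__main__":
    row = "01{01}1(12)?"
    print(nexus_to_hennig(row))
    print(split_cells(row))
```

A single arbitrary symbol standing in for a polymorphism in a typeset matrix cannot be converted this way: the mapping from symbol to states exists only in that paper's legend, so it has to be supplied by hand for every paper.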
My supervisor Matthew A. Wills for supporting this talk and all other members of the
Fossils, Phylogeny and Macroevolution Research Group at Bath.
+ all authors who already archive their data in appropriate centralized databases
Phylogenetic data is extremely valuable
and should be treated as such, with proper
centralised online publicly-accessible databasing
(or any random single character, it's never consistent between papers)
http://bit.ly/f6Dxfo
Want to see this again? :
3.) Unusable data is typeset (not without error)
and published
Why miss out on
extra citations?
Making your data
available increases
your chances of citation
through re-use of your data
Surely, this is a better model,
with less room for error?
http://onlinelibrary.wiley.com/doi/10.1111/j.1558-5646.2010.01182.x/abstract
Wouldn't it be great
if undergrads could re-analyse the latest data,
to learn cladistics?
3.) Typesetting errors: author submits correct data, publisher mangles it
Support / Ideas / Feedback: ross.mounce@gmail.com
Why not ALL journals?
(less frequent but occurs many times every year)
The (Continued) Growth of Phylogenetic Information
Cladistics (Mirande, 2009)
In 1993, Sanderson et al. (Syst. Biol.) tried to get a handle on
just how much new phylogenetic information was
being published each year (for the period 1989-1991).
They found 882 studies, whilst acknowledging that they
didn't sample every journal, notably lacking palaeontological ones.
The rate of publication was increasing rapidly:
~50 more studies each and every year.
In 2010 the situation is far worse. There are easily thousands of novel
studies published each and every year of which only a tiny fraction
are archived in a centralized online electronic database.
I argue here that this needs to change.
Q: How many Cladistic studies are there?
(novel, explicit, phylogenetic systematic analyses)
~1966 - 2010
Systematics requires an accumulated wealth of knowledge
Where is the depot of phylogenetic information?
a critical discourse on data publishing and (lack of) archiving
MorphoBank
TreeBase (II)
38 studies
~2500 studies
Have YOU ever tried to extract phylogenetic data from a paper?
Regrettably tiny. It's not their fault.
Authors just don't submit their work.
:(
Is this everything?
NO!
Best viewed in fullscreen
Fossils, Phylogeny and Macroevolution Research Group
40,000 ?
Data particularly lacking:
Invertebrates (morphological)
Vertebrates (morphological)
Palaeontological
A: Who knows?
40k is my estimate, based upon simple extrapolation from Sanderson et al. (1993) with adjustment for post-1993 literature growth.
Try this one:
Zhu, M., Gai, Z.-K., 2006. Phylogenetic relationships of Galeaspids (Agnatha). Vertebrata PalAsiatica 44 (1), 1-27.
The top 15 journals
contributing data (1635) to TreeBase
01. Syst Botany (443)
02. Mycologia (305)
03. Mol Phy Evo (228)
04. Syst Biol (156)
05. Am J Botany (95)
06. Molecular Ecology (61)
07. Mycoscience (53)
08. Stud in Mycology (46)
09. Taxon (40)
10. Persoonia (39)
11. Plant Syst Evo (38)
12. J Phycology (35)
13. Fungal Diversity (34)
14. Ann Miss Bot Gar (31)
15. Mycological Progress (31)
General Journals
(within which may be botanical or mycological)
http://www.ivpp.cas.cn/cbw/gjzdwxb/xbwzxz/200810/t20081023_2385439.html
Botany / Mycology
journals
(neo)
Vertebrates
Palaeontological
Invertebrates
It's an Open Access journal
it should be easy, right?
BUT
Q: What phylogeny programs read
.pdf .doc or .xls files?
Wrong! It's virtually impossible without re-typing every single cell
TREEBASE is full of trees!
A: NONE! Manual re-formatting required...
and it's NEVER just a matter of copy n paste
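By contrast, a matrix kept as plain text can be read back in a few lines. The sketch below assumes a deliberately simplified layout (one taxon per line, name and state string separated by whitespace), not full NEXUS or Hennig syntax; the taxon names, the state strings and the helper `read_matrix` are all made up for illustration.

```python
def read_matrix(text):
    """Parse 'taxon<whitespace>states' lines into a {taxon: states} dict."""
    matrix = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            taxon, states = parts
            matrix[taxon] = states
    return matrix


# Hypothetical three-taxon, five-character matrix.
example = """
Taxon_A  0101?
Taxon_B  1101-
Taxon_C  00110
"""

matrix = read_matrix(example)

# Basic sanity check: every taxon should be scored for the same number of characters.
row_lengths = {len(states) for states in matrix.values()}

if __name__ == "__main__":
    print(matrix)
    print("consistent row length:", len(row_lengths) == 1)
```

None of this is possible when the same matrix has been typeset across several pages of a .pdf: the cell boundaries and row order have to be rebuilt by hand first.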
Severely under-represented data
What's the problem?
Isn't the data matrix and cladogram published with most papers?
Animalia*
Morphological
&
Paleo-morphological
...in a way, yes.
Usually buried in appendices and supplementary materials
* excluding mammals which are reasonably well covered (pers. comm. Bill Piel)
The absurdity of data publishing:
1.) not providing all the data
e.g. Knoll 2010 Geological Magazine (sauropods)
Ignavusaurus phylogenetic analysis
provides coding data for Ignavusaurus (only)
see Smith & Pol 2007 for the rest of the matrix
find Smith & Pol 2007 -> matrix not printed in full
see Yates, 2007b (Special Papers in Palaeontology)
rather difficult to get hold of (no official electronic copies, paper only!)
Journals in which paleo-phylogenetic data has been published
(excluding the mainstream, and mainstream systematics journals)
None of which archive their data properly
Acta Palaeontologica Polonica
Ameghiniana
American Museum Novitates
Annals of Carnegie Museum
Antarctic Science
Bulletin of the American Museum of Natural History
Bulletin of the Peabody Museum of Natural History
Canadian Journal of Earth Sciences
Comptes Rendus Palevol
Contributions from the Museum of Paleontology, University of Michigan
Contributions in Science (NHM, LA)
Cretaceous Research
Fieldiana: Geology
Fossil Record
Geobios
Geodiversitas
Geological Magazine
Historical Biology
Journal of Human Evolution
Journal of Mammalian Evolution
Journal of Paleontology
Journal of Systematic Palaeontology
Journal of Vertebrate Paleontology
Naturwissenschaften
Neues Jahrbuch für Geologie und Paläontologie
Occasional Papers of the Natural History Museum University of Kansas
Palaeontology
Paläontologische Zeitschrift
Paleontological Research
Palaeontologia Electronica
Records of the Australian Museum
Revista Brasileira de Paleontologia
Revista del Museo Argentino de Ciencias Naturales
Senckenbergiana Lethaea
Vertebrata PalAsiatica