Nullius in Calculo

A talk given at the 8th Systematics Association biennial meeting (Belfast), 8th July 2011. Talk abstract can be found here (in the timetable pdf): http://www.systass.org/biennial2011/

Ross Mounce

on 8 July 2011

Transcript of Nullius in Calculo

A talk by Ross Mounce (@rmounce) for SystAss 2011, Belfast:
On the Explicitness and Reproducibility of Cladistic Analyses

Problem #1
Studies published without adequate supporting data

Wiegmann, B. M. et al. (2011) Episodic radiations in the fly tree of life. PNAS 108, 5690-5695
Where is the data?
GenBank accession numbers
Morphological data matrix

DISCLAIMER NOTICE! The following examples are just for illustration
to evidence my points

I mean no personal criticism

There are indeed many such exemplar papers
I could have used for this demonstration of
explicitness and reproducibility

(still the case even >3 months after publication,
and after notifying the editors and authors)

Problem #1
Studies published without adequate supporting data

Morphological data matrix (in its entirety)
take a look yourself: http://www.pnas.org/content/early/2011/03/10/1012675108.abstract

Kammerer, C. F. (2011) Systematics of the Anteosauria (Therapsida: Dinocephalia). Journal of Systematic Palaeontology 9, 261-304
*swiftly corrected by Paul D. Taylor (ed.) upon notification: the author supplied the matrix, but it somehow didn't get added to the online supplementary materials

Problem #2
"There's a glitch in the matrix"

Happened again with:
Mayr, G. (2011) Journal of Systematic Palaeontology, p159-171
Williamson, T. E. & Weil, A. (2011) A new Puercan (early Paleocene) hyopsodontid "condylarth" from New Mexico. Acta Palaeontologica Polonica 56, 247-255
Desojo, J. B., Ezcurra, M. D. & Schultz, C. L. (2011) An unusual new archosauriform from the Middle–Late Triassic of southern Brazil and the monophyly of Doswelliidae. Zoological Journal of the Linnean Society 161, 839-871
Agnolin, F. L. & Novas, F. E. (2011) Unenlagiid theropods: are they members of the Dromaeosauridae (Theropoda, Maniraptora)? Anais da Academia Brasileira de Ciências 83
Palci, A. & Caldwell, M. W. (2010) Erratum: Redescription of Acteosaurus tommasinii von Meyer, 1860, and a discussion of evolutionary trends within the clade Ophidiomorpha. Journal of Vertebrate Paleontology 30, 990-991

Missing characters, editor notified
Various problems (Pittman & Mounce, in prep)
printed phylogeny does not match matrix, contacted author
printed matrix does not match analysed matrix
Error published in issue 1 -> erratum printed in issue 3

The more I look, the more I find...

Makovicky et al + Li et al 2010
used the same matrix (but in 2 separate papers),
so technically, the errors escaped peer review... twice!

link to this presentation: http://bit.ly/nullius

"Take nobody's word for it"

Repeated problems of Systematics publishing

1.) Articles without adequate supporting data

2.) Articles with data containing significant 'glitches'
(rendering data unusable, results unfalsifiable)

3.) Articles with significant errors of analysis

Matrix + Description of Matrix (characters)
Explicit Method of Analysis
Phylogeny

I like these journals / authors / articles
...but why are there errors everywhere?

What's the point of printing
a data matrix if it is:

a) incorrect
e.g. not the matrix analysed

b) unusable
e.g. an image of a table printed sideways in a pdf

c) ambiguous
e.g. unexplained polymorphisms - what is '$'?

An armoured Cambrian lobopodian from China with arthropod-like appendages
http://dx.doi.org/10.1038/nature09704
24 Feb 2011
2 sec TNT re-analysis
submitted 3rd March
accepted 24th May
out ?? July

Reviewing the data and analysis

a) there are problems everywhere,
in every journal, even the 'best'

b) these problems happen with
good frequency, as I have shown

c) it takes SECONDS to check
any morphological MP analysis (with TNT)

Thus, I propose: 'On the calculations of no-one...'
aka mandatorily checking data as part of peer review

Phylogenetics is a data-driven science;
garbage in, garbage out

we should check and archive the data we produce
For more on why we should archive see http://bit.ly/phylodata

(Lack of) Explicitness
how are gaps treated?
branch collapsing settings?
uninformative characters excluded?
Is it possible to EXACTLY replicate most analyses?

I'm certainly not the first
to question data quality...

"As soon as we got into Doug Begle’s matrix, we got a sinking feeling, because there seemed to be an awful lot of mistakes, some of them very obvious or well known characters."
-- Colin Patterson; Systematics Association lecture, 1995
from 'Adventures in the Fish Trade', Zootaxa 2011

...therefore why not explicitly review data?
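As a rough illustration of what an explicit data review might start with, here is a minimal Python sketch. It assumes a simplified matrix held as a taxon-to-row mapping with single-character states and bracketed polymorphisms; the function names, the allowed symbol sets and the example taxa are all hypothetical, and this is not TNT or any journal's actual procedure.

```python
# Hypothetical symbol sets; a real matrix should declare its own.
DECLARED_STATES = set("0123456789")
MISSING_OR_GAP = set("?-")
CLOSERS = {"(": ")", "[": "]", "{": "}"}


def split_cells(row: str) -> list[str]:
    """Split a row into cells, keeping a bracketed polymorphism as one cell."""
    cells, i = [], 0
    while i < len(row):
        ch = row[i]
        if ch in CLOSERS:
            end = row.find(CLOSERS[ch], i + 1)
            if end == -1:
                raise ValueError(f"unclosed {ch!r} at position {i + 1}")
            cells.append(row[i:end + 1])
            i = end + 1
        else:
            cells.append(ch)
            i += 1
    return cells


def review_matrix(matrix: dict[str, str], n_chars: int) -> list[str]:
    """Return a list of human-readable problems found in the matrix."""
    problems = []
    allowed = DECLARED_STATES | MISSING_OR_GAP
    for taxon, row in matrix.items():
        try:
            cells = split_cells(row)
        except ValueError as err:
            problems.append(f"{taxon}: {err}")
            continue
        # Check 1: every taxon should be scored for the stated number of characters.
        if len(cells) != n_chars:
            problems.append(f"{taxon}: {len(cells)} characters, expected {n_chars}")
        # Check 2: every symbol should be one the authors have explained.
        for pos, cell in enumerate(cells, start=1):
            unknown = sorted(set(cell.strip("()[]{}")) - allowed)
            if unknown:
                problems.append(
                    f"{taxon}: undeclared symbol(s) {unknown} at character {pos}"
                )
    return problems


# Made-up example rows, not data from any of the papers mentioned.
example = {
    "Taxon_A": "0101?",
    "Taxon_B": "01(01)1-",
    "Taxon_C": "01$1",
}
for problem in review_matrix(example, n_chars=5):
    print(problem)
```

On the made-up rows above, only Taxon_C is flagged: it is one character short and contains an unexplained '$'. Checks like these would not catch a wrong scoring, but they might catch the missing characters and ambiguous symbols described earlier.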

with thanks to D. M. Williams for sharing the 'Adventures in the Fish Trade' manuscript

"...it seems that no one - no colleague, no examiner, no referee, no editor thought it necessary to glance at the data to see if they made sense. So I feel that what’s at fault here is not just one individual, but the whole system"
-- Colin Patterson; Systematics Association lecture, 1995

I couldn't have said it any better.

Whilst Colin was perhaps talking about just one instance,
I think his quote can be applied more generally.

Acknowledgements
many thanks to...
my supervisor Matthew Wills
my lab: The Fossils, Phylogeny & Macroevolution Group
@ University of Bath (Anne, Martin & Sylvain)

for comments and suggestions arising from my draft abstract:
David M. Williams and J. Salvador Arias

the Systematics Association for organising this conference
and awarding me a student bursary

...and thank you for watching

~300 papers using ILD test
all published 2009-2010

treatment of gaps -> 5th, ?, or MCIC...
uninformative characters excluded? (for the ILD test, they should be!)
branch collapsing settings
1% 50% 13% explicit?

Exact settings matter! yet we often don't report them all.

Why? This hinders re-use / progress

http://www.mapress.com/zootaxa/2011/f/z02946p136f.pdf
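One possible way to make the settings explicit is to archive them as a small machine-readable record alongside the matrix. The Python sketch below is only an illustration: the class name, field names and example values are hypothetical (the fields simply mirror the questions asked in this talk), and no journal or program is known to require this format.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class AnalysisSettings:
    """Hypothetical record of analysis settings; None means 'not reported'."""
    software_and_version: Optional[str] = None
    gap_treatment: Optional[str] = None  # e.g. "5th", "?", or "MCIC"
    branch_collapsing_rule: Optional[str] = None
    uninformative_characters_excluded: Optional[bool] = None

    def unreported(self) -> list[str]:
        """Names of the settings that were left unstated."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Made-up example: two settings stated, two left out.
settings = AnalysisSettings(
    gap_treatment="5th",
    uninformative_characters_excluded=True,
)
print(json.dumps(asdict(settings), indent=2))
print("Not reported:", settings.unreported())
```

A record like this could be deposited with the data, so that a reader attempting an exact replication can see at a glance which settings were stated and which were not.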