BOAI-compliant Open Access & content mining
'content' includes text, numbers, tables, images, video, audio, bibliographic data & metadata (thus we can mine, and republish them)

The 'tree of life'
Cutting-edge content mining: applications
More than 100,000 [...] of phylogeny have [...]; most cover very small parts of the tree of life.
The best collection of tree data so far has data from fewer than 3,000 publications (it relies on authors to deposit their data).
Moreover, even in 2010 the rate of data deposition was only ~4%.
Stoltzfus A, O'Meara B, Whitacre J, Mounce R, Gillespie E, Kumar S, Rosauer D, Vos R
Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis. BMC Research Notes (accepted)

Extracting data from figures
Download 1000s of papers and "feed" them to mining scripts. The scripts interpret the text and identify useful figures that contain data.
Mining *text* is relatively easy; getting data from figures is harder but doable. It helps if the figures are vector, NOT raster, graphics. More on this next...

Vector graphics should be mandatory - NO rasters
It can be relatively simple to re-extract information from vector graphics. But it seems that the majority(?) of digitally published diagrams examined so far are rasters.
http://en.wikipedia.org/wiki/File:Agapornis_phylogeny.svg

We have the technology & capacity to do this
http://www.guardian.co.uk/science/2012/may/23/text-mining-research-tool-forbidden
...but it seems like we might get into legal troubles if we apply this to some subscription access content.
Peter Murray-Rust once got Cambridge access cut off after attempting to mine some literature.

What are subscribers allowed to do with content?
(not much it seems)
Only CC BY literature is 'safe' to mine
"Subscriber shall not use spider or web-crawling or other software programs, routines, robots or other mechanized devices to continuously and automatically search and index any content accessed online under this Agreement"
From an Elsevier subscription agreement (2011)
http://blogs.ch.cam.ac.uk/pmr/2011/11/25/the-scandal-of-publisher-forbidden-textmining-the-vision-denied/
and the excuses given are fanciful, e.g. "...platforms would collapse under the technological weight of crawler-bots... [like a] denial-of-service attack"
Richard Mollet, Publishers Association
http://www.publishers.org.uk/index.php?option=com_content&view=article&id=1929:content-mining-free-for-all-would-be-bad-for-al&catid=499:general&Itemid=1608

Thus science needs Open Access, not just 'free access'
Many 'open access' journals are not explicitly licensed to allow re-use.

I'm working with Peter Murray-Rust to extract open data from research literature.
*Do* read his excellent blog: http://blogs.ch.cam.ac.uk/pmr/

Conclusions
With content mining we can salvage otherwise 'lost' data - this is immensely valuable.
We can synthesise data from millions of papers to better harness ALL previous research.
Without doubt content mining will be increasingly applied in research across all domains of academia (it's not just of use in biomedical research!).
Independent reviews such as the Hargreaves Report recommend that the potential benefits of mining are so great that exceptions should be made to copyright law specifically to allow mining.
Explicitly 'mining-friendly' licenses such as CC BY must be used to publish all future research - so one must be careful to define Open Access (BOAI/BBB).

The Budapest Open Access Initiative (BOAI)
By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.
http://www.soros.org/openaccess/read
Feb 14th, 2002

@RMounce
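
Appendix: a rough sketch of the figure-mining idea
The workflow from the "Extracting data from figures" slides (scripts read the text, pick out figures likely to contain data, then pull information out of vector graphics) could look something like the Python below. This is only an illustration under stated assumptions, not the scripts actually used in this work: the function names, the keyword list, the caption pattern and the tiny hand-written SVG are all hypothetical, and it uses nothing beyond the Python standard library.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical, hand-written SVG of a two-tip tree (not taken from any paper).
EXAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">
  <line x1="10" y1="50" x2="60" y2="50"/>
  <line x1="60" y1="20" x2="60" y2="80"/>
  <line x1="60" y1="20" x2="120" y2="20"/>
  <line x1="60" y1="80" x2="120" y2="80"/>
  <text x="125" y="24">Taxon A</text>
  <text x="125" y="84">Taxon B</text>
</svg>"""

# Assumed caption layout: "Figure 2. ..." or "Fig. 2: ..." at the start of a line.
CAPTION_RE = re.compile(
    r"^(Fig(?:ure|\.)?[ \t]*\d+)[.:]?[ \t]*(.*)$",
    re.IGNORECASE | re.MULTILINE,
)
# Crude, assumed keyword list for spotting tree figures.
KEYWORDS = ("phylogen", "cladogram")


def find_tree_captions(text):
    """Return (label, caption) pairs whose caption mentions a keyword."""
    hits = []
    for label, caption in CAPTION_RE.findall(text):
        if any(keyword in caption.lower() for keyword in KEYWORDS):
            hits.append((label, caption))
    return hits


def extract_svg_parts(svg_source):
    """Return text labels and straight line segments found in an SVG string.

    labels:   list of (text, x, y)
    segments: list of (x1, y1, x2, y2)
    """
    root = ET.fromstring(svg_source)
    labels = [
        ("".join(el.itertext()).strip(),
         float(el.get("x", 0)),
         float(el.get("y", 0)))
        for el in root.iter(SVG_NS + "text")
    ]
    segments = [
        tuple(float(el.get(key, 0)) for key in ("x1", "y1", "x2", "y2"))
        for el in root.iter(SVG_NS + "line")
    ]
    return labels, segments


if __name__ == "__main__":
    paper_text = (
        "Figure 1. Map of sampling sites.\n"
        "Figure 2. Phylogeny of the sampled taxa.\n"
    )
    print(find_tree_captions(paper_text))
    labels, segments = extract_svg_parts(EXAMPLE_SVG)
    print(labels)
    print(segments)
```

Because the tip labels and branch coordinates are stored as plain elements in the vector file, they can be read directly; a raster image would need image analysis and character recognition to recover the same information. Real published figures are messier than this toy: branches are often drawn as path elements, coordinates may be transformed, and labels may be split across several text elements, so the sketch would need extending before use on real papers.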