Keri Thompson

on 2 August 2011


Transcript of Macaw

Developing Metadata Collection and Workflow Software for In-House Mass Digitization of Texts

84,000 items / 31 million pages scanned in 2 years
11,560 items / 4.5 million pages scanned by SIL
46TB of files generated so far (hi-res images are JPG2000)
Descriptive metadata supplied by partner libraries
Scanning is (mostly) done by the Internet Archive
Digitized books are available at archive.org/details/biodiversity
BHL has its own interface hosted at Missouri Botanical Garden

"Boutique" Scanning Operation
one hi-res scanning back camera on a copy stand
one person enters title level metadata and administrative workflow information *manually* into an Access database
one person manually creates page level descriptive and structure metadata in Excel
This works fine for a small book or a few pages of a larger book
But what if you want to scan 800-1200 pages a day?

If only it were that easy.

New system and workflow requirements
metadata you want to capture
metadata you have time to capture
it's not a magic unicorn it's....

Ornithology, Pl. 20
Plate captions
1. Aprosmictus splendens. Peale;
2. Aprosmictus personatus. (G.R. Gray.);
Artist: T.R. [Titian Ramsay] Peale
Lithograph Artist: T. House
Parrot Fiji Birds
United States South Seas Exploring Expedition

Minimum amount of metadata needed to scan and deliver book-like items

Keri Thompson
Smithsonian Institution Libraries

our mass-scanning (Internet Archive) equipment can't accommodate folios or books that are wider than they are tall
we want to store/manage our own scans for peace of mind and so we can easily repurpose data and images
we need to ramp up our in-house digitizing capability so we can digitize and deliver books for other disciplines

???

Macaw workflow

Macaw
Metadata Collection and Workflow System

Thank You
an English transcript of this presentation can be provided upon request.

Presenter:
Keri Thompson
Smithsonian Institution Libraries
thompsonk@si.edu

not tied to one specific repository or delivery system
uses common, open source tools (PHP, SQL) and communication standards (Z39.50, HTTP)
modular and extensible
designed to work for books, but can accommodate other object models
works with SIRIS but can use other metadata sources

capture and re-package metadata and images for use in other systems
enable creation of METS
embed metadata in images following SI common model
capture additional page level (image) metadata for internal uses
bonus features:

title level descriptive data

The Problem(s)

consortium of 12 natural history museum libraries, botanical libraries, and research institutions organized to digitize, serve, and preserve the legacy literature of biodiversity.

meet Internet Archive and BHL image and metadata needs
automate capture of as much metadata as possible
make it easy to input that which can't be captured automatically
automate file transfer and manipulation
minimum requirements:

Macaw developer:
Joel Richard
Smithsonian Institution Libraries
richardjm@si.edu

item or piece level descriptive data
page level (image level) descriptive and structural data
scanning workflow - simplified
provenance or copyright information
volume/issue information
special marks, bookplates
the Title
publication information
page number (as it appears on the page)
marginalia
page sequence (implied order in which this page image appears in the original book, including blank pages)
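
The distinction between page sequence and page number, and the goal of enabling METS creation mentioned earlier, can be illustrated with a small sketch. This is not Macaw's actual code or data model: the `PageRecord` class, the `build_struct_map` function, the file names, and the page type values are all hypothetical, chosen only to show how page-level records might be turned into a METS physical structMap fragment. It assumes the standard METS `ORDER` attribute carries the page sequence and `ORDERLABEL` carries the number as printed on the page.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

METS_NS = "http://www.loc.gov/METS/"


@dataclass
class PageRecord:
    """Hypothetical page-level record; field names are illustrative only."""
    sequence: int                      # implied order of the image in the book, blanks included
    filename: str                      # hypothetical image file name
    page_number: Optional[str] = None  # number as it appears on the page, if any
    page_type: str = "Text"            # hypothetical vocabulary (Title, Blank, Plate, ...)


def build_struct_map(pages: List[PageRecord]) -> ET.Element:
    """Build a METS physical structMap fragment from page records."""
    ET.register_namespace("mets", METS_NS)
    struct_map = ET.Element(f"{{{METS_NS}}}structMap", {"TYPE": "physical"})
    book = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "book"})
    # Pages are emitted in scan sequence, not in printed page-number order.
    for page in sorted(pages, key=lambda p: p.sequence):
        attrs = {"TYPE": "page", "ORDER": str(page.sequence)}
        if page.page_number is not None:
            # Blank or unnumbered pages simply get no ORDERLABEL.
            attrs["ORDERLABEL"] = page.page_number
        div = ET.SubElement(book, f"{{{METS_NS}}}div", attrs)
        # FILEID would need a matching file entry in a fileSec elsewhere.
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": f"FILE_{page.sequence:04d}"})
    return struct_map


example_pages = [
    PageRecord(2, "book_0002.jp2", None, "Blank"),
    PageRecord(1, "book_0001.jp2", "i", "Title"),
    PageRecord(3, "book_0003.jp2", "1"),
]
print(ET.tostring(build_struct_map(example_pages), encoding="unicode"))
```

The output is only a fragment: a complete METS document would also need a root element, a file section that the `FILEID` values point to, and the title level descriptive metadata.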