Digitizing Text (Digital Jumpstart Workshop, March 3, 2011)

digital jumpstart workshop presentation

Brian Rosenblum

on 9 March 2011


Transcript of Digitizing Text (Digital Jumpstart Workshop, March 3, 2011)

Topics
  • Why Digitize? + Digital Formats for Text: Page Images, Digital Text
  • How to digitize text: Scanning or photographing, OCR, Keyboarding, Markup
  • OCR Software
  • Some Examples

Why Digitize?

Access
  • access for remote users
  • increased visibility for your content
  • support open access to resources for scholars and the general public

Preservation
  • preserve fragile material

Research
  • Searching: find things quickly
  • Create large collections of text documents for searching
  • Open up new types of research and content

Time, effort and money: more than just digitization
  • cataloging
  • metadata creation
  • administrative & planning costs
  • quality control
  • ongoing preservation and maintenance of files & software

Digital Formats for Text

Page Images
  • preserves visual representation of the physical page
  • Good for representing handwritten manuscripts or documents with marginalia or complex layouts
  • Can be larger file sizes
  • Cannot be searched and navigated in the same way as digital text

Digital Text
  • searchable, copy & pasteable
  • simple and small file format
  • can combine into larger collections of documents for searching and analysis
  • hard to represent visual and non-textual elements
  • expensive to get 100% accuracy

Structured digital text

A few words here about unstructured vs structured text, and text mark-up, especially:
  • HTML (HyperText Markup Language)
  • XML (eXtensible Markup Language)
  • TEI (Text Encoding Initiative)
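
The difference between unstructured and structured text can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree module on an invented one-sentence sample; the sentence, the name and the place are made up for illustration. The element names (date, persName, placeName) follow TEI conventions, but the fragment is not a complete TEI document, which would also need a header and a namespace.

```python
import xml.etree.ElementTree as ET

# Unstructured: just characters; a program cannot tell names from dates.
plain = "On 3 March 2011 Jane Doe gave a workshop in Lawrence."

# Structured: the same sentence with TEI-style inline markup.
# Invented sample, not taken from any real edition.
marked_up = (
    '<p>On <date when="2011-03-03">3 March 2011</date> '
    '<persName>Jane Doe</persName> gave a workshop in '
    '<placeName>Lawrence</placeName>.</p>'
)

root = ET.fromstring(marked_up)

# Because the text is marked up, specific kinds of content can be pulled out.
people = [el.text for el in root.iter("persName")]
dates = [el.get("when") for el in root.iter("date")]
places = [el.text for el in root.iter("placeName")]

# Stripping the tags gives back the unstructured sentence.
recovered = "".join(root.itertext())

print("people:", people)
print("dates:", dates)
print("places:", places)
print("same words as plain text:", recovered == plain)
```

The same kind of query over many marked-up documents is what makes structured text useful for searching and analysis across a collection.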

How to digitize text

Scanning or photographing

OCR
  • often the fastest and cheapest option
  • not 100% accurate
  • works well on clean pages with contemporary fonts
  • does not work as well on non-Latin characters, complex page layouts, math and formulas, text formatting such as italics
  • does not capture much structure

OCR Software
  • (is a very simple, one page at a time, OCR program for Windows only)
  • Adobe Acrobat: (Has a built in OCR function)
  • Abbyy FineReader: (Professional OCR program. A free trial download is available.)

Keyboarding
  • when you need 100% accuracy, retyping the page from scratch is often the best way
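
Since OCR is not 100% accurate, it can help to see roughly how far an OCR result is from a carefully keyed transcription of the same page. The sketch below is one simple way to do that with Python's standard difflib module. The two sample strings are invented, the function names are hypothetical, and the score is difflib's general similarity ratio, not a standard OCR error-rate metric.

```python
from difflib import SequenceMatcher


def similarity(ocr_text: str, keyed_text: str) -> float:
    """Return a score from 0 to 1; 1.0 means the two texts are identical."""
    return SequenceMatcher(None, ocr_text, keyed_text, autojunk=False).ratio()


def differences(ocr_text: str, keyed_text: str):
    """List (ocr_fragment, keyed_fragment) pairs where the texts disagree."""
    matcher = SequenceMatcher(None, ocr_text, keyed_text, autojunk=False)
    return [
        (ocr_text[i1:i2], keyed_text[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented sample: the keyed text stands in for a retyped, checked page.
keyed = "The quick brown fox jumps over the lazy dog."
ocr = "The qu1ck brown fox jurnps over the lazy dog."

print(f"similarity: {similarity(ocr, keyed):.3f}")
for ocr_part, keyed_part in differences(ocr, keyed):
    print(f"OCR read {ocr_part!r} where the page has {keyed_part!r}")
```

Keying a small sample of pages and comparing them this way gives a rough sense of whether OCR alone is good enough for a given set of documents.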

Markup

Tools

TEI by Example
  • http://tbe.kantl.be/TBE/TBE.htm

Text Editors:
  • (for Mac) TextWrangler or BBEdit or TextMate
  • (for Windows) Notepad++

XML Editor
  • oXygen (use either XML Author or XML Editor, free trial versions available)
  • http://oxygenxml.com/xml_author.html

Some Examples

The Diary of Henry Machyn (1550 - 1563)
  • Google Books, British History Online: http://goo.gl/85gEw http://goo.gl/bSYpz
  • University of Michigan Critical Edition: http://porter.umdl.umich.edu/m/machyn/
  • 3 versions: http://tbe.kantl.be/TBE/examples/TBED00v00.htm

Example: Crowdsourcing
  • Transcribe Bentham: http://goo.gl/ku08s

Example: Diderot's Encyclopedia - collaborative translation project
  • http://quod.lib.umich.edu/d/did/