Introduction to TEI

Presented at the Utrecht Workshop "Digitising experiences of migration" – http://lettersofmigration.blogspot.fr/p/workshop-1-utrecht.html
by

Peter Stadler

on 23 June 2015

Transcript of Introduction to TEI

Text
In the beginning was the document
A Text is not a Document
in the shape of letters and their layout?
in the original from which this copy derives?
in the stories we read into it?
or in its author’s intentions?
An Introduction to
14th century BC diplomatic letter in Akkadian, found in Amarna.
http://commons.wikimedia.org/wiki/File:Amarna_Akkadian_letter.png
Carl Maria von Weber to Caroline Brandt, May 25/26, 1817
Staatsbibliothek zu Berlin Preußischer Kulturbesitz, Mus.ep. Weber, C. M. v. 97
Cf. http://www.weber-gesamtausgabe.de/A041184
Gravestone (19th century) at the cemetery St. Laurentii/Föhr
http://commons.wikimedia.org/wiki/File:St._Laurentii_Grabstein_Nickelsen.jpg
Beginning of the Habakkuk Commentary or Pesher Habakkuk (1st century BC)
http://commons.wikimedia.org/wiki/File:Habakkuk_Pesher.png
Where is the text?
A "document" is something that exists in the world, which we can digitize.
A "text" is an abstraction, created by or for a community of readers, which we can encode.
aka "Text Bearing Object"
Digital Objects
Encoding
Makes explicit (to a machine) what is implicit (to a person)
Makes text processable by a machine
Allows for semantic enrichment
Credits and Links
TEI@Oxford: http://tei.oucs.ox.ac.uk/Oxford/ (Sebastian Rahtz, James Cummings et al.)
TEI by Example: http://www.teibyexample.org (Edward Vanhoutte, Melissa Terras, Ron van den Branden et al.)
TEI Guidelines: http://www.tei-c.org/Guidelines/P5/
Character Encoding
A mapping of characters to numeric codes
E.g. ASCII (American Standard Code for Information Interchange)
E.g. UTF-8 (Unicode)
Facilitates storage and transmission of electronic texts
The very foundation of Text Processing
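A minimal sketch of what such a mapping looks like in practice, using only Python's built-in string functions. The helper names `describe` and `fits_in_ascii` are made up for this illustration; the sample word is the place name from the gravestone slide.

```python
def describe(text):
    """List each character with its Unicode code point and its UTF-8 bytes (hex)."""
    return [(ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" ")) for ch in text]


def fits_in_ascii(text):
    """Return True if every character of the text can be stored as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


for row in describe("Föhr"):
    print(*row)

print("ASCII is enough for 'Utrecht':", fits_in_ascii("Utrecht"))
print("ASCII is enough for 'Föhr':", fits_in_ascii("Föhr"))
```

The "ö" needs two bytes in UTF-8 and has no ASCII code at all, which is one reason Unicode-based formats are preferred for multilingual sources.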
Text Encoding
Adding meta information to the stream of text
Descriptive markup = separating, naming and describing units of text in a formalized way
The application of markup to a document can be an intellectual activity
In deciding what markup to apply, and how this represents the original, one is undertaking the task of an editor
There is (almost) no such thing as neutral markup – all of it involves interpretation
Markup can assist in answering research questions, and deciding what markup is needed to enable such questions to be answered can be a research activity in itself
Good textual encoding is never as easy or quick as people would believe
Detailed document analysis is needed before encoding for the resulting markup to be useful
Basic XML Rules
An XML element consists of a start tag and an end tag
Tag names are case sensitive
An XML element may have attributes
An attribute has a value which is delimited by quotes
There is only one root element to an XML document
XML elements must be properly nested
Basic XML Example
An XML Quiz
<greeting>Hello world!</greeting>
<greeting>Hello world!</Greeting>
<greeting><grunt>Ho</grunt> world!</greeting>
<grunt>Ho <greeting>world!</greeting></grunt>
<greeting><grunt>Ho world!</greeting></grunt>
<grunt type=loud>Ho</grunt>
<grunt type="loud" ></grunt>
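One way to check your quiz answers is to hand each line to an XML parser, which rejects anything that is not well-formed. The sketch below assumes Python's standard `xml.etree.ElementTree` module; the function name `is_well_formed` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The seven quiz lines, copied verbatim from the slide.
QUIZ = [
    '<greeting>Hello world!</greeting>',
    '<greeting>Hello world!</Greeting>',
    '<greeting><grunt>Ho</grunt> world!</greeting>',
    '<grunt>Ho <greeting>world!</greeting></grunt>',
    '<greeting><grunt>Ho world!</greeting></grunt>',
    '<grunt type=loud>Ho</grunt>',
    '<grunt type="loud" ></grunt>',
]


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


for line in QUIZ:
    verdict = "well-formed" if is_well_formed(line) else "NOT well-formed"
    print(f"{verdict:16} {line}")
```

The three rejected lines each break one of the basic rules: a case mismatch between start and end tag, improperly nested elements, and an attribute value without quotes.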
Validity
An XML document must be well-formed and may be valid!
XML
XML = eXtensible Markup Language
XML is application-, platform-, and vendor-independent
XML is an industry standard
The XML standard is maintained by the World Wide Web Consortium (W3C)
XML is based on Unicode, which makes it well suited to long-term preservation
Many formats are based on XML: MusicXML, MEI, SVG, MathML, XHTML, EAD, TEI
Welcome to <placeName>Utrecht</placeName>
A valid XML document conforms to rules that are expressed in an external schema.
The schema specifies:
The name of the root element
The names of all elements
Names, data types and possibly default values for attributes
Rules for nesting of elements

The TEI provides such a schema for text encoding and interchange
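To make the difference between "well-formed" and "valid" concrete, here is a toy sketch in Python. It is an assumption-laden illustration only: `TOY_RULES` and `check` are invented for this example and are far simpler than a real schema language or the actual TEI schema. Real validation would use a schema-aware tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy rules, NOT a real TEI schema: the allowed root element,
# and for each known element its allowed child elements and attributes.
TOY_RULES = {
    "root": "p",
    "elements": {
        "p": {"children": {"placeName", "persName"}, "attributes": set()},
        "placeName": {"children": set(), "attributes": {"ref"}},
        "persName": {"children": set(), "attributes": {"ref"}},
    },
}


def check(xml_text, rules=TOY_RULES):
    """Return a list of rule violations (empty list = valid under the toy rules).

    Raises xml.etree.ElementTree.ParseError if the input is not well-formed.
    """
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != rules["root"]:
        problems.append(f"root is <{root.tag}>, expected <{rules['root']}>")
    for element in root.iter():
        spec = rules["elements"].get(element.tag)
        if spec is None:
            problems.append(f"unknown element <{element.tag}>")
            continue
        for name in element.attrib:
            if name not in spec["attributes"]:
                problems.append(f"attribute '{name}' not allowed on <{element.tag}>")
        for child in element:
            if child.tag in rules["elements"] and child.tag not in spec["children"]:
                problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
    return problems


print(check('<p>Welcome to <placeName>Utrecht</placeName></p>'))
print(check('<placeName>Welcome to <p>Utrecht</p></placeName>'))
```

Both inputs are well-formed XML, but only the first satisfies the toy rules; the second uses the wrong root element and nests the elements the wrong way round.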
Text Encoding Initiative (TEI)
TEI = a non-profit international consortium, backed by a large community
TEI = guidelines (recommendations) for text encoding and interchange, along with examples and discussions
TEI = an XML schema, the formal specification of the TEI Guidelines
The XML Tree
The Institution
1987: First meeting of a diverse group of scholars at Vassar College, leading to the intellectual foundation for the Text Encoding Initiative
1990: Release of the first draft (known as "P1") of the Guidelines
1994: First official version of the Guidelines ("P3")
1999–2001: Establishment of the Consortium for the Maintenance of the Text Encoding Initiative
2007: Release of the (current) Guidelines P5
The Community
Get involved!
The TEI Guidelines
Text – what to keep in mind
There are documents and there are texts.
Texts are (mostly) transmitted via documents, inheriting (to some degree) the boundaries and limitations of the carrier medium.
The TEI is made and maintained by scholars, for the use of scholars.
Mailing lists: TEI-L being the main list, but most SIGs run their own lists as well
SIGs (Special Interest Groups): covering several areas, e.g. correspondence, education, manuscripts, libraries.
Membership: institutional members and individual subscribers
There are different readings to a text.
In a scholarly sense: there are different research questions to a text.
Basic XML example revisited
The TEI Guidelines
Modules
TEI schemas
There is not one TEI schema; in fact, customization is encouraged!
The formal TEI specification consists of several distinct modules.
Each module declares a set of elements and attributes and is explained in a Guideline chapter.
analysis
certainty
core
corpus
dictionaries
drama
figures
gaiji
header
iso-fs
linking
msdescription
namesdates
nets
spoken
tagdocs
tei
textcrit
textstructure
transcr
verse
To start with, you'll find several pre-customized schemas on the TEI website:
Tite
Lite
Drama
Speech
All
The Benefits
Easy transformation of particular information to several other metadata standards: Dublin Core, METS/MODS, etc.
Easy reporting: Print all distinct <persName>s from my letter corpus
Easy searching: Find all <author>s that start with a "C"
Easy transformation to a variety of output formats: PDF, eBook, HTML
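The reporting and searching examples above can be sketched in a few lines of Python with the standard `xml.etree.ElementTree` module. Everything in `SAMPLE` is invented for illustration: the wrapper elements `<letters>` and `<letter>` are hypothetical stand-ins, not TEI elements, and a real TEI file would also carry the TEI namespace and a header. The function names are made up as well.

```python
import xml.etree.ElementTree as ET

# Invented, simplified sample data (not a real TEI document).
SAMPLE = """
<letters>
  <letter>
    <author>Carl Maria von Weber</author>
    <p>Greetings to <persName>Caroline Brandt</persName>.</p>
  </letter>
  <letter>
    <author>Caroline Brandt</author>
    <p><persName>Carl Maria von Weber</persName> wrote to
       <persName>Caroline Brandt</persName>.</p>
  </letter>
  <letter>
    <author>Anna Example</author>
    <p>No names are mentioned here.</p>
  </letter>
</letters>
"""


def element_text(element):
    """Collect all text inside an element, including text in nested elements."""
    return "".join(element.itertext()).strip()


def distinct_person_names(root):
    """Reporting: every distinct <persName> value, sorted alphabetically."""
    return sorted({element_text(el) for el in root.iter("persName")})


def authors_starting_with(root, letter):
    """Searching: all <author> values that start with the given letter."""
    names = (element_text(el) for el in root.iter("author"))
    return [name for name in names if name.startswith(letter)]


corpus = ET.fromstring(SAMPLE)
print(distinct_person_names(corpus))
print(authors_starting_with(corpus, "C"))
```

Neither query would be possible on the plain text alone, because only the markup says which strings are person names and which are authors.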
What's next?
Identification of entities (places, persons, etc.)
Utilizing controlled vocabularies
Getting connected to the "Linked Data Cloud"
Your data is ready for the application of various sexy tools: Geographic Information Systems (GIS), timelines, network analysis, etc.
From Text to Data
What really matters
Documents or Texts?
Usually a customization is created by removing unnecessary elements. But it is also possible to add new features via a customization.