Basic TEI Structure and Core Elements

Basic TEI Structure and Core Elements for an introductory TEI workshop. License: CC+By; http://tinyurl.com/jc-BasicTEIandCore

James Cummings

on 11 October 2016

Basic TEI Structure and Core Elements
What is common to all these materials?
Identification information (e.g. page numbers, shelfmarks, titles...)
Divisions and subdivisions of individual documents (sometimes signalled by headings)
Pictures and diagrams (with captions)
A number of writing modes or registers (prose, verse, drama, etc.)
Each with formal structural units (such as paragraphs, lists, stanzas, lines, speeches, stage-directions)
Which contain interesting textual distinctions (titles, headings, names, quotations ...) sometimes signalled by a change in rendition
There are also metatextual indications (corrections, deletions, annotations, revisions, etc.)
What sort of texts can the TEI cope with?
The TEI takes a generalist approach to overall text structure, which means it should be able to cope with texts of any size, language, date, complexity, writing system, or medium.
Basic Structure (1)
What is a <text>?
A text may be
unitary, forming an organic whole
composite, consisting of several components which are in some important sense independent of each other
What's inside the <body>?
Usually, there are <div> elements (divisions) of various kinds
which may nest one inside another
which have a @type, for example "chapter", "subsection", etc.
and possibly a name or number of some kind (for which we use the @n attribute)
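
The nesting of divisions and their @type and @n attributes can be sketched as follows. This is only an illustration: the sample body is invented, and Python's standard-library xml.etree.ElementTree is an arbitrary choice for reading it, not something the workshop prescribes.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample: a <body> with nested, typed and numbered divisions.
BODY = f"""
<body xmlns="{TEI_NS}">
  <div type="chapter" n="1">
    <head>Chapter One</head>
    <div type="subsection" n="1.1"><p>First subsection.</p></div>
    <div type="subsection" n="1.2"><p>Second subsection.</p></div>
  </div>
</body>
"""

def outline(elem, depth=0):
    """Return (depth, type, n) for every <div> below elem, in document order."""
    rows = []
    for child in elem:
        if child.tag == f"{{{TEI_NS}}}div":
            rows.append((depth, child.get("type"), child.get("n")))
            rows.extend(outline(child, depth + 1))
    return rows

divisions = outline(ET.fromstring(BODY))
for depth, div_type, n in divisions:
    # Indent each division by its nesting depth.
    print("  " * depth + f"{div_type} {n}")
```

The same <div> element is used at every level; only @type and @n distinguish a chapter from a subsection.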
TEI Document Structure(s)
A TEI document is represented by means of a <TEI> element, which contains both data and metadata
A sequence of <TEI> elements which share metadata can also be brought together under a <teiCorpus> element
<p> (paragraph) marks paragraphs in prose

Fundamental unit for prose texts
can contain all the phrase-level elements in the core module
can appear directly inside <body> or inside <div>
'Core' Elements
The so-called 'core' module of the TEI groups together elements which may appear in any kind of text. This includes:
highlighting, emphasis and quotation
simple editorial changes
basic names, numbers, dates, addresses
simple links and cross-references
lists, notes, annotation, indexing
reference systems, bibliographic citations
simple verse and drama
This could be in any form: books, journals, manuscripts, postcards, letters, rolls of papyrus, clay tablets, web pages, gravestones, etc., and contain any type of text.
The core module of the TEI can cope with this and more!
The <TEI> document may be of any size or internal complexity, a postcard or a multi-volume encyclopedia
This might be used for a document of any size, form or type.
Unitary Texts Contain:
an optional <front> element containing front matter (title page, preface, etc.)
a mandatory <body> element containing the body of the text
an optional <back> element containing additional back matter
Composite Texts Contain:
<front> matter relating to the composite
a <group> element, containing at least one <text>; groups can nest to form sub-groups
<back> matter relating to the composite
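
A minimal sketch of the difference between the two shapes, assuming an invented anthology as sample data and using Python's standard-library ElementTree purely for illustration. The helper name kind is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
T = f"{{{TEI_NS}}}"  # prefix for namespaced tag names

# Invented sample: an anthology whose <group> holds two independent texts.
ANTHOLOGY = f"""
<text xmlns="{TEI_NS}">
  <front><div type="preface"><p>Preface to the whole anthology.</p></div></front>
  <group>
    <text><body><p>First independent text.</p></body></text>
    <text><body><p>Second independent text.</p></body></text>
  </group>
  <back><div type="index"><p>Index to the whole anthology.</p></div></back>
</text>
"""

def kind(text_elem):
    """Classify a <text> as composite (has <group>) or unitary (has <body>)."""
    if text_elem.find(f"{T}group") is not None:
        return "composite"
    if text_elem.find(f"{T}body") is not None:
        return "unitary"
    return "unknown"

anthology = ET.fromstring(ANTHOLOGY)
members = anthology.findall(f"{T}group/{T}text")
print(kind(anthology), [kind(member) for member in members])
```

The outer <text> is composite, while each <text> inside the <group> is itself a unitary text with its own <body>.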
Unitary vs Composite
Basic <teiCorpus> Structure
Each <TEI> element could represent a new document in the collection, a version of a text, a sample in a language corpus, etc.
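
As a rough illustration of a corpus whose members share metadata, the sketch below builds a tiny <teiCorpus> and reads back the titles. The sample titles and the helper functions header and document are invented; the <teiHeader> element is not named on the slides and is assumed here as the place where TEI keeps metadata.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
T = f"{{{TEI_NS}}}"
TITLE_PATH = f"{T}teiHeader/{T}fileDesc/{T}titleStmt/{T}title"

def header(title):
    """Hypothetical helper: a minimal metadata header with just a title."""
    return (f"<teiHeader><fileDesc><titleStmt><title>{title}</title></titleStmt>"
            "<publicationStmt><p>Unpublished sample.</p></publicationStmt>"
            "<sourceDesc><p>Born digital.</p></sourceDesc></fileDesc></teiHeader>")

def document(title, body_text):
    """Hypothetical helper: one <TEI> document holding metadata and data."""
    return f"<TEI>{header(title)}<text><body><p>{body_text}</p></body></text></TEI>"

# The corpus has its own header (shared metadata) followed by its documents.
CORPUS = (f'<teiCorpus xmlns="{TEI_NS}">'
          + header("Sample corpus")
          + document("Letter 1", "Dear reader.")
          + document("Letter 2", "Yours truly.")
          + "</teiCorpus>")

corpus = ET.fromstring(CORPUS)
corpus_title = corpus.findtext(TITLE_PATH)
titles = [doc.findtext(TITLE_PATH) for doc in corpus.findall(f"{T}TEI")]
print(corpus_title, titles)
```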
Two other useful attributes are:
@xml:lang for indicating an ISO language code
@xml:id for providing a unique ID code

By highlighting we mean the use of any combination of typographic features (font, size, hue, etc.) in a printed or written text in order to distinguish some passage of a text from its surroundings. Often highlighting indicates words and phrases which are:
distinct in some way (e.g. foreign, archaic, technical)
emphatic or stressed when spoken
not really part of the text (e.g. cross references, titles, headings)
a distinct narrative stream (e.g. an aside, internal monologue, commentary)
attributed to some other agency inside or outside the text (e.g. direct speech, quotation)
set apart in another way (e.g. proverbial phrases, words mentioned but not used)
<hi> (general purpose highlighting);
<distinct> (linguistically distinct)
Quotation marks can be used to set off text for many reasons; the TEI has the following elements:
<q> (separated from the surrounding text with quotation marks)
<said> (speech or thought)
<quote> (passage attributed to an external source)
<cit> (groups a quotation and citation)
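
A small sketch of how these elements might combine in one paragraph. The sentence is invented, and ElementTree is used only to pull the tagged passages back out.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
T = f"{{{TEI_NS}}}"

# Invented sample: direct speech, plus a quotation grouped with its source.
PARAGRAPH = f"""
<p xmlns="{TEI_NS}">She said <said>I shall be late</said>, recalling
<cit><quote>Time and tide wait for no man</quote> <bibl>proverbial</bibl></cit>.</p>
"""

paragraph = ET.fromstring(PARAGRAPH)

# <said> marks speech or thought.
spoken = [said.text for said in paragraph.iter(f"{T}said")]

# <cit> keeps each <quote> together with the citation of its source.
citations = [(cit.findtext(f"{T}quote"), cit.findtext(f"{T}bibl"))
             for cit in paragraph.iter(f"{T}cit")]

print(spoken)
print(citations)
```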
Simple Editorial Changes
Additions, Deletions, and Omissions
<add> (addition to the text, e.g. marginal gloss)
<del> (phrase marked as deleted in the text)
<gap> (indicates point where material is omitted)
<unclear> (contains text unable to be transcribed clearly)
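
One way to see what <add> and <del> record is to rebuild the text before and after the revision. The sketch below assumes an invented sample sentence; the function name render is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample: one word struck out and a replacement written above the line.
REVISED = (f'<p xmlns="{TEI_NS}">The <del>quick</del>'
           '<add place="above">slow</add> fox.</p>')

def render(elem, skip):
    """Concatenate text, omitting the content of elements named in skip.

    The text that follows a skipped element (its tail) is still kept.
    """
    parts = [elem.text or ""]
    for child in elem:
        local_name = child.tag.split("}")[-1]
        if local_name not in skip:
            parts.append(render(child, skip))
        parts.append(child.tail or "")
    return "".join(parts)

paragraph = ET.fromstring(REVISED)
before = render(paragraph, skip={"add"})  # the text as first written
after = render(paragraph, skip={"del"})   # the text after revision
print(before)
print(after)
```

Both states of the text are recoverable from a single transcription because nothing is thrown away, only labelled.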
Basic Names
<name> (a name in the text, contains a proper noun or noun phrase)
<rs> (a referencing string)
The @type attribute is useful for categorizing these, and they both also have further attributes, such as @key and @ref
<email> (an electronic mail address)
<address> (a postal address)
<addrLine> (a non-specific address line)
<street> (a full street address)
<postCode> (a postal (or zip) code)
<postBox> (a postal box number)
<name> can also be used
and the 'namesdates' module extends this with more geographic names
Basic Numbers and Measures
<num> (marks a number of any sort)
<measure> (marks a quantity or commodity)
<measureGrp> (groups specifications relating to a single object)

has simple
<date> contains a date in any format and includes
a @when attribute for a regularised form
a @calendar attribute to specify what calendar system
(and many other attributes)
<time> contains a time in any format and includes
a @when attribute for a regularised form
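
The point of the regularised form is that software can read it whatever the text itself says. A minimal sketch, assuming an invented sentence and full YYYY-MM-DD and hh:mm:ss values in @when:

```python
import datetime
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
T = f"{{{TEI_NS}}}"

# Invented sample: the transcribed wording stays as written,
# while @when carries the regularised value.
SENTENCE = f"""
<p xmlns="{TEI_NS}">The workshop ran on <date when="2016-10-11">11 October
2016</date> from <time when="14:30:00">half past two</time>.</p>
"""

sentence = ET.fromstring(SENTENCE)
date_elem = sentence.find(f"{T}date")
time_elem = sentence.find(f"{T}time")

# The regularised values parse directly; the element text would not.
when_date = datetime.date.fromisoformat(date_elem.get("when"))
when_time = datetime.time.fromisoformat(time_elem.get("when"))

print(when_date.year, when_date.month, when_date.day)
print(when_time.hour, when_time.minute)
```

Partial values such as a year on its own are also allowed in @when, and would need different handling from this sketch.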
Global Attributes
Some features (potentially) apply to everything:

TEI provides global attributes for these:
@xml:id provides a unique identifier for any element;
@n provides a name or number for any element
@xml:lang specifies the language of any element, using an ISO standard code
@rend, @rendition, and @style provide ways of specifying the original visual appearance (rendition) of any element
@resp points to the agency responsible for some aspect of an encoding
@cert classifies the level of certainty of the interpretation
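
A short sketch of some global attributes in use, on an invented fragment. It relies on the general XML rule that an @xml:lang value also applies to an element's descendants unless they override it; the function name languages is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = f"{{{XML_NS}}}id"      # how ElementTree spells xml:id
XML_LANG = f"{{{XML_NS}}}lang"  # how ElementTree spells xml:lang

# Invented sample: every element carries an identifier; one overrides the language.
FRAGMENT = f"""
<div xmlns="{TEI_NS}" xml:id="ch1" n="1" xml:lang="en">
  <p xml:id="p1" n="1">Plain English paragraph.</p>
  <p xml:id="p2" n="2" xml:lang="fr" rend="italic">Bonjour.</p>
</div>
"""

def languages(elem, inherited=None):
    """Map each xml:id to the language in force for that element."""
    lang = elem.get(XML_LANG, inherited)
    result = {elem.get(XML_ID): lang}
    for child in elem:
        result.update(languages(child, lang))
    return result

root = ET.fromstring(FRAGMENT)
langs = languages(root)
print(langs)
```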
Components of a Division
What do you find inside a division?
one or more headings tagged with <head> at the start or <trailer> at the bottom
prose, which may be organized as a sequence of <p> (paragraphs) or <list> elements containing <item>s
verse, divided into <l> (metrical lines) perhaps grouped into <lg> (line-groups like stanzas)
drama, divided into <sp> (speeches) perhaps with a <speaker> and a mix of <p> and <l> elements. Also maybe some <stage> directions

With additional modules of the TEI you can have other content in divisions. For example, in dictionaries you might have <entry> elements, spoken corpora might have <u> (utterances), etc.
<ptr> defines a pointer to another location
<ref> defines a reference to another location, with optional linking text
Both elements have a @target attribute taking a URI reference to point to things
If the linking text can be generated, one could use <ptr> instead of <ref>
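
A minimal sketch of following @target values, assuming an invented fragment in which one link points inside the document (a "#" followed by an @xml:id value) and one points to an external URI:

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
T = f"{{{TEI_NS}}}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample: a <ref> with linking text and an empty <ptr>.
FRAGMENT = f"""
<div xmlns="{TEI_NS}">
  <p xml:id="intro">An introduction.</p>
  <p>See <ref target="#intro">the introduction</ref>
     or <ptr target="http://www.tei-c.org/"/>.</p>
</div>
"""

root = ET.fromstring(FRAGMENT)

# Index every identified element so local pointers can be resolved.
by_id = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID) is not None}

links = []
for elem in root.iter():
    if elem.tag in (f"{T}ptr", f"{T}ref"):
        target = elem.get("target")
        # "#intro" is a local pointer; anything else is left unresolved here.
        local = by_id.get(target[1:]) if target.startswith("#") else None
        links.append((elem.tag.split("}")[-1], target,
                      local.text if local is not None else None))

print(links)
```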
<list> (a sequence of items forming a list)
<item> (one component of a list)
<label> (label associated with an item)
<headLabel> (heading for column of labels)
<headItem> (heading for column of items)
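
The list elements above could fit together as in the sketch below, which encodes a small two-column glossary and reads it back as label and item pairs. The glossary content and the "gloss" value of @type are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
T = f"{{{TEI_NS}}}"

# Invented sample: a glossary list with column headings.
GLOSSARY = f"""
<list xmlns="{TEI_NS}" type="gloss">
  <head>Abbreviations</head>
  <headLabel>Term</headLabel>
  <headItem>Meaning</headItem>
  <label>TEI</label> <item>Text Encoding Initiative</item>
  <label>XML</label> <item>Extensible Markup Language</item>
</list>
"""

glossary = ET.fromstring(GLOSSARY)
columns = (glossary.findtext(f"{T}headLabel"), glossary.findtext(f"{T}headItem"))

# Each <label> is paired with the <item> that follows it.
labels = [label.text for label in glossary.findall(f"{T}label")]
items = [item.text for item in glossary.findall(f"{T}item")]
entries = dict(zip(labels, items))

print(columns)
print(entries)
```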
<note> (contains a note or annotation)
Notes can be those existing in the text, or provided by the editor of the digital text
A @place attribute can be used to indicate the physical location of the note
A note should usually be encoded where its identifier or mark first appears
Foreign Phrases
<graphic> (indicates the location of an inline graphic, illustration, or figure)
<binaryObject> (encoded binary data embedding a graphic or other object)
The 'figures' module provides <figure> for more complex graphics and figures
Simple Verse
Simple Drama
Bibliographic Citations
Other similar elements include:
<choice> (groups alternative editorial encodings)
<sic> (apparent error)
<corr> (corrected error)
<orig> (original form)
<reg> (regularized form)
<abbr> (abbreviated form)
<expan> (expanded form)
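
Because <choice> keeps both alternatives side by side, the same transcription can yield either a normalized or an as-written reading. A sketch under that assumption, with an invented sentence and a hypothetical function named reading:

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample: a misspelling with its correction, and an
# abbreviation with its expansion.
SENTENCE = (f'<p xmlns="{TEI_NS}">I '
            "<choice><sic>recieved</sic><corr>received</corr></choice> your "
            "<choice><abbr>ltr</abbr><expan>letter</expan></choice>.</p>")

NORMALIZED = {"corr", "reg", "expan"}  # the editor's side of each pair

def local_name(elem):
    return elem.tag.split("}")[-1]

def reading(elem, normalized):
    """Build the text, taking one alternative from every <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if local_name(child) == "choice":
            for alternative in child:
                if (local_name(alternative) in NORMALIZED) == normalized:
                    parts.append(reading(alternative, normalized))
        else:
            parts.append(reading(child, normalized))
        parts.append(child.tail or "")
    return "".join(parts)

sentence = ET.fromstring(SENTENCE)
as_edited = reading(sentence, normalized=True)
as_written = reading(sentence, normalized=False)
print(as_edited)
print(as_written)
```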
The names and dates module of the TEI vastly expands the number of name/person/place/org related elements available.
The above list in TEI:
<foreign> marks phrases belonging to some language other than that of the surrounding text
The @xml:lang attribute is used to indicate the ISO language code for the element content
<lg> (contains one or more verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc.)
<l> (contains a single, possibly incomplete, line of verse)
More elements for metrical analysis are provided with the 'verse' module.
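
A minimal sketch of simple verse markup, with placeholder lines instead of a real poem. The @type values "poem" and "stanza" are illustrative choices.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
T = f"{{{TEI_NS}}}"

# Placeholder sample: an outer line group holding two stanzas.
POEM = f"""
<lg xmlns="{TEI_NS}" type="poem">
  <lg type="stanza" n="1">
    <l n="1">First line of the first stanza,</l>
    <l n="2">second line of the first stanza.</l>
  </lg>
  <lg type="stanza" n="2">
    <l n="3">First line of the second stanza,</l>
    <l n="4">second line of the second stanza.</l>
  </lg>
</lg>
"""

poem = ET.fromstring(POEM)

# Count the <l> children of each stanza, keyed by the stanza's @n.
lines_per_stanza = {stanza.get("n"): len(stanza.findall(f"{T}l"))
                    for stanza in poem.findall(f"{T}lg")}
total_lines = len(list(poem.iter(f"{T}l")))

print(lines_per_stanza, total_lines)
```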
If converting an existing index, use nested lists. For auto-generated indexes:
<index> (marks an index entry) with optional @indexName
The <term> element is used to mark a term inside an <index>
The <index> element can self-nest for hierarchical index entries
<sp> an individual speech in a performance text, or a passage presented as such in a prose or verse text
<speaker> a specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment
<bibl> a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged
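
For simple drama, a sketch along these lines shows speeches and speaker labels working together. The scene is invented, and grouping speeches by speaker is just one thing such markup makes possible.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
T = f"{{{TEI_NS}}}"

# Invented sample: a stage direction followed by three speeches.
SCENE = f"""
<div xmlns="{TEI_NS}" type="scene">
  <stage>Enter two speakers.</stage>
  <sp><speaker>First Speaker</speaker><p>Who goes there?</p></sp>
  <sp><speaker>Second Speaker</speaker><l>A friend.</l></sp>
  <sp><speaker>First Speaker</speaker><p>Pass, friend.</p></sp>
</div>
"""

scene = ET.fromstring(SCENE)

# Collect the spoken text of each <sp> under its <speaker> label.
speeches = {}
for sp in scene.findall(f"{T}sp"):
    who = sp.findtext(f"{T}speaker")
    words = " ".join(part.text for part in sp
                     if part.tag in (f"{T}p", f"{T}l"))
    speeches.setdefault(who, []).append(words)

print(speeches)
```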
Thanks to: Lou Burnard, Sebastian Rahtz, and many more in the TEI Community
Complete List of Dating Attributes
@calendar: pointer (like #Gregorian) to a calendarDesc/calendar
@period: pointer (like #Hellenistic) to a description of that period


@when: YYYY-MM-DD fixed point (like 1499-02-10, 0300-05, -0100-07-13, 0500)
@notBefore: YYYY-MM-DD fixed earliest possibility for uncertain date
@notAfter: YYYY-MM-DD fixed latest possibility for uncertain date
@from: YYYY-MM-DD fixed start point of date range
@to: YYYY-MM-DD fixed end point of date range

(@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)

(@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)

@dur: A W3C duration formulation such as "P1DT12H" (one day and twelve hours)
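
The dating attributes above fall into three broad uses: a fixed point, an uncertain date bounded on one or both sides, and a range. A rough sketch of telling them apart, using invented dates and a hypothetical function named describe:

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
T = f"{{{TEI_NS}}}"

# Invented sample: one date of each kind.
DATES = f"""
<p xmlns="{TEI_NS}">
  <date when="1499-02-10">10 February 1499</date>
  <date notBefore="1499-01-01" notAfter="1499-12-31">some time in 1499</date>
  <date from="1499-02-10" to="1499-03-01">10 February to 1 March 1499</date>
</p>
"""

def describe(date_elem):
    """Name the kind of dating that the attributes on a <date> express."""
    if date_elem.get("when") is not None:
        return "fixed point"
    if date_elem.get("from") is not None or date_elem.get("to") is not None:
        return "range"
    if (date_elem.get("notBefore") is not None
            or date_elem.get("notAfter") is not None):
        return "uncertain"
    return "unspecified"

kinds = [describe(d) for d in ET.fromstring(DATES).findall(f"{T}date")]
print(kinds)
```

This only checks which attributes are present; it does not validate their values.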

<biblStruct> for structured bibliographic citations