
Basic TEI Structure and Core Elements

Basic TEI Structure and Core Elements for an introductory TEI workshop. License: CC+By; http://tinyurl.com/jc-BasicTEIandCore
by James Cummings on 11 October 2016

@jamescummings
http://tinyurl.com/jc-BasicTEIandCore
CC+BY
What is common to all these materials?
Identification information (e.g. page numbers, shelfmarks, titles...)
Divisions and subdivisions of individual documents (sometimes signalled by headings)
Pictures and diagrams (with captions)
A number of writing modes or registers (prose, verse, drama, etc.)
Each with formal structural units (such as paragraphs, lists, stanzas, lines, speeches, stage-directions)
Which contain interesting textual distinctions (titles, headings, names, quotations ...) sometimes signalled by a change in rendition
There are also metatextual indications (corrections, deletions, annotations, revisions, etc.)
What sort of texts can the TEI cope with?
The TEI takes a generalist approach to overall text structure, which means it should be able to cope with texts of any size, language, date, complexity, writing system, or media.
Basic Structure (1)
What is a <text>?
A text may be unitary or composite:
unitary: forming an organic whole
composite: consisting of several components which are in some important sense independent of each other
What's inside the <body>?
Usually, there are divisions of various kinds (<div>):
which may nest one inside another
which have a @type, for example "chapter", "subsection", etc.
and possibly a name or number of some kind (for which we use the @n attribute)
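To see nesting, @type and @n together, here is a minimal sketch assuming Python 3 and its standard-library `xml.etree.ElementTree`. The sample <body> and the `outline` helper are invented for illustration, and a complete TEI file would also need a <teiHeader>.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
DIV = f"{{{TEI_NS}}}div"  # ElementTree's {namespace}localname form

# Illustrative sample (invented): a chapter containing two subsections.
SAMPLE = f"""<body xmlns="{TEI_NS}">
  <div type="chapter" n="1">
    <p>Opening paragraph.</p>
    <div type="subsection" n="1.1"><p>First part.</p></div>
    <div type="subsection" n="1.2"><p>Second part.</p></div>
  </div>
</body>"""


def outline(element, depth=0):
    """Return one indented 'type n' line per <div>, following the nesting."""
    lines = []
    for child in element:
        if child.tag == DIV:
            label = f"{child.get('type', 'div')} {child.get('n', '?')}"
            lines.append("  " * depth + label)
            lines.extend(outline(child, depth + 1))  # nested divisions go one level deeper
    return lines


body = ET.fromstring(SAMPLE)
print("\n".join(outline(body)))
```

This prints "chapter 1" followed by the two subsections indented one level, mirroring how the <div> elements nest.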
TEI Document Structure(s)
A TEI document is represented by means of a <TEI> element which contains both data and metadata.
A sequence of <TEI> elements which share metadata can also be brought together under a <teiCorpus> element.
Paragraphs
<p> (paragraph) marks paragraphs in prose
Fundamental unit for prose texts
<p> can contain all the phrase-level elements in the core module
<p> can appear directly inside <body> or inside <div> (divisions)
'Core' Elements
The so-called 'core' module of the TEI groups together elements which may appear in any kind of text. This includes:
paragraphs
highlighting, emphasis and quotation
simple editorial changes
basic names, numbers, dates, addresses
simple links and cross-references
lists, notes, annotation, indexing
graphics
reference systems, bibliographic citations
simple verse and drama
This could be in any form (books, journals, manuscripts, postcards, letters, rolls of papyrus, clay tablets, web pages, gravestones, etc.) and contain any type of text.
The core module of the TEI can cope with this and more!
The <TEI> document may be of any size, form, type, or internal complexity: a postcard or a multi-volume encyclopedia.
Unitary vs Composite
Unitary Texts Contain:
an optional <front> element containing front matter (titlepage, preface, etc.)
a mandatory <body> element containing the body of the document
an optional <back> element containing additional back matter
Composite Texts Contain:
optional <front> matter relating to the composite
a <group> element containing at least one <text>; groups can nest to form sub-groups
optional <back> matter relating to the composite
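The unitary/composite distinction can be checked mechanically. The sketch below (Python 3, standard library only) treats a <text> with a <group> child as composite and one with a <body> as unitary. The samples and helper names are illustrative assumptions, and <teiHeader> is omitted.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Illustrative samples (invented); the enclosing <TEI> and <teiHeader> are omitted.
UNITARY = f"""<text xmlns="{TEI_NS}">
  <front><div type="preface"><p>Preface.</p></div></front>
  <body><p>Main text.</p></body>
</text>"""

COMPOSITE = f"""<text xmlns="{TEI_NS}">
  <group>
    <text><body><p>First poem.</p></body></text>
    <group>
      <text><body><p>Second poem.</p></body></text>
    </group>
  </group>
</text>"""


def classify(text_el):
    """Report whether a <text> is unitary (has <body>) or composite (has <group>)."""
    if text_el.find("tei:group", NS) is not None:
        return "composite"
    if text_el.find("tei:body", NS) is not None:
        return "unitary"
    raise ValueError("a <text> needs either a <body> or a <group>")


def count_component_texts(text_el):
    """Count <text> elements sitting directly inside any (possibly nested) <group>."""
    return len(text_el.findall(".//tei:group/tei:text", NS))


unitary = ET.fromstring(UNITARY)
composite = ET.fromstring(COMPOSITE)
print(classify(unitary), classify(composite), count_component_texts(composite))
# prints: unitary composite 2
```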
Basic <teiCorpus> Structure
Each <TEI> element could represent a new document in the collection, a version of a text, or a sample in a language corpus, etc.
Two other useful global attributes are:
@xml:lang for indicating the ISO language code
@xml:id for providing a unique ID code
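A <teiCorpus> whose <TEI> children carry @xml:id and @xml:lang can be listed like this. This is a sketch under stated assumptions: Python 3's `xml.etree.ElementTree`, an invented two-letter corpus, and empty <teiHeader/> placeholders, so the sample is deliberately not schema-valid. Note that `xml:` attributes surface under the fixed XML namespace URI.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of the xml: prefix
NS = {"tei": TEI_NS}

# Illustrative corpus (invented); <teiHeader> content is omitted, so this is NOT schema-valid.
CORPUS = f"""<teiCorpus xmlns="{TEI_NS}">
  <teiHeader/>
  <TEI xml:id="letter-001" xml:lang="en">
    <teiHeader/><text><body><p>Dear Sir,</p></body></text>
  </TEI>
  <TEI xml:id="letter-002" xml:lang="fr">
    <teiHeader/><text><body><p>Monsieur,</p></body></text>
  </TEI>
</teiCorpus>"""


def list_documents(corpus):
    """Map each <TEI> document's @xml:id to its @xml:lang."""
    return {
        tei.get(f"{{{XML_NS}}}id"): tei.get(f"{{{XML_NS}}}lang")
        for tei in corpus.findall("tei:TEI", NS)
    }


corpus = ET.fromstring(CORPUS)
print(list_documents(corpus))  # {'letter-001': 'en', 'letter-002': 'fr'}
```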

Highlighting
By highlighting we mean the use of any combination of typographic features (font, size, hue, etc.) in a printed or written text in order to distinguish some passage of a text from its surroundings. Often highlighting indicates words and phrases which are:
distinct in some way (e.g. foreign, archaic, technical)
emphatic or stressed when spoken
not really part of the text (e.g. cross references, titles, headings)
a distinct narrative stream (e.g. an aside, internal monologue, commentary)
attributed to some other agency inside or outside the text (e.g. direct speech, quotation)
set apart in another way (e.g. proverbial phrases, words mentioned but not used)
Highlighting
<hi> (general purpose highlighting);
<distinct> (linguistically distinct)
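Because <hi> only records that something looks different, @rend is where the appearance is described. The sketch below (Python 3, standard library) groups highlighted phrases by @rend. The values "italic" and "bold" are assumed project conventions rather than a fixed TEI list, and the sample sentence is invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
HI = f"{{{TEI_NS}}}hi"

# Illustrative sample (invented); @rend values here are assumed project conventions.
PARA = f"""<p xmlns="{TEI_NS}">A <hi rend="italic">very</hi> <hi rend="bold">important</hi> and <hi rend="italic">urgent</hi> point, in <distinct type="archaic">olde</distinct> spelling.</p>"""


def highlights_by_rend(p):
    """Group the text of every <hi> by its @rend value ('unspecified' if missing)."""
    groups = defaultdict(list)
    for hi in p.iter(HI):
        groups[hi.get("rend", "unspecified")].append("".join(hi.itertext()))
    return dict(groups)


p = ET.fromstring(PARA)
print(highlights_by_rend(p))  # {'italic': ['very', 'urgent'], 'bold': ['important']}
```

The <distinct> element is ignored here; it marks a linguistic distinction, not a rendition.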
Quotation
Quotation marks can be used to set off text for many reasons; the TEI has the following elements:
<q> (separated from the surrounding text with quotation marks)
<said> (speech or thought)
<quote> (passage attributed to an external source)
<cit> (groups a quotation and citation)
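The difference between <cit>/<quote> and a bare <q> shows up when extracting sources. This is a minimal sketch, assuming Python 3's `xml.etree.ElementTree` and an invented sample paragraph. It pairs each <quote> with the <bibl> inside the same <cit> and leaves the <q> alone.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Illustrative sample (invented): one <cit> with <quote> and <bibl>, plus a bare <q>.
PARA = f"""<p xmlns="{TEI_NS}">As Hamlet asks,
<cit>
  <quote>To be, or not to be</quote>
  <bibl>Shakespeare, <title>Hamlet</title> 3.1</bibl>
</cit>
and concludes <q>that is the question</q>.</p>"""


def normalise(element):
    """All text inside an element, with runs of whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def citations(p):
    """Return (quotation, source) pairs for every <cit>; a bare <q> has no source."""
    pairs = []
    for cit in p.iter(f"{{{TEI_NS}}}cit"):
        quote = cit.find("tei:quote", NS)
        bibl = cit.find("tei:bibl", NS)
        if quote is not None:
            pairs.append((normalise(quote), normalise(bibl) if bibl is not None else None))
    return pairs


p = ET.fromstring(PARA)
print(citations(p))  # [('To be, or not to be', 'Shakespeare, Hamlet 3.1')]
```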
Simple Editorial Changes
Additions, Deletions, and Omissions
<add> (addition to the text, e.g. marginal gloss)
<del> (phrase marked as deleted in the text)
<gap> (indicates point where material is omitted)
<unclear> (contains text that cannot be transcribed with certainty)
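One reason to mark additions and deletions explicitly is that different readings can then be generated from one transcription. The sketch below is an illustrative assumption, not a TEI-prescribed processing model: it produces a "reading text" that drops <del>, keeps <add> and <unclear>, and shows <gap> as "[...]". The attribute values in the sample are examples only.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative transcription (invented); place="above", reason="faded" etc. are example values.
PARA = f"""<p xmlns="{TEI_NS}">The <del rend="strikethrough">quick</del><add place="above">slow</add> fox <gap reason="illegible" quantity="5" unit="chars"/> the <unclear reason="faded">dog</unclear>.</p>"""


def local_name(element):
    """Strip the {namespace} part from an ElementTree tag."""
    return element.tag.split("}")[-1]


def reading_text(element):
    """Text as finally intended: drop <del>, keep <add>/<unclear>, mark <gap> as [...]."""
    parts = [element.text or ""]
    for child in element:
        name = local_name(child)
        if name == "gap":
            parts.append("[...]")
        elif name != "del":
            parts.append(reading_text(child))
        parts.append(child.tail or "")  # text following a child belongs to the parent
    return "".join(parts)


p = ET.fromstring(PARA)
print(reading_text(p))  # The slow fox [...] the dog.
```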
Basic Names
<name> (a name in the text, contains a proper noun or noun phrase)
<rs> (a referencing string)
The @type attribute is useful for categorizing these, and they both also have @ref and @nymRef attributes
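With @type and @ref in place, both <name> and <rs> can be harvested into a simple entity list. A sketch assuming Python 3's `xml.etree.ElementTree`; the sample sentence is invented and the "#anna", "#ben" and "#oxford" targets are hypothetical IDs.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative sample (invented); the @ref targets are hypothetical IDs.
PARA = f"""<p xmlns="{TEI_NS}"><name type="person" ref="#anna">Anna</name> met <rs type="person" ref="#ben">her colleague</rs> in <name type="place" ref="#oxford">Oxford</name>.</p>"""


def entities(p):
    """List (element, @type, @ref, text) for every <name> and <rs>, in document order."""
    found = []
    for el in p.iter():
        tag = el.tag.split("}")[-1]
        if tag in ("name", "rs"):
            found.append((tag, el.get("type"), el.get("ref"), "".join(el.itertext())))
    return found


p = ET.fromstring(PARA)
for entity in entities(p):
    print(entity)
```

The <rs> "her colleague" is not a name, but its @ref lets it point at the same kind of person record a <name> would.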
Addresses
<email> (an electronic mail address)
<address> (a postal address)
<addrLine> (a non-specific address line)
<street> (a full street address)
<postCode> (a postal (or zip) code)
<postBox> (a postal box number)
<name> can also be used, and the 'namesdates' module extends this with more geographic names
Basic Numbers and Measures
<num> (marks a number of any sort)
<measure> (marks a quantity or commodity)
<measureGrp> (groups specifications relating to a single object)

While <num> has simple @type and @value attributes, <measure> has @type, @quantity, @unit and @commodity attributes
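The regularised attributes let software compute with quantities without parsing the prose. A minimal sketch, assuming Python 3's standard library; the sample is invented, the unit "lb" and the commodity values are example choices, and only plain decimal values are handled.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative sample (invented); "lb" and the commodities are example values.
PARA = f"""<p xmlns="{TEI_NS}">Bought <measure type="weight" quantity="2" unit="lb" commodity="flour">two pounds of flour</measure> and <measure type="weight" quantity="0.5" unit="lb" commodity="sugar">half a pound of sugar</measure> for <num type="cardinal" value="3">three</num> loaves.</p>"""


def total_quantity(p, unit):
    """Sum @quantity over every <measure> recorded in the given @unit."""
    return sum(
        float(m.get("quantity"))
        for m in p.iter(f"{{{TEI_NS}}}measure")
        if m.get("unit") == unit and m.get("quantity") is not None
    )


def number_values(p):
    """Return the regularised @value of each <num> (plain decimals only in this sketch)."""
    return [float(n.get("value")) for n in p.iter(f"{{{TEI_NS}}}num") if n.get("value")]


p = ET.fromstring(PARA)
print(total_quantity(p, "lb"), number_values(p))  # 2.5 [3.0]
```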
Dates
<date> contains a date in any format and includes:
a @when attribute for a regularised form
a @calendar attribute to specify which calendar system is used
(and many other attributes)
<time> contains a time in any format and includes:
a @when attribute for a regularised form
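The point of @when is that the element content can say "Easter Monday" while software reads a standard value. This sketch, assuming Python 3.7+ and an invented sample, converts @when into Python `date`/`time` objects. It only handles complete YYYY-MM-DD dates and hh:mm:ss times; a partial value such as "1916" would need different handling.

```python
import xml.etree.ElementTree as ET
from datetime import date, time

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative sample (invented wording; 24 April 1916 was Easter Monday).
PARA = f"""<p xmlns="{TEI_NS}">Written on <date when="1916-04-24">Easter Monday, 1916</date> at <time when="12:00:00">noon</time>.</p>"""


def regularised(p):
    """Turn @when on <date> and <time> into Python date/time objects."""
    dates = [date.fromisoformat(d.get("when")) for d in p.iter(f"{{{TEI_NS}}}date") if d.get("when")]
    times = [time.fromisoformat(t.get("when")) for t in p.iter(f"{{{TEI_NS}}}time") if t.get("when")]
    return dates, times


p = ET.fromstring(PARA)
print(regularised(p))
```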
Global Attributes
Some features (potentially) apply to everything:
identity
language
rendition
responsibility

TEI provides global attributes for these:
@xml:id provides a unique identifier for any element
@n provides a name or number for any element
@xml:lang specifies the language of any element, using an ISO standard code
@rend, @style, and @rendition provide ways of specifying the original visual appearance (rendition) of any element
@resp points to the agency responsible for some aspect of an encoding
@cert classifies the level of certainty of the interpretation
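Since @xml:id must be unique, a quick check for duplicates is a useful habit, and @resp/@cert can be queried the same way. A sketch assuming Python 3's standard library. The sample is invented and contains a deliberate mistake (the id "p1" twice); the XML parser used here does not reject duplicate IDs itself, which is why the check is worth writing.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Illustrative sample (invented) with a deliberate error: xml:id "p1" is used twice.
SAMPLE = f"""<body xmlns="{TEI_NS}">
  <div xml:id="ch1" n="1"><p xml:id="p1">First.</p></div>
  <div xml:id="ch2" n="2"><p xml:id="p1" xml:lang="la" resp="#editor" cert="low">Secundus.</p></div>
</body>"""


def duplicate_ids(root):
    """Return @xml:id values that occur more than once (they should be unique)."""
    counts = Counter(el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


def low_certainty(root):
    """Return (text, @resp) for each element whose @cert is 'low'."""
    return [("".join(el.itertext()), el.get("resp")) for el in root.iter() if el.get("cert") == "low"]


root = ET.fromstring(SAMPLE)
print(duplicate_ids(root), low_certainty(root))  # ['p1'] [('Secundus.', '#editor')]
```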
Components of a Division
What do you find inside a division?
Headings: one or more headings tagged with <head> at the start or <trailer> at the bottom
Prose: which may be organized as a sequence of <p> (paragraphs) or <list> elements containing <item> elements
Poetry: divided into <l> (metrical lines) perhaps grouped into <lg> (line-groups like stanzas)
Drama: divided into <sp> (speeches) perhaps with a <speaker> and a mix of <p> or <l> elements. Also maybe some <stage> directions.

With additional modules of the TEI you can have other content in divisions. For example, in dictionaries you might have <entry> elements, spoken corpora might have <u> (utterances), etc.
Linking
<ptr> defines a pointer to another location
<ref> defines a reference to another location, with optional linking text
Both elements have a @target attribute taking a URI reference to point to things
If the linking text can be generated automatically, one could use <ptr> instead of <ref>
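To illustrate "generated" linking text, the sketch below takes a <ref>'s own content as its text, and for a <ptr> looks up the same-document target ("#id") and borrows its <head>. That rule is an assumption made for this example, not something TEI prescribes; the sample is invented and uses Python 3's standard library.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NS = {"tei": TEI_NS}

# Illustrative sample (invented); "#method" points at the div with xml:id="method".
SAMPLE = f"""<body xmlns="{TEI_NS}">
  <div xml:id="intro"><p>See <ref target="#method">the method section</ref>, i.e. <ptr target="#method"/>, or <ref target="https://tei-c.org/">the TEI website</ref>.</p></div>
  <div xml:id="method"><head>Method</head><p>We encoded the text.</p></div>
</body>"""


def link_texts(root):
    """Linking text per <ref>/<ptr>; a <ptr> borrows its target's <head> (assumed rule)."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    texts = []
    for el in root.iter():
        tag = el.tag.split("}")[-1]
        if tag == "ref":
            texts.append("".join(el.itertext()))
        elif tag == "ptr":
            target = by_id.get(el.get("target", "").lstrip("#"))
            head = target.find("tei:head", NS) if target is not None else None
            # fall back to the raw @target when no heading can be found
            texts.append(head.text if head is not None else el.get("target"))
    return texts


root = ET.fromstring(SAMPLE)
print(link_texts(root))  # ['the method section', 'Method', 'the TEI website']
```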
Lists
<list> (a sequence of items forming a list)
<item> (one component of a list)
<label> (label associated with an item)
<headLabel> (heading for column of labels)
<headItem> (heading for column of items)
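A two-column list, such as a glossary, pairs each <label> with the <item> that follows it, under column headings from <headLabel> and <headItem>. A minimal sketch with an invented sample, assuming Python 3's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Illustrative glossary-style list (invented).
SAMPLE = f"""<list xmlns="{TEI_NS}" type="gloss">
  <headLabel>Term</headLabel>
  <headItem>Meaning</headItem>
  <label>TEI</label><item>Text Encoding Initiative</item>
  <label>ODD</label><item>One Document Does it all</item>
</list>"""


def text_of(el):
    """All text inside an element, or None if the element is missing."""
    return "".join(el.itertext()) if el is not None else None


def label_item_pairs(lst):
    """Pair each <item> with the <label> immediately preceding it (None if absent)."""
    pairs, label = [], None
    for child in lst:
        tag = child.tag.split("}")[-1]
        if tag == "label":
            label = text_of(child)
        elif tag == "item":
            pairs.append((label, text_of(child)))
            label = None
    return pairs


lst = ET.fromstring(SAMPLE)
headings = (text_of(lst.find("tei:headLabel", NS)), text_of(lst.find("tei:headItem", NS)))
print(headings)  # ('Term', 'Meaning')
print(label_item_pairs(lst))
```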
Notes
<note> (contains a note or annotation)
Notes can be those existing in the text, or provided by the editor of the digital text
A @place attribute can be used to indicate the physical location of the note
Notes should usually be encoded where their identifier or mark first appears
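Encoding a <note> at its mark keeps the anchor point, and @place records where it physically sits. The sketch below (Python 3, standard library, invented sample) lists notes with @place and @n, and also recovers the running text without them. The attribute values "margin" and "foot" are used as examples.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NOTE = f"{{{TEI_NS}}}note"

# Illustrative sample (invented); each <note> sits where its mark appears.
PARA = f"""<p xmlns="{TEI_NS}">The letter is dated 1499<note place="margin" n="a">Date added in a later hand.</note> and signed<note place="foot" n="1" resp="#editor">Signature partly illegible.</note>.</p>"""


def notes(p):
    """Return (@place, @n, text) for each <note>."""
    return [(n.get("place"), n.get("n"), "".join(n.itertext())) for n in p.iter(NOTE)]


def text_without_notes(element):
    """Running text with <note> content left out (text after each note is kept)."""
    parts = [element.text or ""]
    for child in element:
        if child.tag != NOTE:
            parts.append(text_without_notes(child))
        parts.append(child.tail or "")
    return "".join(parts)


p = ET.fromstring(PARA)
print(text_without_notes(p))  # The letter is dated 1499 and signed.
```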
Foreign Phrases
Graphics
<graphic> (indicates the location of an inline graphic, illustration, or figure)
<binaryObject> (encoded binary data embedding a graphic or other object)
The figure module provides <figure> and <figDesc> for more complex graphics and figures
Indexing
Simple Verse
Simple Drama
Bibliographic Citations
Other elements similar to <hi> and <distinct> include:
<emph>
<mentioned>
<soCalled>
<term>
<gloss>
<choice> (groups alternative editorial encodings)
Errors: <sic> (apparent error), <corr> (corrected error)
Regularization: <orig> (original form), <reg> (regularized form)
Abbreviation: <abbr> (abbreviated form), <expan> (expanded form)
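Because <choice> keeps both alternatives, one encoding can yield either an edited or an original-spelling view. The sketch below (Python 3, standard library) picks one child from each <choice> according to a preference set. The sample and the `EDITED`/`ORIGINAL` groupings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative sample (invented) with one <choice> of each kind listed above.
PARA = f"""<p xmlns="{TEI_NS}">The <choice><orig>publick</orig><reg>public</reg></choice> <choice><sic>libary</sic><corr>library</corr></choice> in <choice><abbr>St.</abbr><expan>Saint</expan></choice> Albans.</p>"""

EDITED = {"reg", "corr", "expan"}    # assumed grouping for an edited view
ORIGINAL = {"orig", "sic", "abbr"}   # assumed grouping for a diplomatic view


def local_name(element):
    """Strip the {namespace} part from an ElementTree tag."""
    return element.tag.split("}")[-1]


def render(element, prefer):
    """Flatten text, taking from each <choice> the child whose name is in `prefer`."""
    parts = [element.text or ""]
    for child in element:
        if local_name(child) == "choice":
            picked = next((c for c in child if local_name(c) in prefer), None)
            if picked is not None:
                parts.append(render(picked, prefer))
        else:
            parts.append(render(child, prefer))
        parts.append(child.tail or "")
    return "".join(parts)


p = ET.fromstring(PARA)
print(render(p, EDITED))    # The public library in Saint Albans.
print(render(p, ORIGINAL))  # The publick libary in St. Albans.
```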
The names and dates module of the TEI vastly expands the number of name/person/place/org related elements available.
<foreign> marks phrases belonging to some language other than that of the surrounding text
a @xml:lang attribute is used to indicate the ISO language code for the element content
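Reading the language of each <foreign> phrase is a matter of looking up @xml:lang in the XML namespace. A short sketch with an invented English paragraph, assuming Python 3's standard library.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative sample (invented): an English paragraph with French and Latin phrases.
PARA = f"""<p xmlns="{TEI_NS}" xml:lang="en">He had a certain <foreign xml:lang="fr">je ne sais quoi</foreign> and acted <foreign xml:lang="la">in loco parentis</foreign>.</p>"""


def foreign_phrases(p):
    """Return (@xml:lang, text) for each <foreign>."""
    return [(f.get(XML_LANG), "".join(f.itertext())) for f in p.iter(f"{{{TEI_NS}}}foreign")]


p = ET.fromstring(PARA)
print(p.get(XML_LANG), foreign_phrases(p))
```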
<lg> (contains one or more verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc.)
<l> (contains a single, possibly incomplete, line of verse)
More elements for metrical analysis are provided with the 'verse' module.
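Line groups can nest, for example a poem-level <lg> containing stanza-level <lg> elements. This sketch (Python 3, standard library) collects the lines of each stanza. The sample encodes the first stanza of Blake's "The Tyger", and the @type values "poem" and "stanza" are assumed choices.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
LG = f"{{{TEI_NS}}}lg"

# Illustrative sample: first stanza of Blake's "The Tyger"; @type values are assumed choices.
SAMPLE = f"""<lg xmlns="{TEI_NS}" type="poem">
  <lg type="stanza" n="1">
    <l n="1">Tyger Tyger, burning bright,</l>
    <l n="2">In the forests of the night;</l>
    <l n="3">What immortal hand or eye,</l>
    <l n="4">Could frame thy fearful symmetry?</l>
  </lg>
</lg>"""


def stanzas(poem):
    """Map each <lg> that directly contains <l> elements to its list of lines, keyed by @n."""
    result = {}
    for lg in poem.iter(LG):
        lines = ["".join(line.itertext()) for line in lg.findall("tei:l", NS)]
        if lines:  # the outer poem-level <lg> holds no <l> directly, so it is skipped
            result[lg.get("n")] = lines
    return result


poem = ET.fromstring(SAMPLE)
for n, lines in stanzas(poem).items():
    print(n, len(lines), lines[0])
```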
If converting an existing index, use nested lists. For auto-generated indexes:
<index> (marks an index entry) with optional @indexName attribute
The <term> element is used to mark a term inside an <index> element
The <index> element can self-nest for hierarchical index entries
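Self-nesting <index> entries carry enough structure to build a two-level index automatically. A sketch assuming Python 3's `xml.etree.ElementTree` (its limited XPath supports the `[@attr='value']` predicate used here); the sample and the index name "subjects" are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Illustrative sample (invented); "subjects" is an assumed @indexName.
PARA = f"""<p xmlns="{TEI_NS}">Notes on <index indexName="subjects"><term>fruit</term><index><term>apples</term></index></index>apples and <index indexName="subjects"><term>fruit</term><index><term>pears</term></index></index>pears.</p>"""


def build_index(p, index_name):
    """Collect each top-level <term> and the sub-terms nested under it, for one named index."""
    entries = defaultdict(set)
    for idx in p.findall(f".//tei:index[@indexName='{index_name}']", NS):
        top = idx.find("tei:term", NS)
        if top is None:
            continue
        subs = entries[top.text]  # registers the heading even without sub-entries
        for sub in idx.findall("tei:index/tei:term", NS):
            subs.add(sub.text)
    return {term: sorted(subs) for term, subs in sorted(entries.items())}


p = ET.fromstring(PARA)
print(build_index(p, "subjects"))  # {'fruit': ['apples', 'pears']}
```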
<sp> (an individual speech in a performance text, or a passage presented as such in a prose or verse text)
<speaker> (a specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment)
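With <sp> and <speaker> in place, a script can list who says what while skipping stage directions. This sketch assumes Python 3's standard library. The sample is loosely adapted from Hamlet 2.2, and the @who values are hypothetical IDs.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
SPOKEN = {f"{{{TEI_NS}}}p", f"{{{TEI_NS}}}l"}  # prose or verse inside a speech

# Illustrative sample loosely adapted from Hamlet 2.2; @who values are hypothetical IDs.
SAMPLE = f"""<div xmlns="{TEI_NS}" type="scene">
  <stage>Enter Hamlet, reading.</stage>
  <sp who="#polonius"><speaker>Polonius</speaker><p>What do you read, my lord?</p></sp>
  <sp who="#hamlet"><speaker>Hamlet</speaker><p>Words, words, words.</p></sp>
</div>"""


def speeches(div):
    """Return (speaker, spoken text) per <sp>, leaving out <speaker> and <stage>."""
    result = []
    for sp in div.iter(f"{{{TEI_NS}}}sp"):
        speaker = sp.find("tei:speaker", NS)
        spoken = " ".join("".join(child.itertext()).strip() for child in sp if child.tag in SPOKEN)
        result.append((speaker.text if speaker is not None else None, spoken))
    return result


scene = ET.fromstring(SAMPLE)
for who, words in speeches(scene):
    print(f"{who}: {words}")
```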
<bibl> (a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged)
Thanks to: Lou Burnard, Sebastian Rahtz, and many more in the TEI Community
Complete List of Dating Attributes
att.datable
@calendar: pointer (like #Gregorian) to a calendarDesc/calendar
@period: pointer (like #Hellenistic) to a description of that period

att.datable.w3c
@when: YYYY-MM-DD fixed point (like 1499-02-10, 0300-05, -0100-07-13, 0500)
@notBefore: YYYY-MM-DD fixed earliest possibility for uncertain date
@notAfter: YYYY-MM-DD fixed latest possibility for uncertain date
@from: YYYY-MM-DD fixed start point of date range
@to: YYYY-MM-DD fixed end point of date range

att.datable.iso
(@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)

att.datable.custom
(@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)

att.duration
att.duration.w3c
@dur: A W3C duration formulation such as "P1DT12H" (one day and twelve hours)
att.duration.iso
(@dur-iso)
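@notBefore and @notAfter bound an uncertain date, and partial values like "1500" or negative (BCE) years such as "-0100-07-13" make comparison slightly tricky. The sketch below is a hedged illustration in Python 3 with hypothetical helpers `parse_w3c` and `within`. It handles only YYYY, YYYY-MM and YYYY-MM-DD (no time zones) and treats a partial candidate as inside the bounds only if all of it is.

```python
def parse_w3c(value, latest=False):
    """Make a sortable (year, month, day) tuple from YYYY, YYYY-MM or YYYY-MM-DD.

    A leading '-' marks a negative (BCE) year, as in -0100-07-13. Missing parts
    are filled with the earliest values, or with the latest when latest=True, so
    that notAfter="1500" covers the whole year. Day 31 is only an ordering bound
    here, not necessarily a real calendar date.
    """
    negative = value.startswith("-")
    parts = value.lstrip("-").split("-")
    year = -int(parts[0]) if negative else int(parts[0])
    month = int(parts[1]) if len(parts) > 1 else (12 if latest else 1)
    day = int(parts[2]) if len(parts) > 2 else (31 if latest else 1)
    return (year, month, day)


def within(candidate, not_before=None, not_after=None):
    """True if the whole candidate value lies inside the inclusive bounds."""
    if not_before is not None and parse_w3c(candidate) < parse_w3c(not_before):
        return False
    if not_after is not None and parse_w3c(candidate, latest=True) > parse_w3c(not_after, latest=True):
        return False
    return True


print(within("1499-02-10", not_before="1499", not_after="1500"))  # True
print(within("1501-01-01", not_after="1500"))                     # False
```

Comparing tuples rather than strings matters here: as strings, "-0100" and "-0200" would sort the wrong way round for BCE years.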

<biblStruct> (for structured bibliographic citations)