An Overview of TEI Metadata
often called "data about data"
a term originally used only for electronic data, though its meaning has since broadened
data about the content, context, and structure of information resources
the catalogue record of the data
The General Purpose of Metadata
In general, metadata is meant to:
support the identification, retrieval, use, re-use, management, and preservation of information resources
enrich the informational value of an object
describe a collection, a single resource, or a component part of a larger resource
TEI requires some of its metadata to be stored inside the XML document, prefixed to the content. This information makes up the TEI header,
although, as we will see, some metadata can be included inside the <body> or pointed to outside the document. The header is:
used to store bibliographical information about both the electronic version(s) of the text and any physical, or analogue, source(s)
similar in its basic information to library cataloguing, which supports interoperability with other metadata standards
much like an electronic version of a title page attached to a printed work
Thanks to Lou Burnard, Sebastian Rahtz, and many others of the TEI Community
The TEI Header
The TEI header was designed with two goals in mind:
needs of bibliographers and librarians trying to document ‘electronic books’
needs of text analysts and digital editors trying to document ‘coding practices’ within digital resources
The result is that discussion of the header tends to be pulled in two directions...
The Librarian's Header
Conforms to standard bibliographic model, using similar terminology
Organized as a single source of information for bibliographic description of a digital resource, with established mappings to other such records (e.g. MARC, EAD)
General consensus from 'Best Practices for TEI in Libraries' from the TEI-LIB SIG
Pressure for greater and more exact constraints to improve precision of description
In general, a preference for
The Digital Editor's Header
Gives a polite nod to common bibliographic practice, but has a far wider scope
Supports a (potentially) huge range of very miscellaneous information, organized in fairly ad hoc or individualistic ways
Many different codes of practice in different user communities
Unpredictable combinations of narrowly encoded documentation systems and loose prose descriptions
Especially concerned with documenting editorial principles
Structure of the <teiHeader>
The TEI header has four main components:
<fileDesc> (file description) contains a full bibliographic description of a computer file. A "computer file" may actually correspond to several files across different operating systems.
<encodingDesc> (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.
<profileDesc> (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, and the participants and their setting (in short, just about everything not covered by the other header elements).
<revisionDesc> (revision description) summarizes the revision history for a file.
Only <fileDesc> is required; the others are optional.
Two Levels of Header
The TEI supports two 'levels' of header if you are using the <teiCorpus> element to collect several TEI texts:
corpus level metadata sets default properties for everything in a corpus
text level metadata sets specific properties for one component text of a corpus
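The default-and-override relationship between the two levels can be sketched in Python with the standard-library `xml.etree.ElementTree`. The corpus below is hypothetical and its headers are trimmed to the one property of interest (a valid header would also need a <fileDesc>); the helper names `declared_language` and `effective_languages` are our own, not part of any TEI tool.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical corpus: the corpus-level header sets English as the default,
# and the second text's own header overrides it with French.
CORPUS = """<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc><langUsage><language ident="en"/></langUsage></profileDesc>
  </teiHeader>
  <TEI xml:id="text1">
    <teiHeader/>
    <text><body><p>Hello</p></body></text>
  </TEI>
  <TEI xml:id="text2">
    <teiHeader>
      <profileDesc><langUsage><language ident="fr"/></langUsage></profileDesc>
    </teiHeader>
    <text><body><p>Bonjour</p></body></text>
  </TEI>
</teiCorpus>"""


def declared_language(header):
    """Return the @ident of the first <language> inside a header, or None."""
    if header is None:
        return None
    lang = header.find(".//tei:language", NS)
    return lang.get("ident") if lang is not None else None


def effective_languages(corpus_xml):
    """Map each text's xml:id to the language in force for it: the text-level
    setting if there is one, otherwise the corpus-level default."""
    root = ET.fromstring(corpus_xml)
    default = declared_language(root.find("tei:teiHeader", NS))
    result = {}
    for text in root.findall("tei:TEI", NS):
        own = declared_language(text.find("tei:teiHeader", NS))
        result[text.get(XML_ID)] = own if own is not None else default
    return result


print(effective_languages(CORPUS))  # {'text1': 'en', 'text2': 'fr'}
```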
Types of Content in the Header
prose description: series of paragraphs
phrase: character data, interspersed with phrase-level elements, but not paragraphs
grouping elements: specialised elements recording some structured information
declarations: elements whose names end with the suffix Decl (e.g. <editorialDecl>, <tagsDecl>) enclose information about specific encoding practices applied in the electronic text.
descriptions: elements whose names end with the suffix Desc (e.g. <projectDesc>, <settingDesc>) contain a prose description, possibly, but not necessarily, organised under specific headings by suggested sub-elements
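Because the convention is purely a naming one, it can be checked mechanically. This Python sketch (standard-library `xml.etree.ElementTree`, hypothetical sample header) simply sorts element names by suffix; note that it is a naming heuristic only, so a grouping element such as <fileDesc> is also counted as a "description".

```python
import xml.etree.ElementTree as ET

# Hypothetical header used only to exercise the naming convention.
HEADER = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt><title>Sample</title></titleStmt>
    <publicationStmt><p>Unpublished</p></publicationStmt>
    <sourceDesc><p>Born digital</p></sourceDesc>
  </fileDesc>
  <encodingDesc>
    <projectDesc><p>Goals of the project.</p></projectDesc>
    <editorialDecl><p>Spelling normalised.</p></editorialDecl>
  </encodingDesc>
</teiHeader>"""


def classify_header_elements(header_xml):
    """Group element names in document order by suffix: *Decl vs *Desc."""
    groups = {"declarations": [], "descriptions": []}
    for el in ET.fromstring(header_xml).iter():
        local = el.tag.split("}")[-1]  # strip the {namespace} prefix
        if local.endswith("Decl"):
            groups["declarations"].append(local)
        elif local.endswith("Desc"):
            groups["descriptions"].append(local)
    return groups


print(classify_header_elements(HEADER))
```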
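Skeleton of a minimal header:
<title> (inside <titleStmt>): a title is required
<publicationStmt>: publication information is required
<sourceDesc>: source information is required
<profileDesc>: profile description
<revisionDesc>: list of revisions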
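The three required parts of a minimal header can be checked with a few path lookups. A Python sketch using the standard-library `xml.etree.ElementTree`; the placeholder text in `MINIMAL` and the `REQUIRED` table are our own illustration, and a real project would validate against the TEI schema instead.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A minimal header with placeholder content in the three required parts.
MINIMAL = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt><title>A sample title</title></titleStmt>
    <publicationStmt><p>Unpublished draft.</p></publicationStmt>
    <sourceDesc><p>Born digital; no earlier source.</p></sourceDesc>
  </fileDesc>
</teiHeader>"""

# Required part -> path relative to <teiHeader>.
REQUIRED = {
    "title": "tei:fileDesc/tei:titleStmt/tei:title",
    "publication information": "tei:fileDesc/tei:publicationStmt",
    "source information": "tei:fileDesc/tei:sourceDesc",
}


def missing_parts(header_xml):
    """Return the names of required parts that are absent from the header."""
    root = ET.fromstring(header_xml)
    return [name for name, path in REQUIRED.items() if root.find(path, NS) is None]


print(missing_parts(MINIMAL))  # []
print(missing_parts('<teiHeader xmlns="http://www.tei-c.org/ns/1.0"/>'))
```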
Purpose of the data
Means of creation of the data
Time and date of creation
Creator or author of the data
Where the data was created
Standards used in creating the data
Size of the data in useful units
Related or supplemental data
Last revision date of the data
Stage of production of the data
Most headers are somewhere between the two.
William Shakespeare, Bodleian First Folio
The <fileDesc> element has several components:
<titleStmt>: provides a title for the resource and any associated statements of responsibility
<sourceDesc>: documents the sources from which the encoded text derives (if any)
<publicationStmt>: documents how the encoded text is published or distributed
<editionStmt>: yes, digital texts have editions too
<seriesStmt>: and they also fit into "series".
<extent>: how many floppy disks, gigabytes, files?
<notesStmt>: notes of various types
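These components must also appear in a fixed order. The Python sketch below encodes our reading of the TEI P5 content model for <fileDesc> (titleStmt, optional editionStmt and extent, publicationStmt, any seriesStmt, optional notesStmt, then one or more sourceDesc) as a regular expression over child names. Treat the model as an assumption to check against the Guidelines; for real work, validate against the official TEI schema rather than this hand-written check.

```python
import re
import xml.etree.ElementTree as ET

# Assumed content model: ? = optional, * = repeatable, + = one or more.
FILEDESC_MODEL = re.compile(
    r"titleStmt (editionStmt )?(extent )?publicationStmt "
    r"(seriesStmt )*(notesStmt )?(sourceDesc )+"
)


def filedesc_order_ok(filedesc_xml):
    """True if the children of <fileDesc> follow the assumed order."""
    children = [el.tag.split("}")[-1] for el in ET.fromstring(filedesc_xml)]
    sequence = "".join(name + " " for name in children)
    return FILEDESC_MODEL.fullmatch(sequence) is not None


GOOD = """<fileDesc xmlns="http://www.tei-c.org/ns/1.0">
  <titleStmt><title>Sample</title></titleStmt>
  <extent>1 file</extent>
  <publicationStmt><p>Unpublished</p></publicationStmt>
  <notesStmt><note>Draft</note></notesStmt>
  <sourceDesc><p>Born digital</p></sourceDesc>
</fileDesc>"""

# <sourceDesc> placed before <publicationStmt>: out of order.
BAD = """<fileDesc xmlns="http://www.tei-c.org/ns/1.0">
  <titleStmt><title>Sample</title></titleStmt>
  <sourceDesc><p>Born digital</p></sourceDesc>
  <publicationStmt><p>Unpublished</p></publicationStmt>
</fileDesc>"""

print(filedesc_order_ok(GOOD), filedesc_order_ok(BAD))  # True False
```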
More about <fileDesc>
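<titleStmt>: contains a mandatory <title>, which identifies the electronic file (not its source!),
optionally followed by additional titles, and by ‘statements of responsibility’, as appropriate, using specific elements such as <author> and <editor>
or the generic <respStmt>
<publicationStmt>: may contain
<p> to give prose (e.g. to say the text is unpublished)
or one or more of <publisher>, <distributor>, or <authority>, each followed by optional details such as <pubPlace>, <date>, <idno>, or <availability>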
Title and Responsibility Statements
Within <titleStmt>, you can repeat any of these elements as necessary, and document additional responsibilities with a generic <respStmt>.
The title of the electronic work should be derived from that of the source text, but because it names the electronic work it should be clearly distinguishable from the source's title.
At a minimum, identify the author of the text and (where appropriate) the creator of the file or corpus
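A Python sketch of building such a title statement with the standard-library `xml.etree.ElementTree`. The helper `build_title_stmt` and the encoder's name "A. N. Encoder" are hypothetical; the title deliberately differs from the source's own title, as recommended above.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI)  # serialise TEI as the default namespace


def t(name):
    """Qualify a TEI element name with the TEI namespace."""
    return f"{{{TEI}}}{name}"


def build_title_stmt(title, author, responsibilities):
    """Build a <titleStmt>: a title for the electronic file, the author of the
    text, and one generic <respStmt> per (responsibility, name) pair."""
    stmt = ET.Element(t("titleStmt"))
    ET.SubElement(stmt, t("title")).text = title
    ET.SubElement(stmt, t("author")).text = author
    for resp, name in responsibilities:
        resp_stmt = ET.SubElement(stmt, t("respStmt"))
        ET.SubElement(resp_stmt, t("resp")).text = resp
        ET.SubElement(resp_stmt, t("name")).text = name
    return stmt


stmt = build_title_stmt(
    "Hamlet: an electronic edition",
    "William Shakespeare",
    [("encoded by", "A. N. Encoder")],  # hypothetical file creator
)
print(ET.tostring(stmt, encoding="unicode"))
```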
Edition Statements and Extents
<editionStmt> can be used to document the details of this particular edition of the digital file.
It is optional for the first release, but mandatory for each later release.
<extent>: the approximate size of a text stored on some carrier medium or of some other object, digital or non-digital
It is sometimes used to document the number of words in a corpus.
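One way to record a corpus word count is a <measure> element with @unit and @quantity inside <extent>. This Python sketch (standard-library `xml.etree.ElementTree`) assumes that splitting on whitespace is an acceptable definition of a "word", which real projects may well refine; `word_count_extent` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def word_count_extent(texts):
    """Record the total number of whitespace-separated words in a corpus as an
    <extent> containing a <measure unit="words"> element."""
    total = sum(len(text.split()) for text in texts)
    return (
        f'<extent xmlns="{NS["tei"]}">'
        f'<measure unit="words" quantity="{total}">{total} words</measure>'
        "</extent>"
    )


extent = word_count_extent(["To be, or not to be", "that is the question"])
measure = ET.fromstring(extent).find("tei:measure", NS)
print(measure.get("quantity"))  # 10
```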
Publication Statement Example
About the <publicationStmt>
At least one of <publisher>, <distributor>, or <authority> must be present unless the entire publication statement is given as prose paragraphs using <p>.
If the creation date differs from the date of publication, the creation date should be given within <creation> (in <profileDesc>), not in the <publicationStmt>.
A formal licence may be entered in <licence>, inside <availability>.
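The "at least one agency, or prose only" rule can be expressed as a small check. A Python sketch with the standard-library `xml.etree.ElementTree`; the publisher name, date, and licence URL in the samples are placeholders, not taken from the presentation.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
AGENCIES = {"publisher", "distributor", "authority"}


def publication_stmt_ok(xml):
    """True if the statement names at least one agency, or consists
    entirely of prose <p> paragraphs."""
    children = [el.tag.split("}")[-1] for el in ET.fromstring(xml)]
    if AGENCIES.intersection(children):
        return True
    return bool(children) and all(name == "p" for name in children)


STRUCTURED = """<publicationStmt xmlns="http://www.tei-c.org/ns/1.0">
  <publisher>Hypothetical Digital Press</publisher>
  <pubPlace>Oxford</pubPlace>
  <date when="2024">2024</date>
  <availability>
    <licence target="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</licence>
  </availability>
</publicationStmt>"""
PROSE = '<publicationStmt xmlns="http://www.tei-c.org/ns/1.0"><p>Unpublished.</p></publicationStmt>'
DATE_ONLY = '<publicationStmt xmlns="http://www.tei-c.org/ns/1.0"><date>2024</date></publicationStmt>'

for sample in (STRUCTURED, PROSE, DATE_ONLY):
    print(publication_stmt_ok(sample))  # prints True, True, then False

# The licence sits inside <availability>.
licence = ET.fromstring(STRUCTURED).find("tei:availability/tei:licence", NS)
print(licence.get("target"))
```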
The Series Statement
A series, as documented in <seriesStmt>, may be:
a group of separate items that share a collective title applicable to the group
two or more volumes of items, similar in character and issued in sequence
a separately numbered sequence of volumes within a serial or serials
The optional <notesStmt> can contain notes on almost any aspect of the file or its contents:
These notes can be short statements, or many paragraphs long.
Where possible, take care to encode such information with more precise elements elsewhere in the TEI header.
For example, classifications of the text, such as 'reportage' or 'detective fiction', should be described under <profileDesc>.
The Source Description
All electronic works need to document their source, even 'born digital' ones! There is a variety of elements you may draw from:
prose description: just a <p>
<bibl> (bibliographic citation): contains free text and/or any mixture of bibliographic elements such as <author>, <title>, and <date>
<biblStruct> (structured): contains similar elements, but constrained in various ways according to bibliographic standards
<biblFull> (fully-structured): a special case for texts which were born TEI, replicating an embedded <fileDesc>
<listBibl>: may be used for lists of such descriptions, e.g. bibliographies
Specialised elements for spoken texts (<recordingStmt> etc.) and for manuscripts (<msDesc>)
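A processing tool often needs to know which kind of source description it is dealing with. This Python sketch (standard-library `xml.etree.ElementTree`) labels each child of <sourceDesc> using the categories listed above; the labels and sample citations are our own illustration, and anything unrecognised is reported as "other".

```python
import xml.etree.ElementTree as ET

# Labels (our own wording) for the kinds of source description listed above.
SOURCE_KINDS = {
    "p": "prose description",
    "bibl": "bibliographic citation",
    "biblStruct": "structured citation",
    "biblFull": "fully-structured citation",
    "listBibl": "list of descriptions",
    "recordingStmt": "recording of a spoken text",
    "msDesc": "manuscript description",
}


def describe_sources(source_desc_xml):
    """List the kind of each child of a <sourceDesc>."""
    root = ET.fromstring(source_desc_xml)
    return [SOURCE_KINDS.get(el.tag.split("}")[-1], "other") for el in root]


CITED = """<sourceDesc xmlns="http://www.tei-c.org/ns/1.0">
  <bibl><author>William Shakespeare</author> <title>Hamlet</title> <date>1623</date></bibl>
</sourceDesc>"""
BORN_DIGITAL = """<sourceDesc xmlns="http://www.tei-c.org/ns/1.0">
  <p>Born digital: no previous source exists.</p>
</sourceDesc>"""

print(describe_sources(CITED))         # ['bibliographic citation']
print(describe_sources(BORN_DIGITAL))  # ['prose description']
```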
Association between Header and Text
By default, everything asserted by a header is true of the text to which it is prefixed. This can be overridden:
when a text header overrides or amplifies a corpus-header setting
when model.declarable elements are selected by means of the @decls attribute (available on all model.declaring elements)
using special purpose selection/definition elements e.g.
Most components of the encoding description are declarable.
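Selection by @decls can be sketched as pointer resolution: the declarable element whose xml:id matches the pointer applies to the selecting element. This Python sketch (standard-library `xml.etree.ElementTree`, hypothetical document with a trimmed header) simplifies the real rules: it treats the first <editorialDecl> as the default instead of using TEI's @default attribute, and it ignores @decls inherited from ancestor elements.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical document: two alternative editorial declarations, one of
# which is selected for the second <div> by means of @decls.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <editorialDecl xml:id="ed-original"><p>Original spelling kept.</p></editorialDecl>
      <editorialDecl xml:id="ed-modern"><p>Spelling modernised.</p></editorialDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <div xml:id="d1"><p>First part.</p></div>
      <div xml:id="d2" decls="#ed-modern"><p>Second part.</p></div>
    </body>
  </text>
</TEI>"""


def editorial_policy(doc_xml, div_id):
    """Return the prose of the <editorialDecl> in force for one <div>."""
    root = ET.fromstring(doc_xml)
    decls = root.findall(".//tei:encodingDesc/tei:editorialDecl", NS)
    by_id = {d.get(XML_ID): d for d in decls}
    div = next(d for d in root.iter(f"{{{TEI}}}div") if d.get(XML_ID) == div_id)
    chosen = decls[0]  # simplification: first declaration acts as the default
    for pointer in div.get("decls", "").split():
        target = by_id.get(pointer.lstrip("#"))
        if target is not None:
            chosen = target
    return chosen.find("tei:p", NS).text


print(editorial_policy(DOC, "d1"))  # Original spelling kept.
print(editorial_policy(DOC, "d2"))  # Spelling modernised.
```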
The Encoding Description
<encodingDesc> groups notes about the procedures used when the text was encoded, either summarised in prose or within specific elements such as:
<projectDesc>: goals of the project
<samplingDecl>: sampling principles
<editorialDecl>: editorial principles, e.g. <correction>, <normalization>
<classDecl>: classification system/s used
<tagsDecl>: specifics about usage of particular elements
Detailed notes in <encodingDesc> could be used to generate a section of an editorial description.
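A Python sketch of that idea, using the standard-library `xml.etree.ElementTree`: it collects the prose paragraphs of selected <encodingDesc> components into headed plain-text sections. The section headings and sample content are hypothetical, and a real edition would more likely do this with an XSLT stylesheet.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"

# Section headings (our own wording) for some <encodingDesc> components.
SECTIONS = [
    ("projectDesc", "Aims of the project"),
    ("samplingDecl", "Sampling"),
    ("editorialDecl", "Editorial principles"),
]


def editorial_description(encoding_desc_xml):
    """Collect the prose paragraphs of selected <encodingDesc> components into a
    plain-text editorial description, one headed section per component."""
    root = ET.fromstring(encoding_desc_xml)
    sections = []
    for name, heading in SECTIONS:
        component = root.find(f"{{{TEI}}}{name}")
        if component is None:
            continue  # component not documented in this header
        # iter() also reaches <p> nested in sub-elements such as <normalization>
        paras = ["".join(p.itertext()).strip() for p in component.iter(f"{{{TEI}}}p")]
        sections.append("\n".join([heading, *paras]))
    return "\n\n".join(sections)


ENCODING_DESC = f"""<encodingDesc xmlns="{TEI}">
  <projectDesc><p>Hypothetical aims of the project.</p></projectDesc>
  <editorialDecl>
    <normalization><p>Long s has been silently normalised.</p></normalization>
  </editorialDecl>
</encodingDesc>"""

print(editorial_description(ENCODING_DESC))
```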
The Tagging Declaration
<tagsDecl> records the namespace of the elements used, tag frequencies, information about the usage of particular tags not specified elsewhere, and the default rendition of the text in the source.
<rendition>: structured information about appearance in the source document,
expressed using an informal prose description, a standard stylesheet language (CSS, XSL-FO), or a project-defined language.
<appInfo>: structured information about an application which has edited this TEI file.
This would usually be added automatically by the application editing the file.
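Tag frequencies are a typical example of information a tool can generate rather than an encoder typing it. This Python sketch (standard-library `xml.etree.ElementTree` and `collections.Counter`) counts elements in a hypothetical <text> and emits <tagUsage> entries with @gi and @occurs inside a <namespace>, the structure used within <tagsDecl>.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "http://www.tei-c.org/ns/1.0"


def tag_usage(text_xml):
    """Count each element in a <text> and record the counts as <tagUsage>
    children (@gi, @occurs) of a <namespace> element."""
    counts = Counter(el.tag.split("}")[-1] for el in ET.fromstring(text_xml).iter())
    namespace = ET.Element(f"{{{TEI}}}namespace", name=TEI)
    for gi, occurs in sorted(counts.items()):
        ET.SubElement(namespace, f"{{{TEI}}}tagUsage", gi=gi, occurs=str(occurs))
    return namespace


SAMPLE = f'<text xmlns="{TEI}"><body><p>One <hi>two</hi></p><p>Three</p></body></text>'
usage = tag_usage(SAMPLE)
print({u.get("gi"): u.get("occurs") for u in usage})
# {'body': '1', 'hi': '1', 'p': '2', 'text': '1'}
```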
<profileDesc> contains a collection of descriptions, categorised only as ‘non-bibliographic’.
Default members of the model.profileDescPart class include:
<creation>: information about the origination of the intellectual content of the text, e.g. time and place
<langUsage>: information about languages, registers, writing systems, etc. used in the text
<textClass>: classifications applied to the text by means of a list of specified criteria or by means of a collection of pointers
<particDesc>: information about the ‘participants’, either real or depicted, in the text
<handNotes>: information about the particular styles or hands distinguished within a manuscript
An example of <creation>
Here <listChange> records stages in changes to the document.
Further down, in <revisionDesc>, the same element is used to record changes to the electronic file.
Language and Writing System Used
The <langUsage> element is provided to document usage of languages and writing systems in the text. Languages are identified by their ISO codes (in TEI P5, BCP 47 language tags, which build on ISO 639).
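Each <language> can also carry @usage, the approximate percentage of the text in that language. A Python sketch (standard-library `xml.etree.ElementTree`) that reads the codes and percentages from a hypothetical <langUsage>; the 90/10 split is invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical text: mostly English, with some Latin.
LANG_USAGE = """<langUsage xmlns="http://www.tei-c.org/ns/1.0">
  <language ident="en" usage="90">English</language>
  <language ident="la" usage="10">Latin</language>
</langUsage>"""


def language_shares(lang_usage_xml):
    """Map each declared language tag (@ident) to its percentage (@usage)."""
    root = ET.fromstring(lang_usage_xml)
    return {lang.get("ident"): int(lang.get("usage", "0"))
            for lang in root.findall("tei:language", NS)}


shares = language_shares(LANG_USAGE)
print(shares)  # {'en': 90, 'la': 10}
```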
<textDesc> provides a description of a text in terms of its 'situational parameters', that is, a description of the situation within which the text was produced or experienced.
Participant descriptions can be either unstructured (prose paragraphs) or structured (e.g. a <listPerson> containing <person> elements).
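Structured participant descriptions pay off when other parts of the document point at them. This Python sketch (standard-library `xml.etree.ElementTree`) builds a lookup from xml:id to name for a hypothetical <particDesc> and resolves the @who pointer of an utterance (<u>) from a transcribed spoken text; the speaker names and ids are invented.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

PARTIC = """<particDesc xmlns="http://www.tei-c.org/ns/1.0">
  <listPerson>
    <person xml:id="spkA"><persName>Speaker A</persName></person>
    <person xml:id="spkB"><persName>Speaker B</persName></person>
  </listPerson>
</particDesc>"""

# A hypothetical utterance pointing at one of the participants.
UTTERANCE = '<u xmlns="http://www.tei-c.org/ns/1.0" who="#spkB">Good morning.</u>'


def participants(partic_xml):
    """Map each <person>'s xml:id to the text of its <persName>."""
    root = ET.fromstring(partic_xml)
    return {p.get(XML_ID): p.find("tei:persName", NS).text
            for p in root.iter(f"{{{TEI}}}person")}


people = participants(PARTIC)
speaker = ET.fromstring(UTTERANCE).get("who").lstrip("#")
print(people[speaker])  # Speaker B
```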
In <revisionDesc> you find a list of <change> elements, usually each with @when and @who attributes, indicating significant stages in the evolution of a document.
Conventionally, the most recent change is given first.
Changes can also be grouped in a <listChange> element. Used here, it is about the electronic file; used in <creation>, it is about the stages of textual production.
The revision description can be maintained manually, or by means of a version control system (like Subversion or Git).
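A script run alongside a version control system could append entries like this. The Python sketch below (standard-library `xml.etree.ElementTree` and `datetime`) inserts each new <change> as the first child, following the most-recent-first convention; `record_change`, the dates, and the `#ed1` responsibility pointer are hypothetical, and a header that groups changes in a <listChange> would insert there instead.

```python
import xml.etree.ElementTree as ET
from datetime import date

TEI = "http://www.tei-c.org/ns/1.0"


def record_change(revision_desc, when, who, note):
    """Insert a <change> as the first child, so the most recent change
    comes first."""
    change = ET.Element(f"{{{TEI}}}change", when=when.isoformat(), who=who)
    change.text = note
    revision_desc.insert(0, change)


rev = ET.fromstring(f'<revisionDesc xmlns="{TEI}">'
                    '<change when="2023-05-01" who="#ed1">First release</change>'
                    '</revisionDesc>')
record_change(rev, date(2024, 2, 10), "#ed1", "Corrected typos in act 3")
print([c.get("when") for c in rev])  # ['2024-02-10', '2023-05-01']
```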