An Overview of TEI Metadata

An Overview of TEI Metadata for an introductory TEI workshop. License: CC+BY; http://tinyurl.com/jc-TEIMetadata
by

James Cummings

on 24 January 2018


Transcript of An Overview of TEI Metadata

What is Metadata?
often called "data about data"
term originally used only with electronic data but its meaning has broadened
data about the content, context, and structure of information resources
the catalogue record of the data
The General Purpose of Metadata
In general metadata is meant to:
support the identification, retrieval, use, re-use, management, and preservation of information resources
enrich the informational value of an object
describe a collection, a single resource, or a component part of a larger resource
TEI Metadata
TEI requires some of its metadata to be stored inside the XML document, prefixed to the content. This information comprises the TEI header
although, as we will see, some can be included inside the <body> or pointed to outside the document. It is:
used to store bibliographical information about both the electronic version(s) of the text as well as any physical, or analogue, source(s)
basic information is similar to library cataloguing and supports interoperability with other metadata standards
much like an electronic version of a title page attached to a printed work
An Overview of TEI Metadata
@jamescummings
http://tinyurl.com/jc-TEIMetadata
CC+BY
Thanks to Lou Burnard, Sebastian Rahtz, and many others of the TEI Community
The TEI Header
The TEI header was designed with two goals in mind
needs of bibliographers and librarians trying to document ‘electronic books’
needs of text analysts and digital editors trying to document ‘coding practices’ within digital resources

The result is that discussion of the header tends to be pulled in two directions...
The Librarian's Header
Conforms to standard bibliographic model, using similar terminology
Organized as a single source of information for bibliographic description of a digital resource, with established mappings to other such records (e.g. MARC, EAD, etc.)
General consensus from 'Best Practice for TEI for Libraries' from TEI-LIB SIG
Pressure for greater and more exact constraints to improve precision of description
In general, a preference for
structured data
over
loose prose
The Digital Editor's Header
Gives a polite nod to common bibliographic practice, but has a far wider scope
Supports a (potentially) huge range of very miscellaneous information, organized in fairly ad hoc or individualistic ways
Many different codes of practice in different user communities
Unpredictable combinations of narrowly encoded documentation systems and loose prose descriptions
Especially concerned with the editorial principles
About <teiHeader>
Structure of the <teiHeader>
TEI Header Structure
The TEI header has four main components:
<fileDesc>
(file description) contains a full bibliographic description of a computer file. A "computer file" may actually correspond to several files across different operating systems.
<encodingDesc>
(encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.
<profileDesc>
(text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting. (just about everything not covered in the other header elements)
<revisionDesc>
(revision description) summarizes the revision history for a file.
Only
<fileDesc>
is required; the others are optional.
Two Levels of Header
The TEI supports two 'levels' of header if you are using the
<teiCorpus>
element to collect
<TEI>
documents.
corpus level metadata sets default properties for everything in a corpus
text level metadata sets specific properties for one component text of a corpus
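A minimal sketch of the two levels, assuming a corpus with a single member text. All titles and prose are invented placeholders, and the "..." stands for content left out of the sketch.

```xml
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Corpus-level header: defaults for every text in the corpus -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An example corpus (hypothetical)</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished workshop example.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Sources are described in the header of each text.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- Text-level header: properties specific to this one text -->
  <TEI>
    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title>First text of the corpus (hypothetical)</title>
        </titleStmt>
        <publicationStmt>
          <p>Unpublished workshop example.</p>
        </publicationStmt>
        <sourceDesc>
          <p>Transcribed from a printed copy.</p>
        </sourceDesc>
      </fileDesc>
    </teiHeader>
    <text>
      <body>
        <p>...</p>
      </body>
    </text>
  </TEI>
</teiCorpus>
```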
Types of Content in the Header
Free prose:
prose description: series of paragraphs
<p>
phrase: character data, interspersed with phrase-level elements, but not paragraphs
grouping elements: specialised elements recording some structured information
declarations: Elements whose names end with the suffix Decl (e.g.
<subjectDecl>
,
<refsDecl>
) enclose information about specific encoding practices applied in the electronic text.
descriptions: Elements whose names end with the suffix Desc (e.g.
<settingDesc>
,
<projectDesc>
) contain a prose description, possibly, but not necessarily, organised under some specific headings by suggested sub-elements
An Example
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Title is required</title>
      <author>Author</author>
    </titleStmt>
    <editionStmt>
      <edition>Edition</edition>
    </editionStmt>
    <extent>Extent</extent>
    <publicationStmt>
      <p>Publication information is required</p>
    </publicationStmt>
    <seriesStmt>
      <title>Series</title>
    </seriesStmt>
    <notesStmt>
      <note>Notes</note>
    </notesStmt>
    <sourceDesc>
      <p>Source information is required</p>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <p>Encoding Description</p>
  </encodingDesc>
  <profileDesc>
    <!-- Profile Description -->
  </profileDesc>
  <revisionDesc>
    <change>List of revisions</change>
  </revisionDesc>
</teiHeader>

<teiHeader> Template
Some examples:
Purpose of the data
Means of creation of the data
Time and date of creation
Creator or author of the data
Where the data was created
Standards used in creating the data
Size of the data in useful units
Related or supplemental data
Last revision date of the data
Stage of production of the data
Most headers are somewhere between the two.
Minimal Header
William Shakespeare, Bodleian First Folio
File Description
The
<fileDesc>
element has some
mandatory
elements:
<titleStmt>
: provides a title for the resource and any associated statements of responsibility
<sourceDesc>
: documents the sources from which the encoded text derives (if any)
<publicationStmt>
: documents how the encoded text is published or distributed

and some
optional
ones:
<editionStmt>
: yes, digital texts have editions too
<seriesStmt>
: and they also fit into "series".
<extent>
: how many floppy disks, gigabytes, files?
<notesStmt>
: notes of various types
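
A minimal sketch of a header that contains only the mandatory parts of <fileDesc>, each given as plain prose. The title and descriptions are invented placeholders, not taken from any real edition.

```xml
<teiHeader>
  <fileDesc>
    <!-- Mandatory: a title for the electronic file -->
    <titleStmt>
      <title>A hypothetical text: an electronic transcription</title>
    </titleStmt>
    <!-- Mandatory: how the file is published or distributed -->
    <publicationStmt>
      <p>Unpublished; prepared for a workshop exercise.</p>
    </publicationStmt>
    <!-- Mandatory: where the text came from -->
    <sourceDesc>
      <p>Transcribed from a printed copy.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```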

More about <fileDesc>
<titleStmt>
: contains a mandatory
<title>
which identifies the electronic file (not its source!)
optionally followed by additional titles, and by ‘statements of responsibility’, as appropriate, using
<author>
,
<editor>
,
<sponsor>
,
<funder>
,
<principal>
or the generic
<respStmt>

<publicationStmt>
: may contain
<p> to give prose (e.g. to say the text is unpublished)
or
one or more
<publisher>
,
<distributor>
,
<authority>
, each followed by
<pubPlace>
,
<address>
,
<availability>
,
<idno>
etc.
Title and Responsibility Statements
Within
<titleStmt>
, you can repeat any of these elements as necessary, and document additional responsibilities with a generic
<respStmt>
:
The title of the electronic work should be derived from the source text, but is for the electronic work so should be clearly distinguishable from it.
At a minimum, identify the author of the text and (where appropriate) the creator of the file or corpus
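One way this might look, as a sketch only. The title, author, and encoder names are hypothetical; note that the title describes the electronic file rather than the source.

```xml
<titleStmt>
  <!-- Title of the electronic work, derived from but distinct from the source title -->
  <title>A Hypothetical Novel: an electronic transcription</title>
  <author>Jane Example</author>
  <!-- Generic statement of responsibility for the creator of the file -->
  <respStmt>
    <resp>Transcribed and encoded by</resp>
    <name>A. N. Encoder</name>
  </respStmt>
</titleStmt>
```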
Edition Statements and Extents
<editionStmt>
can be used to document the details of this particular edition
of the digital file
optional for the first release, but is mandatory for each later release

<extent>
approximate size of a text stored on some carrier medium or of some other object, digital or non-digital
is sometimes used to document number of words in a corpus
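A short sketch of both elements as they might appear inside <fileDesc>. The release number, date, and word count are invented for illustration.

```xml
<editionStmt>
  <!-- Describes this release of the digital file, not the printed source -->
  <edition n="2">Second release, <date when="2018-01-24">24 January 2018</date></edition>
</editionStmt>
<!-- Approximate size, here expressed as a word count -->
<extent>approximately 25,000 words</extent>
```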
Publication Statement Example
About the <publicationStmt>
mandatory element
At least one of
<publisher>
,
<distributor>
and/or
<authority>
must be present unless the entire publication statement is given as prose paragraphs using
<p>
If the creation date is different than the date of publication, creation date should be given within
<profileDesc>
, not in the
<publicationStmt>
formal license may be entered in
<licence>
included in
<availability>
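A possible structured publication statement, sketched under the assumption of a freely licensed text. The publisher name, place, and date are placeholders; the licence URL is the standard Creative Commons Attribution 4.0 address.

```xml
<publicationStmt>
  <!-- At least one of publisher, distributor, or authority comes first -->
  <publisher>Example University Press (hypothetical)</publisher>
  <pubPlace>Oxford</pubPlace>
  <date when="2018">2018</date>
  <availability status="free">
    <!-- The formal licence sits inside availability -->
    <licence target="https://creativecommons.org/licenses/by/4.0/">
      Distributed under a Creative Commons Attribution 4.0 International licence.
    </licence>
  </availability>
</publicationStmt>
```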
The Series Statement
Series statements include:
separate items that share a collective title applicable to the group
two or more volumes of items, similar in character and issued in sequence
separately numbered sequence of volumes within a serial or serials
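A sketch of a series statement for a text issued as the third volume of a series. The series title and volume number are hypothetical.

```xml
<seriesStmt>
  <!-- level="s" marks this as a series title -->
  <title level="s">Example Texts Series</title>
  <biblScope unit="volume">3</biblScope>
</seriesStmt>
```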
Notes Statement
The optional <notesStmt> can contain notes on almost any aspect of the file or its contents:
These notes can be short statements, or many paragraphs long.
Where possible, take care to encode such information with more precise elements elsewhere in the TEI header
For example,
text types
, such as 'reportage' or 'detective fiction', should be described under <profileDesc>
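The contrast could be sketched as follows, with invented content. The note records something that has no more precise home; the text type goes in <profileDesc> instead, here using <textClass> with a keyword.

```xml
<!-- A general note with no better place elsewhere in the header -->
<notesStmt>
  <note>Page images were unavailable for pages 12 to 14 (hypothetical example).</note>
</notesStmt>

<!-- A text type is recorded in profileDesc rather than in a note -->
<profileDesc>
  <textClass>
    <keywords>
      <term>detective fiction</term>
    </keywords>
  </textClass>
</profileDesc>
```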
The Source Description
All electronic works need to document their source, even 'born digital' ones! There are a variety of elements you may draw from:
prose description, just a
<p>
<bibl>
(bibliographic citation): contains free text and/or any mixture of bibliographic elements such as
<author>
,
<publisher>
etc.
<biblStruct>
(structured) contains similar elements but constrained in various ways according to bibliographic standards
<biblFull>
(fully-structured) special-cases texts which were born TEI by replicating an embedded
<fileDesc>
A
<listBibl>
may be used for lists of such descriptions, e.g. bibliographies
Specialised elements for spoken texts (
<recordingStmt>
etc.) and for manuscripts (
<msDesc>
)
Authority lists:
<listPerson>
,
<listPlace>
,
<listOrg>
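Two alternative source descriptions, sketched with made-up bibliographic details: prose for a born-digital text, and a structured citation for a printed source.

```xml
<!-- Alternative 1: a born-digital text, described in prose -->
<sourceDesc>
  <p>Born digital: no previous source exists.</p>
</sourceDesc>

<!-- Alternative 2: a printed source, described with biblStruct -->
<sourceDesc>
  <biblStruct>
    <monogr>
      <author>Jane Example</author>
      <title>A Hypothetical Printed Book</title>
      <imprint>
        <pubPlace>London</pubPlace>
        <publisher>Example Press</publisher>
        <date when="1899">1899</date>
      </imprint>
    </monogr>
  </biblStruct>
</sourceDesc>
```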
Association between Header and Text
By default everything asserted by a header is true of the text to which it is prefixed. This can be over-ridden
as when a text header over-rides or amplifies a corpus-header setting
when model.declarable elements are selected by means of the @decls attribute (available on all model.declaring elements)
using special purpose selection/definition elements e.g.
<catRef>
and
<taxonomy>

Most components of the encoding description are declarable.
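As an illustration of the selection/definition pattern, a taxonomy can be defined once and then selected for a particular text with <catRef>. The identifiers genre and genre.detective are hypothetical.

```xml
<!-- In the header's encodingDesc: define the categories -->
<classDecl>
  <taxonomy xml:id="genre">
    <category xml:id="genre.detective">
      <catDesc>Detective fiction</catDesc>
    </category>
  </taxonomy>
</classDecl>

<!-- In the header's profileDesc: select a category for this text -->
<textClass>
  <catRef scheme="#genre" target="#genre.detective"/>
</textClass>
```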
The Encoding Description
<encodingDesc> groups notes about the procedures used when the text was encoded, either summarised in prose or within specific elements such as
<projectDesc>
: goals of the project
<samplingDecl>
: sampling principles
<editorialDecl>
: editorial principles, e.g.
<correction>
,
<hyphenation>
,
<interpretation>
,
<normalization>
,
<punctuation>
,
<quotation>
,
<segmentation>

<classDecl>
: classification system/s used
<tagsDecl>
: specifics about usage of particular elements

Detailed notes in <encodingDesc> could be used to generate a section of an editorial description.
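A small sketch of an encoding description that mixes a project description with two editorial declarations. The policies described are invented examples, not recommendations.

```xml
<encodingDesc>
  <projectDesc>
    <p>Texts were encoded as teaching material for an introductory workshop.</p>
  </projectDesc>
  <editorialDecl>
    <correction>
      <p>Obvious printing errors are corrected, and the original reading is retained in the markup.</p>
    </correction>
    <!-- eol="some" signals that only some end-of-line hyphens were kept -->
    <hyphenation eol="some">
      <p>End-of-line hyphens are removed except where the word is normally hyphenated.</p>
    </hyphenation>
  </editorialDecl>
</encodingDesc>
```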
The Tagging Declaration
<tagsDecl>
records element namespaces, tag frequencies, information about the usage of particular tags not specified elsewhere, and the default rendition of the text in the source.
<rendition>
: structured information about appearance in the source document
rendered using informal prose description, standard stylesheet language (CSS, XSL-FO), or project-defined language.
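A possible tagging declaration, assuming the rendition is expressed in CSS. The identifier italic and the count of 28 are made up for illustration.

```xml
<tagsDecl>
  <!-- How italic text appears in the source, expressed in CSS -->
  <rendition xml:id="italic" scheme="css">font-style: italic;</rendition>
  <!-- Usage of elements in the TEI namespace -->
  <namespace name="http://www.tei-c.org/ns/1.0">
    <tagUsage gi="hi" occurs="28">Marks text that is italicised in the source.</tagUsage>
  </namespace>
</tagsDecl>
```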
Application Information
<appInfo>
: structured information about an application which has edited this TEI file.
This would usually be added automatically by the application editing the file.
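A sketch of what an editing application might write into <encodingDesc>. The application name, version, and date are hypothetical.

```xml
<appInfo>
  <!-- ident and version identify the application; when records its last use -->
  <application ident="ExampleEditor" version="1.0" when="2018-01-24">
    <label>Example Editor</label>
  </application>
</appInfo>
```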
Profile Description
The
<profileDesc>
contains a collection of descriptions, categorised only as ‘non-bibliographic’.

Default members of the model.profileDescPart class include:
<creation>
: information about the origination of the intellectual content of the text, e.g. time and place
<langUsage>
: information about languages, registers, writing systems etc used in the text
<textDesc>
and
<textClass>
: classifications applied to the text by means of a list of specified criteria or by means of a collection of pointers
<particDesc>
and
<settingDesc>
: information about the ‘participants’, either real or depicted, in the text, and about the settings in which they interact
<handNotes>
: information about the particular style or hand distinguished within a manuscript
An example of <creation>
Here <listChange> records stages in changes to the document.

Further down, in <revisionDesc>, the same element is used to record changes to the electronic file.
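A hedged sketch of such a <creation> element, with an invented date, place, and writing stages.

```xml
<profileDesc>
  <creation>
    <date when="1895">1895</date>
    <name type="place">London</name>
    <!-- Stages in the production of the text itself, not of the electronic file -->
    <listChange>
      <change xml:id="stage1">First draft, written in ink.</change>
      <change xml:id="stage2">Later revisions, added in pencil.</change>
    </listChange>
  </creation>
</profileDesc>
```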
Language and Writing System Used
The
<langUsage>
element is provided to document usage of languages and writing systems in the text. Languages are identified by their ISO codes:
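For instance, a text that is mostly English with some Latin might be declared as below. The percentages are invented; the ident values are language tags built from ISO 639 codes.

```xml
<langUsage>
  <!-- usage gives the approximate percentage of the text in each language -->
  <language ident="en" usage="90">English</language>
  <language ident="la" usage="10">Latin</language>
</langUsage>
```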
Text Description
<textDesc>
provides a description of a text in terms of its 'Situational parameters', a description of the situation within which the text was produced or experienced.
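A sketch of a text description for a novel, modelled on the style of example given in the TEI Guidelines. Each child element records one situational parameter; the values chosen here are illustrative.

```xml
<textDesc n="novel">
  <channel mode="w">print; part issues</channel>
  <constitution type="single"/>
  <derivation type="original"/>
  <domain type="art"/>
  <factuality type="fiction"/>
  <interaction type="none"/>
  <preparedness type="prepared"/>
  <purpose type="entertain" degree="high"/>
  <purpose type="inform" degree="medium"/>
</textDesc>
```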
Participant Description
Participant descriptions can either be unstructured (as above) or structured (as below)
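The two styles could be sketched as follows. The speaker and the identifier SPK1 are hypothetical.

```xml
<!-- Unstructured: a prose description -->
<particDesc>
  <p>Two speakers, both students in their twenties.</p>
</particDesc>

<!-- Structured: one person element per participant -->
<particDesc>
  <listPerson>
    <person xml:id="SPK1">
      <persName>Speaker One</persName>
      <occupation>student</occupation>
    </person>
  </listPerson>
</particDesc>
```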
Revision Description
Inside
<revisionDesc>
you find a list of
<change>
elements, usually each with
@when
and
@who
attributes, indicating significant stages in the evolution of a document.
Conventionally, the most recent change is given first.
Can be given in a <listChange> element. Used here it is about the electronic file, used in <creation> it is about the stages of textual production.
Can be maintained manually, or done by means of a version control system (like Subversion or Git)
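A minimal sketch of a revision description with the most recent change first. In TEI P5 the date of a change is given with the when attribute; the dates and the pointer #JC (to a person defined elsewhere) are placeholders.

```xml
<revisionDesc>
  <!-- Most recent change first -->
  <change when="2018-01-24" who="#JC">Corrected typos in the header.</change>
  <change when="2018-01-10" who="#JC">Created the file.</change>
</revisionDesc>
```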