TEI

An Introduction to TEI P5 XML and the oXygen XML Editor: DHOxSS 2015, IntroDH Workshop. License: CC+By; http://tinyurl.com/dhoxss-introTEI
by James Cummings, 19 January 2016


TEI Guidelines
@jamescummings
What is the TEI?
(The Text Encoding Initiative)
An international consortium of institutions, projects and individual members
A community of users and volunteers
A freely available manual: a set of regularly maintained and updated recommendations known as 'The Guidelines'
Definitions, examples, and discussion of over 540 markup distinctions for text, image facsimiles, genetic editing, etc.
A mechanism for producing customized schemas for validating your project's digital texts
A set of free and openly licensed, customizable tools and stylesheets for transformations to many formats (e.g. HTML, Word, PDF, Databases, RDF/LinkedData, Slides, ePub, etc.)
A simple consensus-based way of organizing and structuring textual (and other) resources
A format for documenting your interpretation and understanding of a text (and how text functions)
An archival, well-understood format for long-term preservation of digital data and metadata
Whatever you make it! It is a community-driven standard
What the TEI is not:
the only standard in this area
objective or non-interpretative
used consistently even within the same project (never mind in other ones)
fixed and unchanging
your research end-point
automatic publication or long-term preservation
The benefits of a shared vocabulary far outweigh the effort of learning the TEI
Full TEI Schema
Modules
Simple analytic mechanisms
Certainty and uncertainty
Core elements
Corpus texts
Dictionaries
Performance texts
Tables, formulæ, notated music, and figures
Character and glyph documentation
The TEI Header
Feature structures
Linking, segmentation and alignment
Manuscript Description
Names and dates
Graphs, networks, and trees
Transcribed Speech
Documentation of TEI modules
Critical Apparatus
Default text structure
Transcription of primary sources
Verse structures
Customized TEI
Modules
(the same modules as in the full schema, from which each project selects only those it needs)
TEI ODD Customisation
But of course you want to remove specific elements from these 'modules' as well.

No project will need all of them!
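To make this concrete, here is a minimal sketch of the heart of an ODD customisation for a hypothetical project called `myProject`. The `schemaSpec` selects whole modules with `moduleRef`, takes only two elements from the names and dates module via the `include` attribute, and deletes one core element with `elementSpec mode="delete"`. The particular selection is purely illustrative.

```xml
<!-- Hypothetical customisation: "myProject" and the chosen modules are examples only -->
<schemaSpec ident="myProject" start="TEI">
  <!-- Modules that almost every TEI schema needs -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- Take only two elements from the names and dates module -->
  <moduleRef key="namesdates" include="persName placeName"/>
  <!-- Remove a core element this project will never use -->
  <elementSpec ident="said" module="core" mode="delete"/>
</schemaSpec>
```

Processing an ODD like this with the TEI tools (for example Roma or the TEI stylesheets) generates a schema that accepts only what was selected, while the ODD itself documents those choices.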
Chaining of TEI ODD Customisations
One of the interesting developments in TEI ODD design is that you can 'chain' customisations.

This means that if a TEI Community or Project makes a customisation then others can come along and make their own customisation that points to this project's TEI ODD as a source.

This enables projects to truly say "We're very much like that project over there (e.g. EpiDoc), but we need to add back in this element that they removed" and to document this in a machine-processable manner.
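A sketch of what chaining might look like, assuming the base project's ODD has been saved under the hypothetical name `base-project.odd`: the `source` attribute on `schemaSpec` makes that customisation, rather than the full Guidelines, the starting point, so the local ODD only has to record its own differences (here, one further deletion).

```xml
<!-- Hypothetical chained customisation: "myVariant" and "base-project.odd" are examples only -->
<schemaSpec ident="myVariant" start="TEI" source="base-project.odd">
  <!-- Modules are now drawn from the base project's ODD, not the full TEI -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- Local differences from the base project are documented here -->
  <elementSpec ident="soCalled" module="core" mode="delete"/>
</schemaSpec>
```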
OxGarage
Freely available web frontend to underlying XSLT conversions
REST API for scripts doing bulk conversions
Pipelined conversions through many steps (e.g. DOCX to TEI P5 to ePub)
Often uses TEI P5 as pivot format

because we need to interchange resources
between people
(increasingly) between machines
because we need to integrate resources
of different media types
from different technical contexts
because we need to preserve resources
cryogenics is not the (full) answer!
we need to preserve metadata as well as data
Why do we need standards like this?
A document is "TEI Conformant" if and only if it:
is a well-formed XML document
can be validated against a TEI Schema, that is, a schema derived from the TEI Guidelines
conforms to the TEI Abstract Model
uses the TEI Namespace (and other namespaces where relevant) correctly
is documented by means of a TEI Conformant ODD file which refers to the TEI Guidelines

TEI Conformance
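A minimal skeleton showing two of these conformance requirements in practice: the root element declares the TEI namespace, and an `xml-model` processing instruction associates the file with a schema. The URL is the TEI Consortium's published `tei_all` RELAX NG schema, used here as an assumption; a project would normally point to the schema generated from its own ODD, and the placeholder comments would need real content before the file validated.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Associate the document with a schema (here tei_all; normally your own ODD's output) -->
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng"
            type="application/xml"
            schematypens="http://relaxng.org/ns/structure/1.0"?>
<!-- Every TEI element lives in the TEI namespace -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- placeholder: metadata goes here -->
  </teiHeader>
  <text>
    <body>
      <p>...</p>
    </body>
  </text>
</TEI>
```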
Standardization should not mean ‘Do what I do’, but rather ‘Explain what you need to do in terms I can understand’

Version Date
2.9.1 2015-10-15
2.8.0 2015-04-06
2.7.0 2014-09-16
2.6.0 2014-01-20
2.5.0 2013-07-26
2.4.0 2013-07-05
2.3.0 2013-01-17
2.2.0 2012-10-25
2.1.0 2012-05-15
2.0.2 2012-02-02
2.0.1 2011-12-22
2.0.0 2011-12-16
1.9.1 2011-03-05
1.9.0 2011-02-25
1.8.0 2010-11-05
1.7.0 2010-07-06
1.6.0 2010-02-12
1.5.0 2009-11-08
1.4.1 2009-07-01
1.4.0 2009-06-20
1.3.0 2009-02-01
1.2.0 2008-10-31
1.1.0 2008-07-04
1.0.1 2008-02-03
1.0.0 2007-11-02
Versions of TEI P5
TEI Development
The TEI Guidelines are constantly being improved and as such are an evolving history of Digital Humanities concerns
But any individual project can choose to stay with any earlier version
The elected TEI Technical Council takes bug reports and feature requests from the community and implements them
You (yes, you!) can participate in the community mailing list (TEI-L) and point out bugs or make feature requests on:
http://github.com/TEIC
http://www.tei-c.org/oxgarage/
The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of Guidelines which specify encoding methods for machine-readable texts chiefly in the humanities, social sciences and linguistics.
This talk only introduces you to the TEI Guidelines; if you want a full week of training, come back to DHOxSS 2016 and take the TEI workshop!
About The TEI
The Text Encoding Initiative was born into a very different world:
the world wide web did not exist
the tunnel beneath the English Channel was still being built
a state called the Soviet Union had just launched a space station called Mir
serious computing was done on mainframes
most people didn't have mobile phones
1987 was a long time ago...
Corpus linguistics and ‘artificial intelligence’ had created a demand for large scale lexical resources in academia and beyond
Advances in text processing were beginning to affect lexicography and document management systems (e.g. TeX, Scribe, troff...)
The Internet existed (but the 'web' didn't yet) and theories about how to use it 'hypertextually' abounded
There was more data than anyone could read in one go, and it needed visualization of some sort
Books, articles, and even courses in something called "Computing in the Humanities" were becoming commonplace

...but we also had familiar problems

International membership consortium re-established 2000 (see http://www.tei-c.org/)
The TEI Guidelines, originally envisaged as a single (large) reference manual, are increasingly used as a modular online resource
The Guidelines embody a broad consensus about the significant particularities of a huge range of textual materials
expressed both in prose and by means of formal definitions
the definitions can be expressed as a grammar or schema:
TEI P1-P3 (1991-1999) : SGML DTD
TEI P4 (2000) : SGML or XML DTD
TEI P5 (2007-) : XML DTD, W3C Schema, or RELAX NG
The TEI is a modular system, which must be customized for particular applications
... the TEI is owned and developed by an active international community
The TEI now
Markup makes explicit the distinctions we want to make when processing a string of bytes
Markup is a way of naming and characterizing the parts of a text in a formalized way
Markup provides additional levels of annotation on data
It's (usually) more useful to mark up what we think things are than what they look like
Defining Markup
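A small illustration of the last point, using a hypothetical sentence: both encodings below could be rendered identically in italics, but only the second records why the words are italic, so a processor could, for example, list every title in a collection.

```xml
<!-- Presentational: records only what the words look like -->
<p>She read <hi rend="italic">Hamlet</hi> on board the <hi rend="italic">Titanic</hi>.</p>

<!-- Descriptive: records what the words are -->
<p>She read <title>Hamlet</title> on board the <name type="ship">Titanic</name>.</p>
```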
Markup makes explicit to a machine what is implicit to a person
The application of markup to a document can be an intellectual activity
In deciding what markup to apply, and how this represents the original, one is undertaking the task of an editor
There is (almost) no such thing as neutral markup -- all of it involves interpretation
Markup can assist in answering research questions, and deciding what markup is needed to enable such questions to be answered can be a research activity in itself
Good textual encoding is never as easy or quick as people would believe
Detailed document analysis is needed before encoding for the resulting markup to be useful
Markup as a scholarly activity
Imagine you are going to mark up several thousand pages of complex material...
Which features are you going to mark up?
Why are you choosing to mark up this feature?
How reliably and consistently can you do this?
Might some of this work be automated?
What needs human decision making?
A useful mental exercise
Now, imagine your budget has been halved. Repeat the exercise!
vi. Languages and Character Sets

The documents which users of these Guidelines may wish to encode encompass all kinds of material, potentially expressed in the full range of written and spoken human languages, including the extinct, the non-existent, and the conjectural. Because of this wide scope, special attention has been paid to two particular aspects of the representation of linguistic information often taken for granted: language identification, and character encoding.

Sections include:
Language Identification
Characters and Character Sets
1. The TEI Infrastructure
This chapter describes the infrastructure for the encoding scheme defined by these Guidelines. It introduces the conceptual framework within which the following chapters are to be understood, and the means by which that conceptual framework is implemented.

Sections include:
TEI Modules
Defining a TEI Schema
The TEI Class System
Macros
The TEI Infrastructure Module
iv. About These Guidelines
This is an initial chapter explaining the notation, background, and future development of the Guidelines

Sections include:
Structure and Notational Conventions of this Document
Historical Background
Future Developments and Version Numbers

7. Performance Texts
This module is intended for use when encoding printed dramatic texts, screenplays or radio scripts, and written transcriptions of any other form of performance.

Sections include:
Front and Back Matter
The Body of a Performance Text
Other Types of Performance Text
Module for Performance Texts

Elements Defined:
<actor>, <camera>, <caption>, <castGroup>, <castItem>, <castList>, <epilogue>, <move>, <performance>, <prologue>, <role>, <roleDesc>, <set>, <sound>, <spGrp>, <tech>, <view>

<set>
<p>The Scene, an un-inhabited Island.</p>
</set>

<castList>
<head>Names of the Actors.</head>
<castItem>Alonso, K. of Naples</castItem>
<castItem>Sebastian, his Brother.</castItem>
<castItem>Prospero, the right Duke of Millaine.</castItem>
</castList>

8. Transcriptions of Speech
The module described in this chapter is intended for use with a wide variety of transcribed spoken material.

Sections include:
General Considerations and Overview
Documenting the Source of Transcribed Speech
Elements Unique to Spoken Texts
Elements Defined Elsewhere
Module for Transcribed Speech

Elements Defined:
<broadcast>, <equipment>, <incident>, <kinesic>, <pause>, <recording>, <recordingStmt>, <scriptStmt>, <shift>, <u>, <vocal>, <writing>

<u who="#a">look at this</u>
<writing who="#a" type="newspaper" gradual="false"> Government claims economic problems
<soCalled>over by June</soCalled>
</writing>
<u who="#a">what nonsense!</u>

9. Dictionaries
This chapter defines a module for encoding lexical resources of all kinds, in particular human-oriented monolingual and multilingual dictionaries, glossaries, and similar documents.

Sections include:
Dictionary Body and Overall Structure
The Structure of Dictionary Entries
Top-level Constituents of Entries
Headword and Pronunciation References
Typographic and Lexical Information in Dictionary Data
Unstructured Entries
The Dictionary Module

Elements Defined:
<case>, <colloc>, <def>, <dictScrap>, <entry>, <entryFree>, <etym>, <form>, <gen>, <gram>, <gramGrp>, <hom>, <hyph>, <iType>, <lang>, <lbl>, <mood>, <number>, <oRef>, <oVar>, <orth>, <pRef>, <pVar>, <per>, <pos>, <pron>, <re>, <sense>, <stress>, <subc>, <superEntry>, <syll>, <tns>, <usg>, <xr>

<entry>
<form>
<orth>competitor</orth>
<hyph>com|peti|tor</hyph>
<pron>k@m"petit@(r)</pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<def>person who competes.</def>
</entry>

5. Characters, Glyphs, and Writing Modes
Text encoders sometimes find that the published repertoire of Unicode characters is inadequate to their needs, whether for ancient languages or for recording particular variant glyph forms.

Sections include:
Is Your Journey Really Necessary?
Markup Constructs for Representation of Characters and Glyphs
Annotating Characters
Adding New Characters
How to Use Code Points from the Private Use Area
Writing Modes
Examples of Different Writing Modes
Text Rotation
Caveat
Formal Definition

Elements Defined:
<char>, <charDecl>, <charName>, <charProp>, <g>, <glyph>, <glyphName>, <localName>, <mapping>, <unicodeName>, <value>
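As a sketch of the basic mechanism, assuming a project that wants to distinguish the medieval r rotunda from an ordinary r: the glyph is declared once in `charDecl` (inside `encodingDesc` in the header), and each occurrence in the text is a `g` element pointing at that declaration. The identifier `r-rotunda` is hypothetical.

```xml
<!-- In the header (encodingDesc): declare the glyph once; the id is hypothetical -->
<charDecl>
  <glyph xml:id="r-rotunda">
    <glyphName>LATIN SMALL LETTER R ROTUNDA</glyphName>
    <!-- The ordinary character a processor can fall back on -->
    <mapping type="standard">r</mapping>
  </glyph>
</charDecl>

<!-- In the text: each occurrence points back to the declaration -->
<p>co<g ref="#r-rotunda">r</g>pus</p>
```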


12. Critical Apparatus

This chapter defines a module for use in encoding an apparatus of variants for scholarly editions, which may be used in conjunction with any of the modules defined in these Guidelines.

Sections include:
The Apparatus Entry, Readings, and Witnesses
Linking the Apparatus to the Text
Using Apparatus Elements in Transcriptions
Module for Critical Apparatus

Elements Defined:
<app>, <lacunaEnd>, <lacunaStart>, <lem>, <listApp>, <listWit>, <rdg>, <rdgGrp>, <variantEncoding>, <wit>, <witDetail>, <witEnd>, <witStart>, <witness>

<p>Certain it is, this was not the case with the redoubtable Brom Bones; and <app>
<rdg wit="#msA">from the moment Ichabod Crane made his advances,</rdg>
<rdg wit="#msB">coincidentally when Ichabod Crane made his advances,</rdg>
<rdg wit="#msC">from the moment Ichabod Crane started to sing, </rdg> </app> the interests of the former
evidently declined;</p>

13. Names, Dates, People, and Places
This chapter describes a module which may be used for the encoding of names and other phrases descriptive of persons, places, or organizations, in a manner more detailed than that possible using the elements already provided for these purposes in the Core module.

Sections include:
Attribute Classes Defined by This Module
Names [e.g. personal, place and organisational names]
Biographical and Prosopographical Data
Module for Names and Dates

Elements Defined:
<addName>, <affiliation>, <age>, <birth>, <bloc>, <climate>, <country>, <death>, <district>, <education>, <event>, <faith>, <floruit>, <forename>, <genName>, <geo>, <geogFeat>, <geogName>, <langKnowledge>, <langKnown>, <listEvent>, <listNym>, <listOrg>, <listPerson>, <listPlace>, <listRelation>, <location>, <nameLink>, <nationality>, <nym>, <occupation>, <offset>, <org>, <orgName>, <persName>, <person>, <personGrp>, <place>, <placeName>, <population>, <region>, <relation>, <residence>, <roleName>, <settlement>, <sex>, <socecStatus>, <state>, <surname>, <terrain>, <trait>

<!-- In the header -->
<person xml:id="ArnMag">
<persName xml:lang="is">Árni Magnússon</persName>
<persName xml:lang="da">Arne Magnusson</persName>
<persName xml:lang="la">Arnas Magnæus</persName>
</person>
<!-- In the text -->
<p> <persName ref="#ArnMag">Arnas</persName> dixit "Reveniam".</p>

14. Tables, Formulæ, Graphics and Notated Music


Many documents, both historical and contemporary, include not only text, but also graphics, artwork, and other images. Since they may frequently be most conveniently encoded and processed using external notations, they are dealt with together.

Sections include:
Tables
Formulæ and Mathematical Expressions
Notated Music in Written Text
Specific Elements for Graphic Images
Overview of Basic Graphics Concepts
Graphic Image Formats
Module for Tables, Formulæ, Notated Music, and Graphics

Elements Defined:
<cell>, <figDesc>, <figure>, <formula>, <notatedMusic>, <row>, <table>
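A minimal `table`, reusing two entries from the list of TEI P5 versions given earlier in this presentation; the `role="label"` row marks the header cells, and the other rows hold data.

```xml
<table rows="3" cols="2">
  <head>Two versions of TEI P5</head>
  <!-- Header row -->
  <row role="label">
    <cell>Version</cell>
    <cell>Date</cell>
  </row>
  <row role="data">
    <cell>2.9.1</cell>
    <cell>2015-10-15</cell>
  </row>
  <row role="data">
    <cell>1.0.0</cell>
    <cell>2007-11-02</cell>
  </row>
</table>
```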
15. Language Corpora
This chapter discusses language corpora; the distinguishing characteristic of any individual corpus is that its components have been selected or structured according to some conscious set of design criteria.

Sections include:
Varieties of Composite Text
Contextual Information
Associating Contextual Information with a Text
Linguistic Annotation of Corpora
Recommendations for the Encoding of Large Corpora
Module for Language Corpora


Elements Defined:
<activity>, <channel>, <constitution>, <derivation>, <domain>, <factuality>, <interaction>, <locale>, <particDesc>, <preparedness>, <purpose>, <setting>, <settingDesc>, <textDesc>

<settingDesc>
<setting who="#p1 #p2">
<name type="village">Sleepy Hollow</name>
<date>early spring, 1789</date>
<locale>a farm house, sat by the hearth</locale>
<activity>courting</activity>
</setting>
<setting who="#p3">
<name type="village">Sleepy Hollow</name>
<date>early spring, 1789</date>
<locale>school house</locale>
<activity>teaching</activity>
</setting>
</settingDesc>

16. Linking, Segmentation, and Alignment
This chapter discusses a number of ways in which encoders may represent analyses of the structure of a text which are not necessarily linear or hierarchic.

Sections include:
Links
Pointing Mechanisms
Blocks, Segments, and Anchors
Correspondence and Alignment
Synchronization
Identical Elements and Virtual Copies
Aggregation
Alternation
Stand-off Markup
Connecting Analytic and Textual Markup
Module for Linking, Segmentation, and Alignment


Elements Defined:
<ab>, <alt>, <altGrp>, <anchor>, <join>, <joinGrp>, <link>, <linkGrp>, <seg>, <timeline>, <when>
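A sketch of one of the simplest uses of this module, borrowing the Irving passage from the critical apparatus example: a `seg` gives a span of text an identifier, and a `link` inside a `linkGrp` connects that span to a separate commentary note. The identifiers and the note's wording are illustrative.

```xml
<p>
  <seg xml:id="s1">Certain it is, this was not the case with the redoubtable Brom Bones</seg>;
  and from the moment Ichabod Crane made his advances, the interests of the former
  evidently declined.
</p>
<!-- Illustrative commentary note, e.g. in the back matter -->
<note xml:id="n1">Brom Bones is Ichabod Crane's rival for the hand of Katrina Van Tassel.</note>
<linkGrp type="commentary">
  <!-- The link joins the segment to the note that comments on it -->
  <link target="#s1 #n1"/>
</linkGrp>
```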


11. Representation of Primary Sources
This chapter defines a module intended for use in the representation of primary sources, such as manuscripts or other written materials.

Sections include:
Digital Facsimiles
Combining Transcription with Facsimile
Scope of Transcriptions
Advanced Uses of surface and zone
Aspects of Layout
Headers, Footers, and Similar Matter
Changes
Other Primary Source Features not Covered in these Guidelines
Module for Transcription of Primary Sources

Elements Defined:
<addSpan>, <am>, <damage>, <damageSpan>, <delSpan>, <ex>, <facsimile>, <fw>, <handNotes>, <handShift>, <line>, <listChange>, <listTranspose>, <metamark>, <mod>, <redo>, <restore>, <retrace>, <sourceDoc>, <space>, <subst>, <substJoin>, <supplied>, <surface>, <surfaceGrp>, <surplus>, <transpose>, <undo>, <zone>

<facsimile>
<graphic url="page1.png"/>
<graphic url="page2.png"/>
</facsimile>
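The example above simply lists page images. To combine transcription with facsimile, a sketch along these lines uses `surface` and `zone` to define regions of an image and the `facs` attribute to point from the transcription to them. Coordinates, file names, and identifiers are hypothetical; the transcribed line is borrowed from the letter quoted later in this presentation.

```xml
<facsimile>
  <!-- Hypothetical page image and coordinate system -->
  <surface xml:id="page1" ulx="0" uly="0" lrx="2000" lry="3000">
    <graphic url="page1.png"/>
    <!-- A rectangle on the image, in the surface's own coordinates -->
    <zone xml:id="page1-line1" ulx="150" uly="200" lrx="1850" lry="280"/>
  </surface>
</facsimile>
<text>
  <body>
    <!-- facs points from the transcription to the page and to the line's region -->
    <pb facs="#page1"/>
    <p><lb facs="#page1-line1"/>Thanks for yours of this morning. I hope</p>
  </body>
</text>
```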

19. Graphs, Networks, and Trees
Graphical representations are widely used for displaying relations among informational units because they help readers to visualize those relations and hence to understand them better.


Sections include:
Graphs and Digraphs
Trees
Another Tree Notation
Representing Textual Transmission
Module for Graphs, Networks, and Trees


Elements Defined:
<arc>, <eLeaf>, <eTree>, <forest>, <graph>, <iNode>, <leaf>, <listForest>, <node>, <root>, <tree>, <triangle>
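For example, a simple manuscript stemma (see Representing Textual Transmission) could be sketched as a directed graph with a hypothetical archetype and two witnesses; `order` and `size` give the number of nodes and arcs respectively.

```xml
<graph type="directed" order="3" size="2">
  <!-- Hypothetical lost archetype from which both witnesses descend -->
  <node xml:id="omega" outDegree="2">
    <label>ω</label>
  </node>
  <node xml:id="witA" inDegree="1">
    <label>A</label>
  </node>
  <node xml:id="witB" inDegree="1">
    <label>B</label>
  </node>
  <!-- Each arc runs from an exemplar to a copy made from it -->
  <arc from="#omega" to="#witA"/>
  <arc from="#omega" to="#witB"/>
</graph>
```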


20. Non-hierarchical Structures

XML employs a strongly hierarchical document model. At various points, these Guidelines discuss problems that arise when using XML to encode textual features that either do not naturally lend themselves to representation in a strictly hierarchical form or conflict with other hierarchies represented in the markup.

Sections include:
Multiple Encodings of the Same Information
Boundary Marking with Empty Elements
Fragmentation and Reconstitution of Virtual Elements
Stand-off Markup
Non-XML-based Approaches
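As an illustration of the fragmentation approach, assuming a hypothetical utterance that runs across a paragraph break: the quotation is split into two `q` elements, and the `next` and `prev` attributes (supplied by the linking module) record that the two parts form one virtual element.

```xml
<!-- Hypothetical text: one quotation split across two paragraphs -->
<p>He turned in the saddle and called back, <q xml:id="q1a" next="#q1b">Ride for the bridge,</q></p>
<p><q xml:id="q1b" prev="#q1a">and do not look behind you!</q> Nobody answered.</p>
```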

21. Certainty, Precision, and Responsibility
Encoders of text often find it useful to indicate that some aspects of the encoded text are problematic or uncertain, and to indicate who is responsible for various aspects of the markup of the electronic text.

Sections include:
Levels of Certainty
Indications of Precision
Attribution of Responsibility
The Certainty Module

Elements Defined:
<certainty>, <precision>, <respons>
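A sketch of two ways to record doubt about an encoding, reusing the Árni Magnússon example from the names and dates chapter: the first puts a `cert` attribute directly on the element, while the second uses a separate `certainty` element that points to it and, with `locus="name"`, says the doubt concerns whether the element name itself (here, that this is a personal name) has been correctly applied. The identifier `pn1` is hypothetical.

```xml
<!-- 1. The cert attribute directly on the element -->
<p><persName ref="#ArnMag" cert="medium">Arnas</persName> dixit "Reveniam".</p>

<!-- 2. A separate certainty statement pointing at the element (pn1 is hypothetical) -->
<p><persName xml:id="pn1" ref="#ArnMag">Arnas</persName> dixit "Reveniam".</p>
<certainty target="#pn1" locus="name" cert="medium"/>
```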


22. Documentation Elements
This chapter describes a module which may be used for the documentation of the XML elements and element classes which make up any markup scheme, in particular that described by the TEI Guidelines, and also for the automatic generation of schemas conforming to that documentation.

Sections include:
Phrase Level Documentary Elements
Modules and Schemas
Specification Elements
Common Elements
Building a Schema
Combining TEI and Non-TEI Modules
Linking Schemas to XML Documents
Module for Documentation Elements

Elements Defined:
<altIdent>, <alternate>, <att>, <attDef>, <attList>, <attRef>, <classRef>, <classSpec>, <classes>, <code>, <constraint>, <constraintSpec>, <content>, <datatype>, <defaultVal>, <eg>, <egXML>, <elementRef>, <elementSpec>, <equiv>, <exemplum>, <gi>, <ident>, <listRef>, <macroRef>, <macroSpec>, <memberOf>, <moduleRef>, <moduleSpec>, <remarks>, <schemaSpec>, <sequence>, <specDesc>, <specGrp>, <specGrpRef>, <specList>, <tag>, <val>, <valDesc>, <valItem>, <valList>
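These specification elements are what an ODD customisation is written in. As a minimal sketch, assuming a project that wants only two kinds of division, the following changes the `type` attribute of `div` to a closed list of values; the values and their descriptions are hypothetical.

```xml
<!-- Placed inside a schemaSpec; the values "chapter" and "letter" are examples only -->
<elementSpec ident="div" mode="change">
  <attList>
    <attDef ident="type" mode="change">
      <!-- Replace any existing value list with a closed one -->
      <valList type="closed" mode="replace">
        <valItem ident="chapter">
          <desc>a chapter of a novel</desc>
        </valItem>
        <valItem ident="letter">
          <desc>a complete letter</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```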
23. Using the TEI


This section discusses some technical topics concerning the deployment of the TEI markup scheme documented elsewhere in these Guidelines.

Sections include:
Serving TEI files with the TEI Media Type
Obtaining the TEI Schemas
Personalization and Customization
Conformance
Implementation of an ODD System
17. Simple Analytic Mechanisms
This chapter describes a module for associating simple analyses and interpretations with text elements. We use the term analysis here to refer to any kind of semantic or syntactic interpretation which an encoder wishes to attach to all or part of a text.

Sections include:
Linguistic Segment Categories
Global Attributes for Simple Analyses
Spans and Interpretations
Linguistic Annotation
Module for Analysis and Interpretation

Elements Defined:
<c>, <cl>, <interp>, <interpGrp>, <m>, <pc>, <phr>, <s>, <span>, <spanGrp>, <w>

<s>
<w ana="#AT0">The</w>
<w ana="#NN1">victim</w>
<w ana="#POS">'s</w>
<w ana="#NN2">friends</w>
<w ana="#VVD">told</w>
<w ana="#NN2">villagers</w>
<w ana="#CJT">that</w>
<w ana="#AT0">the</w>
<w ana="#NP0">Headless</w>
<w ana="#NP0">Horseman</w>
<w ana="#VVD">rode</w>
<w ana="#PRP">into</w>
<w ana="#AT0">the</w>
<w ana="#NN1">forest</w>
<w ana="#CJC">and</w>
<w ana="#AV0">never</w>
<w ana="#VVD">reappeared</w>
</s>
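The `ana` attributes above are pointers, so they need something to point to. A sketch of one possibility, assuming the tags come from the CLAWS C5 tagset used by the British National Corpus (where the proper-noun tag is NP0, with a zero): an `interpGrp` whose `interp` elements carry the matching identifiers and a short description of each tag.

```xml
<!-- Targets for the ana="#..." pointers, e.g. in the back matter; assumes CLAWS C5 tags -->
<interpGrp type="pos">
  <interp xml:id="AT0">article</interp>
  <interp xml:id="NN1">singular common noun</interp>
  <interp xml:id="NN2">plural common noun</interp>
  <interp xml:id="NP0">proper noun</interp>
  <interp xml:id="POS">possessive marker</interp>
  <interp xml:id="VVD">past tense form of lexical verb</interp>
  <interp xml:id="CJT">the subordinating conjunction "that"</interp>
  <interp xml:id="CJC">coordinating conjunction</interp>
  <interp xml:id="PRP">preposition</interp>
  <interp xml:id="AV0">general adverb</interp>
</interpGrp>
```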

<TEI>
  <teiHeader>
    <!-- required -->
  </teiHeader>
  <facsimile>
    <!-- optional -->
  </facsimile>
  <sourceDoc>
    <!-- optional -->
  </sourceDoc>
  <text>
    <!-- required if no facsimile or sourceDoc -->
  </text>
</TEI>

TEI Basic Structure
<TEI>
  <teiHeader>
    <!-- Metadata required here -->
  </teiHeader>
  <text>
    <front>
      <!-- optional front matter -->
    </front>
    <body>
      <div n="1">
        <!-- first division -->
      </div>
      <div n="2">
        <!-- second division -->
      </div>
      <!-- ... -->
    </body>
    <back>
      <!-- optional back matter -->
    </back>
  </text>
</TEI>
TEI Text Structure
2. The TEI Header
This chapter addresses the problems of describing an encoded work so that the text itself, its source, its encoding, and its revisions are all thoroughly documented.

Sections include:
Organization of the TEI Header
The File Description
The Encoding Description
The Profile Description
The Revision Description
Minimal and Recommended Headers
Note for Library Cataloguers
The TEI Header Module

Elements Defined:
<abstract>, <appInfo>, <application>, <authority>, <availability>, <biblFull>, <cRefPattern>, <calendar>, <calendarDesc>, <catDesc>, <catRef>, <category>, <change>, <classCode>, <classDecl>, <correction>, <creation>, <distributor>, <edition>, <editionStmt>, <editorialDecl>, <encodingDesc>, <extent>, <fileDesc>, <funder>, <geoDecl>, <handNote>, <hyphenation>, <idno>, <interpretation>, <keywords>, <langUsage>, <language>, <licence>, <listPrefixDef>, <namespace>, <normalization>, <notesStmt>, <prefixDef>, <principal>, <profileDesc>, <projectDesc>, <publicationStmt>, <quotation>, <refState>, <refsDecl>, <rendition>, <revisionDesc>, <samplingDecl>, <segmentation>, <seriesStmt>, <sourceDesc>, <sponsor>, <stdVals>, <styleDefDecl>, <tagUsage>, <tagsDecl>, <taxonomy>, <teiHeader>, <textClass>, <titleStmt>
Example
<teiHeader>
<fileDesc>
<titleStmt>
<title>
<!-- title of the resource -->
</title>
</titleStmt>
<publicationStmt>
<p>(Information about distribution of the resource)</p>
</publicationStmt>
<sourceDesc>
<p>(Information about source from which the resource derives)</p>
</sourceDesc>
</fileDesc>
</teiHeader>
The TEI Header
Elements Available in All Documents
('Core')
3. Elements Available in All TEI Documents

This chapter describes elements which may appear in any kind of text and the tags used to mark them in all TEI documents. Most of these elements are freely floating phrases, which can appear at any point within the textual structure, although they must generally be contained by a higher-level element of some kind (such as a paragraph).

Sections include:
Paragraphs
Treatment of Punctuation
Highlighting and Quotation
Simple Editorial Changes
Names, Numbers, Dates, Abbreviations, and Addresses
Simple Links and Cross-References
Lists
Notes, Annotation, and Indexing
Graphics and Other Non-textual Components
Reference Systems
Bibliographic Citations and References
Passages of Verse or Drama
Overview of the Core Module

Elements:
<abbr>, <add>, <addrLine>, <address>, <analytic>, <author>, <bibl>, <biblScope>, <biblStruct>, <binaryObject>, <cb>, <choice>, <cit>, <citedRange>, <corr>, <date>, <del>, <desc>, <distinct>, <editor>, <email>, <emph>, <expan>, <foreign>, <gap>, <gb>, <gloss>, <graphic>, <head>, <headItem>, <headLabel>, <hi>, <imprint>, <index>, <item>, <l>, <label>, <lb>, <lg>, <list>, <listBibl>, <measure>, <measureGrp>, <media>, <meeting>, <mentioned>, <milestone>, <monogr>, <name>, <note>, <num>, <orig>, <p>, <pb>, <postBox>, <postCode>, <ptr>, <pubPlace>, <publisher>, <q>, <quote>, <ref>, <reg>, <relatedItem>, <resp>, <respStmt>, <rs>, <said>, <series>, <sic>, <soCalled>, <sp>, <speaker>, <stage>, <street>, <term>, <time>, <title>, <unclear>
Examples
John eats a <foreign xml:lang="fr">croissant</foreign> every morning.


<mentioned xml:lang="fr">Croissant</mentioned> is difficult to pronounce with your mouth full.


A <term xml:lang="fr">croissant</term> is a crescent-shaped piece of light, buttery pastry that is usually eaten for breakfast, especially in France.

<choice>
Examples
An <choice>
<corr cert="high">Autumn</corr>
<sic>Antony</sic>
</choice> it was, That grew the more by reaping

the <choice>
<expan>World Wide Web Consortium</expan>
<abbr>W3C</abbr>
</choice>

...how godly a
<choice>
<orig>dede</orig>
<reg>deed</reg>
</choice> it is to overthrow...
4. Default Text Structure
This chapter describes the default high-level structure for TEI documents. A full TEI document combines metadata describing it, represented by a teiHeader element, with the document itself, represented by a text element.

Sections include:
Divisions of the Body
Elements Common to All Divisions
Grouped and Floating Texts
Virtual Divisions
Front Matter
Title Pages
Back Matter
Module for Default Text Structure
Elements
<TEI>, <argument>, <back>, <body>, <byline>, <closer>, <dateline>, <div>, <div1>, <div2>, <div3>, <div4>, <div5>, <div6>, <div7>, <divGen>, <docAuthor>, <docDate>, <docEdition>, <docImprint>, <docTitle>, <epigraph>, <floatingText>, <front>, <group>, <imprimatur>, <opener>, <postscript>, <salute>, <signed>, <teiCorpus>, <text>, <titlePage>, <titlePart>, <trailer>


Example
<TEI>
<teiHeader>
<!-- .... -->
</teiHeader>
<text>
<front>
<!-- front matter of text, if any, goes here -->
</front>
<body>
<!-- body of text, divisions, etc go here -->
</body>
<back>
<!-- back matter of text, if any, goes here -->
</back>
</text>
</TEI>
Default Text Structure
6. Verse


This module is intended for use when encoding texts which are entirely or predominantly in verse, and for which the elements for encoding verse structure already provided by the core module are inadequate.

Sections include:
Structural Divisions of Verse Texts
Components of the Verse Line
Rhyme and Metrical Analysis
Rhyme
Metrical Notation Declaration
Encoding Procedures for Other Verse Features
Module for Verse
Elements Defined:
<caesura>, <metDecl>, <metSym>, <rhyme>
Examples
<div
type="canzone"
met="E/E/S/E/S/E/E/S/E/S/E/S/S/E/S/E/E/S/S/E/E"
rhyme="abbcdaccbdceeffghhhgg">
<lg n="1" type="stanza">
<l n="1">Doglia mi reca nello core ardire</l>
</lg>
</div>

<metDecl type="met" pattern="((E|S)/)*(E|S)">
<metSym value="E" terminal="false">xxxxxxxxx+o</metSym>
<metSym value="S" terminal="false">xxxxx+o</metSym>
<metSym value="x">metrically prominent or non-prominent</metSym>
<metSym value="+">metrically prominent</metSym>
<metSym value="o">optional non-prominent</metSym>
<metSym value="/">line division</metSym>
</metDecl>
10. Manuscript Description
This module defines a special purpose element which can be used to provide detailed descriptive information about handwritten (and other unique) primary sources.

Sections include:
Overview
The Manuscript Description Element
Phrase-level Elements
The Manuscript Identifier
The Manuscript Heading
Intellectual Content
Physical Description
History
Additional Information
Manuscript Parts
Module for Manuscript Description

Elements Defined:
<accMat>, <acquisition>, <additional>, <additions>, <adminInfo>, <altIdentifier>, <binding>, <bindingDesc>, <catchwords>, <collation>, <collection>, <colophon>, <condition>, <custEvent>, <custodialHist>, <decoDesc>, <decoNote>, <depth>, <dim>, <dimensions>, <explicit>, <filiation>, <finalRubric>, <foliation>, <handDesc>, <height>, <heraldry>, <history>, <incipit>, <institution>, <layout>, <layoutDesc>, <locus>, <locusGrp>, <material>, <msContents>, <msDesc>, <msIdentifier>, <msItem>, <msItemStruct>, <msName>, <msPart>, <musicNotation>, <objectDesc>, <objectType>, <origDate>, <origPlace>, <origin>, <physDesc>, <provenance>, <recordHist>, <repository>, <rubric>, <scriptDesc>, <scriptNote>, <seal>, <sealDesc>, <secFol>, <signatures>, <source>, <stamp>, <summary>, <support>, <supportDesc>, <surrogates>, <textLang>, <typeDesc>, <typeNote>, <watermark>, <width>


Example
<msDesc xml:id="mySpecialManuscript">
<msIdentifier>
<!-- You *must* give identification information -->
</msIdentifier>
<msContents>
<!-- You can describe the intellectual structure of the text -->
</msContents>
<physDesc>
<!-- You can describe the full physical description of the object -->
</physDesc>
<history>
<!-- You can give a full history of the object, its origin,
provenance, and acquisition -->
</history>
<additional>
<!-- You can provide additional information about
surrogates, administrative metadata, etc. -->
</additional>
</msDesc>
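By way of illustration, a filled-in `msIdentifier` locates the object from the largest unit to the smallest, ending with its shelfmark; the manuscript cited here is only an example.

```xml
<!-- Example identification, from country down to shelfmark -->
<msIdentifier>
  <country>United Kingdom</country>
  <settlement>Oxford</settlement>
  <repository>Bodleian Library</repository>
  <idno>MS. Bodl. 264</idno>
</msIdentifier>
```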
Appendices
Appendix A. Model Classes
Appendix B. Attribute Classes
Appendix C. Elements
Appendix D. Attributes
Appendix E. Datatypes and Other Macros

http://www.tei-c.org/
http://tinyurl.com/itlp-20151113
Dr James Cummings
Academic IT Services

TEI Guidelines: An overview of the recommendations of the Text Encoding Initiative
Paragraphs
<p> (paragraph) marks paragraphs in prose

Fundamental unit for prose texts
<p> can contain all the phrase-level elements in the core
<p> can appear directly inside <body> or inside <div> (divisions)
<p>
Thanks for yours of this morning. I hope
<lb/>you have had my card posted last Monday.
<lb/>On Mond. next I lecture the
<orgName ref="#Fieldclub">Field Club</orgName> -
<lb/>a Nat. Hist. Association, in the lines of our
<lb/>old Society - Geological, (you + me) + Botanical
<lb/>(New) Do you remember: you<supplied>r</supplied> old
<lb/>Black Molt?
</p>
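A sketch of where <p> can sit, assuming a bare <text>/<body>: paragraphs may come directly inside <body> and may also appear inside a <div>. The @type value "letter" and the heading text are illustrative only.

```xml
<text>
  <body>
    <!-- A paragraph directly inside body -->
    <p>An opening paragraph.</p>
    <!-- A paragraph inside a division; type="letter" is a hypothetical value -->
    <div type="letter">
      <head>A letter</head>
      <p>Thanks for yours of this morning.</p>
    </div>
  </body>
</text>
```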

By highlighting we mean the use of any combination of typographic features (font, size, hue, etc.) in a printed or written text in order to distinguish some passage of a text from its surroundings. Highlighting is typically used for words and phrases which are:
distinct in some way (e.g. foreign, archaic, technical)
emphatic or stressed when spoken
not really part of the text (e.g. cross references, titles, headings)
a distinct narrative stream (e.g. an internal monologue, commentary)
attributed to some other agency inside or outside the text (e.g. direct speech, quotation)
set apart in another way (e.g. proverbial phrases, words mentioned but not used)
Highlighting
Highlighting Examples
<hi> (general purpose highlighting); <distinct> (linguistically distinct)

<p>Last week I wrote (to order) a strong
<lb/>bit of Blank: on
<hi rend="ul">Antaeus v. Heracles</hi>.
<lb/>These are the best lines, methinks:
<lb/>(N.B. Antaeus deriving strength from his Mother Earth
<lb/>nearly licked old
<distinct>Herk</distinct>.) </p>

Other similar elements include: <emph>, <mentioned>, <soCalled>, <term> and <gloss>
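A sketch showing these elements together in one hypothetical sentence of commentary on the Antaeus lines. The wording of the sentence and the xml:id "blankVerse" are invented for illustration; <gloss> points back at its <term> with @target.

```xml
<p>
  <!-- Hypothetical commentary; wording and xml:id are illustrative -->
  Owen offered a strong bit of
  <term xml:id="blankVerse">blank verse</term>
  (<gloss target="#blankVerse">unrhymed iambic pentameter</gloss>)
  on Earth's <soCalled>trick</soCalled>. The word
  <mentioned>Herk</mentioned> is a nickname for Heracles,
  whom Antaeus <emph>nearly</emph> beat.
</p>
```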
Quotation marks can be used to set off text for many reasons, so the TEI has the following elements:
<q> (separated from the surrounding text with quotation marks)
<said> (speech or thought)
<quote> (passage attributed to an external source)
<cit> (groups a quotation and citation)
Quotation
<cit>
<quote>
<l>... How Earth herself empowered him with her trick,</l>
<l>Gave him the grip and stringency of Winter,</l>
<l>And all the ardour of th' invincible Spring;</l>
</quote>
<bibl>
<author>Wilfred Owen</author>
<title ref="works.xml#WO123">Letter to Leslie Gunston / The Wrestler</title>
<date when="1917-07">July 1917</date>
</bibl>
</cit>
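A sketch contrasting <q> and <said>, reusing text from other examples in these slides. The framing sentence is hypothetical, and the @who pointer assumes the "#F-tem-pro" identifier used in the drama example.

```xml
<p>
  <!-- Hypothetical framing: q = set off by quotation marks,
       said = speech attributed to a speaker -->
  Owen's draft ends <q>The Poetry is in the pity.</q>
  Prospero dismisses Ariel:
  <said who="#F-tem-pro" aloud="true">Be free and fare thou well.</said>
</p>
```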
<choice>
<choice> (groups alternative editorial encodings)
Errors:
<sic> (apparent error)
<corr> (corrected error)
Regularization:
<orig> (original form)
<reg> (regularized form)
Abbreviation:
<abbr> (abbreviated form)
<expan> (expanded form)
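A sketch of all three pairings inside <choice>. The abbreviations come from the letter example and the spelling from the Tempest example; the misspelling "recieved" is hypothetical.

```xml
<p>
  <!-- Abbreviation and expansion, from the letter example -->
  On <choice><abbr>Mond.</abbr><expan>Monday</expan></choice> next
  I lecture the Field Club, a
  <choice><abbr>Nat. Hist.</abbr><expan>Natural History</expan></choice>
  Association.
  <!-- Original and regularized spelling, from the Tempest example -->
  Now my charms are all
  <choice><orig>o'erthrown</orig><reg>overthrown</reg></choice>.
  <!-- Apparent error and correction: hypothetical misspelling -->
  I hope you <choice><sic>recieved</sic><corr>received</corr></choice>
  my card.
</p>
```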

<add>
(addition to the text, e.g. marginal gloss)
<del>
(phrase marked as deleted in the text)
<gap>
(indicates point where material is omitted)
<unclear>
(contains text unable to be transcribed clearly)
Additions, Deletions, and Omissions
<p>
<add place="left">My </add>
<del rend="stroked">It's </del>
<del rend="stroked"><add place="above">The</add></del>
subject <del rend="stroked">of</del> is War, and the
<unclear>pity</unclear> of <del rend="stroked">it</del> War.
<lb/> The Poetry is in the pity.
</p>
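A sketch adding the attributes these elements commonly carry: @reason and @cert on <unclear>, and @reason with @quantity and @unit on <gap>. The omitted words and all attribute values are hypothetical choices for illustration.

```xml
<p>
  <!-- @reason and @cert values are hypothetical -->
  My subject is War, and the
  <unclear reason="faded" cert="medium">pity</unclear> of War.
  <!-- Hypothetical: two words omitted because they could not be read -->
  <gap reason="illegible" quantity="2" unit="word"/>
  <lb/>The Poetry is in the pity.
</p>
```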

Names and Addresses
<email> (an electronic mail address)
<address> (a postal address)
<addrLine> (a non-specific address line)
<street> (a full street address)
<postCode> (a postal (or zip) code)
<postBox> (a postal box number)
<name> can also be used, and the 'namesdates' module extends this with more specific elements for personal and geographic names
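A sketch of a postal address and an email address inside a paragraph. Every name, street, postcode and email value here is a hypothetical placeholder.

```xml
<p>
  <!-- All names, address parts and the email are hypothetical -->
  Write to
  <address>
    <name>A. Correspondent</name>
    <street>1 Example Street</street>
    <addrLine>Oxford</addrLine>
    <postCode>OX1 0AA</postCode>
  </address>
  or <email>a.correspondent@example.org</email>.
</p>
```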

<num> (marks a number of any sort)
<measure> (marks a quantity or commodity)
<measureGrp> (groups specifications relating to a single object)

While <num> has simple @type and @value attributes, <measure> has @type, @quantity, @unit and @commodity attributes
Numbers, Measures and Dates

<date> (contains a date in any format; the @when attribute gives a regularised form and the @calendar attribute specifies the calendar system used)
<time> (contains a time in any format; the @when attribute gives a regularised form)
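A sketch of these elements in one hypothetical sentence; all values are illustrative. The second <date> assumes a <calendar xml:id="julian"> has been declared in the header's <calendarDesc>. Because @when is always expressed in the Gregorian calendar, a Julian 6 July 1917 is normalised as 1917-07-19 (13 days later).

```xml
<p>
  <!-- Hypothetical sentence; all values are illustrative -->
  Posted on <date when="1917-07-02">2nd July 1917</date>
  at <time when="14:30:00">half past two</time>, the parcel held
  <num type="cardinal" value="3">three</num> letters and
  <measure type="weight" quantity="2" unit="lb" commodity="tea">two
    pounds of tea</measure>.
  <!-- Assumes <calendar xml:id="julian"> in the header's calendarDesc;
       @when stays Gregorian -->
  <date calendar="#julian" when="1917-07-19">6th July 1917 (Julian)</date>
</p>
```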
Lists
<list> (a sequence of items forming a list)
<item> (one component of a list)
<label> (label associated with an item)
<headLabel> (heading for column of labels)
<headItem> (heading for column of items)
<div>
<head>Lists</head>
<list>
<item><gi>list</gi> (a sequence of items forming a list)</item>
<item><gi>item</gi> (one component of a list)</item>
<item><gi>label</gi> (label associated with an item)</item>
<item><gi>headLabel</gi> (heading for column of labels)</item>
<item><gi>headItem</gi> (heading for column of items)</item>
</list>
</div>
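The example above uses only <item>. A sketch of a two-column glossary list using the other elements from this slide: column headings come first, then label/item pairs. type="gloss" is used as an illustrative value.

```xml
<!-- type="gloss" is an illustrative @type value -->
<list type="gloss">
  <headLabel>Element</headLabel>
  <headItem>Description</headItem>
  <label><gi>label</gi></label>
  <item>label associated with an item</item>
  <label><gi>headLabel</gi></label>
  <item>heading for column of labels</item>
  <label><gi>headItem</gi></label>
  <item>heading for column of items</item>
</list>
```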
Simple Verse and Drama
<sp who="#F-tem-pro">
<speaker>Prospero</speaker>
<l part="Y">I'll deliver all,</l>
<l>And promise you calm seas, auspicious gales,</l>
<l>Be free and fare thou well.
<stage type="exit">Exit Ariel</stage>
Please you, draw near.
<stage type="exit">Exeunt all but Prospero</stage>
<note place="margin">Epilogue</note>
</l>
<l>Now my charms are all o'erthrown,</l>
<l>And what strength I have's mine own</l>
<l>As you from crimes would pardoned be,</l>
<l>Let your indulgence set me free.</l>
</sp>
<stage type="mix">He awaits applause, then exit.</stage>
<l>
(verse line) contains a single, possibly incomplete, line of verse.
<lg>
(line group) contains one or more verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc.
<sp>
(speech) contains an individual speech in a performance text, or a passage presented as such in a prose or verse text.
<speaker>
contains a specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment.
<stage>
(stage direction) contains any kind of stage direction within a dramatic text or fragment.
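The drama example uses <l> but not <lg>. A sketch grouping the closing couplet of the epilogue above with <lg> inside a speech; type="couplet" is an illustrative value.

```xml
<sp who="#F-tem-pro">
  <speaker>Prospero</speaker>
  <!-- type="couplet" is an illustrative @type value -->
  <lg type="couplet">
    <l>As you from crimes would pardoned be,</l>
    <l>Let your indulgence set me free.</l>
  </lg>
</sp>
```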

Resources
This prezi:
http://tinyurl.com/itlp-20151113
TEI Consortium:
http://www.tei-c.org/
TEI Guidelines:
http://www.tei-c.org/P5/
Roma:
http://www.tei-c.org/Roma/
OxGarage:
http://www.tei-c.org/oxgarage/
tei-oxford-subscribe@maillist.ox.ac.uk
TEI-L:
http://tinyurl.com/TEI-subscribe