Transcript
  • The application of markup to a document can be an intellectual activity
  • In deciding what markup to apply, and how this represents the original, one is undertaking the task of an editor
  • There is (almost) no such thing as neutral markup -- all of it involves interpretation
  • Markup can assist in answering research questions, and deciding what markup is needed to enable such questions to be answered can be a research activity in itself
  • Good textual encoding is never as easy or quick as people would believe
  • Detailed document analysis is needed before encoding for the resulting markup to be useful
  • the only standard in this area
  • objective or non-interpretative
  • used consistently even within the same project (never mind in other ones)
  • fixed and unchanging
  • your research end-point
  • automatic publication or long-term preservation

<TEI>

<teiHeader>

<!-- Metadata required here -->

</teiHeader>

<text>

<front>

<!-- optional front matter -->

</front>

<body>

<div n="1">

<!-- first division -->

</div>

<div n="2">

<!-- second division -->

</div>

<!-- ... -->

</body>

<back>

<!-- optional back matter -->

</back>

</text>

</TEI>

But of course you want to remove specific elements from these 'modules' as well.

No project will need all of them!

<TEI>

<teiHeader>

<!-- .... -->

</teiHeader>

<text>

<front>

<!-- front matter of text, if any, goes here -->

</front>

<body>

<!-- body of text, divisions, etc go here -->

</body>

<back>

<!-- back matter of text, if any, goes here -->

</back>

</text>

</TEI>

1987 was a long time ago...

The Text Encoding Initiative was born into a very different world:

  • the world wide web did not exist
  • the tunnel beneath the English Channel was still being built
  • a state called the Soviet Union had just launched a space station called Mir
  • serious computing was done on mainframes
  • most people didn't have mobile phones

About This Talk

Defining Markup

...but it also had some familiar problems

  • Markup makes explicit the distinctions we want to make when processing a string of bytes
  • Markup is a way of naming and characterizing the parts of a text in a formalized way
  • Markup provides additional levels of annotation on data
  • It's (usually) more useful to mark up what we think things are than what they look like

Markup makes explicit to a machine that which is implicit to a person

Markup as a scholarly activity

  • Corpus linguistics and ‘artificial intelligence’ had created a demand for large scale lexical resources in academia and beyond
  • Advances in text processing were beginning to affect lexicography and document management systems (e.g. TeX, Scribe, troff...)
  • The Internet existed (but the 'web' didn't yet) and theories about how to use it 'hypertextually' abounded
  • There was more data than could be read in one go, and it needed visualization of some sort
  • Books, articles, and even courses in something called "Computing in the Humanities" were becoming commonplace

A useful mental exercise

Imagine you are going to mark up several thousand pages of complex material....

  • Which features are you going to mark up?
  • Why are you choosing to mark up this feature?
  • How reliably and consistently can you do this?
  • Might some of this work be automated?
  • What needs human decision making?

Now, imagine your budget has been halved. Repeat the exercise!

  • The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of Guidelines which specify encoding methods for machine-readable texts chiefly in the humanities, social sciences and linguistics.
  • Really this talk just exposes you to the TEI Guidelines... our Digital Humanities at Oxford Summer School often features a full 5-day TEI Workshop:

http://digital.humanities.ox.ac.uk/dhoxss/

The TEI now

  • International membership consortium re-established 2000 (see http://www.tei-c.org/)
  • The TEI Guidelines, originally envisaged as a single (large) reference manual, increasingly used as a modular online resource
  • The Guidelines embody a broad consensus about the significant particularities of a huge range of textual materials
  • expressed both in prose and by means of formal definitions
  • the definitions can be expressed as a grammar or schema:
  • TEI P1-P3 (1991-1999) : SGML DTD
  • TEI P4 (2000) : SGML or XML DTD
  • TEI P5 (2007-) : XML DTD, W3C Schema, or RELAX NG
  • The TEI is a modular system, which must be customized for particular applications
  • ... the TEI is owned and developed by an active international community

TEI Guidelines:

An overview of the recommendations of the Text Encoding Initiative

@jamescummings

researchsupport@it.ox.ac.uk

http://tinyurl.com/itlp-tei-2015-02-19

vi. Languages and Character Sets

The documents which users of these Guidelines may wish to encode encompass all kinds of material, potentially expressed in the full range of written and spoken human languages, including the extinct, the non-existent, and the conjectural. Because of this wide scope, special attention has been paid to two particular aspects of the representation of linguistic information often taken for granted: language identification, and character encoding.

Sections include:

  • Language Identification
  • Characters and Character Sets

Default Text Structure

iv. About These Guidelines

1. The TEI Infrastructure

4. Default Text Structure

This chapter describes the default high-level structure for TEI documents. A full TEI document combines metadata describing it, represented by a teiHeader element, with the document itself, represented by a text element.

Sections include:

  • Divisions of the Body
  • Elements Common to All Divisions
  • Grouped and Floating Texts
  • Virtual Divisions
  • Front Matter
  • Title Pages
  • Back Matter
  • Module for Default Text Structure

Elements

This chapter describes the infrastructure for the encoding scheme defined by these Guidelines. It introduces the conceptual framework within which the following chapters are to be understood, and the means by which that conceptual framework is implemented.

Sections include:

  • TEI Modules
  • Defining a TEI Schema
  • The TEI Class System
  • Macros
  • The TEI Infrastructure Module

<TEI>, <argument>, <back>, <body>, <byline>, <closer>, <dateline>, <div>, <div1>, <div2>, <div3>, <div4>, <div5>, <div6>, <div7>, <divGen>, <docAuthor>, <docDate>, <docEdition>, <docImprint>, <docTitle>, <epigraph>, <floatingText>, <front>, <group>, <imprimatur>, <opener>, <postscript>, <salute>, <signed>, <teiCorpus>, <text>, <titlePage>, <titlePart>, <trailer>

Example

This is an initial chapter explaining the notation, background, and future development of the Guidelines

Sections include:

  • Structure and Notational Conventions of this Document
  • Historical Background
  • Future Developments and Version Numbers

The TEI Header

2. The TEI Header

Elements Available in All Documents

('Core')

3. Elements Available in All TEI Documents

This chapter addresses the problems of describing an encoded work so that the text itself, its source, its encoding, and its revisions are all thoroughly documented.

Sections include:

  • Organization of the TEI Header
  • The File Description
  • The Encoding Description
  • The Profile Description
  • The Revision Description
  • Minimal and Recommended Headers
  • Note for Library Cataloguers
  • The TEI Header Module

This chapter describes elements which may appear in any kind of text and the tags used to mark them in all TEI documents. Most of these elements are freely floating phrases, which can appear at any point within the textual structure, although they must generally be contained by a higher-level element of some kind (such as a paragraph).

Sections include:

  • Paragraphs
  • Treatment of Punctuation
  • Highlighting and Quotation
  • Simple Editorial Changes
  • Names, Numbers, Dates, Abbreviations, and Addresses
  • Simple Links and Cross-References
  • Lists
  • Notes, Annotation, and Indexing
  • Graphics and Other Non-textual Components
  • Reference Systems
  • Bibliographic Citations and References
  • Passages of Verse or Drama
  • Overview of the Core Module

Elements Defined:

Examples

Elements:

<abstract>, <appInfo>, <application>, <authority>, <availability>, <biblFull>, <cRefPattern>, <calendar>, <calendarDesc>, <catDesc>, <catRef>, <category>, <change>, <classCode>, <classDecl>, <correction>, <creation>, <distributor>, <edition>, <editionStmt>, <editorialDecl>, <encodingDesc>, <extent>, <fileDesc>, <funder>, <geoDecl>, <handNote>, <hyphenation>, <idno>, <interpretation>, <keywords>, <langUsage>, <language>, <licence>, <listPrefixDef>, <namespace>, <normalization>, <notesStmt>, <prefixDef>, <principal>, <profileDesc>, <projectDesc>, <publicationStmt>, <quotation>, <refState>, <refsDecl>, <rendition>, <revisionDesc>, <samplingDecl>, <segmentation>, <seriesStmt>, <sourceDesc>, <sponsor>, <stdVals>, <styleDefDecl>, <tagUsage>, <tagsDecl>, <taxonomy>, <teiHeader>, <textClass>, <titleStmt>

An <choice>

<corr cert="high">Autumn</corr>

<sic>Antony</sic>

</choice> it was, That grew the more by reaping

the <choice>

<expan>World Wide Web Consortium</expan>

<abbr>W3C</abbr>

</choice>

...how godly a

<choice>

<orig>dede</orig>

<reg>deed</reg>

</choice> it is to overthrow...
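The <choice> examples above pair an original form with an editorial alternative. A minimal Python sketch of how software might pick one side of each pair follows. It assumes un-namespaced elements as shown on the slide; the names VIEWS and render are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample text taken from the slide; no namespace, as on the slide.
SAMPLE = ("<p>...how godly a <choice><orig>dede</orig><reg>deed</reg>"
          "</choice> it is to overthrow...</p>")

# Hypothetical view names: which children of <choice> each view keeps.
VIEWS = {
    "diplomatic": {"sic", "orig", "abbr"},   # what the source says
    "reading": {"corr", "reg", "expan"},     # the editor's alternative
}

def render(elem, view):
    """Flatten an element to text, keeping one alternative per <choice>."""
    if elem.tag == "choice":
        # Keep only the wanted alternative; whitespace between the
        # children of <choice> is not part of the text, so it is dropped.
        return "".join(render(c, view) for c in elem if c.tag in VIEWS[view])
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, view))
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(SAMPLE)
print(render(root, "diplomatic"))  # ...how godly a dede it is to overthrow...
print(render(root, "reading"))     # ...how godly a deed it is to overthrow...
```

The point of the sketch is that a single encoded source can yield both a diplomatic and a regularized reading text, without either being privileged in the file itself.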

<abbr>, <add>, <addrLine>, <address>, <analytic>, <author>, <bibl>, <biblScope>, <biblStruct>, <binaryObject>, <cb>, <choice>, <cit>, <citedRange>, <corr>, <date>, <del>, <desc>, <distinct>, <editor>, <email>, <emph>, <expan>, <foreign>, <gap>, <gb>, <gloss>, <graphic>, <head>, <headItem>, <headLabel>, <hi>, <imprint>, <index>, <item>, <l>, <label>, <lb>, <lg>, <list>, <listBibl>, <measure>, <measureGrp>, <media>, <meeting>, <mentioned>, <milestone>, <monogr>, <name>, <note>, <num>, <orig>, <p>, <pb>, <postBox>, <postCode>, <ptr>, <pubPlace>, <publisher>, <q>, <quote>, <ref>, <reg>, <relatedItem>, <resp>, <respStmt>, <rs>, <said>, <series>, <sic>, <soCalled>, <sp>, <speaker>, <stage>, <street>, <term>, <time>, <title>, <unclear>

Example

Examples

<teiHeader>

<fileDesc>

<titleStmt>

<title>

<!-- title of the resource -->

</title>

</titleStmt>

<publicationStmt>

<p>(Information about distribution of the resource)</p>

</publicationStmt>

<sourceDesc>

<p>(Information about source from which the resource derives)</p>

</sourceDesc>

</fileDesc>

</teiHeader>
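The header skeleton above shows the three components a file description is expected to contain. As a rough illustration, the following Python sketch reports which of them are absent. It assumes un-namespaced elements as on the slide, and the function name missing_header_parts is hypothetical. It is not a substitute for validating against a TEI schema.

```python
import xml.etree.ElementTree as ET

# The three children of <fileDesc> shown in the skeleton above.
REQUIRED = ("titleStmt", "publicationStmt", "sourceDesc")

def missing_header_parts(header):
    """List the expected parts of <fileDesc> that are absent from a header."""
    file_desc = header.find("fileDesc")
    if file_desc is None:
        return ["fileDesc"]
    return [name for name in REQUIRED if file_desc.find(name) is None]

header = ET.fromstring(
    "<teiHeader><fileDesc>"
    "<titleStmt><title>A title</title></titleStmt>"
    "<publicationStmt><p>Unpublished</p></publicationStmt>"
    "</fileDesc></teiHeader>"
)
print(missing_header_parts(header))  # ['sourceDesc']
```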

  • John eats a <foreign xml:lang="fr">croissant</foreign> every morning.

  • <mentioned xml:lang="fr">Croissant</mentioned> is difficult to pronounce with your mouth full.

  • A <term xml:lang="fr">croissant</term> is a crescent-shaped piece of light, buttery, pastry that is usually eaten for breakfast, especially in France.

Why would you want those things?

  • because we need to interchange resources
  • between people
  • (increasingly) between machines
  • because we need to integrate resources
  • of different media types
  • from different technical contexts
  • because we need to preserve resources
  • cryogenics is not the (full) answer!
  • we need to preserve metadata as well as data

TEI Basic Structure

<TEI>

<teiHeader>

<!-- required -->

</teiHeader>

<facsimile>

<!-- optional -->

</facsimile>

<sourceDoc>

<!-- optional -->

</sourceDoc>

<text>

<!-- required if no facsimile or sourceDoc-->

</text>

</TEI>

TEI Text Structure

6. Verse

Examples

This module is intended for use when encoding texts which are entirely or predominantly in verse, and for which the elements for encoding verse structure already provided by the core module are inadequate.

Sections include:

  • Structural Divisions of Verse Texts
  • Components of the Verse Line
  • Rhyme and Metrical Analysis
  • Rhyme
  • Metrical Notation Declaration
  • Encoding Procedures for Other Verse Features
  • Module for Verse

Elements Defined: <caesura>, <metDecl>, <metSym>, <rhyme>

<div

type="canzone"

met="E/E/S/E/S/E/E/S/E/S/E/S/S/E/S/E/E/S/S/E/E"

rhyme="abbcdaccbdceeffghhhgg">

<lg n="1" type="stanza">

<l n="1">Doglia mi reca nello core ardire</l>

</lg>

</div>

<metDecl type="met" pattern="((E|S)/)+">

<metSym value="E" terminal="false">xxxxxxxxx+o</metSym>

<metSym value="S" terminal="false">xxxxx+o</metSym>

<metSym value="x">metrically prominent or non-prominent</metSym>

<metSym value="+">metrically prominent</metSym>

<metSym value="o">optional non prominent</metSym>

<metSym value="/">line division</metSym>

</metDecl>

The TEI Guidelines

5. Characters, Glyphs, and Writing Modes

7. Performance Texts

Example

10. Manuscript Description

This module is intended for use when encoding printed dramatic texts, screen plays or radio scripts, and written transcriptions of any other form of performance.

Sections include:

  • Front and Back Matter
  • The Body of a Performance Text
  • Other Types of Performance Text
  • Module for Performance Texts

Elements Defined: <actor>, <camera>, <caption>, <castGroup>, <castItem>, <castList>, <epilogue>, <move>, <performance>, <prologue>, <role>, <roleDesc>, <set>, <sound>, <spGrp>, <tech>, <view>

<set>

<p>The Scene, an un-inhabited Island.</p>

</set>

<castList>

<head>Names of the Actors.</head>

<castItem>Alonso, K. of Naples</castItem>

<castItem>Sebastian, his Brother.</castItem>

<castItem>Prospero, the right Duke of Millaine.</castItem>

</castList>

<msDesc xml:id="mySpecialManuscript">

<msIdentifier>

<!-- You *must* give identification information -->

</msIdentifier>

<msContents>

<!-- You can describe the intellectual structure of the text -->

</msContents>

<physDesc>

<!-- You can describe the full physical description of the object -->

</physDesc>

<history>

<!-- You can give a full history of the object, its origin,

provenance, and acquisition -->

</history>

<additional>

<!-- You can provide additional information about

surrogates, administrative metadata, etc. -->

</additional>

</msDesc>

This module defines a special purpose element which can be used to provide detailed descriptive information about handwritten (and other unique) primary sources.

Sections include:

  • Overview
  • The Manuscript Description Element
  • Phrase-level Elements
  • The Manuscript Identifier
  • The Manuscript Heading
  • Intellectual Content
  • Physical Description
  • History
  • Additional Information
  • Manuscript Parts
  • Module for Manuscript Description

Elements Defined: <accMat>, <acquisition>, <additional>, <additions>, <adminInfo>, <altIdentifier>, <binding>, <bindingDesc>, <catchwords>, <collation>, <collection>, <colophon>, <condition>, <custEvent>, <custodialHist>, <decoDesc>, <decoNote>, <depth>, <dim>, <dimensions>, <explicit>, <filiation>, <finalRubric>, <foliation>, <handDesc>, <height>, <heraldry>, <history>, <incipit>, <institution>, <layout>, <layoutDesc>, <locus>, <locusGrp>, <material>, <msContents>, <msDesc>, <msIdentifier>, <msItem>, <msItemStruct>, <msName>, <msPart>, <musicNotation>, <objectDesc>, <objectType>, <origDate>, <origPlace>, <origin>, <physDesc>, <provenance>, <recordHist>, <repository>, <rubric>, <scriptDesc>, <scriptNote>, <seal>, <sealDesc>, <secFol>, <signatures>, <source>, <stamp>, <summary>, <support>, <supportDesc>, <surrogates>, <textLang>, <typeDesc>, <typeNote>, <watermark>, <width>

Appendices

TEI ODD Customisation

Text encoders sometimes find that the published repertoire of Unicode characters is inadequate to their needs, for example when working with ancient languages or recording particular variant glyph forms.

Sections include:

  • Is Your Journey Really Necessary?
  • Markup Constructs for Representation of Characters and Glyphs
  • Annotating Characters
  • Adding New Characters
  • How to Use Code Points from the Private Use Area
  • Writing Modes
  • Examples of Different Writing Modes
  • Text Rotation
  • Caveat
  • Formal Definition

Elements Defined: <char>, <charDecl>, <charName>, <charProp>, <g>, <glyph>, <glyphName>, <localName>, <mapping>, <unicodeName>, <value>

OxGarage

  • Appendix A. Model Classes
  • Appendix B. Attribute Classes
  • Appendix C. Elements
  • Appendix D. Attributes
  • Appendix E. Datatypes and Other Macros

Customized TEI

Full TEI Schema

Modules

What is the TEI?

Simple analytic mechanisms

Certainty and uncertainty

Core elements

Corpus texts

Dictionaries

Performance texts

Tables, formulæ, notated music, and figures

Character and glyph documentation

The TEI Header

Feature structures

Linking, segmentation and alignment

Manuscript Description

Names and dates

Graphs, networks, and trees

Transcribed Speech

Documentation of TEI modules

Critical Apparatus

Default text structure

Transcription of primary sources

Verse structures

(The Text Encoding Initiative)

TEI ODD Customisation

  • An international consortium of institutions, projects and individual members

What the TEI is not:

9. Dictionaries

8. Transcriptions of Speech

  • A community of users and volunteers
  • A freely available manual: a set of regularly maintained and updated recommendations, 'The Guidelines'

This chapter defines a module for encoding lexical resources of all kinds, in particular human-oriented monolingual and multilingual dictionaries, glossaries, and similar documents.

Sections include:

  • Dictionary Body and Overall Structure
  • The Structure of Dictionary Entries
  • Top-level Constituents of Entries
  • Headword and Pronunciation References
  • Typographic and Lexical Information in Dictionary Data
  • Unstructured Entries
  • The Dictionary Module

Elements Defined: <case>, <colloc>, <def>, <dictScrap>, <entry>, <entryFree>, <etym>, <form>, <gen>, <gram>, <gramGrp>, <hom>, <hyph>, <iType>, <lang>, <lbl>, <mood>, <number>, <oRef>, <oVar>, <orth>, <pRef>, <pVar>, <per>, <pos>, <pron>, <re>, <sense>, <stress>, <subc>, <superEntry>, <syll>, <tns>, <usg>, <xr>

<entry>

<form>

<orth>competitor</orth>

<hyph>com|peti|tor</hyph>

<pron>k@m"petit@(r)</pron>

</form>

<gramGrp>

<pos>n</pos>

</gramGrp>

<def>person who competes.</def>

</entry>

The module described in this chapter is intended for use with a wide variety of transcribed spoken material.

Sections include:

  • General Considerations and Overview
  • Documenting the Source of Transcribed Speech
  • Elements Unique to Spoken Texts
  • Elements Defined Elsewhere
  • Module for Transcribed Speech

Elements Defined: <broadcast>, <equipment>, <incident>, <kinesic>, <pause>, <recording>, <recordingStmt>, <scriptStmt>, <shift>, <u>, <vocal>, <writing>

<u who="#a">look at this</u>

<writing who="#a" type="newspaper" gradual="false"> Government claims economic problems

<soCalled>over by June</soCalled>

</writing>

<u who="#a">what nonsense!</u>

Chaining of TEI ODD Customisations

OxGarage

  • Definitions, examples, and discussion of over 540 markup distinctions for textual, image facsimile, genetic editing etc.

One of the interesting developments in TEI ODD design is that you can 'chain' customisations.

This means that if a TEI Community or Project makes a customisation then others can come along and make their own customisation that points to this project's TEI ODD as a source.

This enables projects to truly say "We're very much like that project over there (e.g. EpiDoc), but we need to add back in this element that they removed" and to document this in a machine-processable manner.

  • Freely available web frontend to underlying XSLT conversions
  • REST-enabled API for scripts doing bulk conversions
  • Pipelined conversions through many steps (e.g. DOCX to TEI P5 to ePub)
  • Often uses TEI P5 as pivot format
  • A set of free and openly licensed, customizable tools and stylesheets for transformations to many formats (e.g. HTML, Word, PDF, Databases, RDF/LinkedData, Slides, ePub, etc.)

http://www.tei-c.org/oxgarage/

  • A simple consensus-based way of organizing and structuring textual (and other) resources
  • A mechanism for producing customized schemas for validating your project's digital texts

NOT TEI

  • A format for documenting your interpretation and understanding of a text (and how text functions)
  • An archival, well-understood, format for long-term preservation of digital data and metadata
  • Whatever you make it! It is a community-driven standard

T E I

The

benefits

of a shared vocabulary

far outweigh

the effort of learning the TEI

TEI Conformance

TEI Development

Versions of TEI P5

A document is "TEI Conformant" if and only if it:

  • is a well-formed XML document
  • can be validated against a TEI Schema, that is, a schema derived from the TEI Guidelines
  • conforms to the TEI Abstract Model
  • uses the TEI Namespace (and other namespaces where relevant) correctly
  • is documented by means of a TEI Conformant ODD file which refers to the TEI Guidelines

Version Date

2.7.0 2014-09-16

2.6.0 2014-01-20

2.5.0 2013-07-26

2.4.0 2013-07-05

2.3.0 2013-01-17

2.2.0 2012-10-25

2.1.0 2012-05-15

2.0.2 2012-02-02

2.0.1 2011-12-22

2.0.0 2011-12-16

1.9.1 2011-03-05

1.9.0 2011-02-25

1.8.0 2010-11-05

1.7.0 2010-07-06

1.6.0 2010-02-12

1.5.0 2009-11-08

1.4.1 2009-07-01

1.4.0 2009-06-20

1.3.0 2009-02-01

1.2.0 2008-10-31

1.1.0 2008-07-04

1.0.1 2008-02-03

1.0.0 2007-11-02

  • The TEI Guidelines are constantly being improved and as such are an evolving history of Digital Humanities concerns
  • But any individual project can choose to stay with any earlier version
  • The elected TEI Technical Council takes bug reports and feature requests from the community and implements them
  • You (yes, you!) can participate in the community mailing list (TEI-L) and point out bugs or make feature requests on:

http://tei.sourceforge.net

Standardization should not mean

‘Do what I do’, but rather

‘Explain what you need to do

in terms I can understand’

18. Feature Structures

A feature structure is a general purpose data structure which identifies and groups together individual features, each of which associates a name with one or more values. Because of the generality of feature structures, they can be used to represent many different kinds of information, but they are of particular usefulness in the representation of linguistic analyses.

Sections include:

  • Organization of this Chapter
  • Elementary Feature Structures and the Binary Feature Value
  • Other Atomic Feature Values
  • Feature Libraries and Feature-Value Libraries
  • Feature Structures as Complex Feature Values
  • Re-entrant Feature Structures
  • Collections as Complex Feature Values
  • Feature Value Expressions
  • Default Values
  • Linking Text and Analysis
  • Feature System Declaration
  • Formal Definition and Implementation

Elements Defined: <bicond>, <binary>, <cond>, <default>, <f>, <fDecl>, <fDescr>, <fLib>, <fs>, <fsConstraints>, <fsDecl>, <fsDescr>, <fsdDecl>, <fsdLink>, <fvLib>, <if>, <iff>, <numeric>, <string>, <symbol>, <then>, <vAlt>, <vColl>, <vDefault>, <vLabel>, <vMerge>, <vNot>, <vRange>

12. Critical Apparatus

This chapter defines a module for use in encoding an apparatus of variants for scholarly editions, which may be used in conjunction with any of the modules defined in these Guidelines.

Sections include:

  • The Apparatus Entry, Readings, and Witnesses
  • Linking the Apparatus to the Text
  • Using Apparatus Elements in Transcriptions
  • Module for Critical Apparatus

Elements Defined: <app>, <lacunaEnd>, <lacunaStart>, <lem>, <listApp>, <listWit>, <rdg>, <rdgGrp>, <variantEncoding>, <wit>, <witDetail>, <witEnd>, <witStart>, <witness>

<p>Certain it is, this was not the case with the redoubtable Brom Bones; and <app>

<rdg wit="#msA">from the moment Ichabod Crane made his advances,</rdg>

<rdg wit="#msB">coincidentally when Ichabod Crane made his advances,</rdg>

<rdg wit="#msC">from the moment Ichabod Crane started to sing, </rdg> </app> the interests of the former

evidently declined;</p>
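An apparatus like the one above lets software rebuild the text of any single witness. The Python sketch below illustrates the idea under simplifying assumptions: un-namespaced elements as on the slide, no nested <app> or <rdgGrp>, and hypothetical function names (_collect, read_witness).

```python
import xml.etree.ElementTree as ET

SAMPLE = """<p>Certain it is, this was not the case with the redoubtable Brom Bones; and <app>
<rdg wit="#msA">from the moment Ichabod Crane made his advances,</rdg>
<rdg wit="#msB">coincidentally when Ichabod Crane made his advances,</rdg>
<rdg wit="#msC">from the moment Ichabod Crane started to sing, </rdg> </app> the interests of the former
evidently declined;</p>"""

def _collect(elem, wit):
    """Gather text, keeping only the readings attested by one witness."""
    if elem.tag == "app":
        # @wit may list several witnesses separated by whitespace.
        return "".join(
            _collect(r, wit) for r in elem
            if r.tag in ("rdg", "lem") and wit in r.get("wit", "").split()
        )
    parts = [elem.text or ""]
    for child in elem:
        parts.append(_collect(child, wit))
        parts.append(child.tail or "")
    return "".join(parts)

def read_witness(xml_text, wit):
    """Return the whitespace-normalized text of a single witness."""
    return " ".join(_collect(ET.fromstring(xml_text), wit).split())

print(read_witness(SAMPLE, "#msB"))
```

Calling read_witness with "#msA" or "#msC" instead yields the other two versions of the sentence from the same encoded paragraph.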

17. Simple Analytic Mechanisms

11. Representation of Primary Sources

20. Non-hierarchical Structures

23. Using the TEI

This section discusses some technical topics concerning the deployment of the TEI markup scheme documented elsewhere in these Guidelines.

Sections include:

  • Serving TEI files with the TEI Media Type
  • Obtaining the TEI Schemas
  • Personalization and Customization
  • Conformance
  • Implementation of an ODD System

XML employs a strongly hierarchical document model. At various points, these Guidelines discuss problems that arise when using XML to encode textual features that either do not naturally lend themselves to representation in a strictly hierarchical form or conflict with other hierarchies represented in the markup.

Sections include:

  • Multiple Encodings of the Same Information
  • Boundary Marking with Empty Elements
  • Fragmentation and Reconstitution of Virtual Elements
  • Stand-off Markup
  • Non-XML-based Approaches

13. Names, Dates, People, and Places

16. Linking, Segmentation, and Alignment

This chapter describes a module which may be used for the encoding of names and other phrases descriptive of persons, places, or organizations, in a manner more detailed than that possible using the elements already provided for these purposes in the Core module.

Sections include:

  • Attribute Classes Defined by This Module
  • Names [e.g. personal, place and organisational names]
  • Biographical and Prosopographical Data
  • Module for Names and Dates

13. Names, Dates, People, and Places -- Elements

Elements Defined: <addName>, <affiliation>, <age>, <birth>, <bloc>, <climate>, <country>, <death>, <district>, <education>, <event>, <faith>, <floruit>, <forename>, <genName>, <geo>, <geogFeat>, <geogName>, <langKnowledge>, <langKnown>, <listEvent>, <listNym>, <listOrg>, <listPerson>, <listPlace>, <listRelation>, <location>, <nameLink>, <nationality>, <nym>, <occupation>, <offset>, <org>, <orgName>, <persName>, <person>, <personGrp>, <place>, <placeName>, <population>, <region>, <relation>, <residence>, <roleName>, <settlement>, <sex>, <socecStatus>, <state>, <surname>, <terrain>, <trait>

<!-- In the header -->

<person xml:id="ArnMag">

<persName xml:lang="is">Árni Magnússon</persName>

<persName xml:lang="da">Arne Magnusson</persName>

<persName xml:lang="la">Arnas Magnæus</persName>

</person>

<!-- In the text -->

<p>

<persName ref="#ArnMag">Arnas</persName> dixit "Reveniam".

</p>
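The ref attribute in the example above points from a name in the text to a <person> record in the header. A small Python sketch of following that pointer is given below. The wrapping <doc> element, the un-namespaced TEI elements, and the function name resolve_names are assumptions made only so the fragment parses as one document.

```python
import xml.etree.ElementTree as ET

# xml:id and xml:lang live in the predefined XML namespace.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

SAMPLE = """<doc>
<person xml:id="ArnMag">
<persName xml:lang="is">Árni Magnússon</persName>
<persName xml:lang="da">Arne Magnusson</persName>
<persName xml:lang="la">Arnas Magnæus</persName>
</person>
<p><persName ref="#ArnMag">Arnas</persName> dixit "Reveniam".</p>
</doc>"""

def resolve_names(root, lang):
    """Map each name mentioned in the text to the person's name in `lang`."""
    people = {p.get(XML_NS + "id"): p for p in root.iter("person")}
    found = {}
    for mention in root.iter("persName"):
        ref = mention.get("ref", "")
        # Only local pointers of the form "#id" are handled in this sketch.
        if ref.startswith("#") and ref[1:] in people:
            for name in people[ref[1:]].findall("persName"):
                if name.get(XML_NS + "lang") == lang:
                    found[mention.text] = name.text
    return found

print(resolve_names(ET.fromstring(SAMPLE), "is"))  # {'Arnas': 'Árni Magnússon'}
```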

This chapter discusses a number of ways in which encoders may represent analyses of the structure of a text which are not necessarily linear or hierarchic.

Sections include:

  • Links
  • Pointing Mechanisms
  • Blocks, Segments, and Anchors
  • Correspondence and Alignment
  • Synchronization
  • Identical Elements and Virtual Copies
  • Aggregation
  • Alternation
  • Stand-off Markup
  • Connecting Analytic and Textual Markup
  • Module for Linking, Segmentation, and Alignment

Elements Defined: <ab>, <alt>, <altGrp>, <anchor>, <join>, <joinGrp>, <link>, <linkGrp>, <seg>, <timeline>, <when>

This chapter describes a module for associating simple analyses and interpretations with text elements. We use the term analysis here to refer to any kind of semantic or syntactic interpretation which an encoder wishes to attach to all or part of a text.

Sections include:

  • Linguistic Segment Categories
  • Global Attributes for Simple Analyses
  • Spans and Interpretations
  • Linguistic Annotation
  • Module for Analysis and Interpretation

Elements Defined: <c>, <cl>, <interp>, <interpGrp>, <m>, <pc>, <phr>, <s>, <span>, <spanGrp>, <w>

<s>

<w ana="#AT0">The</w>

<w ana="#NN1">victim</w>

<w ana="#POS">'s</w>

<w ana="#NN2">friends</w>

<w ana="#VVD">told</w>

<w ana="#NN2">villagers</w>

<w ana="#CJT">that</w>

<w ana="#AT0">the</w>

<w ana="#NP0">Headless</w>

<w ana="#NP0">Horseman</w>

<w ana="#VVD">rode</w>

<w ana="#PRP">into</w>

<w ana="#AT0">the</w>

<w ana="#NN1">forest</w>

<w ana="#CJC">and</w>

<w ana="#AV0">never</w>

<w ana="#VVD">reappeared</w>

</s>
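Once each word carries an ana pointer, as in the sentence above, simple counts become trivial to compute. The following Python sketch tallies the part-of-speech codes in a shortened copy of the example. It assumes un-namespaced elements and a single pointer per ana attribute; the function name tag_counts is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# First six words of the example sentence above.
SAMPLE = (
    '<s><w ana="#AT0">The</w><w ana="#NN1">victim</w>'
    '<w ana="#POS">\'s</w><w ana="#NN2">friends</w>'
    '<w ana="#VVD">told</w><w ana="#NN2">villagers</w></s>'
)

def tag_counts(sentence):
    """Count the analysis codes pointed to by each <w> element's @ana."""
    return Counter(w.get("ana", "").lstrip("#") for w in sentence.iter("w"))

counts = tag_counts(ET.fromstring(SAMPLE))
print(counts["NN2"])  # 2
print(counts["AT0"])  # 1
```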

This chapter defines a module intended for use in the representation of primary sources, such as manuscripts or other written materials.

Sections include:

  • Digital Facsimiles
  • Combining Transcription with Facsimile
  • Scope of Transcriptions
  • Advanced Uses of surface and zone
  • Aspects of Layout
  • Headers, Footers, and Similar Matter
  • Changes
  • Other Primary Source Features not Covered in these Guidelines
  • Module for Transcription of Primary Sources

Elements Defined: <addSpan>, <am>, <damage>, <damageSpan>, <delSpan>, <ex>, <facsimile>, <fw>, <handNotes>, <handShift>, <line>, <listChange>, <listTranspose>, <metamark>, <mod>, <redo>, <restore>, <retrace>, <sourceDoc>, <space>, <subst>, <substJoin>, <supplied>, <surface>, <surfaceGrp>, <surplus>, <transpose>, <undo>, <zone>

<facsimile>

<graphic url="page1.png"/>

<graphic url="page2.png"/>

</facsimile>

22. Documentation Elements

21. Certainty, Precision, and Responsibility

This chapter describes a module which may be used for the documentation of the XML elements and element classes which make up any markup scheme, in particular that described by the TEI Guidelines, and also for the automatic generation of schemas conforming to that documentation.

Sections include:

  • Phrase Level Documentary Elements
  • Modules and Schemas
  • Specification Elements
  • Common Elements
  • Building a Schema
  • Combining TEI and Non-TEI Modules
  • Linking Schemas to XML Documents
  • Module for Documentation Elements

Elements Defined: <altIdent>, <alternate>, <att>, <attDef>, <attList>, <attRef>, <classRef>, <classSpec>, <classes>, <code>, <constraint>, <constraintSpec>, <content>, <datatype>, <defaultVal>, <eg>, <egXML>, <elementRef>, <elementSpec>, <equiv>, <exemplum>, <gi>, <ident>, <listRef>, <macroRef>, <macroSpec>, <memberOf>, <moduleRef>, <moduleSpec>, <remarks>, <schemaSpec>, <sequence>, <specDesc>, <specGrp>, <specGrpRef>, <specList>, <tag>, <val>, <valDesc>, <valItem>, <valList>

Encoders of text often find it useful to indicate that some aspects of the encoded text are problematic or uncertain, and to indicate who is responsible for various aspects of the markup of the electronic text.

Sections include:

  • Levels of Certainty
  • Indications of Precision
  • Attribution of Responsibility
  • The Certainty Module

Elements Defined: <certainty>, <precision>, <respons>

15. Language Corpora

14. Tables, Formulæ, Graphics and Notated Music

Many documents, both historical and contemporary, include not only text, but also graphics, artwork, and other images. Since such components may frequently be most conveniently encoded and processed using external notations, they are dealt with together.

Sections include:

  • Tables
  • Formulæ and Mathematical Expressions
  • Notated Music in Written Text
  • Specific Elements for Graphic Images
  • Overview of Basic Graphics Concepts
  • Graphic Image Formats
  • Module for Tables, Formulæ, Notated Music, and Graphics

Elements Defined: <cell>, <figDesc>, <figure>, <formula>, <notatedMusic>, <row>, <table>

This chapter discusses language corpora. The distinguishing characteristic of any individual corpus is that its components have been selected or structured according to some conscious set of design criteria.

Sections include:

  • Varieties of Composite Text
  • Contextual Information
  • Associating Contextual Information with a Text
  • Linguistic Annotation of Corpora
  • Recommendations for the Encoding of Large Corpora
  • Module for Language Corpora

Elements Defined: <activity>, <channel>, <constitution>, <derivation>, <domain>, <factuality>, <interaction>, <locale>, <particDesc>, <preparedness>, <purpose>, <setting>, <settingDesc>, <textDesc>

<settingDesc>

<setting who="#p1 #p2">

<name type="village">Sleepy Hollow</name>

<date>early spring, 1789</date>

<locale>a farm house, sat by the hearth</locale>

<activity>courting</activity>

</setting>

<setting who="#p3">

<name type="village">Sleepy Hollow</name>

<date>early spring, 1789</date>

<locale>school house</locale>

<activity>teaching</activity>

</setting>

</settingDesc>
