Digital Scholarly Editing

Digital Scholarly Editing, including thoughts about Digital Scholarly Editions, and an introduction to Markup, XML, TEI, and oXygen. CC+by. http://tinyurl.com/jc-dse-lyon
by James Cummings on 7 March 2016

Transcript of Digital Scholarly Editing

What is the TEI?
(The Text Encoding Initiative)
An international consortium of institutions, projects and individual members
A community of users and volunteers
A freely available manual: a set of regularly maintained and updated recommendations, 'The Guidelines'
Definitions, examples, and discussion of over 540 markup distinctions for textual, image facsimile, genetic editing etc.
A mechanism for producing customized schemas for validating your project's digital texts
A set of free and openly licensed, customizable tools and stylesheets for transformations to many formats (e.g. HTML, Word, PDF, Databases, RDF/LinkedData, Slides, ePub, etc.)
A simple consensus-based way of organizing and structuring textual (and other) resources
A format for documenting your interpretation and understanding of a text (and how text functions)
An archival, well-understood, format for long-term preservation of digital data and metadata
Whatever you make it! It is a community-driven standard
What the TEI is not:
the only standard in this area
objective or non-interpretative
used consistently even within the same project (never mind in other ones)
fixed and unchanging
your research end-point
automatic publication or long-term preservation
T E I
NOT TEI
The benefits of a shared vocabulary far outweigh the effort of learning the TEI
Full TEI Schema
Modules
Simple analytic mechanisms
Certainty and uncertainty
Core elements
Corpus texts
Dictionaries
Performance texts
Tables, formulæ, notated music, and figures
Character and glyph documentation
The TEI Header
Feature structures
Linking, segmentation and alignment
Manuscript Description
Names and dates
Graphs, networks, and trees
Transcribed Speech
Documentation of TEI modules
Critical Apparatus
Default text structure
Transcription of primary sources
Verse structures
Customized TEI
Modules
Simple analytic mechanisms
Certainty and uncertainty
Core elements
Corpus texts
Dictionaries
Performance texts
Tables, formulæ, notated music, and figures
Character and glyph documentation
The TEI Header
Feature structures
Linking, segmentation and alignment
Manuscript Description
Names and dates
Graphs, networks, and trees
Transcribed Speech
Documentation of TEI modules
Critical Apparatus
Default text structure
Transcription of primary sources
Verse structures
TEI ODD Customisation
But of course you want to remove specific elements from these 'modules' as well.

No project will need all of them!
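
As a rough sketch of what such a customisation can look like, the ODD fragment below selects a few modules and then trims them. The identifier `myProject` and the particular modules and elements chosen are assumptions made purely for illustration; a real project would make its own selection, and the `schemaSpec` would sit inside a full TEI ODD document.

```xml
<!-- illustrative only: "myProject" and the selections below are hypothetical -->
<schemaSpec ident="myProject" start="TEI">
  <!-- modules commonly included in a customisation -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- keep only a handful of elements from Names and Dates -->
  <moduleRef key="namesdates" include="persName placeName orgName"/>
  <!-- take Transcription of primary sources, minus a few elements -->
  <moduleRef key="transcr" except="metamark redo undo"/>
  <!-- remove one element from a module that is otherwise included whole -->
  <elementSpec ident="said" module="core" mode="delete"/>
</schemaSpec>
```

Here `include` keeps only the named elements from a module, `except` keeps everything but the named ones, and `elementSpec` with `mode="delete"` removes a single element.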
Chaining of TEI ODD Customisations
One of the interesting developments in TEI ODD design is that you can 'chain' customisations.

This means that if a TEI Community or Project makes a customisation then others can come along and make their own customisation that points to this project's TEI ODD as a source.

This enables projects to truly say "We're very much like that project over there (e.g. EpiDoc), but we need to add back in this element that they removed" and to document this in a machine-processable manner.
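
A chained customisation might be sketched as follows. This is an illustration under stated assumptions, not a working EpiDoc derivative: the identifier `myInscriptions`, the file name `parent-project.compiled.odd`, and the choice of `said` as the element being added back are all hypothetical. The `source` attribute on `schemaSpec` names the (compiled) parent customisation, while a `source` on an individual reference reaches back to the full TEI.

```xml
<!-- hypothetical child customisation; names and paths are illustrative only -->
<schemaSpec ident="myInscriptions" start="TEI"
            source="parent-project.compiled.odd">
  <!-- these modules are read from the parent, so they arrive already trimmed -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- add back an element the parent removed, taking it from the full TEI -->
  <elementRef key="said" source="tei:current"/>
</schemaSpec>
```

How the parent ODD is compiled and where it is published depends on the tooling a project uses, so the `source` values above should be read as placeholders.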
OxGarage
Freely available web frontend to underlying XSLT conversions
REST-enabled API interface for scripts doing bulk conversions
Pipelined conversions through many steps (e.g. DOCX to TEI P5 to ePub)
Often uses TEI P5 as pivot format

because we need to interchange resources
between people
(increasingly) between machines
because we need to integrate resources
of different media types
from different technical contexts
because we need to preserve resources
cryogenics is not the (full) answer!
we need to preserve metadata as well as data
Why would you want those things?
A document is "TEI Conformant" if and only if it:
is a well-formed XML document
can be validated against a TEI Schema, that is, a schema derived from the TEI Guidelines
conforms to the TEI Abstract Model
uses the TEI Namespace (and other namespaces where relevant) correctly
is documented by means of a TEI Conformant ODD file which refers to the TEI Guidelines
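
A minimal document that would meet the first, second and fourth of these criteria might look like the sketch below; the title and the prose in the header are placeholders. Conformance to the Abstract Model and documentation by an ODD cannot be demonstrated by an instance alone.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- the default namespace declaration places every element in the TEI namespace -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A placeholder title</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished example.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Some text.</p>
    </body>
  </text>
</TEI>
```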

TEI Conformance
Standardization should not mean ‘Do what I do’, but rather ‘Explain what you need to do in terms I can understand’

Version Date
2.9.1 2015-10-15
2.8.0 2015-04-06
2.7.0 2014-09-16
2.6.0 2014-01-20
2.5.0 2013-07-26
2.4.0 2013-07-05
2.3.0 2013-01-17
2.2.0 2012-10-25
2.1.0 2012-05-15
2.0.2 2012-02-02
2.0.1 2011-12-22
2.0.0 2011-12-16
1.9.1 2011-03-05
1.9.0 2011-02-25
1.8.0 2010-11-05
1.7.0 2010-07-06
1.6.0 2010-02-12
1.5.0 2009-11-08
1.4.1 2009-07-01
1.4.0 2009-06-20
1.3.0 2009-02-01
1.2.0 2008-10-31
1.1.0 2008-07-04
1.0.1 2008-02-03
1.0.0 2007-11-02
Versions of TEI P5
TEI Development
The TEI Guidelines are constantly being improved and as such are an evolving history of Digital Humanities concerns
But any individual project can choose to stay with any earlier version
The elected TEI Technical Council takes bug reports and feature requests from the community and implements them
You (yes, you!) can participate in the community mailing list (TEI-L) and point out bugs or make feature requests on:
http://github.com/teic
http://www.tei-c.org/oxgarage/
<TEI>
  <teiHeader>
    <!-- required -->
  </teiHeader>
  <facsimile>
    <!-- optional -->
  </facsimile>
  <sourceDoc>
    <!-- optional -->
  </sourceDoc>
  <text>
    <!-- required if no facsimile or sourceDoc-->
  </text>
</TEI>

TEI Basic Structure
<TEI>
  <teiHeader>
    <!-- Metadata required here -->
  </teiHeader>
  <text>
    <front>
      <!-- optional front matter -->
    </front>
    <body>
      <div n="1">
        <!-- first division -->
      </div>
      <div n="2">
        <!-- second division -->
      </div>
      <!-- ... -->
    </body>
    <back>
      <!-- optional back matter -->
    </back>
  </text>
</TEI>
TEI Text Structure
TEI
vi. Languages and Character Sets

The documents which users of these Guidelines may wish to encode encompass all kinds of material, potentially expressed in the full range of written and spoken human languages, including the extinct, the non-existent, and the conjectural. Because of this wide scope, special attention has been paid to two particular aspects of the representation of linguistic information often taken for granted: language identification, and character encoding.

Sections include:
Language Identification
Characters and Character Sets
1. The TEI Infrastructure
This chapter describes the infrastructure for the encoding scheme defined by these Guidelines. It introduces the conceptual framework within which the following chapters are to be understood, and the means by which that conceptual framework is implemented.

Sections include:
TEI Modules
Defining a TEI Schema
The TEI Class System
Macros
The TEI Infrastructure Module
iv. About These Guidelines
This is an initial chapter explaining the notation, background, and future development of the Guidelines

Sections include:
Structure and Notational Conventions of this Document
Historical Background
Future Developments and Version Numbers

7. Performance Texts
This module is intended for use when encoding printed dramatic texts, screen plays or radio scripts, and written transcriptions of any other form of performance.

Sections include:
Front and Back Matter
The Body of a Performance Text
Other Types of Performance Text
Module for Performance Texts

Elements Defined:
<actor>, <camera>, <caption>, <castGroup>, <castItem>, <castList>, <epilogue>, <move>, <performance>, <prologue>, <role>, <roleDesc>, <set>, <sound>, <spGrp>, <tech>, <view>

<set>
<p>The Scene, an un-inhabited Island.</p>
</set>

<castList>
<head>Names of the Actors.</head>
<castItem>Alonso, K. of Naples</castItem>
<castItem>Sebastian, his Brother.</castItem>
<castItem>Prospero, the right Duke of Millaine.</castItem>
</castList>

8. Transcriptions of Speech
The module described in this chapter is intended for use with a wide variety of transcribed spoken material.

Sections include:
General Considerations and Overview
Documenting the Source of Transcribed Speech
Elements Unique to Spoken Texts
Elements Defined Elsewhere
Module for Transcribed Speech

Elements Defined:
<broadcast>, <equipment>, <incident>, <kinesic>, <pause>, <recording>, <recordingStmt>, <scriptStmt>, <shift>, <u>, <vocal>, <writing>

<u who="#a">look at this</u>
<writing who="#a" type="newspaper" gradual="false"> Government claims economic problems
<soCalled>over by June</soCalled>
</writing>
<u who="#a">what nonsense!</u>

9. Dictionaries
This chapter defines a module for encoding lexical resources of all kinds, in particular human-oriented monolingual and multilingual dictionaries, glossaries, and similar documents.

Sections include:
Dictionary Body and Overall Structure
The Structure of Dictionary Entries
Top-level Constituents of Entries
Headword and Pronunciation References
Typographic and Lexical Information in Dictionary Data
Unstructured Entries
The Dictionary Module

Elements Defined:
<case>, <colloc>, <def>, <dictScrap>, <entry>, <entryFree>, <etym>, <form>, <gen>, <gram>, <gramGrp>, <hom>, <hyph>, <iType>, <lang>, <lbl>, <mood>, <number>, <oRef>, <oVar>, <orth>, <pRef>, <pVar>, <per>, <pos>, <pron>, <re>, <sense>, <stress>, <subc>, <superEntry>, <syll>, <tns>, <usg>, <xr>

<entry>
<form>
<orth>competitor</orth>
<hyph>com|peti|tor</hyph>
<pron>k@m"petit@(r)</pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<def>person who competes.</def>
</entry>

5. Characters, Glyphs, and Writing Modes
Text encoders sometimes find that the published repertoire of Unicode characters is inadequate to their needs with ancient languages or recording particular variant glyph forms.

Sections include:
Is Your Journey Really Necessary?
Markup Constructs for Representation of Characters and Glyphs
Annotating Characters
Adding New Characters
How to Use Code Points from the Private Use Area
Writing Modes
Examples of Different Writing Modes
Text Rotation
Caveat
Formal Definition

Elements Defined:
<char>, <charDecl>, <charName>, <charProp>, <g>, <glyph>, <glyphName>, <localName>, <mapping>, <unicodeName>, <value>


12. Critical Apparatus

This chapter defines a module for use in encoding an apparatus of variants for scholarly editions, which may be used in conjunction with any of the modules defined in these Guidelines.

Sections include:
The Apparatus Entry, Readings, and Witnesses
Linking the Apparatus to the Text
Using Apparatus Elements in Transcriptions
Module for Critical Apparatus

Elements Defined:
<app>, <lacunaEnd>, <lacunaStart>, <lem>, <listApp>, <listWit>, <rdg>, <rdgGrp>, <variantEncoding>, <wit>, <witDetail>, <witEnd>, <witStart>, <witness>

<p>Certain it is, this was not the case with the redoubtable Brom Bones; and <app>
<rdg wit="#msA">from the moment Ichabod Crane made his advances,</rdg>
<rdg wit="#msB">coincidentally when Ichabod Crane made his advances,</rdg>
<rdg wit="#msC">from the moment Ichabod Crane started to sing, </rdg> </app> the interests of the former
evidently declined;</p>

13. Names, Dates, People, and Places
This chapter describes a module which may be used for the encoding of names and other phrases descriptive of persons, places, or organizations, in a manner more detailed than that possible using the elements already provided for these purposes in the Core module.

Sections include:
Attribute Classes Defined by This Module
Names [e.g. personal, place and organisational names]
Biographical and Prosopographical Data
Module for Names and Dates
13. Names, Dates, People, and Places -- Elements

Elements Defined:
<addName>, <affiliation>, <age>, <birth>, <bloc>, <climate>, <country>, <death>, <district>, <education>, <event>, <faith>, <floruit>, <forename>, <genName>, <geo>, <geogFeat>, <geogName>, <langKnowledge>, <langKnown>, <listEvent>, <listNym>, <listOrg>, <listPerson>, <listPlace>, <listRelation>, <location>, <nameLink>, <nationality>, <nym>, <occupation>, <offset>, <org>, <orgName>, <persName>, <person>, <personGrp>, <place>, <placeName>, <population>, <region>, <relation>, <residence>, <roleName>, <settlement>, <sex>, <socecStatus>, <state>, <surname>, <terrain>, <trait>

<!-- In the header -->
<person xml:id="ArnMag">
<persName xml:lang="is">Árni Magnússon</persName>
<persName xml:lang="da">Arne Magnusson</persName>
<persName xml:lang="la">Arnas Magnæus</persName>
</person>
<!-- In the text -->
<p>
<persName ref="#ArnMag">Arnas</persName> dixit "Reveniam".
</p>

14. Tables, Formulæ, Graphics and Notated Music


Many documents, both historical and contemporary, include not only text, but also graphics, artwork, and other images. Since they may frequently be most conveniently encoded and processed using external notations, they are dealt with together.

Sections include:
Tables
Formulæ and Mathematical Expressions
Notated Music in Written Text
Specific Elements for Graphic Images
Overview of Basic Graphics Concepts
Graphic Image Formats
Module for Tables, Formulæ, Notated Music, and Graphics

Elements Defined:
<cell>, <figDesc>, <figure>, <formula>, <notatedMusic>, <row>, <table>
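
As an illustration of the table elements, the first rows of the TEI P5 version list could be encoded roughly as follows. Marking the heading row with `role="label"` and wrapping each date in `date` is one plausible encoding, not the only one.

```xml
<table rows="3" cols="2">
  <head>Versions of TEI P5</head>
  <!-- the first row labels the columns -->
  <row role="label">
    <cell>Version</cell>
    <cell>Date</cell>
  </row>
  <row>
    <cell>2.9.1</cell>
    <cell><date when="2015-10-15">2015-10-15</date></cell>
  </row>
  <row>
    <cell>2.8.0</cell>
    <cell><date when="2015-04-06">2015-04-06</date></cell>
  </row>
</table>
```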
15. Language Corpora
This chapter discusses language corpora; the distinguishing characteristic of any individual corpus is that its components have been selected or structured according to some conscious set of design criteria.

Sections include:
Varieties of Composite Text
Contextual Information
Associating Contextual Information with a Text
Linguistic Annotation of Corpora
Recommendations for the Encoding of Large Corpora
Module for Language Corpora


Elements Defined:
<activity>, <channel>, <constitution>, <derivation>, <domain>, <factuality>, <interaction>, <locale>, <particDesc>, <preparedness>, <purpose>, <setting>, <settingDesc>, <textDesc>

<settingDesc>
<setting who="#p1 #p2">
<name type="village">Sleepy Hollow</name>
<date>early spring, 1789</date>
<locale>a farm house, sat by the hearth</locale>
<activity>courting</activity>
</setting>
<setting who="#p3">
<name type="village">Sleepy Hollow</name>
<date>early spring, 1789</date>
<locale>school house</locale>
<activity>teaching</activity>
</setting>
</settingDesc>

16. Linking, Segmentation, and Alignment
This chapter discusses a number of ways in which encoders may represent analyses of the structure of a text which are not necessarily linear or hierarchic.

Sections include:
Links
Pointing Mechanisms
Blocks, Segments, and Anchors
Correspondence and Alignment
Synchronization
Identical Elements and Virtual Copies
Aggregation
Alternation
Stand-off Markup
Connecting Analytic and Textual Markup
Module for Linking, Segmentation, and Alignment


Elements Defined:
<ab>, <alt>, <altGrp>, <anchor>, <join>, <joinGrp>, <link>, <linkGrp>, <seg>, <timeline>, <when>


11. Representation of Primary Sources
This chapter defines a module intended for use in the representation of primary sources, such as manuscripts or other written materials.

Sections include:
Digital Facsimiles
Combining Transcription with Facsimile
Scope of Transcriptions
Advanced Uses of surface and zone
Aspects of Layout
Headers, Footers, and Similar Matter
Changes
Other Primary Source Features not Covered in these Guidelines
Module for Transcription of Primary Sources

Elements Defined:
<addSpan>, <am>, <damage>, <damageSpan>, <delSpan>, <ex>, <facsimile>, <fw>, <handNotes>, <handShift>, <line>, <listChange>, <listTranspose>, <metamark>, <mod>, <redo>, <restore>, <retrace>, <sourceDoc>, <space>, <subst>, <substJoin>, <supplied>, <surface>, <surfaceGrp>, <surplus>, <transpose>, <undo>, <zone>

<facsimile>
<graphic url="page1.png"/>
<graphic url="page2.png"/>
</facsimile>

18. Feature Structures
A feature structure is a general purpose data structure which identifies and groups together individual features, each of which associates a name with one or more values. Because of the generality of feature structures, they can be used to represent many different kinds of information, but they are of particular usefulness in the representation of linguistic analyses.

Sections include:
Organization of this Chapter
Elementary Feature Structures and the Binary Feature Value
Other Atomic Feature Values
Feature Libraries and Feature-Value Libraries
Feature Structures as Complex Feature Values
Re-entrant Feature Structures
Collections as Complex Feature Values
Feature Value Expressions
Default Values
Linking Text and Analysis
Feature System Declaration
Formal Definition and Implementation


Elements Defined:
<bicond>, <binary>, <cond>, <default>, <f>, <fDecl>, <fDescr>, <fLib>, <fs>, <fsConstraints>, <fsDecl>, <fsDescr>, <fsdDecl>, <fsdLink>, <fvLib>, <if>, <iff>, <numeric>, <string>, <symbol>, <then>, <vAlt>, <vColl>, <vDefault>, <vLabel>, <vMerge>, <vNot>, <vRange>


20. Non-hierarchical Structures

XML employs a strongly hierarchical document model. At various points, these Guidelines discuss problems that arise when using XML to encode textual features that either do not naturally lend themselves to representation in a strictly hierarchical form or conflict with other hierarchies represented in the markup.

Sections include:
Multiple Encodings of the Same Information
Boundary Marking with Empty Elements
Fragmentation and Reconstitution of Virtual Elements
Stand-off Markup
Non-XML-based Approaches

21. Certainty, Precision, and Responsibility
Encoders of text often find it useful to indicate that some aspects of the encoded text are problematic or uncertain, and to indicate who is responsible for various aspects of the markup of the electronic text.

Sections include:
Levels of Certainty
Indications of Precision
Attribution of Responsibility
The Certainty Module

Elements Defined:
<certainty>, <precision>, <respons>
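
For example, an encoder who is unsure whether a word is a place name at all might record that doubt as sketched below. The sentence, the `xml:id` value, the `degree` of 0.6 and the wording of the description are invented for illustration.

```xml
<p>Elizabeth went to <placeName xml:id="essex1">Essex</placeName>.</p>
<!-- locus="name" says the doubt is about the choice of element, not its content -->
<certainty target="#essex1" locus="name" degree="0.6">
  <desc>May be a place name, or a personal name referring to the Earl of Essex.</desc>
</certainty>
```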


22. Documentation Elements
This chapter describes a module which may be used for the documentation of the XML elements and element classes which make up any markup scheme, in particular that described by the TEI Guidelines, and also for the automatic generation of schemas conforming to that documentation.

Sections include:
Phrase Level Documentary Elements
Modules and Schemas
Specification Elements
Common Elements
Building a Schema
Combining TEI and Non-TEI Modules
Linking Schemas to XML Documents
Module for Documentation Elements

Elements Defined:
<altIdent>, <alternate>, <att>, <attDef>, <attList>, <attRef>, <classRef>, <classSpec>, <classes>, <code>, <constraint>, <constraintSpec>, <content>, <datatype>, <defaultVal>, <eg>, <egXML>, <elementRef>, <elementSpec>, <equiv>, <exemplum>, <gi>, <ident>, <listRef>, <macroRef>, <macroSpec>, <memberOf>, <moduleRef>, <moduleSpec>, <remarks>, <schemaSpec>, <sequence>, <specDesc>, <specGrp>, <specGrpRef>, <specList>, <tag>, <val>, <valDesc>, <valItem>, <valList>
23. Using the TEI


This section discusses some technical topics concerning the deployment of the TEI markup scheme documented elsewhere in these Guidelines.

Sections include:
Serving TEI files with the TEI Media Type
Obtaining the TEI Schemas
Personalization and Customization
Conformance
Implementation of an ODD System
17. Simple Analytic Mechanisms
This chapter describes a module for associating simple analyses and interpretations with text elements. We use the term analysis here to refer to any kind of semantic or syntactic interpretation which an encoder wishes to attach to all or part of a text.

Sections include:
Linguistic Segment Categories
Global Attributes for Simple Analyses
Spans and Interpretations
Linguistic Annotation
Module for Analysis and Interpretation

Elements Defined:
<c>, <cl>, <interp>, <interpGrp>, <m>, <pc>, <phr>, <s>, <span>, <spanGrp>, <w>

<s>
<w ana="#AT0">The</w>
<w ana="#NN1">victim</w>
<w ana="#POS">'s</w>
<w ana="#NN2">friends</w>
<w ana="#VVD">told</w>
<w ana="#NN2">villagers</w>
<w ana="#CJT">that</w>
<w ana="#AT0">the</w>
<w ana="#NP0">Headless</w>
<w ana="#NP0">Horseman</w>
<w ana="#VVD">rode</w>
<w ana="#PRP">into</w>
<w ana="#AT0">the</w>
<w ana="#NN1">forest</w>
<w ana="#CJC">and</w>
<w ana="#AV0">never</w>
<w ana="#VVD">reappeared</w>
</s>

2. The TEI Header
This chapter addresses the problems of describing an encoded work so that the text itself, its source, its encoding, and its revisions are all thoroughly documented.

Sections include:
Organization of the TEI Header
The File Description
The Encoding Description
The Profile Description
The Revision Description
Minimal and Recommended Headers
Note for Library Cataloguers
The TEI Header Module

Elements Defined:
<abstract>, <appInfo>, <application>, <authority>, <availability>, <biblFull>, <cRefPattern>, <calendar>, <calendarDesc>, <catDesc>, <catRef>, <category>, <change>, <classCode>, <classDecl>, <correction>, <creation>, <distributor>, <edition>, <editionStmt>, <editorialDecl>, <encodingDesc>, <extent>, <fileDesc>, <funder>, <geoDecl>, <handNote>, <hyphenation>, <idno>, <interpretation>, <keywords>, <langUsage>, <language>, <licence>, <listPrefixDef>, <namespace>, <normalization>, <notesStmt>, <prefixDef>, <principal>, <profileDesc>, <projectDesc>, <publicationStmt>, <quotation>, <refState>, <refsDecl>, <rendition>, <revisionDesc>, <samplingDecl>, <segmentation>, <seriesStmt>, <sourceDesc>, <sponsor>, <stdVals>, <styleDefDecl>, <tagUsage>, <tagsDecl>, <taxonomy>, <teiHeader>, <textClass>, <titleStmt>
Example
<teiHeader>
<fileDesc>
<titleStmt>
<title>
<!-- title of the resource -->
</title>
</titleStmt>
<publicationStmt>
<p>(Information about distribution of the resource)</p>
</publicationStmt>
<sourceDesc>
<p>(Information about source from which the resource derives)</p>
</sourceDesc>
</fileDesc>
</teiHeader>
The TEI Header
Elements Available in All Documents
('Core')
3. Elements Available in All TEI Documents

This chapter describes elements which may appear in any kind of text and the tags used to mark them in all TEI documents. Most of these elements are freely floating phrases, which can appear at any point within the textual structure, although they must generally be contained by a higher-level element of some kind (such as a paragraph).

Sections include:
Paragraphs
Treatment of Punctuation
Highlighting and Quotation
Simple Editorial Changes
Names, Numbers, Dates, Abbreviations, and Addresses
Simple Links and Cross-References
Lists
Notes, Annotation, and Indexing
Graphics and Other Non-textual Components
Reference Systems
Bibliographic Citations and References
Passages of Verse or Drama
Overview of the Core Module

Elements:
<abbr>, <add>, <addrLine>, <address>, <analytic>, <author>, <bibl>, <biblScope>, <biblStruct>, <binaryObject>, <cb>, <choice>, <cit>, <citedRange>, <corr>, <date>, <del>, <desc>, <distinct>, <editor>, <email>, <emph>, <expan>, <foreign>, <gap>, <gb>, <gloss>, <graphic>, <head>, <headItem>, <headLabel>, <hi>, <imprint>, <index>, <item>, <l>, <label>, <lb>, <lg>, <list>, <listBibl>, <measure>, <measureGrp>, <media>, <meeting>, <mentioned>, <milestone>, <monogr>, <name>, <note>, <num>, <orig>, <p>, <pb>, <postBox>, <postCode>, <ptr>, <pubPlace>, <publisher>, <q>, <quote>, <ref>, <reg>, <relatedItem>, <resp>, <respStmt>, <rs>, <said>, <series>, <sic>, <soCalled>, <sp>, <speaker>, <stage>, <street>, <term>, <time>, <title>, <unclear>
Examples
John eats a <foreign xml:lang="fr">croissant</foreign> every morning.


<mentioned xml:lang="fr">Croissant</mentioned> is difficult to pronounce with your mouth full.


A <term xml:lang="fr">croissant</term> is a crescent-shaped piece of light, buttery, pastry that is usually eaten for breakfast, especially in France.

Examples
An <choice>
<corr cert="high">Autumn</corr>
<sic>Antony</sic>
</choice> it was, That grew the more by reaping

the <choice>
<expan>World Wide Web Consortium</expan>
<abbr>W3C</abbr>
</choice>

...how godly a
<choice>
<orig>dede</orig>
<reg>deed</reg>
</choice> it is to overthrow...
4. Default Text Structure
This chapter describes the default high-level structure for TEI documents. A full TEI document combines metadata describing it, represented by a teiHeader element, with the document itself, represented by a text element.

Sections include:
Divisions of the Body
Elements Common to All Divisions
Grouped and Floating Texts
Virtual Divisions
Front Matter
Title Pages
Back Matter
Module for Default Text Structure
Elements
<TEI>, <argument>, <back>, <body>, <byline>, <closer>, <dateline>, <div>, <div1>, <div2>, <div3>, <div4>, <div5>, <div6>, <div7>, <divGen>, <docAuthor>, <docDate>, <docEdition>, <docImprint>, <docTitle>, <epigraph>, <floatingText>, <front>, <group>, <imprimatur>, <opener>, <postscript>, <salute>, <signed>, <teiCorpus>, <text>, <titlePage>, <titlePart>, <trailer>


Example
<TEI>
<teiHeader>
<!-- .... -->
</teiHeader>
<text>
<front>
<!-- front matter of text, if any, goes here -->
</front>
<body>
<!-- body of text, divisions, etc go here -->
</body>
<back>
<!-- back matter of text, if any, goes here -->
</back>
</text>
</TEI>
Default Text Structure
6. Verse


This module is intended for use when encoding texts which are entirely or predominantly in verse, and for which the elements for encoding verse structure already provided by the core module are inadequate.

Sections include:
Structural Divisions of Verse Texts
Components of the Verse Line
Rhyme and Metrical Analysis
Rhyme
Metrical Notation Declaration
Encoding Procedures for Other Verse Features
Module for Verse
Elements Defined:
<caesura>, <metDecl>, <metSym>, <rhyme>
Examples
<div
type="canzone"
met="E/E/S/E/S/E/E/S/E/S/E/S/S/E/S/E/E/S/S/E/E"
rhyme="abbcdaccbdceeffghhhgg">
<lg n="1" type="stanza">
<l n="1">Doglia mi reca nello core ardire</l>
</lg>
</div>

<metDecl type="met" pattern="((E|S)/)+">
<metSym value="E" terminal="false">xxxxxxxxx+o</metSym>
<metSym value="S" terminal="false">xxxxx+o</metSym>
<metSym value="x">metrically prominent or non-prominent</metSym>
<metSym value="+">metrically prominent</metSym>
<metSym value="o">optional non prominent</metSym>
<metSym value="/">line division</metSym>
</metDecl>
10. Manuscript Description
This module defines a special purpose element which can be used to provide detailed descriptive information about handwritten (and other unique) primary sources.

Sections include:
Overview
The Manuscript Description Element
Phrase-level Elements
The Manuscript Identifier
The Manuscript Heading
Intellectual Content
Physical Description
History
Additional Information
Manuscript Parts
Module for Manuscript Description

Elements Defined:
<accMat>, <acquisition>, <additional>, <additions>, <adminInfo>, <altIdentifier>, <binding>, <bindingDesc>, <catchwords>, <collation>, <collection>, <colophon>, <condition>, <custEvent>, <custodialHist>, <decoDesc>, <decoNote>, <depth>, <dim>, <dimensions>, <explicit>, <filiation>, <finalRubric>, <foliation>, <handDesc>, <height>, <heraldry>, <history>, <incipit>, <institution>, <layout>, <layoutDesc>, <locus>, <locusGrp>, <material>, <msContents>, <msDesc>, <msIdentifier>, <msItem>, <msItemStruct>, <msName>, <msPart>, <musicNotation>, <objectDesc>, <objectType>, <origDate>, <origPlace>, <origin>, <physDesc>, <provenance>, <recordHist>, <repository>, <rubric>, <scriptDesc>, <scriptNote>, <seal>, <sealDesc>, <secFol>, <signatures>, <source>, <stamp>, <summary>, <support>, <supportDesc>, <surrogates>, <textLang>, <typeDesc>, <typeNote>, <watermark>, <width>


Example
<msDesc xml:id="mySpecialManuscript">
<msIdentifier>
<!-- You *must* give identification information -->
</msIdentifier>
<msContents>
<!-- You can describe the intellectual structure of the text -->
</msContents>
<physDesc>
<!-- You can describe the full physical description of the object -->
</physDesc>
<history>
<!-- You can give a full history of the object, its origin,
provenance, and acquisition -->
</history>
<additional>
<!-- You can provide additional information about
surrogates, administrative metadata, etc. -->
</additional>
</msDesc>
13. Names, Dates, People, and Places
Appendices
Appendix A. Model Classes
Appendix B. Attribute Classes
Appendix C. Elements
Appendix D. Attributes
Appendix E. Datatypes and Other Macros
Types of Markup
Procedural Markup:
RED INK ON; print "-£1000"; RED INK OFF
Descriptive Markup
Descriptive markup makes explicit to a computer what is implicit to a reader
It is usually more useful to mark up what we think things represent (in a source text, in our understanding of the data, etc.) rather than what they look like.
Using descriptive markup enables us to make explicit the distinctions we want to make when processing a string of characters
It gives us a way of naming, characterising, and annotating textual data in a formalised way and recording this for re-use
Also called 'Encoding' or 'Annotation'
Separation of Form and Content
Presentational markup
cares more about fonts and layout than meaning
Descriptive markup
says what things are, and leaves the rendition or processing of them for a separate step
Separating the form of something from its content makes its re-use more flexible
It also allows easy changes of presentation across a large number of documents

Markup as an Intellectual Activity
The application of markup to a document can be a scholarly activity
In deciding what markup to apply, and how it represents the understanding being modelled, one is acting as an editor
There is almost no such thing as neutral markup -- all of it involves interpretation
Markup assists in answering research questions: but understanding the markup decisions which enable those answers can be a research activity in itself
What is the Point of Markup?
Markup is used in many different fields, for many different purposes: storing data, relating information, encoding understanding, preserving metadata
Markup is a way of making our knowledge or understanding about a text explicit
Markup strives to make explicit (to a machine) what is implicit (to a person)
Markup assists us in facilitating re-use of the same material:
in different formats
in different contexts
by different sorts of users
Markup
Presentational Markup:
\textcolor{red}{-£1000}
Descriptive Markup:
<measure unit="pounds" value="-1000">One thousand pounds in debt</measure>
Compare the Markup
About XML
XML is structured data represented as strings of text
XML looks like HTML, except that:
XML is extensible
XML must be well-formed
XML can be validated
XML is application-, platform-, and vendor- independent
XML empowers the content provider and facilitates data integration and migration
It is one of the best plain text long-term preservation formats for textual data that we have
XML Terminology
<?xml version="1.0" ?>
<root xmlns="http://namespace/">
  <element attribute="value">
    content
    <childElement type="empty"/>
    content
  </element>
  <!-- comment -->
</root>
Annotation by nesting vs standoff
<taxonomy>
  <category xml:id="lit">
    <catDesc>Literature</catDesc>
    <category xml:id="prose">
      <catDesc>Prose Texts</catDesc>
      <category xml:id="nov">
        <catDesc>Novels</catDesc>
      </category>
    </category>
    <category xml:id="poe">
      <catDesc>Poetry</catDesc>
      <category xml:id="sonnets">
        <catDesc>Sonnets</catDesc>
        <category xml:id="petSon">
          <catDesc>Petrarchan Sonnets</catDesc>
        </category>
        <category xml:id="shakeSon">
          <catDesc>Shakespearean Sonnets</catDesc>
        </category>
        <category xml:id="spensSon">
          <catDesc>Spenserian Sonnets</catDesc>
        </category>
      </category>
    </category>
    <category xml:id="drama">
      <catDesc>Dramatic texts</catDesc>
    </category>
  </category>
</taxonomy>
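One way to see what the nesting in this taxonomy encodes is to walk it. The following is a sketch under assumptions (standard-library Python; path_to is a hypothetical helper written for this illustration, not a TEI tool): the position of a category inside its ancestors is itself the annotation.

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

taxonomy = ET.fromstring("""
<taxonomy>
  <category xml:id="lit"><catDesc>Literature</catDesc>
    <category xml:id="prose"><catDesc>Prose Texts</catDesc>
      <category xml:id="nov"><catDesc>Novels</catDesc></category>
    </category>
    <category xml:id="poe"><catDesc>Poetry</catDesc>
      <category xml:id="sonnets"><catDesc>Sonnets</catDesc>
        <category xml:id="petSon"><catDesc>Petrarchan Sonnets</catDesc></category>
        <category xml:id="shakeSon"><catDesc>Shakespearean Sonnets</catDesc></category>
        <category xml:id="spensSon"><catDesc>Spenserian Sonnets</catDesc></category>
      </category>
    </category>
    <category xml:id="drama"><catDesc>Dramatic texts</catDesc></category>
  </category>
</taxonomy>
""")

def path_to(node, wanted, trail=()):
    """Return the catDesc labels from the top category down to `wanted`, or None."""
    for cat in node.findall("category"):
        here = trail + (cat.findtext("catDesc"),)
        if cat.get(XML_ID) == wanted:
            return list(here)
        found = path_to(cat, wanted, here)
        if found:
            return found
    return None

print(path_to(taxonomy, "shakeSon"))
```

A text that points at #shakeSon therefore inherits the broader classifications without repeating them.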
XML Syntax
There is a single root node containing the whole of an XML document
Each subtree is properly nested within the root node
Element/attribute names and values are always case sensitive
Start-tags and end-tags are always mandatory (except there are combined start-and-end tags called 'empty elements' <pb/> <gap/>)
Attribute values are always quoted

XML in Practice
XML
Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879).
It uses ISO 10646 (also known as Unicode)
Originally designed to meet the challenges of large-scale electronic publishing, XML also now plays an indispensable role in the exchange of a wide variety of data on the Web and elsewhere.
Its success means that general tools are ubiquitous and how it works is well-understood.

You use XML every day; you just don't realise it.
Labels for the parts of the XML Terminology example: XML declaration; root element; namespace; element; attribute and value; an 'empty' child element; comment; content
<?xml version="1.0" encoding="utf-8" ?>
<div n="1">
  <head>SCENE I. On a ship at sea: a tempestuous noise of thunder and lightning heard.</head>
  <stage>Enter a Master and a Boatswain</stage>
  <sp>
    <speaker>Master</speaker>
    <ab>Boatswain!</ab>
  </sp>
  <sp>
    <speaker>Boatswain</speaker>
    <ab>Here, master: what cheer?</ab>
  </sp>
  <sp>
    <speaker>Master</speaker>
    <ab>Good, speak to the mariners: fall to't, yarely,</ab>
    <ab>or we run ourselves aground: bestir, bestir.</ab>
  </sp>
  <stage>Exit</stage>
</div>

<div xml:id="sonnet116" ana="#shakeSon">
  <head>Sonnet 116</head>
  <lg type="stanza">
    <l>Let me not to the marriage of true <rhyme label="a">minds</rhyme></l>
    <l>Admit impediments. Love is not <rhyme label="b">love</rhyme></l>
    <l>Which alters when it alteration <rhyme label="a">finds</rhyme>,</l>
    <l>Or bends with the remover to <rhyme label="b">remove</rhyme>:</l>
  </lg>
  <lg type="stanza">
    <l>O no; it is an ever-fixed <rhyme label="c">mark</rhyme>,</l>
    <l>That looks on tempests, and is never <rhyme label="d">shaken</rhyme>;</l>
    <l>It is the star to every wandering <rhyme label="c">bark</rhyme>,</l>
    <l>Whose worth's unknown, although his height be <rhyme label="d">taken</rhyme>.</l>
  </lg>
  <lg type="stanza">
    <l>Love's not Time's fool, though rosy lips and <rhyme label="e">cheeks</rhyme></l>
    <l>Within his bending sickle's compass <rhyme label="f">come</rhyme>;</l>
    <l>Love alters not with his brief hours and <rhyme label="e">weeks</rhyme>,</l>
    <l>But bears it out even to the edge of <rhyme label="f">doom</rhyme>.</l>
  </lg>
  <lg type="couplet">
    <l>If this be error and upon me <rhyme label="g">proved</rhyme>,</l>
    <l>I never writ, nor no man ever <rhyme label="g">loved</rhyme>.</l>
  </lg>
</div>
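Because the rhyme words are marked up with labels, the rhyme scheme can be recovered by a program rather than by reading. The sketch below is an illustrative addition, assuming Python's standard xml.etree.ElementTree; to stay short it uses only the first quatrain and the couplet of the sonnet.

```python
import xml.etree.ElementTree as ET

sonnet = ET.fromstring("""
<div xml:id="sonnet116" ana="#shakeSon">
  <lg type="stanza">
    <l>Let me not to the marriage of true <rhyme label="a">minds</rhyme></l>
    <l>Admit impediments. Love is not <rhyme label="b">love</rhyme></l>
    <l>Which alters when it alteration <rhyme label="a">finds</rhyme>,</l>
    <l>Or bends with the remover to <rhyme label="b">remove</rhyme>:</l>
  </lg>
  <lg type="couplet">
    <l>If this be error and upon me <rhyme label="g">proved</rhyme>,</l>
    <l>I never writ, nor no man ever <rhyme label="g">loved</rhyme>.</l>
  </lg>
</div>
""")

# One group of letters per line group, in document order.
scheme = " ".join(
    "".join(r.get("label") for r in lg.iter("rhyme"))
    for lg in sonnet.findall("lg")
)

# Collect the words that the encoder has asserted rhyme with one another.
rhyme_words = {}
for r in sonnet.iter("rhyme"):
    rhyme_words.setdefault(r.get("label"), []).append(r.text)

print(scheme)
print(rhyme_words)
```

Note that pairing "love" with "remove" is the encoder's interpretation, recorded in the markup; the program only reports it.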
What does it mean to be well-formed?
An XML document is encoded as a linear string of characters
It usually begins with an XML declaration, which looks like a special processing instruction
Element occurrences are marked by start and end-tags
The characters < and & are Magic and must always be "escaped" using &lt; or &amp; if you want to use them as themselves
Comments are delimited by <!-- and -->
Attribute name/value pairs are supplied on the start-tag and may be given in any order
xml:id="uniqueID" and xml:lang="languageCode"
The XML Format
<!-- In document stand-off linking -->
<linkGrp>
  <link target="#shakeSon #sonnet116"/>
  <!-- more links -->
</linkGrp>

<!-- Out of document linking -->
<linkGrp>
  <link target="http://www.example.com/taxonomy.xml#shakeSon http://www.example.com/poems.xml#sonnet116"/>
  <!-- more links -->
</linkGrp>
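A sketch of how such stand-off links might be resolved, offered as an assumption-laden illustration rather than a TEI implementation: it uses Python's standard library, wraps the pieces in a hypothetical <doc> element so they form one document, and only follows the in-document "#id" pointers (fetching the out-of-document URLs is left aside).

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# <doc> is a hypothetical wrapper used only for this illustration.
doc = ET.fromstring("""
<doc>
  <category xml:id="shakeSon"><catDesc>Shakespearean Sonnets</catDesc></category>
  <div xml:id="sonnet116"><head>Sonnet 116</head></div>
  <linkGrp>
    <link target="#shakeSon #sonnet116"/>
  </linkGrp>
</doc>
""")

# Index every element that carries an xml:id.
by_id = {el.get(XML_ID): el for el in doc.iter() if el.get(XML_ID)}

pairs = []
for link in doc.iter("link"):
    pointers = link.get("target").split()  # whitespace-separated pointers
    ends = [by_id[p.lstrip("#")] for p in pointers if p.startswith("#")]
    pairs.append([el.tag for el in ends])

print(pairs)
```

Neither the category nor the poem has to contain the other: the relationship lives in the link, which is the point of stand-off annotation.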

Note: You can be valid in addition to being well-formed. This means you obey the rules of a specified schema, such as the Guidelines of the Text Encoding Initiative
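The escaping rule for < and & from the well-formedness slide can be checked mechanically. This is a minimal sketch using Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree); the sample sentence and the is_well_formed helper are invented for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Marks & Spencer sell items for < 5 pounds"
escaped = escape(raw)  # replaces & with &amp; and < with &lt; (and > with &gt;)

def is_well_formed(xml_text):
    """True if the parser accepts the string as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(escaped)
print(is_well_formed("<p>" + raw + "</p>"))      # unescaped magic characters
print(is_well_formed("<p>" + escaped + "</p>"))  # escaped version

# Parsing turns the escapes back into the original characters.
round_trip = ET.fromstring("<p>" + escaped + "</p>").text
print(round_trip == raw)
```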
Test your XML Knowledge
Well-Formedness vs Validity
Being well-formed means you obey the rules of the XML Syntax (e.g. proper nesting, quoted attribute values); all XML must be well-formed, or processing stops
Being valid means in addition you obey rules about which elements are allowed where, what attributes they have, and what their values may be
Common schema languages include:
Relax NG (Compact or XML Syntax)
W3C XML Schema
DTD Language
Or the Text Encoding Initiative has a meta-schema customisation language (TEI ODD) which enables generation of all of these
DTDs are very dated, don't cope with namespaces, and have other problems. We recommend Relax NG or TEI ODD.
XML Vocabularies
There are a huge number of XML vocabularies available, many overlapping and redundant
Wikipedia lists around 250 of them, and there are many which are not listed there:
http://en.wikipedia.org/wiki/List_of_XML_markup_languages

There probably exists an XML vocabulary for the data you use: it is better to use an existing format than re-invent the wheel
We (researchsupport@it.ox.ac.uk) can help you choose a markup language suitable to your field, work, research, project
The university is a long-term supporter of the Text Encoding Initiative (TEI) Guidelines, which provide an extremely flexible and extensible vocabulary
http://www.tei-c.org/
XML Editors
There are many XML editors available, both free and proprietary
I use the oXygen XML editor, for which the University of Oxford has a site license
If possible, you want an editor which provides:
syntax highlighting
continual schema validation
content completion
node collapsing
XPath/XQuery searching
built-in XSLT transformations
multi-platform
Which are well-formed XML?
<seg>some text</seg>
<seg> <w>some</w> <hi>text</hi> </seg>
<seg> <w>some <hi></w> text</hi> </seg>
<seg type="text">some text</seg>
<seg type=text>some text</seg>
<seg type="text"> some text <seg/>
<seg type="text"> some text<gap/> </seg>
<seg type="text">some text</Seg>
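If you want to check your answers, a parser will do it for you. The sketch below is an addition to the slides, assuming Python's standard xml.etree.ElementTree; it feeds each candidate to the parser and records whether it is accepted. It tests well-formedness only, not validity against any schema.

```python
import xml.etree.ElementTree as ET

candidates = [
    '<seg>some text</seg>',
    '<seg> <w>some</w> <hi>text</hi> </seg>',
    '<seg> <w>some <hi></w> text</hi> </seg>',
    '<seg type="text">some text</seg>',
    '<seg type=text>some text</seg>',
    '<seg type="text"> some text <seg/>',
    '<seg type="text"> some text<gap/> </seg>',
    '<seg type="text">some text</Seg>',
]

def is_well_formed(xml_text):
    """True if the parser accepts the string as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

results = [is_well_formed(c) for c in candidates]

for number, (candidate, ok) in enumerate(zip(candidates, results), start=1):
    print(number, "well-formed" if ok else "NOT well-formed", candidate)
```

The failures correspond to rules from the XML Syntax slide: overlapping elements, an unquoted attribute value, a start-tag that is never closed, and an end-tag whose case does not match.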
XML Markup
<element attribute="value">
  Text or child elements here
</element>
<element> Text </element>
<element attribute="value"/>
oXygen
Feature Matrix
How to get oXygen at the University of Oxford
Go to https://register.it.ox.ac.uk
Click on 'Software' then 'oXygen XML Editor'
Download the 'oXygen XML Editor' not 'Developer' or 'Author' (Editor is the full package)
When starting cut-and-paste the license from register.it.ox.ac.uk into the license box that pops up
This is a named user license so you can use it on all of your computers
Starting new documents (show range of types); show menus generally
Basic editing: tag completion, attributes
Element/attribute/value glosses
Validation
Element refactoring (surround & split)
Format and indent
XPath searching, regular expressions
Toggling comments
Element folding
Different editor types (Text/Grid/Author)
Different editor perspectives (e.g. XSLT Debugger)
Show views
Transformation scenarios
Features of oXygen XML Editor
Intelligent XML/XHTML editing
Author-mode visual editing
XSLT editor / debugger
XML Schema editor/diagrams
WSDL editor/analyser
Relax NG editor/diagrams
XQuery editor/debugger
JSON editor
Publishing with transformation scenarios
XML Database integration
CMS integration
XML diff (visual differencing editor)
Subversion client
Office Open XML (DocX)
Open Document Format (openoffice/libreoffice)
EPUB
XSL:FO Editor
SVG Editor
Syntax highlighting/validation for many other formats
http://oxygenxml.com
If you use TEI, then set the TEI framework to auto-update; see my blogpost:
http://blogs.it.ox.ac.uk/jamesc/2014/04/02/auto-update-your-tei-framework-in-oxygen/
Features to Demo
About the TEI
What is a Digital Scholarly Edition?
Scholarly Editions
"A scholarly edition is the critical representation of historical documents" - Patrick Sahle
Historical Documents
All existing documents are 'historical'.
They must already be in existence; creating new documents isn't scholarly editing
They could be existing digital documents
Representation
This could be in an abstract or non-presentational format
Or in a purely presentational format
It must contain actual data, not just metadata cataloguing or describing it
Critical
There must be critical or scholarly examination of the document(s) or else you aren't really editing them
Digital Scholarly Editions
A digital scholarly edition isn't just a digital representation of a scholarly edition.
That is merely a digitised edition.
And is it the same thing as a Scholarly Digital Edition?
Three contentious issues:
What are historical documents?
What counts as representation?
What is critical?
One theory of text:
A text is an understanding or abstraction created by or for a community of readers
A document is a physical object that we can encode
A catalogue entry for a text is not a scholarly edition in itself.
A facsimile is not an edition
A digital scholarly edition lives up to the requirements of a scholarly edition, but exploits the possibilities of the digital world it inhabits.
It should surpass the limits of print editions
A true digital edition cannot be printed without undergoing some loss of functionality and/or information content.
If it only represents itself as a set of digital pages equivalent to printed ones, it isn't really a digital edition.
Digital Scholarly Editions
Digital Scholarly Editing

Dr James Cummings
University of Oxford
James.Cummings@it.ox.ac.uk
@jamescummings

http://tinyurl.com/jc-dse-lyon

This does not imply that there are necessarily multiple witnesses, or that you aren't presenting all possible readings to a user