Transcript of Linking, Fragmentation, and Analysis

Linking, Fragmentation, and Analysis
Dr James Cummings
University of Oxford

<ref> and <ptr/>

<ref target="#section12">Section 12</ref>

<ptr target="#section12"/>
<l><seg xml:id="L1">E</seg>lizabeth it is in vain you say</l>
<l>"<seg xml:id="L2">L</seg>ove not" — thou sayest it in so sweet a way:</l>
<l><seg xml:id="L3">I</seg>n vain those words from thee or L.E.L.</l>
<l><seg xml:id="L4">Z</seg>antippe's talents had enforced so well:</l>
<l><seg xml:id="L5">A</seg>h! if that language from thy heart arise,</l>
<l><seg xml:id="L6">B</seg>reath it less gently forth — and veil thine eyes.</l>
<l><seg xml:id="L7">E</seg>ndymion, recollect, when Luna tried</l>
<l><seg xml:id="L8">T</seg>o cure his love — was cured of all beside —</l>
<l><seg xml:id="L9">H</seg>is follie — pride — and passion — for he died.</l>
<link/> and <linkGrp>
<link> defines an association or hypertextual link among elements or passages, of some type not more precisely specifiable by other elements.
<anchor/>, @corresp, and @ana
<anchor/> (anchor point) attaches an identifier to a point within a text, whether or not it corresponds with a textual element.
<listPrefixDef> and private URIs
<listPrefixDef> (list of prefix definitions) contains a list of definitions of prefixing schemes used in data.pointer values, showing how abbreviated URIs using each scheme may be expanded into full URIs.
<prefixDef> (prefixing scheme) defines a prefixing scheme used in data.pointer values, showing how abbreviated URIs using the scheme may be expanded into full URIs.
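The expansion of a private URI can be sketched in a few lines of Python. This is only an illustration of the idea, not a TEI processor: the prefix table, the function name `expand_private_uri`, and both patterns are assumptions, since the slides do not show the match or replacement patterns. The sketch assumes each prefixing scheme supplies a regular expression and a replacement string that refers to captured groups as `$1`, `$2`, and so on.

```python
import re

# Hypothetical prefix table: ident -> (match pattern, replacement pattern).
# Both patterns are assumptions for illustration; the slides do not show them.
PREFIX_DEFS = {
    "psn": (r"([A-Z]+)", "personography.xml#$1"),
}


def expand_private_uri(uri, prefix_defs=PREFIX_DEFS):
    """Expand 'prefix:value' into a full URI, or return it unchanged."""
    prefix, sep, value = uri.partition(":")
    if not sep or prefix not in prefix_defs:
        return uri  # no known prefixing scheme applies
    match_pattern, replacement = prefix_defs[prefix]
    match = re.fullmatch(match_pattern, value)
    if match is None:
        return uri  # the value does not fit the scheme's pattern
    # Substitute $1, $2, ... with the corresponding captured groups.
    return re.sub(r"\$(\d)", lambda g: match.group(int(g.group(1))), replacement)


print(expand_private_uri("psn:MDH"))  # personography.xml#MDH
```

A pointer whose prefix is not in the table, or whose value does not match the pattern, is returned as-is, so a caller can tell that no expansion took place.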
<ref>, <ptr/>, and XPointers
<link/> and <linkGrp>
<anchor/>, @corresp, and @ana
<listPrefixDef>, <prefixDef>, and private URIs
<sp who="#F-ham-ham">
<speaker rend="italic">Ham.</speaker>
<l>To be, or not to be, that is the Question:</l>
<l><s>Whether 'tis Nobler in the minde to suffer</l>
<l>The Slings and Arrowes of outragious Fortune,</l>
<l>Or to take Armes against a Sea of troubles,</l>
<l>And by opposing end them.</s> To dye, to sleepe</l>
<l>No more; and by a sleepe, to say we end</l>
<l>The Heart-ake, and the thousand Naturall shockes</l>
<cb n="2"/>
<!-- ... -->

Non-hierarchical structures
<l>Scorn not the sonnet; critic, you have frowned,</l>
<l>Mindless of its just honours; with this key</l>
<l>Shakespeare unlocked his heart; the melody</l>
<l>Of this small lute gave ease to Petrarch's wound.</l>
@part specifies whether or not its parent element is fragmented in some way, typically by some other overlapping structure.
@next and @prev
@next points to the next element of a virtual aggregate of which the current element is part.
@prev points to the previous element of a virtual aggregate of which the current element is part.
Stand-off markup
<l><w xml:id="w01">Scorn</w> <w xml:id="w02">not</w>
<w xml:id="w03">the</w> <w xml:id="w04">sonnet</w>;
<w xml:id="w05">critic</w>, <w xml:id="w06">you</w>
<w xml:id="w07">have</w> <w xml:id="w08">frowned</w>,</l>
<l><w xml:id="w09">Mindless</w> <w xml:id="w10">of</w>
<w xml:id="w11">its</w> <w xml:id="w12">just</w>
<w xml:id="w13">honours</w>; <w xml:id="w14">with</w>
<w xml:id="w15">this</w> <w xml:id="w16">key</w></l>
<l><w xml:id="w17">Shakespeare</w> <w xml:id="w18">unlocked</w>
<w xml:id="w19">his</w> <w xml:id="w20">heart</w>;
<w xml:id="w21">the</w> <w xml:id="w22">melody</w></l>
<l><w xml:id="w23">Of</w> <w xml:id="w24">this</w>
<w xml:id="w25">small</w> <w xml:id="w26">lute</w>
<w xml:id="w27">gave</w> <w xml:id="w28">ease</w>
<w xml:id="w29">to</w> <w xml:id="w30">Petrarch's</w>
<w xml:id="w31">wound</w>.</l>
Non-hierarchical structures
@next and @prev
Stand-off markup
<s>, <cl>, <w>, <m>, <c>, and <pc>
<s> (s-unit) contains a sentence-like division of a text.
<cl> (clause) represents a grammatical clause.
<w> (word) represents a grammatical (not necessarily orthographic) word.
<c> (character) represents a character.
<pc> (punctuation character) contains a character or string of characters regarded as constituting a single punctuation mark.

Word Markup
Basic markup of <w> (and provision of @xml:id attributes) can be automated.
This makes it easier for others to use your text.
You can create an edition with many potentially overlapping hierarchies by marking only words and doing all other markup as stand-off.
Words can also have structure inside them
<w xml:id="mk03">make</w>
<w xml:id="up03">up</w>
<!-- elsewhere in the document -->
<span target="#mk03 #up03"> phrasal verb "make up"</span>
<interp> and <interpGrp>
<interp> (interpretation) summarizes a specific interpretative annotation which can be linked to a span of text.
@ana and <taxonomy>
<taxonomy> defines a typology either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy.
<s>, <cl>, <w>, <m>, <c>, <pc>
Word markup
<interp> and <interpGrp>
@ana and <taxonomy>

More complex pointing is also possible using XPointer, although few processing implementations support it.

For example:

<ptr target="stalky.xml#element(app[1])"/>
<ptr target="stalky.xml#element(1/1/8)"/>
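To make a numeric pointer such as `element(1/1/8)` concrete, here is a minimal Python sketch of how a child sequence can be resolved: each number selects the n-th child element of the node reached so far, starting from the document root. The function name `resolve_child_sequence` and the tiny sample document are assumptions for illustration. The sketch handles only purely numeric child sequences (with or without a leading slash) and does not attempt named or bracketed forms.

```python
import xml.etree.ElementTree as ET


def resolve_child_sequence(root, pointer):
    """Resolve a numeric child sequence such as 'element(1/1/8)'."""
    if not (pointer.startswith("element(") and pointer.endswith(")")):
        raise ValueError("not an element() pointer")
    # Keep only the numeric steps; a leading slash yields an empty first item.
    steps = [int(s) for s in pointer[len("element("):-1].split("/") if s]
    if not steps or steps[0] != 1:
        return None  # a document has exactly one root element
    node = root
    for step in steps[1:]:
        children = list(node)  # child elements in document order
        if step < 1 or step > len(children):
            return None  # the sequence runs off the tree
        node = children[step - 1]
    return node


# Hypothetical sample document, not taken from stalky.xml.
DOC = ET.fromstring(
    "<TEI><text><front/><body><p>one</p><p>two</p></body></text></TEI>"
)
print(resolve_child_sequence(DOC, "element(1/1/2/2)").text)  # two
```

Because such pointers depend on element positions, they break as soon as an element is inserted earlier in the tree, which is one reason pointing at an @xml:id is usually more robust.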

<join> identifies a possibly fragmented segment of text, by pointing at the possibly discontiguous elements which compose it.
<join target="#L1 #L2 #L3 #L4 #L5 #L6 #L7 #L8 #L9">
<desc>The beloved's name</desc>
</join>
(from Edgar Allan Poe).
<l xml:id="l2.79">A place there is, betwixt earth, air and seas</l>
<l xml:id="l2.80">Where from Ambrosia, Jove retires for ease.</l>
<l xml:id="l2.88">Sign'd with that Ichor which from Gods distills.</l>
<note xml:id="n2.79"><bibl>Ovid Met. 12.</bibl>
<quote xml:lang="la">
<l>Orbe locus media est, inter terrasq; fretumq;</l>
<l>Cœlestesq; plagas —</l>
</quote>
</note>

<note xml:id="n2.88">Alludes to <bibl>Homer, Iliad 5</bibl></note>
<linkGrp type="imitationnotes">
<link target="#n2.79 #l2.79"/>
<link target="#n2.88 #l2.88"/>
</linkGrp>
@corresp (corresponds) points to elements that correspond to the current element in some way.
@ana (analysis) indicates one or more elements containing interpretations of the element on which the @ana attribute appears.
<p>He was merely working up to a peroration, and the boys knew it; but McTurk cut through the frothing sentence, the others echoing:</p>
<p>‘<anchor xml:id="MTa"/>I appeal to the Head, sir.’</p>
<p>‘<anchor xml:id="Ba"/>I appeal to the Head, sir.’</p>
<p>‘<anchor xml:id="Sa"/>I appeal to the Head, sir.’</p>
<p>It was their unquestioned right. Drunkenness meant expulsion after a public flogging. They had been accused of it. The case was the Head's, and the Head's alone.</p>

<note corresp="#MTa #Ba #Sa" ana="#structuralNotes">All these are said at the same time, and though encoded as <gi>anchor</gi> elements, the <att>xml:id</att> attribute could have been placed on the parent <gi>p</gi>.</note>

<prefixDef ident="psn"
<p> Private URIs using the <code>psn</code>
prefix are pointers to <gi>person</gi> elements in the personography.xml file.
For example, <code>psn:MDH</code> dereferences to
<prefixDef ident="bibl"
<p> Private URIs using the <code>bibl</code> prefix can be expanded to form URIs which retrieve the relevant bibliographical reference from www.example.com.</p>

<!-- elsewhere in the document -->

<title ref="bibl:jcummings2008">The Text Encoding Initiative and
the Study of Literature</title> was written by
<persName ref="psn:JCC">James</persName>
<seg>Scorn not the sonnet;</seg>
<seg>critic, you have frowned, Mindless of its just honours;</seg>
<seg>with this key Shakespeare unlocked his heart;</seg>
<seg>the melody Of this small lute gave ease to Petrarch's wound.</seg>
<seg><lb n="1"/>Scorn not the sonnet;</seg>
<seg>critic, you have frowned, <lb n="2"/>Mindless of its just honours;</seg>
<seg>with this key <lb n="3"/>Shakespeare unlocked his heart;</seg>
<seg>the melody <lb n="4"/>Of this small lute gave ease to Petrarch's wound.</seg>
<l><anchor subtype="sentenceStart" type="delimiter"/>
Scorn not the sonnet; <anchor subtype="sentenceEnd" type="delimiter"/>
<anchor subtype="sentenceStart" type="delimiter"/> critic, you have frowned,</l>

<l>Mindless of its just honours; <anchor subtype="sentenceEnd" type="delimiter"/>
<anchor subtype="sentenceStart" type="delimiter"/> with this key</l>

<l>Shakespeare unlocked his heart; <anchor subtype="sentenceEnd" type="delimiter"/>
<anchor subtype="sentenceStart" type="delimiter"/> the melody</l>

<l>Of this small lute gave ease to Petrarch's wound. <anchor subtype="sentenceEnd" type="delimiter"/></l>

<l><seg n="sentence1">Scorn not the sonnet;</seg>
<seg n="sentence2">critic, you have frowned,</seg></l>
<l><seg n="sentence2">Mindless of its just honours;</seg>
<seg n="sentence3">with this key</seg></l>
<l><seg n="sentence3">Shakespeare unlocked his heart;</seg>
<seg n="sentence4">the melody</seg></l>
<l><seg n="sentence4">Of this small lute gave ease to Petrarch's wound.</seg></l>
<l><seg>Scorn not the sonnet;</seg>
<seg part="I">critic, you have frowned,</seg></l>
<l><seg part="F">Mindless of its just honours;</seg>
<seg part="I">with this key</seg></l>
<l><seg part="F">Shakespeare unlocked his heart;</seg>
<seg part="I">the melody</seg></l>
<l><seg part="F">Of this small lute gave ease to Petrarch's wound.</seg></l>
<l><seg>Scorn not the sonnet;</seg>
<seg next="#s2b" xml:id="s2a">critic, you have frowned,</seg></l>
<l><seg prev="#s2a" xml:id="s2b">Mindless of its just honours;</seg>
<seg next="#s3b" xml:id="s3a">with this key</seg></l>
<l><seg prev="#s3a" xml:id="s3b">Shakespeare unlocked his heart;</seg>
<seg next="#s4b" xml:id="s4a">the melody</seg></l>
<l><seg prev="#s4a" xml:id="s4b">Of this small lute gave ease to Petrarch's wound.</seg></l>
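A processor can rebuild each virtual aggregate by starting at a segment with no @prev and following @next pointers until the chain ends. The Python sketch below shows one way to do this with the standard library. The function name `virtual_aggregates` and the `<lg>` wrapper around the sample are assumptions for illustration, and the sample uses only the first five segments of the sonnet encoding.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Sample data: the <lg> wrapper is an assumption so the fragment parses.
SAMPLE = """<lg>
<seg>Scorn not the sonnet;</seg>
<seg next="#s2b" xml:id="s2a">critic, you have frowned,</seg>
<seg prev="#s2a" xml:id="s2b">Mindless of its just honours;</seg>
<seg next="#s3b" xml:id="s3a">with this key</seg>
<seg prev="#s3a" xml:id="s3b">Shakespeare unlocked his heart;</seg>
</lg>"""


def virtual_aggregates(root):
    """Return the text of each chain of <seg> elements linked by @next."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    sentences = []
    for seg in root.iter("seg"):
        if seg.get("prev"):
            continue  # not the start of a chain
        parts = [seg]
        while parts[-1].get("next"):
            nxt = by_id.get(parts[-1].get("next").lstrip("#"))
            if nxt is None or nxt in parts:
                break  # dangling or circular pointer: stop here
            parts.append(nxt)
        sentences.append(" ".join("".join(p.itertext()).strip() for p in parts))
    return sentences


for sentence in virtual_aggregates(ET.fromstring(SAMPLE)):
    print(sentence)
```

Only @next is followed here. A stricter tool might also check that every @prev agrees with the @next pointing at it.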
<!-- Elsewhere in the document -->
<join result="s" scope="root"
target="#w01 #w02 #w03 #w04"/>
<join result="s" scope="root"
target="#w05 #w06 #w07 #w08 #w09 #w10 #w11 #w12 #w13"/>
<join result="s" scope="root"
target="#w14 #w15 #w16 #w17 #w18 #w19 #w20"/>
<join result="s" scope="root"
target="#w21 #w22 #w23 #w24 #w25 #w26 #w27 #w28 #w29 #w30 #w31"/>
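Resolving stand-off <join> elements is mostly a matter of looking up each pointer in @target. The following Python sketch, using only the standard library, collects the words each join points at. The function name `resolve_joins` and the `<text>` wrapper are assumptions, and the sample is cut down to the first sentence.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Sample data: the <text> wrapper is an assumption so the fragment parses.
SAMPLE = """<text>
<l><w xml:id="w01">Scorn</w> <w xml:id="w02">not</w>
<w xml:id="w03">the</w> <w xml:id="w04">sonnet</w>;</l>
<join result="s" scope="root" target="#w01 #w02 #w03 #w04"/>
</text>"""


def resolve_joins(root):
    """Return (result, [word, ...]) for every <join> in the tree."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    resolved = []
    for join in root.iter("join"):
        ids = [ref.lstrip("#") for ref in join.get("target", "").split()]
        # Pointers that do not resolve are skipped rather than raising.
        words = [by_id[i].text for i in ids if i in by_id]
        resolved.append((join.get("result"), words))
    return resolved


print(resolve_joins(ET.fromstring(SAMPLE)))
# [('s', ['Scorn', 'not', 'the', 'sonnet'])]
```

Punctuation that sits outside the <w> elements is not part of the joined result, which is worth bearing in mind when deciding what to wrap in <w> and what in <pc>.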
<cl>It was about the beginning of September, 1664,
<cl>that I, among the rest of my neighbours,
heard in ordinary discourse
<cl>that the plague was returned again to Holland; </cl>
</cl>
</cl>
<cl>for it had been very violent there, and particularly at
Amsterdam and Rotterdam, in the year 1663, </cl>
<cl>whither, <cl>they say,</cl> it was brought,
<cl>some said</cl> from Italy, others from the Levant, among some goods
<cl>which were brought home by their Turkey fleet;</cl>
</cl>
<cl>others said it was brought from Candia;
others from Cyprus. </cl>
<cl>It mattered not <cl>from whence it came;</cl>
</cl>
<cl>but all agreed <cl>it was come into Holland again.</cl>
</cl>
<w type="adjective">
<m type="base">
<m type="prefix" baseForm="con">com</m>
<m type="root">fort</m>
</m>
<m type="suffix">able</m>
</w>
<span> associates an interpretative annotation directly with a span of text.
<w ana="#AT0">The </w>
<w ana="#NN1">victim</w>
<w ana="#POS">'s</w>
<w ana="#NN2">friends </w>
<w ana="#VVD">told </w>
<w ana="#NN2">police </w>
<w ana="#CJT">that </w>
<w ana="#NP0">Kruger </w>
<w ana="#VVD">drove </w>
<w ana="#PRP">into </w>
<w ana="#AT0">the </w>
<w ana="#NN1">quarry </w>
<w ana="#CJC">and </w>
<w ana="#AV0">never </w>
<w ana="#VVD">surfaced</w>

<interpGrp type="POS">
<interp xml:id="AT0">Definite article</interp>
<interp xml:id="AV0">Adverb</interp>
<interp xml:id="CJC">Conjunction</interp>
<interp xml:id="CJT">Relative that</interp>
<interp xml:id="NN1">Noun singular</interp>
<interp xml:id="NN2">Noun plural</interp>
<interp xml:id="NP0">Proper noun</interp>
<interp xml:id="POS">Genitive marker</interp>
<interp xml:id="PRP">Preposition</interp>
<interp xml:id="VVD">Verb past tense</interp>
</interpGrp>
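The link between @ana and <interp> can be followed programmatically by treating each @ana value as a pointer to an @xml:id. The Python sketch below pairs every word with the description of its part-of-speech code. The function name `tag_words` and the `<body>` wrapper are assumptions for illustration, and the sample keeps only the first three words and their codes.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Sample data: the <body> wrapper is an assumption so the fragment parses.
SAMPLE = """<body>
<w ana="#AT0">The </w>
<w ana="#NN1">victim</w>
<w ana="#POS">'s</w>
<interpGrp type="POS">
<interp xml:id="AT0">Definite article</interp>
<interp xml:id="NN1">Noun singular</interp>
<interp xml:id="POS">Genitive marker</interp>
</interpGrp>
</body>"""


def tag_words(root):
    """Pair each <w> with the text of the <interp> elements its @ana names."""
    interps = {i.get(XML_ID): i.text for i in root.iter("interp")}
    tagged = []
    for w in root.iter("w"):
        # @ana may hold several space-separated pointers.
        codes = [ref.lstrip("#") for ref in w.get("ana", "").split()]
        labels = [interps[c] for c in codes if c in interps]
        tagged.append((w.text.strip(), labels))
    return tagged


for word, labels in tag_words(ET.fromstring(SAMPLE)):
    print(word, labels)
```

Keeping the descriptions in one <interpGrp> means a tag set can be renamed or documented in a single place without touching the word-level markup.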

<category xml:id="literature">
  <category xml:id="poetry">
    <category xml:id="sonnet">
      <category xml:id="shakesSonnet">
        <catDesc>Shakespearean Sonnet</catDesc>
      </category>
      <category xml:id="petraSonnet">
        <catDesc>Petrarchan Sonnet</catDesc>
      </category>
    </category>
    <category xml:id="haiku"/>
  </category>
  <category xml:id="drama"/>
</category>
<category xml:id="meter">
  <catDesc>Metrical Categories</catDesc>
  <category xml:id="feet">
    <catDesc>Metrical Feet</catDesc>
    <category xml:id="iambic"/>
    <category xml:id="trochaic"/>
  </category>
  <category xml:id="feetNumber">
    <catDesc>Number of feet</catDesc>
    <category xml:id="pentameter"/>
    <category xml:id="tetrameter"/>
  </category>
</category>
<!-- elsewhere in document -->
<lg ana="#shakesSonnet #iambic #pentameter">
<l>Shall I compare thee to a summer's day</l>
<!-- ... -->