Institute of Museum and Library Services Grant
- Transcribe & Encode
- ~300 Freedom Suits filed in St. Louis Circuit Court
- ~1800-1864
Judicial Precedent
Create an extension to the TEI consisting of a set of additional elements, with rules described in an XML schema, for the encoding of legal texts within a prescribed domain
- Translating the structure and function of a discrete set of legal/historical documents into the XML framework
- Our initial efforts should form the basis of a flexible schema that could be used for related document sets
- e.g. Criminal litigation of the same era
- The standard should form a nascent part of an incremental process that will expand slowly to other collections
What is the appropriate entity for encoding?
Case vs. Document
- Near the end of litigation, slave owners who felt they were likely to lose would often sell the slave to a new owner, thus forcing the slave/plaintiff to re-file the litigation and begin the process anew
- The issues in the cases would usually be identical
- Because of the sometimes ill-defined boundaries of the cases, the legal consultants proposed using the individual documents as the basic entity to be encoded
- This would facilitate the distinction between meta-data and information inherent in the basic textual object
- Further developments in the schema could facilitate the appropriate grouping of documents as needed into cases or related case (or historical) groupings
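A minimal sketch of the document-first approach described above: each document is encoded on its own, and cases or related-case groupings are derived afterwards. The record fields (`doc_id`, `case_number`, `plaintiff`, `doc_type`) and the sample values are hypothetical and are not taken from the project's schema or collection.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedDocument:
    """One encoded document, the basic unit. All field names are hypothetical."""
    doc_id: str
    case_number: str
    plaintiff: str
    doc_type: str


def group_by_case(documents):
    """Group documents into cases after the fact, keyed on case number."""
    cases = defaultdict(list)
    for doc in documents:
        cases[doc.case_number].append(doc)
    return dict(cases)


def related_cases(documents):
    """Map each plaintiff to the sorted case numbers filed in that name."""
    by_plaintiff = defaultdict(set)
    for doc in documents:
        by_plaintiff[doc.plaintiff].add(doc.case_number)
    return {name: sorted(numbers) for name, numbers in by_plaintiff.items()}


# Invented sample records, not drawn from the collection.
SAMPLE = [
    EncodedDocument("d1", "case-001", "Plaintiff A", "petition"),
    EncodedDocument("d2", "case-001", "Plaintiff A", "summons"),
    EncodedDocument("d3", "case-002", "Plaintiff A", "petition"),
    EncodedDocument("d4", "case-003", "Plaintiff B", "petition"),
]
```

Grouping on the plaintiff is only one possible key; a re-filed suit against a new owner would share the plaintiff but not the defendant, which is why the grouping is kept separate from the document record itself.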
Nature of Suit (limited types)
- Freedom
- Fur Trade
- Native American
- Lewis/Clark/Corps of Discovery
Authority
- Branch of government (Legislature, Judiciary, etc.)
Court
- Jurisdiction (County, State, Federal)
Case Number
People
- Type of Participant/Party
Date(s) of Suit
- Filed, Discovery, Decided, Appealed, etc.
Disposition
- at various levels, jurisdictions
Case Citation(s)
Type
- All Litigation
- Sample Types (pleading, summons, discovery, etc.)
- Can be scaled to criminal/transactional/etc.
Date(s) Associated with Document
State of Litigation Associated with Document
Source of the Document
Structure
All expandable to other types
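One way to picture the outline above is as a record with two parts, case-level metadata and document-level data. The sketch below builds such a record with Python's standard `xml.etree.ElementTree`. Every element name (`legalDocument`, `caseInfo`, `documentInfo`, and the child names) is an assumption made for illustration; none is taken from the TEI Guidelines or from the project's actual extensions, and the values are invented.

```python
import xml.etree.ElementTree as ET


def build_record(case_info, doc_info):
    """Build one record that keeps case metadata apart from document data.

    All element names are hypothetical placeholders.
    """
    record = ET.Element("legalDocument")
    case_el = ET.SubElement(record, "caseInfo")
    for name, value in case_info.items():
        ET.SubElement(case_el, name).text = value
    doc_el = ET.SubElement(record, "documentInfo")
    for name, value in doc_info.items():
        ET.SubElement(doc_el, name).text = value
    return record


# Invented values for illustration only.
record = build_record(
    {
        "natureOfSuit": "Freedom",
        "court": "St. Louis Circuit Court",
        "caseNumber": "example-1",
    },
    {"type": "petition", "date": "1825-01-01", "source": "example source"},
)

# Serialize to a string so the structure can be inspected or saved.
xml_text = ET.tostring(record, encoding="unicode")
```

Because both parts are filled from plain dictionaries, new fields (for example for criminal or transactional documents) can be added without changing the function.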
- All 500 cases imaged and transcribed
- 480 cases have had their transcriptions proofread and edited by undergraduate students
- 420 cases have been encoded using the legal extensions to TEI
Missouri Freedom Suits
FREEDOM = Living in Free Territory
Winny v Whitesides
Statutory Requirements
- Northwest Ordinance & 1824 Missouri Statute
- Petition to Sue for Freedom
- Trespass, Assault & Battery, False Imprisonment
Other Freedom Suits
Julia v McKinney
Found Records
Lewis & Clark
Fur Trade
Native American
Freedom
Earlier Grants & Projects
Our Grant
National Importance
Transcribe
Encode
Combination Approach
trespass, assault and battery, false imprisonment
manumission
born free
lived on free land
Independent membership consortium hosted by academic institutions in the US and Europe
Produces a set of guidelines which specify encoding methods for machine-readable texts
Digitize, Transcribe, Encode
Text Encoding Initiative (TEI)
St. Louis Circuit Court Historical Records Project and supplementary material in TEI XML
TEI is now the de facto standard for the encoding of electronic texts in the humanities academic community, and is used for literary documents, cultural heritage documents, and many library collections
The Guidelines are a widely used standard for text materials used in online research and teaching
Technological Goals
IMLS Grant
Develop extensions to the TEI for encoding legal documents
Legal XML
Major Grant Deliverable
to reflect legal function, genres, and roles, and employ these extensions in this collection
- When Digital Library Services started working on the Revised Dred Scott Project in 2007, there were obstacles to converting the documents to XML
- Due primarily to a lack of a consistent scheme for representing the documents’ legal function
- Started looking for appropriate standards to encode legal documents in XML
Interconnected Texts
Create the extensions with their expansion to other domains in mind
Primary Goals
Legal Professional Groups
Combination Approach
- Less interested in the documents per se than in the information they contain, or the efficient exchange of that information (LegalXML, GJXML, MetaLex)
- Often too broad in scope (covering all legal documents for all time, losing specificity)
Start with a specific collection of legal documents, but develop TEI extensions that would apply to a slightly broader domain than just the documents in that collection
Image
Standard can be developed incrementally as other groups implement the standard for encoding legal texts from other domains
Other Efforts at Legal XML
Library and Archival Community
Transcribe
- Have an interest in representing the documents to some extent as artifacts
- Often too parochial (only applying to documents in a given collection)
- In the freedom suits, a case was not a completely discrete entity
- Cases were often closely related to other litigation with the same plaintiff
- Multiple cases existed sequentially with the same plaintiff(s) and/or defendant(s)
- Appellate opinions are frequently not in existence or not recorded
- Historical records are a better indicator of the ultimate result
Complicating Factors
Encode
A Case in General
Foundational Issues
User Interface
Full text, Boolean search capability by the end of the year
http://digital.wustl.edu/legalencodingproject/
What is our primary unit of text/textual object?
Specify a domain for application of the TEI extensions
- temporal, geographical, disciplinary (law, history, etc.)
- divisions within disciplines, civil vs. criminal in law
- A case in the larger legal sense could include non-textual, non-print objects, usually in the form of exhibits.
- These could include technological artifacts such as recordings (audio/video), digital files, and other physical entities
- Subsequent litigation is well-documented
Functionality
Status
Help!
This type of encoding allows us to preserve the documents as documents, but also treat them as records in a database
Once these key pieces of legal information are identified, researchers will be able to perform smart searches - not just free text searches
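The difference between a free text search and a search on encoded legal information can be sketched as follows. The tiny corpus, its element and attribute names (`doc`, `natureOfSuit`, `type`), and the text are all invented for illustration and do not reflect the project's schema or search interface.

```python
import xml.etree.ElementTree as ET

# Invented two-document corpus with hypothetical attribute names.
CORPUS = """
<corpus>
  <doc id="d1" natureOfSuit="Freedom" type="petition">
    <text>Petition to sue for freedom, alleging trespass.</text>
  </doc>
  <doc id="d2" natureOfSuit="Fur Trade" type="summons">
    <text>Summons concerning a debt; the defendant claims freedom from liability.</text>
  </doc>
</corpus>
"""


def free_text_search(root, word):
    """Return ids of documents whose transcription contains the word."""
    needle = word.lower()
    return [
        doc.get("id")
        for doc in root.iter("doc")
        if needle in doc.findtext("text", default="").lower()
    ]


def field_search(root, **fields):
    """Return ids of documents whose encoded attributes match every field."""
    return [
        doc.get("id")
        for doc in root.iter("doc")
        if all(doc.get(name) == value for name, value in fields.items())
    ]


root = ET.fromstring(CORPUS.strip())
```

Here `free_text_search(root, "freedom")` matches both documents because the word appears in both transcriptions, while `field_search(root, natureOfSuit="Freedom")` matches only the document encoded as a freedom suit.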
http://www.digital.wustl.edu
Initial Proposal
The Case in this Corpus
The Core Object
Named Entity Recognition (NER)
Refinement of Document Model
Resource Description Framework (RDF)
- A case in the Freedom, Fur Trade, Lewis/Clark and similar suits usually consisted of multiple filings, depositions, orders, motions and judgments, among other document types
- All (or nearly all) material was in textual format written on paper
- Final outcomes on appeal were not always known
Document Information
Users will be able to browse relationships between the people and places in the suits
Erika came on board - April 2010
Further refinement/expansion of basic outline
Determination of two domains
- Case Information (Meta-data)
- Document (Object data)
Review of cases in the collection for common types and structures, determine scope
Other potential categories identified
Not a final product
Other Potential Categories
Case Information
Scalable
Other Types of Documents
- Legislative, Criminal, Scholarly, Work Product, etc.
Characteristics of Documents
- Jurisdiction
- Issuing Authority
- Dates
- Structure
- Other
Moving into the Schema
- Though basic, the outline provided a starting point for the schema
- Initial work with the documents prompted a reconsideration of the initial decision to use the documents as the primary units
- Document model used to create first draft of extensions to the TEI
History of Schema Design
Document Typology
Basic outline developed by legal experts (law librarians)
- April 2010
- Rough and unpolished
- Designed to generate thought
Initial meetings between various groups
- Refinement of concepts
- Differentiation between meta-data (case description) and object-specific data (individual litigation document)
- The good news - the functions were mappable to modern terminology
- Filings were not always clearly labeled or described in the docket
- One filing could fall within multiple categories
- Clerks of court were not standardized in their handling of litigation materials in the 1800s
- Decided to use modern terminology absent a specific historical term of well-established meaning
- Initial outline listed eight major types with subcategories
- Expanded in the schema to thirty-three
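The typology problem described above, with historical labels mapped to modern terms and one filing allowed to fall in several categories, can be sketched as a lookup from label to a set of types. The labels and category assignments below are invented examples and are not the project's thirty-three types.

```python
# Hypothetical lookup table: historical label -> set of modern document types.
MODERN_TYPES = {
    "petition": {"pleading"},
    "summons": {"summons"},
    "deposition": {"discovery"},
    # A single filing may belong to more than one category.
    "petition and affidavit": {"pleading", "affidavit"},
}


def classify(label):
    """Return the sorted modern types for a label, or 'unclassified'."""
    key = label.strip().lower()
    return sorted(MODERN_TYPES.get(key, {"unclassified"}))
```

Returning a list rather than a single value keeps multi-category filings intact, and the `unclassified` fallback flags unlabeled or inconsistently labeled filings for manual review.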
Basic Categories
Sample Outline
The Outline was quite simplistic
Three classes of legal documents proposed
- Transactional
- Litigation
- Other (memos, client communications, etc.)
Litigation documents were the relevant category for freedom suits
- Mostly meta-data, i.e. information about the case of which the document was a part
- Document typology section was the core challenge
- Litigation filings of that era did not correspond categorically to modern labels