Diachronic transliteration

by Krzysztof Jassem on 20 December 2016

Transcript of Diachronic transliteration

Goal
Linguistic Background
Solution
Conclusions
1. Proposed solution:
transliteration of diachronic texts
Diachronic Text Normalization
using finite-state transducers
Graphical Input
Output
Zważając, że oprócz przedmiotów oddanych pod ostateczne rozstrzygnięcie Delegacji Administracyjnej postantowieniem Naszym d. 20. Lutego b.r. w Tytule 1. w artykule 6. w oddziałach a, b i c, wskazanych znajdują się jeszcze decyzje Rad Preferkturalnych, tak przez Skarb, jako też i strony do Rady Stanu i Rady Naywyższej zarekursowane, których Delegacja Adnimistracyjna nie sądzi się być umocowaną ostatecznie rozstrzygnąć,...
Extracted Text
Zważając, że oprócz przedmiotów oddanych pod ostateczne rozstrzygnięcie Delegacyi Administracyjney postantowieniem Naszem d. 20. Lutego r.b. w Tytule 1. w artykule 6. w oddziałach a, b i c, wskazanych znaydują się ieszcze decyzye Rad Preferkturalnych, tak przez Skarb, iako też i strony do Rady Stanu i Rady Naywyższey zarekursowane, których Delegacya Adnimistracyina niesądzi się bydź umocowaną ostatecznie rozstrzygnąć,...
Year 1828
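To see which tokens the normalization changed between the extracted text and the output, the two versions can be compared word by word. The following is a minimal sketch using Python's standard `difflib`; the helper name `changed_tokens` is hypothetical and is not part of the presented application. The sample strings are a fragment of the 1828 text above.

```python
import difflib


def changed_tokens(old_text, new_text):
    """Return (old, new) pairs for every token span that differs."""
    old = old_text.split()
    new = new_text.split()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Join multi-token spans so that word splits stay visible.
            pairs.append((" ".join(old[i1:i2]), " ".join(new[j1:j2])))
    return pairs


# Fragment of the extracted text and of the normalized output.
extracted = "niesądzi się bydź umocowaną"
normalized = "nie sądzi się być umocowaną"

for old, new in changed_tokens(extracted, normalized):
    print(f"{old} -> {new}")
```

Comparing on whitespace-separated tokens keeps word splits such as "niesądzi" versus "nie sądzi" visible as a single change.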
Computational Background
Finite-State Automata & Transducers
OpenFst & Thrax
IT Tools
Continuous Integration & Markdown Language
.md files
.pdf Document
Set of Tests
Rule Headers
Transliteration Rules
Set of Transducers
Application for Diachronic Transliteration
http://dillinger.io/
Source: scmQuest.com
OpenFst
Thrax
OpenFst & Thrax Comparison
Lower = "a" | "ą" | "á" | "b" | "c" | "ć" | "d" | "e" | "ę" | "f" | "g" | "h" | "i" | "j" | "k" |
"l" | "ł" | "m" | "n" | "ń" | "o" | "ó" | "p" | "q" | "r" | "s" | "ś" | "t" | "u" |
"v" | "w" | "x" | "y" | "z" | "ź" | "ż" ;

Upper = "A" | "Ą" | "Á" | "B" | "C" | "Ć" | "D" | "E" | "Ę" | "F" | "G" | "H" | "I" | "J" | "K" |
"L" | "Ł" | "M" | "N" | "Ń" | "O" | "Ó" | "P" | "Q" | "R" | "S" | "Ś" | "T" | "U" | "V" |
"W" | "X" | "Y" | "Z" | "Ź" | "Ż" ;

Letter = Lower | Upper ;

LowerVowel = "a" | "ą" | "á" | "e" | "ę" | "i" | "o" | "ó" | "u" | "y" ;

UpperVowel = "A" | "Ą" | "Á" | "E" | "Ę" | "I" | "O" | "Ó" | "U" | "Y" ;

export Vowel = LowerVowel | UpperVowel ;

LowerConsonant = Lower - LowerVowel ;

UpperConsonant = Upper - UpperVowel ;
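For readers unfamiliar with Thrax, the character classes above behave like sets: `|` is union and `-` is difference. A rough Python analogue, offered only as an illustration (the constant and function names are assumptions, not part of the grammar), could look like this:

```python
# Illustrative Python analogue of the Thrax character classes (not the grammar itself).
LOWER = set("aąábcćdeęfghijklłmnńoópqrsśtuvwxyzźż")
UPPER = {ch.upper() for ch in LOWER}
LETTER = LOWER | UPPER

LOWER_VOWEL = set("aąáeęioóuy")
UPPER_VOWEL = {ch.upper() for ch in LOWER_VOWEL}
VOWEL = LOWER_VOWEL | UPPER_VOWEL

# Set difference mirrors "Lower - LowerVowel" in Thrax.
LOWER_CONSONANT = LOWER - LOWER_VOWEL
UPPER_CONSONANT = UPPER - UPPER_VOWEL


def is_consonant(ch):
    """True if ch is one of the consonants defined above."""
    return ch in LOWER_CONSONANT or ch in UPPER_CONSONANT
```

In Thrax the same definitions compile to acceptors that can be reused as contexts in rewrite rules, which plain Python sets cannot do.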
# Feliński's Rules (before 1830)

## Rule: A_acute (the letter no longer exists in Polish)
This rule replaces 'á' (a with an acute accent) with the standard 'a'.

A_acute = (
("Á" : "A") |
("á" : "a")
) ;

export Rule001_01 =
CDRewrite[ A_acute,
AnyString,
AnyString,
Any* ] ;
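Because both contexts of this rule are unrestricted, its effect is a plain character-for-character substitution. A minimal Python sketch of the same behaviour, assuming a hypothetical helper named `apply_a_acute` (the application itself uses the compiled transducer, not this code):

```python
# Mapping equivalent to the A_acute transducer: "Á" -> "A", "á" -> "a".
A_ACUTE = str.maketrans({"Á": "A", "á": "a"})


def apply_a_acute(text):
    """Replace every a-acute with a plain 'a', regardless of context."""
    return text.translate(A_ACUTE)


# The input string below is an invented example, not taken from the corpus.
print(apply_a_acute("Á á"))
```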

## Rule: Infinitive 'ź_into_ć'
This rule replaces the 'dź' or 'dz' ending of verb infinitives with 'ć', 'ść', or 'c', depending on the preceding vowel.
> bydź -> być
> wiedź -> wieść
> kładź -> kłaść
> uledz -> ulec
> piedz -> piec
> módz -> móc

Infinitive_z =
"y" ("dź" : "ć") |
("ie" | "a") ("dź" : "ść") |
("e" | "ó") ("dz" : "c") ;

export Rule001_02 =
CDRewrite[ Infinitive_z,
NonEmptyString,
End,
Any* ] ;
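The behaviour of this rule can be approximated with regular expressions. The sketch below assumes that the rule is applied to one word at a time, that `End` means the end of the word, and that `NonEmptyString` requires at least one character before the vowel; the function name `apply_infinitive_z` is hypothetical.

```python
import re

# Each entry: (pattern anchored at the end of the word, replacement).
# Group 1 is the required non-empty left context, group 2 the vowel.
ENDINGS = [
    (re.compile(r"(.)(y)dź$"), r"\1\2ć"),
    (re.compile(r"(.)(ie|a)dź$"), r"\1\2ść"),
    (re.compile(r"(.)(e|ó)dz$"), r"\1\2c"),
]


def apply_infinitive_z(word):
    """Rewrite an old infinitive ending; return the word unchanged otherwise."""
    for pattern, replacement in ENDINGS:
        new_word, count = pattern.subn(replacement, word)
        if count:
            return new_word
    return word


# The examples listed in the rule description.
for old in ["bydź", "wiedź", "kładź", "uledz", "piedz", "módz"]:
    print(old, "->", apply_infinitive_z(old))
```

Unlike the Thrax rule, which composes with the other transducers into a single machine, this sketch has to be called once per token.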

2. Using Thrax and OpenFst:
state-of-the-art FST techniques (efficiency)
easy-to-learn rule format