
We use a POS tagging library to infer grammar info for numbers and symbols.

Match words against the dictionary

Probability

For each word in List A:

  • Get the word's grammar info
  • Find its lemma and Forms
  • Look up the word's translation in the dictionary
  • Add a Score to the word if the same word or a variant has been found*
  • Mark the matching word in the corresponding Selected Words list (List B)
  • Weight the score with a factor based on the Type of Word
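A minimal Python sketch of the per-word loop above. The `Word` record, the `DICTIONARY` contents and the helper names are hypothetical; the match scores and Word Type Factors are copied from the worked example in this section. In a real system the lemma, Forms and tags would come from the NLP library and the dictionary database, and matching here is reduced to an exact lookup of the translated Form.

```python
from dataclasses import dataclass

# Word Type Factors keyed by Penn Treebank POS tag (values from the worked example).
TYPE_FACTORS = {"NNP": 1.0, "NN": 0.9, "VB": 0.8, "JJ": 0.7}

# Hypothetical bilingual dictionary: English lemma -> {Spanish Form: match score}.
DICTIONARY = {
    "Aristotle": {"Aristóteles": 1.0},
    "democracy": {"democracia": 1.0},
    "control": {"controlar": 1.0, "controlan": 0.8},
    "free": {"libre": 0.5, "libres": 0.5},
}


@dataclass
class Word:
    text: str
    lemma: str
    tag: str  # POS tag from the tagging library


def type_factor(tag: str) -> float:
    # Exact tag first (NNP), then its two-letter family (NNS -> NN, VBP -> VB).
    return TYPE_FACTORS.get(tag, TYPE_FACTORS.get(tag[:2], 0.0))


def score_words(list_a: list[Word], list_b: list[str]) -> list[float]:
    """Return one WordScore per word of List A, in order (0.0 when unmatched)."""
    selected_b: set[int] = set()  # positions in List B already marked as matched
    scores: list[float] = []
    for word in list_a:
        forms = DICTIONARY.get(word.lemma, {})  # translated Forms for this lemma
        best_score, best_pos = 0.0, None
        for pos, target in enumerate(list_b):
            if pos not in selected_b and forms.get(target, 0.0) > best_score:
                best_score, best_pos = forms[target], pos
        if best_pos is not None:
            selected_b.add(best_pos)  # mark the match in List B's Selected Words
        scores.append(best_score * type_factor(word.tag))
    return scores
```

Each List B word can be marked only once, so a second English word cannot reuse a Spanish word that already matched; words with no dictionary entry simply score 0.0.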

Regarding statistics:

  • 1-to-1 alignment covers more than 83% of cases.
  • We don't actually need to translate whole texts, only sentences.

Dictionary Match Cases

*

In all cases, grammar info must match as well:

  • 1-of-1 matches
  • 1-of-many matches
  • synonyms
  • more than one word as a single match
  • Numbers are not looked up; they should appear unchanged in both sentences.
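A Python sketch of how these match cases could be told apart. The dictionary entry layout (`translations` and `synonyms` as `(form, category)` pairs), the coarse grammar categories and the `CASE_SCORES` values are all hypothetical placeholders, since the slides do not fix them. For multi-word matches only the first token's category is checked.

```python
import re

# Placeholder scores per match case (hypothetical; not given in the slides).
CASE_SCORES = {"number": 1.0, "one_of_one": 1.0, "one_of_many": 0.5, "synonym": 0.4}


def is_number(token: str) -> bool:
    return re.fullmatch(r"\d+(?:[.,]\d+)*", token) is not None


def occurs(form: str, pos: str, target: list[tuple[str, str]]) -> bool:
    """True if `form` (one or more words) appears contiguously in `target`
    and its first token carries the expected grammar category."""
    parts = form.split()
    if not parts:
        return False
    tokens = [tok for tok, _ in target]
    for i in range(len(target) - len(parts) + 1):
        if tokens[i:i + len(parts)] == parts and target[i][1] == pos:
            return True
    return False


def match_case(word: str, pos: str, entry: dict, target: list[tuple[str, str]]):
    """Classify how `word` matches the other-language sentence `target`.

    entry: hypothetical dictionary record, e.g.
        {"translations": [("gobierno", "NOUN")], "synonyms": [("administración", "NOUN")]}
    Returns (case, score), or (None, 0.0) when nothing matches.
    """
    if is_number(word):  # numbers are not looked up; they must appear unchanged
        found = any(tok == word for tok, _ in target)
        return ("number", CASE_SCORES["number"]) if found else (None, 0.0)
    # Grammar info must match: keep only translations of the same category.
    same_pos = [f for f, p in entry.get("translations", []) if p == pos]
    if any(occurs(f, pos, target) for f in same_pos):
        case = "one_of_one" if len(same_pos) == 1 else "one_of_many"
        return case, CASE_SCORES[case]
    if any(p == pos and occurs(f, pos, target) for f, p in entry.get("synonyms", [])):
        return "synonym", CASE_SCORES["synonym"]
    return None, 0.0
```

A translation such as "a pesar de" for "despite" is treated as a single match when its words appear consecutively in the target sentence.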

Alignment Process Logic

  • Split sentences into words via an NLP library (which also provides correct sentence boundaries)
  • Select important words (via grammar info)
  • Find translations using the lemma and Forms
  • Match the Forms returned in the translation result against the words of the other-language sentence

Underlying principle

As in human translation, we use:

  • Structured knowledge (a dictionary database, and corpus data via NLP libraries)
  • Basic linguistic logic (grammar/lemma/Forms matches)
  • Queries on pre-processed data, which are cheaper and more likely to be correct than calculating the result mathematically

Iterating and matching

We calculate a WordScore for each word and a global SentenceScore that takes all WordScores into account:

  • [en] Aristotle > [es] Aristóteles: 1.0 (exact match) * 1.0 (NNP Factor) = 1.0
  • [en] democracy > [es] democracia: 1.0 (exact match) * 0.9 (NN Factor) = 0.9
  • [en] control > [es] controlan: 0.8 (conjugated verb found) * 0.8 (VB Factor) = 0.64
  • [en] free > [es] libres: 0.5 (plural found, many definitions) * 0.7 (JJ Factor) = 0.35
  • Continue with the rest of the words in List A and List B...

English result (List A)

  • Aristotle
  • democracy
  • constitution
  • majority
  • government
  • control
  • defined
  • being
  • free
  • poor
AWN: Average number of words [ (4+4)/2 ]

SWD: Selected Words Difference [ ABS(4 - 4)*2 ]

$$\text{Score} = \frac{\sum \text{WordScores}}{\text{AWN} + \text{SWD}} = \frac{1.0 + 0.9 + 0.64 + 0.35}{4\ (\text{AWN}) + 0\ (\text{SWD})} = \frac{2.89}{4} = 0.7225$$
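The same formula as a small Python function. The function and parameter names are hypothetical; `selected_a` and `selected_b` stand for the numbers of Selected Words in List A and List B, which feed AWN and SWD as defined above.

```python
def sentence_score(word_scores: list[float], selected_a: int, selected_b: int) -> float:
    """SentenceScore = sum(WordScores) / (AWN + SWD)."""
    awn = (selected_a + selected_b) / 2      # AWN: average number of selected words
    swd = abs(selected_a - selected_b) * 2   # SWD: selected words difference
    denominator = awn + swd
    return sum(word_scores) / denominator if denominator else 0.0
```

With 4 selected words on each side, SWD is 0 and the function reproduces the 0.7225 above. If the counts differ, for example 2 against 3, the denominator grows from 2.5 to 4.5, so unequal word counts pull the score down.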

More to take into account...

Sentence Score

  • Number of Important Words
  • Number of words per Type
  • Dictionary matches (translation and grammar)
  • Synonym matches
  • Multi-word matches

Spanish result (List B)

  • Aristóteles
  • democracia
  • constitución
  • mayoría
  • gobierno
  • controlan
  • define
  • siendo
  • libre
  • pobres

Sample sentence

On each sentence:

  • Get POS tags and GrammarInfo for all words
  • Detect the actual language of the sentence via an NLP library

Sentence Alignment

Eligibility

Selecting criteria

Most Important Words

We can assume the alignment is correct when the Score is close to 1.0; otherwise, we discard the alignment based on a minimum required score setting.
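A minimal sketch of the discard step, assuming each candidate alignment is held as a `(source, target, SentenceScore)` triple. The function name and the default `min_score` of 0.7 are hypothetical placeholders for the "minimum required score" setting, not recommended values.

```python
def filter_alignments(
    pairs: list[tuple[str, str, float]], min_score: float = 0.7
) -> list[tuple[str, str, float]]:
    """Keep only the alignments whose SentenceScore reaches min_score."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    return [(src, tgt, score) for src, tgt, score in pairs if score >= min_score]
```

Raising `min_score` toward 1.0 keeps fewer but more reliable alignments; lowering it keeps more pairs at the cost of accepting weaker dictionary evidence.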

EN: Aristotle defined democracy as the constitution in which the free and the poor, being in the majority, control government.

ES: Aristóteles define la democracia como la constitución en el que el libre y los pobres, siendo la mayoría, controlan el gobierno.

Based on grammar info, look for:

  • Cardinal numbers, symbols and foreign words*
  • Proper nouns
  • Nouns, adjectives and adverbs
  • Verbs
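A sketch of this selection over Penn Treebank tags, the tag set used in the scoring example (NNP, NN, VB, JJ). Reading the order of the list as a priority order is an assumption, as is the exact tag-to-group mapping; `PRIORITY` and `important_words` are hypothetical names.

```python
# Priority per Penn Treebank tag, following the order of the list (lower = more important).
PRIORITY = {
    **dict.fromkeys(["CD", "SYM", "FW"], 0),            # numbers, symbols, foreign words
    **dict.fromkeys(["NNP", "NNPS"], 1),                 # proper nouns
    **dict.fromkeys(["NN", "NNS", "JJ", "JJR", "JJS",    # nouns, adjectives, adverbs
                     "RB", "RBR", "RBS"], 2),
    **dict.fromkeys(["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"], 3),  # verbs
}


def important_words(tagged: list[tuple[str, str]]) -> list[str]:
    """Return the important words of a tagged sentence, most important first.

    Words with other tags (determiners, prepositions, punctuation, ...) are
    dropped; ties keep their original sentence order.
    """
    kept = [(PRIORITY[tag], i, word) for i, (word, tag) in enumerate(tagged) if tag in PRIORITY]
    return [word for _, _, word in sorted(kept)]
```

If the sample English sentence is tagged by hand as Aristotle/NNP defined/VBD democracy/NN ... control/VBP government/NN, this returns exactly the ten words of List A, with the proper noun first and the verbs last.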

*

Algorithm development

We still need to:

  • Set better Word Type Factors
  • Set better Dictionary Match Scores

Thanks!