The main idea

To create a basis for a computerized English grammar, starting with the morphology of the Noun.

The object of our research is the morphology of English; the subject is the paradigmatic classes of the Noun.

Implementation Area

  • Syntactic and Semantic Processors;
  • Machine Translation;
  • Natural Language Processing;
  • Error Search Systems;
  • Information Retrieval Systems;
  • Automatic Text Editing and Summarising.

Automatic Morphological Analysis of English Noun

Research Objectives

1. Creating paradigmatic Noun classes;

2. Working with an alphabetic frequency dictionary;

3. Tagging word stems with their Part of Speech;

4. Tagging Nouns with their Paradigmatic Class.

English Morphological Analyzer developed by the Department of Computer and Information Science at the University of Pennsylvania

Noun processing:

mice N_Root1 N (mouse) PL

mouse N_Root1 N (mouse) SG

ambassador N_Root2 N (ambassador)
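The three sample lines above follow a regular layout: word form, root tag, part-of-speech code, lemma in parentheses, and an optional number tag. A minimal sketch of reading such lines, assuming this layout holds for all Noun output; the field names and the `parse_analysis` helper are hypothetical and not part of the analyzer itself:

```python
import re
from typing import NamedTuple, Optional


class Analysis(NamedTuple):
    form: str              # surface word form, e.g. "mice"
    root_tag: str          # root tag as printed, e.g. "N_Root1"
    pos: str               # part-of-speech code, e.g. "N"
    lemma: str             # dictionary form given in parentheses
    number: Optional[str]  # "SG", "PL", or None when the tag is absent


# Assumed layout: form, root tag, POS, "(lemma)", optional number tag.
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+\((\S+)\)(?:\s+(\S+))?$")


def parse_analysis(line: str) -> Analysis:
    """Split one analyzer output line into its fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised analysis line: {line!r}")
    return Analysis(*match.groups())


sample = [
    "mice N_Root1 N (mouse) PL",
    "mouse N_Root1 N (mouse) SG",
    "ambassador N_Root2 N (ambassador)",
]
analyses = [parse_analysis(line) for line in sample]
```

Here `analyses[0].lemma` is `"mouse"`, and the third entry has `number` set to `None` because the sample line carries no number tag.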

Phase 1

In the first phase we completed the following tasks:

1. creating the dictionary of stems;

(for example, from ability and abilities the stem abilit is kept; it is later assigned to the paradigmatic class with the inflection y -> #, + ies);

2. handling morphologically homonymous word forms;

(for example, the word absolute can be used as a Noun meaning "absolute value" and as an Adjective meaning "total", so we added the same stem to the database twice, with two different codes).
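The two Phase 1 tasks can be sketched in a few lines. This is an illustration only: it assumes the stem is the longest common prefix of a word's forms, which reproduces the abilit example but is not stated as the project's actual procedure, and the names `common_stem`, `add_entry` and `stem_codes` are hypothetical:

```python
import os
from collections import defaultdict


def common_stem(forms):
    """Return the longest common prefix of the word forms (assumed stem rule)."""
    return os.path.commonprefix(list(forms))


# Stem dictionary: each stem maps to the set of Part-of-Speech codes it carries.
stem_codes = defaultdict(set)


def add_entry(forms, pos_code):
    """Store the stem of the given forms under one Part-of-Speech code."""
    stem = common_stem(forms)
    stem_codes[stem].add(pos_code)
    return stem


# Task 1: ability / abilities share the stem "abilit".
add_entry(["ability", "abilities"], "N")

# Task 2: the homonymous stem "absolute" is stored with two different codes.
add_entry(["absolute"], "N")
add_entry(["absolute"], "Adj")
```

Keeping a set of codes per stem means a homonymous stem is stored once but still carries every code it was tagged with.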

• Noun – N

• Verb – V

• Adjective – Adj

• Adverb – Adv

• Pronoun – Pron

• Preposition – Prep

• Numeral – Num

• Conjunction – Conj

• Interjection – Interj

• Article – Art

Results and Statistics

28 paradigmatic classes of the Noun,

12 000 word forms processed.

40% – class «Noun»,

15% – class «Verb»,

20% – class «Adjective»,

25% – classes «Adverb», «Preposition», «Pronoun», «Conjunction», «Interjection», «Article».

The most frequent Noun classes:

Classes 1–4 (the most common nouns, which follow the traditional pattern of inflection).

Phase 2

1) develop a classification of Nouns into paradigmatic classes;

2) assign a paradigmatic class code to each Noun stem.

Notation: N – Noun; S – singular; P – plural; C – singular, possessive case; L – plural, possessive case.
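A paradigmatic class can be pictured as a table of endings, one per form code (S, P, C, L), which is attached to a stem. The sketch below is an assumption made for illustration: the class identifiers `regular_s` and `y_ies` and the `inflect` helper are hypothetical, and the 28 classes of the project are not reproduced here:

```python
# Hypothetical paradigm table: class id -> ending for each form code.
# S: singular, P: plural, C: singular possessive, L: plural possessive.
PARADIGMS = {
    "regular_s": {"S": "", "P": "s", "C": "'s", "L": "s'"},
    "y_ies": {"S": "y", "P": "ies", "C": "y's", "L": "ies'"},
}


def inflect(stem, class_id):
    """Build all four forms of a Noun from its stem and paradigmatic class."""
    endings = PARADIGMS[class_id]
    return {code: stem + ending for code, ending in endings.items()}


forms = inflect("abilit", "y_ies")
```

With the stem abilit, `forms["S"]` is `"ability"` and `forms["P"]` is `"abilities"`; tagging a stem with a class code is then enough to recover its whole paradigm.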