Zipf's law across languages of the world: Towards a cross-linguistic measure of lexical diversity

by

christian bentz

on 2 October 2014


Zipf's Law across languages of the world: Towards a cross-linguistic measure of lexical diversity

Christian Bentz*, Douwe Kiela**, Felix Hill** & Paula Buttery*
* Dept. of Theoretical and Applied Linguistics, University of Cambridge
** Computer Laboratory, University of Cambridge
Lexical diversity: What is it?
Distribution of word forms used to encode a constant information content
Example:
Hungarian: Minden emberi lény szabadon születik és egyenlő méltósága és joga van (11 tokens, 10 types)

German: Alle Menschen sind frei und gleich an Würde und Rechten geboren (11 tokens, 10 types)

English: All human beings are born free and equal in dignity and rights (12 tokens, 11 types)

Fijian: Era sucu ena galala na tamata yadua, era tautauvata ena nodra dokai kei na nodra dodonu (16 tokens, 12 types)

Compounds
German: Schiffahrtskapitaenkabinenschluessel
English: Key to the cabin of the captain of the ship

Inflections
German: Schiff, Schiffe, Schiffes, Schiffen
English: ship, ships

Lexicon
German: zuschliessen, abschliessen
English: lock

Orthography
German: Nachbar, Programm
English: neighbor, neighbour, programme, program
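The token and type counts for the four example sentences above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `tokens_and_types` is made up for illustration, and it assumes that words are delimited by white space, that case is ignored, and that surrounding punctuation is stripped.

```python
import string


def tokens_and_types(sentence):
    """Return (number of tokens, number of types) for one sentence.

    Assumptions of this sketch: words are separated by white space,
    case is ignored, and leading/trailing ASCII punctuation is dropped.
    """
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    words = [w for w in words if w]  # discard items that were pure punctuation
    return len(words), len(set(words))


english = "All human beings are born free and equal in dignity and rights"
fijian = ("Era sucu ena galala na tamata yadua, era tautauvata ena nodra "
          "dokai kei na nodra dodonu")

print(tokens_and_types(english))  # (12, 11)
print(tokens_and_types(fijian))   # (16, 12)
```

Tokens count every running word, while types count distinct word forms, so the repeated "and" in the English sentence lowers the type count by one.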
Lexical diversity: How can we measure it?


Measure used as biodiversity index: Zipf-Mandelbrot's law (Jost, 2006)

Zipf-Mandelbrot's law in quantitative linguistics:

- Scrutinizing differences between children's and adults' speech (Baixeries, Elvevåg, & Ferrer-i-Cancho, 2013)

- Quantitative differences in texts and languages (Popescu & Altmann, 2010; Popescu et al. 2009; Baroni, 2009; Ha et al. 2006)



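Zipf (1949): rank words according to frequencies of occurrence
German: Alle Menschen sind frei und gleich an Würde und Rechten geboren...

English: All human beings are born free and equal in dignity and rights...

Zipf-Mandelbrot's law: f(r) = C / (β + r)^α, where f(r) is the frequency of the word at rank r
C: intercept
α: slope
β: Mandelbrot's corrective (1953)

[Figure: word frequency against rank, showing the inverse relationship. Labels: types, tokens. Highest-ranked words shown: English the, and, of, to; German und, der, die, zu.]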
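The ranking step and the Zipf-Mandelbrot prediction can be sketched as follows. The function names are hypothetical, the tokenization (lowercasing and splitting on white space) is an assumption, and the formula used is the standard form f(r) = C / (β + r)^α with the parameters named on the slide.

```python
from collections import Counter


def rank_frequency(text):
    """Return [(rank, word, frequency)], rank 1 being the most frequent word.

    Assumption of this sketch: tokens are lowercased and split on white space.
    """
    counts = Counter(text.lower().split())
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]


def zipf_mandelbrot(rank, C, alpha, beta):
    """Frequency predicted for a rank: f(r) = C / (beta + r) ** alpha."""
    return C / (beta + rank) ** alpha


for rank, word, freq in rank_frequency("a b a c a b"):
    print(rank, word, freq)
```

With β = 0 the function reduces to Zipf's original law, in which frequency is inversely proportional to a power of the rank.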
Background
Quantitative Measure
Typological Analyses
Definition
How and Why Does Lexical Diversity Vary across Languages?
Data
Universal Declaration of Human Rights (UDHR)
- parallel text (~constant content)
- 363 languages (only languages delimiting words by white space)

Method
Maximum likelihood estimation of ZM parameters
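The slides do not say how the maximum likelihood estimation was carried out, so the following is only a sketch of one possible approach: a grid search, refined over a few rounds, for the α and β that maximize the likelihood of a ranked frequency list. The function names, the search ranges, and the treatment of observed ranks as fixed are all assumptions of this example, not the authors' procedure.

```python
import math


def zm_log_likelihood(freqs, alpha, beta):
    """Log-likelihood of ranked frequencies under Zipf-Mandelbrot.

    freqs[i] is the frequency of the word at rank i + 1. The model gives
    rank r the probability (r + beta) ** -alpha / Z, where Z sums that
    weight over all observed ranks.
    """
    log_w = [-alpha * math.log(r + beta) for r in range(1, len(freqs) + 1)]
    log_z = math.log(sum(math.exp(w) for w in log_w))
    return sum(f * (w - log_z) for f, w in zip(freqs, log_w))


def fit_zm(freqs, alpha_range=(0.2, 2.2), beta_range=(0.0, 10.0),
           steps=21, rounds=4):
    """Approximate the maximum likelihood (alpha, beta) by refined grid search."""
    freqs = sorted(freqs, reverse=True)  # rank 1 = most frequent word
    (a_lo, a_hi), (b_lo, b_hi) = alpha_range, beta_range
    for _ in range(rounds):
        a_step = (a_hi - a_lo) / (steps - 1)
        b_step = (b_hi - b_lo) / (steps - 1)
        grid = [
            (zm_log_likelihood(freqs, a_lo + i * a_step, b_lo + j * b_step),
             a_lo + i * a_step, b_lo + j * b_step)
            for i in range(steps) for j in range(steps)
        ]
        _, alpha, beta = max(grid)  # grid point with the highest likelihood
        # Zoom in: search two grid steps either side of the current best point.
        a_lo, a_hi = max(1e-6, alpha - 2 * a_step), alpha + 2 * a_step
        b_lo, b_hi = max(0.0, beta - 2 * b_step), beta + 2 * b_step
    return alpha, beta


# Synthetic frequencies generated from alpha = 1, beta = 2 (not real UDHR data).
synthetic = [1000.0 / (r + 2.0) for r in range(1, 201)]
print(fit_zm(synthetic))
```

A grid search is slow but has no dependencies. In practice a numerical optimizer would usually replace it, and the intercept C then follows from the normalization and the total token count.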
Sociolinguistic Typology
McWhorter (2007: 14): "languages widely acquired non-natively are shorn of much of their natural elaboration."

Trudgill (2011: 40): "Simplification will occur in sociolinguistic contact situations only to the extent that untutored, especially short-term, adult second language learning occurs [...]"

Wray & Grace (2007: 557): "A language that is customarily learned and used by adult non-native speakers will come under pressure to become more learnable by the adult mind, as contrasted with the child mind."
L2 impact on lexical diversity?
Lexical Diversity and L2 Effects?
Data 1: Number of L2/L1 speakers (261)
Data 2: ZM parameter values for UDHR (363)
Overlap of Data 1 and Data 2: 99


Simple linear model:
Dependent variable: alpha
Predictor: log(ratio of L2 speakers)
p<0.01

Mixed-effects model (Baayen, Davidson & Bates, 2008; Barr et al. 2013; Jaeger et al. 2011):
Dependent variable: alpha
Predictor: log(ratio of L2 speakers)
Random effect: by language family and area
p>0.05
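The simple linear model named above (alpha regressed on the log of the L2 speaker ratio) can be sketched with ordinary least squares. The function name and the input values are invented for illustration and are not the data from the talk. The sketch computes no p-value and does not cover the mixed-effects model, which would need a dedicated statistics package.

```python
import math


def fit_simple_linear_model(l2_ratios, alphas):
    """Ordinary least squares fit of alpha on log(ratio of L2 speakers).

    Returns (intercept, slope). Assumes every ratio is strictly positive.
    """
    xs = [math.log(ratio) for ratio in l2_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(alphas) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, alphas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up example values, one pair per language.
ratios = [0.05, 0.2, 0.5, 1.0, 2.0]
alphas = [1.10, 1.05, 1.00, 0.98, 0.95]
print(fit_simple_linear_model(ratios, alphas))
```

A negative slope here would mean that alpha falls as the share of L2 speakers rises. Whether that holds for the real data is a question for the talk's own results, not for this toy input.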
[Figure labels: low LD, high LD]
[Diagram labels: Corpus Generation, Learner/user Generation, Corpus Generation; L2]
Language as a Complex Adaptive System
Gell-Mann (1992): Complexity and Complex Adaptive Systems

Croft (2000): Explaining language change: an evolutionary approach

Kirby & Hurford (2002): The emergence of linguistic structure

Ritt (2004): Selfish sounds and linguistic evolution

Christiansen & Chater (2008): Language as shaped by the brain

Beckner et al. (2009): Language is a complex adaptive system
[Diagram "Language Helix", labels: output, input, input, output; L1]
What does that mean for historical language change and language evolution?
Typological and historical studies
Bentz & Christiansen (2010, 2013): L2 influence in the history of Romance and Germanic languages

Lupyan & Dale (2010): Population size (indirect L2 measure) and linguistic complexity

Bentz & Winter (2012, 2013): L2 influence on case marking systems


Simple, quantitative, but linguistically meaningful measure that can be used with whole texts and corpora?

LEXICAL DIVERSITY
Definition:
What drives Lexical Diversity?
Conclusion
- Lexical diversity can be measured cross-linguistically by using Zipf's law

- Lexical diversity correlates with measures of language contact

- This might be seen as another instance of language adapting to its sociolinguistic niche
Who is adapting to whom?
Douwe Kiela
Felix Hill
Paula Buttery
Thanks.