Zipf's law across languages of the world: Towards a cross-linguistic measure of lexical diversity
Christian Bentz*, Douwe Kiela**, Felix Hill** & Paula Buttery*
*Dept. of Theoretical and Applied Linguistics, University of Cambridge
**Computer Laboratory, University of Cambridge
Towards a cross-linguistic measure of lexical diversity
Lexical diversity: What is it?
used to encode a
Hungarian: Minden emberi lény szabadon születik és egyenlő méltósága és joga van
German: Alle Menschen sind frei und gleich an Würde und Rechten geboren
English: All human beings are born free and equal in dignity and rights
Fijian: Era sucu ena galala na tamata yadua, era tautauvata ena nodra dokai kei na nodra dodonu
Key to the cabin of the captain of the ship
Schiff, Schiffe, Schiffes, Schiffen
German: Nachbar, Programm
English: neighbor, neighbour, programme, program
How can we measure it?
Measure used as biodiversity index: Zipf-Mandelbrot's law
Zipf-Mandelbrot's law in quantitative linguistics:
- Scrutinizing differences between children's speech and adults' speech
(Baixeries, Elvevåg, & Ferrer-i-
- Quantitative differences in texts and languages
(Altmann, 2010; Popescu et al. 2009; Baroni, 2009; Ha et al. 2006)
rank words according to frequencies of occurrence
German: Alle Menschen sind frei und gleich an Würde und Rechten geboren...
English: All human beings are born free and equal in dignity and rights...
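As a rough illustration of the ranking step, the sketch below counts word types in the English example sentence and orders them by frequency. The function name `rank_frequencies` is made up for this example, and lowercasing the tokens is an assumption; the slides only state that words are ranked by frequency of occurrence.

```python
from collections import Counter

def rank_frequencies(text):
    """Return (rank, word, frequency) triples, most frequent word first."""
    # Words are delimited by white space; lowercasing is an assumption of this sketch.
    tokens = text.lower().split()
    counts = Counter(tokens)
    # Sort by descending frequency; ties are broken alphabetically for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ordered, start=1)]

english = "All human beings are born free and equal in dignity and rights"
for rank, word, freq in rank_frequencies(english)[:3]:
    print(rank, word, freq)
```

In this single sentence only "and" occurs twice, so it takes rank 1. Punctuation is not stripped here, which would matter for longer texts.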
β: Mandelbrot's corrective (1953)
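For reference, the Zipf-Mandelbrot law is commonly written as below. The symbols f(r), r and C are notation assumed here; only α and β are named in the slides.

```latex
f(r) = \frac{C}{(\beta + r)^{\alpha}}
```

Here f(r) is the frequency of the word at rank r, C is a normalizing constant, α is the exponent, and β is Mandelbrot's corrective. With β = 0 the expression reduces to Zipf's original law.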
How and Why Does Lexical Diversity Vary across Languages?
Universal Declaration of Human Rights (UDHR)
estimation of ZM parameters
- parallel text
(delimiting words by white space)
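The slides do not say how the ZM parameters were estimated. One possible approach, sketched here purely as an assumption, is a maximum-likelihood grid search over α and β for a list of ranked word frequencies. The function names and the grid ranges are hypothetical.

```python
import math

def zm_log_likelihood(freqs, alpha, beta):
    """Log-likelihood of ranked frequencies under p(r) proportional to (r + beta)^(-alpha)."""
    weights = [(rank + beta) ** -alpha for rank in range(1, len(freqs) + 1)]
    log_norm = math.log(sum(weights))
    return sum(f * (math.log(w) - log_norm) for f, w in zip(freqs, weights))

def fit_zm(freqs, alphas, betas):
    """Return the (alpha, beta) pair on the grid with the highest log-likelihood."""
    ranked = sorted(freqs, reverse=True)  # rank 1 = most frequent word
    return max(((a, b) for a in alphas for b in betas),
               key=lambda ab: zm_log_likelihood(ranked, ab[0], ab[1]))

# Hypothetical search grid: alpha in 0.5..2.0, beta in 0.0..5.0.
alphas = [i / 10 for i in range(5, 21)]
betas = [j / 2 for j in range(0, 11)]

# Synthetic check: expected frequencies generated from alpha = 1.2, beta = 2.0.
true_weights = [(r + 2.0) ** -1.2 for r in range(1, 201)]
demo_freqs = [1000 * w / sum(true_weights) for w in true_weights]
print(fit_zm(demo_freqs, alphas, betas))
```

A grid search is slow but easy to inspect. A numerical optimizer would be the usual choice for real corpora.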
McWhorter (2007: 14): “languages widely acquired non-natively are shorn of much of their natural elaboration.”
Trudgill (2011: 40): "Simplification will occur in sociolinguistic contact situations only to the extent that untutored, especially short-term, second language learning occurs [...]"
Wray & Grace (2007: 557): "A language that is customarily learned and used by adult non-native speakers will come under pressure to become more learnable by the adult mind, as contrasted with the child mind."
L2 impact on lexical diversity?
Lexical Diversity AND L2 Effects?
ZM parameter values for UDHR
Dependent variable: alpha
Predictor: log(ratio of L2 speakers)
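A simple linear model of this kind, with alpha regressed on the log of the L2 speaker ratio, could be fitted by ordinary least squares as in the sketch below. The function name is hypothetical, and the input values are synthetic placeholders generated from a known line, not data from the study.

```python
import math

def fit_simple_linear(l2_ratios, alphas):
    """Least-squares fit of alpha = intercept + slope * log(l2_ratio)."""
    xs = [math.log(r) for r in l2_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(alphas) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, alphas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic placeholder inputs, NOT values from the study: the alphas are
# generated from a known line so the fit can be checked by eye.
toy_l2_ratios = [0.01, 0.1, 0.5, 1.0]
toy_alphas = [1.0 + 0.1 * math.log(r) for r in toy_l2_ratios]
intercept, slope = fit_simple_linear(toy_l2_ratios, toy_alphas)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```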
Mixed-effects model:
Dependent variable: alpha
Predictor: log(ratio of L2 speakers)
Random effect: by language family and area
(Baayen, Davidson & Bates, 2008; Barr et al. 2013; Jaeger et al. 2011)
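One way to write such a mixed-effects model is given below. It assumes random intercepts only, since the slides do not state whether random slopes were also included, and all symbols other than alpha are notation introduced for this sketch.

```latex
\alpha_i = b_0 + b_1 \log(\mathrm{L2ratio}_i) + u_{\mathrm{family}(i)} + v_{\mathrm{area}(i)} + \varepsilon_i,
\qquad u \sim \mathcal{N}(0, \sigma_u^2),\quad v \sim \mathcal{N}(0, \sigma_v^2),\quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
```

Here i indexes languages, b_0 and b_1 are the fixed intercept and slope, and u and v are the adjustments for the language family and the area of language i.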
Language as a Complex Adaptive System
Complexity and Complex Adaptive Systems
Explaining language change: an evolutionary approach
Kirby & Hurford (2002)
The emergence of linguistic structure
Selfish sounds and linguistic evolution
Christiansen & Chater (2008)
Language as shaped by the brain
Beckner et al. (2009)
Language is a complex adaptive system
What does that mean for historical language change?
Typological and historical studies
Bentz & Christiansen (2010, 2013): L2 influence in the history of Romance and Germanic languages
Lupyan & Dale (2010): Population size (indirect L2 measure) and linguistic complexity
Bentz & Winter (2012, 2013): L2 influence on case marking systems
measure that can be used with whole texts and corpora?
What drives Lexical Diversity?
- Lexical diversity can be measured cross-linguistically by using Zipf's law
- Lexical diversity correlates with measures of language contact
- This might be seen as another instance of language adapting to its sociolinguistic niche
Simple linear model:
Who is adapting