Language of Idioms

Ashleigh Faith

on 11 December 2013

Transcript of Language of Idioms

Language of Idioms:
Get to the Program

Lay it Down Straight
It's about time
Absence makes the heart grow fonder
Acting high and mighty
Actions Speak Louder than Words
Act one's age
Adding insult to injury
All set
Thumbs up
Apple of my eye
At each other's throats
At this stage
From all walks of life

No love lost
between machines and idioms
Digital scholarship has a
hard time
with idioms and
their ilk

Text analysis and translation between languages are just a few issues
revolving around

Computers see
no rhyme nor reason
to idioms, and sentiment analysis is almost impossible with current techniques.
How can digital humanities help with that?

There are two reasons digital sentiment extraction from idioms is difficult and they are intertwined:

tackle the road less traveled
: linguistics
There are two major linguistic schools of thought for drawing sentiment analysis from idioms and two
new arrivals on the scene.
Let's Get on the Same Page
Strike While the Iron is Hot
Idioms: "a group of words established by usage as having a meaning not deducible from those of the individual words."

Idioms are verbal codes and figures of speech, and they are symbolic. If you are not familiar with the idiom's usage you will not
get the gist of
what is being expressed.

Often, an idiom is so
saturated in
acceptance from users in
your neck of the woods
-particularly in informal discourse- that it is difficult not to use the idiom.

In computational linguistics this is called metonymy.

This is now a
hot issue
in text and sentiment analysis online and in social media.
Don't put all your eggs in one
machine learning and analysis
Metonymic Sentiment Analysis
But most cultures have idioms.
Here are a few of the hardest to translate...
There are many idioms that Americans use:...
"Further out toward the horizon lies the prospect of intelligent systems that filter vast quantities of unstructured content, drawing inferences that can be formatted according to journalistic [linguistic] norms.
Along the way, of course, intelligent systems will need to start coping with the complexities of human language [that] have so far confounded them, including idiom, metaphor and sarcasm." -Peter Kirwan, 2009
1. There are a lot of idioms, new and old, that are still used and referenced. Lexicons of idioms and their meanings are not
up to snuff
for reference.
2. Machine learning cannot cope with idioms -
in other words,
unedited idiomatic text expressions- because machine learning depends on example text reference to identify expressions. This is hard to do without a proper lexicon.
Bringing this full-circle
Accurately identifying idiom sentiment has obvious repercussions for text analysis and social media; however, there are vast digital scholarship and humanities repercussions as well.
Compositional- when the meaning of a complex expression is determined by the meanings of its words and the rules used to combine them. Ex:
Putting the cart before the horse.
Non-compositional- when the meanings of any of the component parts of the idiom do not convey the sentiment. Ex:
Let's rock and roll!
top contenders
Text introducers- when an idiom is introduced by a word or combination of words in order to signal that it is coming. Ex: I think the idea of having numbers from each firm is

a dog in the manger
is an introducer.
Text dependencies- the words that the idiom quantifies. Ex: We
got to the bottom of
the situation. "We" and "the situation" are dependencies.

Recent scholarship shows that all four methods -compositional, non-compositional, text introducers, and text dependencies- can be used in a hybrid approach to analyze sentiment from idioms.
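To make the hybrid idea concrete, here is a minimal sketch that covers only two of the four methods: idioms found in a lexicon are scored as whole units (non-compositional), and the remaining words are scored one by one (compositional). The names `IDIOM_SENTIMENT`, `WORD_SENTIMENT`, `tokenize`, and `score`, along with every sentiment value, are assumptions made up for illustration and do not come from the scholarship referenced in this presentation.

```python
import re

# Hypothetical toy data; a real system would load a curated idiom lexicon.
IDIOM_SENTIMENT = {
    "adding insult to injury": -1.0,
    "apple of my eye": 1.0,
    "light at the end of the tunnel": 1.0,
}
WORD_SENTIMENT = {"insult": -1.0, "injury": -1.0, "light": 0.5,
                  "love": 1.0, "lost": -0.5}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(text):
    """Return (idiom_score, word_score) for a piece of text.

    Idioms found in the lexicon are scored as whole units; words that are
    not part of a matched idiom are scored individually.
    """
    tokens = tokenize(text)
    used = [False] * len(tokens)
    idiom_score = 0.0
    for idiom, value in IDIOM_SENTIMENT.items():
        words = idiom.split()
        n = len(words)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == words and not any(used[i:i + n]):
                idiom_score += value
                for j in range(i, i + n):
                    used[j] = True  # keep these words out of the word-level pass
    word_score = sum(
        WORD_SENTIMENT.get(tok, 0.0)
        for tok, taken in zip(tokens, used)
        if not taken
    )
    return idiom_score, word_score
```

With this toy data, `score("She is the apple of my eye")` returns `(1.0, 0.0)`, while `score("There is no love lost between them")` returns `(0.0, 0.5)`: because "no love lost" is missing from the toy lexicon, the word-level pass reads a hostile phrase as mildly positive.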

However, all machine learning is accomplished through example. Whether the approach is probabilistic, vector-space, or rule-based, there needs to be a corpus of examples for machines to learn from, which in turn enables analysis of large collections of informal text.

The Devil is in the details
so if a poor lexical corpus of idioms is used it is
garbage in, garbage out.
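The dependence on examples can be sketched with a very small probability-based learner. This is an illustration under stated assumptions, not a method taken from the referenced papers: the functions `train` and `p_positive`, the label names, and the hand-labelled examples are all hypothetical.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count sentiment labels per idiom from (idiom, label) example pairs."""
    counts = defaultdict(Counter)
    for idiom, label in examples:
        counts[idiom.lower()][label] += 1
    return counts


def p_positive(counts, idiom, smoothing=1.0):
    """Estimate P(positive | idiom) with additive (Laplace) smoothing.

    An idiom with no examples falls back to 0.5: the model knows nothing.
    """
    c = counts.get(idiom.lower(), Counter())
    pos = c["positive"] + smoothing
    neg = c["negative"] + smoothing
    return pos / (pos + neg)


# Hypothetical hand-labelled examples; not drawn from any real corpus.
examples = [
    ("thumbs up", "positive"),
    ("thumbs up", "positive"),
    ("thumbs up", "positive"),
    ("at each other's throats", "negative"),
]
counts = train(examples)
```

Here `p_positive(counts, "thumbs up")` is 0.8, while an idiom that never appears in the examples, such as "in the lurch", gets 0.5. If the labels in the corpus are wrong or sparse, the estimates are wrong or uninformative in the same way.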
This analysis does not usually result in high accuracy because it ignores the idiom sentiment and sticks with the strict literal meaning of the text.
This is the hardest type of idiom to draw sentiment from because the words used have a rich historical and cultural heritage that lends its meaning to the idiom used but have no intrinsic meaning.
Recent studies found this may be helpful with translating idioms from other languages, but it is not conclusive whether or not introducers can help with sentiment analysis of idioms.
This option has potential but without a proper lexicon it will not prove useful.
There is
light at the end of the tunnel.
There are some nifty idiom dictionaries, repositories, and lexicons compiled for use. The added benefit of these is that most of them also include the definition of the idiom and the morphology of the idiom sentiment -all of which, if compiled, can be used for machine learning and analysis performed using the hybridization of the four linguistic methods mentioned.
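One way such a compilation could look is sketched below, assuming each source dictionary supplies some subset of definition, sentiment, and origin for an idiom. The `IdiomEntry` fields, the `normalize` and `merge` helpers, and the source names are hypothetical and do not describe any existing lexicon.

```python
from dataclasses import dataclass, field


@dataclass
class IdiomEntry:
    idiom: str
    definition: str = ""
    sentiment: str = ""   # e.g. "positive", "negative", "neutral"
    origin: str = ""      # note on cultural or historical origin
    sources: set = field(default_factory=set)


def normalize(idiom):
    """Build a lookup key: lowercase with collapsed whitespace."""
    return " ".join(idiom.lower().split())


def merge(lexicon, entry, source):
    """Add an entry to the lexicon, filling only fields that are still empty."""
    key = normalize(entry.idiom)
    existing = lexicon.setdefault(key, IdiomEntry(idiom=key))
    for name in ("definition", "sentiment", "origin"):
        if not getattr(existing, name):
            setattr(existing, name, getattr(entry, name))
    existing.sources.add(source)  # remember which source contributed
    return existing
```

A short usage example: merging `IdiomEntry("In the Lurch", definition="in a difficult position")` from one source and `IdiomEntry("in the  lurch", sentiment="negative")` from another yields a single entry under the key "in the lurch" that carries both the definition and the sentiment. Keeping the first non-empty value is only one possible policy; a real project would need a rule for conflicting definitions.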

Can digital humanities


this initiative?

A list of all corpora found for this presentation can be found in the reference section.
Idioms are cultural history footprints -preserving certain terminology, history and wisdom.
Example: "Lurch" is no longer used other than in an idiomatic expression.
When you leave someone "
in the lurch
," you leave them in a jam, in a difficult position. But while getting left in the lurch may leave you staggering around and feeling off-balance, the "lurch" in this expression has a different origin than the staggery one. The balance-related lurch comes from nautical vocabulary, while the lurch you get left in comes from an old French backgammon-style game called lourche. Lurch became a general term for the situation of beating your opponent by a huge score. By extension it came to stand for the state of getting the better of someone or cheating them."
"Laughter is the best medicine"
is a wisdom adage that can trace its probable lineage to the Bible verse “A merry heart does good, like medicine, but a broken spirit dries the bones.” – Proverbs 17:22 (New King James Version), although The Oxford Dictionary of Quotations and Brewer's Dictionary of Phrase and Fable agree that the saying doesn't pre-date the 20th century.
Example: "
Carpe Noctem"
is a modern idiom meaning to seize the night to get work done and not procrastinate. There are many modern idioms that have their inception in social media and the pop culture of today.
Example: "The Occupy students were
owing their feisty protests to a
young Turk
The term
is "of unknown origin, first found in British newspaper police-court reports in the summer of 1898, almost certainly from the variant form of the Irish surname Houlihan, which figured as a characteristic comic Irish name in music hall songs and newspapers of the 1880s and '90s".
Young Turk
means "a young and rebellious or unmanageable member of an organization. The original Young Turks were a reformist political movement in the Ottoman Empire. The name reflected both the desire to rejuvenate a declining empire and the youthfulness of its founders (young army officers) and some of its supporters (students). In modern English use, the implication of unreliable extremism owes less to the actual Young Turks than to the old-fashioned English view of Turks as rather hot-headed and violent people."

Idiomology is a field that is sorely lacking scholars and research. Sentiment analysis is grabbing
a piece of the
Big Data
but the digital humanities have not
weighed in
on idioms to a large extent. Combining the two fields may further cultural history understanding, enable mapping of social and cultural trends, preserve snapshots of history and record heritage in wisdom adages and lore. The first step in accomplishing this is creating a lexicon to reference.
A further project that is
ripe for the picking
is establishing a database of all idioms for scholars to utilize, whether they are mining sentiment or working in the digital humanities.
Anyone game?
Mark, learn and inwardly digest
on the scholarship, and lexicon, of idioms
Scholarship referenced in this presentation
Straksiene, Margarita. “Analysis of Idiom Translation Strategies from English Into Lithuanian.” Studies About Languages 14 (2009): 13-19.
Shastri, Lokendra, Anju G. Parvathy, Abhishek Kumar, John Wesley, and Rajesh Balakrishnan. “Sentiment Extraction: Integrating Statistical Parsing, Semantic Analysis, and Common Sense Reasoning.” Proceedings of the Twenty-Second Innovative Applications of Artificial Intelligence Conference (2010): 1853-58.
Baron, Faye Rochelle. “Identifying Non-Compositional Idioms in Text Using Wordnet Synsets.” Master's thesis, University of Toronto, 2007.
Mateu, Jaume, and M. Teresa Espinal. “Argument Structure and Compositionality in Idiomatic Constructions.” Universitat Autònoma de Barcelona (2004): 1-22.
Holsinger, Edward, and Elsi Kaiser. “Effects of Context On Processing (Non)-Compositional Expressions.” Department of Linguistics, University of Southern California, Los Angeles (2009): 1-4.
Carter, Simon, Wouter Weerkamp, and Manos Tsagkias. “Microblog Language Identification: Overcoming the Limitations of Short, Unedited and Idiomatic Text.” Language Resources and Evaluation 47, no. 1 (2012): 195-215.
Rentoumi, Vassiliki, George Giannakopoulos, Vangelis Karkaletsis, and George A. Vouros. “Sentiment Analysis of Figurative Language Using a Word Sense Disambiguation Approach.” International Conference RANLP (2009): 370-75.
Titone, Debra A., and Cynthia M. Connine. “On the Compositional and Noncompositional Nature of Idiomatic Expressions.” Journal of Pragmatics 31 (1999): 1655-74.
Čermák, František. “Text Introducers of Proverbs and Other Idioms.” (2004): 1-10.
Lexicons, databases, and overall corpora accessed for this presentation
Some neat books to look at:
In the Land of Invented Languages: Esperanto Rock Stars, Klingon Poets, Loglan Lovers, and the Mad Dreamers Who Tried to Build A Perfect Language
by Arika Okrent
Found in Translation: How Language Shapes Our Lives and Transforms the World
by Nataly Kelly
The Power of Babel: A Natural History of Language
by John McWhorter
The Oxford Dictionary of Quotations
by Elizabeth Knowles
Brewer's Dictionary of Phrase and Fable
by Susie Dent
One final note before
I bid you adieu
I had a lot of fun with this presentation, and at one point I had to put a stop to it because I am too fascinated by language and the possibilities idioms hold.

A fun game that can be played is going through and finding the highlighted idioms I used throughout the presentation.

Can you identify them, what they mean, and what their morphological cultural history is? If not, they can all be found in the corpora I have in my references.
Happy hunting!
Translating idioms is a difficult task and is one
hurdle to jump
for idiom sentiment extraction
The overarching question is what else can the digital humanities do with idioms and sentiment analysis?
Further research is needed.