You might also be interested in
Postgraduate course CALM:
Computer-Assisted Language Mediation (UGent)
- More information: Thursday 18 February, 12:50-13:20 (Faculty of Economics and Business Administration, Sint-Pietersplein 7, room 0.013)
- http://www.flw.ugent.be/najemaster
Essential links
Structure of the course
Week 1: general introduction + R + use case 1
Week 2: machine learning + use case 2
Weeks 3-6: Python tutorial
Weeks 7-8: in-class preparation of one of the two case studies
Week 9: presentation of the results
Specific aims
- Acquiring basic terminology in the field of DH in general, and distributional semantics and/or sentiment analysis in particular.
- Acquiring hands-on experience in a selected set of computational tools.
- Being able to critically reflect on strengths and weaknesses of these tools.
- Being able to conduct a small-scale case study using these tools, and present the results in a paper and presentation.
Main aim of this course
- Learning how to retrieve, analyze and visualize linguistic patterns
- for research purposes in Translation Studies (distributional semantics)
- for end-user purposes in organizations (sentiment analysis)
Course evaluation
As a consequence
Conduct one of the two use cases and present it in a(n):
- Oral presentation (deadline: week 9)
- Written assignment (deadline: to be determined)
Online collection of DH-definitions: http://whatisdigitalhumanities.com/
An online guide to DH: http://sites.library.northwestern.edu/dh/
Online companion to DH: http://digitalhumanities.org:3030/companion/view?docId=blackwell/9781405103213/9781405103213.xml
Structured collection of Digital Research Tools: http://dirtdirectory.org/
- DH is a very broad, dynamic field: it attracts many scholars from different backgrounds.
- DH is interdisciplinary in nature: it uses computational tools from other scientific fields.
- DH is empirical in nature: it uses observable data as point of departure.
Added value of introducing digital methods to humanities?
- Computers are faster, more reliable and better at some of the tasks that are inextricably linked to doing research in humanities, such as:
- Finding linguistic phenomena (characters, words, constructions) in large collections of texts (= corpora).
- Adding linguistically relevant information in corpora (POS tagging, lemmatization...).
- Counting instances and visualizing patterns in the data
- Nevertheless, humans outperform computers in asking the relevant questions, selecting the appropriate tools and interpreting the visualized output.
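To make the "finding" and "counting" tasks above concrete, here is a minimal Python sketch. The two-sentence corpus, the function names and the very simple tokenizer are assumptions made up for illustration; a real study would read a corpus from files and use a proper tokenizer.

```python
import re
from collections import Counter

# Toy corpus, invented for illustration; a real corpus would be read from files.
corpus = [
    "The candle burnt low while the fire burned on.",
    "She chided him, and he burned the letter.",
]

def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_tokens(texts):
    """Count how often each token occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def find_matches(texts, pattern):
    """Return (text index, matched string) pairs for a regular expression."""
    return [
        (i, m.group(0))
        for i, text in enumerate(texts)
        for m in re.finditer(pattern, text, flags=re.IGNORECASE)
    ]

counts = count_tokens(corpus)
print(counts.most_common(3))
# Find both past-tense variants of "burn" in one pass.
print(find_matches(corpus, r"\bburn(?:t|ed)\b"))
```

The computer does the retrieval and the counting; deciding that "burnt" versus "burned" is worth looking at, and what the counts mean, remains the researcher's job.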
A more specific definition
Key references
Berry, D. M. (2012). Understanding Digital Humanities. Palgrave Macmillan.
Burdick, A., J. Drucker, P. Lunenfeld, T. Presner & J. Schnapp (2012). Digital_Humanities. MIT Press.
Jones, S. E. (2013). The Emergence of the Digital Humanities. Routledge.
Digital Humanities is a field of research:
- which extensively uses digital methods like presentation software, database storage, programming languages, text analysis tools, dynamic visualizations...
- ... in order to gather, organize, analyze, teach and present scholarly research in the humanities (e.g. literature, linguistics, philosophy and history) ...
- ... with the ultimate goal to ask and answer new questions and view old questions differently.
A definition
A Gentle Introduction to Digital Humanities
What sits at the intersection of computational methods and the traditional pursuits of the humanities.
Overview of the Google corpus compilation project
Let's get a feel for some really cool research in DH
http://whatisdigitalhumanities.com/
Track the regularization of irregular verbs in US and GB:
- Chided vs. chid: chided:eng_gb_2012,chid:eng_gb_2012,chided:eng_us_2012,chid:eng_us_2012
- Burnt vs. burned: burnt:eng_gb_2012,burned:eng_gb_2012,burnt:eng_us_2012,burned:eng_us_2012
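The query strings above follow a regular pattern (each word form paired with each corpus tag), so they can be generated rather than typed by hand. The sketch below is a possible way to do that in Python. The helper names are hypothetical, and the base URL and the parameter names `content`, `year_start` and `year_end` are assumptions about the Ngram Viewer's web interface that may not match the current site.

```python
from urllib.parse import urlencode

# Assumed address of the Ngram Viewer graph page; not taken from the slides.
NGRAM_URL = "https://books.google.com/ngrams/graph"

def build_query(forms, corpora):
    """Pair every word form with every corpus tag as 'form:corpus'."""
    return ",".join(f"{form}:{corpus}" for corpus in corpora for form in forms)

def build_url(query, year_start=1800, year_end=2000):
    """Build a viewer URL; the parameter names are assumptions."""
    params = {"content": query, "year_start": year_start, "year_end": year_end}
    return NGRAM_URL + "?" + urlencode(params)

query = build_query(["chided", "chid"], ["eng_gb_2012", "eng_us_2012"])
print(query)
print(build_url(query))
```

Calling `build_query(["burnt", "burned"], ["eng_gb_2012", "eng_us_2012"])` produces the second query in the same way.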
Some really interesting digital tools (we won't be presenting)
- You can easily find out when (and for how long) a given artist, politician or writer became popular.
- Elvis Presley, Jimi Hendrix, Procol Harum, Marvin Gaye
- Lenin, Stalin, Brezhnev, Andropov, Gorbachev, Yeltsin
- You can even do that for 'years', as a quantification of interest in the present and societal forgetting (e.g., 1900, 1950, 1980)
- Culturomics: the quantitative study of human culture as reflected in (diachronic) language use.
- Building on extreme corpora consisting of digitized books (< Google Books project),
- Scanned, OCR'ed, provided with meta data
- 5 M books (~4% of all books ever published), amounting to 500 Bn. tokens, in different languages (not Dutch, though :-( ), from 1500 to 2000.
Michel et al. (2011), "Quantitative Analysis of Culture Using Millions of Digitized Books", Science 331. [DOI: 10.1126/science.1199644]
- You can easily find out which artist, politician or writer was censored during a given period in a given country.
- Compare Picasso and Chagall in the English and German parts of the Google corpus (Picasso:eng_2012, Picasso:ger_2012)
Some really interesting digital tools we will be presenting
- Presentation software: Prezi, LaTeX
- Web crawling: Site Crawler
- Text analysis: Serendip
- Mapping software: Neatline
- Data visualization software: Cytoscape
- And many many more:
- http://dirtdirectory.org/
- http://sites.library.northwestern.edu/dh/tools-resources/
- Historical epidemiology: e.g., influenza (pandemic).
- History of warfare: e.g., Taliban.
- History of nations: e.g., Belgium.
- History of food: e.g., hamburger, pizza, steak, pasta, sushi.
- History of science: e.g., Darwin, Galileo, Freud, Einstein.
- Advanced text editing: Sublime Text, Notepad++
- Scripting languages: Python
- Machine learning: Weka
- Statistical analysis and data visualization: R
Foci in culturomics
- Cultural change: which concepts attract attention at which point in time (e.g., slavery)?
- Application: history of {diseases, science, food, war}, detection of censorship, prominence in collective memory.
- Linguistic change: what is the evolution of the words used for those concepts (e.g. the Great War vs. World War I)?
- Application: evolution of grammar, lexicography
- http://www.culturomics.org
- http://ngrams.googlelabs.com
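Tracking linguistic change of this kind comes down to comparing relative frequencies per year: the count of a term divided by the total number of tokens for that year. The Python sketch below illustrates the idea. All numbers are invented for illustration and are not taken from the Google corpus, and the function names are hypothetical.

```python
# Invented counts for illustration only; NOT taken from the Google corpus.
yearly_totals = {1920: 1000, 1940: 2000, 1960: 4000}
term_counts = {
    "the Great War": {1920: 50, 1940: 40, 1960: 20},
    "World War I": {1920: 0, 1940: 60, 1960: 200},
}

def relative_frequencies(counts, totals):
    """Divide each yearly count by the total number of tokens in that year."""
    return {year: counts.get(year, 0) / totals[year] for year in sorted(totals)}

def first_year_overtaken(old, new):
    """Return the first year in which 'new' is more frequent than 'old'."""
    for year in sorted(old):
        if new[year] > old[year]:
            return year
    return None

rel = {term: relative_frequencies(c, yearly_totals) for term, c in term_counts.items()}
for term, series in rel.items():
    print(term, series)
print(first_year_overtaken(rel["the Great War"], rel["World War I"]))
```

Normalizing by the yearly total matters because the number of digitized books differs per year; raw counts would mostly reflect how much text is available for each year.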