Human communication is increasingly recorded as digital text, which constitutes big data that can be used to study numerous scientific and real-world problems. The goals of this course are to (i) provide an introduction to quantitative methods designed to analyze text, (ii) give an overview of common applications of these methods in economics and the social sciences, and (iii) illustrate the potential of text-as-data methods to ask new research questions and find new answers to existing problems.

The course provides an overview of the most common text-as-data methods as well as their typical areas of application:

- Prerequisites (text import, creation of corpora, pre-processing, creation of document-term matrices, lemmatization)
- Text statistics (e.g., frequency analysis, measures of readability, similarity indices)
- Generic and customized dictionaries
- Sentiment analysis
- Text classification using reference texts and supervised learning
- Topic modeling
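
To give a flavor of the first items on this list, the following is a minimal sketch in Python (standard library only) of building a document-term matrix, computing a cosine similarity index, and scoring sentiment with a small dictionary. The toy corpus, the word lists, and all function names are hypothetical and chosen purely for illustration; they are not taken from the course materials, and the course may use different tools and pre-processing steps.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus, for illustration only.
documents = [
    "The economy is growing and markets are strong.",
    "Markets are weak and the economy is shrinking.",
    "The new film received good reviews.",
]

# Hypothetical sentiment dictionary, not a published word list.
POSITIVE = {"growing", "strong", "good"}
NEGATIVE = {"weak", "shrinking"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return the sorted vocabulary and one row of term counts per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix


def cosine_similarity(u, v):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dictionary_sentiment(text):
    """Number of positive tokens minus number of negative tokens."""
    tokens = tokenize(text)
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return positive - negative


vocabulary, matrix = document_term_matrix(documents)
print(len(vocabulary), "terms in the vocabulary")
print("Similarity of documents 1 and 2:", cosine_similarity(matrix[0], matrix[1]))
print("Sentiment scores:", [dictionary_sentiment(doc) for doc in documents])
```

In this sketch, the first two documents share most of their vocabulary and therefore score high on similarity, even though the dictionary assigns them opposite sentiment. This illustrates why similarity indices and sentiment measures answer different questions.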

All sessions are broadcast on Zoom (https://ju-se.zoom.us/j/66228720267), but on-site attendance is preferable for didactic reasons. All teaching is in English.

The meetings are planned as follows:

  • Session 1: Tuesday, April 29, 10:00 – 16:00, B5002
  • Session 2: Tuesday, May 13, 10:00 – 16:00, B5002
  • Session 3: Wednesday, May 14, 10:00 – 16:00, B5002
  • Session 4: Tuesday, May 20, 10:00 – 16:00, B5002
  • Session 5: Wednesday, May 21, 10:00 – 16:00, A5002
  • Session 6: Thursday, June 12, 10:00 – 16:00, B5015

See the schedule below for more detailed information on each session.

The course has a maximum of 15 places, which are allocated on a first-come, first-served basis. The last day to apply is January 30, 2025.

Application form (Word, 58.8 kB)

Course schedule (PDF, 184.3 kB)

Course syllabus (PDF, 128.4 kB)

If you have any questions, please contact the course coordinator:

Associate Professor, Docent Marcel Garz, JIBS
marcel.garz@ju.se