28835 Methods Tutorial

AI-powered content analysis: Using ChatGPT to measure media and communication content

Marko Bachl

Comments

Large language models (LLM; starting with Google’s BERT) and particularly their implementations as generative or conversational AI tools (e.g., OpenAI’s ChatGPT) are increasingly used to measure or classify media and communication content. The idea is simple yet intriguing: Instead of training and employing humans for annotation tasks, researchers describe the concept of interest to a model such as ChatGPT, present the coding unit, and ask for a classification. The first tests of the utility of ChatGPT and similar tools for content analysis were positive to enthusiastic [1, 2]. However, others pointed out the need for more thorough validation and reliability tests [3, 4]. Easy-to-use tools and user-friendly tutorials have proliferated the methods to the average social scientist [5, 6]. Yet (closed-source, commercial) large language models are not entirely understood even by their developers, and their uncritical use has been criticized on ethical grounds [7, 8].

In this seminar, we will engage practically with this cutting-edge methodological research. We start with a quick refresher on the basics of quantitative content analysis (both human and computational), focusing on quality criteria and evaluation (validity, reliability, reproducibility, robustness, replicability). We will then attempt an overview of the rapidly developing literature on LLMs’ utility for content analysis. The central part of the seminar will be dedicated to small evaluation studies by student teams. Questions can range from understanding a tool’s parameters (e.g., What’s the effect of a model’s “temperature” on reliability and validity?) to practical optimization (e.g., Which prompts work best for a given task?) to critical questions (e.g., Does the classification show gender, racial, …, biases?).


Requirements:

  • Some prior exposure to (standardized, quantitative) content analysis will be helpful. However, qualitative methods also have their place in evaluating content analysis methods. If you have little experience with the former but can contribute with the latter, make sure to team up with students whose skill set complements yours.
  • Prior knowledge in R or Python, applied data analysis, and interacting with application programming interfaces (API) will be helpful but are not required. Again, make sure that the teams overall have a balanced skill set.
  • You will use your computer to conduct your evaluation study. Credit for commercial APIs (e.g., OpenAI) will be provided within sensible limits.
  • This is not a programming class. Neither are programming skills required nor will you acquire such skills in a systematic way. I primarily work with R and sometimes copy, paste, and adapt some Python code. So, my examples will be mainly in R. However, you are free to use whichever software you like.

References:

[1] Gilardi, F., Alizadeh, M., & Kubli, M. (2023). ChatGPT outperforms crowd workers for text-annotation tasks. Proceedings of the National Academy of Sciences, 120(30), e2305016120. https://doi.org/10.1073/pnas.2305016120

[2] Rathje, S., Mirea, D.-M., Sucholutsky, I., Marjieh, R., Robertson, C., & Bavel, J. J. V. (2023). GPT is an effective tool for multilingual psychological text analysis. PsyArXiv. https://doi.org/10.31234/osf.io/sekf5

[3] Reiss, M. V. (2023). Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (arXiv:2304.11085). arXiv. https://doi.org/10.48550/arXiv.2304.11085

[4] Pangakis, N., Wolken, S., & Fasching, N. (2023). Automated annotation with generative AI requires validation (arXiv:2306.00176). arXiv. https://doi.org/10.48550/arXiv.2306.00176

[5] Kjell, O., Giorgi, S., & Schwartz, H. A. (2023). The text-package: An R-package for analyzing and visualizing human language using natural language processing and transformers. Psychological Methods. https://doi.org/10/gsmcq8; https://psyarxiv.com/293kt/

[6] Törnberg, P. (2023). How to use LLMs for text analysis (arXiv:2307.13106). arXiv. https://doi.org/10.48550/arXiv.2307.13106

[7] Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? ??. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10/gh677h

[8] Spirling, A. (2023). Why open-source generative AI models are an ethical way forward for science. Nature, 616(7957), 413–413. https://doi.org/10/gsqx6v

close

16 Class schedule

Regular appointments

Wed, 2023-10-18 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2023-10-25 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2023-11-01 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2023-11-08 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2023-11-15 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2023-11-22 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2023-11-29 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2023-12-06 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2023-12-13 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2023-12-20 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2024-01-10 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2024-01-17 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2024-01-24 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2024-01-31 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2024-02-07 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Wed, 2024-02-14 16:00 - 18:00

Lecturers:
Prof. Dr. Marko Bachl

Location:
Garystr.55/105 Seminarraum (Garystr. 55)

Subjects A - Z