Infoscience
 
Bias at a Second Glance: A Deep Dive into Bias for German Educational Peer-Review Data Modeling

Wambsganss, Thiemo • Swamy, Vinitra • Rietsche, Roman • Käser, Tanja
2022
29th International Conference on Computational Linguistics (COLING 2022)

Natural Language Processing (NLP) is increasingly used to provide adaptivity in educational applications. However, recent research has highlighted a variety of biases in pre-trained language models. While existing studies investigate bias in different domains, few offer a fine-grained analysis of educational or multilingual corpora. In this work, we analyze bias across text and through multiple architectures on a corpus of 9,165 German peer-reviews collected from university students over five years. Notably, our corpus includes labels such as helpfulness, quality, and critical aspect ratings from the peer-review recipient as well as demographic attributes. We conduct a Word Embedding Association Test (WEAT) analysis on (1) our collected corpus in connection with the clustered labels, (2) the most common pre-trained German language models (T5, BERT, and GPT-2) and GloVe embeddings, and (3) the language models after fine-tuning on our collected dataset. Contrary to our initial expectations, the collected corpus reveals few biases in the co-occurrence analysis or in the GloVe embeddings. However, the pre-trained German language models exhibit substantial conceptual, racial, and gender bias, and the bias shifts significantly across conceptual and racial axes during fine-tuning on the peer-review data. With our research, we aim to contribute to the United Nations' fourth Sustainable Development Goal (quality education) with a novel dataset, an understanding of biases in natural language education data, and insight into the potential harms of not counteracting bias in language models for educational tasks.
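
For context, the WEAT effect size used in analyses like this one (Caliskan et al., 2017) measures the differential association between two sets of target word embeddings X and Y and two sets of attribute embeddings A and B. The following is a minimal illustrative sketch in Python of that standard statistic, not the authors' own code; the function names and input conventions are assumptions for the example.

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two embedding vectors
        return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

    def association(w, A, B):
        # s(w, A, B): mean similarity of w to attribute set A
        # minus mean similarity of w to attribute set B
        return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

    def weat_effect_size(X, Y, A, B):
        # d = (mean_x s(x,A,B) - mean_y s(y,A,B)) / std_{w in X∪Y} s(w,A,B)
        # X, Y, A, B are lists of NumPy embedding vectors
        s_X = [association(x, A, B) for x in X]
        s_Y = [association(y, A, B) for y in Y]
        return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

A positive effect size indicates that the targets in X are more strongly associated with the attributes in A than the targets in Y are; applied to, say, GloVe versus fine-tuned BERT embeddings, comparing these scores reveals the kind of bias shifts the abstract describes.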

Type
conference paper not in proceedings
DOI
10.48550/arxiv.2209.10335
ArXiv ID
arXiv:2209.10335
Author(s)
Wambsganss, Thiemo  
Swamy, Vinitra  
Rietsche, Roman
Käser, Tanja  
Date Issued
2022
Number of pages
13
Subjects
Language Models • Bias • German
Note
Accepted as a full paper at COLING 2022: The 29th International Conference on Computational Linguistics, October 12-17, 2022, Gyeongju, Republic of Korea.
Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
ML4ED • AVP-E-LEARN
Event name
29th International Conference on Computational Linguistics (COLING 2022)
Event place
Gyeongju, Republic of Korea
Event date
October 12-17, 2022

Available on Infoscience
September 22, 2022
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/190870