Infoscience
EPFL, École polytechnique fédérale de Lausanne
Preprint

Argumentative essay assessment with LLMs: A critical scoping review

Favero, Lucile • Gaudeau, Gabrielle • Pérez-Ortiz, Juan Antonio • et al.
February 2, 2026

Large Language Models (LLMs) are rapidly reshaping Automated Essay Scoring (AES), yet the methodological, conceptual, and ethical foundations of Argumentative Automated Essay Scoring (AAES) remain underdeveloped. This critical review synthesizes 46 studies published between 2022 and 2025, following PRISMA 2020 guidelines and a preregistered protocol. We map the landscape of LLM-based AAES across six dimensions: datasets, traits, models, methods, evaluation, and analytics. Our findings show that AAES research remains fragmented and insufficiently grounded in argumentation theory. The field relies on non-comparable datasets that vary in availability, prompt diversity, rater configuration, and linguistic background. Trait analysis reveals substantial overrepresentation of rhetorical and linguistic features and sparse coverage of reasoning-oriented constructs (e.g., logical cogency, dialectical quality). Studies rely mainly on proprietary GPT-family models and rubric-based prompting, while only a minority employ fine-tuning, multi-agent approaches, or reasoning LLMs. Evaluation practices remain uneven: although studies report high human-model agreement, robustness analyses expose sensitivity to prompting, score distributions, and learner proficiency. FATEN analyses reveal recurrent concerns regarding fairness (e.g., style and L1 bias), transparency, randomness sensitivity, limited pedagogical alignment, and an absence of work on privacy or deployment safety. Taken together, the evidence suggests that while LLMs can approximate human scoring on several traits, current systems insufficiently model core argumentative reasoning and lack the validity, interpretability, and accountability required for high-stakes assessment. We conclude by proposing a research agenda focused on construct-valid datasets and rubrics, psychometric modeling, transparent evaluation protocols, and responsible design frameworks.
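The abstract reports "high human-model agreement" without naming a metric; in AES research, agreement is conventionally measured with quadratic weighted kappa (QWK). A minimal sketch of how such agreement is computed, using hypothetical score vectors (not data from this review):

```python
# Minimal sketch: quadratic weighted kappa (QWK), the agreement metric
# conventionally reported in AES studies. The score vectors below are
# hypothetical and stand in for human and LLM essay scores on a 1-5 scale.
from sklearn.metrics import cohen_kappa_score

human_scores = [3, 4, 2, 5, 3, 4]  # hypothetical human rater scores
model_scores = [3, 4, 3, 5, 2, 4]  # hypothetical LLM-assigned scores

qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"QWK: {qwk:.3f}")
```

QWK penalizes disagreements by the square of their distance on the score scale, so a prediction off by two points counts four times as heavily as one off by a single point; this is why high QWK can coexist with the sensitivity to score distributions that the review flags.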

Files
Name: argumentative_essay_assessment_with_ll_ms_a_critical_scoping_review.pdf
Type: Main Document
Version: Submitted version (Preprint)
Access type: Open Access
License Condition: CC BY
Size: 1.62 MB
Format: Adobe PDF
Checksum (MD5): 58d37435d83f0d629865e7baa2d2ed35
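The published MD5 checksum lets readers confirm that the downloaded PDF is intact. A minimal sketch, using the file name and checksum from this record and assuming the PDF sits in the current working directory:

```python
# Minimal sketch: verify the downloaded PDF against the MD5 checksum
# listed in this record. File name and expected digest are taken from
# the metadata above; the file path is an assumption.
import hashlib

EXPECTED_MD5 = "58d37435d83f0d629865e7baa2d2ed35"
path = "argumentative_essay_assessment_with_ll_ms_a_critical_scoping_review.pdf"

md5 = hashlib.md5()
with open(path, "rb") as f:
    # Read in chunks so large files do not need to fit in memory.
    for chunk in iter(lambda: f.read(8192), b""):
        md5.update(chunk)

print("OK" if md5.hexdigest() == EXPECTED_MD5 else "checksum mismatch")
```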

Contact: infoscience@epfl.ch

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.