Infoscience

Logo EPFL, École polytechnique fédérale de Lausanne
Journal article

PharmaSimText: A Text-Based Educational Playground filled with RL-LLM Agents That Work Together Even in Disagreement

Radmehr, Bahar • Singla, Adish • Käser, Tanja
January 17, 2025
Journal of Educational Data Mining

There has been a growing interest in developing simulated learners to enhance learning and teaching experiences in educational environments. However, existing works have primarily focused on structured environments relying on meticulously crafted representations of tasks, thereby limiting the learner's ability to generalize skills across tasks. In this paper, we aim to enhance simulated learners' generalization capabilities in less-structured text-based learning environments by integrating Reinforcement Learning (RL) with Large Language Models (LLMs). We investigate three types of agents: (i) RL-based agents that utilize natural language for state and action representations, (ii) LLM-based agents that leverage the model's general knowledge and reasoning through prompting, and (iii) hybrid RL-LLM agents that combine these two strategies to improve agents' performance and generalizability. To support the development of these agents, we introduce PharmaSimText, a novel benchmark developed with expert-evaluated GPT-4 generations derived from a virtual pharmacy environment designed for practicing diagnostic conversations. After experimenting with RL-based and LLM-based agents using GPT-4 and open-source LLMs along with a wide range of strategies for combining them, we find that RL-based agents are good at completing tasks, but not at asking quality diagnostic questions. Conversely, LLM-based agents are better at asking diagnostic questions, but not at completing tasks. Finally, specific variations of hybrid RL-LLM agents enable us to overcome these limitations. Our findings highlight the potential of combining methods based on RL and LLMs in creating generalizable agents that have solutions close to human ones with the LLM component, while remaining faithful to controlled environments with the RL component. The source code and benchmark are available on GitHub.
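The three agent types in the abstract can be caricatured in a few lines of Python. Everything below is an illustrative assumption, not the paper's implementation: the class names (`RLPolicy`, `LLMPolicy`, `HybridAgent`), the value-gap arbitration rule, and the toy action values are all hypothetical, sketched only to show how a hybrid agent might resolve disagreement between an RL policy and an LLM.

```python
# Hypothetical sketch of a hybrid RL-LLM agent in the spirit of the abstract.
# All names and the arbitration rule are illustrative assumptions, not the
# paper's actual method.

class RLPolicy:
    """Toy stand-in for an RL policy over natural-language actions:
    picks the action with the highest estimated value."""
    def __init__(self, q_values):
        self.q_values = q_values  # {action text: estimated value}

    def act(self, state):
        return max(self.q_values, key=self.q_values.get)


class LLMPolicy:
    """Toy stand-in for a prompted LLM: here, a fixed preference for
    asking a diagnostic question."""
    def __init__(self, preferred_action):
        self.preferred_action = preferred_action

    def act(self, state):
        return self.preferred_action


class HybridAgent:
    """Queries both policies; on disagreement, trusts the LLM's
    question-asking unless the RL policy's value gap is large."""
    def __init__(self, rl, llm, value_gap=0.5):
        self.rl, self.llm, self.value_gap = rl, llm, value_gap

    def act(self, state):
        rl_action = self.rl.act(state)
        llm_action = self.llm.act(state)
        if rl_action == llm_action:
            return rl_action
        # Disagreement: defer to the LLM unless RL is very confident.
        gap = self.rl.q_values[rl_action] - self.rl.q_values.get(llm_action, 0.0)
        return rl_action if gap > self.value_gap else llm_action


# Usage: the RL policy slightly prefers submitting a diagnosis, but the
# value gap (0.2) is below the threshold, so the LLM's question wins.
rl = RLPolicy({"submit diagnosis": 1.0, "ask about symptoms": 0.8})
agent = HybridAgent(rl, LLMPolicy("ask about symptoms"))
print(agent.act("patient describes a headache"))  # → ask about symptoms
```

This kind of arbitration mirrors the abstract's finding: the RL component keeps the agent on task, while the LLM component supplies human-like diagnostic questions when the RL policy is not strongly committed.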

Files

Name: 836Radmehr1To40.pdf
Type: Main Document
Version: Published version
Access type: Open access
License condition: CC BY-NC-ND
Size: 10.87 MB
Format: Adobe PDF
Checksum (MD5): 98091d47d341cff038bb4f45ba6e2dde

Contact: infoscience@epfl.ch


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.