ChemLit-QA: a human evaluated dataset for chemistry RAG tasks
Retrieval-Augmented Generation (RAG) is a widely used strategy in Large-Language Models (LLMs) to extrapolate beyond the inherent pre-trained knowledge. Hence, RAG is crucial when working in data-sparse fields such as Chemistry. The evaluation of RAG systems is commonly conducted using specialized datasets. However, existing datasets, typically in the form of scientific Question-Answer-Context (QAC) triplets or QA pairs, are often limited in size due to the labor-intensive nature of manual curation or require further quality assessment when generated through automated processes. This highlights a critical need for large, high-quality datasets tailored to scientific applications. We introduce ChemLit-QA, a comprehensive, expert-validated, open-source dataset comprising over 1,000 entries specifically designed for chemistry. Our approach involves the initial generation and filtering of a QAC dataset using an automated framework based on GPT-4 Turbo, followed by rigorous evaluation by chemistry experts. Additionally, we provide two supplementary datasets: ChemLit-QA-neg focused on negative data, and ChemLit-QA-multi focused on multihop reasoning tasks for LLMs, which complement the main dataset on hallucination detection and more reasoning-intensive tasks.
WOS:001463418400001
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
2025-06-30
6
2
020601
REVIEWED
EPFL