Abstract

Language has shaped human evolution and inspired the desire to endow machines with language abilities. Recent advances in natural language processing bring this breakthrough in human-machine interaction within reach. However, introducing conversational agents with enhanced language skills raises concerns about their emotional and social engagement, and mechanisms for controlling and evaluating such agents must be established to ensure their acceptance. At the same time, creating meaningful evaluation metrics for social chatbots is challenging because the field is new and ill-defined, lacking clear design guidelines. In this thesis, we contribute novel, effective evaluation frameworks for social chatbots, developed according to human-centered research principles. The thesis is structured into three parts.

The first part presents two studies that explore users' expectations of social chatbots and how these expectations relate to users' current experiences. The first study combines qualitative semi-structured interviews with quantitative survey analysis to establish a model of the essential social qualities expected from chatbots: politeness, entertainment, attentive curiosity, and empathy (PEACE). The second study examines online chatbot reviews and reveals a discrepancy between users' expectations and their current experiences, highlighting the need for chatbots with more advanced social capabilities.

The second part of the thesis focuses on attentive curiosity, an essential element that has received limited attention in the study of social chatbots. We propose EQT, a taxonomy of tags that distinguishes the functions of empathetic questions in social interactions. We further develop automatic classifiers for these tags, which allow us to investigate which question-asking strategies are most effective in particular emotional contexts. This analysis sheds light on the suitability of different strategies for fostering engagement and understanding in social conversations.

In the third part, we build on these findings to create comprehensive evaluation frameworks for social chatbots. First, we introduce iEval, a human evaluation framework designed to capture users' subjective perceptions of their conversational partners during interactive exchanges. Using this framework, we benchmark four state-of-the-art empathetic chatbots and examine the discourse factors that account for the differences in their performance. We then show how our evaluation framework can be automated by prompting recent large language models, which enables us to approximate live user studies and achieve a very strong correlation with human judgment.

The novel findings presented here deepen our understanding of how users interact with conversational technologies. Moreover, the evaluation criteria and frameworks we develop provide valuable insights and tools for shaping and informing the design of future social chatbots.
