The dialog generation research community has been interested in incorporating emotional information into the design of open-domain dialog systems ever since neural networks (sequence-to-sequence models in particular) were adopted for modeling dialogs. The major objective is to generate emotionally richer responses or to make the conversational agent sound more empathetic, which entails recognizing and understanding the user's affective states, and then replying with the appropriate emotion. However, a number of difficulties arise when creating such an empathetic chatbot. Some of the existing models explicitly require an emotion label as input in order to produce responses of that particular emotion, which is impractical in real-world scenarios. Others assume manually defined rules such as following or reversing the user's emotion, but the psychological literature has not confirmed such rules to be universally appropriate. Moreover, they ignore the subtle emotion exchanges embedded in human-human conversations, where listeners often exhibit certain empathetic intents that are less emotional. To train a chatbot to convey such subtle emotions and intents, we need a large-scale dialog dataset that is properly labeled. Finally, it is also desirable to explicitly represent such emotional interactions found in people's daily conversations (part of so-called social intelligence) using knowledge graphs, in order to facilitate the development of chatbots. In this thesis, we propose novel solutions to these problems. First, we introduce MEED, a multi-turn emotionally engaging dialog model that learns emotion interactions directly from data, without the need to specify emotion labels or develop heuristic rules. Then, we present MEED2, the second generation of the MEED model, which is more controllable and interpretable, and is capable of generating responses that have finer-grained emotions and empathetic intents.
We also curated EDOS, a large-scale dialog dataset labeled with 32 emotions and 8 empathetic response intents, plus the neutral category. We adopted a semi-supervised learning framework to grow a seed dataset manually labeled by crowdsourcing workers, while iteratively training an emotion/intent classifier, which was used to label the whole dataset. Finally, we present AFEC, a knowledge graph capturing social intelligence found in casual conversations, which reveals how people communicate with each other in day-to-day social environments. To show the utility of AFEC, we built a retrieval-based dialog model based solely on it, and the experimental results show that the dialog model can produce much more diverse responses while remaining emotionally appropriate. We conclude the thesis by discussing our findings, lessons learned, and some future directions worth exploring.