Keyword Extraction and Clustering for Document Recommendation in Conversations

Habibi, Maryam; Popescu-Belis, Andrei

doi:10.1109/Taslp.2015.2405482

2015

Abstract

This paper addresses the problem of keyword extraction from conversations, with the goal of using these keywords to retrieve, for each short conversation fragment, a small number of potentially relevant documents, which can be recommended to participants. However, even a short fragment contains a variety of words, which are potentially related to several topics; moreover, using an automatic speech recognition (ASR) system introduces errors among them. Therefore, it is difficult to infer precisely the information needs of the conversation participants. We first propose an algorithm to extract keywords from the output of an ASR system (or a manual transcript for testing), which makes use of topic modeling techniques and of a submodular reward function which favors diversity in the keyword set, to match the potential diversity of topics and remove ASR noise. Then, we propose a method to derive multiple topically-separated queries from this keyword set, in order to maximize the chances of making at least one relevant recommendation when using these queries to search over the English Wikipedia. The proposed methods are evaluated in terms of relevance with respect to conversation fragments from the Fisher, AMI, and ELEA conversational corpora, rated by several human judges. The results show that our proposal improves over previous methods that consider only word frequency or topic similarity, and represents a promising solution for a document recommender system to be used in conversations.
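The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact reward function or the clustering method. The sketch assumes a reward of the form sum over topics of weight * (topic mass of the selected words) ** lam with 0 < lam < 1, which is monotone submodular and therefore favors covering several topics, and it assumes that queries are formed by grouping keywords under their single most probable topic. The names topic_reward, greedy_keywords, cluster_by_topic, the parameter lam, and the toy word-topic probabilities are all hypothetical.

```python
from typing import Dict, List


def topic_reward(selected: List[str],
                 word_topics: Dict[str, List[float]],
                 topic_weights: List[float],
                 lam: float = 0.75) -> float:
    """Assumed diversity reward: sum_z weight_z * (sum_{w in selected} p(z|w)) ** lam.

    Because lam < 1, each extra word from an already covered topic
    adds less than a word from a topic not yet covered.
    """
    total = 0.0
    for z, weight in enumerate(topic_weights):
        mass = sum(word_topics[w][z] for w in selected)
        total += weight * mass ** lam
    return total


def greedy_keywords(word_topics: Dict[str, List[float]],
                    topic_weights: List[float],
                    k: int,
                    lam: float = 0.75) -> List[str]:
    """Greedily pick up to k words, each time taking the largest marginal gain."""
    selected: List[str] = []
    candidates = set(word_topics)
    while candidates and len(selected) < k:
        base = topic_reward(selected, word_topics, topic_weights, lam)
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(
            sorted(candidates),
            key=lambda w: topic_reward(selected + [w], word_topics,
                                       topic_weights, lam) - base,
        )
        selected.append(best)
        candidates.remove(best)
    return selected


def cluster_by_topic(keywords: List[str],
                     word_topics: Dict[str, List[float]]) -> Dict[int, List[str]]:
    """Group keywords by their most probable topic; each group is one query."""
    queries: Dict[int, List[str]] = {}
    for w in keywords:
        probs = word_topics[w]
        top = max(range(len(probs)), key=lambda z: probs[z])
        queries.setdefault(top, []).append(w)
    return queries


if __name__ == "__main__":
    # Toy p(topic | word) values for a two-topic model (made up for illustration).
    toy = {
        "cat": [0.9, 0.1],
        "dog": [0.8, 0.2],
        "stock": [0.1, 0.9],
    }
    keywords = greedy_keywords(toy, topic_weights=[0.5, 0.5], k=2)
    print(keywords)
    print(cluster_by_topic(keywords, toy))
```

In this toy setting the second pick goes to a word from the topic not yet covered rather than to a second word from the first topic, which is the diversity effect the abstract attributes to the submodular reward. Each resulting keyword group could then be sent as a separate query to a search engine over the document collection.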

Details

Title Keyword Extraction and Clustering for Document Recommendation in Conversations

Author(s) Habibi, Maryam ; Popescu-Belis, Andrei

Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume 23

Issue 4

Pages 746-759

Date 2015

Keywords

Document recommendation; information retrieval; keyword extraction; meeting analysis; topic modeling

DOI https://doi.org/10.1109/Taslp.2015.2405482

Laboratories LIDIAP

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Work produced at EPFL
Journal Articles
Published

Record creation date 2014-12-19
