Multi-Label Answer Aggregation for Crowdsourcing
Crowdsourcing has been widely established as a means to enable human computation at large scale, in particular for tasks that require manual labelling of large sets of data items. Answers collected from heterogeneous crowd workers are aggregated to obtain a robust result. However, existing methods for answer aggregation assume that answers are given as a single label per item. Hence, these methods are ineffective for common multi-labelling problems such as image tagging and document annotation, where items are assigned sets of labels. In this paper, we propose a novel Bayesian nonparametric model for multi-label answer aggregation. It enables us to predict labels for non-grounded items, i.e., items for which no ground truth is available, while taking into account dependencies between the labels in different answer sets. We also show how this model is instantiated for incremental learning, incorporating new answers from crowd workers as they arrive. An evaluation of our method on a number of large-scale, real-world crowdsourcing datasets reveals that it consistently outperforms the state of the art in answer aggregation in terms of precision, recall, and robustness against faulty workers and data sparsity.
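To make the problem setting concrete, the following sketch shows a naive per-label majority-vote baseline for multi-label answer aggregation; it is not the Bayesian nonparametric model proposed in the paper (which additionally models label dependencies and worker reliability), and the function name and threshold parameter are illustrative assumptions.

```python
from collections import Counter

def aggregate_multilabel(answers, threshold=0.5):
    """Naive baseline (not the paper's model): keep a label if more than
    `threshold` of the workers included it in their answer set."""
    n_workers = len(answers)
    # Count, for each label, how many workers assigned it to the item.
    counts = Counter(label for answer_set in answers for label in answer_set)
    return {label for label, c in counts.items() if c / n_workers > threshold}

# Three workers tag the same image with (possibly different) label sets:
worker_answers = [
    {"cat", "indoor"},
    {"cat", "indoor", "blurry"},
    {"cat"},
]
print(sorted(aggregate_multilabel(worker_answers)))  # → ['cat', 'indoor']
```

Such a baseline treats every label independently and every worker as equally reliable, which is exactly the limitation the abstract points to: it ignores dependencies between labels within an answer set and degrades under faulty workers and sparse answers.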