Managing Quality of Crowdsourced Data

The Web is the central medium for discovering knowledge from sources such as blogs, social media, and wikis. It facilitates access to content provided by a large number of users, regardless of their geographical locations or cultural backgrounds. Such user-generated content is often referred to as crowdsourced data, and it offers informational benefits in terms of variety and scale. Yet the quality of crowdsourced data is hard to manage because of the inherent uncertainty and heterogeneity of the Web. In this proposal, we summarize prior work on crowdsourced data that studies quality dimensions and techniques for assessing data quality. However, existing approaches often lack mechanisms to collect data with high quality guarantees and to improve data quality. To overcome these limitations, we propose a research direction that emphasises (1) guaranteeing data quality at collection time, and (2) using expert knowledge to improve data quality in cases where the data has already been collected.
