Result Selection and Summarization for Web Table Search

The amount of information available on the Web has been growing dramatically, raising the importance of techniques for searching the Web. Recently, Web Tables emerged as a model, which enables users to search for information in a structured way. However, effective presentation of results for Web Table search requires (1) selecting a ranking of tables that acknowledges the diversity within the search result; and (2) summarizing the information content of the selected tables concisely but meaningful. In this paper, we formalize these requirements as the \emph{diversified table selection} problem and the \emph{structured table summarization} problem. We show that both problems are computationally intractable and, thus, present heuristic algorithms to solve them. For these algorithms, we prove salient performance guarantees, such as near-optimality, stability, and fairness. Our experiments with real-world collections of thousands of Web Tables highlight the scalability of our techniques. We achieve improvements up to 50\% in diversity and 10\% in relevance over baselines for Web Table selection, and reduce the information loss induced by table summarization by up to 50\%. In a user study, we observed that our techniques are preferred over alternative solutions.

Presented at:
31st IEEE International Conference on Data Engineering, Seoul, Korea, April 13-16, 2015

 Record created 2014-11-26, last modified 2018-11-26

Download fulltext

Rate this document:

Rate this document:
(Not yet reviewed)