Abstract

The amount of information available on the Web has been growing dramatically, raising the importance of techniques for searching the Web. Recently, Web Tables have emerged as a model that enables users to search for information in a structured way. However, effective presentation of results for Web Table search requires (1) selecting a ranking of tables that acknowledges the diversity within the search result; and (2) summarizing the information content of the selected tables concisely yet meaningfully. In this paper, we formalize these requirements as the diversified table selection problem and the structured table summarization problem. We show that both problems are computationally intractable and therefore present heuristic algorithms to solve them. For these algorithms, we prove salient performance guarantees, such as near-optimality, stability, and fairness. Our experiments with real-world collections of thousands of Web Tables highlight the scalability of our techniques. We achieve improvements of up to 50% in diversity and 10% in relevance over baselines for Web Table selection, and reduce the information loss induced by table summarization by up to 50%. In a user study, we observed that our techniques are preferred over alternative solutions.
