Dynamic faceted search for discovery-driven analysis

We propose a dynamic faceted search system for discovery-driven analysis on data with both textual content and structured attributes. From a keyword query, we want to dynamically select a small set of "interesting" attributes and present aggregates on them to a user. Similar to work in OLAP exploration, we define "interestingness" as how surprising an aggregated value is, based on a given expectation. We make two new contributions by proposing a novel "navigational" expectation that’s particularly useful in the context of faceted search, and a novel interestingness measure through judicious application of p-values. Through a user survey, we find the new expectation and interestingness metric quite effective. We develop an efficient dynamic faceted search system by improving a popular open source engine, Solr. Our system exploits compressed bitmaps for caching the posting lists in an inverted index, and a novel directory structure called a bitset tree for fast bitset intersection. We conduct a comprehensive experimental study on large real data sets and show that our engine performs 2 to 3 times faster than Solr.
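The abstract does not spell out the interestingness measure, so the following is only a minimal sketch of the general idea of scoring surprise with a p-value, not the paper's actual method. It assumes a simple expectation (each facet value appears in the query result at the same rate as in the whole corpus) and a binomial tail probability as the p-value. The function names `surprise_p_value` and `rank_facet_values` and the toy counts are hypothetical, introduced purely for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def surprise_p_value(observed, n, expected_rate):
    """Smaller of the two binomial tail probabilities for `observed`.

    A small value means the observed count deviates strongly from the
    expectation, in either direction. (Assumed measure, not the paper's.)
    """
    upper = sum(binom_pmf(i, n, expected_rate) for i in range(observed, n + 1))
    lower = sum(binom_pmf(i, n, expected_rate) for i in range(0, observed + 1))
    return min(upper, lower)


def rank_facet_values(result_counts, corpus_counts, n_result, n_corpus):
    """Order facet values from most to least surprising (smallest p-value first)."""
    scored = []
    for value, count in result_counts.items():
        # Assumed expectation: the corpus-wide rate of this facet value.
        expected_rate = corpus_counts[value] / n_corpus
        scored.append((surprise_p_value(count, n_result, expected_rate), value))
    return [value for _, value in sorted(scored)]


# Toy data: 10 documents match the keyword query, out of a corpus of 1000.
corpus_counts = {"a": 200, "b": 300, "c": 500}
result_counts = {"a": 8, "b": 1, "c": 1}
ranking = rank_facet_values(result_counts, corpus_counts, 10, 1000)
```

In this toy data, value "a" is heavily over-represented in the result set relative to the corpus, so it gets the smallest p-value and is ranked first. A system of the kind described would then present aggregates only for the top-ranked attributes.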

Published in:
Proceedings of the 17th ACM Conference on Information and Knowledge Management, 3-12
Presented at:
Conference on Information and Knowledge Management (CIKM '08), Napa Valley, California, USA, October 26-30, 2008

Record created 2009-01-23, last modified 2018-03-17
