Making Statistical Data More Easily Accessible on the Web: Results of the StatSearch Case Study
In this paper we present the results of the StatSearch case study, which aimed to provide enhanced access to statistical data available on the Web. Within this case study we developed a prototype of an information access tool that combines a query-based search engine with semi-automated navigation techniques exploiting the hierarchical structuring of the available data. This tool gives the user better control over information retrieval, improving the quality and ease of access to statistical information. The central part of the StatSearch tool is an algorithm for automated navigation through a tree-like hierarchical document structure. The algorithm computes the distribution of query-related relevance scores over the available database in order to identify the most relevant clusters in the data structure. These clusters are then either proposed to the user for navigation or used as the basis for the automated navigation process. Several approaches to automating the navigation are compared, and natural language processing techniques allowing a more precise and coherent computation of textual similarities are briefly described. The resulting StatSearch prototype was evaluated by the Swedish Statistical Office (SCB) on a sample of over 5000 English documents accessible through the SCB web site. The evaluation was based on supervised on-site usability testing and aimed to identify the prototype's potential and its added value with respect to information access as objectively perceived by users. The evaluation method and the results obtained are presented. The case study was carried out in the framework of the NEMIS Network of Excellence in Text Mining and Its Applications in Statistics (IST2001-3-574).
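To make the idea of score-driven navigation concrete, the following minimal sketch shows one way query relevance scores could be aggregated over a tree-like document hierarchy and used to pick clusters. It is an illustration under stated assumptions, not the StatSearch algorithm itself: the child-to-parent map `parent`, the functions `aggregate_scores`, `top_children` and `auto_navigate`, the summation of scores, and the 0.7 dominance threshold are all hypothetical choices made for this example.

```python
from collections import defaultdict


def aggregate_scores(parent, doc_scores):
    """Sum each document's relevance score into all of its ancestor nodes.

    `parent` maps every node (cluster or document) to its parent node;
    the root has no entry. Summation is an assumption of this sketch.
    """
    totals = defaultdict(float)
    for doc, score in doc_scores.items():
        node = parent.get(doc)
        while node is not None:
            totals[node] += score
            node = parent.get(node)
    return dict(totals)


def top_children(parent, totals, node, k=2):
    """Return the k child clusters of `node` with the highest aggregated score."""
    children = [c for c, p in parent.items() if p == node and c in totals]
    return sorted(children, key=lambda c: totals[c], reverse=True)[:k]


def auto_navigate(parent, totals, root, threshold=0.7):
    """Descend while a single child cluster holds at least `threshold`
    of the score mass of its siblings (hypothetical stopping rule)."""
    node = root
    while True:
        children = top_children(parent, totals, node, k=len(parent))
        mass = sum(totals[c] for c in children)
        if not children or mass <= 0:
            return node
        best = children[0]
        if totals[best] / mass < threshold:
            return node  # no dominant cluster: hand control back to the user
        node = best


# Toy hierarchy: two top-level clusters, documents d1..d4 as leaves.
parent = {
    "population": "root", "economy": "root",
    "births": "population", "migration": "population",
    "d1": "births", "d2": "births", "d3": "migration", "d4": "economy",
}
doc_scores = {"d1": 0.6, "d2": 0.3, "d3": 0.2, "d4": 0.1}

totals = aggregate_scores(parent, doc_scores)
suggested = top_children(parent, totals, "root")  # clusters to propose to the user
target = auto_navigate(parent, totals, "root")    # cluster reached automatically
```

In this sketch the same aggregated scores serve both modes mentioned above: `top_children` yields candidate clusters to propose for manual navigation, while `auto_navigate` follows the dominant cluster until the score mass is no longer concentrated in one branch.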