Making Statistical Data More Easily Accessible on the Web: Results of the StatSearch Case Study

In this paper we present the results of the StatSearch case study, which aimed to provide enhanced access to statistical data available on the Web. Within this case study we developed a prototype information access tool that combines a query-based search engine with semi-automated navigation techniques exploiting the hierarchical structure of the available data. The tool gives better control over information retrieval, improving the quality and ease of access to statistical information. The central part of the StatSearch tool is an algorithm for automated navigation through a tree-like hierarchical document structure. The algorithm computes the distribution of query-related relevance scores over the available database to identify the most relevant clusters in the data structure. These clusters are then either proposed to the user for navigation or used as the basis for the automated navigation process. Several approaches to automating the navigation are compared, and natural language processing techniques allowing more precise and coherent computation of textual similarities are briefly described. The resulting StatSearch prototype was evaluated by the Swedish Statistical Office (SCB) on a sample of over 5000 English documents accessible through the SCB web site. The evaluation was based on supervised on-site usability testing and aimed to identify the prototype's potential and its added value with respect to information access as objectively perceived by users. The evaluation method and the results obtained are presented. The case study was carried out in the framework of the NEMIS Network of Excellence in Text Mining and Its Applications in Statistics (IST2001-3-574).
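
The navigation idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the StatSearch implementation: the `Node` class, the bag-of-words cosine score, the `min_share` threshold and the demo tree are all assumptions made for illustration. Each leaf document gets a query relevance score, the scores are summed up the tree, and each child cluster is ranked by its share of the relevance mass; automated navigation descends while one child clearly dominates.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; `text` is non-empty only for leaf documents."""
    name: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def cosine(query: str, text: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for the paper's scoring)."""
    q, d = Counter(query.lower().split()), Counter(text.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def mass(node: Node, query: str) -> float:
    """Total relevance of all documents below `node`."""
    if not node.children:
        return cosine(query, node.text)
    return sum(mass(child, query) for child in node.children)


def rank_children(node: Node, query: str) -> list[tuple[Node, float]]:
    """Children of `node` with their share of the relevance mass, best first."""
    scored = [(child, mass(child, query)) for child in node.children]
    total = sum(score for _, score in scored)
    shares = [(child, score / total if total else 0.0) for child, score in scored]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


def auto_navigate(root: Node, query: str, min_share: float = 0.6) -> list[str]:
    """Descend while the best child holds at least `min_share` of the mass."""
    path, node = [root.name], root
    while node.children:
        best, share = rank_children(node, query)[0]
        if share < min_share:
            break  # no dominant cluster: hand the ranked list to the user
        node = best
        path.append(node.name)
    return path


# Invented demo hierarchy, for illustration only.
root = Node("root", children=[
    Node("Population", children=[
        Node("growth", "population growth by region"),
        Node("vital", "births and deaths statistics"),
    ]),
    Node("Economy", children=[
        Node("cpi", "consumer price index"),
        Node("labour", "labour market unemployment rate"),
    ]),
])

print(auto_navigate(root, "unemployment rate"))
```

When no child reaches the threshold, the sketch stops and the ranked clusters from `rank_children` can be shown to the user instead, which corresponds to the semi-automated mode described in the abstract.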


Presented at:
International Marketing and Output Database Conference 2005, The Hague, The Netherlands, 5-9 September 2005
Year:
2005


Note: The status of this file is: EPFL only


 Record created 2010-04-23, last modified 2018-03-17
