Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Conferences, Workshops, Symposiums, and Seminars
  4. Evaluating Top-k Queries over Incomplete Data Streams
 
conference paper not in proceedings

Evaluating Top-k Queries over Incomplete Data Streams

Haghani, Parisa  
•
Michel, Sebastian
•
Aberer, Karl  
2009
18h ACM Conference on Information and Knowledge Management (CIKM)

We study the problem of continuous monitoring of top-k queries over multiple non-synchronized streams. Assuming a sliding window model, this general problem has been a well addressed research topic in recent years. However, most approaches assume the streams to be synchronized, that is, all attributes of an object are known simultaneously to the query processing engine, which is a crucial assumption for skyline based approaches that construct a k-skyband of potential top-k items. In many streaming scenarios though, different attributes of an item are reported in separate non-synchronized streams which do not allow for exact score calculations. We present how the traditional notion of object \emph{dominance} changes in this case such that the k dominance set still includes all and only those objects which have a chance of being among the top-k results in their life time. Based on this, we propose an exact algorithm which builds on generating multiple instances of the same object in a way that enables efficient object pruning. We show that even with object pruning the necessary storage for exact evaluation of top-k queries is linear in the size of the sliding window. As data should reside in main memory to provide fast answers in an online fashion and cope with high stream rates, storing all this data may not be possible with limited resources. We present an approximate algorithm which leverages correlation statistics of pairs of streams to evict more objects while maintaining accuracy. We evaluate the efficiency of our proposed algorithms with extensive experiments.

  • Details
  • Metrics
Type
conference paper not in proceedings
DOI
10.1145/1645953.1646064
Author(s)
Haghani, Parisa  
Michel, Sebastian
Aberer, Karl  
Date Issued

2009

Subjects

data streams

•

incomplete data

•

top-k queries

•

NCCR-MICS

•

NCCR-MICS/ESDM

URL

URL

http://www.comp.polyu.edu.hk/conference/cikm2009/
Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LSIR  
Event nameEvent placeEvent date
18h ACM Conference on Information and Knowledge Management (CIKM)

Hong Kong

November 2-6, 2009

Available on Infoscience
July 27, 2009
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/41821
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés