Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Reports, Documentation, and Standards
  4. Approximate Probabilistic Tree Indexing for Partitionable Data
 
report

Approximate Probabilistic Tree Indexing for Partitionable Data

Athanassoulis, Manos  
•
Ailamaki, Anastasia  
2013

Today we experience a shift both in storage technologies and in workloads characteristics. Firstly, solid state disks (SSD) (using flash, PCM or other low-level technologies) emerge as viable competitors of hard disk drives (HDD). Secondly, the increasing number of time-based generated workloads change the characteristics of the stored and analyzed data. Capacity and access times of storage devices create a storage tradeoff where SSD are positioned diametrically opposite to HDD: slow random accesses in HDD have been replaced by efficient random accesses - particularly reads - in SSD, but the capacity per dollar offered by HDD is still one or more orders of magnitude higher than the one offered by SSD. As the aggregated SSD capacity increases exponentially, an increasing part of our data resides on SSD. Traditionally, indexing is designed assuming HDD as secondary storage, thus minimizing random accesses at the expense of capacity. Indexing data residing on SSD, however, requires treating capacity as a scarce resource. In this paper we demonstrate that by compromising on indexing accuracy up to some extent we can save substantially on capacity when indexing on ordered or naturally partitionable attributes. We introduce approximate tree indexing, which employs probabilistic data structures (Bloom filters) to trade accuracy for size and produce smaller, yet powerful, tree indexes, which we name Bloom filter trees (BF-Trees). Small size enables frequent reconstructions and allows storing within secondary storage. The resulting decreased accuracy can be counteracted by exploiting the capabilities of SSD (e.g. internal device parallelism) and minimizing unnecessary reads by using Bloom filters with a higher granularity. In a series of experiments with a synthetic workload we show that approximate indexing offers 1.75x-65x smaller disk footprint with competitive response times.

  • Details
  • Metrics
Type
report
Author(s)
Athanassoulis, Manos  
Ailamaki, Anastasia  
Date Issued

2013

Total of pages

12

Subjects

approximate indexing

•

Bloom filters

•

data analytics

•

solid-state storage

Note

Under Review

Written at

EPFL

EPFL units
DIAS  
Available on Infoscience
November 22, 2012
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/87034
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés