000201942 001__ 201942
000201942 005__ 20190317000013.0
000201942 037__ $$aCONF
000201942 245__ $$aBF-Tree: Approximate Tree Indexing
000201942 269__ $$a2014
000201942 260__ $$c2014
000201942 336__ $$aConference Papers
000201942 520__ $$aThe increasing volume of time-based data and the shift in storage technologies suggest that we might need to reconsider indexing. Several workloads, such as social and service monitoring, often include attributes with implicit clustering because of their time-dependent nature. In addition, solid state disks (SSDs), built on flash or other low-level technologies, are emerging as viable competitors to hard disk drives (HDDs). Capacity and access times create a trade-off between SSDs and HDDs: SSDs replace the slow random accesses of HDDs with efficient ones, but their capacity is one or more orders of magnitude more expensive than that of HDDs. Indexing, however, has been designed assuming HDDs as secondary storage, and thus minimizes random accesses at the expense of capacity. Indexing data with SSDs as secondary storage requires treating capacity as a scarce resource. To this end, we introduce approximate tree indexing, which employs probabilistic data structures (Bloom filters) to trade accuracy for size and produce smaller, yet powerful, tree indexes, which we name Bloom filter trees (BF-Trees). BF-Trees exploit pre-existing data ordering or partitioning to offer competitive search performance. We demonstrate, through both an analytical study and experimental results, that by using workload knowledge and reducing indexing accuracy to some extent, we can save substantially on capacity when indexing ordered or partitioned attributes. In particular, in experiments with a synthetic workload, approximate indexing offers a 2.22x-48x smaller index footprint with competitive response times; in experiments with TPC-H and a real-life monitoring dataset from an energy company, it offers a 1.6x-4x smaller index footprint, again with competitive search times.
000201942 700__ $$0243529$$g188175$$aAthanassoulis, Manos
000201942 700__ $$aAilamaki, Anastasia$$g177957$$0243527
000201942 7112_ $$d31 August - 4 September 2015$$cWaikoloa, Hawaii, USA$$a41st International Conference on Very Large Databases
000201942 773__ $$tProceedings of the 40th International Conference on Very Large Databases
000201942 8564_ $$uhttps://infoscience.epfl.ch/record/201942/files/p1287-athanassoulis.pdf$$zn/a$$s675091$$yn/a
000201942 909C0 $$xU11836$$0252224$$pDIAS
000201942 909CO $$qGLOBAL_SET$$pconf$$ooai:infoscience.tind.io:201942$$pIC
000201942 917Z8 $$x191574
000201942 937__ $$aEPFL-CONF-201942
000201942 973__ $$rREVIEWED$$sACCEPTED$$aEPFL
000201942 980__ $$aCONF