Files

Abstract

Modern database management systems (DBMS) answer a multitude of complex queries on increasingly larger datasets. Given the complexities of the queries and the numerous design features, manual design is no longer an option. Instead, automatically designing the database is vital to maximize its performance and to reduce the total cost of ownership. For this purpose, commercial DBMS feature automated physical designers suggesting an efficient DB design by using the optimizer as a cost model. Unfortunately, consulting the optimizer is timeconsuming, an effect which is typically counter-acted by drastically pruning the search space, thereby potentially missing the optimal solution. Recently techniques cache the optimizer’s output and evaluate some plans with the cached results, reducing the number of calls to the optimizer. Still, however, the cost of invoking the optimizer to fill the cache is nontrivial, undermining scalability when running workloads with thousands of queries. In this paper, we use the intermediate optimization results in a dynamic programming based optimizer to reduce the cache initialization overhead. We demonstrate the accuracy and efficiency of our techniques by implementing them on the PostgreSQL open source query optimizer. For a star-schema workload, our techniques build the cost model 5 to 10 times faster than the conventional approach, while preserving accuracy.

Details

PDF