Compile-Time Type-Driven Data Representation Transformations in Object-Oriented Languages

High-level languages allow programmers to express data structures and algorithms that abstract over the type of data they handle. This improves code reuse and makes it possible to develop general-purpose libraries. Yet, data abstractions slow down program execution, as they require low-level indirection. In this thesis we explore three compile-time approaches that leverage type systems to reduce the cost of data abstractions, thus improving program performance. In the first part of the thesis we present miniboxing, a compile-time transformation that replaces generic classes by more efficient variants, optimized to handle primitive types. These variants use the miniboxed data encoding, producing speedups of up to 20x compared to generic classes. The miniboxing transformation is the main result of this thesis and motivates the other techniques. Generalizing miniboxing, we show the Late Data Layout (LDL) mechanism, which uses the type system to guide performance-oriented program rewritings. It can be instantiated to perform a host of transformations, such as miniboxing generics, inlining value classes and unboxing primitive types. The LDL mechanism has many desirable properties, such as provable correctness in handling different data representations, reduced number of conversions and built-in support for the object-oriented paradigm. Finally, we show Data-centric Metaprogramming, a technique that allows programmers to go beyond standard compiler optimizations by defining custom representations to be used for their data. These representations are then automatically introduced by the compiler when translating programs. This technique, similar in spirit to metaprogramming, opens new directions in programmer-driven optimizations and shows encouraging results, with speedups of up to 25x. Under the hood, Data-centric Metaprogramming uses the Late Data Layout mechanism.

Related material


EPFL authors