Infoscience

Thesis

Search Techniques for Code Generation

This dissertation explores techniques that synthesize and generate program fragments and test inputs. The main goal of these techniques is to improve and support automation in program synthesis and test input generation. This is important because performing those processes manually is often tedious, time consuming and error prone. The main challenge that these techniques face is exploring the search space in efficient and scalable ways. In the first part of the dissertation, we present tools InSynth and PolySynth that interactively synthesize code fragments. They take as input a partial program and automatically extract type information, the desired type, and set of visible declarations. They use this input to synthesize ranked list of expressions with the desired type. Finally, they present the expressions to a developer in similar manner to code completions in modern IDEs. InSynth is the first tool that uses a complete algorithm to generate expressions with first class functions and higher order functions. We present the theoretical foundation of the InSynth problem, that is based on type inhabitation, and the type-based backward search algorithm that solves it. PolySynth uses type-driven, resolution based algorithm that considers polymorphic types (generics) to generate expressions. Furthermore, the uniqueness of both tools comes from the fact that their algorithms operate using corpus statistics. The statistics are used to steer the algorithms and the search space exploration towards the most relevant solutions. In the second part of the dissertation we present the tool anyCode that uses natural language input to synthesize expressions. As input it accepts English words or Java program language constructs. This allows a developer to encode her intuition about the desired expression using words or the expression that approximates the desired structure. Thanks to this flexibility, anyCode can also repair broken expressions. It uses a pipeline of natural language and related-word tools to analyze the input. This helps anyCode to identify the set of the most relevant components and to reduce the size of search space. To further reduce the size of search space and to create the most relevant expressions, anyCode uses two statistical models: unigram and probabilistic context free grammar. Finally, in the last part of the dissertation we present UDITA, a Java-like language with support for non-determinism, which allows a user to describe test generation programs. Test generation programs are run on a top of Java PathFinder (JPF), a popular explicit-state model checker, that has a built-in backtracking mechanism and supports non-determinism. Using UDITA programs, JPF generates test inputs. The first benefit of UDITA is that non-determinism empowers a user to describe many test inputs as easily as describing a single test input. The second benefit is that it gives a user more flexibility allowing her to describe test generation programs by arbitrarily combining filters and generators. UDITA reduces the size of search space using an algorithm that reduces the number of generated complex isomorphic structures and that delays non-deterministic choices.

Related material