000222871 001__ 222871
000222871 005__ 20190509132602.0
000222871 0247_ $$2doi$$a10.5075/epfl-thesis-7249
000222871 02470 $$2urn$$aurn:nbn:ch:bel-epfl-thesis7249-5
000222871 02471 $$2nebis$$a10778360
000222871 037__ $$aTHESIS
000222871 041__ $$aeng
000222871 088__ $$a7249
000222871 245__ $$aSpecialising Parsers for Queries
000222871 269__ $$a2016
000222871 260__ $$bEPFL$$c2016$$aLausanne
000222871 300__ $$a152
000222871 336__ $$aTheses
000222871 502__ $$aProf. Viktor Kuncak (président) ; Prof. Martin Odersky (directeur de thèse) ; Prof. Christoph Koch, Prof. Eelco Visser, Dr Alessandro Warth (rapporteurs)
000222871 520__ $$aMany software systems consist of data processing components that analyse large datasets to gather information and learn from these. Often, only part of the data is relevant for analysis. Data processing systems contain an initial preprocessing step that filters out the unwanted information. While efficient data analysis techniques and methodologies are accessible to non-expert programmers, data preprocessing seems to be forgotten, or worse, ignored. This despite real performance gains being possible by efficiently preprocessing data. Implementations of the data preprocessing step traditionally have to trade modularity for performance: to achieve the former, one separates the parsing of raw data and filtering it, and leads to slow programs because of the creation of intermediate objects during execution. The efficient version is a low-level implementation that interleaves parsing and querying. In this dissertation we demonstrate a principled and practical technique to convert the modular, maintainable program into its interleaved efficient counterpart. Key to achieving this objective is the removal, or deforestation, of intermediate objects in a program execution. We first show that by encoding data types using Böhm-Berarducci encodings (often referred to as Church encodings), and combining these with partial evaluation for function composition we achieve deforestation. This allows us to implement optimisations themselves as libraries, with minimal dependence on an underlying optimising compiler. Next we illustrate the applicability of this approach to parsing and preprocessing queries. The approach is general enough to cover top-down and bottom-up parsing techniques, and deforestation of pipelines of operations on lists and streams. We finally present a set of transformation rules that for a parser on a nested data format and a query on the structure, produces a parser specialised for the query. As a result we preserve the modularity of writing parsers and queries separately while also minimising resource usage. These transformation rules combine deforested implementations of both libraries to yield an efficient, interleaved result.
000222871 6531_ $$aParsers combinators
000222871 6531_ $$apartial evaluation
000222871 6531_ $$aChurch encodings
000222871 6531_ $$astaging
000222871 6531_ $$adeforestation
000222871 6531_ $$afusion
000222871 6531_ $$aquerying
000222871 700__ $$0246590$$g170823$$aJonnalagedda, Manohar
000222871 720_2 $$aOdersky, Martin$$edir.$$g126003$$0241835
000222871 8564_ $$uhttps://infoscience.epfl.ch/record/222871/files/EPFL_TH7249.pdf$$zn/a$$s1582008$$yn/a
000222871 909C0 $$xU10409$$0252187$$pLAMP
000222871 909CO $$pthesis-bn2018$$pDOI$$pIC$$ooai:infoscience.tind.io:222871$$qDOI2$$qGLOBAL_SET$$pthesis
000222871 917Z8 $$x108898
000222871 917Z8 $$x108898
000222871 918__ $$dEDIC$$aIC
000222871 919__ $$aLAMP1
000222871 920__ $$b2016$$a2016-11-11
000222871 970__ $$a7249/THESES
000222871 973__ $$sPUBLISHED$$aEPFL
000222871 980__ $$aTHESIS