Thesis

An FPGA-based syntactic parser for large size real-life context-free grammars

This thesis lies at the crossroads of Natural Language Processing (NLP) and digital circuit design. It aims to deliver a custom hardware coprocessor that accelerates natural language parsing. The coprocessor must parse real-life natural language and is intended for NLP applications that are time-constrained or need to process large amounts of data. More precisely, the thesis has three goals: (1) to propose an efficient FPGA-based coprocessor for natural language syntactic analysis that can handle inputs in the form of word lattices, (2) to implement the coprocessor as a hardware tool ready for integration into an ordinary desktop computer, and (3) to offer an interface (i.e. a software library) between the hardware tool and a potential natural language software application running on the desktop computer.

Field Programmable Gate Array (FPGA) technology was chosen as the core of the coprocessor implementation because it can efficiently exploit all levels of parallelism available in the implemented algorithms while remaining cost-effective. In addition, FPGA technology makes it possible to design and test such a hardware coprocessor efficiently. A final reason is that future general-purpose processors are expected to contain reconfigurable resources. In that context, an IP core implementing an efficient context-free parser, ready to be configured within the reconfigurable resources of the general-purpose processor, would support any application that relies on context-free parsing and runs on that processor.

The context-free grammar parsing algorithms implemented are the standard CYK algorithm and an enhanced version of the CYK algorithm developed at the EPFL Artificial Intelligence Laboratory. These algorithms were selected (1) for their regular data flow and data processing, which make them well suited to a hardware implementation, (2) because they produce partial parse trees, which makes them suitable for further shallow parsing, and (3) because they can parse word lattices.
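
As an illustration of the regular data flow and partial parse results mentioned above, the following is a minimal software sketch of the standard CYK chart-filling step for a grammar in Chomsky normal form. It is not the thesis's hardware design, nor the enhanced CYK variant, and it handles a plain word sequence rather than a word lattice. The function name `cyk_chart`, the rule tables `LEXICAL` and `BINARY`, and the toy grammar are all assumptions made for this example.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
# LEXICAL maps a word to the nonterminals A with a rule A -> word.
LEXICAL = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "sees": {"V"},
}
# BINARY maps a pair (B, C) to the nonterminals A with a rule A -> B C.
BINARY = {
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}


def cyk_chart(words, lexical_rules, binary_rules):
    """Return chart[(i, j)] = set of nonterminals that derive words[i:j]."""
    n = len(words)
    chart = defaultdict(set)

    # Spans of length 1: apply the lexical rules.
    for i, word in enumerate(words):
        chart[(i, i + 1)] |= lexical_rules.get(word, set())

    # Longer spans, shortest first: every cell depends only on shorter spans,
    # so all cells of the same length can be computed independently.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        chart[(i, j)] |= binary_rules.get((left, right), set())
    return chart


if __name__ == "__main__":
    sentence = "the dog sees the cat".split()
    chart = cyk_chart(sentence, LEXICAL, BINARY)
    print("full parse found:", "S" in chart[(0, len(sentence))])
```

The chart keeps the nonterminals found for every sub-span, so a phrase such as `chart[(0, 2)]` is still available when the full input cannot be parsed. That is the sense in which CYK yields partial parses. The identical loop structure for every cell is the kind of regularity that tends to map well onto parallel hardware.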

    Thesis, École polytechnique fédérale de Lausanne (EPFL), no. 2522 (2001)
    Faculté informatique et communications
    Institut d'informatique fondamentale
    Laboratoire d'intelligence artificielle
    Jury: Roger Hersch, Dominique Lavenier, Eduardo Sanchez, Manuel Vilares Ferro

    Public defense: 2002-01-25

    Record created on 2005-03-16, modified on 2016-08-08
