Modèles syntaxiques probabilistes non-génératifs

Rozenknop, Antoine

doi:10.5075/epfl-thesis-2705

Rozenknop, Antoine

2003

Download

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DublinCore
EndNote
NLM
RefWorks
RIS

Files

Abstract

This work deals with models used, or usable in the domain of Automatic Natural Language Processing, when one seeks a syntactic interpretation of a statement. This interpretation can be used as additional information for subsequent treatments, that can aim for instance at producing a semantic representation of the statement. It can also be used as a filter to select utterances belonging to a specific language, among several hypotheses, as done in Automatic Speech Recognition. As the syntactic interpretation of a statement is generally ambiguous with natural languages, the probabilisation of the space of syntactic trees can help in the analysis task : when several analyses are competing, one can then extract the most probable interpretation, or classify interpretations according to their probabilities. We are interested here in the probabilistic versions of Context-Free Grammars (PCFGs) and Substitution Tree Grammar (PTSGs). Syntactic treebanks, which as much as possible account for the language we wish to model, serve as the basis for defining the probabilistic parameters of such grammars. First, we exhibit in this thesis some drawbacks of the usual learning paradigms, due to the use of arbitrary heuristics (STSG DOP model), or to the use of learning criteria that consider these grammars as generative ones (creation of sentences from the grammar) rather than dedicated to analysis (creation of analyses from the sentence). In a second time, we propose new methods for training grammars, based on the traditional Maximum Entropy and Maximum Likelihood criteria. These criteria are instanciated so that they correspond to a syntactic analysis task rather than a language generation task. Specific training algorithms are necessary for their implementation, but traditional algorithms can cope with those models for the task of syntactic analysis. Lastly, we invest the problem of time complexity of syntactic analysis, which is a real issue for the effective use of PTSGs. We describe classes of PTSGs that allow the analysis of a sentence in polynomial complexity. We finally describe a method that enable the extraction of such a PTSG from the set of subtrees of a treebank. The PTSG produced by this method allows us to test our non-generative learning criterium on "realistic" data, and to give a statistical comparison between this criterium and the usual heuristic criterium in term of analysis performance.

Details

Title Modèles syntaxiques probabilistes non-génératifs

Author(s) Rozenknop, Antoine

Advisor(s)

Rajman, Martin

Pagination 232

Date 2003

Publisher Lausanne, EPFL

Language French

DOI https://doi.org/10.5075/epfl-thesis-2705

Record Appears in Scientific production and competences > EPFL Theses
Work produced at EPFL
Published
Theses

Record creation date 2005-03-16

Files

Abstract

Details

PDF