Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. EPFL thesis
  4. Low-rank tensor methods for large Markov chains and forward feature selection methods
 
doctoral thesis

Low-rank tensor methods for large Markov chains and forward feature selection methods

Santos Paredes Quartin de Macedo, Francisco  
2018

In the first part of this thesis, we present and compare several approaches for the determination of the steady-state of large-scale Markov chains with an underlying low-rank tensor structure. Such structure is, in our context of interest, associated with the existence of interacting processes. The state space grows exponentially with the number of processes. This type of problems arises, for instance, in queueing theory, in chemical reaction networks, or in telecommunications.

As the number of degrees of freedom of the problem grows exponentially with the number of processes, the so-called \textit{curse of dimensionality} severely impairs the use of standard methods for the numerical analysis of such Markov chains. We drastically reduce the number of degrees of freedom by assuming a low-rank tensor structure of the solution.

We develop different approaches, all considering a formulation of the problem where all involved structures are considered in their low-rank representations in \textit{tensor train} format.

The first approaches that we will consider are associated with iterative solvers, in particular focusing on solving a minimization problem that is equivalent to the original problem of finding the desired steady state. We later also consider tensorized multigrid techniques as main solvers, using different operators for restriction and interpolation. For instance, aggregation/disaggregation operators, which have been extensively used in this field, are applied.

In the second part of this thesis, we focus on methods for feature selection. More concretely, since, among the various classes of methods, sequential feature selection methods based on mutual information have become very popular and are widely used in practice, we focus on this particular type of methods. This type of problems arises, for instance, in microarray analysis, in clinical prediction, or in text categorization.

Comparative evaluations of these methods have been limited by being based on specific datasets and classifiers. We develop a theoretical framework that allows evaluating the methods based on their theoretical properties. Our framework is based on the properties of the target objective function that the methods try to approximate, and on a novel categorization of features, according to their contribution to the explanation of the class; we derive upper and lower bounds for the target objective function and relate these bounds with the feature types. Then, we characterize the types of approximations made by the methods, and analyse how these approximations cope with the good properties of the target objective function.

We also develop a distributional setting designed to illustrate the various deficiencies of the methods, and provide several examples of wrong feature selections. In the context of this setting, we use the minimum Bayes risk as performance measure of the methods.

  • Files
  • Details
  • Metrics
Type
doctoral thesis
DOI
10.5075/epfl-thesis-7718
Author(s)
Santos Paredes Quartin de Macedo, Francisco  
Advisors
Kressner, Daniel  
•
Pacheco Pires, António Manuel  
Jury

Prof. Victor Panaretos (président) ; Prof. Daniel Kressner, Prof. António Manuel Pacheco Pires (directeurs) ; Dr Carlos Alves, Dr Paula Milheiro-Oliveira, Dr Nelson Antunes (rapporteurs)

Date Issued

2018

Publisher

EPFL

Publisher place

Lausanne

Public defense year

2018-01-08

Thesis number

7718

Total of pages

175

Subjects

Markov chains

•

curse of dimensionality

•

low-rank structure

•

tensor train format

•

entropy

•

mutual information

•

forward feature selection methods

•

performance measure

•

minimum Bayes risk

Note

Co-supervision with: Instituto Superior Técnico (IST) da Universidade de Lisboa, Doutoramento em Estatistica e Processos Estocásticos

EPFL units
ANCHP  
Faculty
SB  
School
MATHICSE  
Doctoral School
EDMA  
Available on Infoscience
April 11, 2018
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/145977
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés