Infoscience

doctoral thesis

Saliency-based representations and multi-component classifiers for visual scene recognition

Fornoni, Marco  
2014

Visual scene recognition is the problem of automatically recognizing the high-level semantic concept that describes an image as a whole, such as the environment in which the scene occurs (e.g. a mountain) or the event taking place (e.g. rock climbing). Scene categories, especially those related to man-made places and events, exhibit high intra-class variability and inter-class similarity, which call for robust and discriminative recognition systems. Potential applications, such as vision-based spatial reasoning for mobile robots, additionally require an efficient classification procedure. The objective of this thesis is to address these challenges by proposing suitable image representations and classification algorithms.

The first part of the thesis focuses on the representation task. We propose a bottom-up image descriptor that captures perceptually coherent structures independently of their position. In particular, our method separately pools features extracted from two perceptually different image regions: the most salient region and the remaining non-salient one. By complementing this Saliency-driven Perceptual Pooling (SPP) with an ad-hoc spatial pooling operation, we obtain compact and robust image representations that are particularly suited to indoor and sports scenes.

The second part of the thesis is concerned with the classification step. We propose an efficient multi-component classification algorithm, named Multiclass Latent Locally Linear SVM (ML3), which automatically learns a set of sub-categorical linear models for each class in a principled latent SVM framework. By linearly combining the sub-categorical models with sample- and class-specific weights, ML3 efficiently learns smooth non-linear decision boundaries that are competitive with those obtained by Gaussian kernel SVMs.
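
As a rough illustration of how sub-categorical linear models can be combined with sample- and class-specific weights, the following Python sketch scores a sample against each class. The weighting rule (positive parts of the component responses, scaled to unit L2 norm), the function names and the toy models are assumptions made for illustration; the abstract does not specify how ML3 computes its weights or trains its models.

```python
import math


def component_responses(models, x):
    """Return the response w_k . x of each sub-categorical linear model."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in models]


def local_weights(responses):
    """Hypothetical weighting rule: positive parts scaled to unit L2 norm."""
    positive = [max(r, 0.0) for r in responses]
    norm = math.sqrt(sum(p * p for p in positive))
    if norm == 0.0:
        return [0.0] * len(responses)
    return [p / norm for p in positive]


def class_score(models, x):
    """Linearly combine the component responses with sample-specific weights."""
    responses = component_responses(models, x)
    weights = local_weights(responses)
    return sum(b * r for b, r in zip(weights, responses))


def predict(class_models, x):
    """Pick the class whose weighted combination of models scores highest."""
    return max(class_models, key=lambda label: class_score(class_models[label], x))


# Toy models (assumed values): two linear components per class, 2-D features.
toy_models = {
    "indoor": [[1.0, 0.0], [0.0, 1.0]],
    "outdoor": [[-1.0, 0.0], [0.0, -1.0]],
}
```

Because the weights depend on the sample, different components dominate in different regions of the feature space. For instance, `predict(toy_models, [3.0, -4.0])` selects `"outdoor"` through that class's second component, although the first feature alone would favor `"indoor"`.
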
ML3 also offers very competitive trade-offs between training time and performance, while keeping the prediction phase highly efficient.

In the last part of the thesis, we use the ML3 algorithm to improve the efficiency and performance of a recently proposed image classification algorithm, Naive Bayes Nearest Neighbor (NBNN), designed to cope with classes of large diversity. Specifically, we show that, with a modification of the NBNN scoring function, ML3 can be used to learn a discriminative and compact set of prototypical local features for each class, thus avoiding the extensive Nearest Neighbor search used by NBNN. The resulting algorithm, named NBNL, greatly reduces the memory requirements and testing complexity of NBNN, while significantly improving its performance.

The approaches proposed in this thesis effectively exploit the spatial, salient and task-driven structures present in the images, producing compact representations and relatively efficient classification procedures. The SPP representations provide competitive scene recognition performance when coupled with non-linear kernels, while the ML3 algorithm can be used to partially fill the gap between linear and non-linear kernels. Although the performance of NBNN-based methods on scene recognition tasks is still below that of traditional SVM-based approaches, the proposed NBNL algorithm reduces the performance gap while significantly speeding up the testing phase. Experiments on three publicly available scene recognition datasets (MIT-Indoor-67, 15-Scenes and UIUC-Sports) show the value of the proposed approaches.
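
For readers unfamiliar with the NBNN baseline, the sketch below shows its standard image-to-class scoring in plain Python: each local descriptor of a test image is matched to its nearest stored feature in every class, and the class with the smallest summed squared distance wins. The function names and the toy features are assumptions for illustration only. The sketch covers plain NBNN, not the modified scoring function or the learned prototypes of NBNL, which the abstract does not detail.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b))


def image_to_class_distance(descriptors, class_features):
    """Sum, over the image's descriptors, of the distance to the nearest class feature."""
    return sum(
        min(squared_distance(d, f) for f in class_features)
        for d in descriptors
    )


def nbnn_predict(descriptors, features_by_class):
    """Return the class with the smallest image-to-class distance."""
    return min(
        features_by_class,
        key=lambda label: image_to_class_distance(descriptors, features_by_class[label]),
    )


# Toy stored features per class (assumed values, 2-D descriptors).
toy_features = {
    "mountain": [[0.0, 0.0], [1.0, 0.0]],
    "rock climbing": [[5.0, 5.0], [6.0, 5.0]],
}

# Local descriptors of a hypothetical test image.
toy_image = [[0.1, 0.0], [0.9, 0.1]]
```

The inner `min` is the nearest-neighbor search whose cost grows with the number of stored features per class, which is why a compact set of prototypical features per class reduces both memory use and testing time.
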

Type
doctoral thesis
DOI
10.5075/epfl-thesis-6424
Author(s)
Fornoni, Marco  
Advisors
Bourlard, Hervé • Caputo, Barbara
Jury

Prof. C.N. Jones (president); Prof. H. Bourlard, Dr B. Caputo (thesis directors); Prof. V. Murino, Prof. D. Skocaj, Prof. J.-Ph. Thiran (examiners)

Date Issued

2014

Publisher

EPFL

Publisher place

Lausanne

Public defense date

2014-11-24

Thesis number

6424

Subjects

visual scene recognition • saliency maps • feature pooling • multi-component classification • multi-class classification • locally linear SVM • latent SVM • naive Bayes nearest neighbor

EPFL units
LIDIAP  
Faculty
STI  
School
IEL  
Doctoral School
EDEE  
Available on Infoscience
November 25, 2014
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/109047
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved