Infoscience

EPFL, École polytechnique fédérale de Lausanne

doctoral thesis

On the design of audio features robust to the album-effect for music information retrieval

Scaringella, Nicolas
2009

Short-term spectral features, most notably Mel-Frequency Cepstral Coefficients (MFCCs), are the most widely used descriptors of audio signals and are deployed in a majority of state-of-the-art Music Information Retrieval (MIR) systems. In speech processing, however, these descriptors have shown their limitations when the training and testing conditions of a system do not match, for example in noisy conditions or under a channel mismatch. A related problem has been observed in music processing: it has been hypothesized that MIR algorithms relying on short-term spectral features unexpectedly pick up on similarities in the production and mastering qualities of music albums. This problem is referred to in the literature as the album-effect, though it has never been studied in depth. This thesis shows how the album-effect relates to the problem of channel mismatch. A measure of robustness to the album-effect is proposed, and channel normalization techniques borrowed from the speech processing community are evaluated as a means of improving the robustness of short-term spectral features. Alternatively, longer-term features describing critical-band specialized temporal patterns (TRAPs) are adapted to the context of music processing. It is shown how such features can describe either timbre or rhythm content, depending on the scale considered for analysis, and how robust they are to the album-effect. Unlike more classic short-term spectral descriptors, TRAP-based features encode some form of prior knowledge of the problem considered through a trained feature extraction chain. The lack of appropriately annotated datasets, however, raises new issues when it comes to training the feature extraction chain.
Advanced unsupervised learning strategies are considered in this thesis and evaluated against more traditional supervised approaches relying on coarse-grained annotations such as music genres. Specialized learning strategies and specialized architectures are also proposed to compensate for some inherent variability of the data due either to album-related factors or to the dependence of music signals on the tempo of the performance.
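
The abstract does not say which channel normalization techniques were evaluated. As an illustration only, the sketch below uses cepstral mean subtraction, a well-known technique from speech processing, to show why such normalization can help against a channel mismatch: a fixed linear channel multiplies the spectrum, so it appears as a constant additive offset on cepstral coefficients, and removing the per-track mean cancels it. The function name, array shapes, and synthetic data are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np


def cepstral_mean_subtraction(cepstra: np.ndarray) -> np.ndarray:
    """Subtract the per-coefficient mean over time.

    `cepstra` is assumed to have shape (n_frames, n_coefficients),
    e.g. the MFCC frames of a single track.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)


# Synthetic demonstration (hypothetical data, not results from the thesis).
rng = np.random.default_rng(0)

# Stand-in for the MFCC frames of one track: 200 frames, 13 coefficients.
clean = rng.normal(size=(200, 13))

# A fixed channel (for instance a constant equalization applied to a whole
# album) is modeled as a constant offset added to every cepstral frame.
channel_offset = rng.normal(size=(1, 13))
coloured = clean + channel_offset

normalized_clean = cepstral_mean_subtraction(clean)
normalized_coloured = cepstral_mean_subtraction(coloured)

# After normalization the two versions coincide: the constant offset is gone.
print(np.allclose(normalized_clean, normalized_coloured))
```

This only removes a time-invariant offset. Real production and mastering chains are not necessarily time-invariant or linear, so the sketch should be read as the idealized case that motivates the approach, not as a description of the method evaluated in the thesis.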

Name: EPFL_TH4412.pdf
Access type: restricted
Size: 2.55 MB
Format: Adobe PDF
Checksum (MD5): 363d843cf287f5bc899a97f032a4f41f


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved