000218108 001__ 218108
000218108 005__ 20190617200516.0
000218108 037__ $$aREP_WORK
000218108 088__ $$aIdiap-RR-07-2016
000218108 245__ $$aOn Structured Sparsity of Phonological Posteriors for Linguistic Parsing
000218108 269__ $$a2016
000218108 260__ $$bIdiap$$c2016
000218108 336__ $$aReports
000218108 520__ $$aThe speech signal conveys information on different time scales, from the short (20–40 ms) or segmental time scale, associated with phonological and phonetic information, to the long (150–250 ms) or supra-segmental time scale, associated with syllabic and prosodic information. Linguistic and neurocognitive studies recognize the phonological classes at the segmental level as the essential and invariant representations used in speech temporal organization. In the context of speech processing, a deep neural network (DNN) is an effective computational method to infer the probability of individual phonological classes from a short segment of the speech signal. A vector of all phonological class probabilities is referred to as a phonological posterior. A short-term speech signal comprises only very few classes; hence, the phonological posterior is a sparse vector. Although the phonological posteriors are estimated at the segmental level, we claim that they convey supra-segmental information. Namely, we demonstrate that phonological posteriors are indicative of syllabic and prosodic events. Building on converging linguistic evidence on the gestural model of Articulatory Phonology as well as the neural basis of speech perception, we hypothesize that phonological posteriors convey properties of linguistic classes at multiple time scales, and that this information is embedded in their support (the indices of active coefficients). To verify this hypothesis, we obtain a binary representation of phonological posteriors at the segmental level, referred to as the first-order sparsity structure; the high-order structures are obtained by concatenating first-order binary vectors. We then confirm that classification of supra-segmental linguistic events, a problem known as linguistic parsing, can be achieved with high accuracy using simple binary pattern matching of first-order or high-order structures.
000218108 6531_ $$aBinary pattern matching
000218108 6531_ $$aDeep neural network (DNN)
000218108 6531_ $$aLinguistic parsing
000218108 6531_ $$aPhonological posteriors
000218108 6531_ $$aStructured sparse representation
000218108 700__ $$aCernak, Milos
000218108 700__ $$0243353$$g188259$$aAsaei, Afsaneh
000218108 700__ $$aBourlard, Hervé$$g117014$$0243348
000218108 8564_ $$uhttp://arxiv.org/abs/1601.05647$$zURL
000218108 8564_ $$uhttps://infoscience.epfl.ch/record/218108/files/Cernak_Idiap-RR-07-2016.pdf$$zn/a$$s830493$$yn/a
000218108 909C0 $$xU10381$$0252189$$pLIDIAP
000218108 909CO $$ooai:infoscience.tind.io:218108$$qGLOBAL_SET$$pSTI$$preport
000218108 937__ $$aEPFL-REPORT-218108
000218108 970__ $$aCernak_Idiap-RR-07-2016/LIDIAP
000218108 973__ $$aEPFL
000218108 980__ $$aREPORT