Emotional Content Comparison in Speech Signal Using Feature Embedding
Rovetta S.; Mnasri Z.; Masulli F.
2021-01-01
Abstract
Expressive speech processing has improved in recent years. However, it is still hard to detect a change of emotion within a single speech signal, or to compare the emotional content of a pair of speech signals, especially with unlabeled data. In this work, feature embedding is therefore used to enhance emotional content comparison for pairs of speech signals, cast as a classification task. Feature embedding has been shown to reduce the dimensionality and the intra-feature variance of the input space. In addition, deep autoencoders have recently been used as a feature embedding tool in several applications, such as image, gene, and chemical data classification. Here, a deep autoencoder is used for feature embedding before the emotional content of pairs of speech signals is classified by vector quantization. Autoencoding was performed following two schemes: once for all features together, and once for each group of features separately. The results show that the autoencoder succeeds in (a) revealing a more compact and clearly separated structure of the mapped features, and (b) improving the classification rates for the similarity/dissimilarity of all the emotional content aspects that were compared, i.e., neutrality, arousal, and valence. These comparisons are used to calculate the emotion identity metric.
File | Access | Description | Type | Size | Format
---|---|---|---|---|---
Progresses_in_Artificial_Intelligence_and_Neural_Systems_by_Anna-job_782.pdf | Open access | Article in volume | Publisher's version | 715.07 kB | Adobe PDF
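
The classification step mentioned in the abstract (vector quantization applied to pairs of signals) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the input vectors stand in for features that an autoencoder has already embedded (the autoencoder itself is omitted), a pair is represented by the element-wise absolute difference of its two vectors (the hypothetical `pair_vector`), each class gets a small k-means codebook (the hypothetical `train_codebook`), and a pair takes the label of its nearest prototype. The toy data is synthetic.

```python
import math
import random


def train_codebook(vectors, k, iters=20, seed=0):
    """Build k prototypes for one class with plain k-means (an assumed VQ variant)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: math.dist(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous prototype
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def pair_vector(a, b):
    """Hypothetical pair representation: element-wise absolute difference."""
    return [abs(x - y) for x, y in zip(a, b)]


def classify(pair, codebooks):
    """Return the label whose codebook holds the prototype nearest to the pair."""
    return min(
        codebooks,
        key=lambda label: min(math.dist(pair, p) for p in codebooks[label]),
    )


# Synthetic stand-ins for embedded features of two emotional states.
rng = random.Random(1)
calm, excited = [0.0, 0.0], [1.0, 1.0]


def noisy(center):
    return [c + rng.gauss(0, 0.05) for c in center]


similar = [pair_vector(noisy(calm), noisy(calm)) for _ in range(20)]
similar += [pair_vector(noisy(excited), noisy(excited)) for _ in range(20)]
dissimilar = [pair_vector(noisy(calm), noisy(excited)) for _ in range(40)]

codebooks = {
    "similar": train_codebook(similar, k=2),
    "dissimilar": train_codebook(dissimilar, k=2),
}

print(classify(pair_vector([0.0, 0.0], [0.02, 0.01]), codebooks))
print(classify(pair_vector([0.0, 0.0], [1.0, 1.0]), codebooks))
```

In this sketch a more compact, better separated embedding would show up as tighter clusters around each prototype, which is the property the abstract attributes to the autoencoder step.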
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.