Discovering discrete subword units with binarized autoencoders and hidden-Markov-model encoders

Lorenzo Rosasco
2015

Abstract

In this paper we address the problem of unsupervised learning of discrete subword units. Our approach is based on Deep Autoencoders (AEs), whose encoding node values are thresholded to subsequently generate a symbolic, i.e., 1-of-K (where K is the number of subwords), representation of each speech frame. We experiment with two variants of the standard AE, which we have named the Binarized Autoencoder and the Hidden-Markov-Model Encoder. The former forces the binary encoding nodes to have a U-shaped distribution (with peaks at 0 and 1) while minimizing the reconstruction error. The latter jointly learns the symbolic encoding representation (i.e., subwords) and the prior and transition distribution probabilities of the learned subwords. The ABX evaluation of the Zero Resource Challenge - Track 1 shows that a deep AE with only 6 encoding nodes, which assigns to each frame a 1-of-K binary vector with K = 2^6 = 64, can outperform real-valued MFCC representations in the across-speaker setting. Binarized AEs can outperform standard AEs when using a larger number of encoding nodes, while HMM Encoders may allow more compact subword transcriptions without worsening the ABX performance.
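The core discretization step described above, thresholding the encoder's node activations and reading the resulting bit pattern as a 1-of-K subword label, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `frame_to_symbol` helper and the 0.5 threshold are assumptions, and the example activations are invented for illustration.

```python
import numpy as np

def frame_to_symbol(activations, threshold=0.5):
    """Threshold real-valued encoding-node activations to bits, then
    read the bit pattern as a subword index in [0, 2^n) -- i.e., a
    1-of-K label with K = 2^n for n encoding nodes."""
    bits = (np.asarray(activations) >= threshold).astype(int)
    # Interpret the binary vector as a base-2 number.
    return int("".join(map(str, bits)), 2)

# Example: 6 encoding nodes -> K = 2^6 = 64 possible subword labels.
activations = [0.91, 0.12, 0.77, 0.05, 0.63, 0.40]  # hypothetical encoder output
symbol = frame_to_symbol(activations)  # bits 101010 -> subword index 42
```

A sequence of such per-frame indices yields the symbolic transcription that, in the HMM Encoder variant, is modeled jointly with the subword prior and transition probabilities.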

Use this identifier to cite or link to this document: http://hdl.handle.net/11567/888665