A Novel Pitch Detection Algorithm Based on Instantaneous Frequency for Clean and Noisy Speech


Zied Mnasri; Stefano Rovetta; Francesco Masulli

2022-01-01

Abstract

This paper proposes a novel pitch detection algorithm (PDA). Pitch detection is a classical problem that has been investigated since the very beginning of speech processing. The novelty of the proposed method lies in establishing an empirical relationship between the fundamental frequency (f_0) and the instantaneous frequency (f_i), which serves as the basis for the proposed PDA. Even though f_0 and f_i are defined as attributes of two different transforms, namely the Fourier transform and the Hilbert transform, respectively, the relationship proposed in this paper shows some interaction between them, at least empirically. The first step of this work is to validate the proposed relationship on a large set of speech signals. The relationship is then used to develop an algorithm capable of (a) detecting the voiced and unvoiced parts of speech and (b) extracting the f_0 contour from the f_i values in the voiced parts. For evaluation, the resulting f_0 contour is compared with those of several well-rated state-of-the-art PDAs. The main findings show that the quality of pitch detection obtained by the proposed technique is as satisfactory as that of some of the top PDAs, in both clean and simulated noisy speech. In addition, one of its main advantages is that it bypasses the traditional short-time analysis required to assume local stationarity in the speech signal.
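As a rough illustration of the quantity the abstract builds on, the sketch below computes an instantaneous frequency from the analytic signal, i.e. the signal plus j times its Hilbert transform. It is not the authors' algorithm: the abstract does not state the empirical relationship between f_0 and f_i, so none is assumed here. The function names, the 8 kHz sample rate, and the 120 Hz test tone are illustrative assumptions only.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: keep DC (and Nyquist), double the
    positive frequencies, zero the negative ones."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_frequency(x, sample_rate):
    """Instantaneous frequency in Hz: discrete derivative of the
    unwrapped phase of the analytic signal, divided by 2*pi."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * sample_rate / (2.0 * np.pi)


# Illustrative input (an assumption, not data from the paper):
# a pure 120 Hz tone, 1 second long, sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 120.0 * t)

f_i = instantaneous_frequency(tone, sample_rate)
median_f_i = float(np.median(f_i))
print(f"median instantaneous frequency: {median_f_i:.2f} Hz")
```

For a single sinusoid the instantaneous frequency simply equals the tone frequency. Voiced speech contains many harmonics, so f_i generally does not sit at f_0 sample by sample, which is presumably why the paper relates the two empirically rather than equating them.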


Year

2022

Appears in the categories:

01.01 - Journal article

Files in this item:

File	Size	Format
s00034-022-02082-8.pdf (closed access; description: journal article; type: publisher's version)	2.27 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1105455
