Testing quality in interlingual respeaking and other methods of interlingual live subtitling

Pagano, Alice

doi:10.15167/pagano-alice_phd2022-07-12

Live subtitling (LS) has its foundations in pre-recorded subtitling for the d/Deaf and hard of hearing (SDH) and produces real-time subtitles for live events and programs. LS implies the transfer of oral content into written content (intersemiotic translation) and can be carried out within the same language (intralingual) or from one language into another (interlingual), thereby combining SDH with the need to guarantee multilingual access and so providing full accessibility for all. Interlingual Live Subtitling (hereafter ILS) is currently achieved through different methods. The focus here is on interlingual respeaking, one of the LS methods currently in use and also referred to in this work as speech-to-text interpreting (STTI), which has attracted growing interest in the Italian industry in recent years. This doctoral thesis provides an overview of the literature and research on intralingual and interlingual respeaking to date, with emphasis on the current state of the practice in Italy. The aim of the research was to explore the strengths and weaknesses of different ILS methods, in an attempt to inform the industry of the impact that the potential and the risks of each technique can have on the overall quality of the final subtitles. To this end, five ILS workflows requiring different degrees of human-machine interaction were tested in terms of quality, considering not only linguistic accuracy but also another crucial factor: the delay with which the subtitles are broadcast. Two case studies were carried out with different language pairs. The first experiment (English to Italian) tested and assessed quality in three methods: interlingual respeaking, simultaneous interpreting (SI) combined with intralingual respeaking, and SI combined with Automatic Speech Recognition (ASR).
A second experiment (Spanish to Italian) evaluated and compared all five methods: the first three again, plus two more machine-centered ones, namely intralingual respeaking combined with machine translation (MT), and ASR combined with MT. Two workshops in interlingual respeaking were offered within the master's degree in Translation and Interpreting at the University of Genova. They served to prepare students for the experiments and to test different ILS training modules and their effectiveness on students' learning outcomes. For the final experiments, students were assigned a different role for each method tested and performed the corresponding tasks, producing ILS from the same source text: a video of a full original speech given at a live event. The outputs were analyzed using the NTR model (Romero-Fresco & Pöchhacker, 2017), and the delay was calculated for each method. Preliminary quantitative results from the NTR analyses and the delay calculations were compared with two other case studies conducted by the University of Vigo and the University of Surrey. The comparison shows that the more automated and the fully automated workflows are indeed faster than the others, but still present several important issues in translation and punctuation. Albeit on a small scale, the research also shows how urgent, and potentially how easy, it could be to train translators and interpreters in respeaking during their studies, given their keen interest in the subject. It is hoped that the results can shed more light on the repercussions of using different methods and prompt further reflection on the importance of human interaction with automatic systems in providing high-quality accessibility at live events. It is also hoped that the interest shown by the students involved, to whom this field was completely unknown before this research, can draw attention to the urgency of raising students' awareness of, and competence in, live subtitling through respeaking.
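The two quality measures named above, NTR accuracy and delay, can be illustrated with a minimal sketch. It assumes the NTR accuracy rate as commonly described for the model, (N - T - R) / N x 100, with error weights of 0.25 (minor), 0.5 (major) and 1 (critical). The abstract does not restate the formula or describe how delay was measured, so the function names, the weight table and the per-subtitle timestamp lists are hypothetical, and the sketch is not the thesis's actual procedure.

```python
# Severity weights as commonly described for the NTR model.
# Assumption: the abstract itself does not list them.
WEIGHTS = {"minor": 0.25, "major": 0.5, "critical": 1.0}


def ntr_accuracy(n_words, translation_errors, recognition_errors):
    """Return the NTR accuracy rate as a percentage.

    n_words: number of words in the subtitles (N).
    translation_errors / recognition_errors: lists of severity labels
    ("minor", "major", "critical") for T and R errors respectively.
    """
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    t = sum(WEIGHTS[severity] for severity in translation_errors)
    r = sum(WEIGHTS[severity] for severity in recognition_errors)
    return (n_words - t - r) / n_words * 100


def mean_delay(spoken_times, displayed_times):
    """Average delay in seconds between speech and subtitle display.

    Assumption: each subtitle is paired with the time at which the
    corresponding source segment was spoken.
    """
    if not spoken_times or len(spoken_times) != len(displayed_times):
        raise ValueError("need two non-empty lists of equal length")
    delays = [shown - spoken for spoken, shown in zip(spoken_times, displayed_times)]
    return sum(delays) / len(delays)


if __name__ == "__main__":
    # Invented figures, for illustration only.
    accuracy = ntr_accuracy(200, ["minor", "major"], ["critical"])
    delay = mean_delay([0.0, 5.0], [4.0, 11.0])
    print(f"NTR accuracy: {accuracy:.2f}%  mean delay: {delay:.1f}s")
```

Reporting the two figures side by side for each workflow reflects the trade-off described in the abstract: a more automated workflow may show a shorter delay and a lower accuracy rate at the same time.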

Live subtitling (LS) has its foundations in pre-recorded subtitling for the deaf and hard of hearing and produces subtitles for live events or television programs. Live subtitling involves the transfer of oral content into written content (intersemiotic translation) and can be carried out within the same language (intralingual) or from one language into another (interlingual), thus providing accessibility for deaf audiences while also guaranteeing multilingual access to audiovisual content. Interlingual live subtitling (hereafter ILS, Interlingual Live Subtitling) is currently produced with several methods. The focus here is on interlingual respeaking, one of the methods of live subtitling or speech-to-text interpreting (STTI), which has attracted growing interest in recent years, including in Italy. This doctoral thesis aims to provide an overview of the literature and research on intralingual and interlingual respeaking to date, with particular emphasis on the current state of the practice in Italy. The aim of the research was to explore different ILS methods, highlighting their strengths and weaknesses in an attempt to inform the industry of the potential and the risks that the use of different techniques can bring to the overall final quality of the subtitles. To this end, a total of five ILS methods with varying degrees of human-machine interaction were tested. Each method was analyzed in terms of quality, that is, not only from the point of view of linguistic accuracy but also considering another crucial factor: the delay in the transmission of the subtitles.
Two case studies with different language pairs were conducted in the course of the research. The first experiment (English to Italian) tested and assessed the quality of interlingual respeaking, of simultaneous interpreting combined with intralingual respeaking and, finally, of simultaneous interpreting combined with automatic speech recognition (ASR). The second experiment (Spanish to Italian) evaluated and compared the five methods: the three just mentioned and two others in which the machine carried out most, if not all, of the work, namely intralingual respeaking with machine translation (MT), and ASR with MT. Two interlingual respeaking workshops were offered within the master's degree in Translation and Interpreting at the University of Genova to prepare students for the experiments; the workshops were designed to test different ILS training modules and their effectiveness on students' learning. During the test phases, students were assigned a different role for each method and produced live interlingual subtitles from the same source text: a video of a full original speech given at a live event. The resulting transcripts, in the form of subtitles, were analyzed using the NTR model (Romero-Fresco & Pöchhacker, 2017), and the delay was also calculated for each method. The preliminary quantitative results from the NTR analyses and the delay calculations were compared with two other case studies conducted by the University of Vigo (Spain) and the University of Surrey (United Kingdom). The comparison underlines that the more automated or fully automated workflows are indeed faster than the others, but at the same time still present several translation and punctuation problems.
Albeit on a small scale, the research also shows how urgent, and potentially how easy, it could be to train translators and interpreters in respeaking during their academic studies, thanks in part to their keen interest in the subject. It is hoped that the results can shed more light on the repercussions of using the different methods compared here, and prompt further reflection on the importance of human interaction with automatic translation and speech recognition systems in providing high-quality accessibility at live events. It is also hoped that the students' interest in this field, which was completely unknown to them before this research, can draw attention to the urgency of raising students' awareness of live subtitling through respeaking.