
Spatial features of reverberant speech: estimation and application to recognition and diarization


If you have a question about this talk, please contact Alastair Moore.

This talk will provide an overview of Pablo’s PhD research as a prelude to his viva.

Abstract

Distant-talking scenarios, such as hands-free calling or teleconference meetings, are essential for natural and comfortable human-machine interaction and are increasingly used in many contexts. The speech signal acquired in such scenarios is reverberant and affected by additive noise. This distortion degrades the performance of speech recognition and diarization systems, making human-machine interaction troublesome.

This thesis proposes a method to non-intrusively estimate room acoustic parameters (NIRA), paying special attention to the clarity index, a room acoustic parameter that is highly correlated with speech recognition degradation. In addition, a method is proposed that provides information about the accuracy of the estimate.
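For readers unfamiliar with the clarity index, the sketch below shows its textbook definition: the ratio, in dB, of the early energy to the late energy of a room impulse response, with the split commonly placed 50 ms after the direct path. This is only the intrusive definition computed from a known impulse response; the thesis estimates the parameter non-intrusively from the reverberant speech itself, and that estimator is not reproduced here. The function name `clarity_index`, the 50 ms default, and the choice of the strongest peak as the direct path are assumptions made for illustration.

```python
import numpy as np

def clarity_index(rir, fs, early_ms=50.0):
    """Clarity index in dB: early-to-late energy ratio of a room impulse response.

    Assumes the late part of the response has non-zero energy.
    """
    rir = np.asarray(rir, dtype=float)
    # Assumption: the direct-path arrival is the strongest peak of the response.
    onset = int(np.argmax(np.abs(rir)))
    # Sample index that separates early from late reflections.
    split = onset + int(round(early_ms * 1e-3 * fs))
    early_energy = np.sum(rir[onset:split] ** 2)
    late_energy = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(early_energy / late_energy)

# Toy response: a unit direct path plus one reflection of amplitude 0.1 at 100 ms.
fs = 16000
toy_rir = np.zeros(fs // 2)
toy_rir[0] = 1.0
toy_rir[int(0.1 * fs)] = 0.1
c50 = clarity_index(toy_rir, fs)
```

In the toy response the only late energy is the single reflection, so a lower value would indicate more late reverberation relative to the direct sound.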

An analysis of phoneme recognition performance in multiple reverberant environments is presented, from which a confusability metric for each phoneme is derived. This confusability metric is then employed to improve reverberant speech recognition performance. Room acoustic parameters can also be used in speech recognition to provide robustness against reverberation, and a method that exploits clarity index estimates for reverberant speech recognition is introduced.

Finally, room acoustic parameters can also be used to diarize reverberant speech. A room acoustic parameter is proposed as an additional source of information for single-channel diarization in reverberant environments. In multi-channel environments, the time delay of arrival is a feature commonly used to diarize the input speech; however, the computation of this feature is affected by reverberation. A method is presented to model the time delay of arrival robustly so that speaker diarization is performed more accurately.
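As background on the multi-channel feature mentioned above, the following is a minimal sketch of one common way to estimate a time delay of arrival between two microphone signals: generalized cross-correlation with phase transform (GCC-PHAT). It is not the robust modelling method of the thesis, which the abstract does not detail; the function name `gcc_phat_delay` and its parameters are assumptions for illustration.

```python
import numpy as np

def gcc_phat_delay(x, y, fs, max_delay_s=None):
    """Estimate the delay of y relative to x, in seconds, with GCC-PHAT."""
    n = len(x) + len(y)  # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_delay_s is not None:
        max_shift = max(1, min(int(max_delay_s * fs), max_shift))
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs

# Synthetic check: the second channel is the first delayed by 25 samples.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
true_lag = 25
y = np.concatenate((np.zeros(true_lag), x[:-true_lag]))
estimated = gcc_phat_delay(x, y, fs)
```

In a reverberant room, reflections add competing peaks to the correlation, which is the kind of degradation the paragraph above refers to.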

Biography

Pablo Peso Parada received his B.Sc. and M.Sc. degrees in Telecommunication Engineering from the University of Vigo, Spain, in 2008 and 2011, respectively, and an M.Sc.Res degree in Signal Theory and Communications from the University of Vigo in 2013. Between July 2010 and September 2011, he was a research fellow at the University of Vigo, focusing on speech recognition confidence measures. He was a research intern at Sony, Germany, from June 2012 to November 2012, focusing on signal processing for brain-computer interfaces. Between December 2012 and March 2013, he was a research engineer at Voice INTER connect, Germany, working on embedded speech recognition. Pablo was an ESR Marie Curie fellow at Nuance Communications, Inc., United Kingdom, from April 2013 to June 2016, where he wrote his PhD thesis. His research interests include automatic speech recognition (ASR), particularly distant speech recognition, diarization and room acoustic parameter estimation.

This talk is part of the COMMSP Seminar series.
