Search for: [Keywords = "ASR"]

Search results

Number of results: 2

Two-Microphone Dereverberation for Automatic Speech Recognition of Polish

Mikolaj Kundegorski, Philip J.B. Jackson, Bartosz Ziółko

Archives of Acoustics | 2014 | vol. 39 | No 3 | 411-420 | DOI: 10.2478/aoa-2014-0045

Keywords: speech enhancement, reverberation, ASR, Polish

Abstract

Reverberation is a common problem for many speech technologies, such as automatic speech recognition (ASR) systems. This paper investigates the novel combination of precedence, binaural and statistical independence cues for enhancing reverberant speech, prior to ASR, under these adverse acoustical conditions when two microphone signals are available. Results of the enhancement are evaluated in terms of relevant signal measures and accuracy for both English and Polish ASR tasks. These show inconsistencies between the signal and recognition measures, although in recognition the proposed method consistently outperforms all other combinations and the spectral-subtraction baseline.

A Study on the Impact of Lombard Effect on Recognition of Hindi Syllabic Units Using CNN Based Multimodal ASR Systems

Sadasivam Uma Maheswari, A. Shahina, Ramesh Rishickesh, A. Nayeemulla Khan

Archives of Acoustics | 2020 | vol. 45 | No 3 | 419-431 | DOI: 10.24425/aoa.2020.134058

Keywords: Lombard speech, multimodal ASR, throat microphone, visual speech, Convolutional Neural Network, Hidden Markov Model, late fusion, intermediate fusion

Abstract

Research on the design of robust multimodal speech recognition systems that use acoustic and visual cues, extracted using relatively noise-robust alternate speech sensors, has recently been gaining interest in the speech processing research community. The primary objective of this work is to study the exclusive influence of the Lombard effect on the automatic recognition of the confusable syllabic consonant-vowel units of the Hindi language, as a step towards building robust multimodal ASR systems for adverse environments in the context of Indian languages, which are syllabic in nature. The dataset for this work comprises the 145 confusable consonant-vowel (CV) syllabic units of the Hindi language, recorded simultaneously using three modalities that capture the acoustic and visual speech cues: a normal acoustic microphone (NM), a throat microphone (TM), and a camera that captures the associated lip movements. The Lombard effect is induced by feeding crowd noise into the speaker’s headphones while recording. Convolutional Neural Network (CNN) models are built to categorise the CV units based on their place of articulation (POA), manner of articulation (MOA), and vowels, under clean and Lombard conditions. For validation purposes, corresponding Hidden Markov Models (HMM) are also built and tested. Unimodal Automatic Speech Recognition (ASR) systems built using each of the three speech cues from Lombard speech show a loss in recognition of MOA and vowels, while POA recognition improves in all the systems due to the Lombard effect. Combining the three complementary speech cues to build bimodal and trimodal ASR systems shows that the recognition loss due to the Lombard effect for MOA and vowels is reduced compared to the unimodal systems, while POA recognition is still better due to the Lombard effect. A bimodal system is proposed that uses only alternate acoustic and visual cues and gives better discrimination of the place and manner of articulation than even a standard ASR system. Among the multimodal ASR systems studied, the proposed trimodal system based on Lombard speech gives the best recognition accuracy of 98%, 95%, and 76% for the vowels, MOA, and POA, respectively, with an average improvement of 36% over the unimodal ASR systems and a 9% improvement over the bimodal ASR systems.
