Details

Title

Deep Belief Neural Networks and Bidirectional Long-Short Term Memory Hybrid for Speech Recognition

Journal title

Archives of Acoustics

Year

2015

Number

No 2

Divisions of PAS

Technical Sciences

Abstract

This paper describes a hybrid of a Deep Belief Neural Network (DBNN) and a Bidirectional Long Short-Term Memory (BLSTM) network used as an acoustic model for speech recognition. Many independent researchers have demonstrated that DBNNs achieve higher speech recognition accuracy than other known machine learning frameworks. Their superiority stems from the fact that they are deep learning networks. However, a trained DBNN is simply a feed-forward network with no internal memory, unlike Recurrent Neural Networks (RNNs), which are Turing complete and do possess internal memory, allowing them to make use of longer context. In this paper, an experiment is performed in which a DBNN is combined with an advanced bidirectional RNN that processes its output. Results show that using the new DBNN-BLSTM hybrid as the acoustic model for Large Vocabulary Continuous Speech Recognition (LVCSR) increases word recognition accuracy. However, the new model has many parameters and, in some cases, may suffer from performance issues in real-time applications.
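The data flow described in the abstract can be illustrated with a minimal forward-pass sketch. It assumes that the full DBNN output vector for each frame is fed directly as the BLSTM input and that a softmax layer on top produces per-frame state posteriors. The layer sizes, the random (untrained, not RBM-pretrained) weights, the omission of biases, and every function name (`dbnn_forward`, `LSTMCell`, `blstm`, `hybrid_forward`) are hypothetical choices for illustration only. This is not the authors' implementation. The key point is that the DBNN sees each frame in isolation, while the backward LSTM lets every frame's output depend on future frames as well.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng):
    # Hypothetical small random weights; a real DBNN would be RBM-pretrained and fine-tuned.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def dbnn_forward(layers, frame):
    # Feed-forward pass: each frame is processed independently, with no memory.
    h = frame
    for w in layers:
        h = [sigmoid(v) for v in matvec(w, h)]
    return h

class LSTMCell:
    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # Input, forget, output and candidate weights over [input, previous hidden] (no biases).
        self.w = {g: rand_matrix(n_hidden, n_in + n_hidden, rng) for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(v) for v in matvec(self.w["i"], z)]
        f = [sigmoid(v) for v in matvec(self.w["f"], z)]
        o = [sigmoid(v) for v in matvec(self.w["o"], z)]
        g = [math.tanh(v) for v in matvec(self.w["c"], z)]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c

def run_lstm(cell, seq):
    h, c = [0.0] * cell.n_hidden, [0.0] * cell.n_hidden
    out = []
    for x in seq:
        h, c = cell.step(x, h, c)
        out.append(h)
    return out

def blstm(fwd, bwd, seq):
    # One LSTM runs forward in time, the other over the reversed sequence;
    # their hidden states are concatenated frame by frame.
    f_out = run_lstm(fwd, seq)
    b_out = run_lstm(bwd, seq[::-1])[::-1]
    return [hf + hb for hf, hb in zip(f_out, b_out)]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def hybrid_forward(frames, dbnn_layers, fwd, bwd, w_out):
    dbnn_out = [dbnn_forward(dbnn_layers, f) for f in frames]  # per-frame DBNN output
    context = blstm(fwd, bwd, dbnn_out)                        # BLSTM processes the DBNN output
    return [softmax(matvec(w_out, h)) for h in context]        # per-frame posteriors

# Usage with hypothetical sizes: 13 input features, 16 DBNN units, 8 LSTM units, 5 states.
rng = random.Random(0)
n_feat, n_dbnn, n_hidden, n_states = 13, 16, 8, 5
dbnn_layers = [rand_matrix(n_dbnn, n_feat, rng), rand_matrix(n_dbnn, n_dbnn, rng)]
fwd = LSTMCell(n_dbnn, n_hidden, rng)
bwd = LSTMCell(n_dbnn, n_hidden, rng)
w_out = rand_matrix(n_states, 2 * n_hidden, rng)
frames = [[rng.gauss(0, 1) for _ in range(n_feat)] for _ in range(20)]
posteriors = hybrid_forward(frames, dbnn_layers, fwd, bwd, w_out)
print(len(posteriors), len(posteriors[0]))  # 20 5
```

The sketch also hints at the parameter cost mentioned in the abstract: each LSTM direction adds four weight matrices of size `n_hidden × (n_in + n_hidden)` on top of the DBNN layers, and all of them must be evaluated for every frame.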

Publisher

Committee on Acoustics PAS, PAS Institute of Fundamental Technological Research, Polish Acoustical Society

Identifier

ISSN 0137-5075 ; eISSN 2300-262X

DOI

10.1515/aoa-2015-0021