Abstract
This paper describes a Deep Belief Neural Network (DBNN) and Bidirectional
Long Short-Term Memory (BLSTM) hybrid used as an acoustic model for speech
recognition.
Recognition. It was demonstrated by many independent researchers that
DBNNs exhibit superior performance to other known machine learning
frameworks in terms of speech recognition accuracy. Their superiority
comes from the fact that these are deep learning networks. However, a
trained DBNN is simply a feed-forward network with no internal memory,
unlike Recurrent Neural Networks (RNNs) which are Turing complete and do
posses internal memory, thus allowing them to make use of longer context.
In this paper, an experiment is performed in which a DBNN is combined
with an advanced bidirectional RNN that processes its output. Results
show that using the new DBNN-BLSTM hybrid as the acoustic model for
Large Vocabulary Continuous Speech Recognition (LVCSR) increases word
recognition accuracy. However, the new model has many parameters, and in
some cases it may suffer performance issues in real-time applications.