A. Graves, N. Beringer, and J. Schmidhuber (Switzerland)
Keywords: Speech Recognition, LSTM, RNN, SNN, Timewarping
In this paper we demonstrate that Long Short-Term Memory (LSTM) is a differentiable recurrent neural network (RNN) capable of robustly categorizing time-warped speech data. We measure its performance on a spoken digit identification task, where the data was spike-encoded in such a way that classifying the utterances became a difficult challenge in non-linear time warping. We find that LSTM gives results greatly superior to those of an SNN reported in the literature, and conclude that the architecture has a place in domains that require learning from large time-warped datasets, such as automatic speech recognition.