Automatic Transcription of Audio Archives for Spoken Document Retrieval

P. Ircing, J. Psutka, and V. Radová (Czech Republic)


speech recognition, spoken document retrieval


The paper presents a system for automatic transcription of spontaneous Czech speech for spoken document retrieval. It focuses mainly on a language modeling technique that improves the recognition of words belonging to categories that have proved important for information retrieval: the named entities. Using a specially designed class language model based on morphological tags, we reduced the word error rate on named entities by 10% to 25% relative to the baseline word-based language model. The overall word error rate was also reduced by approximately 3% relative.
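The abstract does not spell out the structure of the class model. As a rough illustration only, the textbook class-based bigram factorizes a word bigram as P(w_i | w_{i-1}) = P(c_i | c_{i-1}) · P(w_i | c_i), where c_i is the class of w_i. The sketch below assumes each word belongs to exactly one class (here hypothetical tags `PREP` and `PROPN` standing in for morphological tags) and uses unsmoothed maximum-likelihood counts. The class `ClassBigramLM` and the toy corpus are hypothetical; this is not the authors' specially designed model.

```python
from collections import Counter


class ClassBigramLM:
    """Hypothetical class-based bigram LM (not the paper's model).

    P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i), with each word mapped
    to exactly one class; unsmoothed maximum-likelihood estimates.
    """

    def __init__(self, sentences, word_to_class):
        self.word_to_class = word_to_class
        self.class_bigrams = Counter()   # counts of (previous class, class)
        self.class_history = Counter()   # counts of a class used as history
        self.word_counts = Counter()     # counts of each word
        self.class_counts = Counter()    # counts of each class (emissions)
        for sent in sentences:
            classes = ["<s>"] + [word_to_class[w] for w in sent]
            for prev, cur in zip(classes, classes[1:]):
                self.class_bigrams[(prev, cur)] += 1
                self.class_history[prev] += 1
            for w in sent:
                self.word_counts[w] += 1
                self.class_counts[word_to_class[w]] += 1

    def prob(self, prev_word, word):
        """Return P(word | prev_word); prev_word=None means sentence start."""
        prev_class = "<s>" if prev_word is None else self.word_to_class[prev_word]
        cur_class = self.word_to_class[word]
        if self.class_history[prev_class] == 0 or self.class_counts[cur_class] == 0:
            return 0.0
        p_class = self.class_bigrams[(prev_class, cur_class)] / self.class_history[prev_class]
        p_word = self.word_counts[word] / self.class_counts[cur_class]
        return p_class * p_word


# Toy usage with hypothetical tags; "to Prague" never occurs in training.
tags = {"in": "PREP", "to": "PREP", "Prague": "PROPN", "Brno": "PROPN"}
corpus = [["in", "Prague"], ["to", "Brno"], ["in", "Brno"]]
lm = ClassBigramLM(corpus, tags)
print(lm.prob("to", "Prague"))  # unseen word bigram, nonzero via classes
print(lm.prob(None, "in"))
```

The unseen bigram "to Prague" still receives probability 1/3, because the model only needs the class transition `PREP` to `PROPN` and the within-class frequency of "Prague". This kind of generalization is one plausible reason a class model can help with rare named entities, although the paper's actual gains depend on its own class design.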
