Syllables and Concepts in Large Vocabulary Speech Recognition


Please use this identifier to cite or link to this item: http://hdl.handle.net/1928/10865

Title: Syllables and Concepts in Large Vocabulary Speech Recognition
Author: De Palma, Paul A., 1947
Advisor(s): Luger, George
Committee Member(s): Croft, William; Smith, Caroline; Wooters, Charles
Department: University of New Mexico. Dept. of Linguistics
Subject: computational linguistics; computer science; speech recognition; language model
LC Subject(s): Automatic speech recognition; Speech processing systems; Computational linguistics
Degree Level: Doctoral
Abstract: Transforming an acoustic signal to words is the gold standard in automatic speech recognition. Orthographic transcription is a valuable technique for comparing speech recognition systems without respect to application, but it is not something that human beings do with their language partners. In fact, transforming speech into words is not necessary to emulate human performance in many contexts. By relaxing the constraint that the output of speech recognition be words, we might at the same time relax the bias toward writing in speech recognition research. This puts our work in the camp of those who have argued over the years that speech and writing differ in significant ways. This study explores two hypotheses. The first is that a large vocabulary continuous speech recognition (LVCSR) system will perform more accurately if it is trained on syllables instead of words. Though several researchers have examined the use of syllables in the acoustic model of an LVCSR system, very little attention has been paid to their use in the language model. The second hypothesis concerns adding a post-processing component to a recognizer equipped with a syllable language model. The first step is to group words that seem to mean the same thing into equivalence classes called concepts. The second step is to insert the equivalence classes into the output of a recognizer. The hypothesis is that this concept post-processor will achieve better results than the syllable language model alone. The study reports that the perplexity of a trigram syllable language model drops by half when compared to a trigram word language model using the same training transcript. The drop in perplexity carries over to error rate: the error rate of a recognizer equipped with a syllable language model drops by over 14% when compared with one using a word language model.
Nevertheless, the study reports a slight increase in error rate when a concept post-processor is added to a recognizer equipped with a syllable language model. We conjecture that this is the result of the deterministic mapping from syllable strings to concepts. Consequently, we outline a probabilistic mapping scheme from concepts to syllable strings.
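The two post-processing steps described in the abstract can be pictured with a minimal sketch. This is not the dissertation's implementation: the concept table, the syllable spellings, the concept labels, and the function name `insert_concepts` are all hypothetical, and the greedy longest-match scan is only one assumed way to realize a deterministic mapping from syllable strings to concepts.

```python
# Hypothetical concept table: each key is a syllable string (a tuple of
# syllables) and each value is the concept label it maps to.
# These entries are invented for illustration; they are not from the study.
CONCEPTS = {
    ("ay", "want"): "WANT",
    ("ay", "wud", "layk"): "WANT",
    ("ay", "niyd"): "WANT",
    ("tuw", "gow"): "GO",
    ("tuw", "flay"): "GO",
}


def insert_concepts(syllables, concepts=CONCEPTS):
    """Replace each known syllable string with its concept label.

    Deterministic greedy scan: at each position, try the longest
    candidate syllable string first; copy unmatched syllables through.
    """
    max_len = max(len(key) for key in concepts)
    output = []
    i = 0
    while i < len(syllables):
        # Try candidate spans from longest to shortest.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = tuple(syllables[i:i + n])
            if candidate in concepts:
                output.append(concepts[candidate])
                i += n
                break
        else:
            # No concept starts here: keep the syllable unchanged.
            output.append(syllables[i])
            i += 1
    return output


recognizer_output = ["ay", "wud", "layk", "tuw", "flay", "hhowm"]
print(insert_concepts(recognizer_output))
```

A table lookup of this kind commits to a single concept for every matching syllable string and never weighs alternatives, which is the sort of hard, deterministic decision the abstract conjectures may lie behind the slight increase in error rate.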
Graduation Date: May 2010
URI: http://hdl.handle.net/1928/10865

Files in this item

File: Paul_De_Palma_4-15-10_Final.pdf
Size: 3.027Mb
Format: PDF
