SPEECH AND LANGUAGE TECHNOLOGIES FOR SEMANTICALLY LINKED INSTRUCTIONAL CONTENT
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: Recent advances in technology have made it possible to offer educational content online in the form of e-learning systems. The Semantically Linked Instructional Content (SLIC) system, developed at The University of Arizona, is one such system that hosts educational and technical videos online.
This dissertation proposes the integration of speech and language technologies with the SLIC system.
Speech transcripts are increasingly used in video browsing systems to help users understand the video content and to search it with text queries. Transcripts are especially useful for people with disabilities and for those who have a limited understanding of the language of the video. Automatic Speech Recognizers (ASRs) are commonly used to generate speech transcripts for videos, but their performance is inconsistent. The issue is more pronounced in a system like SLIC because the talks are technical and contain words outside the ASR vocabulary, and because the many speakers, with different voices and accents, make recognition harder.
The videos in SLIC come with presentation slides that contain words specific to the subject of the talk, and the speech transcript itself can be viewed as these slide words interspersed with other words. Furthermore, the errors in the transcript are words that sound similar to what was actually spoken, for example "notes" instead of "nodes". Errors caused by misrecognized slide words can be fixed if we know which slide words were actually spoken and where they occur in the transcript. In other words, the slide words must be matched, or aligned, with the transcript.
In this dissertation, two algorithms are developed to phonetically align transcript words with slide words, based on a hidden Markov model and a hybrid hidden semi-Markov model respectively. In both models, the slide words constitute the hidden states and the transcript words are the observations.
The alignment algorithms are adapted for different applications such as transcript correction (as already mentioned), search and indexing, video segmentation and closed captioning. Experimental results show that the corrected transcripts have improved accuracy and yield better search results for slide word queries.
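The alignment idea in the abstract can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it covers only a plain hidden Markov model decoded with Viterbi (not the hybrid hidden semi-Markov variant), it substitutes spelling similarity from Python's `difflib` for true phonetic similarity, and every name and score in it (`align`, `correct`, `filler_score`, `phrase_bonus`) is a hypothetical, hand-picked assumption rather than a trained value.

```python
import math
from difflib import SequenceMatcher


def similarity(a, b):
    """Spelling similarity in [0, 1]; a crude stand-in for phonetic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align(transcript, slide_words, filler_score=0.6, phrase_bonus=1.5):
    """Viterbi-decode one hidden state per transcript word.

    Hidden states are indices into slide_words, plus None for "not a slide word".
    All scores are unnormalized and hand-picked (assumptions for illustration).
    """
    if not transcript:
        return []
    states = [None] + list(range(len(slide_words)))

    def emit(state, word):
        score = filler_score if state is None else similarity(slide_words[state], word)
        return math.log(max(score, 1e-6))  # clamp to avoid log(0)

    def trans(prev, cur):
        # Reward continuing a slide phrase: slide word j right after slide word j-1.
        if prev is not None and cur is not None and cur == prev + 1:
            return math.log(phrase_bonus)
        return 0.0

    best = {s: emit(s, transcript[0]) for s in states}
    backpointers = []
    for word in transcript[1:]:
        new_best, pointers = {}, {}
        for cur in states:
            prev = max(states, key=lambda p: best[p] + trans(p, cur))
            new_best[cur] = best[prev] + trans(prev, cur) + emit(cur, word)
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)

    # Trace back the highest-scoring state sequence.
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [None if s is None else slide_words[s] for s in path]


def correct(transcript, slide_words):
    """Replace each transcript word that aligned to a slide word with that slide word."""
    aligned = align(transcript, slide_words)
    return [slide or word for word, slide in zip(transcript, aligned)]
```

For instance, `correct("each graph has notes with edges".split(), ["graph", "nodes", "edges"])` replaces the misrecognized "notes" with the slide word "nodes" and leaves the other words unchanged. A real system would compare phoneme sequences rather than spellings and would estimate the transition and emission probabilities from data.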
Degree Program: Graduate College
Degree Grantor: University of Arizona