Who Moved My Slide? Recognizing Entities In A Lecture Video And Its Applications
AuthorTung, Qiyam Junn
MetadataShow full item record
PublisherThe University of Arizona.
RightsCopyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
AbstractLecture videos have proliferated in recent years thanks to the increasing bandwidths of Internet connections and availability of video cameras. Despite the massive volume of videos available, there are very few systems that parse useful information from them. Extracting meaningful data can help with searching and indexing of lecture videos as well as improve understanding and usability for the viewers. While video tags and user preferences are good indicators for relevant videos, it is completely dependent on human-generated data. Furthermore, many lecture videos are technical by nature and sparse video tags are too coarse-grained to relate parts of a video by a specific topic. While extracting the text from the presentation slide will ameliorate this issue, a lecture video still contains significantly more information than what is just available on the presentation slides. That is, the actions and words of the speaker contribute to a richer and more nuanced understanding of the lecture material. The goal of the Semantically Linked Instructional Content (SLIC) project is to relate videos using more specific and relevant features such as slide text and other entities. In this work, we will present the algorithms used to recognize the entities of the lecture. Specifically, the entities in lecture videos are the laser and pointing hand gestures and the location of the slide and its text and images in the video. Our algorithms work under the assumption that the slide location (homography) is known for each frame and extend the knowledge of the scene. Specifically, gestures inform when and where on a slide notable events occur. We will also show how recognition of these entities can help with understanding lectures better and energy-savings on mobile devices. We conducted a user study that shows that magnifying text based on laser gestures on a slide helps direct a viewer's attention to the relevant text. We also performed empirical measurements on real cellphones to confirm that selectively dimming less relevant regions of the video frame would reduce energy consumption significantly.
Degree ProgramGraduate College