Show simple item record

dc.contributor.advisor: Cohen, Paul [en_US]
dc.contributor.author: Mejia, Maria Helena
dc.creator: Mejia, Maria Helena [en_US]
dc.date.accessioned: 2013-01-14T19:06:06Z
dc.date.available: 2013-01-14T19:06:06Z
dc.date.issued: 2012
dc.identifier.uri: http://hdl.handle.net/10150/265361
dc.description.abstract: The goal of human action recognition in videos is to determine automatically what is happening in a video. This work addresses the question: given consecutive frames of a video in which one or more persons are performing an action, can an automatic system recognize the action each person is performing? Seven approaches are presented, most of them based on an alignment process that yields a distance or similarity measure used for classification. Some are based on fluents converted to qualitative sequences of Allen relations, so that the distance between a pair of sequences can be measured by aligning them. The fluents are generated in several ways: representations based on feature extraction of human-pose propositions from a single image or a short sequence of images, changes in the time series mainly in the angle of slope, changes in the time series focused on the slope direction, and propositions based on symbolic sequences generated by SAX. Another alignment-based approach applies Dynamic Time Warping to subsets of highly dependent body parts. A further approach is based on SAX symbolic sequences and their pairwise alignment. The last approach discretizes the multivariate time series, but instead of alignment uses a spectrum kernel and an SVM, as employed to classify protein sequences in biology. Finally, a sliding-window method is used to recognize actions along the video. These approaches were tested on three datasets derived from RGB-D cameras (e.g., Microsoft Kinect) as well as ordinary video, and a selection of the approaches was compared with the results of other researchers. [en_US]
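One of the alignment-based approaches the abstract mentions applies Dynamic Time Warping to time series of body-part features. As a minimal illustration of the general technique (a textbook sketch, not the dissertation's implementation; the series and function name here are hypothetical), the classic DTW distance between two 1-D series can be computed with dynamic programming:

```python
# Minimal sketch of classic Dynamic Time Warping (DTW) between two
# 1-D time series, e.g. joint-angle trajectories from a pose tracker.
# This is a generic illustration, not the method used in the dissertation.
def dtw_distance(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]
```

Because the warping path may repeat elements, a series performed at a different speed (e.g. `[1, 2, 2, 3]` versus `[1, 2, 3]`) still aligns at zero cost; this tolerance to temporal variation is what makes DTW attractive for comparing action executions of different durations.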
dc.language.iso: en [en_US]
dc.publisher: The University of Arizona. [en_US]
dc.rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. [en_US]
dc.subject: Gesture Recognition [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Computer Science [en_US]
dc.subject: Activity recognition [en_US]
dc.subject: Computer vision [en_US]
dc.title: Human Action Recognition on Videos: Different Approaches [en_US]
dc.type: text [en_US]
dc.type: Electronic Dissertation [en_US]
thesis.degree.grantor: University of Arizona [en_US]
thesis.degree.level: doctoral [en_US]
dc.contributor.committeemember: Downey, Peter [en_US]
dc.contributor.committeemember: Barnard, Jacobus [en_US]
dc.contributor.committeemember: Morrison, Clayton [en_US]
dc.contributor.committeemember: Cohen, Paul [en_US]
thesis.degree.discipline: Graduate College [en_US]
thesis.degree.discipline: Computer Science [en_US]
thesis.degree.name: Ph.D. [en_US]
refterms.dateFOA: 2018-09-04T00:58:50Z


Files in this item

Name: azu_etd_12455_sip1_m.pdf
Size: 5.284 MB
Format: PDF

This item appears in the following Collection(s)
