Show simple item record

dc.contributor.advisor: Barnard, Kobus (en_US)
dc.contributor.author: Del Pero, Luca
dc.creator: Del Pero, Luca (en_US)
dc.date.accessioned: 2013-07-25T15:51:39Z
dc.date.available: 2013-07-25T15:51:39Z
dc.date.issued: 2013
dc.identifier.uri: http://hdl.handle.net/10150/297040
dc.description.abstract: People can understand the content of an image without effort. We can easily identify the objects in it and figure out where they are in the 3D world. Automating these abilities is critical for many applications, such as robotics, autonomous driving, and surveillance. Unfortunately, despite recent advances, fully automated vision systems for image understanding do not exist. In this work, we present progress restricted to the domain of images of indoor scenes, such as bedrooms and kitchens. These environments typically have the "Manhattan" property that most surfaces are parallel to one of three principal directions. Further, the 3D geometry of a room and the objects within it can be approximated with simple geometric primitives, such as 3D blocks. Our goal is to reconstruct the 3D geometry of an indoor environment while also understanding its semantic meaning, by identifying the objects in the scene, such as beds and couches. We separately model the 3D geometry, the camera, and an image likelihood, to provide a generative statistical model for image data. Our representation captures the rich structure of an indoor scene by explicitly modeling the contextual relationships among its elements, such as the typical size of objects and their arrangement in the room, and simple physical constraints, such as the fact that 3D objects do not intersect. This ensures that the predicted image interpretation is globally coherent, both geometrically and semantically, which helps resolve the ambiguities caused by projecting a 3D scene onto an image, such as occlusions and foreshortening. We fit this model to images using MCMC sampling (see the sketch following this record). Our inference method combines bottom-up evidence from the data and top-down knowledge of the 3D world in order to explore the vast output space efficiently. Comprehensive evaluation confirms our intuition that global inference over the entire scene is more effective than estimating its individual elements independently. Further, our experiments show that our approach is competitive with, and often exceeds, state-of-the-art methods.
dc.language.iso: en (en_US)
dc.publisher: The University of Arizona. (en_US)
dc.rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. (en_US)
dc.subject: Bayesian inference (en_US)
dc.subject: Computer Vision (en_US)
dc.subject: Indoor scenes (en_US)
dc.subject: Object recognition (en_US)
dc.subject: Scene understanding (en_US)
dc.subject: Computer Science (en_US)
dc.subject: 3D reconstruction (en_US)
dc.title: Top-Down Bayesian Modeling and Inference for Indoor Scenes (en_US)
dc.type: text (en_US)
dc.type: Electronic Dissertation (en_US)
thesis.degree.grantor: University of Arizona (en_US)
thesis.degree.level: doctoral (en_US)
dc.contributor.committeemember: Barnard, Kobus (en_US)
dc.contributor.committeemember: Cohen, Paul (en_US)
dc.contributor.committeemember: Efrat, Alon (en_US)
dc.contributor.committeemember: Morrison, Clayton (en_US)
thesis.degree.discipline: Graduate College (en_US)
thesis.degree.discipline: Computer Science (en_US)
thesis.degree.name: Ph.D. (en_US)
refterms.dateFOA: 2018-06-19T03:49:41Z
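
The abstract above describes fitting a generative scene model to image data with MCMC sampling, combining a prior over 3D scenes (top-down knowledge) with an image likelihood (bottom-up evidence). The following is a minimal sketch of that fitting idea, assuming a hypothetical one-parameter "scene" with a Gaussian prior and Gaussian likelihood; none of the names, constants, or distributions below come from the dissertation:

```python
import math
import random

# Purely illustrative toy model -- not the dissertation's actual scene
# representation. The "scene" is reduced to one scalar parameter theta
# (say, a room width), with a Gaussian prior p(theta) and a Gaussian
# image likelihood p(I | theta) around a stand-in observation.

OBSERVED = 3.2                     # stand-in for bottom-up image evidence
PRIOR_MEAN, PRIOR_SD = 3.0, 1.0    # top-down knowledge of typical scenes
LIK_SD = 0.5

def log_posterior(theta):
    """log p(theta | I) up to an additive constant: log prior + log likelihood."""
    log_prior = -0.5 * ((theta - PRIOR_MEAN) / PRIOR_SD) ** 2
    log_lik = -0.5 * ((OBSERVED - theta) / LIK_SD) ** 2
    return log_prior + log_lik

def metropolis(n_steps=10_000, step_sd=0.3):
    """Metropolis sampling with a symmetric Gaussian random-walk proposal."""
    theta = PRIOR_MEAN
    samples = []
    for _ in range(n_steps):
        proposal = theta + random.gauss(0.0, step_sd)
        log_ratio = log_posterior(proposal) - log_posterior(theta)
        # Accept with probability min(1, p(proposal | I) / p(theta | I)).
        if log_ratio >= 0 or random.random() < math.exp(log_ratio):
            theta = proposal
        samples.append(theta)
    return samples

if __name__ == "__main__":
    draws = metropolis()
    kept = draws[len(draws) // 2:]   # discard the first half as burn-in
    print("posterior mean estimate:", sum(kept) / len(kept))
    # The conjugate closed form gives a posterior mean of 3.16 for these
    # constants, so the estimate should land close to that.
```

In the dissertation's setting the state is far richer (the room, the object blocks, and the camera) and the proposals are guided by image evidence, but the accept/reject step over a posterior is the same basic mechanism.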


Files in this item

Name: azu_etd_12819_sip1_m.pdf
Size: 11.85 MB
Format: PDF
