%0 Generic
%D 2014
%T Detect What You Can: Detecting and Representing Objects using Holistic Models and Body Parts.
%A Xianjie Chen
%A Roozbeh Mottaghi
%A Xiaobai Liu
%A Sanja Fidler
%A Raquel Urtasun
%A Alan Yuille
%X

Detecting objects becomes difficult when we need to deal with large shape deformation, occlusion and low resolution. We propose a novel approach to i) handle large deformations and partial occlusions in animals (as examples of highly deformable objects), ii) describe them in terms of body parts, and iii) detect them when their body parts are hard to detect (e.g., animals depicted at low resolution). We represent the holistic object and body parts separately and use a fully connected model to arrange templates for the holistic object and body parts. Our model automatically decouples the holistic object or body parts from the model when they are hard to detect. This enables us to represent a large number of holistic object and body part combinations to better deal with different “detectability” patterns caused by deformations, occlusion and/or low resolution.
We apply our method to the six animal categories in the PASCAL VOC dataset and show that it significantly improves the state of the art (by 4.1% AP) and provides a richer representation for objects. During training we use annotations for body parts (e.g., head, torso), making use of a new dataset of fully annotated object parts for PASCAL VOC 2010, which provides a mask for each part.

%8 06/2014
%1 arXiv:1406.2031
%2 http://hdl.handle.net/1721.1/100179
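
The abstract above hinges on a scoring model in which the holistic-object and body-part templates can each be “decoupled” when their local evidence is weak. Below is a minimal Python sketch of that decoupling idea only, under strong simplifying assumptions: the fully connected pairwise (spatial consistency) terms are omitted, and every name, score, and bias is a hypothetical illustration, not the authors' actual model or code.

# Minimal sketch (hypothetical names/values, not the authors' code):
# each template is kept if its local evidence beats the cost of
# dropping ("decoupling") it, so detection degrades gracefully under
# occlusion or low resolution instead of failing outright.
from dataclasses import dataclass

@dataclass
class Template:
    name: str             # e.g., "holistic", "head", "torso"
    decouple_bias: float  # score paid when this template is dropped

def score_hypothesis(templates, local_scores):
    """Score one detection hypothesis, decoupling weak templates.

    local_scores maps template name -> appearance score at the
    hypothesized location (higher means stronger evidence).
    """
    total, kept = 0.0, []
    for t in templates:
        s = local_scores.get(t.name, float("-inf"))
        if s >= t.decouple_bias:   # evidence beats the decoupling cost
            total += s
            kept.append(t.name)
        else:                      # decouple: pay the fixed bias instead
            total += t.decouple_bias
    return total, kept

# Toy usage: the head is visible but the torso is occluded.
templates = [Template("holistic", -0.5), Template("head", -0.3),
             Template("torso", -0.3)]
scores = {"holistic": 1.2, "head": 0.9, "torso": -2.0}
total, kept = score_hypothesis(templates, scores)
print(round(total, 2), kept)  # 1.8 ['holistic', 'head'] -- torso decoupled

In the paper the templates and their relations are learned jointly; this toy scorer only demonstrates the graceful-degradation mechanism, where a hard-to-detect part is dropped for a fixed cost instead of vetoing the whole detection.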

%0 Generic
%D 2014
%T Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding.
%A Roozbeh Mottaghi
%A Sanja Fidler
%A Alan Yuille
%A Raquel Urtasun
%A Devi Parikh
%X

Recent trends in image understanding have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning, and local appearance-based classifiers. In this work, we are interested in understanding the roles of these different tasks in improved scene understanding, in particular semantic segmentation, object detection, and scene recognition. Towards this goal, we “plug in” human subjects for each of the various components in a state-of-the-art conditional random field model. Comparisons among various hybrid human-machine CRFs give us indications of how much “head room” there is to improve scene understanding by focusing research efforts on various individual tasks.

%8 06/2014
%1 arXiv:1406.3906
%2 http://hdl.handle.net/1721.1/100184
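
The abstract above describes measuring “head room” by replacing individual machine components of a holistic CRF with human subjects. Below is a minimal Python sketch of that substitution logic under strong simplifying assumptions: regions are labeled independently (the CRF's pairwise and higher-order terms are omitted), and the component names, weights, and toy arrays are hypothetical, not the authors' model.

# Minimal sketch (hypothetical components/weights, not the authors' CRF):
# regions are labeled by a weighted sum of component potentials, and any
# component can be swapped for human-provided scores ("plugged in") to
# estimate the head room from perfecting that component alone.
import numpy as np

LABELS = ["background", "cat", "dog"]  # label indices 0, 1, 2

def label_regions(potentials, weights, overrides=None):
    """potentials: dict component -> (n_regions, n_labels) score array.
    overrides: dict component -> human-provided array of the same shape.
    Pairwise smoothness/context terms are omitted for brevity."""
    overrides = overrides or {}
    combined = sum(weights[c] * overrides.get(c, p)
                   for c, p in potentials.items())
    return combined.argmax(axis=1)  # best label per region

# Toy data: 2 regions, 3 labels.
machine = {
    "appearance": np.array([[0.6, 0.3, 0.1], [0.2, 0.4, 0.4]]),
    "detection":  np.array([[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]),
    "scene":      np.array([[0.4, 0.3, 0.3], [0.4, 0.3, 0.3]]),
}
weights = {"appearance": 1.0, "detection": 1.0, "scene": 0.5}

machine_only = label_regions(machine, weights)                 # [0, 2]
# "Plug in" a human for the detection component only:
human_det = {"detection": np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])}
hybrid = label_regions(machine, weights, overrides=human_det)  # [0, 1]
print(machine_only, hybrid)  # compare each to ground truth for head room

Comparing the machine-only and hybrid labelings against ground truth, one swapped component at a time, is what indicates which task would most improve scene understanding; the sketch isolates just that swap.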