Multiple Object Category Detection and Localization Using Generative and Discriminative Models

Dipankar DAS  Yoshinori KOBAYASHI  Yoshinori KUNO  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E92-D   No.10   pp.2112-2121
Publication Date: 2009/10/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E92.D.2112
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Image Recognition, Computer Vision
Keyword: object detection and localization, SVM, pLSA, merging feature, context information

Summary: 
This paper proposes an integrated approach to the simultaneous detection and localization of multiple object categories using both generative and discriminative models. Our approach first generates a set of hypotheses for each object category using a generative model (pLSA), with each object represented as a bag of visual words. The pLSA model automatically fits an optimal number of topics based on the variation of objects within a category. The discriminative part then verifies each hypothesis using a multi-class SVM classifier with merging features that combine the spatial shape and appearance of an object. In the post-processing stage, environmental context information is used together with the probabilistic output of the SVM classifier to improve the overall performance of the system. Our integrated approach with merging features and context information allows reliable detection and localization of various object categories in the same image. The performance of the proposed framework is evaluated on various standard datasets (MIT-CSAIL, UIUC, TUD, etc.) and on the authors' own datasets. In experiments, we achieved results superior to those of some state-of-the-art methods on a number of standard datasets. An extensive experimental evaluation on up to ten diverse object categories over thousands of images demonstrates that our system can detect and localize multiple objects within an image in the presence of cluttered background, substantial occlusion, and significant scale changes.
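
As an illustration of the generative stage only, the following is a minimal sketch of standard pLSA fitted by EM on a matrix of visual-word counts. It is not the authors' implementation: the function names `plsa` and `normalize`, the fixed `n_topics` argument, and the toy count matrix are assumptions made for this example, and the sketch covers neither the automatic selection of the number of topics nor the SVM verification and context stages described in the summary.

```python
import random


def normalize(row):
    """Scale a list of non-negative values so that it sums to 1."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)  # fall back to uniform
    return [v / total for v in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit standard pLSA with EM.

    counts[d][w] is how often visual word w occurs in image (document) d.
    Returns (p_w_z, p_z_d): P(word | topic) and P(topic | document).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random, strictly positive initialization of both distributions.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    mass = counts[d][w] * post[z]
                    new_w_z[z][w] += mass
                    new_z_d[d][z] += mass
        # M-step: renormalize the expected counts.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


# Toy example (hypothetical data): 4 images, 4 visual words, 2 topics.
toy_counts = [
    [5, 4, 0, 0],
    [4, 6, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 6, 4],
]
p_w_z, p_z_d = plsa(toy_counts, n_topics=2, n_iter=30)
```

In a pipeline of the kind the summary describes, the per-image topic distribution `p_z_d` could serve to propose candidate object hypotheses, which a separate discriminative classifier would then verify.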