Real-Time Counting People in Crowded Areas by Using Local Empirical Templates and Density Ratios
IEICE TRANSACTIONS on Information and Systems Vol.E95-D No.7 pp.1791-1803
Publication Date: 2012/07/01
Online ISSN: 1745-1361
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Machine Vision and its Applications)
Keywords: local empirical templates, local density ratios, density ratio bounds, people counting
In this paper, we propose a fast, automated method for counting pedestrians in crowded areas, with three contributions. First, we propose Local Empirical Templates (LET), which outline the foregrounds typically produced by single pedestrians in a scene. LET are extracted by clustering the foregrounds of single pedestrians whose silhouettes have similar features, and this process runs automatically for unknown scenes. Second, we define the density ratio as the ratio of the size of a group foreground (the foreground produced by a group of pedestrians) to the size of the appropriate LET captured in the same image patch. Because this ratio normalizes size locally for scale, it appears to have a bound closely related to the number of pedestrians producing the group foreground. Third, to extract the density ratio bounds for groups with different numbers of pedestrians, we propose a simulation based on 3D human models in which camera viewpoints and pedestrians' proximity are easily manipulated. We collect hundreds of typical occluded-people patterns with distinct degrees of proximity under a variety of camera viewpoints, compute their density ratios, and build distributions of density ratios with respect to the number of pedestrians, from which the bounds are extracted. This simulation is performed in the offline learning phase, and the extracted bounds are then used to count pedestrians online. The bounds appear to be invariant to camera viewpoint and pedestrians' proximity. We evaluate the proposed method on our own collected videos and on the PETS 2009 datasets. On our videos, at a resolution of 320 × 240, the method runs in real time at around 30 fps with good accuracy while consuming few computing resources.
On the PETS 2009 datasets, the proposed method achieves results competitive with those of other methods tested on the same datasets.
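The abstract does not give the paper's exact formulas, so the following is only a minimal sketch, under stated assumptions, of how a density ratio could be turned into a count. It assumes foreground "size" means pixel area. It also assumes a bound-extraction rule that the abstract does not specify: each count's upper bound is placed midway between the mean simulated ratios of consecutive counts. The function names `density_ratio`, `learn_bounds`, and `count_people` are hypothetical and are not taken from the authors' implementation.

```python
from statistics import mean


def density_ratio(group_area: float, let_area: float) -> float:
    """Ratio of a group foreground's area to the area of the LET matched in
    the same image patch (local scale normalization; area is an assumption)."""
    if let_area <= 0:
        raise ValueError("LET area must be positive")
    return group_area / let_area


def learn_bounds(samples: list[tuple[float, int]]) -> list[tuple[int, float]]:
    """Offline phase. `samples` are (density ratio, pedestrian count) pairs,
    e.g. from a simulation. ASSUMED rule: the upper bound for each count lies
    midway between its mean ratio and the next count's mean ratio."""
    by_count: dict[int, list[float]] = {}
    for ratio, count in samples:
        by_count.setdefault(count, []).append(ratio)
    counts = sorted(by_count)
    means = [mean(by_count[c]) for c in counts]
    bounds = []
    for i, c in enumerate(counts):
        # The largest count has no upper neighbour, so its bound is open-ended.
        upper = (means[i] + means[i + 1]) / 2 if i + 1 < len(counts) else float("inf")
        bounds.append((c, upper))
    return bounds


def count_people(ratio: float, bounds: list[tuple[int, float]]) -> int:
    """Online phase: return the smallest count whose upper bound exceeds the ratio."""
    for count, upper in bounds:
        if ratio < upper:
            return count
    return bounds[-1][0]


if __name__ == "__main__":
    # Hypothetical simulated samples, for illustration only.
    samples = [(1.0, 1), (1.1, 1), (1.8, 2), (2.0, 2), (2.6, 3), (2.9, 3)]
    bounds = learn_bounds(samples)
    print(count_people(density_ratio(5200, 2000), bounds))
```

Only `learn_bounds` depends on the simulated distributions, so it runs once offline. Online counting then reduces to one division and a short scan over the sorted bounds. This is consistent with the real-time behaviour described above, although the paper's actual bound rule may differ.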