Attention-Guided Spatial Transformer Networks for Fine-Grained Visual Recognition

Dichao LIU, Yu WANG, Jien KATO

IEICE TRANSACTIONS on Information and Systems, Vol.E102-D, No.12, pp.2577-2586
Publication Date: 2019/12/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2019EDP7045
Type of Manuscript: PAPER
Category: Image Recognition, Computer Vision
Keywords: recognition, attention, fine-grained, deep learning

The aim of this paper is to propose effective attentional regions for fine-grained visual recognition. Building on the ability of Spatial Transformers to perform spatial manipulation within a network, we propose an extension, the Attention-Guided Spatial Transformer Networks (AG-STNs). The model first guides the Spatial Transformers with hard-coded attentional regions. This guidance can then be turned off, after which the network adjusts the learned regions in terms of location and scale. Because this adjustment is conditioned on the classification loss, the regions are optimized directly for recognition accuracy. With this model, we successfully capture detailed attentional information. The AG-STNs also capture attentional information at multiple levels; in our experiments these levels are complementary to each other, and fusing them yields better results.
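The abstract does not give the exact region parameterization or guidance mechanism. The following is a minimal NumPy sketch of the two-stage idea under stated assumptions, not the authors' implementation. It assumes that (a) each attentional region is a Spatial Transformer affine transform restricted to isotropic scale and translation, which matches "location and scale" above, and (b) guidance is a hypothetical L2 penalty that pulls the predicted (scale, tx, ty) toward a hard-coded region and is later switched off, leaving only the classification loss. The function names and the `weight` parameter are illustrative inventions.

```python
import numpy as np

def attention_theta(scale, tx, ty):
    """2x3 affine matrix limited to isotropic scale and translation.
    Coordinates are normalized to [-1, 1]; scale < 1 zooms into a region."""
    return np.array([[scale, 0.0, tx],
                     [0.0, scale, ty]])

def sample_region(image, theta, out_h, out_w):
    """Bilinearly sample an out_h x out_w region from a 2-D image using theta."""
    h, w = image.shape
    # Regular output grid in normalized coordinates, shape (out_h, out_w).
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # 3 x N
    src = theta @ grid  # 2 x N source coordinates, still normalized
    # Map normalized source coordinates to pixel coordinates and clamp.
    x = np.clip((src[0] + 1) * (w - 1) / 2, 0, w - 1)
    y = np.clip((src[1] + 1) * (h - 1) / 2, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    # Standard bilinear interpolation of the four neighbouring pixels.
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return ((1 - wy) * top + wy * bottom).reshape(out_h, out_w)

def guided_loss(cls_loss, pred_params, guide_params, guidance_on, weight=1.0):
    """Hypothetical schedule: while guidance is on, add an L2 penalty toward a
    hard-coded (scale, tx, ty); once it is off, only the classification loss
    shapes the region's location and scale."""
    if not guidance_on:
        return cls_loss
    diff = np.asarray(pred_params, dtype=float) - np.asarray(guide_params, dtype=float)
    return cls_loss + weight * float(np.sum(diff ** 2))

# Usage: zoom into the bottom-right quadrant of a toy 4x4 image.
img = np.arange(16, dtype=float).reshape(4, 4)  # img[y, x] = 4*y + x
crop = sample_region(img, attention_theta(0.5, 0.5, 0.5), 4, 4)
# crop[0, 0] is 7.5: the image is linear, so sampling at (1.5, 1.5) is exact.
```

Restricting the transform to three parameters keeps each region an axis-aligned, square-aspect crop, so a hard-coded prior (for example, a central crop) can be expressed directly as target values for (scale, tx, ty) during the guided stage.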