ASAN: Self-Attending and Semantic Activating Network towards Better Object Detection

Xinyu ZHU  Jun ZHANG  Gengsheng CHEN  

IEICE TRANSACTIONS on Information and Systems   Vol.E103-D   No.3   pp.648-659
Publication Date: 2020/03/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2019EDP7164
Type of Manuscript: PAPER
Category: Image Recognition, Computer Vision
object detection,  self-attending,  semantic activating,  

Full Text: PDF(3.6MB)>>
Buy this Article

Recent top-performing object detectors usually depend on a two-stage approach, which benefits from its region proposal and refining practice but suffers low detection speed. By contrast, one-stage approaches have the advantage of high efficiency while sacrifice their accuracies to some extent. In this paper, we propose a novel single-shot object detection network which inherits the merits of both. Motivated by the idea of semantic enrichment to the convolutional features within a typical deep detector, we propose two novel modules: 1) by modeling the semantic interactions between channels and the long-range dependencies between spatial positions, the self-attending module generates both channel and position attention, and enhance the original convolutional features in a self-guided manner; 2) leveraging the class-discriminative localization ability of classification-trained CNN, the semantic activating module learns a semantic meaningful convolutional response which augments low-level convolutional features with strong class-specific semantic information. The so called self-attending and semantic activating network (ASAN) achieves better accuracy than two-stage methods and is able to fulfil real-time processing. Comprehensive experiments on PASCAL VOC indicates that ASAN achieves state-of-the-art detection performance with high efficiency.