BFF R-CNN: Balanced Feature Fusion for Object Detection

Hongzhe LIU
Ningwei WANG
Xuewei LI
Cheng XU
Yaze LI

IEICE TRANSACTIONS on Information and Systems Vol.E105-D No.8 pp.1472-1480
Publication Date: 2022/08/01
Publicized: 2022/05/17
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2021EDP7261
Type of Manuscript: PAPER
Category: Image Recognition, Computer Vision
Keywords: deep learning, neural network, object detection, feature fusion

In the neck of a two-stage object detection network, feature fusion is generally carried out in either a top-down or a bottom-up manner. However, two types of imbalance may arise: feature imbalance in the neck of the model, and gradient imbalance in the region of interest (RoI) extraction layer caused by changes in object scale. The deeper a network layer is, the more abstract its learned features are; that is, deeper layers extract more semantic information but retain less of the image background, spatial location, and other resolution-dependent information. In contrast, shallow layers learn little semantic information but a great deal of spatial location information. We propose the Both Ends to Centre to Multiple Layers (BEtM) feature fusion method to address the feature imbalance in the neck, and a Multi-level Region of Interest Feature Extraction (MRoIE) layer to address the gradient imbalance. Combined with the Region-based Convolutional Neural Network (R-CNN) framework, our Balanced Feature Fusion (BFF) method significantly improves network performance over the Faster R-CNN architecture. On the MS COCO 2017 dataset, it achieves an average precision (AP) that is 1.9 points higher than that of the Feature Pyramid Network (FPN) Faster R-CNN framework and 3.2 points higher than that of the Generic Region of Interest Extractor (GRoIE) framework.
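
For readers unfamiliar with the baseline, the generic top-down fusion mentioned at the start of the abstract can be illustrated with a minimal sketch. This is not the BEtM method, whose details are not given here. It only shows the conventional FPN-style pathway, under several simplifying assumptions: single-channel feature maps stored as nested lists, each level exactly half the resolution of the previous one, and no lateral or smoothing convolutions. The function names `upsample2x`, `add`, and `top_down_fuse` are hypothetical and chosen for illustration only.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map (rows x cols)


def upsample2x(fm: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: every value becomes a 2x2 block."""
    out: Grid = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # duplicate the widened row
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps of identical size."""
    assert len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(levels: List[Grid]) -> List[Grid]:
    """Fuse a pyramid from deep to shallow (illustrative assumption, not BEtM).

    levels[0] is the shallowest, highest-resolution map; each following
    level is assumed to have half the height and width of the one before.
    """
    fused = [levels[-1]]  # the deepest level passes through unchanged
    for fm in reversed(levels[:-1]):
        # Inject the semantics of the deeper fused map into the current level.
        fused.insert(0, add(fm, upsample2x(fused[0])))
    return fused


# Toy pyramid: 4x4 shallow map, 2x2 middle map, 1x1 deep map.
c3 = [[1.0] * 4 for _ in range(4)]
c4 = [[10.0] * 2 for _ in range(2)]
c5 = [[100.0]]
p3, p4, p5 = top_down_fuse([c3, c4, c5])
```

In this toy pyramid, information flows only from deep to shallow: the shallowest output receives contributions from every level, while the deepest output receives nothing from the shallow levels. That one-directional flow is one way to picture the feature imbalance the abstract describes.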
