BBS-Net: RGB-D Salient Object Detection with a Bifurcated Backbone Strategy Network

Deng-Ping Fan, Inception Institute of Artificial Intelligence
Yingjie Zhai, Nankai University
Ali Borji, HCL America
Jufeng Yang, Nankai University
Ling Shao, Inception Institute of Artificial Intelligence


© 2020, Springer Nature Switzerland AG.

Multi-level feature fusion is a fundamental topic in computer vision for detecting, segmenting, and classifying objects at various scales. When multi-level features must also be combined with multi-modal cues, finding the optimal fusion strategy becomes a difficult open problem. In this paper, we make the first attempt to leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to develop a novel cascaded refinement network. In particular, we 1) propose a bifurcated backbone strategy (BBS) that splits the multi-level features into teacher and student features, and 2) use a depth-enhanced module (DEM) to excavate the informative parts of the depth cues from both the channel and spatial views. This fuses the RGB and depth modalities in a complementary way. Our simple yet efficient architecture, dubbed Bifurcated Backbone Strategy Network (BBS-Net), is backbone independent and outperforms 18 state-of-the-art methods on seven challenging datasets under four evaluation metrics.
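To make the idea of enhancing depth cues "from the channel and spatial views" more concrete, the following NumPy sketch shows one plausible form such a module could take. It is an illustration under stated assumptions, not the implementation described in the paper: the abstract does not specify the operations, so the global max pooling, the two-layer bottleneck with weights `w1` and `w2`, the sigmoid gating, the element-wise addition to the RGB features, and all function names are hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Per-channel weights for a (C, H, W) feature map.

    Assumption: global max pooling followed by a two-layer bottleneck.
    Returns weights of shape (C, 1, 1) so they broadcast over H and W.
    """
    pooled = feat.max(axis=(1, 2))            # (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)     # ReLU, shape (C // r,)
    return sigmoid(w2 @ hidden)[:, None, None]


def spatial_attention(feat):
    """Per-location weights of shape (1, H, W).

    Assumption: a channel-wise max followed by a sigmoid; a learned
    convolution would normally sit in between and is omitted here.
    """
    return sigmoid(feat.max(axis=0, keepdims=True))


def depth_enhanced_fusion(rgb_feat, depth_feat, w1, w2):
    """Reweight depth features by channel, then by location, then add to RGB."""
    d = depth_feat * channel_attention(depth_feat, w1, w2)
    d = d * spatial_attention(d)
    return rgb_feat + d


# Toy inputs: random tensors stand in for backbone feature maps.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                       # r: hypothetical reduction ratio
rgb = rng.standard_normal((C, H, W))
depth = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

fused = depth_enhanced_fusion(rgb, depth, w1, w2)
print(fused.shape)
```

In this sketch the depth branch only modulates and adds to the RGB features, so an uninformative (all-zero) depth map leaves the RGB features unchanged, which is one simple way to read "complementary" fusion.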