Journal of Xiamen University (Natural Science)

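Thermal images lack color information and have blurred edges and weak detail, so high-quality segmentation is difficult to obtain. To address this, building on the encoder-decoder architecture, this paper adds a multi-level pixel spatial attention module (MPAM), an edge extraction module (EEM), and a tiny target extraction module (TTM). MPAM lets the network capture semantic information while fully preserving detail; EEM and TTM extract detail features that carry semantic information, namely edges and small targets. To improve prediction accuracy for pixels where the edges of different classes meet and for small objects, dedicated loss functions are designed to supervise the extracted edge and small-target features. The method is applied to SCUT_SEG, a thermal image dataset built by the authors' research group, to the public thermal image dataset SODA, and to the synthetic thermal infrared dataset Cityscapes. The experiments show that it performs slightly better than five segmentation networks (FCN, PSPNet, Deeplabv3+, MCNet, and EC-CNN), with a gain of about 2.2 percentage points.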

Objective: Thermal image segmentation is a fundamental step in night-time autonomous driving and night-time intelligent monitoring, and it has drawn extensive attention. Despite considerable research effort, high-quality segmentation results remain hard to obtain because thermal images lack color information and have blurred edges and weak details. Building on Deeplabv3+, we propose a tiny-target and edge-enhanced network to address the problems associated with edges and small targets.
Methods: The tiny-target and edge-enhanced network can be built on Deeplabv3+ or on other segmentation baselines; Deeplabv3+ is used here. First, we design a multi-level pixel spatial attention module (MPAM), which lets the network make full use of the features and context information of each layer so that details in pixel space can be effectively recovered. Second, we design an edge extraction module (EEM) and a tiny target extraction module (TTM), which explicitly model edges and small targets, respectively. The outputs of these modules provide accurate edge and small-target features. Finally, dedicated loss functions, computed against the ground-truth maps, supervise the edge and tiny-target features to improve accuracy on small targets and on pixels along edges.
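As a rough illustration of where such supervision targets could come from, the sketch below derives a binary edge map and a small-target mask from a semantic label map. It is a toy under stated assumptions, not the paper's procedure: the function names `edge_map` and `small_target_mask`, the 4-neighbour boundary rule, the connected-component area threshold `max_area`, and the convention that class 0 is background are all hypothetical choices made for this example.

```python
from collections import deque


def edge_map(labels):
    """Mark every pixel that has a right or lower neighbour of a different class.

    Both pixels on either side of a class boundary are marked (assumed rule).
    """
    h, w = len(labels), len(labels[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and labels[ny][nx] != labels[y][x]:
                    edges[y][x] = 1
                    edges[ny][nx] = 1
    return edges


def small_target_mask(labels, max_area, background=0):
    """Mark pixels of 4-connected non-background regions with at most max_area pixels."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    mask = [[0] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or labels[sy][sx] == background:
                continue
            cls = labels[sy][sx]
            region = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            # Breadth-first flood fill over pixels of the same class.
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and labels[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) <= max_area:
                for y, x in region:
                    mask[y][x] = 1
    return mask


# Toy 4x4 label map: class 1 is a 2x2 object, class 2 is a single pixel.
labels = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 2],
]
edges = edge_map(labels)
small = small_target_mask(labels, max_area=2)
```

On this toy map, every pixel adjacent to a class boundary is marked in `edges`, and only the single-pixel class-2 region is marked in `small` because the 2x2 object exceeds `max_area=2`. Maps of this kind could serve as ground truth for an edge branch and a small-target branch; how the paper actually builds its targets is not stated here.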
Results: In existing CNN-based semantic segmentation methods, the intensity, shape, and texture of features are mixed together, so small targets and edges are not segmented properly. To obtain strong semantic information, these networks generally stack layers for feature extraction, which also discards a large amount of detail. To validate the proposed method, we first run extensive comparative experiments on the thermal infrared dataset SCUT_SEG against several closely related algorithms. Detailed visualization and analysis of the results show that the proposed algorithm improves the segmentation of small targets and edges. Second, ablation experiments further validate the contribution of each proposed module. To verify that the model adapts to different datasets in the same scenario, we also experiment on the publicly available SODA dataset and on the synthetic thermal infrared Cityscapes dataset. Results on the three thermal image datasets (SCUT_SEG, SODA, and synthetic thermal infrared Cityscapes) show that our method yields a slight improvement of about 2.2 percentage points over other state-of-the-art algorithms in the same scenario. It also gives more satisfactory results for specific target classes and for small targets and details such as edges.
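A gain reported in percentage points is an absolute difference between two scores, which is not the same as a relative change. The sketch below makes the distinction concrete using mean intersection-over-union (mIoU), a common segmentation metric. The text here does not name the metric behind the reported gain, so mIoU is an assumption, and the two confusion matrices are made-up toy numbers, not results from the paper.

```python
def miou(confusion):
    """Mean IoU from a square confusion matrix (rows: ground truth, columns: prediction)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


# Hypothetical two-class pixel counts for a baseline and an improved model.
baseline = [[50, 10], [10, 30]]
improved = [[52, 8], [8, 32]]

base_score = miou(baseline)
new_score = miou(improved)

# Absolute gain in percentage points versus relative gain in percent.
gain_points = 100 * (new_score - base_score)
gain_relative = 100 * (new_score - base_score) / base_score

print(f"baseline mIoU: {100 * base_score:.2f}")
print(f"improved mIoU: {100 * new_score:.2f}")
print(f"gain: {gain_points:.2f} percentage points, {gain_relative:.2f}% relative")
```

Because the baseline score is below 100, the relative figure is always larger than the percentage-point figure, which is why the two should not be used interchangeably when comparing methods.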
Conclusions: Existing semantic segmentation algorithms for thermal infrared images lose details such as edges and small targets. The proposed MPAM makes full use of the features and context information of each layer to recover details in pixel space. The EEM and TTM explicitly model edges and small targets to recover this detail information. Dedicated loss functions supervise the edge and small-target features so that both are preserved. As a result, the accuracy of small-target and edge features that carry semantic category information is improved.

Introduction
1 Proposed method
2 Experiments
3 Conclusion

Fig.1 The overall network framework

Fig.2 Pixel attention module

Fig.3 Edge extraction module for thermal images

Fig.4 Tiny target extraction module for thermal images

Fig.5 SCUT_SEG data analysis

Tab.1 Performance comparison with the most popular segmentation algorithms on the SCUT_SEG dataset

Fig.6 Visualization results of each algorithm

Fig.7 Edge visualization results of FCN and our method

Fig.8 Edge visualization results of Deeplabv3+ and our method

Fig.9 Visualization results of each component of our algorithm

Tab.2 Comparison with the most closely related methods

Tab.3 Improvements based on the baseline model

Tab.4 Ablation experiments on segmentation with different loss functions

Tab.5 Comparison of segmentation performance between our method and the most popular algorithms on the SODA dataset

Tab.6 Per-class segmentation performance comparison with the most popular algorithms on the SODA dataset

Tab.7 Comparison of segmentation performance between our method and other algorithms on the STI-Cityscapes synthetic thermal infrared dataset

