Article

Laplacian Pyramid Neural Network for Dense Continuous-Value Regression for Complex Scenes

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2021)

Journal

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Volume 32, Issue 11, Pages 5034-5046

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TNNLS.2020.3026669

Keywords

Estimation; Task analysis; Laplace equations; Semantics; Image reconstruction; Buildings; Satellites; Deep neural network; dense continuous-value regression (DCR); depth estimation; height estimation; Laplacian pyramid

Categories

Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic

Funding

National Key Research and Development Plan of China [2017YFB1002202]
National Natural Science Foundation of China (NSFC) [61632006, U19B2038, 61620106009]
Fundamental Research Funds for the Central Universities [WK3490000003]


Summary

The article introduces LAPNet, a neural network architecture for dense continuous-value regression (DCR) problems. Using a Laplacian pyramid decoder and an adaptive dense feature fusion module, LAPNet reconstructs high-quality signals in complex scenes while preserving both global structure and fine details.

Abstract

Many computer vision tasks, such as monocular depth estimation and height estimation from a satellite orthophoto, share a common underlying goal: regressing a dense continuous value for every pixel of a single input image. We define these as dense continuous-value regression (DCR) tasks. Recent approaches based on deep convolutional neural networks have significantly improved performance on DCR tasks, particularly in pixelwise regression accuracy. However, simultaneously preserving the global structure and the fine object details of complex scenes remains challenging. In this article, we exploit the efficiency of the Laplacian pyramid in representing multiscale content to reconstruct high-quality signals for complex scenes. We design a Laplacian pyramid neural network (LAPNet), which consists of a Laplacian pyramid decoder (LPD) for signal reconstruction and an adaptive dense feature fusion (ADFF) module that fuses features from the input image. More specifically, we build the LPD to effectively express both global and local scene structures: its upper and lower levels represent scene layouts and shape details, respectively. We introduce a residual refinement module that progressively adds high-frequency details to the signal prediction at each level. To recover the signal at each individual level of the pyramid, the ADFF module adaptively fuses multiscale image features for accurate prediction. We conduct comprehensive experiments to evaluate a number of variants of our model on three important DCR tasks: monocular depth estimation, single-image height estimation, and density map estimation for crowd counting. Experiments demonstrate that our method achieves new state-of-the-art performance in both qualitative and quantitative evaluations on NYU-D V2 and KITTI for monocular depth estimation, on the challenging Urban Semantic 3D (US3D) dataset for satellite height estimation, and on four challenging benchmarks for crowd counting. These results demonstrate that the proposed LAPNet is a universal and effective architecture for DCR problems.
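The coarse-to-fine representation that the LPD builds on can be illustrated with a classical Laplacian pyramid applied to a dense map. This is a minimal sketch, not LAPNet's decoder: it assumes 2x2 average pooling for downsampling and nearest-neighbour upsampling (in place of the Gaussian filtering of the classical construction), and the function names `build_laplacian_pyramid` and `reconstruct` are hypothetical. It shows how the top level carries a low-resolution layout while each lower level stores a high-frequency residual that is added back during coarse-to-fine reconstruction, which loosely mirrors the role the abstract assigns to the residual refinement module.

```python
import numpy as np


def downsample(x: np.ndarray) -> np.ndarray:
    """Halve resolution by 2x2 average pooling (assumes even height and width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(x: np.ndarray) -> np.ndarray:
    """Double resolution by nearest-neighbour repetition (assumed upsampler)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def build_laplacian_pyramid(signal: np.ndarray, levels: int) -> list[np.ndarray]:
    """Hypothetical helper: split a dense map into residuals plus a coarse top level.

    The list is ordered fine -> coarse: pyramid[0] holds the finest details,
    pyramid[-1] is the low-resolution global layout.
    """
    pyramid = []
    current = signal
    for _ in range(levels):
        coarser = downsample(current)
        pyramid.append(current - upsample(coarser))  # high-frequency residual
        current = coarser
    pyramid.append(current)  # coarse global layout
    return pyramid


def reconstruct(pyramid: list[np.ndarray]) -> np.ndarray:
    """Hypothetical helper: coarse-to-fine reconstruction.

    Start from the top level, then repeatedly upsample and add the
    residual of the next finer level.
    """
    estimate = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        estimate = upsample(estimate) + residual
    return estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    depth = rng.uniform(0.5, 10.0, size=(64, 64))  # synthetic stand-in for a depth map
    pyr = build_laplacian_pyramid(depth, levels=3)
    print([level.shape for level in pyr])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
    print(np.allclose(reconstruct(pyr), depth))  # True
```

In LAPNet the per-level signals and residuals are predicted by the network from image features rather than computed from a known map; the exact round trip here only demonstrates that a coarse layout plus per-level residuals fully determine the dense output.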
