Article

Multi-Stage Feature Pyramid Stereo Network-Based Disparity Estimation Approach for Two to Three-Dimensional Video Conversion

Journal

IEEE Transactions on Circuits and Systems for Video Technology

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2020.3014053

Keywords

Three-dimensional displays; Estimation; Feature extraction; Two dimensional displays; Training; Neural networks; TV; 2D to 3D video conversion; neural network; deep learning; disparity estimation; feature pyramid; depth image based rendering (DIBR)

Funding

  1. Science and Technology Development Fund of Macao [FDCT079/2016/A2, MYRG2017-00218-FST, MYRG2018-00111-FST]
  2. National Natural Science Foundation of China [U1605252, 61872307]

This article presents a multi-stage network for 2D-to-3D video conversion that is trained in two stages: an initial disparity estimation stage, followed by a second stage that adds depth-image-based rendering (DIBR) as an extra component. Experiments on the KITTI2015 and Scene Flow datasets indicate that the method improves disparity estimation and that the estimated disparity maps can be used to generate high-quality 3D images.
Disparity estimation is a popular topic in computer vision and has drawn increasing attention in recent years. In this article, we propose a new multi-stage network for two-dimensional to three-dimensional video conversion that is trained in two stages: initial disparity estimation forms the first training stage, and depth-image-based rendering (DIBR) is added as an extra component to form the second. In the first training stage, we propose a revised end-to-end feature pyramid stereo network, in which the original non-pyramid structure is replaced by a bottom-up convolutional neural network pyramid for disparity regression. The network exploits spatial information by concatenating features from different scales, which improves boundary consistency. Mirror connections between the corresponding layers of feature extraction and disparity regression are also added to improve the quality of the results. In the second stage, we propose an improved disocclusion filling technique in the DIBR branch and connect this non-neural-network method to the disparity estimation network. This two-stage training strategy yields improved disparity estimates for two-dimensional to three-dimensional video conversion. We conduct extensive experiments and compare our approach with selected state-of-the-art algorithms on the popular KITTI2015 and Scene Flow datasets. The results demonstrate that our estimated disparity maps can be used to generate high-quality 3D images.
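
To make the DIBR step mentioned in the abstract more concrete, the following is a minimal sketch of generic depth-image-based rendering: a left view is forward-warped into a virtual right view using a disparity map, and the resulting disocclusion holes are filled from the nearest background neighbour in the same row. This is not the improved disocclusion filling technique proposed in the paper, whose details are not given here. The function names `render_right_view` and `fill_disocclusions`, the list-of-rows image layout, and the background-neighbour filling rule are all assumptions made for illustration.

```python
def render_right_view(left, disparity):
    """Forward-warp a left image into a virtual right view (illustrative sketch).

    left and disparity are lists of rows with the same shape. A pixel at
    column x with disparity d is assumed to appear at column x - d in the
    right view (rectified stereo convention).
    Returns (right, best): right holds None where no pixel landed (a hole),
    best holds the disparity of the pixel written at each position, or -1.0.
    """
    height, width = len(left), len(left[0])
    right = [[None] * width for _ in range(height)]
    best = [[-1.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = disparity[y][x]
            xr = x - int(round(d))
            # Larger disparity means closer to the camera, so it wins
            # when two source pixels map to the same target column.
            if 0 <= xr < width and d > best[y][xr]:
                right[y][xr] = left[y][x]
                best[y][xr] = d
    return right, best


def fill_disocclusions(right, best):
    """Fill holes from the nearest valid neighbour with the smaller disparity.

    Assumption: disoccluded regions belong to the background, so the
    neighbour that is farther away (smaller disparity) is the better guess.
    """
    height, width = len(right), len(right[0])
    filled = [row[:] for row in right]
    for y in range(height):
        for x in range(width):
            if right[y][x] is not None:
                continue
            candidates = []
            for step in (-1, 1):  # search left, then right
                xn = x + step
                while 0 <= xn < width and right[y][xn] is None:
                    xn += step
                if 0 <= xn < width:
                    candidates.append((best[y][xn], right[y][xn]))
            if candidates:
                filled[y][x] = min(candidates, key=lambda c: c[0])[1]
    return filled


# Toy example: one row, with a foreground object (disparity 2) at columns 2-3.
left_row = [[10, 20, 30, 40, 50]]
disp_row = [[0, 0, 2, 2, 0]]
warped, written_disp = render_right_view(left_row, disp_row)
virtual_right = fill_disocclusions(warped, written_disp)
```

In the toy example, the foreground pixels shift two columns to the left and leave a two-pixel hole behind them, which is then filled with the background value to its right. Practical DIBR pipelines typically add sub-pixel interpolation and smoother inpainting on top of a scheme like this.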
