Joint 3D facial shape reconstruction and texture completion from a single image

Xiaoxing Zeng 曾小星 ¹ ², Zhelun Wu ¹, Xiaojiang Peng 彭小江 ¹, Yu Qiao 乔宇 ¹

¹ Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
² University of Chinese Academy of Sciences, Beijing, China

https://doi.org/10.1007/s41095-021-0238-4

http://cvm.tsinghuajournals.com/EN/10.1007/s41095-021-0238-4

Computational Visual Media, 16 December 2021

Abstract

Recent years have witnessed significant progress in image-based 3D face reconstruction using deep convolutional neural networks. However, current reconstruction methods often perform poorly in self-occluded regions and can produce inaccurate correspondences between the 2D input image and the 3D face template, which hinders their use in real applications. To address these problems, we propose a deep shape reconstruction and texture completion network, SRTC-Net, which jointly reconstructs 3D facial geometry and completes texture with correspondences from a single input face image.

In SRTC-Net, we leverage the geometric cues from the completed 3D texture to reconstruct detailed structures of 3D shapes. The SRTC-Net pipeline has three stages. First, a correspondence network identifies pixel-wise correspondences between the input 2D image and a 3D template model, and transfers the input 2D image to a U-V texture map. Second, an inpainting network completes the invisible and occluded areas of the U-V texture map. Third, to obtain the 3D facial geometry, a shape network predicts a coarse shape (a U-V position map) from the face segmented by the correspondence network, and the coarse shape is then refined by regressing a U-V displacement map from the completed U-V texture map in a pixel-to-pixel way.
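
The data flow between these stages can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, array layouts, and the `uv_size` parameter are hypothetical, the learned networks are replaced by plain arrays standing in for their outputs, and the refinement step assumes the displacement map holds per-texel 3D offsets that are simply added to the coarse position map, which the abstract does not specify.

```python
import numpy as np


def unwrap_to_uv(image, corr_uv, face_mask, uv_size=256):
    """Scatter visible face pixels into a U-V texture map.

    image:     (H, W, 3) array, the input photo.
    corr_uv:   (H, W, 2) array in [0, 1]; stands in for the output of a
               correspondence network (hypothetical layout: u, then v).
    face_mask: (H, W) bool array, True where the pixel belongs to the face.
    Returns the partial texture and a mask of texels that received a colour.
    """
    texture = np.zeros((uv_size, uv_size, 3), dtype=image.dtype)
    visible = np.zeros((uv_size, uv_size), dtype=bool)
    # Convert continuous (u, v) coordinates to integer texel indices.
    texel = np.rint(corr_uv[face_mask] * (uv_size - 1)).astype(int)
    texel = np.clip(texel, 0, uv_size - 1)
    cols, rows = texel[:, 0], texel[:, 1]
    texture[rows, cols] = image[face_mask]
    visible[rows, cols] = True
    # Texels where `visible` is False are the holes an inpainting
    # network would have to complete.
    return texture, visible


def refine_shape(position_map, displacement_map):
    """Assumed refinement: add per-texel 3D offsets to the coarse shape."""
    return position_map + displacement_map
```

In this sketch, the `visible` mask marks which texels came from the photo, so its complement is the region left for texture completion. Both maps in `refine_shape` share the U-V layout, which is what makes a pixel-to-pixel refinement possible.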

We evaluate our method on 3D reconstruction as well as on face frontalization and pose-invariant face recognition, using both in-the-lab datasets (MICC, MultiPIE) and in-the-wild datasets (CFP). The qualitative and quantitative results demonstrate the effectiveness of our method in inferring 3D facial geometry and complete texture; it outperforms or is comparable to the state of the art.
