S²Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning

Tze Ho Elden Tse^*

Zhongqun Zhang^*

Kwang In Kim

Ales Leonardis

Feng Zheng

Hyung Jin Chang

University of Birmingham, UNIST, SUSTech

ECCV 2022

[Paper]

[Code]

Overview of our semi-supervised learning framework, S²Contact. (a) The model is pre-trained on a small annotated dataset. (b) It is then deployed on unlabelled datasets to collect pseudo-labels, which are filtered by confidence based on visual and geometric consistency. Once the contact map is predicted, the hand and object poses are optimised to achieve the target contact via a contact model.
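The final step of the overview, optimising poses towards a target contact map, can be illustrated with a toy example. The sketch below is not the contact model used in the paper: it assumes a hypothetical attraction-only objective in which each object point is weighted by its predicted contact value and pulled towards its nearest hand point, and it optimises only a rigid hand translation by gradient descent. All function names, the learning rate and the step count are illustrative assumptions.

```python
# Toy pose refinement towards a target contact map (illustrative assumption,
# not the paper's contact model). Only a 3D hand translation is optimised.

def sq_dist(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def translate(points, t):
    """Shift every point by the translation t."""
    return [tuple(c + d for c, d in zip(p, t)) for p in points]


def contact_loss(t, hand, obj, target):
    """Contact-weighted squared distance from each object point to the nearest hand point."""
    moved = translate(hand, t)
    return sum(w * min(sq_dist(o, h) for h in moved) for o, w in zip(obj, target))


def refine_translation(hand, obj, target, lr=0.1, steps=50):
    """Gradient descent on the hand translation, with nearest neighbours recomputed each step."""
    t = [0.0, 0.0, 0.0]
    total_w = sum(target) or 1.0  # normalise so the step size does not depend on point count
    for _ in range(steps):
        moved = translate(hand, t)
        grad = [0.0, 0.0, 0.0]
        for o, w in zip(obj, target):
            h = min(moved, key=lambda q: sq_dist(o, q))
            for k in range(3):
                grad[k] += 2.0 * w * (h[k] - o[k]) / total_w
        t = [tk - lr * gk for tk, gk in zip(t, grad)]
    return t


# Hand points hover 5 cm above two object points predicted to be in contact.
hand = [(0.0, 0.0, 0.05), (0.01, 0.0, 0.05)]
obj = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.05, 0.0, 0.0)]
target = [1.0, 1.0, 0.0]  # hypothetical predicted contact values in [0, 1]

t = refine_translation(hand, obj, target)
print("translation:", t)
print("loss before:", contact_loss([0.0, 0.0, 0.0], hand, obj, target))
print("loss after: ", contact_loss(t, hand, obj, target))
```

In this toy setting the translation moves the hand down onto the two contacted points, while the third object point, whose target contact is zero, has no influence. A practical contact model would also optimise articulation and penalise interpenetration, both of which are omitted here.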

Video

Abstract

Being able to reason about the physical contacts between hands and objects is crucial in understanding hand-object manipulation. However, despite the efforts in accurate 3D annotations in hand and object datasets, there still exist gaps in 3D hand and object reconstructions. Recent works leverage contact maps to refine inaccurate hand-object pose estimations and generate grasps given object models. However, they require explicit 3D supervision, which is seldom available, and are therefore limited to constrained settings, e.g., where thermal cameras observe residual heat left on manipulated objects. In this paper, we propose a novel semi-supervised framework that allows us to learn contact from monocular videos. Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels in semi-supervised learning and propose an efficient graph-based network to infer contact. Our semi-supervised learning framework achieves a favourable improvement over the existing supervised learning methods trained on data with ‘limited’ annotations. Notably, our proposed model is able to achieve superior results with less than half the network parameters and memory access cost when compared with the commonly-used PointNet-based approach. We show benefits from using a contact map that rules hand-object interactions to produce more accurate reconstructions. We further demonstrate that training with pseudo-labels can extend contact map estimations to out-of-domain objects and generalise better across multiple datasets.

Framework

A schematic illustration of our framework. We adopt our proposed graph-based network, GCN-Contact, as the backbone. We utilise a teacher-student mutual learning framework composed of a learnable student and an exponential moving average (EMA) teacher. The student network is trained with labelled data. For unlabelled data, the student network takes pseudo contact labels from its EMA teacher and compares them with its own predictions. (a) refers to the contact consistency constraint used for consistency training. To improve the quality of the pseudo-labels, we adopt a confidence-based filtering mechanism that geometrically (b) and visually (c) filters out predictions that violate contact constraints.
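The teacher-student loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the released implementation: parameters are plain lists of floats, the decay of 0.99 and the threshold of 0.9 are placeholder values, and a single scalar confidence per sample stands in for the geometric and visual consistency checks. The function names are hypothetical.

```python
# Minimal sketch of an EMA teacher with confidence-filtered pseudo-labels.
# Names, decay and threshold are illustrative assumptions.

def ema_update(teacher, student, decay=0.99):
    """Move each teacher parameter towards the matching student parameter."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def filter_pseudo_labels(pseudo_labels, confidences, threshold=0.9):
    """Keep (index, label) pairs whose confidence reaches the threshold."""
    return [
        (i, y)
        for i, (y, c) in enumerate(zip(pseudo_labels, confidences))
        if c >= threshold
    ]


def consistency_loss(student_pred, kept):
    """Mean squared error between student predictions and the kept pseudo-labels."""
    if not kept:
        return 0.0
    return sum((student_pred[i] - y) ** 2 for i, y in kept) / len(kept)


# One illustrative step on three unlabelled samples.
teacher_params = [0.0, 1.0]
student_params = [1.0, 1.0]

teacher_contact = [0.8, 0.1, 0.6]   # pseudo contact labels from the teacher
confidence = [0.95, 0.50, 0.92]     # stand-in for the consistency scores
student_contact = [0.8, 0.3, 0.1]   # student predictions on the same samples

kept = filter_pseudo_labels(teacher_contact, confidence)
loss = consistency_loss(student_contact, kept)
teacher_params = ema_update(teacher_params, student_params)

print("kept pseudo-labels:", kept)
print("consistency loss:", loss)
print("updated teacher:", teacher_params)
```

Because the teacher is only ever updated as a moving average of the student, its pseudo-labels change slowly between iterations, and the filter keeps low-confidence predictions out of the consistency loss.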

Paper and Supplementary Material

Tze Ho Elden Tse, Zhongqun Zhang, Kwang In Kim, Ales Leonardis, Feng Zheng and Hyung Jin Chang
S²Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning
In ECCV, 2022.

[Paper]

[Supplementary]

Acknowledgements

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2022-2020-0-01789) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation) and the Baskerville Tier 2 HPC service (https://www.baskerville.ac.uk/) funded by the Engineering and Physical Sciences Research Council (EPSRC) and UKRI through the World Class Labs scheme (EP/T022221/1) and the Digital Research Infrastructure programme (EP/W032244/1) operated by Advanced Research Computing at the University of Birmingham. KIK was supported by the National Research Foundation of Korea (NRF) grant (No. 2021R1A2C2012195) and IITP grants (IITP-2021-0-02068 and IITP-2020-0-01336). ZQZ was supported by China Scholarship Council (CSC) Grant No. 202208060266. AL was supported in part by the EPSRC (grant number EP/S032487/1). FZ was supported by the National Natural Science Foundation of China under Grant No. 61972188 and 62122035. This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project.