Zibin Wang and Ronald Chung
Keywords: 3D human pose estimation, manifold learning, nonlinear mapping
We address the problem of determining 3D human pose from video data. The emphases are on the use of multiple views captured by arbitrarily positioned cameras, and on the exploitation of prior knowledge about the possible actions the observed human subject performs. The use of multiple views allows a false judgment in one view to be corrected by more revealing evidence in other views; it also allows self-occlusion of a human pose in a view to be handled. The approach is based on learning a low-dimensional manifold that embraces the kinematic constraint, the action-context constraint, and the multi-view geometric constraint altogether. We show that a 3D manifold-surface embedding of a human performing a specific action can be learned directly from multiple views without geometric deformation. Compared with a one-view manifold, this multi-view manifold preserves more local continuity and contains fewer self-intersections in its structure. Gaussian Processes are used to obtain (1) the nonlinear mapping from the multi-view 2D human-joint space to the latent space, and (2) the nonlinear mapping from the latent space to the 3D human-pose space. Experiments on several benchmark datasets show that the 3D multi-view manifold successfully integrates all the constraints, and that the proposed framework can estimate 3D human pose efficiently and robustly, even when the pose is under self-occlusion.
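To make the two Gaussian Process mappings concrete, the following is a minimal sketch, not the authors' implementation: it uses plain multi-output GP regression from scikit-learn in place of the paper's learned manifold model, and all names, array shapes, kernel choices, and the use of random placeholder data are illustrative assumptions.

```python
# Sketch of the two nonlinear mappings described in the abstract:
#   (1) multi-view 2D joint space -> latent (manifold) space
#   (2) latent space -> 3D human pose space
# Random data stands in for real multi-view annotations; shapes are assumed.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

n_frames, n_views, n_joints = 200, 3, 15
d_latent = 3  # dimensionality of the manifold embedding (assumed)

# X_2d: concatenated 2D joint coordinates from all views, one row per frame
# Z:    latent coordinates of each frame on the learned manifold
# Y_3d: corresponding 3D joint positions
X_2d = rng.standard_normal((n_frames, n_views * n_joints * 2))
Z = rng.standard_normal((n_frames, d_latent))
Y_3d = rng.standard_normal((n_frames, n_joints * 3))

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)

# Mapping (1): multi-view 2D observations -> latent space
gp_to_latent = GaussianProcessRegressor(kernel=kernel).fit(X_2d, Z)

# Mapping (2): latent space -> 3D pose space
gp_to_pose = GaussianProcessRegressor(kernel=kernel).fit(Z, Y_3d)

# Inference on a new multi-view observation: 2D joints -> latent -> 3D pose
x_new = rng.standard_normal((1, n_views * n_joints * 2))
z_new = gp_to_latent.predict(x_new)
pose_3d = gp_to_pose.predict(z_new).reshape(n_joints, 3)
print(pose_3d)
```

Chaining the two regressors reflects the pipeline described above: the latent manifold acts as a low-dimensional bottleneck between the multi-view 2D evidence and the 3D pose output.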