Learning Temporal Pose Estimation from Sparsely-Labeled Videos

PoseWarper (NeurIPS'2019)
@inproceedings{NIPS2019_gberta,
title = {Learning Temporal Pose Estimation from Sparsely-Labeled Videos},
author = {Bertasius, Gedas and Feichtenhofer, Christoph and Tran, Du and Shi, Jianbo and Torresani, Lorenzo},
booktitle = {Advances in Neural Information Processing Systems 32},
year = {2019},
}

PoseWarper is a network that uses training videos with sparse annotations (one labeled frame every k frames) to learn dense temporal pose propagation and estimation. Given a pair of video frames, a labeled Frame A and an unlabeled Frame B, the model is trained to predict the human pose in Frame A from the features of Frame B, using deformable convolutions to implicitly learn the pose warping between A and B.
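The pairing of labeled and unlabeled frames described above can be sketched in a few lines of Python. This is only an illustration, not the PoseWarper data loader: the function name `make_frame_pairs`, the `max_offset` window, and the convention that labels fall on frames 0, k, 2k, ... are all assumptions made for the example.

```python
def make_frame_pairs(num_frames, k, max_offset):
    """Pair each labeled Frame A with nearby unlabeled Frames B.

    Assumptions (not from the original README): frames 0, k, 2k, ... carry
    labels, and Frame B is drawn from within `max_offset` frames of Frame A.
    """
    labeled = set(range(0, num_frames, k))
    pairs = []
    for a in sorted(labeled):
        for delta in range(-max_offset, max_offset + 1):
            b = a + delta
            # Skip Frame A itself and indices outside the video.
            if delta == 0 or b < 0 or b >= num_frames:
                continue
            # Frame B is meant to be unlabeled, so skip labeled neighbours.
            if b in labeled:
                continue
            pairs.append((a, b))
    return pairs


# An 8-frame video labeled every 4 frames, pairing with direct neighbours only.
pairs = make_frame_pairs(num_frames=8, k=4, max_offset=1)
print(pairs)  # [(0, 1), (4, 3), (4, 5)]
```

Each pair `(a, b)` would supply the ground-truth pose from frame `a` and the input features from frame `b`.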

The training of PoseWarper can be split into two stages.

The first stage starts from a pre-trained model and fine-tunes the main backbone in a single-frame setting.

The second stage starts from the first-stage model and learns the warping offsets in a multi-frame setting while the backbone is kept frozen.
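As a rough illustration of the two-stage schedule, the sketch below selects which parameters would be updated in each stage. The parameter names and the `backbone.` / `warp.` prefixes are hypothetical, chosen only for this example; the actual module names depend on the model definition and config in use.

```python
def trainable_parameters(param_names, stage):
    """Return the parameter names that would be updated in the given stage.

    Assumption: backbone parameters are prefixed with "backbone." and the
    warping (offset) parameters with "warp."; both prefixes are hypothetical.
    """
    if stage == 1:
        # Stage 1: single-frame setting, the backbone is fine-tuned.
        return [n for n in param_names if n.startswith("backbone.")]
    if stage == 2:
        # Stage 2: multi-frame setting, the backbone is frozen and only
        # the warping offsets are learned.
        return [n for n in param_names if n.startswith("warp.")]
    raise ValueError(f"unknown stage: {stage}")


# Hypothetical parameter names for illustration.
names = [
    "backbone.conv1.weight",
    "backbone.layer1.bias",
    "warp.offset_conv.weight",
]
print(trainable_parameters(names, stage=1))
print(trainable_parameters(names, stage=2))
```

In a PyTorch training script, the same selection is typically applied by setting `requires_grad = False` on the frozen parameters before building the optimizer.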