The xR-EgoPose dataset was introduced in the paper "xR-EgoPose: Egocentric 3D Human Pose from an HMD Camera" (ICCV 2019, oral). It contains ~380 thousand photo-realistic egocentric camera images captured in a variety of indoor and outdoor spaces.
This repository contains a PyTorch implementation of the training and inference code for the model.
Download the dataset from the official repository. The authors provide a download script, but it appears to be broken and does not download all the files, so it is recommended to download the archives manually from the provided link.
Once you have downloaded all the tar.gz files, run
python utils/extract_data.py --input {path to the downloaded tar.gz files} --output {path to extracted files}
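If the extraction script does not work in your environment, the archives can also be unpacked with a minimal fallback sketch based on Python's standard tarfile module (the `downloads` and `extracted` folder names below are placeholders for your own paths):

```python
# Fallback sketch: extract every .tar.gz archive found in one folder into another.
# "downloads" and "extracted" are placeholder paths; adjust them to your setup.
import tarfile
from pathlib import Path

input_dir = Path("downloads")    # folder containing the downloaded .tar.gz files
output_dir = Path("extracted")   # destination for the extracted sequences
output_dir.mkdir(parents=True, exist_ok=True)

for archive in sorted(input_dir.glob("*.tar.gz")):
    print(f"Extracting {archive.name} ...")
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(output_dir)
```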
Please create the following folders:
/TrainSet
/ValSet
/TestSet
Then, move each extracted output folder into the corresponding set folder according to the split shown below (a helper sketch for moving the folders follows the table).
Train-set | Test-set | Val-set |
---|---|---|
female_001_a_a | female_004_a_a | male_008_a_a |
female_002_a_a | female_008_a_a | |
female_002_f_s | female_010_a_a | |
female_003_a_a | female_012_a_a | |
female_005_a_a | female_012_f_s | |
female_006_a_a | male_001_a_a | |
female_007_a_a | male_002_a_a | |
female_009_a_a | male_004_f_s | |
female_011_a_a | male_006_a_a | |
female_014_a_a | male_007_f_s | |
female_015_a_a | male_010_a_a | |
male_003_f_s | male_014_f_s | |
male_004_a_a | ||
male_005_a_a | ||
male_006_f_s | ||
male_007_a_a | ||
male_008_f_s | ||
male_009_a_a | ||
male_010_f_s | ||
male_011_f_s | ||
male_014_a_a |
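If you prefer to organize the folders programmatically, here is a hedged helper sketch that applies the split above (the `extracted` and `data` folder names are placeholders, not paths used by the repository):

```python
# Helper sketch: move each extracted sequence folder into TrainSet/TestSet/ValSet
# according to the split table above. Both paths below are placeholders.
import shutil
from pathlib import Path

SPLIT = {
    "TrainSet": [
        "female_001_a_a", "female_002_a_a", "female_002_f_s", "female_003_a_a",
        "female_005_a_a", "female_006_a_a", "female_007_a_a", "female_009_a_a",
        "female_011_a_a", "female_014_a_a", "female_015_a_a", "male_003_f_s",
        "male_004_a_a", "male_005_a_a", "male_006_f_s", "male_007_a_a",
        "male_008_f_s", "male_009_a_a", "male_010_f_s", "male_011_f_s",
        "male_014_a_a",
    ],
    "TestSet": [
        "female_004_a_a", "female_008_a_a", "female_010_a_a", "female_012_a_a",
        "female_012_f_s", "male_001_a_a", "male_002_a_a", "male_004_f_s",
        "male_006_a_a", "male_007_f_s", "male_010_a_a", "male_014_f_s",
    ],
    "ValSet": ["male_008_a_a"],
}

extracted = Path("extracted")   # where the tar.gz archives were extracted
dataset_root = Path("data")     # parent folder that will hold TrainSet/ValSet/TestSet

for set_name, sequences in SPLIT.items():
    set_dir = dataset_root / set_name
    set_dir.mkdir(parents=True, exist_ok=True)
    for seq in sequences:
        src = extracted / seq
        if src.exists():
            shutil.move(str(src), str(set_dir / seq))
        else:
            print(f"warning: {seq} not found under {extracted}")
```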
The organized dataset folder structure should look like this (a quick sanity-check sketch follows the tree):
TrainSet
├── female_001_a_a
│ ├── env 01
│ │ └── cam_down
│ │ ├── depth
│ │ ├── json
│ │ ├── objectId
│ │ ├── rgba
│ │ ├── rot
│ │ └── worldp
│ ├── ...
│ └── env 03
│
ValSet
│
│
TestSet
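The layout above can be sanity-checked with a short sketch (assuming the three set folders live under a common root, called `data` here purely for illustration):

```python
# Sanity-check sketch: verify each sequence/environment has the expected cam_down
# subfolders from the tree above. "data" is a placeholder root containing the three sets.
from pathlib import Path

EXPECTED = {"depth", "json", "objectId", "rgba", "rot", "worldp"}

for set_name in ("TrainSet", "ValSet", "TestSet"):
    for cam_down in Path("data", set_name).glob("*/env*/cam_down"):
        present = {p.name for p in cam_down.iterdir() if p.is_dir()}
        missing = EXPECTED - present
        if missing:
            print(f"{cam_down}: missing {sorted(missing)}")
```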
Install Conda Environment
conda create -n venv_xrego python=3.9
conda activate venv_xrego
Install PyTorch 1.7.1
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
Install Required Packages
pip install -r requirements.txt
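A quick way to confirm that the installed PyTorch build sees the GPU:

```python
# Quick environment check: confirm the CUDA build of PyTorch 1.7.1 is installed and
# that a GPU is visible.
import torch

print(torch.__version__)            # expected: 1.7.1+cu110
print(torch.cuda.is_available())    # should print True on a CUDA-capable machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```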
To train the 2D Heatmap Estimation Module, based on the ResNet-101 architecture, run:
python train.py --training_type train2d --gpu {gpu id} --log_dir {experiments/Train2d}
Training runs for 3 epochs and requires about 7.0 GB of VRAM. On a single RTX 3090 GPU, each epoch takes approximately 30-40 minutes.
You can also download the pretrained checkpoint from this link; it is located under the Train2d folder.
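For orientation, the sketch below shows the general shape of such a 2D module: a ResNet-101 backbone followed by a deconvolutional head that outputs one heatmap per joint, in the spirit of the Simple Baselines design that the base ResNet code comes from. It is an illustration only; the repository's actual module, its joint count (assumed to be 15 here), and its layer sizes may differ.

```python
# Illustrative sketch only (not the repository's module): ResNet-101 backbone plus a
# deconvolutional head producing one heatmap per joint. num_joints=15 and the deconv
# sizes are assumptions.
import torch
import torch.nn as nn
import torchvision

class HeatmapNet(nn.Module):
    def __init__(self, num_joints=15):
        super().__init__()
        resnet = torchvision.models.resnet101(pretrained=False)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool/fc
        layers, in_ch = [], 2048
        for _ in range(3):  # three 2x upsampling stages: 8x8 -> 64x64 for a 256x256 input
            layers += [
                nn.ConvTranspose2d(in_ch, 256, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(256),
                nn.ReLU(inplace=True),
            ]
            in_ch = 256
        self.head = nn.Sequential(*layers, nn.Conv2d(256, num_joints, kernel_size=1))

    def forward(self, x):
        return self.head(self.backbone(x))

heatmaps = HeatmapNet()(torch.randn(1, 3, 256, 256))
print(heatmaps.shape)  # torch.Size([1, 15, 64, 64])
```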
To train the 3D Lifting Module, run:
python train.py --training_type train3d --gpu {gpu id} --log_dir {experiments/Train3d}
Training runs for 3 epochs and requires about 3.0 GB of VRAM. On a single RTX 3090 GPU, each epoch takes approximately 25-35 minutes.
You can also download the pretrained checkpoint from this link; it is located under the Train3d folder.
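As a rough illustration of the lifting stage, the hedged sketch below encodes the predicted heatmaps into a latent vector and decodes it through two branches, one regressing the 3D joints and one reconstructing the heatmaps, following the dual-branch idea from the paper. All layer sizes, the latent dimension, and the joint count are assumptions, not the repository's values.

```python
# Illustrative sketch only: a dual-branch lifting module that encodes heatmaps into a
# latent vector, regresses 3D joints in one branch and reconstructs the heatmaps in the
# other. Hidden/latent sizes and num_joints are assumptions.
import torch
import torch.nn as nn

class LiftingNet(nn.Module):
    def __init__(self, num_joints=15, heatmap_size=64, latent_dim=20):
        super().__init__()
        in_dim = num_joints * heatmap_size * heatmap_size
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, 512), nn.ReLU(inplace=True),
            nn.Linear(512, latent_dim),
        )
        self.pose_decoder = nn.Sequential(        # latent -> 3D joint coordinates
            nn.Linear(latent_dim, 512), nn.ReLU(inplace=True),
            nn.Linear(512, num_joints * 3),
        )
        self.heatmap_decoder = nn.Sequential(     # latent -> reconstructed heatmaps
            nn.Linear(latent_dim, 512), nn.ReLU(inplace=True),
            nn.Linear(512, in_dim),
        )

    def forward(self, heatmaps):
        z = self.encoder(heatmaps)
        pose3d = self.pose_decoder(z).view(-1, heatmaps.shape[1], 3)
        recon = self.heatmap_decoder(z).view_as(heatmaps)
        return pose3d, recon

pose3d, recon = LiftingNet()(torch.randn(2, 15, 64, 64))
print(pose3d.shape, recon.shape)  # torch.Size([2, 15, 3]) torch.Size([2, 15, 64, 64])
```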
To finetune the 2D Heatmap and 3D Lifting Module into a single model, run:
python train.py --training_type finetune --gpu {gpu id} --log_dir {experiments/Finetune} --load_2d_model {path to trained 2D Heatmap Module} --load_3d_model {path to trained 3D Pose Lifting Module}
Training runs for 3 epochs and requires about 10 GB of VRAM. On a single RTX 3090 GPU, each epoch takes approximately 40 minutes.
You can also download the pretrained checkpoint from this link; it is located under the Finetune folder.
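Conceptually, the fine-tuning stage chains the two pretrained modules into a single end-to-end model, roughly as sketched below. `HeatmapNet`/`LiftingNet` are the illustrative classes sketched above, not the repository's classes, and the checkpoint paths are placeholders.

```python
# Conceptual sketch: chain the two pretrained stages for end-to-end fine-tuning.
# The checkpoint paths are placeholders; the repository's classes and checkpoint
# formats may differ.
import torch
import torch.nn as nn

heatmap_net = HeatmapNet()
lifting_net = LiftingNet()
heatmap_net.load_state_dict(torch.load("experiments/Train2d/checkpoint.pth", map_location="cpu"))
lifting_net.load_state_dict(torch.load("experiments/Train3d/checkpoint.pth", map_location="cpu"))

class EgoPoseNet(nn.Module):
    """End-to-end model: image -> 2D heatmaps -> 3D pose (+ reconstructed heatmaps)."""
    def __init__(self, heatmap_net, lifting_net):
        super().__init__()
        self.heatmap_net = heatmap_net
        self.lifting_net = lifting_net

    def forward(self, image):
        heatmaps = self.heatmap_net(image)
        pose3d, recon = self.lifting_net(heatmaps)
        return heatmaps, pose3d, recon

model = EgoPoseNet(heatmap_net, lifting_net)  # fine-tune this with the usual training loop
```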
In order to qualitatively and quantitatively evaluate the performance of the model, run the demo:
python demo.py --gpu {gpu id} --load_model {path to trained finetuned model} --data {type of data to test on: train, test, val} --save_dir {path of output folder of visualizations}
Make sure to pass the finetuned model as the model path. The data type defaults to the test set. The visualization overlays the predicted 3D joints (orange) and the ground-truth 3D joints (blue) on the original image. The terminal prints the MPJPE error in mm.
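For reference, MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints; a minimal sketch (without any root alignment, which the repository may or may not apply) looks like this:

```python
# MPJPE sketch: mean per-joint Euclidean distance between prediction and ground truth,
# in millimetres. No root alignment is applied here.
import numpy as np

def mpjpe(pred, gt):
    """pred, gt: arrays of shape (num_frames, num_joints, 3), in millimetres."""
    return np.mean(np.linalg.norm(pred - gt, axis=-1))

pred = np.random.rand(4, 15, 3) * 100   # dummy predictions
gt = np.random.rand(4, 15, 3) * 100     # dummy ground truth
print(f"MPJPE: {mpjpe(pred, gt):.2f} mm")
```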
To run inference on your custom data or other datasets, run:
python inference.py --gpu {gpu id} --load_model {path to trained finetuned model} --input_dir {path to folder containing images} --save_dir {path of output folder of visualizations}
Since the model is trained solely on the synthetic xR-EgoPose dataset, results on custom data may not be as expected due to the domain gap.
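As a rough illustration of what inference.py does, the hedged sketch below runs the end-to-end model sketched in the previous sections over a folder of images; the input resolution, the lack of normalization, and the folder name are assumptions, and the real preprocessing and checkpoint loading live in inference.py.

```python
# Rough sketch of custom-image inference using the classes sketched above. The 256x256
# input size and the "my_images" folder are assumptions.
import torch
from pathlib import Path
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),  # assumed input size, not necessarily the model's
    transforms.ToTensor(),
])

model = EgoPoseNet(HeatmapNet(), LiftingNet())  # load the finetuned weights in practice
model.eval()

with torch.no_grad():
    for path in sorted(Path("my_images").glob("*.png")):
        image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        _, pose3d, _ = model(image)
        print(path.name, pose3d.squeeze(0).shape)  # (num_joints, 3) per image
```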
@inproceedings{tome2019xr,
title={xR-EgoPose: Egocentric 3D Human Pose from an HMD Camera},
author={Tome, Denis and Peluse, Patrick and Agapito, Lourdes and Badino, Hernan},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={7728--7738},
year={2019}
}
@ARTICLE{tome2020self,
author={D. {Tome} and T. {Alldieck} and P. {Peluse} and G. {Pons-Moll} and L. {Agapito} and H. {Badino} and F. {De la Torre}},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera},
year={2020},
volume={},
number={},
pages={1-1},
doi={10.1109/TPAMI.2020.3029700}
}
The base code is adapted from xR-EgoPose's official repository. The base ResNet code comes from Bin Xiao at Microsoft Research ([email protected]). Some parts of the implementation and ideas are referenced and adapted from these users: @twice154, @ahuard0, @FloralZhao, @jiangyutong. Thanks!