This repo is the official implementation of the ICLR 2025 paper EgoHOD:
"Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning"
Baoqi Pei, Yifei Huang, Jilan Xu, Guo Chen, Yuping He, Lijin Yang,
Yali Wang, Weidi Xie, Yu Qiao, Fei Wu, Limin Wang
- HOD data release
- Pretraining code release
- Finetuning code release
- Pretrained model checkpoints release
- Finetuned model checkpoints release
- Evaluation code release
In egocentric video understanding, the motion of hands and objects as well as their interactions play a significant role by nature. However, existing egocentric video representation learning methods mainly focus on aligning video representation with high-level narrations, overlooking the intricate dynamics between hands and objects. In this work, we aim to integrate the modeling of fine-grained hand-object dynamics into the video representation learning process. Since no suitable data is available, we introduce HOD, a novel pipeline employing a hand-object detector and a large language model to generate high-quality narrations with detailed descriptions of hand-object dynamics. To learn these fine-grained dynamics, we propose EgoVideo, a model with a new lightweight motion adapter to capture fine-grained hand-object motion information. Through our co-training strategy, EgoVideo effectively and efficiently leverages the fine-grained hand-object dynamics in the HOD data. Extensive experiments demonstrate that our method achieves state-of-the-art performance across multiple egocentric downstream tasks, including improvements of 6.3% in EK-100 multi-instance retrieval, 5.7% in EK-100 classification, and 16.3% in EGTEA classification in zero-shot settings. Furthermore, our model exhibits robust generalization capabilities in hand-object interaction and robot manipulation tasks.
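For intuition about the adapter idea, here is a minimal, generic bottleneck-adapter sketch in PyTorch. It only illustrates what a lightweight adapter typically looks like; it is not the actual motion adapter used in EgoVideo (see the paper and the upcoming adapter code for that).

```python
# Generic bottleneck-adapter sketch, for illustration only --
# NOT the EgoVideo motion adapter described in the paper.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small residual MLP inserted alongside a (frozen) transformer block."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Toy usage: adapt per-frame token features of shape (batch, tokens, dim).
tokens = torch.randn(2, 196, 768)
adapted = BottleneckAdapter(dim=768)(tokens)
print(adapted.shape)  # torch.Size([2, 196, 768])
```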
```bash
git clone https://github.com/OpenRobotLab/EgoHOD.git
cd EgoHOD
conda env create -f environment.yml
conda activate hod
pip install -r requirements.txt
```
You can get our HOD annotations from this Huggingface link.
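If you prefer to fetch the annotations programmatically, a minimal sketch using `huggingface_hub` is shown below; the `repo_id` is a placeholder and should be replaced with the dataset id behind the link above.

```python
# Minimal sketch: download the HOD annotations from the Hugging Face Hub.
# NOTE: the repo_id below is a placeholder -- replace it with the dataset id
# behind the Hugging Face link above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="OpenRobotLab/EgoHOD",  # placeholder dataset id
    repo_type="dataset",
    local_dir="./data/HOD",
)
print("Annotations downloaded to", local_dir)
```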
For training the EgoVideo model without the adapter, you can simply run:
```bash
bash ./exps/pretrain.sh
```
Notes:
- Modify the yml files in `./configs` before running the scripts.
- For training without the slurm script, you can simply run (a programmatic variant is sketched below):
  ```bash
  python main_pretrain.py --config_file configs/clip_base.yml
  ```
- For the model with the adapter, we will release the pretraining code soon.
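If you want to tweak settings programmatically, a minimal sketch is shown below. It relies only on the `--config_file` flag above; the config keys it overrides (`output_dir`, `batch_size`) are hypothetical examples, so check `configs/clip_base.yml` for the real field names.

```python
# Minimal sketch: override config fields and launch pretraining without slurm.
# NOTE: "output_dir" and "batch_size" are hypothetical keys -- check
# configs/clip_base.yml for the actual field names.
import subprocess
import yaml

with open("configs/clip_base.yml") as f:
    cfg = yaml.safe_load(f)

cfg["output_dir"] = "./outputs/clip_base_run1"  # hypothetical key
cfg["batch_size"] = 64                          # hypothetical key

with open("configs/clip_base_run1.yml", "w") as f:
    yaml.safe_dump(cfg, f)

subprocess.run(
    ["python", "main_pretrain.py", "--config_file", "configs/clip_base_run1.yml"],
    check=True,
)
```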
For our pretrained model, you can download the checkpoint from this link.
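After downloading, you can sanity-check the checkpoint with a sketch like the following; the file name is a placeholder, and the checkpoint may or may not nest its weights under a `state_dict` key.

```python
# Minimal sketch: inspect a downloaded pretrained checkpoint.
# NOTE: "egovideo_pretrained.pt" is a placeholder file name; adjust the path
# and the key lookup to whatever the released checkpoint actually contains.
import torch

ckpt = torch.load("egovideo_pretrained.pt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

print("number of entries:", len(state_dict))
for name, value in list(state_dict.items())[:10]:
    shape = tuple(value.shape) if torch.is_tensor(value) else type(value).__name__
    print(name, shape)
```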
We will update the code soon.
For zero-shot evaluation, you can simply run the scripts in `exps` as follows:
```bash
bash exps/eval_ekcls.sh
```
We provide the evaluation code for EK100-MIR, EK100-CLS, EGTEA, and EGOMCQ.
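For reference, zero-shot classification in these benchmarks boils down to matching video embeddings against text embeddings of the class names. The sketch below illustrates this with random features; it is not the repository's evaluation code.

```python
# Illustrative sketch (not the repo's evaluation code): zero-shot action
# classification by matching video embeddings against class-name text
# embeddings with cosine similarity, CLIP-style.
import torch
import torch.nn.functional as F

def zero_shot_top1(video_emb: torch.Tensor,       # (N, D) video features
                   text_emb: torch.Tensor,        # (C, D) one text feature per class
                   labels: torch.Tensor) -> float:  # (N,) ground-truth class ids
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    sims = video_emb @ text_emb.t()               # (N, C) cosine similarities
    pred = sims.argmax(dim=-1)
    return (pred == labels).float().mean().item()

# Toy usage with random features (replace with real EgoVideo embeddings).
acc = zero_shot_top1(torch.randn(8, 512), torch.randn(10, 512),
                     torch.randint(0, 10, (8,)))
print(f"top-1 accuracy: {acc:.3f}")
```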
If you find this repository useful, please use the following BibTeX entry for citation.
```bibtex
@misc{pei2025modeling,
  title={Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning},
  author={Baoqi Pei and Yifei Huang and Jilan Xu and Guo Chen and Yuping He and Lijin Yang and Yali Wang and Weidi Xie and Yu Qiao and Fei Wu and Limin Wang},
  year={2025},
  eprint={2503.00986},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
This repository is built upon mae and AVION. Thanks to the contributors of these great codebases.