This repository was archived by the owner on May 28, 2025. It is now read-only.
This repository contains the official PyTorch implementation for the paper
- > Frederic Z. Zhang, Dylan Campbell and Stephen Gould. _Efficient Two-Stage Detection of Human–Object Interactions with a Novel Unary–Pairwise Transformer._
+ > Frederic Z. Zhang, Dylan Campbell and Stephen Gould. _Efficient Two-Stage Detection of Human–Object Interactions with a Novel Unary–Pairwise Transformer._ arXiv preprint arXiv:2112.01838.
> ...<br/>However, the success of such one-stage HOI detectors can largely be attributed to the representation power of transformers. We discovered that when equipped with the same transformer, their two-stage counterparts can be more performant and memory-efficient, while taking a fraction of the time to train. In this work, we propose the Unary–Pairwise Transformer, a two-stage detector that exploits unary and pairwise representations for HOIs. We observe that the unary and pairwise parts of our transformer network specialise, with the former preferentially increasing the scores of positive examples and the latter decreasing the scores of negative examples. We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches. At inference time, our model with ResNet50 approaches real-time performance on a single GPU.
@@ -29,6 +29,8 @@ We provide weights for UPT models pre-trained on HICO-DET and V-COCO for potenti
The inference speed was benchmarked on a GeForce RTX 3090. Note that weights of the UPT model include those of the detector (DETR). You do not need to download the DETR weights, unless you want to train the UPT model from scratch. Training UPT-R50 with 8 GeForce GTX TITAN X GPUs takes around `5` hours on HICO-DET and `40` minutes on V-COCO, almost a tenth of the time compared to other one-stage models such as [QPIC](https://github.com/hitachi-rd-cv/qpic).
+ ## Contact
+ For general inquiries regarding the paper and code, please post them in [Discussions](https://github.com/fredzzhang/upt/discussions). For bug reports and feature requests, please post them in [Issues](https://github.com/fredzzhang/upt/issues). You can also contact me at [email protected].
## Prerequisites
1. Install the lightweight deep learning library [Pocket](https://github.com/fredzzhang/pocket). The recommended PyTorch version is 1.9.0.
2. Download the repository and the submodules.
@@ -59,7 +61,8 @@ git submodule update
cd /path/to/upt/vcoco
ln -s /path/to/coco ./mscoco2014
```
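A broken symbolic link is easy to miss until a data loader fails, so it can help to check the link right after creating it. The sketch below is a hypothetical helper, not part of the UPT repository: `check_dataset_link` is an assumed name, and it only tests that the given path is a symlink that resolves to a directory.

```shell
# Hypothetical helper (not part of the UPT repository): report whether a
# dataset symlink exists and resolves to a directory.
check_dataset_link() {
    link="$1"
    if [ -L "$link" ] && [ -d "$link" ]; then
        # readlink prints the target exactly as it was stored in the link
        echo "ok: $link -> $(readlink "$link")"
    else
        echo "missing or broken: $link" >&2
        return 1
    fi
}
```

Under these assumptions, running `check_dataset_link ./mscoco2014` from the `vcoco` directory would print the link target on success and return a non-zero status otherwise.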
-
+ ## License
+ UPT is released under the [BSD-3-Clause License](./LICENSE).
## Inference
We have implemented inference utilities with different visualisation options. Provided you have downloaded the model weights to `checkpoints/`, run the following command to visualise detected instances together with the attention maps from the cooperative and competitive layers. Use the flag `--index` to select images, and `--box-score-thresh` to modify the filtering threshold on object boxes.
```bash
@@ -93,7 +96,7 @@ If you find our work useful for your research, please consider citing us:
@article{zhang2021upt,
author = {Frederic Z. Zhang and Dylan Campbell and Stephen Gould},
title = {Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer},
- journal = {arXiv preprint},
+ journal = {arXiv preprint arXiv:2112.01838},
year = {2021}
}
@@ -105,4 +108,4 @@ If you find our work useful for your research, please consider citing us: