Commit 87d8318

update README
1 parent b73ff45 commit 87d8318

3 files changed

+129
-10
lines changed

README.md

+129-10
@@ -9,37 +9,156 @@ This is the official codebase of the paper
99

1010
## Overview
1111

12+
*GeomEtry-Aware Relational Graph Neural Network (**GearNet**)* is a simple yet effective structure-based protein encoder.
13+
It encodes spatial information by adding different types of sequential or structural edges and then performs relational message passing on protein residue graphs, which can be further enhanced by an edge message passing mechanism.
14+
Though conceptually simple, GearNet augmented with edge message passing can achieve very strong performance on several benchmarks in a supervised setting.
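
As a rough illustration of the idea described above (typed edges on a residue graph followed by relational message passing), here is a minimal pure-Python sketch. It is not the codebase's implementation: the function names, the `seq_window` and `radius` parameters, and the simplified update rule `h_i' = ReLU(sum_r W_r * sum_{j in N_r(i)} h_j)` are assumptions chosen for readability, and the edge message passing mechanism is omitted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_edges(coords, seq_window=1, radius=10.0):
    """Return (src, dst, relation) triples for a residue graph.

    Assumed relation scheme (illustrative only):
      * relations 0 .. 2*seq_window: sequential edges, one relation per
        sequence offset (src - dst) in [-seq_window, seq_window]
      * relation 2*seq_window + 1: spatial edges between residues that are
        far apart in sequence but closer than `radius` in space
    """
    spatial_relation = 2 * seq_window + 1
    edges = []
    for dst in range(len(coords)):
        for src in range(len(coords)):
            offset = src - dst
            if abs(offset) <= seq_window:
                edges.append((src, dst, offset + seq_window))
            elif euclidean(coords[src], coords[dst]) < radius:
                edges.append((src, dst, spatial_relation))
    return edges


def relational_message_passing(features, edges, weights):
    """One simplified relational layer.

    `weights[r]` is a (dim_out x dim_in) matrix for relation r. Messages are
    summed per relation, transformed by that relation's matrix, summed over
    relations, and passed through ReLU.
    """
    num_nodes, dim_in = len(features), len(features[0])
    num_relations, dim_out = len(weights), len(weights[0])

    # Sum neighbor features separately for every (node, relation) pair.
    aggregated = [[[0.0] * dim_in for _ in range(num_relations)]
                  for _ in range(num_nodes)]
    for src, dst, rel in edges:
        for k in range(dim_in):
            aggregated[dst][rel][k] += features[src][k]

    output = []
    for node in range(num_nodes):
        row = [0.0] * dim_out
        for rel in range(num_relations):
            for o in range(dim_out):
                row[o] += sum(weights[rel][o][k] * aggregated[node][rel][k]
                              for k in range(dim_in))
        output.append([max(0.0, value) for value in row])
    return output


# Toy example: three residues on a line, 3.8 angstroms apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
edges = build_edges(coords, seq_window=1, radius=10.0)
features = [[1.0], [2.0], [3.0]]
# Four relations; the spatial relation (index 3) is given zero weight here.
weights = [[[1.0]], [[1.0]], [[1.0]], [[0.0]]]
updated = relational_message_passing(features, edges, weights)
```

Giving each relation its own weight matrix is what lets the layer treat a sequence neighbor differently from a residue that is merely close in space.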
15+
16+
![GearNet](./asset/GearNet.png)
17+
18+
Five different geometric self-supervised learning methods based on protein structures are further proposed to pretrain the encoder, including **Multiview Contrast**, **Residue Type Prediction**, **Distance Prediction**, **Angle Prediction**, and **Dihedral Prediction**.
19+
Through extensively benchmarking these pretraining techniques on diverse
20+
downstream tasks, we set up a solid starting point for pretraining protein structure representations.
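
The distance, angle, and dihedral objectives named above rest on standard geometric quantities computed from 3D coordinates. The sketch below shows those three quantities in pure Python. It is only an illustration: the helper names and toy coordinates are assumptions, and how the paper samples residues or turns these values into prediction targets is not reproduced here.

```python
import math


def sub(a, b):
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def distance(a, b):
    """Euclidean distance between points a and b."""
    return norm(sub(a, b))


def angle(a, b, c):
    """Angle in radians at vertex b, formed by the points a-b-c."""
    u, v = sub(a, b), sub(c, b)
    cosine = dot(u, v) / (norm(u) * norm(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians around the p1-p2 axis."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = dot(b2, cross(n1, n2))
    x = norm(b2) * dot(n1, n2)
    return math.atan2(y, x)


# Toy coordinates (hypothetical, not taken from any protein).
d = distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
right_angle = angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
```

The sign of the dihedral depends on the chosen convention, so only its magnitude should be compared across implementations.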
21+
22+
![SSL](./asset/SSL.png)
23+
1224
This codebase is based on PyTorch and [TorchDrug] ([TorchProtein](https://torchprotein.ai)). It supports training and inference
13-
with multiple GPUs or multiple machines.
25+
with multiple GPUs.
1426

1527
[TorchDrug]: https://github.com/DeepGraphLearning/torchdrug
1628

1729
## Installation
1830

19-
You may install the dependencies via either conda or pip. Generally, NBFNet works
31+
You may install the dependencies via either conda or pip. Generally, GearNet works
2032
with Python 3.7/3.8 and PyTorch version >= 1.8.0.
2133

2234
### From Conda
2335

2436
```bash
25-
conda install pytorch=1.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
26-
conda install pyg -c pyg
27-
conda install rdkit easydict pyyaml -c conda-forge
37+
conda install torchdrug pytorch=1.8.0 cudatoolkit=11.1 -c milagraph -c pytorch-lts -c pyg -c conda-forge
38+
conda install easydict pyyaml -c conda-forge
39+
```
40+
41+
### From Pip
42+
43+
```bash
44+
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
45+
pip install torchdrug
46+
pip install easydict pyyaml
2847
```
2948

3049

3150
## Reproduction
3251

33-
To reproduce the results of GearBind, use the following command. Alternatively, you
34-
may use `--gpus null` to run GearBind on a CPU. All the datasets will be automatically
35-
downloaded in the code.
52+
### Training From Scratch
53+
54+
To reproduce the results of GearNet, use the following command. Alternatively, you
55+
may use `--gpus null` to run GearNet on a CPU. All the datasets will be automatically downloaded in the code.
56+
The first run takes longer because the dataset has to be preprocessed.
57+
58+
```bash
59+
# Run GearNet on the Enzyme Commission dataset with 1 GPU
60+
python script/downstream.py -c config/downstream/EC/gearnet.yaml --gpus [0]
61+
```
3662

3763
We provide the hyperparameters for each experiment in configuration files.
3864
All the configuration files can be found in `config/*.yaml`.
3965

40-
To run GearBind with multiple GPUs, use the following commands
66+
To run GearNet with multiple GPUs, use the following commands.
67+
68+
```bash
69+
# Run GearNet on the Enzyme Commission dataset with 4 GPUs
70+
python -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c config/downstream/EC/gearnet.yaml --gpus [0,1,2,3]
71+
```
72+
73+
74+
### Pretraining and Finetuning
75+
By default, we will use the AlphaFold Database for pretraining.
76+
To pretrain GearNet-Edge with Multiview Contrast, use the following command.
77+
Similarly, all the datasets will be automatically downloaded and preprocessed the first time you run the code.
78+
79+
```bash
80+
# Pretrain GearNet-Edge with Multiview Contrast
81+
python script/pretrain.py -c config/pretrain/mc_gearnet_edge.yaml --gpus [0]
82+
```
83+
84+
After pretraining, you can load the model weight from the saved checkpoint via the `--ckpt` argument and then finetune the model on downstream tasks.
4185

4286
```bash
43-
python -m torch.distributed.launch --nproc_per_node=4 script/run.py -c config/downstream/gearnet.yaml --gpus [0,1,2,3]
87+
# Finetune GearNet-Edge on the Enzyme Commission dataset
88+
python script/downstream.py -c config/downstream/EC/gearnet_edge.yaml --gpus [0] --ckpt <path_to_your_model>
4489
```
4590

91+
## Results
92+
Here are the results of GearNet with and without pretraining on standard benchmark datasets. All the results are obtained with 4 A100 GPUs (40GB). Note that results may differ slightly if the model is trained with 1 GPU and/or a smaller batch size.
93+
More detailed results are listed in the paper.
94+
95+
<table>
96+
<tr>
97+
<th>Method</th>
98+
<th>EC</th>
99+
<th>GO-BP</th>
100+
<th>GO-MF</th>
101+
<th>GO-CC</th>
102+
</tr>
103+
<tr>
104+
<th>GearNet</th>
105+
<td>0.730</td>
106+
<td>0.356</td>
107+
<td>0.503</td>
108+
<td>0.414</td>
109+
</tr>
110+
<tr>
111+
<th>GearNet-Edge</th>
112+
<td>0.810</td>
113+
<td>0.403</td>
114+
<td>0.580</td>
115+
<td>0.450</td>
116+
</tr>
117+
<tr>
118+
<th>Multiview Contrast</th>
119+
<td>0.874</td>
120+
<td>0.490</td>
121+
<td>0.654</td>
122+
<td>0.488</td>
123+
</tr>
124+
<tr>
125+
<th>Residue Type Prediction</th>
126+
<td>0.843</td>
127+
<td>0.430</td>
128+
<td>0.604</td>
129+
<td>0.465</td>
130+
</tr>
131+
<tr>
132+
<th>Distance Prediction</th>
133+
<td>0.839</td>
134+
<td>0.448</td>
135+
<td>0.616</td>
136+
<td>0.464</td>
137+
</tr>
138+
<tr>
139+
<th>Angle Prediction</th>
140+
<td>0.853</td>
141+
<td>0.458</td>
142+
<td>0.625</td>
143+
<td>0.473</td>
144+
</tr>
145+
<tr>
146+
<th>Dihedral Prediction</th>
147+
<td>0.859</td>
148+
<td>0.458</td>
149+
<td>0.626</td>
150+
<td>0.465</td>
151+
</tr>
152+
</table>
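
To compare rows of the table above programmatically, the short sketch below copies the reported numbers into a dictionary and picks the top-scoring method per benchmark. It assumes that higher scores are better, which this README does not state explicitly. The variable and function names are illustrative.

```python
# Scores copied from the results table above.
results = {
    "GearNet":                 {"EC": 0.730, "GO-BP": 0.356, "GO-MF": 0.503, "GO-CC": 0.414},
    "GearNet-Edge":            {"EC": 0.810, "GO-BP": 0.403, "GO-MF": 0.580, "GO-CC": 0.450},
    "Multiview Contrast":      {"EC": 0.874, "GO-BP": 0.490, "GO-MF": 0.654, "GO-CC": 0.488},
    "Residue Type Prediction": {"EC": 0.843, "GO-BP": 0.430, "GO-MF": 0.604, "GO-CC": 0.465},
    "Distance Prediction":     {"EC": 0.839, "GO-BP": 0.448, "GO-MF": 0.616, "GO-CC": 0.464},
    "Angle Prediction":        {"EC": 0.853, "GO-BP": 0.458, "GO-MF": 0.625, "GO-CC": 0.473},
    "Dihedral Prediction":     {"EC": 0.859, "GO-BP": 0.458, "GO-MF": 0.626, "GO-CC": 0.465},
}


def best_per_benchmark(table):
    """Map each benchmark to the method with the highest score (assumes higher is better)."""
    benchmarks = list(next(iter(table.values())))
    return {b: max(table, key=lambda method: table[method][b]) for b in benchmarks}


best = best_per_benchmark(results)
for benchmark, method in best.items():
    print(f"{benchmark}: {method} ({results[method][benchmark]:.3f})")
```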
153+
154+
## Citation
155+
If you find this codebase useful in your research, please cite the following paper.
156+
157+
```bibtex
158+
@article{zhang2022protein,
159+
title={Protein representation learning by geometric structure pretraining},
160+
author={Zhang, Zuobai and Xu, Minghao and Jamasb, Arian and Chenthamarakshan, Vijil and Lozano, Aurelie and Das, Payel and Tang, Jian},
161+
journal={arXiv preprint arXiv:2203.06125},
162+
year={2022}
163+
}
164+
```

asset/GearNet.png

305 KB

asset/SSL.png

181 KB
