Commit 34d4637

init commit for downstream configs
1 parent 12ded19 commit 34d4637

32 files changed, with 1420 additions and 1 deletion

README.md

Lines changed: 45 additions & 1 deletion
@@ -1 +1,45 @@
-# GearNet
+# GearNet: Geometry-Aware Relational Graph Neural Network
+
+
+This is the official codebase of the paper
+
+[Protein Representation Learning by Geometric Structure Pretraining](https://arxiv.org/abs/2203.06125)
+
+[Zuobai Zhang](https://oxer11.github.io/), [Minghao Xu](https://chrisallenming.github.io/), [Arian Jamasb](https://jamasb.io/), [Vijil Chenthamarakshan](https://researcher.watson.ibm.com/researcher/view.php?person=us-ecvijil), [Aurelie Lozano](https://researcher.watson.ibm.com/researcher/view.php?person=us-aclozano), [Payel Das](https://researcher.watson.ibm.com/researcher/view.php?person=us-daspa), [Jian Tang](https://jian-tang.com/)
+
+## Overview
+
+This codebase is based on PyTorch and [TorchDrug] ([TorchProtein](https://torchprotein.ai)). It supports training and inference
+with multiple GPUs or multiple machines.
+
+[TorchDrug]: https://github.com/DeepGraphLearning/torchdrug
+
+## Installation
+
+You may install the dependencies via either conda or pip. Generally, GearNet works
+with Python 3.7/3.8 and PyTorch version >= 1.8.0.
+
+### From Conda
+
+```bash
+conda install pytorch=1.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
+conda install pyg -c pyg
+conda install rdkit easydict pyyaml -c conda-forge
+```
+
+
+## Reproduction
+
+To reproduce the results of GearNet, use the following command. Alternatively, you
+may use `--gpus null` to run GearNet on a CPU. All datasets are downloaded
+automatically by the code.
+
+We provide the hyperparameters for each experiment in configuration files.
+All configuration files are under `config/`, for example `config/downstream/EC/gearnet.yaml`.
+
+To run GearNet with multiple GPUs, use the following command:
+
+```bash
+python -m torch.distributed.launch --nproc_per_node=4 script/run.py -c config/downstream/EC/gearnet.yaml --gpus [0,1,2,3]
+```
+
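The configuration files in this commit leave some values as `{{ gpus }}` or `{{ ckpt }}` placeholders, which are presumably filled in from command-line arguments such as `--gpus [0,1,2,3]` before the YAML is parsed. The following is a minimal standard-library sketch of that substitution step, not the loader used by `script/run.py`. The helper name `render_config` and the sample template are assumptions made for illustration.

```python
import re

# Matches placeholders such as "{{ gpus }}" or "{{ckpt}}".
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_config(text, variables):
    """Return `text` with every {{ name }} replaced by variables[name].

    Hypothetical helper: values are inserted verbatim as strings, so
    "[0,1,2,3]" and "null" keep their YAML meaning after substitution.
    Raises KeyError if a placeholder has no supplied value.
    """
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for placeholder '{name}'")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, text)


# Sample fragment modelled on the `engine` section of the configs.
template = """engine:
  gpus: {{ gpus }}
  batch_size: 2
"""

if __name__ == "__main__":
    # Multi-GPU run: the string is passed exactly as typed on the command line.
    print(render_config(template, {"gpus": "[0,1,2,3]"}))
    # CPU run: YAML reads `null` as "no GPUs".
    print(render_config(template, {"gpus": "null"}))
```

Substituting before parsing keeps the YAML files valid templates for both the GPU and CPU cases. Failing on a missing value avoids launching a run with an unresolved `{{ ckpt }}`.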

config/downstream/EC/BERT.yaml

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+output_dir: ~/scratch/protein_output
+
+dataset:
+  class: EnzymeCommission
+  path: ~/scratch/protein-datasets/
+  test_cutoff: 0.95
+  transform:
+    class: ProteinView
+    view: residue
+
+task:
+  class: MultipleBinaryClassification
+  model:
+    class: ProteinBERT
+    input_dim: 21
+    hidden_dim: 512
+    num_layers: 4
+    num_heads: 8
+    intermediate_dim: 2048
+    hidden_dropout: 0.1
+    attention_dropout: 0.1
+  criterion: bce
+  metric: ['auprc@micro', 'f1_max']
+  num_mlp_layer: 2
+
+optimizer:
+  class: Adam
+  lr: 5.0e-5
+
+engine:
+  gpus: {{ gpus }}
+  batch_size: 8
+  log_interval: 1000
+
+metric: f1_max
+
+train:
+  num_epoch: 200

config/downstream/EC/CNN.yaml

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+output_dir: ~/scratch/protein_output
+
+dataset:
+  class: EnzymeCommission
+  path: ~/scratch/protein-datasets/
+  test_cutoff: 0.95
+  transform:
+    class: ProteinView
+    view: residue
+
+task:
+  class: MultipleBinaryClassification
+  model:
+    class: ProteinConvolutionalNetwork
+    input_dim: 21
+    hidden_dims: [1024, 1024]
+    kernel_size: 5
+    padding: 2
+  criterion: bce
+  metric: ['auprc@micro', 'f1_max']
+  num_mlp_layer: 2
+
+optimizer:
+  class: Adam
+  lr: 1.0e-4
+
+engine:
+  gpus: {{ gpus }}
+  batch_size: 32
+  log_interval: 1000
+
+metric: f1_max
+
+train:
+  num_epoch: 200

config/downstream/EC/ESM.yaml

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+output_dir: ~/scratch/protein_output
+
+dataset:
+  class: EnzymeCommission
+  path: ~/scratch/protein-datasets/
+  test_cutoff: 0.95
+  transform:
+    class: Compose
+    transforms:
+      - class: ProteinView
+        view: residue
+      - class: TruncateProtein
+        max_length: 550
+
+task:
+  class: MultipleBinaryClassification
+  model:
+    class: ESM
+    path: ~/scratch/protein-model-weights/esm-model-weights/
+    model: ESM-1b
+  criterion: bce
+  metric: ['auprc@micro', 'f1_max']
+  num_mlp_layer: 2
+
+optimizer:
+  class: Adam
+  lr: 1.0e-4
+
+engine:
+  gpus: {{ gpus }}
+  batch_size: 2
+  log_interval: 1000
+
+lr_ratio: 0.1
+
+metric: f1_max
+
+train:
+  num_epoch: 200

config/downstream/EC/LSTM.yaml

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+output_dir: ~/scratch/protein_output
+
+dataset:
+  class: EnzymeCommission
+  path: ~/scratch/protein-datasets/
+  test_cutoff: 0.95
+  transform:
+    class: ProteinView
+    view: residue
+
+task:
+  class: MultipleBinaryClassification
+  model:
+    class: ProteinLSTM
+    input_dim: 21
+    hidden_dim: 640
+    num_layers: 3
+  criterion: bce
+  metric: ['auprc@micro', 'f1_max']
+  num_mlp_layer: 2
+
+optimizer:
+  class: Adam
+  lr: 5.0e-5
+
+engine:
+  gpus: {{ gpus }}
+  batch_size: 8
+  log_interval: 1000
+
+metric: f1_max
+
+train:
+  num_epoch: 200

config/downstream/EC/ResNet.yaml

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+output_dir: ~/scratch/protein_output
+
+dataset:
+  class: EnzymeCommission
+  path: ~/scratch/protein-datasets/
+  test_cutoff: 0.95
+  transform:
+    class: ProteinView
+    view: residue
+
+task:
+  class: MultipleBinaryClassification
+  model:
+    class: ProteinResNet
+    input_dim: 21
+    hidden_dims: [512, 512, 512, 512, 512, 512, 512, 512]
+    layer_norm: True
+    dropout: 0.1
+  criterion: bce
+  metric: ['auprc@micro', 'f1_max']
+  num_mlp_layer: 2
+
+optimizer:
+  class: Adam
+  lr: 2.0e-4
+
+engine:
+  gpus: {{ gpus }}
+  batch_size: 8
+  log_interval: 1000
+
+metric: f1_max
+
+train:
+  num_epoch: 200

config/downstream/EC/gearnet.yaml

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+output_dir: ~/scratch/protein_output
+
+dataset:
+  class: EnzymeCommission
+  path: ~/scratch/protein-datasets/
+  test_cutoff: 0.95
+  transform:
+    class: ProteinView
+    view: residue
+
+task:
+  class: MultipleBinaryClassification
+  model:
+    class: GearNet
+    input_dim: 21
+    hidden_dims: [512, 512, 512, 512, 512, 512]
+    batch_norm: True
+    concat_hidden: True
+    short_cut: True
+    readout: 'sum'
+    num_relation: 7
+  graph_construction_model:
+    class: GraphConstruction
+    node_layers:
+      - class: AlphaCarbonNode
+    edge_layers:
+      - class: SequentialEdge
+        max_distance: 2
+      - class: SpatialEdge
+        radius: 10.0
+        min_distance: 5
+      - class: KNNEdge
+        k: 10
+        min_distance: 5
+    edge_feature: gearnet
+  criterion: bce
+  num_mlp_layer: 3
+  metric: ['auprc@micro', 'f1_max']
+
+optimizer:
+  class: AdamW
+  lr: 1.0e-4
+  weight_decay: 0
+
+engine:
+  gpus: {{ gpus }}
+  batch_size: 2
+  log_interval: 1000
+
+metric: f1_max
+
+train:
+  num_epoch: 200
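The `graph_construction_model` settings above combine three edge layers. The value `num_relation: 7` is consistent with five sequential relations (offsets -2 to 2), one spatial relation, and one kNN relation. The sketch below illustrates that reading in plain Python and is not TorchDrug's implementation. The function names are hypothetical. It assumes that offset 0 counts as a sequential relation and that `min_distance` is a sequence-separation filter applied after neighbours are selected.

```python
import math


def sequential_edges(num_residue, max_distance):
    """Edges (i, j, relation) for residue pairs with |j - i| <= max_distance.

    Each signed offset j - i gets its own relation id in [0, 2 * max_distance].
    """
    edges = []
    for i in range(num_residue):
        for offset in range(-max_distance, max_distance + 1):
            j = i + offset
            if 0 <= j < num_residue:
                edges.append((i, j, offset + max_distance))
    return edges


def spatial_edges(coords, radius, min_distance, relation):
    """Edges between residues closer than `radius` in space and at least
    `min_distance` apart along the sequence."""
    edges = []
    for i in range(len(coords)):
        for j in range(len(coords)):
            far_in_sequence = abs(i - j) >= min_distance
            if i != j and far_in_sequence and math.dist(coords[i], coords[j]) < radius:
                edges.append((i, j, relation))
    return edges


def knn_edges(coords, k, min_distance, relation):
    """Edges from each residue to its k nearest neighbours in space, keeping
    only neighbours at least `min_distance` apart along the sequence."""
    edges = []
    for i in range(len(coords)):
        others = [j for j in range(len(coords)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(coords[i], coords[j]))[:k]
        edges.extend((i, j, relation) for j in nearest if abs(i - j) >= min_distance)
    return edges


# Relation count implied by the config: 5 sequential offsets + spatial + kNN.
max_distance = 2
num_sequential = 2 * max_distance + 1
num_relation = num_sequential + 1 + 1  # 7

# Toy alpha-carbon coordinates (hypothetical): a small loop of six residues.
coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (6, 3, 0), (3, 3, 0), (0, 3, 0)]
edges = (
    sequential_edges(len(coords), max_distance)
    + spatial_edges(coords, radius=10.0, min_distance=5, relation=num_sequential)
    + knn_edges(coords, k=2, min_distance=5, relation=num_sequential + 1)
)
```

Here `k=2` is used only because the toy structure has six residues; the config uses `k: 10`. Giving each edge layer its own relation id lets a relational graph convolution learn a separate weight matrix per edge type.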
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+output_dir: ~/scratch/protein_output
+
+dataset:
+  class: EnzymeCommission
+  path: ~/scratch/protein-datasets/
+  test_cutoff: 0.95
+  transform:
+    class: ProteinView
+    view: residue
+
+task:
+  class: MultipleBinaryClassification
+  model:
+    class: GearNet
+    input_dim: 21
+    hidden_dims: [512, 512, 512, 512, 512, 512]
+    batch_norm: True
+    concat_hidden: True
+    short_cut: True
+    readout: 'sum'
+    num_relation: 7
+    edge_input_dim: 59
+    num_angle_bin: 8
+  graph_construction_model:
+    class: GraphConstruction
+    node_layers:
+      - class: AlphaCarbonNode
+    edge_layers:
+      - class: SequentialEdge
+        max_distance: 2
+      - class: SpatialEdge
+        radius: 10.0
+        min_distance: 5
+      - class: KNNEdge
+        k: 10
+        min_distance: 5
+    edge_feature: gearnet
+  criterion: bce
+  num_mlp_layer: 3
+  metric: ['auprc@micro', 'f1_max']
+
+optimizer:
+  class: AdamW
+  lr: 1.0e-4
+  weight_decay: 0
+
+engine:
+  gpus: {{ gpus }}
+  batch_size: 2
+  log_interval: 1000
+
+model_checkpoint: {{ ckpt }}
+
+metric: f1_max
+
+train:
+  num_epoch: 200

config/downstream/GO-BP/BERT.yaml

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+output_dir: ~/scratch/protein_output
+
+dataset:
+  class: GeneOntology
+  path: ~/scratch/protein-datasets/
+  branch: BP
+  test_cutoff: 0.95
+  transform:
+    class: ProteinView
+    view: residue
+
+task:
+  class: MultipleBinaryClassification
+  model:
+    class: ProteinBERT
+    input_dim: 21
+    hidden_dim: 512
+    num_layers: 4
+    num_heads: 8
+    intermediate_dim: 2048
+    hidden_dropout: 0.1
+    attention_dropout: 0.1
+  criterion: bce
+  metric: ['auprc@micro', 'f1_max']
+  num_mlp_layer: 2
+
+optimizer:
+  class: Adam
+  lr: 5.0e-5
+
+engine:
+  gpus: {{ gpus }}
+  batch_size: 8
+  log_interval: 1000
+
+metric: f1_max
+
+train:
+  num_epoch: 200
