
Commit 8e981db

lygztq, zhangtianqi, and hetong007 authored
[Example] Add Graph Cross Net (GXN) example for pytorch backend (dmlc#2559)
* add sagpool example for pytorch backend
* polish sagpool example for pytorch backend
* [Example] SAGPool: use std variance
* [Example] SAGPool: change to std
* add sagpool example to index page
* add graph property prediction tag to sagpool
* [Example] add graph classification example HGP-SL
* [Example] fix sagpool
* fix bug
* [Example] change tab to space in README of hgp-sl
* remove redundant files
* remove redundant network
* [Example] change link from code to doc in HGP-SL
* [Example] in HGP-SL, change to meaningful name
* [Example] Fix path mistake for 'hardgat'
* [Bug Fix] Fix undefined var bug in LegacyTUDataset
* upt
* [Bug Fix] Fix cache file name bug in TUDataset
* [Example] Add GXN example for pytorch backend
* modify readme
* add more exp result

Co-authored-by: zhangtianqi <[email protected]>
Co-authored-by: Tong He <[email protected]>
1 parent 9fc5eed commit 8e981db

File tree

10 files changed

+1572 -0 lines changed

examples/README.md

Lines changed: 5 additions & 0 deletions
@@ -45,6 +45,7 @@ The folder contains example implementations of selected research papers related
 | [GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation](#gnnfilm) | :heavy_check_mark: | | | | |
 | [Hierarchical Graph Pooling with Structure Learning](#hgp-sl) | | | :heavy_check_mark: | | |
 | [Graph Representation Learning via Hard and Channel-Wise Attention Networks](#hardgat) |:heavy_check_mark: | | | | |
+| [Graph Cross Networks with Vertex Infomax Pooling](#gxn) | | | :heavy_check_mark: | | |
 | [Towards Deeper Graph Neural Networks](#dagnn) | :heavy_check_mark: | | | | |

 ## 2020

@@ -73,6 +74,10 @@ The folder contains example implementations of selected research papers related
   - Example code: [Pytorch](../examples/pytorch/GNN-FiLM)
   - Tags: multi-relational graphs, hypernetworks, GNN architectures

+- <a name="gxn"></a> Li, Maosen, et al. Graph Cross Networks with Vertex Infomax Pooling. [Paper link](https://arxiv.org/abs/2010.01804).
+  - Example code: [Pytorch](../examples/pytorch/gxn)
+  - Tags: pooling, graph classification
+
 - <a name="dagnn"></a> Liu et al. Towards Deeper Graph Neural Networks. [Paper link](https://arxiv.org/abs/2007.09296).
   - Example code: [Pytorch](../examples/pytorch/dagnn)
   - Tags: over-smoothing, node classification

examples/pytorch/gxn/README.md

Lines changed: 108 additions & 0 deletions
# DGL Implementation of Graph Cross Networks with Vertex Infomax Pooling (NeurIPS 2020)

This DGL example implements the GNN model proposed in the paper [Graph Cross Networks with Vertex Infomax Pooling](https://arxiv.org/pdf/2010.01804.pdf).
The author's implementation is available [here](https://github.com/limaosen0/GXN).

The graph dataset used in this example
---------------------------------------
DGL's built-in LegacyTUDataset. This is a series of graph kernel datasets for graph classification. We use 'DD', 'PROTEINS', 'ENZYMES', 'IMDB-BINARY', 'IMDB-MULTI' and 'COLLAB' in this GXN implementation. Each dataset is randomly split into training and test sets with ratio 0.9/0.1 (similar to the setting in the author's implementation).

NOTE: Following the setting of the author's implementation, for 'DD' and 'PROTEINS' we use the one-hot node label as the input node feature. For 'ENZYMES', 'IMDB-BINARY', 'IMDB-MULTI' and 'COLLAB', we use the concatenation of the one-hot node label (if available) and the one-hot node degree as the input node feature, as sketched below.
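A rough sketch of that preprocessing, using the two helper functions added later in this commit (the `utils` import path is an assumption, not taken from this diff):

```python
# Sketch only: the import path `utils` is assumed, not taken from this diff.
from dgl.data import LegacyTUDataset
from utils import degree_as_feature, node_label_as_feature

dataset = LegacyTUDataset("ENZYMES")
# one-hot node degree as the base feature ...
dataset = degree_as_feature(dataset)
# ... then concatenate the one-hot node label (if available)
dataset = node_label_as_feature(dataset, mode="concat")
print(dataset.graph_lists[0].ndata["feat"].shape)
```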

DD
- NumGraphs: 1178
- AvgNodesPerGraph: 284.32
- AvgEdgesPerGraph: 715.66
- NumFeats: 89
- NumClasses: 2

PROTEINS
- NumGraphs: 1113
- AvgNodesPerGraph: 39.06
- AvgEdgesPerGraph: 72.82
- NumFeats: 1
- NumClasses: 2

ENZYMES
- NumGraphs: 600
- AvgNodesPerGraph: 32.63
- AvgEdgesPerGraph: 62.14
- NumFeats: 18
- NumClasses: 6

IMDB-BINARY
- NumGraphs: 1000
- AvgNodesPerGraph: 19.77
- AvgEdgesPerGraph: 96.53
- NumFeats: -
- NumClasses: 2

IMDB-MULTI
- NumGraphs: 1500
- AvgNodesPerGraph: 13.00
- AvgEdgesPerGraph: 65.94
- NumFeats: -
- NumClasses: 3

COLLAB
- NumGraphs: 5000
- AvgNodesPerGraph: 74.49
- AvgEdgesPerGraph: 2457.78
- NumFeats: -
- NumClasses: 3
How to run example files
--------------------------------
To reproduce the author's results, run the following at the root directory of this example (gxn):

```bash
bash scripts/run_gxn.sh ${dataset_name} ${device_id} ${num_trials} ${print_trainlog_every}
```

To run an early-stop version of the experiment, run the following at the root directory of this example:

```bash
bash scripts/run_gxn_early_stop.sh ${dataset_name} ${device_id} ${num_trials} ${print_trainlog_every}
```

where
- dataset_name: Name of the dataset used in this experiment. One of 'DD', 'PROTEINS', 'ENZYMES', 'IMDB-BINARY', 'IMDB-MULTI' and 'COLLAB'.
- device_id: ID of the computation device. Use -1 for pure CPU computation; for example, if you only have a single GPU, set this value to 0.
- num_trials: How many times the experiment is conducted.
- print_trainlog_every: Print the training log every given number of epochs. Use -1 for silent training.
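For example, `bash scripts/run_gxn.sh DD 0 10 -1` would run 10 trials on the DD dataset using GPU 0 with silent training logs (these argument values are illustrative, not taken from the scripts).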

NOTE: If you have problems when using 'IMDB-BINARY', 'IMDB-MULTI' and 'COLLAB', they could be caused by a bug in `LegacyTUDataset`/`TUDataset` in DGL (see [here](https://github.com/dmlc/dgl/pull/2543)). If your DGL version is less than or equal to 0.5.3 and you encounter problems like "undefined variable" (`LegacyTUDataset`) or "the argument `force_reload=False` does not work" (`TUDataset`), try to:
- use `TUDataset` with `force_reload=True`
- delete the dataset files
- change `degree_as_feature(dataset)` and `node_label_as_feature(dataset, mode=mode)` to `degree_as_feature(dataset, save=False)` and `node_label_as_feature(dataset, mode=mode, save=False)` in `main.py`, as sketched below
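A minimal sketch of that last workaround; the surrounding code is assumed, and only the `save=False` arguments come from the note above:

```python
# Hypothetical excerpt from main.py; only the save=False arguments
# come from the workaround described above.
from dgl.data import LegacyTUDataset
from utils import degree_as_feature, node_label_as_feature  # assumed import path

dataset = LegacyTUDataset("IMDB-BINARY")
mode = "concat"
dataset = degree_as_feature(dataset, save=False)                 # skip on-disk caching
dataset = node_label_as_feature(dataset, mode=mode, save=False)  # likewise
```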

Performance
-------------------------

**Accuracy**

**NOTE**: Unlike our implementation, the author uses a fixed dataset split, so our results may differ from the author's. **To compare our implementation with the author's, we follow the setting in the author's implementation, which performs model selection on the test set.** We also try early stopping with patience equal to 1/5 of the total number of epochs for some datasets; a minimal sketch of this follows the table. The `Author's Code` results in the table below are obtained using the first fold of the data as the test set.
|                   | DD         | PROTEINS   | ENZYMES    | IMDB-BINARY | IMDB-MULTI | COLLAB     |
| ----------------- | ---------- | ---------- | ---------- | ----------- | ---------- | ---------- |
| Reported in Paper | 82.68(4.1) | 79.91(4.1) | 57.50(6.1) | 78.60(2.3)  | 55.20(2.5) | 78.82(1.4) |
| Author's Code     | 82.05      | 72.07      | 58.33      | 77.00       | 56.00      | 80.40      |
| DGL               | 82.97(3.0) | 78.21(2.0) | 57.50(5.5) | 78.70(4.0)  | 52.26(2.0) | 80.58(2.4) |
| DGL(early-stop)   | 78.66(4.3) | 73.12(3.1) | 39.83(7.4) | 68.60(6.7)  | 45.40(9.4) | 76.18(1.9) |
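A minimal sketch of the early-stop rule described above, with patience set to 1/5 of the total epochs; all names, the epoch count, and the metric stub are illustrative, not taken from this example's code:

```python
# Minimal early-stopping sketch; names are illustrative, not from this example.
import random

def train_one_epoch():
    pass                              # stand-in for the real training step

def evaluate():
    return random.random()            # stand-in for the real accuracy metric

num_epochs = 500
patience = num_epochs // 5            # 1/5 of the total number of epochs
best_acc, epochs_since_best = 0.0, 0

for epoch in range(num_epochs):
    train_one_epoch()
    acc = evaluate()
    if acc > best_acc:
        best_acc, epochs_since_best = acc, 0
    else:
        epochs_since_best += 1
    if epochs_since_best >= patience:  # no improvement for `patience` epochs
        break
```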

**Speed**

Device:
- CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
- GPU: Tesla V100-SXM2 16GB

In seconds:

|               | DD    | PROTEINS | ENZYMES | IMDB-BINARY | IMDB-MULTI | COLLAB(batch_size=64) | COLLAB(batch_size=20) |
| ------------- | ----- | -------- | ------- | ----------- | ---------- | --------------------- | --------------------- |
| Author's Code | 25.32 | 2.93     | 1.53    | 2.42        | 3.58       | 96.69                 | 19.78                 |
| DGL           | 2.64  | 1.86     | 1.03    | 1.79        | 2.45       | 23.52                 | 32.29                 |
Lines changed: 145 additions & 0 deletions
import os
import sys
import json
import logging

import numpy as np
import torch
from dgl.data import LegacyTUDataset


def _load_check_mark(path: str):
    if os.path.exists(path):
        with open(path, 'r') as f:
            return json.load(f)
    else:
        return {}


def _save_check_mark(path: str, marks: dict):
    with open(path, 'w') as f:
        json.dump(marks, f)


def node_label_as_feature(dataset: LegacyTUDataset, mode="concat", save=True):
    """
    Description
    -----------
    Add node labels to the graph node feature dict.

    Parameters
    ----------
    dataset : LegacyTUDataset
        The dataset object.
    mode : str, optional
        How to add node labels to the graph. Valid options are "add",
        "replace" and "concat".

        - "add": Directly add "node_label" to the graph node feature dict.
        - "concat": Concatenate "feat" and "node_label".
        - "replace": Use "node_label" as "feat".

        Default: :obj:`"concat"`
    save : bool, optional
        Save the resulting dataset.
        Default: :obj:`True`
    """
    # return unchanged if node labels are unavailable
    if not os.path.exists(dataset._file_path("node_labels")) or len(dataset) == 0:
        logging.warning("No Node Label Data")
        return dataset

    # check for a cached result
    check_mark_name = "node_label_as_feature"
    check_mark_path = os.path.join(
        dataset.save_path, "info_{}_{}.json".format(dataset.name, dataset.hash))
    check_mark = _load_check_mark(check_mark_path)
    if check_mark_name in check_mark \
            and check_mark[check_mark_name] \
            and not dataset._force_reload:
        logging.warning("Using cached value in node_label_as_feature")
        return dataset
    logging.warning("Adding node labels into node features..., mode={}".format(mode))

    # fall back to "replace" if the graphs have no "feat" to concatenate with
    if "feat" not in dataset[0][0].ndata:
        logging.warning("Dataset has no node feature 'feat'")
        if mode.lower() == "concat":
            mode = "replace"

    # first read node labels
    DS_node_labels = dataset._idx_from_zero(
        np.loadtxt(dataset._file_path("node_labels"), dtype=int))
    one_hot_node_labels = dataset._to_onehot(DS_node_labels)

    # read the graph indicator that maps each node to its graph
    DS_indicator = dataset._idx_from_zero(
        np.genfromtxt(dataset._file_path("graph_indicator"), dtype=int))
    node_idx_list = []
    for idx in range(np.max(DS_indicator) + 1):
        node_idx = np.where(DS_indicator == idx)
        node_idx_list.append(node_idx[0])

    # add to the node feature dict
    for idx, g in zip(node_idx_list, dataset.graph_lists):
        node_labels_tensor = torch.tensor(one_hot_node_labels[idx, :])
        if mode.lower() == "concat":
            g.ndata["feat"] = torch.cat(
                (g.ndata["feat"], node_labels_tensor), dim=1)
        elif mode.lower() == "add":
            g.ndata["node_label"] = node_labels_tensor
        else:  # replace
            g.ndata["feat"] = node_labels_tensor

    if save:
        check_mark[check_mark_name] = True
        _save_check_mark(check_mark_path, check_mark)
        dataset.save()
    return dataset


def degree_as_feature(dataset: LegacyTUDataset, save=True):
    """
    Description
    -----------
    Use the node degree (in one-hot format) as the node feature.

    Parameters
    ----------
    dataset : LegacyTUDataset
        The dataset object.
    save : bool, optional
        Save the resulting dataset.
        Default: :obj:`True`
    """
    # first check if we already have such a feature
    check_mark_name = "degree_as_feat"
    feat_name = "feat"
    check_mark_path = os.path.join(
        dataset.save_path, "info_{}_{}.json".format(dataset.name, dataset.hash))
    check_mark = _load_check_mark(check_mark_path)

    if check_mark_name in check_mark \
            and check_mark[check_mark_name] \
            and not dataset._force_reload:
        logging.warning("Using cached value in 'degree_as_feature'")
        return dataset

    logging.warning("Adding node degree into node features...")
    min_degree = sys.maxsize
    max_degree = 0
    for i in range(len(dataset)):
        degrees = dataset.graph_lists[i].in_degrees()
        min_degree = min(min_degree, degrees.min().item())
        max_degree = max(max_degree, degrees.max().item())

    # one one-hot slot per degree value observed across the whole dataset
    vec_len = max_degree - min_degree + 1
    for i in range(len(dataset)):
        num_nodes = dataset.graph_lists[i].num_nodes()
        node_feat = torch.zeros((num_nodes, vec_len))
        degrees = dataset.graph_lists[i].in_degrees()
        node_feat[torch.arange(num_nodes), degrees - min_degree] = 1.
        dataset.graph_lists[i].ndata[feat_name] = node_feat

    if save:
        check_mark[check_mark_name] = True
        dataset.save()
        _save_check_mark(check_mark_path, check_mark)
    return dataset
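The one-hot degree encoding above relies on advanced indexing to set one entry per row; a tiny self-contained illustration of that idiom (toy values, not from any dataset):

```python
import torch

# Toy degrees for 4 nodes, with degrees observed in [1, 3].
degrees = torch.tensor([1, 3, 2, 1])
min_degree, max_degree = 1, 3
vec_len = max_degree - min_degree + 1          # 3 one-hot slots

node_feat = torch.zeros((4, vec_len))
node_feat[torch.arange(4), degrees - min_degree] = 1.0
print(node_feat)
# tensor([[1., 0., 0.],
#         [0., 0., 1.],
#         [0., 1., 0.],
#         [1., 0., 0.]])
```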
