Commit ada23d5

Merge pull request #131 from tigergraph/node2vec-doc-improvement
doc(Node2Vec): improve documentation
2 parents 91cc137 + d4bf463 commit ada23d5

1 file changed

+34
-8
lines changed
  • algorithms/GraphML/Embeddings/Node2Vec

algorithms/GraphML/Embeddings/Node2Vec/README.md

Lines changed: 34 additions & 8 deletions
@@ -1,24 +1,50 @@
11
# Node2Vec
22

3+
Node2Vec is a vertex embedding algorithm proposed in [node2vec: Scalable Feature Learning for Networks](https://arxiv.org/abs/1607.00653?context=cs). TigerGraph splits the computation into two parts: the random walk process and the embedding training process. If you are using version 3.6 or greater of the TigerGraph database, you can skip the UDF installation instructions.
4+
35
## [TigerGraph Node2Vec Documentation](https://docs.tigergraph.com/graph-ml/current/node-embeddings/node2vec)
46

57
## Instructions
68

9+
### Random Walk Process Install
10+
There are two random walk processes to choose from. The first is the regular (unweighted) random walk, implemented in `tg_random_walk.gsql`. This is equivalent to setting the `p` and `q` parameters of Node2Vec both to 1, which in turn corresponds to the walk strategy of the [DeepWalk](https://arxiv.org/pdf/1403.6652.pdf) paper. This version is more performant than `tg_weighted_random_walk.gsql` because it requires less computation. If the graph is large, you may want to batch the random walk process to reduce memory consumption; use `tg_random_walk_batch.gsql` in that case.
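As a rough illustration only (this is not the GSQL implementation), a uniform random walk and one possible way of batching it can be sketched in Python. The toy adjacency dictionary, the function names `uniform_walk` and `walks_in_batches`, and the batching scheme itself (splitting the start vertices into chunks) are assumptions made for this sketch; this README does not describe how `tg_random_walk_batch.gsql` batches internally.

```python
import random
from typing import Dict, Iterator, List

Graph = Dict[str, List[str]]  # hypothetical toy adjacency list


def uniform_walk(graph: Graph, start: str, length: int, rng: random.Random) -> List[str]:
    """Visit up to `length` vertices, choosing each next neighbor uniformly."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], [])
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def walks_in_batches(
    graph: Graph, batch_size: int, length: int, seed: int = 0
) -> Iterator[List[List[str]]]:
    """Yield the walks for one chunk of start vertices at a time.

    Assumed batching scheme: only one chunk of walks is held in memory
    before it is handed to the caller (for example, to be written out).
    """
    rng = random.Random(seed)
    starts = sorted(graph)
    for i in range(0, len(starts), batch_size):
        yield [uniform_walk(graph, s, length, rng) for s in starts[i:i + batch_size]]


toy_graph: Graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
batches = list(walks_in_batches(toy_graph, batch_size=2, length=5))
```

Each batch here holds the walks for two start vertices, so peak memory in the sketch scales with the batch size rather than with the number of vertices.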
11+
12+
The second option is the weighted random walk described in the Node2Vec paper. It is implemented in `tg_weighted_random_walk_sub.gsql` and `tg_weighted_random_walk.gsql`. If your TigerGraph database version is below 3.6, see the UDF installation instructions below. If the graph is large, you may want to batch the random walk process to reduce memory consumption; in that case, use `tg_weighted_random_walk_batch.gsql` together with `tg_weighted_random_walk_sub.gsql`.
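To make the role of `p` and `q` concrete, here is a minimal Python sketch of the second-order transition rule from the Node2Vec paper. It is not taken from the GSQL queries: the toy graph, the function names `step_weights` and `biased_walk`, and the simplification to an unweighted graph (all edge weights equal to 1) are assumptions for illustration.

```python
import random
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]  # hypothetical toy undirected graph


def step_weights(
    graph: Graph, prev: Optional[str], cur: str, p: float, q: float
) -> Dict[str, float]:
    """Unnormalized Node2Vec weights for each candidate next vertex."""
    weights: Dict[str, float] = {}
    for nxt in sorted(graph[cur]):
        if prev is None:
            weights[nxt] = 1.0      # first step: no previous vertex yet
        elif nxt == prev:
            weights[nxt] = 1.0 / p  # return to the previous vertex
        elif nxt in graph[prev]:
            weights[nxt] = 1.0      # stays at distance 1 from the previous vertex
        else:
            weights[nxt] = 1.0 / q  # moves farther away from the previous vertex
    return weights


def biased_walk(
    graph: Graph, start: str, length: int, p: float, q: float, rng: random.Random
) -> List[str]:
    """Sample a walk of up to `length` vertices using the biased weights."""
    walk = [start]
    while len(walk) < length and graph[walk[-1]]:
        prev = walk[-2] if len(walk) > 1 else None
        weights = step_weights(graph, prev, walk[-1], p, q)
        candidates = list(weights)
        walk.append(rng.choices(candidates, weights=[weights[c] for c in candidates])[0])
    return walk


toy_graph: Graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
weights = step_weights(toy_graph, prev="a", cur="c", p=2.0, q=0.5)
walk = biased_walk(toy_graph, "a", 6, p=2.0, q=0.5, rng=random.Random(0))
```

With `p = q = 1`, every candidate receives weight 1 and the rule reduces to the uniform walk, which is why the unweighted query needs less computation per step.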
13+
14+
**To install the unweighted random walk:** copy the query from `tg_random_walk.gsql` and install it on the database using the standard query install process.
15+
16+
**To install the weighted random walk:** first copy and install `tg_weighted_random_walk_sub.gsql`, then copy and install `tg_weighted_random_walk.gsql`.
17+
18+
### Node2Vec Embedding Install
19+
Once the random walks have been generated, their output can be used to train the Node2Vec model. Before installing the query, make sure the required UDFs are installed. If you are using TigerGraph database version 3.6 or greater, the UDFs are pre-installed.
20+
21+
**To install the Node2Vec query:** copy the query from `tg_node2vec.gsql` and install it on the database.
22+
723
### Preliminary Notes
8-
** Vim is the text editor of choice in this README, any other text editors such as Emacs or Nano will suffice in the commands listed below
24+
Vim is the text editor used in this README; any other text editor, such as Emacs or Nano, will work with the commands listed below.
925
\
10-
** `<TGversion>` should be replaced with your current Tigergraph version number
26+
`<TGversion>` should be replaced with your current TigerGraph version number.
27+
28+
### UDF installation
29+
30+
#### Weighted Random Walk UDF install
31+
If you are using `tg_weighted_random_walk_sub.gsql`, you will need to install `tg_random_udf.cpp`. **The code defined in `tg_random_udf.cpp` should be pasted inside the `UDIMPL` namespace inside of `ExprFunctions.hpp`.**
32+
```bash
33+
# open file and paste code
34+
35+
$ vim ~/tigergraph/app/<TGversion>/dev/gdk/gsql/src/QueryUdf/ExprFunctions.hpp
36+
```
1137

12-
### Getting UDF
13-
`node2vec()` is a user-defined function utilized in `node2vec_query.gsql` \
14-
**The code defined in `UDF` should be pasted inside the `UDIMPL` namespace inside of `ExprFunctions.hpp`
38+
#### Node2Vec UDF install
39+
`tg_node2vec_sub()` is a UDF that is called in `tg_node2vec.gsql`. \
40+
**The code defined in `tg_node2vec_sub.cpp` should be pasted inside the `UDIMPL` namespace inside of `ExprFunctions.hpp`.**
1541
```bash
1642
# open file and paste code
1743

1844
$ vim ~/tigergraph/app/<TGversion>/dev/gdk/gsql/src/QueryUdf/ExprFunctions.hpp
1945
```
2046

21-
### Getting Word2vec file
47+
##### Getting Word2vec file
2248
There are two ways to get `word2vec.h`:
2349
1. Download/Copy `word2vec.h` file into `~/tigergraph/app/<TGversion>/dev/gdk/gsdk/include` directory
2450
2. Create the file and copy the code from `word2vec.h` and paste it into the newly created file (steps shown below)
@@ -30,7 +56,7 @@ $ cd ~/tigergraph/app/<TGversion>/dev/gdk/gsdk/include/
3056
$ vim word2vec.h
3157
```
3258

33-
### Including word2vec
59+
##### Including word2vec
3460
The newly created `word2vec.h` needs to be included in the `ExprUtil.hpp` file:
3561
```bash
3662
$ vim ~/tigergraph/app/<TGversion>/dev/gdk/gsql/src/QueryUdf/ExprUtil.hpp
@@ -60,7 +86,7 @@ $ PUT ExprFunctions from "/home/tigergraph/tigergraph/app/<TGversion>/dev/gdk/gs
6086
### Running Queries
6187
** The following instructions can be done with GraphStudio or GSQL terminal
6288
1. Install the `random_walk` query
63-
2. Run query `random_walk` with desired parameters. Visit https://docs.tigergraph.com/tigergraph-platform-overview/graph-algorithm-library#parameters for a description of the random walk query parameters
89+
2. Run query `random_walk` with desired parameters. Visit https://docs.tigergraph.com/graph-ml/current/node-embeddings/node2vec for a description of the random walk query parameters. Make sure that TigerGraph has the correct permissions to write to the output directory you specify.
6490
3. (optional) Inspect output of random_walk query
6591
```bash
6692
# For the default filepath parameter
