forked from gaolk/graph-database-benchmark
-
Notifications
You must be signed in to change notification settings - Fork 12
/
Copy pathREADME
281 lines (229 loc) · 11.9 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
############################################################
# Copyright (c) 2015-now, TigerGraph Inc.
# All rights reserved
# It is provided as it is for benchmark reproducible purpose.
# anyone can use it for benchmark purpose with the
# acknowledgement to TigerGraph.
# Author: Litong Shen [email protected]
############################################################
This article documents the details on how to reproduce the graph database benchmark result on JanusGraph.
Data Sets
===========
- graph500 edge file: http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22
- graph500 vertex file: http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22_unique_node
- graph500 vertex header file: http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22-node-header.txt
- graph500 edge header file: http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22-edge-header.txt
- twitter edge file: http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv.tar.gz
- twitter vertex file: http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv.net_unique_node
- twitter vertex header file: http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv-node-header.txt
- twitter edge header file: http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv-edge-header.txt
Hardware & Major enviroment
================================
- Amazon EC2 machine r4.8xlarge.
- OS Amazon Linux AMI 2018.03
- Java build 1.8.0_171-b10
- 32vCPUs
- 244GiB memory
- attached a 512G EBS-optimized Provisioned IOPS SSD (IO1), IOPS we set is 25.6k.
Raw data and JanusGraph binary are put on this SSD.
JanusGraph Version
==================
- JanusGraph v0.2.1 downloaded from
https://github.com/JanusGraph/janusgraph/releases/download/v0.2.1/janusgraph-0.2.1-hadoop2.zip
Install JanusGraph
==================
# unzip to put under a folder. E.g. /ebs/install/janusgraph/ folder, this is an installation point we will use through
out this README. Replace it to your installation point when trying to reproduce.
unzip janusgraph-0.2.1-hadoop2.zip
It will create a janusgraph-0.2.1-hadoop2 folder.
Download scripts
================
# create a folder to store scripts, codes, configuration files, and README
# suppose your folder locate in /ebs/benchmark/code/janusgraph/
Move configuration files
=======================
# create a /conf folder under installation point /ebs/install/janusgraph/ by following command
mkdir /ebs/install/janusgraph/conf
# suppose your scipts and java code located in /ebs/benchmark/code/janusgraph/
# move configuration files (*.properties) to conf folder by following command
mv /ebs/benchmark/code/janusgraph/graph500-janusgraph-cassandra.properties /ebs/install/janusgraph/conf/
mv /ebs/benchmark/code/janusgraph/twitter-janusgraph-cassandra.properties.properties /ebs/install/janusgraph/conf/
Split twitter data
==================
need to split twitter data into 10 files then load each one by one
# redirect to the folder contains twitter data
# make a folder to store split twitter data
mkdir /ebs/raw/twitter_rv/splited
- copy FileSpliter.java to /splited folder
cp /ebs/benchmark/code/janusgraph/FileSplitor.java /ebs/raw/twitter_rv/splited/
- split twitter data by following command
cd /ebs/raw/twitter_rv/splited/
javac FileSplitor.java
java FileSplitor /ebs/raw/twitter_rv/twitter_rv.net
Launch Cassandra backend
========================
# Tune MAX_HEAP_SIZE and HEAP_NEWSIZE of cassandra environment file: cassandra-env.sh
# JVM setting: MAX_HEAP_SIZE="24G" HEAP_NEWSIZE = "2400M" line no: 143
cd /ebs/install/janusgraph/janusgraph-0.2.1-hadoop2/conf/cassandra/
vim cassandra-env.sh +143
- launch Cassandra under the respective installation location use following cammand lines
cd /ebs/install/janusgraph/janusgraph-0.2.1-hadoop2
./bin/cassandra
Loading data
==============
- compile_load_knn.sh: bash script to compile java codes for loading data and k-neighborhood query. Description is inside.
- load_graph500_single_thread.sh: bash script use single thread to load graph500 data. Description is inside.
Note: DO NOT need to load vertex data first, then load edge data.
- vertex_loader.sh: bash script use single/multi threads to load vertex data
and generate hashmap for internalId : externalId. Description is inside.
- edge_graph500_multi_threads.sh: bash script to load graph500 edge data. Description is inside.
- edge_loader_twitter.sh: bash script use single/multi threads to load spilted twitter edge data. Description is inside.
Before load:
------------
# JanusGraph will not skip duplication so you have to remove duplicate data (edge) first
- remove duplicate edge data
# need to modify parameters in dedup.java
File file (line 11): path to raw edge data
String resultPath (line 17): path to new edge data
compile and run java codes for dedup process
javac dedup.java
java dedup
- compile java codes for loading data and k-neighborhood query.
bash compile_load_knn.sh
Note: need to modify parameters in compile_load_knn.sh, description is inside.
To load:
------------
- NOTE: need to delete hashmap file before reload vertex file.
- Load graph500 data
1. single thread
nohup bash load_graph500_single_thread.sh &
Note: need to modify parameters in load_graph500_single_thread.sh, description is inside.
2. multi threads: load vertex data first, then load edge data
# need to modify parameters in multiThreadVertexImporter.java
String hashMapName (line 142): name for hashmap file that write to disk
String hashMapPath (line 142): path for hashmap file that write to disk
# use following cammand lines to compile and load data
bash compile_load_knn.sh
nohup bash vertex_loader.sh &
nohup bash edge_graph500_multi_threads.sh
Note: need to modify parameters in vertex_loader.sh and edge_graph500_multi_threads.sh
- Load twitter data
1. single thread
# before load, modify following parameters in singleThreadVertexImporter.java:
String hashMapName (line 67): name for generating hashmap
String hashMapPath (line 68): "/ebs/raw/twitter_rv/" + hashMapName // assume your twitter_rv located in /ebs/raw/twitter_rv/
# use following cammand lines to compile and load data
bash compile_load_knn.sh
nohup bash vertex_loader.sh &
nohup bash edge_loader_twitter.sh &
Note: need to modify parameters in vertex_loader.sh and edge_loader_twitter.sh
2. multi thread
# before load, modify following parameters in multiThreadVertexImporter.java
String hashMapName (line 142): name for hashmap file that write to disk
String hashMapPath (line 142): path for hashmap file that write to disk
# use following cammand lines to compile and load data
bash compile_load_knn.sh
nohup bash vertex_loader.sh &
nohup bash edge_loader_twitter.sh &
Note: need to modify parameters in vertex_loader.sh and edge_loader_twitter.sh.
- check final storage . change the path to your .db folder
du -hc /ebs/install/janusgraph/janusgraph-0.2.1-hadoop2/db/cassandra/data/graph500/
du -hc /ebs/install/janusgraph/janusgraph-0.2.1-hadoop2/db/cassandra/data/twitter_hashMap/
Run benchmark
================
Before running the benchmark script below, please make sure the current user has the READ permission from the raw file,
and the WRITE permission on the folder where you put the benchmark script folder
Output folder
-----------------
Assuming you are in the script folder /ebs/benchmark/code/janusgraph, create /result folder to store results by following command:
mkdir result
Graph500
-----------------
# khop
**********
redirect to the folder contains knn-graph500.sh by following command:
cd /ebs/benchmark/code/janusgraph
# modify following parameters in JanusKNeighbor.java:
String resultFileName (line 61): "KN-latency-Graph500-" + steps
String resultFilePath (line 62): path/to/result/folder/ + resultFileName
// set timeout to 180s for 1-hop and 2-hops, set to 9000s for 3-hops and 6-hops
final Duration timeout (line 74): Duration.ofSeconds(180)
- use the following command to compile k-hops java code:
bash compile_load_knn.sh
Note: before running modify the parameters in scripts. Description is inside.
- use the following command to run k-hops query:
nohup bash knn-janus.sh &
Note: before running modify the parameters in scripts. Description is inside.
#wcc
************
# redirect to the folder contains compile_WCC.sh
cd /ebs/benchmark/code/janusgraph
# modify following parameters in JanusWCC.java (e.g. locate in /ebs/benchmark/code/janusgraph/WCC):
String resultFileName (line 50): "WCC-latency-graph500"
String resultFilePath (line 51): path/to/result/folder/ + resultFileName
final Duration timeout (line 57): set timeout in seconds
- use the following command to compile WCC java code:
bash compile_WCC.sh
Note: before running modify the parameters in scripts. Description is inside.
- use the following command to run WCC:
nohup bash WCC-janus.sh &
Note: before running modify the parameters in scripts. Description is inside.
#page rank, run 10 iterations
*****************************
# redirect to the folder contains compile_PG.sh
cd /ebs/benchmark/code/janusgraph
# modify following parameters in JanusPageRank.java (e.g. locate in /ebs/benchmark/code/janusgraph/PG)
String resultFileName (line 50): "PageRank-latency-Graph500"
String resultFilePath (line 51): path/to/result/folder/ + resultFileName
final Duration timeout (line 57): set timeout in seconds
- use the following command to compile WCC java code:
bash compile_PG.sh
Note: before running modify the parameters in scripts. Description is inside.
- use the following command to run PG:
nohup bash pg-janus.sh &
Note: before running modify the parameters in scripts. Description is inside.
Twitter
-------------
#khop
***********
# redirect to the folder contains knn-graph500.sh by following command:
cd /ebs/benchmark/code/janusgraph
# modify following parameters in JanusKNeighbor.java:
String resultFileName (line 61): "KN-latency-Twitter-" + steps
String resultFilePath (line 62): path/to/result/folder/ + resultFileName
// set to 180s for 1-hop and 2-hops, set to 9000s for 3-hops and 6-hops
final Duration timeout (line 74): Duration.ofSeconds(180)
- use the following command to compile k-hops java code:
bash compile_load_knn.sh
Note: before running modify the parameters in scripts. Description is inside.
- use the following command to run k-hops query:
nohup bash knn-janus.sh &
Note: before running modify the parameters in scripts. Description is inside.
#wcc
************
# redirect to the folder contains compile_WCC.sh
cd /ebs/benchmark/code/janusgraph
# modify following parameters in JanusWCC.java (e.g. locate in /ebs/benchmark/code/janusgraph/WCC):
String resultFileName (line 50): "WCC-latency-twitter"
String resultFilePath (line 51): path/to/result/folder/ + resultFileName
final Duration timeout (line 57): set timeout in seconds
- use the following command to compile WCC java code:
bash compile_WCC.sh
Note: before running modify the parameters in scripts. Description is inside.
- use the following command to run WCC:
nohup bash WCC-janus.sh &
Note: before running modify the parameters in scripts. Description is inside.
#page rank, run 10 iterations.
*****************************
# redirect to the folder contains compile_PG.sh
cd /ebs/benchmark/code/janusgraph
# modify following parameters in JanusPageRank.java (e.g. locate in /ebs/benchmark/code/janusgraph/PG)
String resultFileName (line 50): "PageRank-latency-twitter"
String resultFilePath (line 51): path/to/result/folder/ + resultFileName
final Duration timeout (line 57): set timeout in seconds
- use the following command to compile WCC java code:
bash compile_PG.sh
Note: before running modify the parameters in scripts. Description is inside.
- use the following command to run PG:
nohup bash pg-janus.sh &
Note: before running modify the parameters in scripts. Description is inside.