janusgraph
Folders and files
Name | Name | Last commit date | ||
---|---|---|---|---|
parent directory.. | ||||
############################################################ # Copyright (c) 2015-now, TigerGraph Inc. # All rights reserved # It is provided as it is for benchmark reproducible purpose. # anyone can use it for benchmark purpose with the # acknowledgement to TigerGraph. # Author: Litong Shen [email protected] ############################################################ This article documents the details on how to reproduce the graph database benchmark result on JanusGraph. Data Sets =========== - graph500 edge file: http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22 - graph500 vertex file: http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22_unique_node - graph500 vertex header file: http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22-node-header.txt - graph500 edge header file: http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22-edge-header.txt - twitter edge file: http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv.tar.gz - twitter vertex file: http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv.net_unique_node - twitter vertex header file: http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv-node-header.txt - twitter edge header file: http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv-edge-header.txt Hardware & Major enviroment ================================ - Amazon EC2 machine r4.8xlarge. - OS Amazon Linux AMI 2018.03 - Java build 1.8.0_171-b10 - 32vCPUs - 244GiB memory - attached a 512G EBS-optimized Provisioned IOPS SSD (IO1), IOPS we set is 25.6k. Raw data and JanusGraph binary are put on this SSD. JanusGraph Version ================== - JanusGraph v0.2.1 downloaded from https://github.com/JanusGraph/janusgraph/releases/download/v0.2.1/janusgraph-0.2.1-hadoop2.zip Install JanusGraph ================== # unzip to put under a folder. E.g. /ebs/install/janusgraph/ folder, this is an installation point we will use through out this README. Replace it to your installation point when trying to reproduce. unzip janusgraph-0.2.1-hadoop2.zip It will create a janusgraph-0.2.1-hadoop2 folder. Download scripts ================ # create a folder to store scripts, codes, configuration files, and README # suppose your folder locate in /ebs/benchmark/code/janusgraph/ Move configuration files ======================= # create a /conf folder under installation point /ebs/install/janusgraph/ by following command mkdir /ebs/install/janusgraph/conf # suppose your scipts and java code located in /ebs/benchmark/code/janusgraph/ # move configuration files (*.properties) to conf folder by following command mv /ebs/benchmark/code/janusgraph/graph500-janusgraph-cassandra.properties /ebs/install/janusgraph/conf/ mv /ebs/benchmark/code/janusgraph/twitter-janusgraph-cassandra.properties.properties /ebs/install/janusgraph/conf/ Split twitter data ================== need to split twitter data into 10 files then load each one by one # redirect to the folder contains twitter data # make a folder to store split twitter data mkdir /ebs/raw/twitter_rv/splited - copy FileSpliter.java to /splited folder cp /ebs/benchmark/code/janusgraph/FileSplitor.java /ebs/raw/twitter_rv/splited/ - split twitter data by following command cd /ebs/raw/twitter_rv/splited/ javac FileSplitor.java java FileSplitor /ebs/raw/twitter_rv/twitter_rv.net Launch Cassandra backend ======================== # Tune MAX_HEAP_SIZE and HEAP_NEWSIZE of cassandra environment file: cassandra-env.sh # JVM setting: MAX_HEAP_SIZE="24G" HEAP_NEWSIZE = "2400M" line no: 143 cd /ebs/install/janusgraph/janusgraph-0.2.1-hadoop2/conf/cassandra/ vim cassandra-env.sh +143 - launch Cassandra under the respective installation location use following cammand lines cd /ebs/install/janusgraph/janusgraph-0.2.1-hadoop2 ./bin/cassandra Loading data ============== - compile_load_knn.sh: bash script to compile java codes for loading data and k-neighborhood query. Description is inside. - load_graph500_single_thread.sh: bash script use single thread to load graph500 data. Description is inside. Note: DO NOT need to load vertex data first, then load edge data. - vertex_loader.sh: bash script use single/multi threads to load vertex data and generate hashmap for internalId : externalId. Description is inside. - edge_graph500_multi_threads.sh: bash script to load graph500 edge data. Description is inside. - edge_loader_twitter.sh: bash script use single/multi threads to load spilted twitter edge data. Description is inside. Before load: ------------ # JanusGraph will not skip duplication so you have to remove duplicate data (edge) first - remove duplicate edge data # need to modify parameters in dedup.java File file (line 11): path to raw edge data String resultPath (line 17): path to new edge data compile and run java codes for dedup process javac dedup.java java dedup - compile java codes for loading data and k-neighborhood query. bash compile_load_knn.sh Note: need to modify parameters in compile_load_knn.sh, description is inside. To load: ------------ - NOTE: need to delete hashmap file before reload vertex file. - Load graph500 data 1. single thread nohup bash load_graph500_single_thread.sh & Note: need to modify parameters in load_graph500_single_thread.sh, description is inside. 2. multi threads: load vertex data first, then load edge data # need to modify parameters in multiThreadVertexImporter.java String hashMapName (line 142): name for hashmap file that write to disk String hashMapPath (line 142): path for hashmap file that write to disk # use following cammand lines to compile and load data bash compile_load_knn.sh nohup bash vertex_loader.sh & nohup bash edge_graph500_multi_threads.sh Note: need to modify parameters in vertex_loader.sh and edge_graph500_multi_threads.sh - Load twitter data 1. single thread # before load, modify following parameters in singleThreadVertexImporter.java: String hashMapName (line 67): name for generating hashmap String hashMapPath (line 68): "/ebs/raw/twitter_rv/" + hashMapName // assume your twitter_rv located in /ebs/raw/twitter_rv/ # use following cammand lines to compile and load data bash compile_load_knn.sh nohup bash vertex_loader.sh & nohup bash edge_loader_twitter.sh & Note: need to modify parameters in vertex_loader.sh and edge_loader_twitter.sh 2. multi thread # before load, modify following parameters in multiThreadVertexImporter.java String hashMapName (line 142): name for hashmap file that write to disk String hashMapPath (line 142): path for hashmap file that write to disk # use following cammand lines to compile and load data bash compile_load_knn.sh nohup bash vertex_loader.sh & nohup bash edge_loader_twitter.sh & Note: need to modify parameters in vertex_loader.sh and edge_loader_twitter.sh. - check final storage . change the path to your .db folder du -hc /ebs/install/janusgraph/janusgraph-0.2.1-hadoop2/db/cassandra/data/graph500/ du -hc /ebs/install/janusgraph/janusgraph-0.2.1-hadoop2/db/cassandra/data/twitter_hashMap/ Run benchmark ================ Before running the benchmark script below, please make sure the current user has the READ permission from the raw file, and the WRITE permission on the folder where you put the benchmark script folder Output folder ----------------- Assuming you are in the script folder /ebs/benchmark/code/janusgraph, create /result folder to store results by following command: mkdir result Graph500 ----------------- # khop ********** redirect to the folder contains knn-graph500.sh by following command: cd /ebs/benchmark/code/janusgraph # modify following parameters in JanusKNeighbor.java: String resultFileName (line 61): "KN-latency-Graph500-" + steps String resultFilePath (line 62): path/to/result/folder/ + resultFileName // set timeout to 180s for 1-hop and 2-hops, set to 9000s for 3-hops and 6-hops final Duration timeout (line 74): Duration.ofSeconds(180) - use the following command to compile k-hops java code: bash compile_load_knn.sh Note: before running modify the parameters in scripts. Description is inside. - use the following command to run k-hops query: nohup bash knn-janus.sh & Note: before running modify the parameters in scripts. Description is inside. #wcc ************ # redirect to the folder contains compile_WCC.sh cd /ebs/benchmark/code/janusgraph # modify following parameters in JanusWCC.java (e.g. locate in /ebs/benchmark/code/janusgraph/WCC): String resultFileName (line 50): "WCC-latency-graph500" String resultFilePath (line 51): path/to/result/folder/ + resultFileName final Duration timeout (line 57): set timeout in seconds - use the following command to compile WCC java code: bash compile_WCC.sh Note: before running modify the parameters in scripts. Description is inside. - use the following command to run WCC: nohup bash WCC-janus.sh & Note: before running modify the parameters in scripts. Description is inside. #page rank, run 10 iterations ***************************** # redirect to the folder contains compile_PG.sh cd /ebs/benchmark/code/janusgraph # modify following parameters in JanusPageRank.java (e.g. locate in /ebs/benchmark/code/janusgraph/PG) String resultFileName (line 50): "PageRank-latency-Graph500" String resultFilePath (line 51): path/to/result/folder/ + resultFileName final Duration timeout (line 57): set timeout in seconds - use the following command to compile WCC java code: bash compile_PG.sh Note: before running modify the parameters in scripts. Description is inside. - use the following command to run PG: nohup bash pg-janus.sh & Note: before running modify the parameters in scripts. Description is inside. Twitter ------------- #khop *********** # redirect to the folder contains knn-graph500.sh by following command: cd /ebs/benchmark/code/janusgraph # modify following parameters in JanusKNeighbor.java: String resultFileName (line 61): "KN-latency-Twitter-" + steps String resultFilePath (line 62): path/to/result/folder/ + resultFileName // set to 180s for 1-hop and 2-hops, set to 9000s for 3-hops and 6-hops final Duration timeout (line 74): Duration.ofSeconds(180) - use the following command to compile k-hops java code: bash compile_load_knn.sh Note: before running modify the parameters in scripts. Description is inside. - use the following command to run k-hops query: nohup bash knn-janus.sh & Note: before running modify the parameters in scripts. Description is inside. #wcc ************ # redirect to the folder contains compile_WCC.sh cd /ebs/benchmark/code/janusgraph # modify following parameters in JanusWCC.java (e.g. locate in /ebs/benchmark/code/janusgraph/WCC): String resultFileName (line 50): "WCC-latency-twitter" String resultFilePath (line 51): path/to/result/folder/ + resultFileName final Duration timeout (line 57): set timeout in seconds - use the following command to compile WCC java code: bash compile_WCC.sh Note: before running modify the parameters in scripts. Description is inside. - use the following command to run WCC: nohup bash WCC-janus.sh & Note: before running modify the parameters in scripts. Description is inside. #page rank, run 10 iterations. ***************************** # redirect to the folder contains compile_PG.sh cd /ebs/benchmark/code/janusgraph # modify following parameters in JanusPageRank.java (e.g. locate in /ebs/benchmark/code/janusgraph/PG) String resultFileName (line 50): "PageRank-latency-twitter" String resultFilePath (line 51): path/to/result/folder/ + resultFileName final Duration timeout (line 57): set timeout in seconds - use the following command to compile WCC java code: bash compile_PG.sh Note: before running modify the parameters in scripts. Description is inside. - use the following command to run PG: nohup bash pg-janus.sh & Note: before running modify the parameters in scripts. Description is inside.