
Commit 4f2e188

galileo first commit
0 parents  commit 4f2e188

414 files changed, +46333 -0 lines changed

.gitignore

+15
@@ -0,0 +1,15 @@
+__pycache__/
+_ext/
+build/
+dist/
+.cache/
+.eggs/
+*.egg-info/
+.coverage*
+*.so*
+.*.swp
+.vscode/
+.DS_Store
+.models/
+.logs/
+.data/

.travis.yml

+22
@@ -0,0 +1,22 @@
+notifications:
+  email: false
+
+jobs:
+  include:
+    - sudo: required
+      services:
+        - docker
+      env: DOCKER_IMAGE=jdgalileo/galileo:devel-cpu
+           BUILD_TARGET=cpu
+    - sudo: required
+      services:
+        - docker
+      env: DOCKER_IMAGE=jdgalileo/galileo:devel-gpu
+           BUILD_TARGET=gpu
+
+install:
+  - docker pull $DOCKER_IMAGE
+
+script:
+  - docker run --rm -e BUILD_TARGET=$BUILD_TARGET -v `pwd`:/workspace $DOCKER_IMAGE bash /workspace/docker/build_wheel.sh
+  - ls /workspace/dist
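
The CI matrix above runs the same pull-and-build sequence for each target. A minimal sketch of reproducing one job locally follows. The helper name `ci_commands` is hypothetical and not part of the repository, and the final `ls dist` assumes that `build_wheel.sh` writes wheels to `/workspace/dist` (as the CI `ls` step suggests), which is `./dist` on the host through the bind mount. The helper only prints the commands, so nothing runs until you choose to execute them.

```shell
#!/bin/bash
# Hypothetical helper (not part of the repository): prints the commands that
# one job of the CI matrix runs, so they can be inspected before execution.
ci_commands() {
    target=$1                                   # "cpu" or "gpu"
    image="jdgalileo/galileo:devel-${target}"   # image tags as used in .travis.yml
    echo "docker pull ${image}"
    echo "docker run --rm -e BUILD_TARGET=${target} -v \"\$(pwd)\":/workspace ${image} bash /workspace/docker/build_wheel.sh"
    # Assumption: the wheel lands in /workspace/dist, i.e. ./dist on the host.
    echo "ls dist"
}

ci_commands cpu
```

To actually run a job from the repository root, the output can be piped to a shell, for example `ci_commands gpu | bash`; this requires a working Docker installation.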

LICENSE

+409
Large diffs are not rendered by default.

README.md

+48
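@@ -0,0 +1,48 @@
+<div align="center">
+  <img src="docs/imgs/logo.jpg" height="240" />
+</div>
+
+[![Build Status](https://travis-ci.org/JDGalileo/galileo.svg?branch=main)](https://travis-ci.org/JDGalileo/galileo)
+[![PyPI version](https://badge.fury.io/py/jdgalileo.svg)](https://badge.fury.io/py/jdgalileo)
+[![Anaconda-Server Badge](https://anaconda.org/jdgalileo/jdgalileo/badges/version.svg)](https://anaconda.org/jdgalileo/jdgalileo)
+
+In recent years, graph computing has delivered significant results in search, recommendation, and risk-control scenarios, but it still faces challenges such as training on ultra-large-scale heterogeneous graphs and integrating with the existing deep learning frameworks TensorFlow and PyTorch.
+
+Galileo (伽利略) is a graph deep learning framework that offers ultra-large scale, ease of use, extensibility, high performance, and dual backends. It aims to solve the difficulty of deploying ultra-large-scale graph algorithms in industrial scenarios, and provides training, evaluation, and prediction for models such as graph neural networks and graph embeddings.
+
+# Architecture
+
+<div align="center">
+  <img src="docs/imgs/arch.jpg" height="450" /><br/>
+  Galileo overall architecture
+</div>
+
+Galileo follows a layered design with three main layers: a distributed graph engine, a multi-backend graph framework, and graph models.
+- **Distributed high-performance graph engine**: represents graph data in a compact, efficient in-memory layout, so it can support **ultra-large-scale heterogeneous graphs** with very low memory usage; the entire call path is built on a zero-copy mechanism, giving high-performance graph queries and graph sampling.
+- **Multi-backend graph framework**: supports both TensorFlow and PyTorch backends, configuration-driven single-machine and distributed training, and Keras and Estimator training; it provides unified graph query and graph sampling interfaces and is **easy to extend**.
+- **Graph models**: data and models are decoupled to improve code reuse, and a component-based design lowers the difficulty of implementing models. Models can be written in the Message Passing paradigm, or Python code can access the training backend's interfaces directly, so the framework is **easy to use and highly flexible**.
+
+
+# Getting Started
+We provide [pip and conda packages](docs/pip.md) for Galileo. We recommend using Galileo inside the [docker image](https://hub.docker.com/r/jdgalileo/galileo), which avoids the hassle of installing dependencies. You can also [build and install Galileo from source](docs/install.md).
+
+Read the [getting-started tutorial](docs/introduce.md) to begin using Galileo.
+
+If the [graph models](examples/README.md) currently implemented in Galileo do not meet your needs, you can [customize a graph model](docs/custom.md).
+
+To use your own graph data, see [graph data preparation](docs/data_prepare.md).
+
+If your graph data is large, see [distributed training](docs/train.md).
+
+To learn more about Galileo's interfaces, see the [API documentation](docs/api.md).
+
+
+# Contact Us
+You are welcome to contact us through issues and the mailing list ([email protected]).
+
+# LICENSE
+The Galileo graph deep learning framework is licensed under the Apache License 2.0.
+
+# Acknowledgments
+The Galileo graph deep learning framework is proudly produced by the Technology and Data Center of JD Retail, JD Group. We thank the JD Retail algorithm channel for its strong support, and we thank sibling departments such as the Business Growth Division and the Search and Recommendation Platform Department for their valuable feedback during development and use.
+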

conda/build.sh

+10
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+pre=$(ls -d $PIP_CACHE_DIR/../_h_env*)
+src_dir=$RECIPE_DIR/..
+
+echo "build $src_dir $PKG_VERSION to $pre"
+
+cd $src_dir
+pip install --no-deps --prefix $pre dist/jdgalileo-${PKG_VERSION}-cp38-cp38-linux_x86_64.whl
+

conda/meta.yaml

+16
@@ -0,0 +1,16 @@
+{% set version = "1.0.0" %}
+
+package:
+  name: jdgalileo
+  version: {{ version }}
+
+build:
+  number: 1
+  binary_relocation: False
+
+requirements:
+  run:
+    - python >=3.8
+
+about:
+  license: Apache 2.0

conda/post-link.sh

+2
@@ -0,0 +1,2 @@
+#!/bin/bash
+python -c "import galileo;print(galileo.libs_dir)" > /etc/ld.so.conf.d/galileo.conf && ldconfig

conda/pre-unlink.sh

+3
@@ -0,0 +1,3 @@
+#!/bin/bash
+/bin/rm -f /etc/ld.so.conf.d/galileo.conf
+ldconfig

docker/base.Dockerfile

+23
@@ -0,0 +1,23 @@
+# Copyright 2020 JD.com, Inc. Galileo Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+ARG BASE_IMAGE
+FROM ${BASE_IMAGE}
+# An ARG declared before FROM is not visible inside the build stage, so declare it here.
+ARG INSTALL_CUDA
+
+COPY base_deps.sh /tmp/
+ENV INSTALL_CUDA=${INSTALL_CUDA} MAX_JOBS=16
+RUN bash /tmp/base_deps.sh && rm -f /tmp/base_deps.sh

docker/base_deps.sh

+160
@@ -0,0 +1,160 @@
+#!/bin/bash
+# Copyright 2020 JD.com, Inc. Galileo Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+set -e -u
+
+INSTALL_CUDA=${INSTALL_CUDA:-0}
+MAX_JOBS=${MAX_JOBS:-32}
+PYPI_URL=https://mirrors.ustc.edu.cn/pypi/web/simple
+TORCH_URL=https://download.pytorch.org/whl/torch_stable.html
+HADOOP_URL=${HADOOP_URL:-https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz}
+ZK_BIN_URL=${ZK_BIN_URL:-https://archive.apache.org/dist/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz}
+OPENSSL_URL=${OPENSSL_URL:-http://www.openssl.org/source/openssl-1.1.0c.tar.gz}
+
+
+function install_gcc() {
+    #for nvidia/cuda:10.1-cudnn7-devel-centos7
+    GCC_VERSION=8.4.0
+    GCC_URL=https://mirrors.ustc.edu.cn/gnu/gcc/gcc-${GCC_VERSION}/gcc-${GCC_VERSION}.tar.xz
+    MAKE_URL=https://mirrors.ustc.edu.cn/gnu/make/make-4.3.tar.gz
+    CMAKE_URL=https://github.com/Kitware/CMake/releases/download/v3.19.2/cmake-3.19.2-Linux-x86_64.sh
+
+    sed -e 's|^mirrorlist=|#mirrorlist=|g' \
+        -e 's|^#baseurl=http://mirror.centos.org/centos|baseurl=https://mirrors.ustc.edu.cn/centos|g' \
+        -i.bak /etc/yum.repos.d/CentOS-Base.repo
+    yum -y install wget which vim
+    yum -y groupinstall 'Development Tools'
+    wget -qO make.tar.gz ${MAKE_URL} && tar xf make.tar.gz && rm -f make.tar.gz
+    pushd make-4.3 && ./configure --prefix=/usr/local/ && make -j${MAX_JOBS}
+    make install && popd && rm -rf make-4.3 && yum -y remove make
+    ln -srf /usr/local/bin/make /usr/bin/gmake
+    ln -srf /usr/local/bin/make /usr/bin/make
+
+    wget -qO cmake.sh ${CMAKE_URL} && bash cmake.sh --skip-license --prefix=/usr/local && rm -f cmake.sh
+    wget -q ${GCC_URL} && tar xf gcc-${GCC_VERSION}.tar.xz && rm -f gcc-${GCC_VERSION}.tar.xz
+    pushd gcc-${GCC_VERSION} && ./contrib/download_prerequisites
+    ./configure --enable-checking=release --enable-languages=c,c++,obj-c++ --disable-multilib
+    make -j${MAX_JOBS} && make install && popd && rm -rf gcc-${GCC_VERSION} && yum -y remove gcc
+}
+
+function install_ssl() {
+    yum -y install wget zlib zlib-devel
+    wget -O openssl-1.1.0c.tar.gz ${OPENSSL_URL}
+    tar xf openssl-1.1.0c.tar.gz
+    pushd openssl-1.1.0c
+    ./config shared zlib
+    make -j${MAX_JOBS}
+    make install_sw
+    popd
+    rm -fr openssl-1.1.0*
+    yum clean all && rm -rf /var/cache/yum/*
+}
+
+function install_zk() {
+    yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel
+    yum clean all && rm -rf /var/cache/yum/*
+    echo "export JAVA_HOME=/usr/lib/jvm/java" >> /root/.bashrc
+    echo "export CLASSPATH=\$(/opt/hadoop/bin/hadoop classpath --glob)" \
+        >> /root/.bashrc
+
+    wget -O hadoop.tar.gz ${HADOOP_URL}
+    tar -xf hadoop.tar.gz -C /opt
+    rm -f hadoop.tar.gz
+    mv /opt/hadoop* /opt/hadoop
+
+    wget -O zookeeper.tar.gz ${ZK_BIN_URL}
+    tar xf zookeeper.tar.gz -C /usr/local/
+    rm -f zookeeper.tar.gz && mv /usr/local/apache-zookeeper-3.5.6-bin \
+        /usr/local/zookeeper
+    mkdir -p /usr/local/zookeeper/data
+    echo -e "dataDir=/usr/local/zookeeper/data\nclientPort=2181" \
+        > /usr/local/zookeeper/conf/zoo.cfg
+    echo "JAVA_HOME=/usr/lib/jvm/java" \
+        > /usr/local/zookeeper/conf/zookeeper-env.sh
+}
+
+function setup_env() {
+    paths=""
+    libs="/lib64:/usr/local/lib:/usr/local/lib64"
+    libs+=":/usr/lib/jvm/java/jre/lib/amd64/server"
+    libs+=":/opt/hadoop/lib/native"
+    if [ ${INSTALL_CUDA} -ne 0 ];then
+        paths+="/usr/local/bin"
+        paths+=":/usr/local/anaconda3/bin"
+        paths+=":/usr/local/cuda/bin"
+        libs+=":/usr/local/anaconda3/lib"
+        libs+=":/usr/local/cuda/lib64"
+        libs+=":/usr/local/nvidia/lib"
+        libs+=":/usr/local/nvidia/lib64"
+    fi
+    paths+="${paths:+:}/usr/lib/jvm/java/bin"  # no leading ':' when paths is still empty (CPU build)
+    paths+=":/usr/local/zookeeper/bin"
+    paths+=":/opt/hadoop/bin"
+    echo "export PATH=${paths}:\$PATH" >> /root/.bashrc
+    echo "export LD_LIBRARY_PATH=${libs}:\$LD_LIBRARY_PATH" >> /root/.bashrc
+    echo "export LIBRARY_PATH=${libs}:\$LIBRARY_PATH" >> /root/.bashrc
+    echo "export MAX_JOBS=${MAX_JOBS}" >> /root/.bashrc
+    set +u
+    source /root/.bashrc || true
+}
+
+function install_py3() {
+    #for nvidia/cuda:10.1-cudnn7-devel-centos7
+    ANACONDA_URL=https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh
+    wget -qO anaconda3.sh ${ANACONDA_URL} && bash anaconda3.sh -b -p /usr/local/anaconda3
+    rm -f anaconda3.sh && cp /usr/local/anaconda3/lib/libstdc++.so.6.0.26 /lib64
+    ln -srf /lib64/libstdc++.so.6.0.26 /lib64/libstdc++.so.6
+    ln -srf /lib64/libstdc++.so.6.0.26 /usr/lib64/libstdc++.so.6
+}
+
+function install_deps_gpu() {
+    #for nvidia/cuda:10.1-cudnn7-devel-centos7
+    pip=/usr/local/anaconda3/bin/pip3
+    conda=/usr/local/anaconda3/bin/conda
+    ${pip} install -i ${PYPI_URL} pip -U
+    ${pip} config set global.index-url ${PYPI_URL}
+    ${pip} install tensorflow==2.3.0 networkx==2.3 attrs
+    ${pip} install torch==1.6.0+cu101 torchvision==0.7.0+cu101 torch-scatter -f ${TORCH_URL}
+    ${conda} install -y numpy scipy pyyaml ipython mkl mkl-include scikit-learn
+    ${conda} install -c conda-forge -y kazoo py3nvml
+    ${conda} clean -ya && ${pip} cache purge
+}
+
+function install_deps_cpu() {
+    echo "install for python $1"
+    pip=/opt/python/$1/bin/pip
+    ${pip} config set global.index-url ${PYPI_URL}
+    ${pip} install pip -U
+    ${pip} install torch==1.6.0+cpu torchvision==0.7.0+cpu -f ${TORCH_URL}
+    ${pip} install torch-scatter tensorflow==2.3.0 networkx==2.3 kazoo attrs
+    ${pip} cache purge
+    ln -sf /opt/python/$1/bin/pip3 /usr/local/bin/pip3
+    ln -sf /usr/local/bin/python3.8 /usr/local/bin/python3
+}
+
+if [ ${INSTALL_CUDA} -eq 0 ];then
+    echo "install cpu version"
+    install_deps_cpu cp38-cp38
+else
+    echo "install gpu version"
+    install_gcc
+    install_py3  # provides /usr/local/anaconda3, which install_deps_gpu relies on
+    install_deps_gpu
+fi
+
+install_ssl
+install_zk
+setup_env
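
`setup_env` assembles `PATH` by appending `:`-prefixed strings, which is sensitive to a stray leading or doubled colon (an empty `PATH` entry is treated as the current directory). One alternative, sketched below, is to collect the entries first and join them in a single step. The helper name `join_by_colon` is hypothetical and not part of `base_deps.sh`; the directory names are the ones the script already uses. The sketch prints the export line instead of writing to `/root/.bashrc`.

```shell
#!/bin/bash
# Hypothetical helper (not in base_deps.sh): joins its arguments with ':'.
# "$*" joins positional parameters with the first character of IFS; the
# subshell keeps the IFS change from leaking into the caller.
join_by_colon() {
    ( IFS=':'; echo "$*" )
}

# Collect entries in the positional parameters, then join once at the end.
set --
if [ "${INSTALL_CUDA:-0}" -ne 0 ]; then
    set -- "$@" /usr/local/bin /usr/local/anaconda3/bin /usr/local/cuda/bin
fi
set -- "$@" /usr/lib/jvm/java/bin /usr/local/zookeeper/bin /opt/hadoop/bin

joined=$(join_by_colon "$@")
echo "export PATH=${joined}:\$PATH"
```

Because the separator is only inserted between entries, the result has no leading colon regardless of whether the CUDA branch runs.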

docker/build.sh

+41
@@ -0,0 +1,41 @@
+#!/bin/bash
+# Copyright 2020 JD.com, Inc. Galileo Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+set -e -u
+
+version=${1:-1.0.0}
+echo build galileo ${version}
+
+# base
+docker build -t jdgalileo/galileo:base-cpu -f base.Dockerfile \
+    --build-arg INSTALL_CUDA=0 \
+    --build-arg BASE_IMAGE=quay.io/pypa/manylinux2014_x86_64:latest .
+
+docker build -t jdgalileo/galileo:base-gpu -f base.Dockerfile \
+    --build-arg INSTALL_CUDA=1 \
+    --build-arg BASE_IMAGE=nvidia/cuda:10.1-cudnn7-devel-centos7 .
+
+# devel
+docker build -t jdgalileo/galileo:devel-cpu -f devel.Dockerfile \
+    --build-arg TARGET=cpu .
+docker build -t jdgalileo/galileo:devel-gpu -f devel.Dockerfile \
+    --build-arg TARGET=gpu .
+
+# include galileo package
+docker build -t jdgalileo/galileo:${version}-cpu -f galileo.Dockerfile \
+    --build-arg TARGET=cpu --build-arg VERSION=${version} .
+docker build -t jdgalileo/galileo:${version}-gpu -f galileo.Dockerfile \
+    --build-arg TARGET=gpu --build-arg VERSION=${version} .
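
The devel and release builds above differ only in the target (`cpu` or `gpu`) and the tag, so they can be expressed as a loop. The following is a sketch, not the project's build script: the function name `print_builds` is hypothetical, the base images are left out because each takes its own `BASE_IMAGE` and `INSTALL_CUDA` arguments, and the commands are printed rather than executed. It builds each target's devel image before its release image, on the assumption that the release image is derived from the devel image.

```shell
#!/bin/bash
# Hypothetical helper: prints the devel and release build commands for both
# targets. $1 is the version used in the release tag and VERSION build arg.
print_builds() {
    for target in cpu gpu; do
        echo "docker build -t jdgalileo/galileo:devel-${target} -f devel.Dockerfile --build-arg TARGET=${target} ."
        echo "docker build -t jdgalileo/galileo:$1-${target} -f galileo.Dockerfile --build-arg TARGET=${target} --build-arg VERSION=$1 ."
    done
}

print_builds "${1:-1.0.0}"
```

Run from the `docker/` directory, `print_builds 1.0.0 | bash` would execute the four builds in order, provided the base images already exist.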
