
Commit 64a92cd

Update README.md for release 2.0.100

1 parent b264122 · commit 64a92cd

File tree: 1 file changed (+61, −43 lines)

README.md
[PyTorch](https://github.com/pytorch/pytorch) is an open-source machine learning framework.

[Intel® oneCCL](https://github.com/oneapi-src/oneCCL) (collective communications library) is a library for efficient distributed deep learning training that implements collectives such as `allreduce`, `allgather`, and `alltoall`. For more information on oneCCL, please refer to the [oneCCL documentation](https://spec.oneapi.com/versions/latest/elements/oneCCL/source/index.html).

The `oneccl_bindings_for_pytorch` module implements the PyTorch C10D ProcessGroup API and can be dynamically loaded as an external ProcessGroup. It currently works only on the Linux platform.

The table below shows which functions are available for use with CPU / Intel dGPU.

| | CPU | GPU |
| :--------------- | :---: | :---: |
| `send` | × | ✓ |
| `recv` | × | ✓ |
| `broadcast` | ✓ | ✓ |
| `all_reduce` | ✓ | ✓ |
| `reduce` | ✓ | ✓ |
| `all_gather` | ✓ | ✓ |
| `gather` | ✓ | ✓ |
| `scatter` | × | × |
| `reduce_scatter` | ✓ | ✓ |
| `all_to_all` | ✓ | ✓ |
| `barrier` | ✓ | ✓ |

## PyTorch API Align

We recommend using Anaconda as the Python package management system. The following table lists the corresponding branches (tags) of `oneccl_bindings_for_pytorch` and the supported PyTorch versions.

| `torch` | `oneccl_bindings_for_pytorch` |
| :-------------------------------------------------------------: | :-----------------------------------------------------------------------: |
| `master` | `master` |
| [v2.0.1](https://github.com/pytorch/pytorch/tree/v2.0.1) | [ccl_torch2.0.100](https://github.com/intel/torch-ccl/tree/ccl_torch2.0.100) |
| [v1.13](https://github.com/pytorch/pytorch/tree/v1.13) | [ccl_torch1.13](https://github.com/intel/torch-ccl/tree/ccl_torch1.13) |
| [v1.12.1](https://github.com/pytorch/pytorch/tree/v1.12.1) | [ccl_torch1.12.100](https://github.com/intel/torch-ccl/tree/ccl_torch1.12.100) |
| [v1.12.0](https://github.com/pytorch/pytorch/tree/v1.12.0) | [ccl_torch1.12](https://github.com/intel/torch-ccl/tree/ccl_torch1.12) |
| [v1.11.0](https://github.com/pytorch/pytorch/tree/v1.11.0) | [ccl_torch1.11](https://github.com/intel/torch-ccl/tree/ccl_torch1.11) |
| [v1.8.1](https://github.com/pytorch/pytorch/tree/v1.8.1) | [ccl_torch1.8](https://github.com/intel/torch-ccl/tree/ccl_torch1.8) |
| [v1.7.1](https://github.com/pytorch/pytorch/tree/v1.7.1) | [ccl_torch1.7](https://github.com/intel/torch-ccl/tree/ccl_torch1.7) |
| [v1.6.0](https://github.com/pytorch/pytorch/tree/v1.6.0) | [ccl_torch1.6](https://github.com/intel/torch-ccl/tree/ccl_torch1.6) |
| [v1.5-rc3](https://github.com/pytorch/pytorch/tree/v1.5.0-rc3) | [beta09](https://github.com/intel/torch-ccl/tree/beta09) |

The usage details can be found in the README of the corresponding branch. The following sections describe the usage of the v2.0.100 tag; to use another version of torch-ccl, please check out the corresponding branch (tag). For pytorch-1.5.0-rc3, [#PR28068](https://github.com/pytorch/pytorch/pull/28068) and [#PR32361](https://github.com/pytorch/pytorch/pull/32361) are needed to dynamically register the external ProcessGroup and enable the `alltoall` collective communication primitive. A patch file covering these two PRs is provided in the `patches` directory and can be applied directly.

## Requirements

- Python 3.8 or later and a C++17 compiler
- PyTorch v2.0.1

## Build Option List

The following build options are supported in Intel® oneCCL Bindings for PyTorch*.

| Build Option | Default Value | Description |
| :---------------------------------- | :------------- | :-------------------------------------------------------------------------------------------------- |
| COMPUTE_BACKEND | | Set oneCCL `COMPUTE_BACKEND`; set to `dpcpp` to use the DPC++ compiler and enable support for Intel XPU |
| USE_SYSTEM_ONECCL | OFF | Use the oneCCL library installed in the system |
| CCL_PACKAGE_NAME | oneccl-bind-pt | Set the wheel name |
| ONECCL_BINDINGS_FOR_PYTORCH_BACKEND | cpu | Set the backend |
| CCL_SHA_VERSION | False | Add the git HEAD SHA version to the wheel name |

## Launch Option List

The following launch options are supported in Intel® oneCCL Bindings for PyTorch*.

| Launch Option | Default Value | Description |
| :--------------------------------------- | :------------ | :-------------------------------------------------------------------- |
| ONECCL_BINDINGS_FOR_PYTORCH_ENV_VERBOSE | 0 | Set the verbose level in ONECCL_BINDINGS_FOR_PYTORCH |
| ONECCL_BINDINGS_FOR_PYTORCH_ENV_WAIT_GDB | 0 | Set to 1 to force oneccl_bindings_for_pytorch to wait for GDB attaching |

```bash
# for CPU Backend Only
python setup.py install
# for XPU Backend: use DPC++ Compiler to enable support for Intel XPU
# build with oneCCL from third party
COMPUTE_BACKEND=dpcpp python setup.py install
# build without oneCCL
export INTELONEAPIROOT=${HOME}/intel/oneapi
USE_SYSTEM_ONECCL=ON COMPUTE_BACKEND=dpcpp python setup.py install
```

### Install PreBuilt Wheel

Wheel files are available for the following Python versions.

| Extension Version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 | Python 3.11 |
| :---------------: | :--------: | :--------: | :--------: | :--------: | :---------: | :---------: |
| 2.0.100 | | | ✓ | ✓ | ✓ | ✓ |
| 1.13 | | ✓ | ✓ | ✓ | ✓ | |
| 1.12.100 | | ✓ | ✓ | ✓ | ✓ | |
| 1.12.0 | | ✓ | ✓ | ✓ | ✓ | |
| 1.11.0 | | ✓ | ✓ | ✓ | ✓ | |
| 1.10.0 | ✓ | ✓ | ✓ | ✓ | | |

```bash
python -m pip install oneccl_bind_pt==2.0.100 -f https://developer.intel.com/ipex-whl-stable-xpu
```

### Runtime Dynamic Linking

- If oneccl_bindings_for_pytorch is built without oneCCL and uses the oneCCL in the system, dynamically link oneCCL from the oneAPI Base Toolkit (recommended usage):

```bash
source $basekit_root/ccl/latest/env/vars.sh
```

Note: Make sure you have installed the [basekit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#base-kit) when using Intel® oneCCL Bindings for PyTorch\* on Intel® GPUs.

- If oneccl_bindings_for_pytorch is built with oneCCL from third party, or installed from the prebuilt wheel, dynamically link the oneCCL and Intel MPI libraries:

```bash
source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)")/env/setvars.sh
```
  Dynamically link oneCCL only (not including Intel MPI):

```bash
source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)")/env/vars.sh
```
## Usage

example.py
```python
...
model = torch.nn.parallel.DistributedDataParallel(model, ...)
...
```
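The body of example.py is abridged in this diff view. The pattern it follows can be sketched as below; the toy model, rendezvous settings, and `PMI_*` environment lookups are illustrative assumptions, not the repository's exact example:

```python
import os

import torch
import torch.distributed as dist
import oneccl_bindings_for_pytorch  # registers the "ccl" backend with torch.distributed

# Rendezvous settings; mpirun-launched ranks can read PMI_* variables (assumption)
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
rank = int(os.environ.get("PMI_RANK", 0))
world_size = int(os.environ.get("PMI_SIZE", 1))

dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)

model = torch.nn.Linear(10, 10)  # illustrative toy model
model = torch.nn.parallel.DistributedDataParallel(model)
...
```

A script like this is launched with `mpirun`, one process per rank, as shown in the command below.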


(When oneccl_bindings_for_pytorch is built without oneCCL, use the oneCCL and MPI (if needed) from the system.)

```bash
source $basekit_root/ccl/latest/env/vars.sh
source $basekit_root/mpi/latest/env/vars.sh

mpirun -n <N> -ppn <PPN> -f <hostfile> python example.py
```

## Known Issues

For point-to-point communication, directly calling dist.send/recv after initializing the process group in the launch script triggers a runtime error. In the current implementation, all ranks of the group are expected to participate in the call so that the communicators can be created, while dist.send/recv involves only a pair of ranks. As a result, dist.send/recv should be used after a collective call, which ensures all ranks' participation. A solution that supports calling dist.send/recv directly after initializing the process group is still under investigation.
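The ordering constraint above can be sketched as follows. This assumes a process group already initialized with the `ccl` backend and at least two ranks; the warm-up tensor and shapes are illustrative:

```python
import torch
import torch.distributed as dist

# A collective that all ranks join must run first, so the communicators
# are created with every rank participating.
warmup = torch.zeros(1)
dist.all_reduce(warmup)

# Point-to-point calls are safe only after that collective.
t = torch.arange(4, dtype=torch.float32)
if dist.get_rank() == 0:
    dist.send(t, dst=1)
elif dist.get_rank() == 1:
    dist.recv(t, src=0)
```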

## License

[BSD License](https://github.com/intel/torch-ccl/blob/master/LICENSE)
