
Commit 376cc1d

Refine README
2 parents 3e01fda + de706d7 commit 376cc1d


README.md (+14 −8)
@@ -6,7 +6,7 @@ This repository holds PyTorch bindings maintained by Intel for the Intel® oneAP

[PyTorch](https://github.com/pytorch/pytorch) is an open-source machine learning framework.

- [Intel® oneCCL](https://github.com/oneapi-src/oneCCL) (collective communications library) is a library for efficient distributed deep learning training, implementing collectives such as `allreduce`, `allgather`, and `alltoall`. For more information on oneCCL, please refer to the [oneCCL documentation](https://spec.oneapi.com/versions/latest/elements/oneCCL/source/index.html) and [oneCCL specification](https://spec.oneapi.com/versions/latest/elements/oneCCL/source/index.html).
+ [Intel® oneCCL](https://github.com/oneapi-src/oneCCL) (collective communications library) is a library for efficient distributed deep learning training, implementing collectives such as `allreduce`, `allgather`, and `alltoall`. For more information on oneCCL, please refer to the [oneCCL documentation](https://spec.oneapi.com/versions/latest/elements/oneCCL/source/index.html).

The `oneccl_bindings_for_pytorch` module implements the PyTorch C10D ProcessGroup API, can be dynamically loaded as an external ProcessGroup, and currently works only on Linux.
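As a rough illustration of how the bindings plug into `torch.distributed`, the minimal sketch below initializes a `ccl` process group and runs an `allreduce`. The rendezvous environment variables and single-process defaults are illustrative assumptions, not taken from this README; a real run would get them from `mpirun`/`torchrun`.

```python
import os

import torch
import torch.distributed as dist
import oneccl_bindings_for_pytorch  # importing registers the "ccl" backend with torch.distributed

# Illustrative defaults for a single-process run; a real launcher would set these itself.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

dist.init_process_group(backend="ccl", init_method="env://")

# Each rank contributes its own tensor; after allreduce every rank holds the sum.
x = torch.ones(4) * (dist.get_rank() + 1)
dist.all_reduce(x, op=dist.ReduceOp.SUM)
print(f"rank {dist.get_rank()}: {x}")

dist.destroy_process_group()
```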

@@ -31,11 +31,12 @@ The table below shows which functions are available for use with CPU / Intel dGP

## PyTorch API Align

- We recommend Anaconda as Python package management system. The following is the corresponding branches (tags) of `oneccl_bindings_for_pytorch` and supported Pytorch.
+ We recommend using Anaconda as the Python package management system. The following are the corresponding branches (tags) of `oneccl_bindings_for_pytorch` and the supported PyTorch versions.

| `torch` | `oneccl_bindings_for_pytorch` |
| :-------------------------------------------------------------: | :-----------------------------------------------------------------------: |
| `master` | `master` |
+ | [v2.0.1](https://github.com/pytorch/pytorch/tree/v2.0.1) | [ccl_torch2.0.100](https://github.com/intel/torch-ccl/tree/ccl_torch2.0.100) |
| [v1.13](https://github.com/pytorch/pytorch/tree/v1.13) | [ccl_torch1.13](https://github.com/intel/torch-ccl/tree/ccl_torch1.13) |
| [v1.12.1](https://github.com/pytorch/pytorch/tree/v1.12.1) | [ccl_torch1.12.100](https://github.com/intel/torch-ccl/tree/ccl_torch1.12.100) |
| [v1.12.0](https://github.com/pytorch/pytorch/tree/v1.12.0) | [ccl_torch1.12](https://github.com/intel/torch-ccl/tree/ccl_torch1.12) |
@@ -45,33 +46,34 @@ We recommend Anaconda as Python package management system. The following is the
| [v1.8.1](https://github.com/pytorch/pytorch/tree/v1.8.1) | [ccl_torch1.8](https://github.com/intel/torch-ccl/tree/ccl_torch1.8) |
| [v1.7.1](https://github.com/pytorch/pytorch/tree/v1.7.1) | [ccl_torch1.7](https://github.com/intel/torch-ccl/tree/ccl_torch1.7) |
| [v1.6.0](https://github.com/pytorch/pytorch/tree/v1.6.0) | [ccl_torch1.6](https://github.com/intel/torch-ccl/tree/ccl_torch1.6) |
- | [v1.5-rc3](https://github.com/pytorch/pytorch/tree/v1.5.0-rc3) | [beta09](https://github.com/intel/torch-ccl/tree/beta09) |
+ | [v1.5-rc3](https://github.com/pytorch/pytorch/tree/v1.5.0-rc3) | [beta09](https://github.com/intel/torch-ccl/tree/beta09) |

Usage details can be found in the README of the corresponding branch. The following describes the usage of the v1.9 tag; if you want to use another version of torch-ccl, please check out the corresponding branch (tag). For pytorch-1.5.0-rc3, [#PR28068](https://github.com/pytorch/pytorch/pull/28068) and [#PR32361](https://github.com/pytorch/pytorch/pull/32361) are needed to dynamically register the external ProcessGroup and to enable the `alltoall` collective communication primitive. A patch file covering these two PRs is provided in the `patches` directory and can be applied directly.

## Requirements

- - Python 3.6 or later and a C++17 compiler
+ - Python 3.8 or later and a C++17 compiler

- - PyTorch v1.13.0
+ - PyTorch v2.0.1

## Build Option List

The following build options are supported in Intel® oneCCL Bindings for PyTorch*.

| Build Option | Default Value | Description |
- | :---------------------------------: | :------------: | :-------------------------------------------------------------------------------------------------: |
+ | :---------------------------------- | :------------- | :-------------------------------------------------------------------------------------------------- |
| COMPUTE_BACKEND | | Set oneCCL `COMPUTE_BACKEND`; set to `dpcpp` to use the DPC++ compiler and enable support for Intel XPU |
+ | USE_SYSTEM_ONECCL | OFF | Use the oneCCL library installed on the system |
| CCL_PACKAGE_NAME | oneccl-bind-pt | Set Wheel Name |
| ONECCL_BINDINGS_FOR_PYTORCH_BACKEND | cpu | Set BACKEND |
- | CCL_SHA_VERSION | False |add git head sha version to Wheel name |
+ | CCL_SHA_VERSION | False | Add the git HEAD SHA version to the wheel name |

## Launch Option List

The following launch options are supported in Intel® oneCCL Bindings for PyTorch*.

| Launch Option | Default Value | Description |
- | :--------------------------------------: | :-----------: | :-------------------------------------------------------------------: |
+ | :--------------------------------------- | :------------ | :-------------------------------------------------------------------- |
| ONECCL_BINDINGS_FOR_PYTORCH_ENV_VERBOSE | 0 | Set the verbosity level of oneccl_bindings_for_pytorch |
| ONECCL_BINDINGS_FOR_PYTORCH_ENV_WAIT_GDB | 0 | Set to 1 to force oneccl_bindings_for_pytorch to wait for GDB to attach |
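As a hedged sketch of applying these launch options, the snippet below sets the variables from Python before the bindings are imported. Exporting the same variables in the shell before launching is the more usual route; whether an in-process assignment takes effect is an assumption that depends on when the library reads them.

```python
import os

# Assumption: these variables are read when the bindings initialize, so they are
# set here before `import oneccl_bindings_for_pytorch`; exporting them in the
# shell before launch achieves the same thing.
os.environ["ONECCL_BINDINGS_FOR_PYTORCH_ENV_VERBOSE"] = "1"   # verbose logging
os.environ["ONECCL_BINDINGS_FOR_PYTORCH_ENV_WAIT_GDB"] = "0"  # do not block waiting for GDB

import torch  # noqa: E402
import oneccl_bindings_for_pytorch  # noqa: E402
```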

@@ -248,6 +250,10 @@ mpirun -n 2 -l python profiling.py

```

+ ## Known Issues
+
+ For point-to-point communication, calling dist.send/recv directly after initializing the process group in the launch script triggers a runtime error, because in the current implementation all ranks of the group are expected to participate in the call that creates the communicators, while dist.send/recv involves only a pair of ranks. As a result, dist.send/recv should be used after a collective call, which ensures that all ranks participate. A solution that supports calling dist.send/recv directly after initializing the process group is still under investigation.
+
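To make the workaround added above concrete, here is a minimal sketch of the pattern it describes: run one collective that involves every rank (a `barrier` is used here as an illustrative choice) before the first `send`/`recv` pair. The tensor contents and the two-rank layout are illustrative assumptions, not taken from this README.

```python
import torch
import torch.distributed as dist
import oneccl_bindings_for_pytorch  # registers the "ccl" backend

# Assumes the usual rendezvous environment variables are set by the launcher
# and that at least two ranks are running.
dist.init_process_group(backend="ccl", init_method="env://")
rank = dist.get_rank()

# Workaround: a collective joined by all ranks (barrier here) creates the
# communicators before any point-to-point call is made.
dist.barrier()

# Now a pair of ranks can exchange data with send/recv.
t = torch.zeros(4)
if rank == 0:
    t += 42
    dist.send(t, dst=1)
elif rank == 1:
    dist.recv(t, src=0)
    print(f"rank 1 received {t}")

dist.destroy_process_group()
```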
## License

[BSD License](https://github.com/intel/torch-ccl/blob/master/LICENSE)
