README.md (+10, -26)
@@ -18,9 +18,9 @@ MSCCL++ is a development kit for implementing highly optimized distributed GPU a

 * **Runtime Performance Optimization for Dynamic Workload.** As we can easily implement flexible communication logics, we can optimize communication performance even during runtime. For example, we can implement the system to automatically choose different communication paths or different collective communication algorithms depending on the dynamic workload at runtime.

-## Key Features (v0.2)
+## Key Features (v0.3)

-MSCCL++ v0.2 supports the following features.
+MSCCL++ v0.3 supports the following features.

 ### In-Kernel Communication Interfaces

@@ -124,31 +124,15 @@ Customized proxies can be used for conducting a series of pre-defined data trans

 Most of key components of MSCCL++ are designed to be easily customized. This enables MSCCL++ to easily adopt a new software / hardware technology and lets users implement algorithms optimized for their own use cases.

-## Status & Roadmap
+### New in MSCCL++ v0.3 (Latest Release)
+* Updated interfaces
+* Add Python bindings and interfaces
+* Add Python unit tests
+* Add more configurable parameters
+* Add a new single-node AllReduce kernel
+* Fix bugs

-MSCCL++ is under active development and a part of its features will be added in a future release. The following describes key features of each version.
-
-### MSCCL++ v0.4 (TBU)
-* Automatic task scheduler
-* Dynamic performance tuning
-
-### MSCCL++ v0.3 (TBU)
-* Tile-based communication: efficient transport of 2D data patches (tiles)
-* GPU computation interfaces
-
-### MSCCL++ v0.2 (Latest Release)
-* Basic communication functionalities and new interfaces
-  - GPU-side communication interfaces
-  - Host-side helpers: bootstrap, communicator, and proxy
-  - Supports both NVLink and InfiniBand
-  - Supports both in-SM copy and DMA/RDMA
-* Communication performance optimization
-  - Example code outperforms NCCL/MSCCL AllGather/AllReduce/AllToAll
-* Development pipeline
-* Documentation
-
-### MSCCL++ v0.1
-* Proof-of-concept, preliminary interfaces
+See details from https://github.com/microsoft/mscclpp/issues/89.