Currently, when we decompose `all_reduce` into `reduce_scatter` and `all_gather`, we arbitrarily select the tensor dimension along which to split. We pick whichever dimension divides evenly by the number of devices along the `cluster_axis` on which the CCL is performed, and if no such dimension exists, we throw an error. Instead, we should select the tensor dimension that yields the best performance.
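For concreteness, here is a minimal C++ sketch of the kind of selection policy being discussed. The function name, signature, and the "prefer the largest divisible extent" heuristic are illustrative assumptions, not the decomposition pass's actual code:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: pick the dimension to scatter/gather along when
// decomposing all_reduce. Today's behavior is effectively "first dimension
// that divides evenly"; a performance-aware policy could rank candidates
// instead. The ranking below (largest divisible extent) is one example
// heuristic, not the real implementation.
std::optional<int64_t> selectScatterDim(const std::vector<int64_t> &shape,
                                        int64_t numDevices) {
  std::optional<int64_t> best;
  for (int64_t dim = 0; dim < static_cast<int64_t>(shape.size()); ++dim) {
    // A dimension is only a candidate if the device count along the
    // cluster_axis divides its extent evenly.
    if (shape[dim] % numDevices != 0)
      continue;
    // Example ranking: prefer the largest candidate extent, so each
    // device's shard stays as large (and transfers as coarse) as possible.
    if (!best || shape[dim] > shape[*best])
      best = dim;
  }
  // nullopt corresponds to the current "no evenly divisible dim" error.
  return best;
}
```

Which ranking actually wins likely depends on the target (shard contiguity, link bandwidth per axis, etc.), so the real fix probably needs measurement rather than a fixed rule like the one above.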