[TorchComms] Update readme on cp and async tp #1924
Remove cp and async tp from the known issues in the readme.
cp
TEST_BACKEND=nccl TRAIN_FILE=torchtitan.experiments.torchcomms.train CONFIG_FILE="./torchtitan/models/llama3/train_configs/debug_model.toml" ./run_train.sh --parallelism.context_parallel_degree 2
[rank0]:[titan] 2025-10-21 11:50:24,918 - root - INFO - step: 1 loss: 8.2808 grad_norm: 1.4344 memory: 0.60GiB(0.63%) tps: 884 tflops: 0.06 mfu: 0.01%
[rank0]:[titan] 2025-10-21 11:50:24,918 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40
[rank0]:[titan] 2025-10-21 11:50:25,010 - root - INFO - step: 2 loss: 7.9722 grad_norm: 1.5075 memory: 0.65GiB(0.68%) tps: 89,241 tflops: 6.39 mfu: 0.65%
[rank0]:[titan] 2025-10-21 11:50:25,133 - root - INFO - step: 3 loss: 7.2171 grad_norm: 1.9729 memory: 0.67GiB(0.70%) tps: 66,834 tflops: 4.78 mfu: 0.48%
[rank0]:[titan] 2025-10-21 11:50:25,261 - root - INFO - step: 4 loss: 6.3402 grad_norm: 2.3603 memory: 0.67GiB(0.70%) tps: 64,239 tflops: 4.60 mfu: 0.46%
[rank0]:[titan] 2025-10-21 11:50:25,355 - root - INFO - step: 5 loss: 5.3055 grad_norm: 2.6067 memory: 0.67GiB(0.70%) tps: 87,354 tflops: 6.25 mfu: 0.63%
[rank0]:[titan] 2025-10-21 11:50:25,447 - root - INFO - step: 6 loss: 4.7225 grad_norm: 2.6398 memory: 0.67GiB(0.70%) tps: 89,556 tflops: 6.41 mfu: 0.65%
[rank0]:[titan] 2025-10-21 11:50:25,888 - root - INFO - step: 7 loss: 4.3229 grad_norm: 2.1676 memory: 0.67GiB(0.70%) tps: 18,607 tflops: 1.33 mfu: 0.13%
[rank0]:[titan] 2025-10-21 11:50:25,971 - root - INFO - step: 8 loss: 4.0035 grad_norm: 1.6869 memory: 0.67GiB(0.70%) tps: 98,424 tflops: 7.05 mfu: 0.71%
[rank0]:[titan] 2025-10-21 11:50:26,059 - root - INFO - step: 9 loss: 3.9520 grad_norm: 1.3777 memory: 0.67GiB(0.70%) tps: 93,815 tflops: 6.72 mfu: 0.68%
[rank0]:[titan] 2025-10-21 11:50:26,173 - root - INFO - step: 10 loss: 3.6494 grad_norm: 1.3774 memory: 0.67GiB(0.70%) tps: 72,228 tflops: 5.17 mfu: 0.52%
[rank0]:[titan] 2025-10-21 11:50:26,336 - root - INFO - Dumping profiler traces at step 10
[rank0]:[titan] 2025-10-21 11:50:26,372 - root - INFO - Finished dumping profiler traces in 0.04 seconds
[rank0]:[titan] 2025-10-21 11:50:26,373 - root - INFO - Dumping memory snapshot at step 10
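To check a run like the one above without reading it by eye, the per-step lines can be parsed with a small script. This is only a sketch: `parse_step_line` and `loss_decreases` are hypothetical helpers, not part of torchtitan, and the regex assumes the exact line format shown in this log.

```python
import re

# Matches the per-step metric lines in the log above.
# Assumption: the field order and names are exactly as printed in this PR.
STEP_RE = re.compile(
    r"step:\s*(?P<step>\d+)\s+loss:\s*(?P<loss>[\d.]+)\s+"
    r"grad_norm:\s*(?P<grad_norm>[\d.]+)"
    r".*?tps:\s*(?P<tps>[\d,]+)\s+tflops:\s*(?P<tflops>[\d.]+)\s+"
    r"mfu:\s*(?P<mfu>[\d.]+)%"
)


def parse_step_line(line):
    """Return the metrics of one step line as a dict, or None for other lines."""
    m = STEP_RE.search(line)
    if m is None:
        return None
    return {
        "step": int(m["step"]),
        "loss": float(m["loss"]),
        "grad_norm": float(m["grad_norm"]),
        "tps": int(m["tps"].replace(",", "")),  # tps is printed with thousands separators
        "tflops": float(m["tflops"]),
        "mfu_pct": float(m["mfu"]),
    }


def loss_decreases(lines):
    """True if the loss strictly decreases across all parsed step lines."""
    parsed = (parse_step_line(line) for line in lines)
    losses = [p["loss"] for p in parsed if p is not None]
    return all(b < a for a, b in zip(losses, losses[1:]))
```

Feeding the saved log lines to `loss_decreases` gives a quick sanity check that the cp run trains; lines without step metrics, such as the profiler messages, are skipped.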
async tp
