This repository was archived by the owner on Aug 7, 2024. It is now read-only.
Commit b099049
Combine amax reduction calls (#163)
Summary:
~~Add an option to combine the amax sync reduction.~~ (Combined reduction is now the default behavior.)
- Combine the reduction calls for each type of amax scaling factor into one call per type (3 all_reduce calls in total). These could be combined further into a single call.
- Verified that the other tests still pass, so the existing benchmark code does not need to change.
- pytest test/test_base.py
- ./test/test_fsdp.sh
- Tested the new option on small Llama models with 8 FSDP groups. The time taken by sync_float8_amax_and_scale_history dropped from 29 ms [1] to 3 ms [2].
[1] Trace without combined reduction: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/trace.138932292910521.json.gz&bucket=acadia
[2] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/trace.202842416426594.json.gz&bucket=acadia
\* Trace[2] was updated after addressing the comments.
\*\* Meta-internal access is required to open these traces.
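The batching idea in the summary can be sketched in plain Python. This is a simulation rather than the repository's code: `FakeProcessGroup`, the `sync_*` helpers, the amax type names, and the sample values are all hypothetical, and the sketch assumes the pre-change code issued one MAX reduction per layer per amax type. In real PyTorch code the same effect would presumably come from packing the amax values into one tensor and calling `torch.distributed.all_reduce` with `ReduceOp.MAX` once.

```python
from typing import Dict, List

# Illustrative names for the three amax types; assumptions, not taken from the diff.
AMAX_TYPES = ("x", "w", "dL_dY")

RankState = Dict[str, List[float]]  # amax type -> one amax value per layer


class FakeProcessGroup:
    """Hypothetical stand-in for a collective; counts how many reductions run."""

    def __init__(self) -> None:
        self.num_calls = 0

    def all_reduce_max(self, per_rank_buffers: List[List[float]]) -> List[float]:
        # One collective call: element-wise max over equally sized rank buffers.
        self.num_calls += 1
        return [max(values) for values in zip(*per_rank_buffers)]


def sync_per_layer(group: FakeProcessGroup, ranks: List[RankState]) -> RankState:
    """Uncombined baseline: one reduction per layer per amax type."""
    num_layers = len(ranks[0][AMAX_TYPES[0]])
    out: RankState = {t: [] for t in AMAX_TYPES}
    for t in AMAX_TYPES:
        for layer in range(num_layers):
            reduced = group.all_reduce_max([[r[t][layer]] for r in ranks])
            out[t].append(reduced[0])
    return out


def sync_per_type(group: FakeProcessGroup, ranks: List[RankState]) -> RankState:
    """Combined: one reduction per amax type, 3 calls in total."""
    return {t: group.all_reduce_max([r[t] for r in ranks]) for t in AMAX_TYPES}


def sync_single_call(group: FakeProcessGroup, ranks: List[RankState]) -> RankState:
    """Further combined: pack every type into one buffer and reduce once."""
    num_layers = len(ranks[0][AMAX_TYPES[0]])
    packed = [[v for t in AMAX_TYPES for v in r[t]] for r in ranks]
    reduced = group.all_reduce_max(packed)
    # Unpack the flat result back into per-type slices.
    return {
        t: reduced[i * num_layers:(i + 1) * num_layers]
        for i, t in enumerate(AMAX_TYPES)
    }


# Made-up amax values for 2 ranks and 2 layers.
EXAMPLE_RANKS: List[RankState] = [
    {"x": [1.0, 4.0], "w": [0.5, 0.2], "dL_dY": [3.0, 1.0]},
    {"x": [2.0, 3.0], "w": [0.1, 0.9], "dL_dY": [2.5, 6.0]},
]
```

All three helpers return the same synced values; only the number of collective calls differs. That matters because each collective carries a fixed launch and synchronization cost, which is a plausible reason for the drop reported in the traces.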
Pull Request resolved: #163
Reviewed By: drisspg
Differential Revision: D52271595
Pulled By: y-sq
fbshipit-source-id: 65d27d32cb4d291dc6fbe62b7a916cf2e32e6482
1 parent c40de9b commit b099049
1 file changed: +53 -14 lines changed
Diff hunks: original lines 87-94 (new lines 87-106) and original lines 103-129 (new lines 115-168).