
Commit 118a165

CaoE authored and pytorchmergebot committed
[Inductor][CPP] Add transposed B matrix support for CppMicroGemmFP32Vec (pytorch#147068)
* Add transposed B support for CppMicroGemmFP32Vec.
* Add support for cases where N is not divisible by `block_n`.

Expand CppMicroGemmFP32Vec to generate a GEMM kernel that supports a transposed B matrix and N of arbitrary size. This is the basis for pytorch#147069 to get better performance.

Pull Request resolved: pytorch#147068
Approved by: https://github.com/leslie-fang-intel, https://github.com/jansel, https://github.com/jgong5
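The kernel itself is code-generated by Inductor, but the contract it implements can be sketched in plain C++. The sketch below is a hypothetical scalar reference, not PyTorch's generated vectorized kernel: it shows what "transposed B" means for indexing (element (k, n) is read from `B[n * ldb + k]` instead of `B[k * ldb + n]`) and how arbitrary N falls out of looping to N rather than to a fixed `block_n`. The names `micro_gemm_ref` and `lda`/`ldb`/`ldc` are illustrative, not from the PR.

```cpp
#include <cassert>

// Hypothetical scalar model of the micro-GEMM contract:
//   C[m][n] += sum_k A[m][k] * B[k][n]
// where B may be stored transposed (element (k, n) at B[n * ldb + k]).
// Arbitrary N is handled by iterating to N instead of a fixed block_n.
void micro_gemm_ref(const float* A, const float* B, float* C,
                    int M, int N, int K,
                    int lda, int ldb, int ldc,
                    bool b_transposed) {
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      float acc = C[m * ldc + n];
      for (int k = 0; k < K; ++k) {
        // Transposed layout: row n of B holds column n of the logical matrix.
        float b = b_transposed ? B[n * ldb + k] : B[k * ldb + n];
        acc += A[m * lda + k] * b;
      }
      C[m * ldc + n] = acc;
    }
  }
}
```

Feeding the same logical B in both layouts should accumulate identical results; the generated kernel differs only in that it vectorizes over n and handles the N-tail with partial vectors.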
1 parent 6a3a1f9 commit 118a165

File tree

2 files changed: +394, −11 lines


aten/src/ATen/cpu/vec/vec256/vec256_16bit_float.h

Lines changed: 6 additions & 0 deletions
@@ -208,6 +208,12 @@ static_assert(
     return _mm256_loadu_si256(reinterpret_cast<const __m256i*>(ptr));
   }
   __at_align__ int16_t tmp_values[size()];
+#ifndef __msvc_cl__
+#pragma unroll
+#endif
+  for (const auto i : c10::irange(count, size())) {
+    tmp_values[i] = 0;
+  }
   std::memcpy(tmp_values, ptr, count * sizeof(int16_t));
   return _mm256_loadu_si256(reinterpret_cast<const __m256i*>(tmp_values));
 }
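The added loop supports the N-tail case: when a partial load of `count < size()` elements goes through the staging buffer, the lanes past `count` are zeroed instead of holding whatever stack garbage `tmp_values` starts with. A minimal scalar sketch of that pattern (the fixed `SIZE = 16` stands in for `Vectorized<>::size()`, which is 16 int16_t lanes in a 256-bit register; `partial_load` is an illustrative name, not the ATen API):

```cpp
#include <cstdint>
#include <cstring>

// Scalar model of the partial-load pattern in the diff above:
// zero the tail of a fixed-size staging buffer, then copy the
// `count` valid elements over the front, so every lane is defined.
constexpr int SIZE = 16;  // stand-in for Vectorized<>::size()

void partial_load(int16_t* tmp, const int16_t* src, int count) {
  for (int i = count; i < SIZE; ++i) {
    tmp[i] = 0;  // matches the added c10::irange(count, size()) loop
  }
  std::memcpy(tmp, src, count * sizeof(int16_t));
}
```

The order mirrors the patched code: zero first, then `memcpy`, so the first `count` lanes carry real data and the rest are a deterministic zero rather than uninitialized memory.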

0 commit comments
