Alignas Array struct (pytorch#14920)

syed-ahmed · facebook-github-bot · commit a97cf568a4cf · 2018-12-10T17:58:03.000-08:00
Summary:
This PR adds 16-byte alignment to the Array struct so that the compiler can use CUDA vector load/store instructions on it. I tested this on our Philox header. Note that the vector store instruction is emitted both for the built-in CUDA vector types and for Array with alignas, but not for Array without alignas.

With CUDA vector types (uint4, uint2, float4): https://godbolt.org/z/UaWOmR
With alignas: https://godbolt.org/z/Eeh0t5
Without alignas: https://godbolt.org/z/QT63gq

Pull Request resolved: pytorch#14920
Differential Revision: D13406751
Pulled By: soumith
fbshipit-source-id: 685b1010ef1f576dde30c278b1e9b642f87c843d
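For readers who want to see what alignas(16) changes at the type level, here is a minimal host-side C++ sketch. PlainArray, AlignedArray and is_16_byte_aligned are hypothetical names made up for illustration and are not part of ATen; the sketch only checks alignment and size, and does not show the generated PTX (the godbolt links above do that).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for at::cuda::Array, for illustration only.
// Without alignas, the struct inherits the alignment of its element type.
template <typename T, int size>
struct PlainArray {
  T data[size];
};

// With alignas(16), every instance starts on a 16-byte boundary, which is
// the alignment that 128-bit vector types such as uint4 and float4 have.
template <typename T, int size>
struct alignas(16) AlignedArray {
  T data[size];
};

// The unaligned struct is only as aligned as a single element.
static_assert(alignof(PlainArray<std::uint32_t, 4>) == alignof(std::uint32_t),
              "plain struct keeps the element alignment");

// The aligned struct is 16-byte aligned regardless of the element type.
static_assert(alignof(AlignedArray<std::uint32_t, 4>) == 16,
              "alignas(16) raises the struct alignment");

// Four 32-bit elements fill exactly one 16-byte unit.
static_assert(sizeof(AlignedArray<std::uint32_t, 4>) == 16,
              "no padding needed for 4 x 32-bit elements");

// sizeof is rounded up to a multiple of the alignment, so a two-element
// aligned array occupies 16 bytes while the plain one occupies 8.
static_assert(sizeof(PlainArray<std::uint32_t, 2>) == 8,
              "plain struct has no tail padding");
static_assert(sizeof(AlignedArray<std::uint32_t, 2>) == 16,
              "aligned struct is padded to 16 bytes");

// Runtime check that a given address sits on a 16-byte boundary.
inline bool is_16_byte_aligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

One consequence worth keeping in mind is the padding shown by the last assertion: an aligned Array with fewer than 16 bytes of payload still occupies 16 bytes. The diff below leaves the struct unaligned when __HIP_PLATFORM_HCC__ is defined.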
diff --git a/aten/src/ATen/cuda/Array.h b/aten/src/ATen/cuda/Array.h
@@ -7,7 +7,11 @@
 namespace at { namespace cuda {
 
 template <typename T, int size>
+#ifndef __HIP_PLATFORM_HCC__
+struct alignas(16) Array {
+#else
 struct Array {
+#endif
   T data[size];
 
   C10_HOST_DEVICE T operator[](int i) const {